15 April 2013
Sometimes you need to parse HTML with PHP. There are many ways to do it, but parsing HTML is hard because people very often write invalid markup. The code below forces DOMDocument to read the input as UTF-8 HTML and later write it back out as HTML.
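$dom = new DOMDocument();
// Prepend an XML declaration so loadHTML() reads the input ($xml holds the HTML) as UTF-8.
$dom->loadHTML('<?xml encoding="UTF-8">' . $xml);
// Remove the declaration again. Copy the node list first, because
// removing nodes while iterating the live list can skip items.
foreach (iterator_to_array($dom->childNodes) as $item) {
    if ($item->nodeType == XML_PI_NODE) {
        $dom->removeChild($item);
    }
}
$dom->encoding = 'UTF-8';

// Replace the custom url_id attribute on every link with a real href from $urls.
$nodes = $dom->getElementsByTagName('a');
foreach ($nodes as $node) {
    // Read the attribute directly instead of changing the attribute map while looping over it.
    if ($node->hasAttribute('url_id')) {
        $id = $node->getAttribute('url_id');
        $node->removeAttribute('url_id');
        $node->setAttribute('href', $urls[$id]);
    }
}

// Strip the doctype and the html/head/body wrappers that DOMDocument adds on output.
$result = trim(preg_replace('~<(?:!DOCTYPE|/?(?:html|head|body|\?xml))[^>]*>\s*~i', '', $dom->saveHTML()));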
Programming Language: PHP
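To make the snippet easier to try out, here is a minimal sketch that wraps the same steps in a function. The function name `rewriteLinks`, its parameters, and the sample markup and URL are assumptions made for illustration, not part of the original code. The sketch also uses `libxml_use_internal_errors()` so that invalid markup does not flood the output with warnings, and it skips ids that are missing from `$urls` instead of writing an empty href; both are choices of this sketch.

```php
<?php
// Hypothetical helper: the name, parameters and sample data are illustrative assumptions.
function rewriteLinks(string $html, array $urls): string
{
    // Collect parser warnings internally instead of printing them for sloppy markup.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // The XML declaration is only a hint that makes loadHTML() read UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Remove the hint again; iterate over a copy of the live node list.
    foreach (iterator_to_array($dom->childNodes) as $item) {
        if ($item->nodeType === XML_PI_NODE) {
            $dom->removeChild($item);
        }
    }
    $dom->encoding = 'UTF-8';

    // Swap url_id for href on every link that has a known id.
    foreach ($dom->getElementsByTagName('a') as $node) {
        if (!$node->hasAttribute('url_id')) {
            continue;
        }
        $id = $node->getAttribute('url_id');
        $node->removeAttribute('url_id');
        if (isset($urls[$id])) {
            $node->setAttribute('href', $urls[$id]);
        }
    }

    // Drop the doctype and the html/head/body wrappers added by saveHTML().
    $pattern = '~<(?:!DOCTYPE|/?(?:html|head|body|\?xml))[^>]*>\s*~i';
    return trim(preg_replace($pattern, '', $dom->saveHTML()));
}

// Sample input with an unclosed <p>, as often found in real pages.
$urls = ['42' => 'http://example.com/page'];
$result = rewriteLinks('<p>Read <a url_id="42">this page</a>', $urls);
echo $result, "\n";
```

The printed fragment should contain the link with an href attribute in place of url_id and no html or body wrapper tags. Links without a url_id attribute are left untouched.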