15 April 2013
This snippet shows how to parse HTML with PHP. Parsing HTML is hard because real-world markup is very often invalid. The code below forces DOMDocument to read the input as UTF-8 HTML and to write it back out as an HTML fragment, so your changes are applied and saved correctly. There are many different ways to parse HTML, but DOMDocument is bundled with PHP and needs no extra libraries, so it might be a good choice.
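// Inputs: $xml holds the HTML fragment, $urls maps url_id values to URLs.
// Invalid markup makes loadHTML() emit warnings; collect them quietly instead.
libxml_use_internal_errors(true);

// Load html as DOMDocument. The XML declaration prefix makes libxml read it as UTF-8.
$dom = new DOMDocument();
$dom->loadHTML('<?xml encoding="UTF-8">' . $xml);
libxml_clear_errors();
// Remove the helper processing instruction again. Iterate over a copy,
// because removing nodes from the live childNodes list can skip items.
foreach (iterator_to_array($dom->childNodes) as $item) {
    if ($item->nodeType === XML_PI_NODE) {
        $dom->removeChild($item);
    }
}
$dom->encoding = 'UTF-8';

// Change html: replace each url_id attribute with the matching href.
// The attribute is read directly, so $node->attributes is never modified while looping over it.
foreach ($dom->getElementsByTagName('a') as $node) {
    if ($node->hasAttribute('url_id')) {
        $id = $node->getAttribute('url_id');
        $node->removeAttribute('url_id');
        if (isset($urls[$id])) { // unknown ids get no href
            $node->setAttribute('href', $urls[$id]);
        }
    }
}

// Save the changed html without the wrapper tags DOMDocument adds.
// The \b keeps "head" from also matching <header>.
$result = trim(preg_replace('~<(?:!DOCTYPE|/?(?:html|head|body|\?xml)\b)[^>]*>\s*~i', '', $dom->saveHTML()));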
Programming Language: PHP
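
To try the snippet on a concrete input, the same load, change and save steps can be wrapped in a function. This is only a sketch: the function name rewriteLinks, the sample fragment and the example.com URL are made up for illustration and are not part of the original snippet. The sample deliberately leaves the paragraph and the document wrappers out, to mimic the kind of incomplete HTML mentioned above.

```php
<?php
// Hypothetical helper (name and signature are assumptions, not from the post).
// It runs the same three steps: load as UTF-8, rewrite links, save a fragment.
function rewriteLinks(string $html, array $urls): string
{
    // Collect libxml warnings about invalid markup instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // The XML declaration prefix makes libxml treat the input as UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Drop the helper processing instruction; loop over a copy of the list.
    foreach (iterator_to_array($dom->childNodes) as $item) {
        if ($item->nodeType === XML_PI_NODE) {
            $dom->removeChild($item);
        }
    }
    $dom->encoding = 'UTF-8';

    // Swap each url_id attribute for the href it maps to.
    foreach ($dom->getElementsByTagName('a') as $node) {
        if (!$node->hasAttribute('url_id')) {
            continue;
        }
        $id = $node->getAttribute('url_id');
        $node->removeAttribute('url_id');
        if (isset($urls[$id])) {
            $node->setAttribute('href', $urls[$id]);
        }
    }

    // Strip the doctype and the html/head/body wrappers added on load.
    $pattern = '~<(?:!DOCTYPE|/?(?:html|head|body|\?xml)\b)[^>]*>\s*~i';
    return trim(preg_replace($pattern, '', $dom->saveHTML()));
}

// Sample input: an unclosed <p> and a link that only carries a url_id.
$sample = '<p>Café <a url_id="7">menu</a>';
echo rewriteLinks($sample, ['7' => 'https://example.com/menu']), "\n";
```

The printed fragment should no longer contain a doctype or html/body tags, and the link should carry an href instead of url_id. Restoring the previous libxml_use_internal_errors() setting keeps the helper from changing error handling for the rest of the script.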