Parsing a document that uses the legacy XHTML 1.1 modular DTD -//W3C//DTD XHTML 1.1//EN results in many implied attributes being reified and others being added erroneously. The attributes xmlns="http://www.w3.org/1999/xhtml" and xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" are added to almost every element, and xml:space="preserve" is also added for no apparent reason. (Note: it was later found that HTML preserves space by default, so that may have been the reasoning.)
For example, this simple XHTML 1.1 document:
results in this DOM tree:
See the discussion at "Java XML parser adding unnecessary xmlns and xml:space attributes".
Add one or more cleanup steps that remove this distracting cruft. For example, xmlns="http://www.w3.org/1999/xhtml" should be declared only on the root element.
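One possible shape for such a cleanup step is sketched below, using only the standard org.w3c.dom API. It is not an existing implementation: the class name DomCleanup and the methods clean, isInherited, isRedundant, and parse are hypothetical names chosen for illustration. The sketch assumes that a namespace declaration or xml:space attribute is "cruft" only when the nearest ancestor carrying the same attribute has the same value, so declarations on the root element and overrides (such as a different default namespace) are left alone.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical cleanup helper; all names here are illustrative assumptions. */
final class DomCleanup {

    /** True for attributes whose effect is inherited by descendant elements. */
    static boolean isInherited(String name) {
        return name.equals("xmlns") || name.startsWith("xmlns:") || name.equals("xml:space");
    }

    /** True if the nearest ancestor carrying the same attribute has the same value. */
    static boolean isRedundant(Element element, Attr attr) {
        for (Node n = element.getParentNode(); n instanceof Element; n = n.getParentNode()) {
            Attr inherited = ((Element) n).getAttributeNode(attr.getName());
            if (inherited != null) {
                return inherited.getValue().equals(attr.getValue());
            }
        }
        return false; // no ancestor declares it, so this declaration is needed
    }

    /** Recursively removes redundant inherited attributes below (and on) the given element. */
    static void clean(Element element) {
        NamedNodeMap attrs = element.getAttributes();
        // Collect first: removing while iterating a live NamedNodeMap would skip entries.
        List<Attr> redundant = new ArrayList<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr attr = (Attr) attrs.item(i);
            if (isInherited(attr.getName()) && isRedundant(element, attr)) {
                redundant.add(attr);
            }
        }
        for (Attr attr : redundant) {
            element.removeAttributeNode(attr);
        }
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                clean((Element) child);
            }
        }
    }

    /** Convenience parser for trying the sketch out; namespace-aware, no validation. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Calling DomCleanup.clean(document.getDocumentElement()) after parsing would apply the step to the whole tree. Two caveats apply. First, an xml:space="preserve" attribute with no ancestor carrying the same value is kept, so removing it wholesale would be a separate policy decision. Second, the DOM specification says that removing an attribute that has a DTD default causes it to be immediately replaced by the default, so with the doctype still attached the removed attributes may reappear, and the cleanup might need to run on a copy of the tree without the doctype.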