Convert xml:lang attribute when serializing HTML.


Improve the HTML serializer to convert xml:lang attributes.

XML has the xml:lang attribute, but HTML5 has the lang attribute.

  • HTML5 prohibits xml:lang in the XML namespace.

  • HTML5 allows xml:lang in no namespace (it's unclear how that would even come about), but it has no effect. If it is present, there must also be a lang attribute with the same value (compared in an ASCII case-insensitive manner).

See "Can someone explain what the xml:lang attribute does in HTML5?" for more discussion. See also "Declaring language in HTML".

The HTML serializer should process xml:lang attributes specially:

  • If there is no lang attribute, the xml:lang attribute should be serialized as lang.

  • If there is a lang attribute with a different value, an error should be emitted. (To be implemented in JAVA-187.)

  • If there is a lang attribute with the same value, the xml:lang attribute should be silently ignored.
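
The three rules above could be sketched roughly as follows, assuming the serializer works on a namespace-aware org.w3c.dom tree. The class XmlLangSketch and the methods resolveLang and newElement are hypothetical names used only for illustration, not part of the actual serializer. The IllegalStateException is a stand-in for whatever error reporting JAVA-187 ends up defining, and the case-insensitive comparison is an assumption borrowed from the HTML5 rule quoted earlier.

```java
import java.util.Optional;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical helper for illustration; not the actual serializer code. */
class XmlLangSketch {

  /**
   * Decides which value, if any, to write as the HTML lang attribute.
   *
   * @throws IllegalStateException if lang and xml:lang disagree
   *     (placeholder for the error handling planned in JAVA-187).
   */
  static Optional<String> resolveLang(Element element) {
    // xml:lang in the XML namespace, as a namespace-aware XML parser produces it.
    String xmlLang = element.hasAttributeNS(XMLConstants.XML_NS_URI, "lang")
        ? element.getAttributeNS(XMLConstants.XML_NS_URI, "lang")
        : null;
    // Plain lang attribute in no namespace.
    String lang = element.hasAttributeNS(null, "lang")
        ? element.getAttributeNS(null, "lang")
        : null;

    if (xmlLang == null) {
      return Optional.ofNullable(lang); // nothing to convert
    }
    if (lang == null) {
      return Optional.of(xmlLang); // rule 1: serialize xml:lang as lang
    }
    // Assumption: compare ignoring case, approximating HTML5's ASCII
    // case-insensitive comparison (language tags are ASCII).
    if (lang.equalsIgnoreCase(xmlLang)) {
      return Optional.of(lang); // rule 3: silently ignore xml:lang
    }
    // rule 2: conflicting values
    throw new IllegalStateException(
        "Conflicting lang=\"" + lang + "\" and xml:lang=\"" + xmlLang + "\"");
  }

  /** Builds a detached XHTML p element carrying the given attributes (null = absent). */
  static Element newElement(String xmlLang, String lang) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      Document document = factory.newDocumentBuilder().newDocument();
      Element element = document.createElementNS("http://www.w3.org/1999/xhtml", "p");
      if (xmlLang != null) {
        element.setAttributeNS(XMLConstants.XML_NS_URI, "xml:lang", xmlLang);
      }
      if (lang != null) {
        element.setAttributeNS(null, "lang", lang);
      }
      return element;
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(resolveLang(newElement("fr", null)));       // Optional[fr]
    System.out.println(resolveLang(newElement("en-US", "en-us"))); // Optional[en-us]
  }
}
```

When both attributes agree, this sketch keeps the existing lang value as written rather than the xml:lang value; the ticket does not say which spelling should win, so that choice is also an assumption.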

It should be straightforward to have unit tests for all these scenarios.




Garret Wilson
April 3, 2020, 2:46 PM

A tiny detail: the HTML5 spec speaks of an xml:lang attribute in no namespace, but there's likely no need to check for that exact situation when serializing a DOM tree. If the DOM was parsed from XML, there seems to be no way to get an xml:lang attribute that isn't in the XML namespace. The HTML5 spec seems to be talking about how it would look to an HTML parser once parsed.

Perhaps the caller is passing a DOM it got from an HTML parser, so maybe the HTML serializer should check for an attribute with the local name xml:lang and no namespace. But that seems to be a very remote possibility (who even has an HTML file like that, parsed by an HTML parser?), and without a real-life case it's hard to know exactly what to test against. So this ticket will just note that remote possibility in case we need to support it in the future.

