Force correct HTML doctype when mummifying pages.

Description

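When parsing XML files with other doctypes, e.g. legacy XHTML 1.1 files with the standard XHTML 1.1 declaration:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">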

Guise will carry the source document's doctype over unchanged when producing "HTML", so the legacy declaration ends up in the generated page.

This was never noticed until now because the source files either already used the HTML5 doctype or were rendered through a template that did. We instead want an (X)HTML5 doctype declaration:

<!DOCTYPE html>

This will probably require a new feature in the GlobalMentor XML/HTML serializer to allow specification of the doctype.

Environment

None

Activity

Garret Wilson
March 10, 2020, 3:56 PM

It looks like a Transformer allows the doctype public ID and system ID to be set, so we might take a similar approach with the GlobalMentor XML serializer.
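As a rough illustration of the Transformer approach mentioned above, the following sketch uses the standard `javax.xml.transform` output properties `OutputKeys.DOCTYPE_PUBLIC` and `OutputKeys.DOCTYPE_SYSTEM` to override the doctype at serialization time. The class name `DoctypeOverrideSketch` and its helper methods are hypothetical and exist only for this example; this is not the GlobalMentor serializer API. Note that these two properties alone cannot produce the bare `<!DOCTYPE html>` form, because the public ID is only emitted when a system ID is also given; the sketch therefore assumes the HTML specification's `about:legacy-compat` system identifier as a stand-in.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; not part of Guise or the GlobalMentor libraries.
public class DoctypeOverrideSketch {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Builds a tiny XHTML DOM tree that has no doctype of its own. */
    static Document newXhtmlDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder().newDocument();
            Element html = document.createElementNS(XHTML_NS, "html");
            Element body = document.createElementNS(XHTML_NS, "body");
            body.setTextContent("Hello");
            html.appendChild(body);
            document.appendChild(html);
            return document;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes the document, letting the caller choose the doctype IDs (either may be null). */
    static String serialize(Document document, String publicId, String systemId) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            if (publicId != null) {
                transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, publicId);
            }
            if (systemId != null) {
                transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, systemId);
            }
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document document = newXhtmlDocument();
        // Legacy XHTML 1.1 doctype, as in the problematic output.
        System.out.println(serialize(document,
                "-//W3C//DTD XHTML 1.1//EN",
                "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"));
        // Closest HTML5-compatible doctype expressible through these properties.
        System.out.println(serialize(document, null, "about:legacy-compat"));
    }
}
```

A serializer-level setting along these lines would presumably also need a way to request the short `<!DOCTYPE html>` form with no public or system ID, which is the declaration actually wanted here.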

Keep in mind that the org.w3c.dom.DocumentType of the serialized document also stores the entities parsed with the original document type, so if some of those entities are used but another doctype is specified, the result might be a non-well-formed document that cannot be read. If the serializer's usePredefinedEntities setting is set to PredefinedEntitiesUse.AS_NEEDED (currently the default), this may not be a problem.

Fixed

Assignee

Garret Wilson

Reporter

Garret Wilson

Labels

None

Components

Fix versions

Priority

Critical