We need to support mummification of "normal" HTML documents, that is, HTML files that are not XHTML. We already support Markdown and XHTML, which cover both the "common user" and the "power user". But we should probably support HTML too, both to serve the "technically savvy user" and to assist in converting legacy sites.
The parsing should be somewhat lenient, but there is no need to handle obscure or ancient HTML, as long as an error is produced when something isn't understood. The idea here is not to consume every HTML source that exists, but to provide a useful format for users who know what they're doing, as well as to convert reasonable HTML that already exists. The parser should err on the side of accuracy (i.e. not changing semantics) rather than leniency; for really mangled HTML it should produce an error rather than guess.
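
To illustrate the "error rather than guess" policy, here is a minimal sketch in Python using the standard library's `html.parser`. It is not a proposal for the actual implementation: the names `StrictHtmlChecker`, `StrictHtmlError`, `check_html`, and the tiny `KNOWN_TAGS` allow-list are all hypothetical, and the sketch is stricter than we would want in practice (for example, it rejects omitted `</p>` and `</li>` end tags, which are valid HTML).

```python
from html.parser import HTMLParser

# Hypothetical allow-list for illustration; a real one would be much larger.
KNOWN_TAGS = {"html", "head", "title", "body", "h1", "p", "a",
              "ul", "li", "em", "strong", "br", "img"}
VOID_TAGS = {"br", "img"}  # elements that never take an end tag


class StrictHtmlError(ValueError):
    """Raised when the input cannot be interpreted without guessing."""


class StrictHtmlChecker(HTMLParser):
    """Tracks open elements and refuses anything it does not understand."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.open_tags = []

    def _require_known(self, tag):
        if tag not in KNOWN_TAGS:
            line, _ = self.getpos()
            raise StrictHtmlError(f"unknown element <{tag}> at line {line}")

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <P> and <p> are treated alike.
        self._require_known(tag)
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # XHTML-style <br/> is accepted for void elements only; <p/> is
        # ambiguous in HTML, so it is reported instead of interpreted.
        self._require_known(tag)
        if tag not in VOID_TAGS:
            raise StrictHtmlError(f"self-closing syntax on non-void <{tag}>")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            raise StrictHtmlError(f"end tag for void element </{tag}>")
        if not self.open_tags or self.open_tags[-1] != tag:
            raise StrictHtmlError(f"unexpected end tag </{tag}>")
        self.open_tags.pop()


def check_html(source: str) -> None:
    """Raise StrictHtmlError if the source would require guessing."""
    checker = StrictHtmlChecker()
    checker.feed(source)
    checker.close()
    if checker.open_tags:
        raise StrictHtmlError(f"unclosed element <{checker.open_tags[-1]}>")


if __name__ == "__main__":
    check_html("<P>Hello <em>world</em><br></P>")  # accepted
    try:
        check_html("<p><em>mangled</strong></p>")
    except StrictHtmlError as error:
        print("rejected:", error)
```

The leniency here is limited to things that do not change semantics, such as tag-name case and `<br>` versus `<br/>`. Anything that would force the parser to pick one of several interpretations (an unknown element, a mismatched end tag, an unclosed element) is reported as an error.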