Output text line separator setting.

Description

Guise Mummy uses the GlobalMentor HTML serializer, which by default uses the system line ending to separate lines: CRLF on Windows or LF on Linux. Normally this is a good thing, especially when working with a source control system configured to automatically convert between the repository line ending and the system line ending.

However, this does not make the build reproducible: HTML pages generated on a Windows machine would differ from those generated on a Linux machine. This could cause problems when updating a server, and it definitely will cause the SHA fingerprint value to differ. We could use some fancy algorithm to normalize line endings on files detected to be text files, and/or even piggyback on existing line ending configurations in e.g. .gitattributes. But the simpler and perhaps better approach would be to just make the line endings consistent for the build across platforms. (These are generated files in the target directory and are thus somewhat ephemeral; they probably won't be edited in an editor anyway. Besides, most modern editors, even Notepad, support LF line endings across platforms.)
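To illustrate the fingerprint problem, here is a minimal Java sketch. It assumes SHA-256 as the fingerprint algorithm (the issue only says "SHA") and uses a hypothetical class name, `LineEndingFingerprint`; it is not Guise Mummy code. It hashes the same three-line page once with CRLF and once with LF separators.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical demo class; not part of Guise Mummy. */
public class LineEndingFingerprint {

  /** Returns the lowercase hex SHA-256 digest of the text encoded as UTF-8. */
  public static String sha256Hex(String text) {
    try {
      // SHA-256 is an assumption; the issue does not name the SHA variant.
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : hash) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to provide SHA-256.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String[] lines = {"<!DOCTYPE html>", "<html>", "</html>"};
    // Same logical content, serialized with each platform's line ending.
    String windowsPage = String.join("\r\n", lines) + "\r\n";
    String linuxPage = String.join("\n", lines) + "\n";
    System.out.println("CRLF: " + sha256Hex(windowsPage));
    System.out.println("LF:   " + sha256Hex(linuxPage));
    // The byte sequences differ, so the two digests differ.
    System.out.println("equal: " + sha256Hex(windowsPage).equals(sha256Hex(linuxPage)));
  }
}
```

The two digests differ even though the pages are logically identical, which is why a fixed line separator is needed for a reproducible build.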

Create a Guise Mummy setting mummy.textOutputLineSeparator to set the newline character for all generated text files (principally HTML pages), defaulting to LF (0x0A).
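A rough Java sketch of how such a setting might be resolved and applied follows. The `Map`-based settings lookup, the class name, and both method names are assumptions made for illustration; they are not Guise Mummy's actual configuration API or the GlobalMentor serializer API. Only the key `mummy.textOutputLineSeparator` and the LF default come from this issue.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not the actual Guise Mummy implementation. */
public class TextOutputLineSeparatorSketch {

  /** The configuration key proposed in this issue. */
  public static final String CONFIG_KEY = "mummy.textOutputLineSeparator";

  /** The proposed default: LF (0x0A), regardless of platform. */
  public static final String DEFAULT_LINE_SEPARATOR = "\n";

  /** Returns the configured line separator, falling back to LF if the key is absent. */
  public static String resolveLineSeparator(Map<String, String> settings) {
    return settings.getOrDefault(CONFIG_KEY, DEFAULT_LINE_SEPARATOR);
  }

  /** Joins the lines, terminating each one with the given separator. */
  public static String render(List<String> lines, String lineSeparator) {
    StringBuilder out = new StringBuilder();
    for (String line : lines) {
      out.append(line).append(lineSeparator);
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // No explicit setting: output uses LF even when run on Windows.
    String separator = resolveLineSeparator(Map.of());
    String page = render(List.of("<!DOCTYPE html>", "<html>", "</html>"), separator);
    System.out.println(page.contains("\r")); // prints "false"
  }
}
```

Because the default is a constant rather than `System.lineSeparator()`, the generated bytes do not depend on the platform running the build.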

Note that in the future, if we want to allow the configuration to indicate that the system newline should be used (i.e. how it works now), we'll probably wait until we have some sort of configuration references, so that some other setting, e.g. env.systemNewline, would be set by Guise Mummy and mummy.targetNewline could be set to reference it.

Environment

None

Activity

Garret Wilson
May 2, 2020, 9:45 PM
Edited

mummy.textNewline is simpler than mummy.targetNewline and is more consistent with mummy.pageNamesBare.

It is still unclear how or whether we even need to distinguish between "source" vs "destination" settings. In this case it may be obvious (is "may be obvious" an oxymoron?) that the newline sequence is for generation, as the input newline can be detected. Although on the other hand if we ever were to decide to indicate the charset (which is certainly done in Maven for example) we would want to distinguish between the source charset and output charset.

Looking at existing configuration keys, we have mummy.pageNamesBare, which refers to the target output, but mummy.templateBaseName, which refers to the source. Both of these refer to filenames, though, and not the content.

Maven uses project.build.sourceEncoding to indicate the encoding of the source content. Maybe it's time to start explicitly indicating "source" or "target" when referring to content in the configuration. Do we need to indicate "text", or is that understood?

If we go on the assumption that we use "page", "image", etc. prefixes for configuration of different types, then we would want to start the configuration key with mummy.text to account for all text types. (If for some reason we needed to specify a different value for pages, we could override that with mummy.page.…, but that seems unlikely.) Adding a "target" designation gives us mummy.textTargetNewline.

Note that this may be the first time we use "target" to refer to the content rather than the location. Should we use "input" and "output" when referring to the content? That would give us mummy.textOutputNewline. But then Maven uses project.build.outputDirectory for the target/classes directory, although the Maven JAR plugin uses <classesDirectory> to refer to Maven's project.build.outputDirectory, and <outputDirectory> to refer to Maven's project.build.directory. So Maven seems all over the place.

Nevertheless, "source" and "target" sound like locations (even though we have the term "source code"), and "input" and "output" sound more like reading in and generating content. So if we keep that distinction, we would use mummy.textOutputNewline. And that even sounds better, and closer to what is actually happening (i.e. writing to an output stream).

(Then there is the whole discussion about Boolean configuration keys. Should we say mummy.textOutputNewlineString or mummy.textOutputNewlineText? After all, mummy.pageNamesBare takes a Boolean value, although the latter more clearly ends in an adjective. I suppose if this were a Boolean flag we could name it mummy.textOutputHasNewline, although that might mean mummy.pageNamesAreBare. There are no easy answers here without being too verbose.)

Garret Wilson
May 2, 2020, 10:09 PM

It might be clearer to go with the terminology already used to configure the GlobalMentor XML/HTML serializers: "line separator". That would make the configuration key mummy.textOutputLineSeparator.

Fixed

Assignee

Garret Wilson

Reporter

Garret Wilson

Labels

None

Components

Fix versions

Priority

Critical