Image mummifier.

Description

Guise Mummy 4.x and before handled images as opaque blobs. A specific mummifier for image types will make it possible to manipulate images (e.g. scaling in ) and extract metadata (in GUISE-148; e.g. Exif).

The initial implementation should support at least JPEG and PNG files.

Environment

None

Activity

Show:
Garret Wilson
November 8, 2020, 4:07 PM

In the first image mummifier we're going with the metadata-extractor library to detect the file type rather than just looking at the extension. This will file I/O that we otherwise wouldn't do if we just relied on the extension, but maybe it won't be so bad because we'll cache this anyway. In some ways this is being done simply because this detection is part of metadata-extractor and it's almost as easy to do detection in a base image mummifier as to have a map of extensions to media types. The only real benefit is that we could process e.g. a JPEG file that had been renamed with a .png extension. But would we even want to allow that? And if we did, we should really change the extension, which means we would need to override planArtifactTargetFilename(final MummyContext context, final String filename) and detect the type again. So maybe there isn't any point after all. Maybe we should just have a map of extensions and content types in the base image mummifiers. But since it's already written I'll leave it in for now.

Garret Wilson
November 8, 2020, 5:53 PM
Edited

I'm trying to figure out the best way to unit test image mummifiers (and really all mummifiers—it's just expected that images will be larger and potentially more processor intensive than pages). Here is the sort of performance I'm seeing to copy a ~200K JPEG file from resources to an in-memory file system (Jimfs) and then detect its media type by looking at its actual bytes as indicated above:

  • set up in-memory test file system: .8s

  • copy test file to test file system: ~.015s

  • mock mummifier: ~1.5s

  • detect media type: <.02s

The setup time of the in-memory file system seems too high, and really unacceptable. And I was really surprised by the time it takes to mock the object!! The actual copying and media detection doesn't seem to be a problem.

But if I have two identical tests running subsequently, the times drop dramatically:

So maybe there is just an initial setup cost and then all the tests after that run smoothly. (Note that I'm setting up everything again for each test, but they should all be running int he same JVM.) I suppose there is some allocation or other setup both Jimfs and Mockito need to do. I would hope that when running all the tests within Maven that this one-time setup cost would be carried across all the test so that even tests in other classes would not require this setup cost again.

I tried the exact same tests except using a temporary directory in an actual file system using JUnit @TempDir:

The difference in file system setup times isn't completely correct, because it doesn't take into account whatever JUnit does with @TempDir to set up a local temporary. but I wouldn't imagine it would take over a second to create a local directory.

Interestingly copying to the real file system seemed to be just about as fast as copying to the in-memory file system. Maybe the overhead is more reading from the resources.

Overall with this 200K image I don't see that there is much benefit in using Jimfs over a simple temporary directory in the real file system. Maybe we can try it again when we scale images in GUISE-149. But even scaling images the actual processing is probably more intense than the I/O.

Garret Wilson
November 9, 2020, 2:45 PM

For now to simplify things maybe it's best to go forward with JUnit's @TempDir facility and see what sort of performance penalty if any that brings. We can always switch to Jimfs later.

Garret Wilson
November 9, 2020, 3:08 PM

Maybe it's Mockito I should be worried about. I switched to using an actual subclass instance as a test rather than using a Mockito mock, and even using the file system the test dropped to ~.5s according to the in-Eclipse JUnit. (Using CALLS_REAL_METHODS may or may not be part of the overhead.) To test the abstract image mummifier I think I'll manually make a thin mock.

Garret Wilson
November 15, 2020, 6:45 PM

With the extensive filename extension and media type cleanup in (with improved support for image types), and after further reading on unit test best practices, I'm going to go ahead and ditch file type detection from the start and go with filename extension detection, like the other types. Testing the bytes provides little benefit—who wants source files with incorrect extensions?

We'll can consider adding filename extension case insensitivity, though, to handle things like IMAGE_001.JPG a camera might spit out. That should probably be done in a separate ticket, though, to improve this across the board, normalizing extensions for e.g. page.HTML. But this would potentially cause a problem with links—or maybe not, as we could simply require that links match whatever case is in the source; the links get modified anyway to match whatever the target changes too, so the links would get normalized as well.

Fixed

Assignee

Garret Wilson

Reporter

Garret Wilson

Labels

None

Epic Link

Components

Fix versions

Priority

Critical
Configure