Extract image metadata.

Description

When a site is being planned, an image mummifier should extract relevant information from supported images.

At a minimum, Exif metadata from JPEG files should probably be supported.

What "relevant" means is yet to be determined. It may parallel how HTML pages work, accepting any parseable metadata. In particular, some sort of title and copyright should be detected and extracted.

Environment

None

Activity

Garret Wilson
November 30, 2020, 4:09 PM
Edited

We may need to clarify "title" a bit. I said above that Carl Seibert was conflating caption and description, and indeed the Metadata Working Group's guidelines say in § 5.2 Description:

This area defines the textual description of a resource's content. Also known as “user comment”, “caption”, “abstract” or “description”. Today, this information is represented in different ways; sometimes integrated and displayed as one field – at other times revealed separately. This document combines the different sources into one overall representation, called “Description”.

So it seems that "caption" and "description" may be the same thing, but that thing may not be the same as "title". The "title" could be like the name of a work of art, and the "caption" like its description, as explained with regard to SmugMug in a Digital Grin forum posting (although some claim there is a different convention in press and editorial photography). It's also revealing that for XMP the dc:description property is used for caption/description, while there is a separate XMP dc:title (the normal Dublin Core dc:title that we've all come to know and love).

Therefore I'm thinking of letting "description" be "description" and using dc:title for title. There is also an Exif XPTitle which Windows added; we should probably avoid it. IPTC mentions a core Title property, but this is apparently just an XMP property using the http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/ namespace, not something in the original IPTC-IIM.

So we have:

title

  1. XMP: dc:title

description

  1. XMP: dc:description["x-default"]

  2. IPTC: Caption (IIM 2:120, 0x0278)

  3. Exif: ImageDescription (270, 0x010E)
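
As a rough sketch of that precedence, assuming the metadata has already been read into a mapping keyed by (vocabulary, field) pairs: the key names, `DESCRIPTION_SOURCES`, and `first_present` are all hypothetical and do not reflect any existing Guise Mummy API.

```python
from typing import Mapping, Optional, Sequence, Tuple

Key = Tuple[str, str]

# Hypothetical lookup keys, listed in the fallback order given above.
DESCRIPTION_SOURCES: Sequence[Key] = (
    ("XMP", "dc:description"),     # the "x-default" language alternative
    ("IPTC", "Caption"),           # IIM 2:120
    ("Exif", "ImageDescription"),  # tag 270 (0x010E)
)


def first_present(metadata: Mapping[Key, str], sources: Sequence[Key]) -> Optional[str]:
    """Return the first non-blank value found, checking sources in order."""
    for key in sources:
        value = metadata.get(key)
        if value is not None and value.strip():
            return value.strip()
    return None


if __name__ == "__main__":
    sample = {
        ("IPTC", "Caption"): "Harbor at dusk",
        ("Exif", "ImageDescription"): "IMG_0042",
    }
    print(first_present(sample, DESCRIPTION_SOURCES))
```

Blank values are treated as missing here, so an empty XMP description does not mask a usable IPTC or Exif one; whether that is the desired behavior is an open design choice.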

Garret Wilson
November 30, 2020, 4:13 PM

ExifTool provides a wonderful reference page on tag names for various vocabularies.

And while I'm thinking of it, according to Adobe the XMP namespace used for Dublin Core is http://purl.org/dc/elements/1.1/, which matches the DCMI Metadata Terms unchanged since 2000.
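
To make the namespace concrete, here is a minimal sketch of pulling the "x-default" value of dc:title out of an XMP packet using only the Python standard library. The sample packet and the helper name `xmp_lang_alt` are made up for illustration, and extracting the packet from the image file is assumed to happen elsewhere.

```python
import xml.etree.ElementTree as ET
from typing import Optional

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Illustrative packet only; real packets carry many more properties.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:title>
    <rdf:Alt>
     <rdf:li xml:lang="x-default">Sunset over the bay</rdf:li>
    </rdf:Alt>
   </dc:title>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def xmp_lang_alt(packet: str, prop: str, lang: str = "x-default") -> Optional[str]:
    """Return the text of a language-alternative property for the given language."""
    root = ET.fromstring(packet)
    for li in root.iterfind(f".//{prop}/rdf:Alt/rdf:li", NS):
        if li.get(XML_LANG) == lang:
            return li.text
    return None


if __name__ == "__main__":
    print(xmp_lang_alt(SAMPLE_XMP, "dc:title"))
```

The same helper would apply to dc:description, since both properties are stored as language alternatives.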

Garret Wilson
November 30, 2020, 4:50 PM
Edited

Confusing things yet again, digiKam provides access to dc:title under "Status" and not "Content". Under the main "Content" it actually has a Headline, which seems to be from a separate XMP namespace for Photoshop. There is also an IPTC-IIM Headline, as well as what digiKam calls a "Title" but which shows up as ObjectName. Nevertheless we should probably keep this simple and stick to dc:title for the title. But looking further, IPTC says that Core Title is the same as IPTC-IIM 2:05 Object Name, so we should probably use that as a fallback.

title

  1. XMP: dc:title

  2. IPTC: ObjectName (IIM 2:05, 0x0205)

Garret Wilson
December 1, 2020, 4:24 PM

On the ExifTool forum there is a discussion (later than Phil Harvey's comments above about the XP fields) on how Windows populates various Exif fields if edited in the Windows Explorer properties dialog. (I never realized this was editable!) The details are interesting, but for our purposes here it is notable that Windows is still using the XP* Exif properties, and so we might as well use them as an ultimate fallback.

Of course it is unlikely that they would ever be set by themselves, as editing the title in Windows seems to set several properties including XMP dc:title. But it's possible the other blocks could have been erased or whatever. If nothing else it's good to have something to fall back to for Exif title.

title

  1. XMP: dc:title

  2. IPTC: ObjectName (IIM 2:05, 0x0205)

  3. Exif: XPTitle (0x9C9B)
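
One wrinkle with the last fallback is that the Windows XP* Exif values are stored as UTF-16LE bytes, typically null-terminated, rather than as plain ASCII strings. The following is only a sketch of the three-step title lookup under that assumption; `decode_xp_string` and `resolve_title` are hypothetical names, and the three inputs are assumed to have been read from the image by some other component.

```python
from typing import Optional


def decode_xp_string(raw: bytes) -> str:
    """Decode a Windows XP* Exif value (UTF-16LE bytes, usually null-terminated)."""
    return raw.decode("utf-16-le", errors="replace").rstrip("\x00")


def resolve_title(
    xmp_dc_title: Optional[str],
    iptc_object_name: Optional[str],
    exif_xp_title: Optional[bytes],
) -> Optional[str]:
    """Pick a title using the order: XMP dc:title, IPTC ObjectName, Exif XPTitle."""
    candidates = (
        xmp_dc_title,
        iptc_object_name,
        decode_xp_string(exif_xp_title) if exif_xp_title else None,
    )
    for candidate in candidates:
        if candidate and candidate.strip():
            return candidate.strip()
    return None


if __name__ == "__main__":
    # Only the XPTitle block survived in this made-up case.
    xp_bytes = "Sunset".encode("utf-16-le") + b"\x00\x00"
    print(resolve_title(None, None, xp_bytes))
```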

Garret Wilson
December 5, 2020, 5:33 PM

I'm going to forgo any creator property for the moment. We need to make decisions about HTML5 author versus a general creator, as well as use of Dublin Core metadata such as dc:creator. I'm going to add a general copyright property across Guise Mummy, but the same questions apply regarding HTML5 author and the use of Dublin Core dc:rights.

Fixed

Assignee

Garret Wilson

Reporter

Garret Wilson

Labels

None

Priority

Major