Switch to angle brackets for email address delimiter.

Description

Currently SURF/TURF are using the caret ^ symbol to denote an email address, such as ^someone@example.com. This has several drawbacks:

  • It's not obvious that the ^ is supposed to represent an airplane symbol, used to indicate "send" in many modern email client IDEs.

  • It doesn't look aesthetically pleasing.

  • It's hard to remember what some would argue is an arbitrary delimiter.

In addition, as set out in the email message specification (RFC 2822: Internet Message Format § 3.4. Address Specification), for decades people have been used to identifying email addresses using angle brackets, such as Jane Doe <jdoe@example.com>:

Normally, a mailbox is comprised of two parts: (1) an optional display name that indicates the name of the recipient (which could be a person or a system) that could be displayed to the user of a mail application, and (2) an addr-spec address enclosed in angle brackets ("<" and ">").

Of course we already use angle brackets to denote IRIs. Fortunately it is easy to distinguish between the two lexical representations, as discussed at Distinguish between email address and IRI on Stack Overflow.

This means that angle brackets would be overloaded syntax in SURF/TURF. We could have e.g. <https://www.example.com> (an IRI) or <jdoe@example.com>. It would then probably be good to have some type that encompasses both—in any case the syntax grammar will probably need to have some name to stand for either option.
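
As a rough illustration of how the two lexical forms might be told apart, here is a minimal Python sketch. It assumes one plausible rule, which is not necessarily the one reached in the Stack Overflow discussion: an absolute IRI always begins with a scheme followed by a colon, whereas the dot-atom form of an email local part cannot contain a colon. The names `classify`, `SCHEME`, and `EMAIL` are hypothetical, and the email pattern is deliberately simplified (no quoted strings or domain literals).

```python
import re

# Assumption: an absolute IRI starts with a scheme (RFC 3986 section 3.1) and ":".
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")

# Assumption: simplified addr-spec, dot-atom form only.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
EMAIL = re.compile(rf"{ATEXT}(?:\.{ATEXT})*@{ATEXT}(?:\.{ATEXT})*")


def classify(token: str) -> str:
    """Return "iri" or "email" for a token delimited by angle brackets."""
    if not (len(token) >= 2 and token.startswith("<") and token.endswith(">")):
        raise ValueError(f"not an angle-bracket token: {token!r}")
    body = token[1:-1]
    # Check for a leading scheme first, so that mailto: IRIs stay IRIs.
    if SCHEME.match(body):
        return "iri"
    if EMAIL.fullmatch(body):
        return "email"
    raise ValueError(f"neither an IRI nor an email address: {token!r}")


print(classify("<https://www.example.com>"))
print(classify("<jdoe@example.com>"))
```

A grammar could then use a single production (whatever it ends up being called) for the `<...>` form and defer the IRI-versus-email decision to a check like this one.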

Environment

None

Activity

Garret Wilson 
June 9, 2023 at 10:03 PM

The more I consider this, the more it seems we should just make the <...> delimiters represent some general “identifier” type (although I’d like to think of some more specific term than “identifier”). The specifications (including SURF) will require that the parser recognize a few specific types and preserve this identification for round-trip serialization:

  • URI

  • email address

  • IP address

  • IPv6 address

  • UUID

  • telephone number

  • general identifier

All these identifiers have orthogonal formats! Note in particular that I added telephone number and UUID to the list. Having a telephone number in the general syntax was always a bit suspect (and easily confused, at least by humans, with negative numbers). The & delimiter wasn’t quite so natural for UUIDs, and having it here will free it up for .

For general identifier strings that don’t fit any of these categories, we’ll probably use the form <`...`> as we’ll be doing for labels in .
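
To make the round-trip requirement concrete, here is a minimal Python sketch covering only three of the kinds listed above (UUID, telephone number, general identifier). Everything in it is an assumption for illustration: the names `Kind`, `Identifier`, and `parse` are hypothetical, the telephone pattern assumes an E.164-style `+` followed by digits, and the UUID pattern assumes the canonical 8-4-4-4-12 hex form. It does not handle escaping of backticks inside a general identifier.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    UUID = "UUID"
    TELEPHONE = "telephone number"
    GENERAL = "general identifier"


@dataclass(frozen=True)
class Identifier:
    kind: Kind
    text: str  # lexical form without any delimiters

    def serialize(self) -> str:
        # General identifiers keep their backticks inside the angle brackets.
        if self.kind is Kind.GENERAL:
            return f"<`{self.text}`>"
        return f"<{self.text}>"


# Assumption: canonical hyphenated UUID text.
UUID_RE = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")
# Assumption: E.164-style number, "+" followed by up to 15 digits.
PHONE_RE = re.compile(r"\+[0-9]{1,15}")


def parse(token: str) -> Identifier:
    """Parse an angle-bracket token, remembering which kind it was."""
    if not (len(token) >= 2 and token[0] == "<" and token[-1] == ">"):
        raise ValueError(f"not an angle-bracket token: {token!r}")
    body = token[1:-1]
    if len(body) >= 2 and body[0] == "`" and body[-1] == "`":
        return Identifier(Kind.GENERAL, body[1:-1])
    if UUID_RE.fullmatch(body):
        return Identifier(Kind.UUID, body)
    if PHONE_RE.fullmatch(body):
        return Identifier(Kind.TELEPHONE, body)
    raise ValueError(f"unrecognized identifier: {token!r}")


for token in ("<123e4567-e89b-12d3-a456-426614174000>", "<+12015550123>", "<`my label`>"):
    assert parse(token).serialize() == token
```

The point of carrying `kind` alongside the text is that a serializer never has to re-detect the type, so the identification survives a parse and serialize cycle unchanged.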

Garret Wilson 
June 13, 2020 at 2:05 PM

Regarding IP addresses, see Why IPv6 use colon as delimiter instead of dot?. The article The operational trouble with the IPv6 address format goes over some issues in detail. Most importantly it points out that RFC 6874 allows square brackets around an IPv6 address inside a URI, which suggests that we could do the same thing inside angle brackets to distinguish them from URIs.

Thus eventually we might support the following "net addresses":

  • <https://example.com> (IRI)

  • <jdoe@example.com> (email address)

  • <127.0.0.1> (IP address)

  • <[::1]> (IPv6 address)
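
A minimal Python sketch of why the square brackets help, using only the standard `ipaddress` and `re` modules. The function name `classify_net_address` and the simplified scheme and email checks are assumptions for illustration. The underlying problem is that an unbracketed IPv6 address such as fe80::1 also looks like an IRI whose scheme is "fe80", whereas "[" can never begin a scheme.

```python
import ipaddress
import re

# Assumption: an absolute IRI starts with a scheme (RFC 3986 section 3.1) and ":".
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")


def classify_net_address(body: str) -> str:
    """Classify the text found between "<" and ">"."""
    if body.startswith("[") and body.endswith("]"):
        # Raises ValueError (AddressValueError) if the content is not valid IPv6.
        ipaddress.IPv6Address(body[1:-1])
        return "ipv6"
    try:
        ipaddress.IPv4Address(body)
        return "ipv4"
    except ValueError:
        pass
    if SCHEME.match(body):
        return "iri"
    if "@" in body:  # deliberately crude email check for this sketch
        return "email"
    raise ValueError(f"unrecognized net address: {body!r}")


# Without brackets the text is ambiguous: it is a valid IPv6 address
# and it also starts with something that reads as a scheme.
ambiguous = "fe80::1"
assert ipaddress.IPv6Address(ambiguous)
assert SCHEME.match(ambiguous) is not None

for body in ("https://example.com", "jdoe@example.com", "127.0.0.1", "[::1]"):
    print(body, "->", classify_net_address(body))
```

Dotted IPv4 addresses need no extra delimiter under this rule because a scheme must begin with a letter.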

Garret Wilson 
June 7, 2020 at 6:02 PM

It's not yet obvious to me what general type to use to include both email addresses and IRIs. One option would be "web address", but technically email is completely separate from the world wide web (although today maybe the term "web" has become more general). And "Internet address" is pretty close to "Internet Protocol address", that is, "IP address", which is a different thing.

(The next obvious question is whether we might also denote IP addresses using <>, in which case "Internet address" might be a good all-encompassing name for all three things. Unfortunately in the discussion on Stack Overflow we realized that there would be no way to distinguish between an IRI and an IPv6 address.)

Details

Created June 7, 2020 at 5:58 PM
Updated June 9, 2023 at 10:03 PM