Switch to angle brackets for email address delimiter.
Description
Environment
Activity
Garret Wilson June 9, 2023 at 10:03 PM
The more I consider this, the more it seems we should just make the <...> delimiters represent some general “identifier” type (although I’d like to think of a more specific term than “identifier”). The specifications (including SURF) will require that the parser recognize a few specific types and preserve this identification for round-trip serialization:
URI
email address
IP address
IPv6 address
UUID
telephone number
general identifier
All these identifiers have orthogonal formats! Note in particular that I added telephone number and UUID to the list. Having a telephone number in the general syntax was always a bit suspect (and easily confused, at least by humans, with negative numbers). The & delimiter wasn’t quite so natural for UUIDs, and moving UUIDs here will free that delimiter up for .
For general identifier strings that don’t fit any of these categories, we’ll probably use the form <`...`> as we’ll be doing for labels in .
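To make the round-trip requirement concrete, here is a minimal Python sketch of a value that remembers which kind of identifier it was parsed as and serializes back to the same text. All names here (Kind, Identifier, parse) are hypothetical and not taken from any SURF implementation. The telephone number pattern (a plus sign followed by digits) and the absence of any escaping inside the backtick form are assumptions, and only the UUID, telephone number, and general identifier kinds are recognized, to keep the sketch short.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    """The identifier kinds listed above (names are hypothetical)."""
    URI = "URI"
    EMAIL = "email address"
    IPV4 = "IP address"
    IPV6 = "IPv6 address"
    UUID = "UUID"
    TELEPHONE = "telephone number"
    GENERAL = "general identifier"


@dataclass(frozen=True)
class Identifier:
    kind: Kind
    text: str  # lexical form between the delimiters

    def serialize(self) -> str:
        # General identifiers use the <`...`> form; every other kind uses <...>.
        if self.kind is Kind.GENERAL:
            return "<`" + self.text + "`>"
        return "<" + self.text + ">"


# Assumption: canonical 8-4-4-4-12 hexadecimal UUID text.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
# Assumption: a telephone number is a plus sign followed by digits.
PHONE_RE = re.compile(r"\+[0-9]+")


def parse(token: str) -> Identifier:
    """Parse one <...> token, keeping the recognized kind with the value."""
    if len(token) < 2 or not (token.startswith("<") and token.endswith(">")):
        raise ValueError("not an angle-bracket token: " + token)
    body = token[1:-1]
    if len(body) >= 2 and body.startswith("`") and body.endswith("`"):
        return Identifier(Kind.GENERAL, body[1:-1])
    if UUID_RE.fullmatch(body):
        return Identifier(Kind.UUID, body)
    if PHONE_RE.fullmatch(body):
        return Identifier(Kind.TELEPHONE, body)
    raise ValueError("kind not covered by this sketch: " + token)
```

Because the kind travels with the value, serialize() never has to guess which form to emit; for the tokens this sketch accepts, parse(token).serialize() gives back the original token.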
Garret Wilson June 13, 2020 at 2:05 PM
Regarding IP addresses, see Why IPv6 use colon as delimiter instead of dot?. The article The operational trouble with the IPv6 address format goes over some issues in detail. Most importantly it points out that RFC 6874 allows square brackets around an IPv6 address inside a URI, which suggests that we could do the same thing inside angle brackets to distinguish them from URIs.
Thus eventually we might support the following "net addresses":
<https://example.com> (IRI)
<jdoe@example.com> (email address)
<127.0.0.1> (IP address)
<[::1]> (IPv6 address)
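As a rough illustration of how a parser might tell these four forms apart, here is a Python sketch. The function name classify_net_address and the order of the checks are assumptions for illustration, not part of any specification. The IRI test only looks for a leading scheme followed by a colon, and the email test only looks for an @ sign, so neither is a full validator.

```python
import ipaddress
import re

# A URI/IRI scheme: a letter, then letters, digits, "+", "." or "-", then ":".
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")


def classify_net_address(token: str) -> str:
    """Return a label for the kind of net address inside <...>."""
    if len(token) < 2 or token[0] != "<" or token[-1] != ">":
        raise ValueError("not an angle-bracket token: " + token)
    body = token[1:-1]
    # Square brackets mark an IPv6 address, as they do inside a URI.
    if body.startswith("[") and body.endswith("]"):
        ipaddress.IPv6Address(body[1:-1])  # raises ValueError if malformed
        return "IPv6 address"
    try:
        ipaddress.IPv4Address(body)
        return "IP address"
    except ValueError:
        pass
    # Anything that begins with a scheme and a colon is treated as an IRI.
    if SCHEME_RE.match(body):
        return "IRI"
    if "@" in body:
        return "email address"
    raise ValueError("unrecognized net address: " + token)
```

Without the square brackets, an address such as fe80::1 begins with what looks like a scheme (fe80:), so this sketch would label <fe80::1> an IRI. The bracketed form avoids that ambiguity.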
Garret Wilson June 7, 2020 at 6:02 PM
It's not yet obvious to me what general type to use to include both email addresses and IRIs. One option would be "web address", but technically email is completely separate from the world wide web (although today maybe the term "web" has become more general). And "Internet address" is pretty close to "Internet Protocol address", that is, "IP address", which is a different thing.
(The next obvious question is whether we might also denote IP addresses using <>, in which case "Internet address" might be a good all-encompassing name for all three things. Unfortunately in the discussion on Stack Overflow we realized that there would be no way to distinguish between an IRI and an IPv6 address.)
Currently SURF/TURF use the caret ^ symbol to denote an email address, such as ^someone@example.com. This has several drawbacks:
It's not obvious that the ^ is supposed to represent an airplane symbol, used to indicate "send" in many modern email clients.
It doesn't look aesthetically pleasing.
It's hard to remember what some would argue is an arbitrary delimiter.
In addition, as set out in the email message specification (RFC 2822: Internet Message Format § 3.4. Address Specification), for decades people have been used to identifying email addresses using angle brackets, such as Jane Doe <jdoe@example.com>.
Of course we already use angle brackets to denote IRIs. Fortunately it is easy to distinguish between the two lexical representations, as discussed at Distinguish between email address and IRI on Stack Overflow.
This means that angle brackets would be overloaded syntax in SURF/TURF. We could have e.g. <https://www.example.com> (an IRI) or <jdoe@example.com> (an email address). It would then probably be good to have some type that encompasses both; in any case the syntax grammar will probably need some name to stand for either option.