Support UTF-8 Email Addresses
As far as I can tell, RFC 6530 (https://tools.ietf.org/html/rfc6530) allows Unicode characters in the local part of email addresses. However, it looks like I can't send messages to email addresses with Unicode characters, and I can't create routing rules for receiving UTF-8 emails either.
Is there any plan to enable this functionality?
I think Unicode local addresses should be allowed, but I'll have to see what part of the mailserver library doesn't have it enabled.
When I try to make an account routing rule with a unicode email address, I get:
It looks like this part works now, at least on the routing rules table.
When I try to actually send a message from Gmail to a unicode purelymail account, I get:
The response was:
local-part of envelope RCPT address contains utf8 but remote server did not offer SMTPUTF8
Final-Recipient: utf8-addr; 📥@richards.dev
Remote-MTA: dns; mailserver.purelymail.com. (18.104.22.168, the server for the
Diagnostic-Code: smtp; local-part of envelope RCPT address contains utf8 but remote server did not offer SMTPUTF8
Last-Attempt-Date: Tue, 12 May 2020 18:15:42 -0700 (PDT)
Hm, I've checked and it looks like the mailserver library we use doesn't support SMTPUTF8 yet. This might therefore take a while to fix because we'll probably have to add that in (plus IMAP and POP variants) or wait for an update from the library itself. I'd estimate a moderate amount of effort.
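For anyone who wants to check this from the outside: the bounce above means the receiving server's EHLO reply did not list the SMTPUTF8 keyword. Below is a minimal Python sketch of that check. The function name and the sample reply are made up for illustration and are not taken from the real server's output.

```python
def advertises_smtputf8(ehlo_response: str) -> bool:
    """Return True if an EHLO reply lists the SMTPUTF8 extension keyword."""
    for line in ehlo_response.splitlines():
        # Reply lines look like "250-KEYWORD params" or "250 KEYWORD params".
        if len(line) < 4 or not line.startswith("250"):
            continue
        words = line[4:].split()
        # Extension keywords are case-insensitive.
        if words and words[0].upper() == "SMTPUTF8":
            return True
    return False


# Hypothetical EHLO reply from a server without SMTPUTF8 support.
sample_reply = (
    "250-mail.example.com\r\n"
    "250-8BITMIME\r\n"
    "250-SIZE 52428800\r\n"
    "250 STARTTLS"
)

print(advertises_smtputf8(sample_reply))
```

Against a live server, the standard library's smtplib does an equivalent check: after calling `ehlo()` on an `smtplib.SMTP` connection, `has_extn("smtputf8")` reports whether the keyword was offered.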
I have been getting a few sieve failures with error notifications from the server and I think it is related to UTF-8 characters in the sender's name part (not address part), e.g.
An error was encountered while processing this mail with the active sieve script for user "w******@b******n.com". The error encountered was: Command if (3:1): Test address (3:4): org.apache.james.mime4j.field.address.TokenMgrError: Lexical error at line 1, column 7. Encountered: "\ufffd" (65533), after : "" From: Le Cinéma Club <firstname.lastname@example.org> Subject: Now Showing: Virgil Vernier's SAPPHIRE CRYSTAL Date: 16 May 2020 at 1:27:41 am AEST To: William Rankin <w******@b*******.com> Reply-To: Le Cinéma Club <email@example.com>
Is there a way I can rewrite the sieve to avoid the name part at least for the time being?
Another for comparison:
An error was encountered while processing this mail with the active sieve script for user "w******@b******n.com". The error encountered was: Command if (3:1): Test address (3:4): org.apache.james.mime4j.field.address.TokenMgrError: Lexical error at line 1, column 14. Encountered: "\ufffd" (65533), after : "" From: Melbourne Cinémathèque <firstname.lastname@example.org> Subject: Reminder: nominations for CTEQ Committee close May 20, 11:00pm. AGM May 27, 6:30pm on Zoom. AGM Report attached. Date: 17 May 2020 at 10:46:45 pm AEST To: William <w******@b******n.com> Reply-To: Melbourne Cinémathèque <email@example.com>
Ah, your issue actually isn't related to this one, rnkn. There's no problem with having a Unicode display name; the Sieve parser is just upset that these emails aren't formatted properly. I.e., when the display name contains Unicode, it has to be quoted:
From: "Melbourne Cinémathèque" <firstname.lastname@example.org>
and not left bare:
From: Melbourne Cinémathèque <email@example.com>
Otherwise it's technically invalid. But since the job of a mailserver is to begrudgingly accept minor spec violations, I've swapped the address parser used in Sieve for the lenient one, and that should be available in production in about thirty minutes. I'll let you know if it appears to work then.
rnkn's issue should be fixed now. (Hard to test though, since I don't have any malformed email clients to test with.)
I have since received mail with unquoted UTF-8 name part, so that confirms my unrelated issue is fixed :)
Is this on Apache James' roadmap? Or yours? Curious to know even rough timeline if one is available.
I don't think it's likely to be done by Apache James anytime soon. However, I could take a preliminary look myself probably around mid-November. I can't guarantee when it'd be completed though, as I'm just a single dev with lots of things to do.
FYI, the reason this is harder than it seems is that the extension involved (SMTPUTF8) decided to tack on support for everything in the mail message headers being UTF-8, not just an ASCII encoding of the address like the one used for domains, so there's no backwards compatibility. You can't send a UTF-8 address to a server that doesn't support UTF-8 addresses, and a client that doesn't support UTF-8 can't read UTF-8 mail. (UTF-8 addresses also open up the usual phishing vulnerability of lookalike characters.)
Hopefully anyone with a UTF-8 email address is well aware of these problems and has an alternate.
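To make the domain versus local-part difference concrete, here is a rough Python sketch. A Unicode domain can be converted to an ASCII form (IDNA/Punycode) that older servers understand, while a Unicode local part has no such fallback and needs SMTPUTF8 end to end. The helper names are hypothetical, and Python's built-in `idna` codec is used only for illustration.

```python
def needs_smtputf8(address: str) -> bool:
    """True if the local part (before the last '@') contains non-ASCII characters."""
    local_part, _, _domain = address.rpartition("@")
    # A non-ASCII local part has no ASCII-compatible encoding to fall back on.
    return not local_part.isascii()


def ascii_domain(domain: str) -> str:
    """Encode a Unicode domain to its ASCII-compatible (Punycode) form."""
    return domain.encode("idna").decode("ascii")


print(ascii_domain("bücher.example"))          # xn--bcher-kva.example
print(needs_smtputf8("user@bücher.example"))   # False: only the domain is Unicode
print(needs_smtputf8("📥@richards.dev"))       # True: the local part is Unicode
```

This is why an address like the one in the bounce above cannot be downgraded for a server that lacks SMTPUTF8: there is nothing equivalent to Punycode for the part before the `@`.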