Support UTF-8 Email Addresses #41

Open
opened 1 year ago by jeremy · 17 comments
jeremy commented 1 year ago

As far as I can tell, https://tools.ietf.org/html/rfc6530 provides support for unicode characters in the local part of email addresses. However, it looks like I can't send messages to email addresses with unicode characters, and can't create routing rules for receiving utf-8 emails, either.

Is there any plan to enable this functionality?

Scott added this to the Beta milestone 1 year ago
Scott added the Bug label 1 year ago
Owner

I think Unicode local addresses should be allowed, but I'll have to see what part of the mailserver library doesn't have it enabled.

Poster

When I try to send to an account with a unicode email address, I get: ![image](/attachments/fccf3370-7762-48f0-8b28-a8b1888d4772)

~~When I try to make an account routing rule with a unicode email address, I get:~~
It looks like this part works now, at least on the routing rules table.

When I try to actually send a message from Gmail to a unicode purelymail account, I get:

"""
The response was:

local-part of envelope RCPT address contains utf8 but remote server did not offer SMTPUTF8
Final-Recipient: utf8-addr; 📥@richards.dev
Action: failed
Status: 5.6.7
Remote-MTA: dns; mailserver.purelymail.com. (18.204.123.63, the server for the
domain richards.dev.)
Diagnostic-Code: smtp; local-part of envelope RCPT address contains utf8 but remote server did not offer SMTPUTF8
Last-Attempt-Date: Tue, 12 May 2020 18:15:42 -0700 (PDT)
test
Jeremy Richards
"""

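The bounce above says the receiving server "did not offer SMTPUTF8", meaning its EHLO reply did not list that keyword. A minimal Python sketch of that check follows. The function name `offers_smtputf8` and both sample replies are made up for illustration; they are not captured from purelymail or Gmail.

```python
def offers_smtputf8(ehlo_reply: str) -> bool:
    """Return True if an EHLO reply advertises the SMTPUTF8 keyword (RFC 6531)."""
    for line in ehlo_reply.splitlines():
        # Reply lines look like "250-KEYWORD [params]" or "250 KEYWORD [params]".
        if not line.startswith("250") or len(line) < 5:
            continue
        keyword = line[4:].split(" ", 1)[0]
        if keyword.upper() == "SMTPUTF8":
            return True
    return False


# Hypothetical EHLO replies, for illustration only.
LEGACY_REPLY = "250-mx.example.test\n250-8BITMIME\n250 PIPELINING"
UTF8_REPLY = "250-mx.example.test\n250-8BITMIME\n250-SMTPUTF8\n250 PIPELINING"

print(offers_smtputf8(LEGACY_REPLY))  # False
print(offers_smtputf8(UTF8_REPLY))    # True
```

Against a live server, Python's `smtplib` exposes the same information through `SMTP.has_extn("smtputf8")` after calling `SMTP.ehlo()`.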
Owner

Hm, I've checked and it looks like the mailserver library we use doesn't support SMTPUTF8 yet. This might therefore take a while to fix because we'll probably have to add that in (plus IMAP and POP variants) or wait for an update from the library itself. I'd estimate a moderate amount of effort.

Poster

What mailserver library are you using? Just curious.

Owner

Highly modified [Apache James](https://github.com/apache/james-project).

rnkn commented 1 year ago

I have been getting a few sieve failures with error notifications from the server and I think it is related to UTF-8 characters in the sender's name part (not address part), e.g.

```mime
An error was encountered while processing this mail with the active sieve script for user "w******@b******n.com". The error encountered was:
Command if (3:1): Test address (3:4): org.apache.james.mime4j.field.address.TokenMgrError: Lexical error at line 1, column 7.  Encountered: "\ufffd" (65533), after : ""

From: Le Cinéma Club <hello@lecinemaclub.com>
Subject: Now Showing: Virgil Vernier's SAPPHIRE CRYSTAL
Date: 16 May 2020 at 1:27:41 am AEST
To: William Rankin <w******@b*******.com>
Reply-To: Le Cinéma Club <h*****@lecinemaclub.com>
```

Is there a way I can rewrite the sieve to avoid the name part at least for the time being?
rnkn commented 1 year ago

Another for comparison:

```mime
An error was encountered while processing this mail with the active sieve script for user "w******@b******n.com". The error encountered was:
Command if (3:1): Test address (3:4): org.apache.james.mime4j.field.address.TokenMgrError: Lexical error at line 1, column 14.  Encountered: "\ufffd" (65533), after : ""

From: Melbourne Cinémathèque <m********************@westnet.com.au>
Subject: Reminder: nominations for CTEQ Committee close May 20, 11:00pm. AGM May 27, 6:30pm on Zoom. AGM Report attached.
Date: 17 May 2020 at 10:46:45 pm AEST
To: William <w******@b******n.com>
Reply-To: Melbourne Cinémathèque <m********************@westnet.com.au>
```
Owner

Ah, your issue actually isn't related to this one, rnkn. There's no problem with having a Unicode display name; the Sieve parser is just upset that these emails aren't actually formatted properly. I.e. when you have Unicode in the display name you have to do

```
From: "Melbourne Cinémathèque" <m********************@westnet.com.au>
```

instead of

```
From: Melbourne Cinémathèque <m********************@westnet.com.au>
```

Or it's technically invalid. But since the job of a mailserver is to begrudgingly accept minor spec violations, I've swapped the address parser used in Sieve for the lenient one, and that should be available in production in about thirty minutes. I'll let you know if it appears to work then.
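For comparison, the long-standing way to carry a non-ASCII display name without relying on SMTPUTF8 is an RFC 2047 encoded-word, which keeps the header pure ASCII. Here is a minimal Python sketch using the standard library's `email.utils.formataddr`. The address `news@example.com` is a placeholder, not one from this thread, and the sketch says nothing about how the Sieve parser in Apache James handles such headers.

```python
from email.header import decode_header, make_header
from email.utils import formataddr, parseaddr

# Placeholder sender; only the display name contains non-ASCII characters.
display_name = "Melbourne Cinémathèque"
address = "news@example.com"

# formataddr() wraps a non-ASCII display name in an RFC 2047 encoded-word.
header_value = formataddr((display_name, address))
print(header_value)

# The encoded form is plain ASCII and decodes back to the original name.
encoded_name, parsed_address = parseaddr(header_value)
round_tripped = str(make_header(decode_header(encoded_name)))
print(round_tripped, parsed_address)
```

The first `print` shows an ASCII-only header value beginning with `=?utf-8?`; the second shows the decoded name alongside the unchanged address.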
Owner

rnkn's issue should be fixed now. (Hard to test though, since I don't have any malformed email clients to test with.)

rnkn commented 1 year ago

Thanks :)
I get these newsletters a few times a week so it shouldn't be too long to confirm.

rnkn commented 1 year ago

I have since received mail with unquoted UTF-8 name part, so that confirms my unrelated issue is fixed :)

Poster

Is this on Apache James' roadmap? Or yours? Curious to know even rough timeline if one is available.

Owner

I don't think it's likely to be done by Apache James anytime soon. However, I could take a preliminary look myself, probably around mid-November. I can't guarantee when it'd be completed though, as I'm just a single dev with lots of things to do.

FYI, the reason this is harder than it seems is that the extension involved decided to tack on support for _everything_ in the mail message headers being UTF-8, not just an encoding of the address like for domains, so there's no backwards compatibility. You can't send a UTF-8 address to a server that doesn't support UTF-8 addresses, and a client that doesn't support UTF-8 can't read UTF-8 mail. (UTF-8 also opens up the usual phishing vulnerability of lookalikes.)

_Hopefully_ anyone with a UTF-8 email address is well aware of these problems and has an alternate.
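To illustrate the asymmetry described above: a non-ASCII domain can be downgraded to an ASCII form (IDNA/Punycode) that any server understands, while a non-ASCII local part has no such fallback and needs SMTPUTF8 end to end. This is a rough Python sketch; `split_for_transport` is a hypothetical helper, `bücher.example` is a made-up domain, and Python's built-in `idna` codec implements the older IDNA 2003 rules.

```python
def split_for_transport(address: str):
    """Return (local_part, ascii_domain, needs_smtputf8) for an address."""
    local, _, domain = address.rpartition("@")
    # Domains have an ASCII-compatible encoding, so old servers still cope.
    ascii_domain = domain.encode("idna").decode("ascii")
    # The local part has no equivalent encoding: non-ASCII means SMTPUTF8.
    needs_smtputf8 = not local.isascii()
    return local, ascii_domain, needs_smtputf8


print(split_for_transport("info@bücher.example"))
# ('info', 'xn--bcher-kva.example', False)
print(split_for_transport("📥@richards.dev"))
# ('📥', 'richards.dev', True)
```

The second address is the one from the bounce earlier in this thread: its domain is already ASCII, but the emoji local part cannot be delivered to a server that does not offer SMTPUTF8.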
Poster

Hey Scott, just wondering if there's been any progress on this or if progress is planned. Thanks!

Owner

It's still on my backlog unfortunately, haven't gotten to it yet.


I'm not sure if this is related. I just tried to create a mailbox via IMAP using Apple Mail app which contained a "." in the name. The mailbox was created, but the name was only as far as the "." and everything after the "." is the name of a sub-mailbox that is created within the first mailbox.

As a result when you try to use the Mail Import/Export Tool, it fails if it encounters a mailbox with a "." in the name.

Owner

The "." is a folder separator. I should probably figure out how to switch that to something modern like "/".
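A small Python sketch of why the dot matters: when "." is the hierarchy delimiter, a mailbox name containing a dot is read as a parent folder plus a sub-folder. `mailbox_path` is a hypothetical helper and the mailbox name is invented; the real server-side handling may differ.

```python
def mailbox_path(name: str, delimiter: str = ".") -> list:
    """Split a mailbox name into hierarchy levels on the given delimiter."""
    return name.split(delimiter)


# With "." as the delimiter, the name is split into parent and child.
print(mailbox_path("Receipts v1.2"))       # ['Receipts v1', '2']

# With "/" as the delimiter, the same name stays a single mailbox.
print(mailbox_path("Receipts v1.2", "/"))  # ['Receipts v1.2']
```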
