Unicode emojis not allowed (#630)
Daniele Ricci opened 9 years ago

I just found out that, with the latest release branch, Tigase disconnects my client with an "XML content parse error". I've investigated and I can reliably reproduce it whenever I put emojis in the XML stream (messages, presence status element, etc.).

I don't think a log is needed, but if you need one I can restart my server with more verbose logging to prove it.

I've been told by another XMPP software maintainer that Openfire had a similar issue and they had to fix it to allow a wider range of UTF-8 characters.

I'm marking this bug as High because it causes clients to disconnect.

Andrzej Wójcik (Tigase) commented 9 years ago

This issue is caused by the fact that we added validation of the characters allowed by the XML 1.0 specification, which is the XML version the XMPP protocol specifies. According to the protocol specification this is the proper behavior for Tigase XMPP Server, but it may not be good for the end user.

Daniele Ricci commented 9 years ago

Andrzej Wójcik wrote:

This issue is caused by the fact that we added validation of the characters allowed by the XML 1.0 specification, which is the XML version the XMPP protocol specifies. According to the protocol specification this is the proper behavior for Tigase XMPP Server, but it may not be good for the end user.

Ok so what do you suggest?

From the client side, I could encode those characters as entities (&#xxxx;), but isn't that allowed for HTML only? I can't remember...
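On the entity question: numeric character references of the form `&#xHHHH;` are defined by XML 1.0 itself (section 4.1), not only by HTML. As a rough client-side sketch, the hypothetical helper below (the name `escapeSupplementary` is an assumption, not part of any XMPP library) replaces code points above U+FFFF with such references. Whether a given server build accepts them depends on how its validator treats references, so this is a possible workaround rather than a fix. It also does not escape `&` or `<`, which a real serializer must still handle.

```java
final class XmlCharRef {
    // Hypothetical helper: rewrites each supplementary code point (above U+FFFF)
    // as an XML numeric character reference and copies everything else unchanged.
    static String escapeSupplementary(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i); // reads a full surrogate pair when present
            if (Character.isSupplementaryCodePoint(cp)) {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp); // advance by 1 or 2 UTF-16 units
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F4A9));
        System.out.println(escapeSupplementary("hi " + emoji)); // prints "hi &#x1F4A9;"
    }
}
```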

Artur Hefczyc commented 9 years ago

It depends on what the spec says. I think I saw some discussion about this on one of the XMPP mailing lists. If the emoji characters are allowed or are planned to be allowed for XMPP, then we should update our code to let them through. I think it is a safe assumption that these new emoji characters will be part of the XMPP spec at some point.

Andrzej, what do you think? If you think there are no negative side effects of adding these chars to the allowed set, please update our code.

Florian Schmaus commented 9 years ago

If the emoji characters are allowed or are planned to be allowed for XMPP

XMPP has always allowed Unicode emojis, because XML 1.0 § 2.2 allows #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] as code points in text, and this includes most emojis. The [#x10000-#x10FFFF] range needs special treatment in Java, since you have to handle surrogate pairs, which is often simply forgotten when implementing an XML validator. So if I can disconnect a stream by sending e.g. U+1F4A9, then this is definitely a Tigase issue. Openfire had the same issue, which was fixed with https://github.com/igniterealtime/Openfire/commit/c0a4fc1889d4cced817fc2bee8e0e4a92e06ba60
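To illustrate the surrogate-pair pitfall, here is a minimal sketch of the XML 1.0 § 2.2 check in Java. It is not the Tigase or Openfire code; the class and method names are assumptions made for the example. The naive variant tests each UTF-16 `char` separately, so both halves of a surrogate pair fall into the excluded #xD800-#xDFFF gap and the emoji is rejected. The second variant iterates over code points, so U+1F4A9 lands in [#x10000-#x10FFFF] and passes, while an unpaired surrogate is still rejected.

```java
final class XmlCharCheck {
    // XML 1.0 section 2.2 "Char" production, applied to one Unicode code point.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Naive variant: checks UTF-16 units one by one, so the two surrogates
    // that encode an emoji are each treated as illegal characters.
    static boolean isValidNaive(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isXmlChar(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Code-point-aware variant: a valid surrogate pair is decoded into a single
    // code point; a lone surrogate is returned as-is by codePointAt and rejected.
    static boolean isValid(String s) {
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        String status = "status " + new String(Character.toChars(0x1F4A9));
        System.out.println(isValidNaive(status)); // prints "false"
        System.out.println(isValid(status));      // prints "true"
    }
}
```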

Andrzej Wójcik (Tigase) commented 9 years ago

I fixed the issue by changing the implementation of the character validation.

This change will be part of the next snapshot build.

Daniele Ricci commented 9 years ago

Thanks for the quick fix, Andrzej. I will try it tomorrow.

Daniele Ricci commented 9 years ago

I've been running my server for a couple of days with the fix and I can confirm it works correctly now.

Thanks!

Type: Bug
Priority: Major
Assignee:
RedmineID: 3838
Spent time: 11h 15m
Issue Votes: 0
Watchers: 0
Reference: tigase/_server/server-core#630