archiveMessage: Data truncation: Data too long for column `_body` (#65)
Closed
wojciech.kapcia@tigase.net opened 4 years ago

It's not possible to store large messages (in the notification it was roughly 200K) and exception is thrown.

log: [2020-08-13 08:30:55:956] [FINEST  ] [  in_1-message-archive ] MessageDeliveryLogic.handleDelivery(): Message 'to' this user, packet: from=sess-man@ip-10-0-44-219.us-west-2.compute.internal, to=sess-man@ip-10-0-11-90.us-west-2.compute.internal, DATA=[message …/message]
com.mysql.cj.jdbc.exceptions.MysqlDataTruncation: Data truncation: Data too long for column '_body' at row 1
	at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:104)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:955)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdateInternal(ClientPreparedStatement.java:1094)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdateInternal(ClientPreparedStatement.java:1042)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeLargeUpdate(ClientPreparedStatement.java:1345)
	at com.mysql.cj.jdbc.CallableStatement.executeLargeUpdate(CallableStatement.java:2526)
	at com.mysql.cj.jdbc.CallableStatement.executeUpdate(CallableStatement.java:896)
	at jdk.internal.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:567)
	at tigase.db.jdbc.PreparedStatementInvocationHandler.invoke(PreparedStatementInvocationHandler.java:38)
	at com.sun.proxy.$Proxy35.executeUpdate(Unknown Source)
	at tigase.archive.db.JDBCMessageArchiveRepository.archiveMessage(JDBCMessageArchiveRepository.java:325)
	at tigase.archive.db.JDBCMessageArchiveRepository.archiveMessage(JDBCMessageArchiveRepository.java:50)
	at tigase.archive.db.AbstractMessageArchiveRepository.archiveMessage(AbstractMessageArchiveRepository.java:94)
	at tigase.archive.db.JDBCMessageArchiveRepository.archiveMessage(JDBCMessageArchiveRepository.java:121)
	at jdk.internal.reflect.GeneratedMethodAccessor70.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:567)
	at tigase.stats.StatisticsInvocationHandler.invoke(StatisticsInvocationHandler.java:75)
	at com.sun.proxy.$Proxy47.archiveMessage(Unknown Source)
	at tigase.archive.db.MessageArchiveRepositoryPool.archiveMessage(MessageArchiveRepositoryPool.java:66)
	at tigase.archive.modules.ArchivingModule.process(ArchivingModule.java:101)
	at tigase.component.modules.StanzaProcessor.process(StanzaProcessor.java:164)
	at tigase.component.modules.StanzaProcessor.processPacket(StanzaProcessor.java:85)
	at tigase.component.AbstractKernelBasedComponent.processPacket(AbstractKernelBasedComponent.java:81)
	at tigase.server.AbstractMessageReceiver$QueueListener.run(AbstractMessageReceiver.java:1405)

Obviously this is not ideal and leads to data loss and we should somehow handle it. The most straightforward solution would be probably changing column type. However I think that we also should impose some sort of limitations to avoid archiving who knows what (and maybe gracefully inform user that sent message was not archived?).

Andrzej Wójcik (Tigase) commented 4 years ago

It is not always possible to inform the user as ie. sender may not be in our roster (ie. spammer) and we do not want to inform him that that account exists.

The issue is caused by usage of text as a field type, which has a limit of 64KB and that is MySQL specific (I hate restrictions imposed on ANSI types!). Most likely we should hange it to mediumtext (16MB - bigger than allowed size of a single stanza) or longtext (4GB).

Note to self: The same issue may apply to updated PubSub schema and needs to be evaluated!

@wojtek Should we use mediumtext?

wojciech.kapcia@tigase.net commented 4 years ago

As we discussed, text is not ANSI SQL type (and it only happens that MSSQL and Postgres decided to make it "unlimited").

I suggested using mediumtext - should be enough (especially considering stanza limitation)

Andrzej Wójcik (Tigase) commented 4 years ago

I've changed data types for body and msg columns from text to mediumtext and updated procedures to match this new type in message archiving component and in unified archive component (both we using those fields).

I've also reviewed PubSub schema as there were recent changes and found an issue with PubSub node creation procedure, which I've fixed as well.

Andrzej Wójcik (Tigase) commented 4 years ago

Everything should be working now with stanzas not bigger than 16MB (so 16000000 of bytes, not chars).

issue 1 of 1
Type
Bug
Priority
Normal
Assignee
Version
tigase-server-8.2.0
Issue Votes (0)
Watchers (0)
Reference
tigase/_server/tigase-message-archiving#65
Please wait...
Page is in error, reload to recover