Check performance of message retrieval stored procedures for MySQL (#84)
Andrzej Wójcik (Tigase) opened 4 months ago

Fetching messages still causes timeouts on our installation when a lot of data to be retrieved. It would be good to review their performance and usage of indexes.

Andrzej Wójcik (Tigase) changed state to 'In Progress' 4 months ago
Previous Value Current Value
Open
In Progress
Andrzej Wójcik (Tigase) added "Related" tigase-private/systems-maintenance/servers#384 4 months ago
Andrzej Wójcik (Tigase) commented 4 months ago

I've reviewed stored procedures and their SQL queries and I've found a suboptimal usage of indexes in one of the cases. I've modified source code of the stored procedures and committed them in a separate branch named issue-84. Those changes still need to be tested but initial tests confirmed that they improve performance of the data retrieval.

Referenced from commit 4 months ago
Referenced from commit 4 months ago
Referenced from commit 4 months ago
Referenced from commit 4 months ago
Referenced from commit 4 months ago
Referenced from commit 4 months ago
Andrzej Wójcik (Tigase) changed state to 'In QA' 3 months ago
Previous Value Current Value
In Progress
In QA
Wojciech Kapcia (Tigase) commented 2 months ago

Just got a notification from tigase.im:

[2025-01-24 02:52:04:022] [SEVERE  ] [  in_2-message-archive ] StanzaProcessor.processPacket()  : Error retrieving messages from database when processing from=sess-man@ip-172-31-22-51.us-west-2.compute.internal, to=message-archive@ip-172-31-22-51.us-west-2.compute.internal, serverAuthorisedStanzaFrom=Optional.empty, DATA=<iq type="set" xmlns="jabber:client" from="…@sure.im/monocles …" id="…"><query xmlns="urn:xmpp:mam:2" queryid="…"><x type="submit" xmlns="jabber:x:data"><field type="hidden" var="FORM_TYPE"><value>urn:xmpp:mam:2</value></field></x><set xmlns="http://jabber.org/protocol/rsm"><after>d7ae41ff-7f7f-4f0e-b5d6-524b890fc312</after><max>50</max></set></query></iq>, SIZE=398, XMLNS=jabber:client, PRIORITY=NORMAL, PERMISSION=AUTH, TYPE=set, STABLE_ID=null
java.lang.RuntimeException: Error retrieving messages from database
	at tigase.xmpp.mam.modules.QueryModule.process(QueryModule.java:86)
	at tigase.component.modules.StanzaProcessor.execute(StanzaProcessor.java:219)
	at tigase.component.modules.StanzaProcessor.process(StanzaProcessor.java:206)
	at tigase.component.modules.StanzaProcessor.processPacket(StanzaProcessor.java:103)
	at tigase.component.AbstractKernelBasedComponent.processPacket(AbstractKernelBasedComponent.java:81)
	at tigase.server.AbstractMessageReceiver$QueueListener.run(AbstractMessageReceiver.java:1398)
Caused by: tigase.db.TigaseDBException: Cound not retrieve items
	at tigase.archive.db.JDBCMessageArchiveRepository.queryItems(JDBCMessageArchiveRepository.java:212)
	at tigase.archive.db.JDBCMessageArchiveRepository.queryItems(JDBCMessageArchiveRepository.java:53)
	at jdk.internal.reflect.GeneratedMethodAccessor68.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.base/java.lang.reflect.Method.invoke(Unknown Source)
	at tigase.stats.StatisticsInvocationHandler.invoke(StatisticsInvocationHandler.java:75)
	at jdk.proxy2/jdk.proxy2.$Proxy61.queryItems(Unknown Source)
	at tigase.archive.db.MessageArchiveRepositoryPool.queryItems(MessageArchiveRepositoryPool.java:97)
	at tigase.archive.db.MessageArchiveRepositoryPool.queryItems(MessageArchiveRepositoryPool.java:45)
	at tigase.xmpp.mam.modules.QueryModule.process(QueryModule.java:84)
	... 5 more
Caused by: com.mysql.cj.jdbc.exceptions.MySQLTimeoutException: Statement cancelled due to timeout or client request
	at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:113)
	at com.mysql.cj.jdbc.StatementImpl.checkCancelTimeout(StatementImpl.java:2167)
	at com.mysql.cj.protocol.a.NativeProtocol.sendQueryPacket(NativeProtocol.java:1069)
	at com.mysql.cj.NativeSession.execSQL(NativeSession.java:666)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:930)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeQuery(ClientPreparedStatement.java:1009)
	at com.mysql.cj.jdbc.CallableStatement.executeQuery(CallableStatement.java:891)
	at jdk.internal.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.base/java.lang.reflect.Method.invoke(Unknown Source)
	at tigase.db.jdbc.PreparedStatementInvocationHandler.invoke(PreparedStatementInvocationHandler.java:43)
	at jdk.proxy2/jdk.proxy2.$Proxy42.executeQuery(Unknown Source)
	at tigase.archive.db.JDBCMessageArchiveRepository.getItemsItems(JDBCMessageArchiveRepository.java:564)
	at tigase.archive.db.JDBCMessageArchiveRepository.queryItems(JDBCMessageArchiveRepository.java:209)
	... 14 more
Andrzej Wójcik (Tigase) commented 2 months ago

This could still be caused by total number of queries being run on the database. Current version running at tigase.im does not contain a fix for tigase/_server/tigase-push#65, so it is running "statistics" query every second from each cluster node. That with increased usage of tigase.im (usage peak) could cause this query to be slow (or timeout).

At this point I do not see anything more what we could do to improve the situation with optimization of SQL queries.

Wojciech Kapcia (Tigase) commented 2 months ago

This could still be caused by total number of queries being run on the database. Current version running at tigase.im does not contain a fix for tigase/_server/tigase-push#65, so it is running "statistics" query every second from each cluster node. That with increased usage of tigase.im (usage peak) could cause this query to be slow (or timeout).

Should we update the installation just to make sure we are running latest version with all the fixes?

Andrzej Wójcik (Tigase) commented 2 months ago

I think that we should update the installation anyway, as it is running older snapshot build. I've applied fixed that were in SQL stored procedures but any changes in the app code were not applied.

Wojciech Kapcia (Tigase) added "Related" tigase-private/systems-maintenance/servers#447 2 months ago
Wojciech Kapcia (Tigase) commented 2 months ago

tigase.im updated, let's see how it behaves now.

Wojciech Kapcia (Tigase) commented 2 months ago

Again timeout:

[2025-02-06 12:28:04:448] [SEVERE  ] [  in_5-message-archive ] StanzaProcessor.processPacket()  : Error retrieving messages from database when processing from=sess-man@ip-172-31-0-99.us-west-2.compute.internal, to=message-archive@ip-172-31-0-99.us-west-2.compute.internal, serverAuthorisedStanzaFrom=Optional.empty, DATA=<iq xmlns="jabber:client" id="0tHuCT2cpqZI" from="…@tigase.im/blabber.im[3.1.4].z5P2" type="set"><query xmlns="urn:xmpp:mam:2" queryid="…"><x xmlns="jabber:x:data" type="submit"><field var="FORM_TYPE" type="hidden"><value>urn:xmpp:mam:2</value></field></x><set xmlns="http://jabber.org/protocol/rsm"><after>9c495ab8-cecd-4f5c-b8d6-a05e9b7f1301</after><max>50</max></set></query></iq>, SIZE=397, XMLNS=jabber:client, PRIORITY=NORMAL, PERMISSION=AUTH, TYPE=set, STABLE_ID=null
java.lang.RuntimeException: Error retrieving messages from database
	at tigase.xmpp.mam.modules.QueryModule.process(QueryModule.java:86)
	at tigase.component.modules.StanzaProcessor.execute(StanzaProcessor.java:219)
	at tigase.component.modules.StanzaProcessor.process(StanzaProcessor.java:206)
	at tigase.component.modules.StanzaProcessor.processPacket(StanzaProcessor.java:103)
	at tigase.component.AbstractKernelBasedComponent.processPacket(AbstractKernelBasedComponent.java:81)
	at tigase.server.AbstractMessageReceiver$QueueListener.run(AbstractMessageReceiver.java:1398)
Caused by: tigase.db.TigaseDBException: Cound not retrieve items
	at tigase.archive.db.JDBCMessageArchiveRepository.queryItems(JDBCMessageArchiveRepository.java:212)
	at tigase.archive.db.JDBCMessageArchiveRepository.queryItems(JDBCMessageArchiveRepository.java:53)
	at jdk.internal.reflect.GeneratedMethodAccessor74.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.base/java.lang.reflect.Method.invoke(Unknown Source)
	at tigase.stats.StatisticsInvocationHandler.invoke(StatisticsInvocationHandler.java:75)
	at jdk.proxy2/jdk.proxy2.$Proxy74.queryItems(Unknown Source)
	at tigase.archive.db.MessageArchiveRepositoryPool.queryItems(MessageArchiveRepositoryPool.java:97)
	at tigase.archive.db.MessageArchiveRepositoryPool.queryItems(MessageArchiveRepositoryPool.java:45)
	at tigase.xmpp.mam.modules.QueryModule.process(QueryModule.java:84)
	... 5 more
Caused by: com.mysql.cj.jdbc.exceptions.MySQLTimeoutException: Statement cancelled due to timeout or client request
	at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:113)
	at com.mysql.cj.jdbc.StatementImpl.checkCancelTimeout(StatementImpl.java:2167)
	at com.mysql.cj.protocol.a.NativeProtocol.sendQueryPacket(NativeProtocol.java:1069)
	at com.mysql.cj.NativeSession.execSQL(NativeSession.java:666)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:930)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeQuery(ClientPreparedStatement.java:1009)
	at com.mysql.cj.jdbc.CallableStatement.executeQuery(CallableStatement.java:891)
	at jdk.internal.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.base/java.lang.reflect.Method.invoke(Unknown Source)
	at tigase.db.jdbc.PreparedStatementInvocationHandler.invoke(PreparedStatementInvocationHandler.java:43)
	at jdk.proxy2/jdk.proxy2.$Proxy43.executeQuery(Unknown Source)
	at tigase.archive.db.JDBCMessageArchiveRepository.getItemsCount(JDBCMessageArchiveRepository.java:618)
	at tigase.archive.db.JDBCMessageArchiveRepository.queryItems(JDBCMessageArchiveRepository.java:197)
	... 14 more
Andrzej Wójcik (Tigase) commented 2 months ago

In this case, I have no idea what we can do next. We have optimized current schema so it would be as performant as possible and use correct indexes.

The query that is now causing the issue is most likely related to fetching data/messages from the database based on offset, so it shouldn't be that problematic for MySQL unless it is required to fetch quite a lot of messages.

The only other solution that I see right now, would be a change of database schema to denormalize it and change primary keys to reduce indexes sizes, increase performance and maybe that would allow us to sort messages by "ids" instead of timestamps improving query performance as well. But that would be a bigger job and most likely for a "big" release.

Wojciech Kapcia (Tigase) referenced from other issue 2 months ago
Wojciech Kapcia (Tigase) added "Related" #85 2 months ago
Wojciech Kapcia (Tigase) commented 2 months ago

Let's close it for now as it seems to be working well enough. I've created #tigase/_server/tigase-message-archiving#85 as a follwup.

issue 1 of 1
Type
Task
Priority
Normal
Assignee
Version
none
Sprints
n/a
Customer
n/a
Iterations
Issue Votes (0)
Watchers (3)
Reference
tigase/_server/tigase-message-archiving#84
Please wait...
Page is in error, reload to recover