Projects tigase _server tigase-pubsub Issues #110
Deadlock in TigPubSubRemoveService SP on MySQL (#110)
Andrzej Wójcik (Tigase) opened 4 years ago

There is a deadlock caused by concurrent execution of this method on a different cluster nodes, but there is no obvious reason of this deadlock.

Andrzej Wójcik (Tigase) commented 4 years ago

I've reviewed the code of a stored procedure and did not find anything causing directly this issue. On MySQL in RDS I've not found any details about this issue.

The only part of the stored procedure which can be problematic is that I'm deleting in 3 transactions one after the other and they are working on the same tables. With concurrent execution of SP from multiple cluster nodes that could lead to the issue (but we should not block the same records), so I've moved that to the single transaction and updated SP on the tigase.im installation.

We will need to wait to confirm if the fix is working.

wojciech.kapcia@tigase.net commented 4 years ago

I've moved that to the single transaction and updated SP on the tigase.im installation.

@andrzej.wojcik it just happened again:

[2020-09-23 18:43:44:708] [WARNING ] [       pool-1-thread-7 ] PubSubComponent.onUserRemoved()  : could not remove PubSub data for removed user blub@sure.im
tigase.component.exceptions.RepositoryException: Node subscribers fragment removing error
	at tigase.pubsub.repository.PubSubDAOJDBC.deleteService(PubSubDAOJDBC.java:1003)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.$$YJP$$invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at tigase.stats.StatisticsInvocationHandler.invoke(StatisticsInvocationHandler.java:75)
	at com.sun.proxy.$Proxy52.deleteService(Unknown Source)
	at tigase.pubsub.repository.PubSubDAOPool.deleteService(PubSubDAOPool.java:515)
	at tigase.pubsub.repository.cached.CachedPubSubRepository.deleteService(CachedPubSubRepository.java:601)
	at tigase.pubsub.cluster.CachedPubSubRepositoryClustered.deleteService(SourceFile:142)
	at tigase.pubsub.PubSubComponent.onUserRemoved(PubSubComponent.java:227)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.$$YJP$$invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at tigase.eventbus.impl.ReflectEventListenerHandler.dispatch(ReflectEventListenerHandler.java:46)
	at tigase.eventbus.impl.EventBusImplementation.lambda$doFireThreadPerHandler$1(EventBusImplementation.java:306)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
	at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:123)
	at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)
	at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:955)
	at com.mysql.cj.jdbc.ClientPreparedStatement.execute(ClientPreparedStatement.java:372)
	at com.mysql.cj.jdbc.CallableStatement.execute(CallableStatement.java:842)
	at jdk.internal.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at tigase.db.jdbc.PreparedStatementInvocationHandler.invoke(PreparedStatementInvocationHandler.java:38)
	at com.sun.proxy.$Proxy38.execute(Unknown Source)
	at tigase.pubsub.repository.PubSubDAOJDBC.deleteService(PubSubDAOJDBC.java:1000)
	... 21 more
Andrzej Wójcik (Tigase) commented 4 years ago

I've tried to get info from MySQL about a deadlock but AWS RDS restarted cluster nodes to apply patches and there is no longer any info about a deadlock. Will need to wait for another occurrance

Andrzej Wójcik (Tigase) commented 4 years ago

I've grabbed details of the latest deadlock:

LATEST DETECTED DEADLOCK
------------------------
2020-10-04 18:42:47 2b2e19c01700
*** (1) TRANSACTION:
TRANSACTION 9173565060, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 376, 3 row lock(s)
MySQL thread id 1141, OS thread handle 0x2b2bc0f49700, query id 100103688 10.0.11.245 xmppcloud_tigase updating
delete from tig_pubsub_service_jids where service_jid_sha1 = SHA1(LOWER(_service_jid))
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 277785 page no 38 n bits 157 index `service_jid_sha1` of table `tigasedb`.`tig_pubsub_service_jids` trx id 9173565060 lock_mode X waiting
Record lock, heap no 157 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len=30; bufptr=0x2b2beb11e130; hex= 373762653739616661323565303266393763336565323931633436376236; asc 77be79afa25e02f97c3ee291c467b6; (total 40 bytes);
 1: len=8; bufptr=0x2b2beb11e158; hex= 8000000000000c86; asc         ;;

*** (2) TRANSACTION:
TRANSACTION 9173565061, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 376, 3 row lock(s)
MySQL thread id 1143, OS thread handle 0x2b2bc0f08700, query id 100103689 10.0.11.245 xmppcloud_tigase updating
delete from tig_pubsub_service_jids where service_jid_sha1 = SHA1(LOWER(_service_jid))
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 277785 page no 38 n bits 157 index `service_jid_sha1` of table `tigasedb`.`tig_pubsub_service_jids` trx id 9173565061 lock mode S
Record lock, heap no 157 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len=30; bufptr=0x2b2beb11e130; hex= 373762653739616661323565303266393763336565323931633436376236; asc 77be79afa25e02f97c3ee291c467b6; (total 40 bytes);
 1: len=8; bufptr=0x2b2beb11e158; hex= 8000000000000c86; asc         ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 277785 page no 38 n bits 157 index `service_jid_sha1` of table `tigasedb`.`tig_pubsub_service_jids` trx id 9173565061 lock_mode X waiting
Record lock, heap no 157 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len=30; bufptr=0x2b2beb11e130; hex= 373762653739616661323565303266393763336565323931633436376236; asc 77be79afa25e02f97c3ee291c467b6; (total 40 bytes);
 1: len=8; bufptr=0x2b2beb11e158; hex= 8000000000000c86; asc         ;;

and it looks like deleting items from other tables while query uses tig_pubsub_service_jids creates shared lock and at the end removal from tig_pubsub_service_jids requires exclusive lock.

To fix that I've added exclusive lock acquiring at tig_pubsub_service_jids at the beginning of the transaction which should serialize them and fix the deadlock.

Fix applied also directly at tigase.im/sure.im/xmpp.cloud installation.

wojciech.kapcia@tigase.net commented 4 years ago

seems to be fixed.

issue 1 of 1
Type
Bug
Priority
Normal
Assignee
Version
tigase-server-8.2.0
Spent time
4h
Issue Votes (0)
Watchers (0)
Reference
tigase/_server/tigase-pubsub#110
Please wait...
Page is in error, reload to recover