Improve handling of database failovers (connecting to a read-only instance because DNS propagation on AWS's side is too slow) (#1354)
wojciech.kapcia@tigase.net opened 2 years ago

In some systems (at least AWS RDS Aurora), it may happen during a failover that Tigase reconnects to a stale, read-only instance, resulting in the following exception:

java.sql.SQLException: The MySQL server is running with the --read-only option so it cannot execute this statement

In such a case, Tigase should try to recreate the connection (with a delay?).

(ref: https://github.com/tigase/tigase-server/issues/196)
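A minimal sketch of the proposed behaviour, assuming the read-only condition can be recognised from the SQLException itself: MySQL reports the "--read-only option" message with vendor error code 1290 (ER_OPTION_PREVENTS_STATEMENT). Because that code is shared by other server options, the helper also checks for "read-only" in the message. The class ReadOnlyFailoverRetry, its SqlCall and Reconnector interfaces, and the attempt/delay parameters are hypothetical illustrations, not Tigase APIs.

```java
import java.sql.SQLException;

/**
 * Hypothetical helper (not a Tigase API): runs a JDBC call and, if the server
 * rejects it because it is read-only, waits and recreates the connection
 * before trying again.
 */
final class ReadOnlyFailoverRetry {

    /** MySQL ER_OPTION_PREVENTS_STATEMENT, used for the "--read-only option" message. */
    static final int ER_OPTION_PREVENTS_STATEMENT = 1290;

    /** A JDBC operation that may fail with an SQLException. */
    @FunctionalInterface
    interface SqlCall<T> {
        T call() throws SQLException;
    }

    /** Closes the current connection and opens a new one (hypothetical hook). */
    @FunctionalInterface
    interface Reconnector {
        void reconnect() throws SQLException;
    }

    private ReadOnlyFailoverRetry() {}

    /** Error 1290 is shared by several options, so also require "read-only" in the message. */
    static boolean isReadOnlyError(SQLException e) {
        String message = e.getMessage();
        return e.getErrorCode() == ER_OPTION_PREVENTS_STATEMENT
                && message != null
                && message.contains("read-only");
    }

    static <T> T execute(SqlCall<T> call, Reconnector reconnector, int maxAttempts, long delayMillis)
            throws SQLException, InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (SQLException e) {
                // Rethrow unrelated errors at once, and give up after maxAttempts tries.
                if (!isReadOnlyError(e) || attempt >= maxAttempts) {
                    throw e;
                }
                // Give DNS time to point the endpoint at the new writer, then reconnect.
                Thread.sleep(delayMillis);
                reconnector.reconnect();
            }
        }
    }
}
```

Sleeping before reconnecting gives DNS a chance to point the cluster endpoint at the new writer. A fixed delay is the simplest choice; an increasing (backoff) delay would be a natural refinement.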

wojciech.kapcia@tigase.net commented 1 year ago

When trying to add a check using the driver's feature (Connection.isReadOnly()), it worked locally (mysql:8.0 and driver 8.0), but it still seems to fail when connecting to RDS Aurora (8.0.28):

[2023-09-25 15:32:22:683] [SEVERE  ] [             hostnames ] ConfigRepository$1.run()         : exception during reload of config repository items
java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 1
	at com.mysql.cj.protocol.a.NativePacketPayload.readInteger(NativePacketPayload.java:398)
	at com.mysql.cj.protocol.a.ColumnDefinitionReader.unpackField(ColumnDefinitionReader.java:99)
	at com.mysql.cj.protocol.a.ColumnDefinitionReader.read(ColumnDefinitionReader.java:77)
	at com.mysql.cj.protocol.a.ColumnDefinitionReader.read(ColumnDefinitionReader.java:40)
	at com.mysql.cj.protocol.a.NativeProtocol.read(NativeProtocol.java:1587)
	at com.mysql.cj.protocol.a.TextResultsetReader.read(TextResultsetReader.java:68)
	at com.mysql.cj.protocol.a.TextResultsetReader.read(TextResultsetReader.java:48)
	at com.mysql.cj.protocol.a.NativeProtocol.read(NativeProtocol.java:1600)
	at com.mysql.cj.protocol.a.NativeProtocol.readAllResults(NativeProtocol.java:1654)
	at com.mysql.cj.NativeSession.queryServerVariable(NativeSession.java:600)
	at com.mysql.cj.jdbc.ConnectionImpl.isReadOnly(ConnectionImpl.java:1396)
	at com.mysql.cj.jdbc.ConnectionImpl.isReadOnly(ConnectionImpl.java:1389)
	at tigase.db.jdbc.DataRepositoryImpl.checkConnection(DataRepositoryImpl.java:450)
	at tigase.db.jdbc.DataRepositoryImpl.createStatement(DataRepositoryImpl.java:201)
	at tigase.db.DataRepositoryPool.createStatement(DataRepositoryPool.java:163)
	at tigase.db.jdbc.JDBCRepository.getNodeNID(JDBCRepository.java:1114)
	at tigase.db.jdbc.JDBCRepository.getNodeNID(JDBCRepository.java:1159)
	at tigase.db.jdbc.JDBCRepository.getData(JDBCRepository.java:136)
	at tigase.db.jdbc.JDBCRepository.getData(JDBCRepository.java:184)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at tigase.stats.StatisticsInvocationHandler.invoke(StatisticsInvocationHandler.java:75)
	at jdk.proxy2/jdk.proxy2.$Proxy37.getData(Unknown Source)
	at tigase.db.UserRepositoryMDImpl.getData(UserRepositoryMDImpl.java:111)
	at tigase.db.comp.UserRepoRepository.reload(UserRepoRepository.java:69)
	at tigase.vhosts.VHostJDBCRepository.reload(VHostJDBCRepository.java:145)
	at tigase.db.comp.ConfigRepository$1.run(ConfigRepository.java:82)
	at java.base/java.util.TimerThread.mainLoop(Timer.java:566)
	at java.base/java.util.TimerThread.run(Timer.java:516)
wojciech.kapcia@tigase.net commented 1 year ago

The issue with JDBC's java.sql.Connection#isReadOnly throwing an exception was most likely caused by concurrency. Adding explicit synchronisation on the connection object solved the issue.
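A sketch of a connection check guarded this way, assuming (as the stack trace above suggests) that Connector/J's isReadOnly() performs a server round trip via queryServerVariable, so no other thread should use the connection at the same moment. GuardedConnection and its ConnectionFactory are hypothetical names; this is not Tigase's DataRepositoryImpl.checkConnection(), and the monitor only helps if other code using the connection synchronises on the same object.

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Hypothetical wrapper illustrating a synchronised read-only check (not Tigase code). */
class GuardedConnection {

    /** Opens a fresh JDBC connection, e.g. via DriverManager.getConnection(url). */
    interface ConnectionFactory {
        Connection open() throws SQLException;
    }

    private final ConnectionFactory factory;
    private Connection connection;

    GuardedConnection(ConnectionFactory factory) throws SQLException {
        this.factory = factory;
        this.connection = factory.open();
    }

    /**
     * Returns the current connection, replacing it once if it reports itself
     * read-only. The replacement is returned even if it is read-only too, so
     * callers still need to handle write failures.
     */
    synchronized Connection checkedConnection() throws SQLException {
        if (isReadOnly(connection)) {
            connection.close();
            connection = factory.open();
        }
        return connection;
    }

    private static boolean isReadOnly(Connection c) throws SQLException {
        // isReadOnly() may query the server; hold the connection's monitor so
        // another thread synchronising on it cannot interleave a statement.
        synchronized (c) {
            return c.isReadOnly();
        }
    }
}
```

Because the driver answers isReadOnly() from @@session.transaction_read_only, this check alone may still report an Aurora reader as writable.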

I also added handling of the "read-only" exception, as the driver doesn't detect the read-only state of the database correctly (it checks @@session.transaction_read_only, while AWS RDS sets the @@innodb_read_only variable…)
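One possible complement, not described in this issue: ask the server for the variable that AWS RDS sets instead of relying on the driver's isReadOnly(). The sketch assumes the server exposes @@innodb_read_only and returns it as 0 or 1; InnoDbReadOnlyCheck is a hypothetical name, not part of Tigase or Connector/J.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical check that queries @@innodb_read_only directly (not Tigase code). */
final class InnoDbReadOnlyCheck {

    /** Assumed query; per this issue, AWS RDS signals read-only mode via this variable. */
    static final String QUERY = "SELECT @@innodb_read_only";

    private InnoDbReadOnlyCheck() {}

    static boolean isInnoDbReadOnly(Connection connection) throws SQLException {
        // Serialise use of the connection while the query runs, in case other threads share it.
        synchronized (connection) {
            try (Statement statement = connection.createStatement();
                 ResultSet rs = statement.executeQuery(QUERY)) {
                // The variable is boolean; MySQL returns it as 0 or 1.
                return rs.next() && rs.getInt(1) != 0;
            }
        }
    }
}
```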

wojciech.kapcia@tigase.net added "Related" helpdesk#62 8 months ago
wojciech.kapcia@tigase.net batch edited 7 months ago
Iterations: empty → tigase-server-8.4.0
wojciech.kapcia@tigase.net changed state to 'Closed' 7 months ago
State: In QA → Closed
wojciech.kapcia@tigase.net added "Related" tigase/_server/tigase-utils#26 4 months ago
wojciech.kapcia@tigase.net added "Related" #1496 4 months ago
Type: Task
Priority: Minor
Assignee: (none)
Version: tigase-server-8.4.0
Iterations: (none)
Issue Votes: 0
Watchers: 2
Reference: tigase/_server/server-core#1354