Version
- vertx 3.9.7
- kotlin 1.5.0 (tested also on 1.4.32)
- open-jdk-14
Context
I observed a resource leak in JDBCClient, namely in the vertx-lang-kotlin:3.9.7
function
suspend fun SQLClient.getConnectionAwait(): SQLConnection
The problem occurs if the coroutine calling getConnectionAwait()
is cancelled before the handler passed to the following method is invoked:
SQLClient getConnection(Handler<AsyncResult<SQLConnection>> handler);
If this happens (e.g., the calling REST connection is terminated or times out), the connection passed to the handler is never closed and never returned to the pool. Each such cancellation leaks one connection, which eventually depletes all free connections (when used with a connection pool such as C3P0). The program ends up in a terminal state with no usable DB connection.
I believe the problem lies in awaitEvent: https://github.com/vert-x3/vertx-lang-kotlin/blob/d991b1683c05c6d5081ee2f6513838c9c1dc977c/vertx-lang-kotlin-coroutines/src/main/java/io/vertx/kotlin/coroutines/VertxCoroutine.kt#L63
suspend fun <T> awaitEvent(block: (h: Handler<T>) -> Unit): T {
  return suspendCancellableCoroutine { cont: CancellableContinuation<T> ->
    try {
      block.invoke(Handler { t ->
        cont.resume(t) // <---- here, if cont is already cancelled, t is dropped and stays open
      })
    } catch (e: Exception) {
      cont.resumeWithException(e)
    }
  }
}
I solved the problem by implementing the coroutine bridging on my own, with an if-branch that handles the cancelled coroutine, such as
try {
  block.invoke(Handler { t ->
    if (cont.isActive) {
      cont.resume(t)
    } else {
      t.close() // <--- close connection if coroutine is already cancelled when handler is invoked
    }
  })
} catch (e: Exception) {
  cont.resumeWithException(e)
}
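To make the workaround concrete without a database, here is a minimal self-contained sketch of the same guard. It uses only kotlinx.coroutines; `awaitCloseable`, `FakeConnection` and `lateDeliveryIsClosed` are hypothetical names made up for this illustration and are not part of vertx-lang-kotlin. The handler is a plain Kotlin lambda standing in for `Handler<T>`.

```kotlin
import kotlinx.coroutines.CoroutineStart
import kotlinx.coroutines.cancelAndJoin
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.suspendCancellableCoroutine
import java.io.Closeable
import kotlin.coroutines.resume
import kotlin.coroutines.resumeWithException

// Hypothetical helper for illustration; not part of vertx-lang-kotlin.
suspend fun <T : Closeable> awaitCloseable(block: (handler: (T) -> Unit) -> Unit): T =
    suspendCancellableCoroutine { cont ->
        try {
            block { resource ->
                if (cont.isActive) {
                    cont.resume(resource)
                } else {
                    // The waiter is gone, so nobody else can release the resource.
                    resource.close()
                }
            }
        } catch (e: Exception) {
            cont.resumeWithException(e)
        }
    }

// Hypothetical stand-in for SQLConnection that only records whether it was closed.
class FakeConnection : Closeable {
    var closed = false
        private set

    override fun close() {
        closed = true
    }
}

fun lateDeliveryIsClosed(): Boolean = runBlocking {
    var pending: ((FakeConnection) -> Unit)? = null
    // UNDISPATCHED runs the coroutine up to its first suspension point,
    // so the handler is registered before launch() returns.
    val waiter = launch(start = CoroutineStart.UNDISPATCHED) {
        awaitCloseable<FakeConnection> { handler -> pending = handler }
    }
    waiter.cancelAndJoin()      // the caller gives up first
    val conn = FakeConnection()
    pending!!.invoke(conn)      // the "pool" delivers the connection afterwards
    conn.closed
}

fun main() {
    println(lateDeliveryIsClosed()) // prints true
}
```

Note that the `isActive` check and the `resume` call are not atomic, so a cancellation landing between them could still drop the resource. kotlinx.coroutines also has a `resume` overload that takes an `onCancellation` callback for that window; its exact signature depends on the library version, so I left it out of the sketch.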
I noticed that this method is deprecated (https://github.com/vert-x3/vertx-jdbc-client/issues/196 , https://github.com/vert-x3/vertx-lang-kotlin/blob/c1938f890b711734d3e191d274422d3f3098726a/vertx-lang-kotlin/src/main/kotlin/io/vertx/kotlin/ext/sql/SQLClient.kt#L64-L65), but this may help others who hit the same issue and haven't migrated yet.
Also, the vertx 3 API is planned to remain supported in vertx 4, so this race condition can potentially affect a lot of users, making their servers unresponsive. Btw, was it deprecated because of this potential issue as well?
Similar issues may be present wherever the same pattern is used: a coroutine calls an await method and waits for the handler; by the time the handler is called, the coroutine is already cancelled, so the resource obtained and passed to the handler is never closed.
Do you have a reproducer?
Not yet, but I can create one if the description is not enough.
Steps to reproduce
- set up a MySQL JDBC-backed demo
- spawn 2000 worker threads simulating clients, each querying the database (code below), and let the clients time out randomly
- watch the number of failures / timeouts
val rnd = Random()
// nextLong().rem(1000L) can be negative, which makes withTimeout fail immediately
withTimeout(1L + rnd.nextInt(1000)) {
  dbClient.getConnectionAwait().use {
    it.queryAwait("SELECT 1")
  }
}
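Until there is a real reproducer, the steps above can be approximated without MySQL. The sketch below is only a simulation under stated assumptions: `FakePool`, `FakeConn`, `getConnectionNaive` and `simulateLeaks` are hypothetical stand-ins written for this issue, not Vert.x types, and the timeouts are fixed (10 ms for every other client, against a 50 ms pool latency) instead of random so that the result is repeatable. The bridge resumes the continuation unconditionally, like awaitEvent does.

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.suspendCancellableCoroutine
import kotlinx.coroutines.withTimeoutOrNull
import java.io.Closeable
import java.util.concurrent.atomic.AtomicInteger
import kotlin.coroutines.resume

// Hypothetical stand-in for a pooled JDBC client; not a Vert.x type.
class FakePool(private val scope: CoroutineScope, private val latencyMs: Long) {
    val checkedOut = AtomicInteger(0)

    // Callback-style API: the handler fires after latencyMs whether or not
    // the caller is still waiting for it.
    fun getConnection(handler: (FakeConn) -> Unit) {
        scope.launch {
            delay(latencyMs)
            checkedOut.incrementAndGet()
            handler(FakeConn(this@FakePool))
        }
    }
}

class FakeConn(private val pool: FakePool) : Closeable {
    override fun close() {
        pool.checkedOut.decrementAndGet()
    }
}

// Unguarded bridge: a resume on an already cancelled continuation is ignored,
// so the connection handed to the handler is simply dropped.
suspend fun FakePool.getConnectionNaive(): FakeConn =
    suspendCancellableCoroutine { cont ->
        getConnection { conn -> cont.resume(conn) }
    }

fun simulateLeaks(clients: Int): Int = runBlocking {
    // coroutineScope returns only after all clients and all deliveries finish.
    val pool = coroutineScope {
        val p = FakePool(this, latencyMs = 50)
        repeat(clients) { i ->
            launch {
                // Even-numbered clients give up before the pool answers.
                val timeoutMs = if (i % 2 == 0) 10L else 5_000L
                withTimeoutOrNull(timeoutMs) {
                    p.getConnectionNaive().use { /* a query would run here */ }
                }
            }
        }
        p
    }
    pool.checkedOut.get()
}

fun main() {
    println("leaked connections: ${simulateLeaks(10)}") // prints "leaked connections: 5"
}
```

Every client that times out before the pool answers leaves one connection checked out forever, which is the depletion I see with C3P0.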
wontfix