After setting up mysqlplus in my Rails project I ran into an interpreter-wide deadlock in certain situations. I isolated it and tracked it down to two things, which I describe below.
Before I continue: the simple solution is to use the C implementation of async_query instead of the Ruby version.
class Mysql
  alias_method :query, :c_async_query  # instead of alias_method :query, :async_query
end
If you want more details read on…
1) mysqlplus’ default async_query, which is implemented in Ruby:
def async_query(sql, timeout = nil)
  send_query(sql)
  # wrap the raw socket fd in an IO (cached in @sockets) and block until it is readable
  select [ (@sockets ||= {})[socket] ||= IO.new(socket) ], nil, nil, nil
  get_result
end
The send_query, get_result, and socket methods are C functions.
and
2) ActiveRecord::Base.clear_reloadable_connections! which is called after every Rails request in development mode:
# actually implemented in connection_pool.rb
def clear_reloadable_connections!
  @reserved_connections.each do |name, conn|
    checkin conn
  end
  @reserved_connections = {}
  @connections.each do |conn|
    conn.disconnect! if conn.requires_reloading?
  end
  @connections = []
end
The code we care about here is conn.disconnect!, which calls mysqlplus’ disconnect method, itself implemented as a C function. If we have Rails skip the call to disconnect!, there is no deadlock. If we use the C implementation of async_query, which is named c_async_query, there is no deadlock either.
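To make the waiting step of the Ruby async_query concrete, here is a minimal sketch of the same pattern outside mysqlplus: wrap a raw descriptor in a cached IO object and block in IO.select until it is readable. The helper name wait_for_readable, the cache argument and the timeout handling are my own illustrative choices, not part of mysqlplus, and the sketch assumes a current Ruby where IO.for_fd accepts autoclose: false.

```ruby
# Hypothetical helper, not mysqlplus code: mirrors the select step of async_query.
# `cache` plays the role of @sockets; `fd` plays the role of the `socket` method.
def wait_for_readable(fd, cache, timeout = nil)
  # Reuse one IO wrapper per descriptor. autoclose: false keeps Ruby from
  # closing a descriptor that native code still owns.
  io = (cache[fd] ||= IO.for_fd(fd, autoclose: false))
  ready, = IO.select([io], nil, nil, timeout)
  ready ? io : nil # nil means the timeout expired
end
```

With a pipe standing in for the MySQL socket, wait_for_readable(reader.fileno, {}, 5) returns the wrapped IO once another thread writes to the pipe, and nil if nothing arrives within five seconds.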
The issue has something to do with calling Ruby’s IO#select on a file descriptor that is also being manipulated by native functions. While looking into it I found a separate but related issue. The bug and fix I show below do not resolve the deadlock I was running into, but they solve a similar one. The mysqlplus deadlock does not actually occur during async_query’s IO#select but at some later point which I couldn’t pin down.
The condition can be reproduced by calling the native C function close() on a file descriptor that Ruby is currently selecting on.
require 'socket'
require 'rubygems'
require 'inline'

module C
  class << self
    inline do |builder|
      builder.c %q{
        static VALUE native_close(int s) {
          close(s);
          return Qnil;
        }
      }
    end
  end
end

Thread.new { loop { sleep 1; p 1 } }

Thread.new do
  loop do
    io = TCPSocket.new('google.com', 80)
    fd = io.to_i
    Thread.new { sleep 0.5; C.native_close(fd) }
    # Thread.new { sleep 0.5; io.close }
    p :selecting!
    rdy = select [io]
    p :selected!
  end
end

sleep 99999
This example will produce a deadlock on select after the second thread calls native_close. If you swap the close lines so that the socket is closed with Ruby’s close method instead of the native one, you won’t get a deadlock. After lots of debugging I narrowed it down to what seems to be a bug in Ruby’s rb_thread_schedule function:
n = select(max+1, &readfds, &writefds, &exceptfds, delay_ptr);
if (n < 0) {
    // select is returning -1 indicating an error
    int e = errno;
The deadlock is actually Ruby calling rb_thread_schedule over and over for the thread which is selecting, instead of yielding to let other threads run. I stuffed in a call to perror() and saw that the error is caused by a bad file descriptor (EBADF). But for some reason Ruby doesn’t handle that error by removing the bad fd from the fd_set. So I fixed it by scanning the sets for bad file descriptors and removing any that turn up:
Update: Simplified remove_bad_fds function thanks to costan recommending fcntl() over select()
n = select(max+1, &readfds, &writefds, &exceptfds, delay_ptr);
if (n < 0) {
    int e = errno;
    // ...
    if (e == EBADF)
        remove_bad_fds(&th->readfds, &th->writefds, &th->exceptfds, max);
    // ...

#include <fcntl.h>

static void
remove_bad_fds(fd_set *r, fd_set *w, fd_set *e, int max) {
    int fd;
    for (fd = 0; fd <= max; fd++)
        if (FD_ISSET(fd, r) || FD_ISSET(fd, w) || FD_ISSET(fd, e))
            if (fcntl(fd, F_GETFD) < 0 && errno == EBADF) {
                FD_CLR(fd, r);
                FD_CLR(fd, w);
                FD_CLR(fd, e);
            }
}
remove_bad_fds calls fcntl on each fd in the sets to check whether it is still valid. Any fd for which fcntl fails with EBADF is cleared from all three sets.
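The same check can be sketched in Ruby. This only illustrates the technique and is no substitute for patching the interpreter: bad_fd? and the array-based remove_bad_fds below are hypothetical helpers that work on plain arrays of descriptor numbers rather than fd_sets. The sketch assumes a current Ruby on a Unix-like system, where Errno::EBADF is raised for a closed descriptor either by IO.for_fd itself or by the fcntl call.

```ruby
require 'fcntl'

# Hypothetical helper: true when +fd+ no longer refers to an open descriptor.
def bad_fd?(fd)
  # autoclose: false so the temporary wrapper never closes the descriptor.
  IO.for_fd(fd, autoclose: false).fcntl(Fcntl::F_GETFD)
  false
rescue Errno::EBADF
  true
end

# Hypothetical Ruby analogue of the C function above: drops bad descriptors
# from every array passed in, modifying the arrays in place.
def remove_bad_fds(*fd_sets)
  fd_sets.each { |set| set.reject! { |fd| bad_fd?(fd) } }
end
```

As in the C version, a descriptor counts as bad only when the check fails with EBADF, so other errors are not swallowed.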

[...] I ran into a deadlock with the Ruby implementation of async_query, so use the C one instead. Then add this somewhere so [...]
Pingback by ActiveRecord threading issues and resolutions « coderrr — January 11, 2009 @ 7:08 pm
k I updated it to default to the c_async_query as you described. Let me know if it doesn’t work.
I’d recommend the bug listed be submitted to ruby-core, too :)
-=r
Comment by roger — January 12, 2009 @ 6:46 pm
does 1.9 have this bug? Does the patch work on windoze :) ?
-=r
Comment by roger — January 12, 2009 @ 7:58 pm
Haven’t tried on 1.9, is mysqlplus 1.9 compat?
Windows? lol
Comment by coderrr — January 12, 2009 @ 8:01 pm
yeah mysqlplus will compile on 1.9.
I was wondering more if the remove_bad_fds stuff is necessary for 1.9
I almost wonder if, like Python, the select function should raise on the thread that passed it a bad descriptor.
Yeah windows is definitely the step child of Ruby land, though I hear jruby actually runs on it with reasonable speed :)
-=r
Comment by roger — January 12, 2009 @ 8:09 pm
[...] about such a contraption as the MySqlPlus driver. But personally, I’m a bit too scared to run it on [...]
Pingback by MySQL vs PostgreSQL в ActiveRecord | Uniвсячина — February 24, 2009 @ 11:08 pm
I am wondering if the problem was that there was no concurrency lock around the assignment to @sockets ?
Comment by roger — April 18, 2009 @ 11:09 pm
Yea both those unsynchronized ||=’s could be it
Comment by coderrr — April 18, 2009 @ 11:13 pm
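If the unsynchronized ||= assignments were indeed a factor (nobody in this thread confirmed it), one way to guard them would be a Mutex around the lookup. A minimal sketch, where SocketCache and fetch are hypothetical names and the block stands in for IO.new(socket):

```ruby
# Hypothetical wrapper around the @sockets hash: the lookup and the
# assignment happen under one lock, so two threads can never both
# build a wrapper for the same descriptor.
class SocketCache
  def initialize
    @lock = Mutex.new
    @ios = {}
  end

  # Returns the cached object for +fd+, building it with the block
  # the first time the descriptor is seen.
  def fetch(fd)
    @lock.synchronize { @ios[fd] ||= yield(fd) }
  end
end
```

The block runs while the lock is held, so it should stay cheap; wrapping a descriptor in an IO object fits that constraint.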