June 30, 2011

Patching The Bitcoin Client To Make It More Anonymous

Filed under: anonymity, bitcoin, cplusplus, patch — coderrr @ 5:18 pm

Shameless Plug: Hide your IP while connected to the Bitcoin P2P network with a VPN Service.

TLDR: this patch allows you to …
– see all addresses, including change
– see which addresses are linked together (does recursive expansion of address linkages)
– select which address(es) to send from, rather than letting the client choose (randomly) for you

Bitcoin is a decentralized, peer to peer, digital currency. It has been referred to as anonymous, pseudo-anonymous, pseudonymous (whatever that means), and not anonymous at all. It seems there is a lot of misinformation about exactly how anonymous it is and how its anonymity works. I’m going to try to explain part of that here and provide a solution to one of the current big killers of its anonymity.

When you receive coins at a new Bitcoin address, that is, one you’ve never used before, the fact that you control that address is completely unknown to anyone except the sender (and anyone the sender leaked that info to). And the sender may not even know your actual identity, depending on whether you revealed it to them. If you receive another payment at that same address then both the first and second payers will be able to see that both of them paid you at that address. This is due to how the Bitcoin block chain works and is why you are advised to create a new address for each new payment you wish to receive.

So assume you’ve created 100 addresses for 100 payments. Each of the 100 people knows they paid you once, but they don’t know that 99 other people also paid you, how much those payments were, or how much you have in total. So you have revealed very little about yourself to anyone.

Now let’s say you want to _make_ some payments or even just re-organize your funds by moving them to another address. This is where things get tricky and you start losing anonymity. The official Bitcoin client picks coins from multiple addresses in a random fashion when making payments. So let’s say you have those 100 payments from 100 different people each attached to their own address sitting in your wallet and now you want to send Wikileaks some coins. The Bitcoin client might choose coins from 3 of those incoming payments to send out. Now all 3 of the people who sent you those payments know that you received at least 3 payments, how much they were for, and when you received them.

Let me give you a scarier example. Let’s say you have 1 million dollars worth of Bitcoin sitting in one address from some withdrawals on a Bitcoin exchange. Now let’s say you have an address you use for donations, and assume you’ve gotten at least one. The next time you want to send some coins to someone, your client may pick a few coins from your million dollar address and a few coins from your donation address. This is a big problem because it gives the people who’ve donated coins the knowledge that you are also in control of the million dollar address. Plus if your donation address is publicly associated with your identity not only the donors but anyone can go through the block explorer to see which other addresses you are in control of and what their balances are.

Here is a related excerpt from the Bitcoin wiki:

… if one has bitcoins on several addresses, one can theoretically choose from which address to send the coins. Choosing personally generated coins or an address that you know doesn’t reveal information would protect you. Unfortunately, the default Bitcoin client doesn’t support this currently, so you must assume that your entire balance can identify you if any of the addresses can.

So what can you do about this? If you don’t have any Bitcoin yet then you can just make sure to use separate wallets for addresses you don’t want being mixed together. If you’re already in the position where you have public and private funds in the same wallet there’s not much you can do with the official Bitcoin client, other than not send coins to anyone (or yourself).

That’s why I’ve made a patch to the official client which allows you to send from _only_ a single specific address. Now you can be sure the only people who will ever know that you made that transaction are the ones who already knew about the address being under your control. If you did things right, this will only be a single person.

I’ve added a ‘Send From Address’ tab to the main interface. It actually contains information which was impossible to get from the client before. That is, every address in your wallet and the balance thereof. This includes addresses which were created for the change of your outgoing transactions. These were previously nowhere to be found in the client (even using the bitcoind RPC interface).

Simply choose the address you wish to send from and double click it. This will open the Send dialog with the Send From address filled in. If you try to send more coins than are available in that address the transaction will simply fail and you can try again. Leaving the Send From address blank will make the client behave normally and possibly pick coins from multiple addresses.

The second version of my Bitcoin client patch gives you a better view of your current address linkages. If any two or more addresses were used together for an outgoing transaction those will be considered linked. If any change is returned from an outgoing transaction that change address will be considered linked to all the originating addresses.
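The two linkage rules above amount to computing connected components over shared addresses. Here is a minimal Ruby sketch of the grouping logic (hypothetical data structures and method names, not the patch's actual C++ code):

```ruby
# Each outgoing transaction links its input addresses together, and the
# change address joins that same group. Groups sharing any address merge,
# which gives the recursive expansion described above.
def address_groupings(txs)
  groups = []
  txs.each do |tx|
    linked = tx[:inputs] + [tx[:change]].compact
    touching, rest = groups.partition { |g| (g & linked).any? }
    groups = rest << (touching.flatten | linked)
  end
  groups
end

txs = [
  { inputs: %w[A B], change: 'C' },  # A, B and change address C become linked
  { inputs: %w[C],   change: nil },  # spends from C: already in the A/B group
  { inputs: %w[D E], change: 'F' },  # an unrelated group
]
p address_groupings(txs)  # => [["A", "B", "C"], ["D", "E", "F"]]
```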

The ‘Send From Address’ tab now groups together linked addresses. Each group is separated by an empty line. I’ve also added a ‘Label’ column which will show you the label for the address if one has been set in the ‘Address Book’. Since your receiving addresses usually have labels this makes it easy to see which other addresses they have been linked to.

Sending from multiple addresses is now supported. Simply use the CTRL key to select multiple addresses then click the ‘Send’ button. The addresses will appear in the ‘Send From’ textbox separated by semicolons. Note, this DOES NOT guarantee all the addresses you selected will be used for the transaction. But it DOES guarantee that no unselected addresses will be used. As before, if you leave the ‘Send From’ field blank the client will fall back to its default behavior.

Version 3 of the patch now contains command line support for bitcoind:

bitcoind listaddressgroupings
bitcoind sendtoaddress <bitcoinaddress>[:<sendfromaddress1>[,<sendfromaddress2>[,…]]] <amount> [comment] [comment-to]

Add a +1 to the pull request if you believe this should be added to the official client:

My github bitcoin fork:
The commits with the changes:…v0.5.3+coderrr
Compiled Windows 32bit client:
Compiled Linux 64bit client:

May 3, 2011

Beware of Thread#kill or your ActiveRecord transactions are in danger of being partially committed

Filed under: concurrency, patch, rails — coderrr @ 3:25 pm

Firstly, before you all say you should never use Thread#kill and link to Headius’ post: this is not the same issue he explains, but it is somewhat related.

Secondly, before you ask why anyone is using Thread#kill anyway, or why we should worry about anyone who is: there is one scenario where Threads being #killed is quite likely, and there are probably plenty of people who run into it without knowing. When a Ruby process terminates, all Threads other than the main one are #killed (the main Thread gets an exception raised). So if you have a Ruby process with a bunch of threads and this process receives a TERM signal, all those Threads are going to get #killed unless you specifically handle the TERM signal and instruct, and wait for, all threads to finish. That is of course what you should do, but I am guessing there are plenty of people who don’t.

ActiveRecord transactions and Thread#kill

First let me say your ActiveRecord transactions could already be partially committed if you use Thread#kill or Thread#raise, due to the reasons Headius explains. But the chance of this happening for those reasons is lower than for this new one. Here’s the ActiveRecord transaction code. I’ve posted the Rails 2.2 version because it’s much simpler and shorter than the current 3.0 source, but 3.0 is still vulnerable to the same issue:

      def transaction(start_db_transaction = true)
        transaction_open = false
        begin
          if block_given?
            if start_db_transaction
              begin_db_transaction
              transaction_open = true
            end
            yield
          end
        rescue Exception => database_transaction_rollback
          if transaction_open
            transaction_open = false
            rollback_db_transaction
          end
          raise unless database_transaction_rollback.is_a? ActiveRecord::Rollback
        end
      ensure
        # reached even when Thread#kill skips the rescue above
        commit_db_transaction if transaction_open
      end

Now imagine you’re using it like this

ActiveRecord::Base.transaction do
  user_1.update_attribute :something, 1
  # ...
  user_n.update_attribute :something, n
end

If Thread#kill is called anywhere inside of that block the rest of the updates will not occur but the transaction will be committed.

How does it happen?

The reason this happens, and why it’s a little unexpected, is that Thread#kill bypasses all code _including_ rescue blocks and jumps immediately to ensure blocks. Many people code with the assumption that if an ensure block is reached without a rescue block having run, the code must have finished executing. And this is the assumption made in the ActiveRecord transaction code. Imagine jumping from the middle of the block passed to #transaction (where Thread#kill was called) straight to the start of the ensure block. transaction_open will evaluate to true (which it would not have, had an exception been raised, since the rescue block sets transaction_open to false) and commit_db_transaction will be called.

In contrast, for this to occur due to Headius’ issue, first an exception would have to be raised inside the transaction block and _then_ Thread#raise or Thread#kill would have to be called inside the rescue block before transaction_open is set to false, a much smaller window of attack.

Can this be fixed?

Yes and no. Setting a local variable to true after the yield statement is a simple way to determine if the block passed to yield ran to completion:

begin
  yield
  completed = true
ensure
  puts "entire block ran!"  if completed
end

This assumes the code in the block doesn’t use certain flow control statements. Those statements are return, break, next, retry, redo and throw. If one of these statements is used to break out of the block prematurely it will effectively act just like Thread.kill. The interpreter will immediately jump to the ensure block bypassing everything else including the local variable being set. Which means if you relied on the local variable being set to commit the transaction then any transaction block with one of those statements would always be rolled back.
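A quick illustration of how those flow control statements defeat the completion flag:

```ruby
LOG = []

def run_block
  completed = false
  begin
    yield
    completed = true
  ensure
    LOG << (completed ? :completed : :jumped_out)
  end
end

run_block { :ok }                           # block runs to completion
run_block { break }                         # break unwinds straight past `completed = true`
catch(:done) { run_block { throw :done } }  # so does throw
p LOG  # => [:completed, :jumped_out, :jumped_out]
```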

In MRI 1.8 and Rubinius Thread.current.status will be set to ‘aborting’ if the current thread is in the state of being killed. So we can detect if we are in the ensure block because of a Thread#kill rather than one of the flow control statements.

  puts "enitre block ran!"  if Thread.current.status != 'aborting'

The problem is this doesn’t work at all in 1.9 and isn’t consistent in JRuby. If all interpreters behaved as 1.8 does then fixing this issue would be trivial and have no downsides. Knowing whether you are in an ensure block prematurely because the programmer intended so via flow control statements versus due to an aborting thread seems fairly important. So I believe that 1.9 and JRuby should copy this behavior or at least provide some other similar mechanism.

The Patch

Here’s a patch which fixes the issue for MRI 1.8 and Rubinius:

Thanks to Marcos Toledo for brainstorming the issue with me and helping me proofread.

May 5, 2009

ActiveRecord’s with_connection is now useful!

Filed under: concurrency, patch, rails — coderrr @ 10:27 pm

A few days ago my with_connection (originally cleanup_connection) patch finally went into Rails trunk. It changes the way with_connection works a little, but should be mostly backwards compatible.

Note, you’re probably only going to care about this if you use ActiveRecord in long running threads.

Before this patch, with_connection was useless for most Rails devs’ situations unless we wanted to execute raw SQL:

ActiveRecord::Base.connection_pool.with_connection do |conn|
  User.find(1)  # this will NOT use the connection passed to the block (up to Rails 2.3.x), it
                # will create a new one, or use the one that existed before we called with_connection
  conn.execute("select raw sql here")  # this is the only way to use the connection
end

So that kinda sucked. Because the idea of with_connection is exactly what you want when you have lots of long running threads. You want to check out a connection, use ActiveRecord as you normally would (and have it use that new connection), and then check it back in whenever you’re done instead of leaving it lying around. This way you could have hundreds of threads using ActiveRecord sporadically with a small sized connection pool.

That’s what this patch allows you to do. Now with_connection will checkout a connection and set it as the main connection for the current thread (if it doesn’t already exist). If a connection already exists it will do nothing. This has the nice effect of allowing you to wrap your ActiveRecord code as closely as possible with with_connection. If with_connection always checked out a new connection no matter what, then you’d want to wrap your code at as high a level as possible which in turn would mean you would be keeping connections checked out of the pool when you didn’t really need them.

Here’s a (very contrived) example:

# shorthand
def db(&b); ActiveRecord::Base.connection_pool.with_connection(&b); end

class User
  def update_score
    db do
      self.score = calculate_score
    end
  end

  def calculate_score
    db { self.score_cards.sum(&:points) }
  end
end

100.times { User.all.each {|u| { u.update_score } } }

In this example we could call calculate_score directly, or indirectly through update_score, and we would never checkout more than one connection from the pool. We could also feel free to call these methods from any number of threads and know that when the calls finish their connections will be checked back into the pool without having to deal with any special ActiveRecord cleanup methods.

If this is the type of thing you are doing, or want to be doing, but you don’t actually want to have to wrap all your code in those annoying blocks, you can check out this monkeypatch (blog post here) which essentially wraps all DB touching ActiveRecord methods with with_connection for you.

March 29, 2009

Ruby 1.8 define_method scope bug

Filed under: bug, c, patch, ruby — coderrr @ 3:04 am

Update: Patch accepted, bug fixed!

While writing some tests for my tunnel_splitter project I ran into this really weird bug. It’s hard to believe no one has run into this before. I’ll give you the isolated version (which took considerable time to narrow down).

a = 1
Object.send :define_method, :x do
  lambda do  # lambda is necessary to reproduce bug
    p a # => 1
    a = 2
    p a # => 2
  end.call
end
x(nil) # passing at least one arg is necessary to reproduce bug
p a # => 1

This is wrong. Instead of 1, 2, 1, the output should be 1, 2, 2.

Here’s another reproduction, closer to the actual way I encountered the bug. This one is sensitive to the timing of when threads are scheduled. By commenting out the p :blah line you can actually prevent the bug from happening. That makes this bug way more evil and caused me considerable pain.

def callblock; yield; end

a = 1
Object.send :define_method, :x do
  p :blah # comment this line out and things will work as expected
  begin
    callblock do
      raise 'blah'
    end
  rescue
  end
  p a
  a = 2
  p a
end { x(nil) }.join
p a # => 1

Offender #1
eval.c:8626 proc_invoke()

    if (_block.frame.argc && DMETHOD_P()) {
        NEWOBJ(scope, struct SCOPE);
        OBJSETUP(scope, tmp, T_SCOPE);
        scope->local_tbl = _block.scope->local_tbl;
        scope->local_vars = _block.scope->local_vars;
        scope->flags |= SCOPE_CLONE;
        _block.scope = scope;
    }
If the block being invoked was a block passed to define_method (DMETHOD_P()) and it is being invoked with some number of args other than 0 (_block.frame.argc) then Ruby will “clone” the scope. Effectively, this doesn’t really do much. It creates a new scope struct, but then it points the local vars tables to the exact same place as the original, meaning if you modify a local in this scope, it will be reflected in the original scope. The issue is that it loses the flags of the original scope. We will see why this is important in a second.

Offender #2

static void
scope_dup(scope)
    struct SCOPE *scope;
{
    ID *tbl;
    VALUE *vars;

    scope->flags |= SCOPE_DONT_RECYCLE;
    if (scope->flags & SCOPE_MALLOC) return;

    if (scope->local_tbl) {
        tbl = scope->local_tbl;
        vars = ALLOC_N(VALUE, tbl[0]+1);
        *vars++ = scope->local_vars[-1];
        MEMCPY(vars, scope->local_vars, VALUE, tbl[0]);
        scope->local_vars = vars;
        scope->flags |= SCOPE_MALLOC;
    }
}
scope_dup will copy the local_vars (local variables’ values) table into a new memory address. Whenever this is done, changes to the variables in this scope will NOT affect the original scope’s variables (unless they are explicitly copied/pointed back). Note that scope_dup will return right away and do nothing if the scope has the SCOPE_MALLOC flag set, which scope_dup sets after duping a scope, effectively making it perform only one dup per scope, even if called multiple times.

Offender #1 + Offender #2 = fail

Each of these pieces of code by themselves will not cause the bug to occur. But put them together, specifically the scope cloning conditional and then the scope_dup function, and voila, your weekend has become an MRI debug session.

The issue is that the proc_invoke() scope cloning conditional will not copy the parent scope’s SCOPE_MALLOC flag. Which means if you call scope_dup on a cloned scope you will end up duping the local variables even if they had already been duped once before. Essentially you will end up duping a scope’s local variables twice, which means the parent scope’s variables will not be affected at all by changes to the current scope’s local variables.


If we just copy the SCOPE_MALLOC flag when cloning a scope all our problems are solved:

        scope->local_vars = _block.scope->local_vars;
        scope->flags |= SCOPE_CLONE;
        scope->flags |= (_block.scope->flags & SCOPE_MALLOC);  // add this line
        _block.scope = scope;

This way if a duped scope is cloned then duped, the local var tables will still be pointing to the same place.

By itself this actually causes ruby to segfault due to multiple free()s on the same memory.


            if (!(RANY(obj)->as.scope.flags & SCOPE_CLONE) && vars[0] == 0)
                RUBY_CRITICAL(free(RANY(obj)->as.scope.local_tbl));
            if (RANY(obj)->as.scope.flags & SCOPE_MALLOC)
                RUBY_CRITICAL(free(vars)); // double free segfault here!

We can modify the second conditional to make sure the scope doesn’t have the SCOPE_CLONE flag before free()ing.

            if (!(RANY(obj)->as.scope.flags & SCOPE_CLONE) && vars[0] == 0)
                RUBY_CRITICAL(free(RANY(obj)->as.scope.local_tbl));
            if ((RANY(obj)->as.scope.flags & SCOPE_MALLOC) && !(RANY(obj)->as.scope.flags & SCOPE_CLONE))
                RUBY_CRITICAL(free(vars));


I’m not sure if this is the best solution. I’m not even sure what the scope cloning conditional is needed for. I can’t think of a reason why you would need to make some modification to the scope of a block passed to define_method called with 1 or more arguments, which you wouldn’t need to make if it were called with 0 arguments. There probably is some reason though, and it’d be nice if someone could point out what it is. Also I’m not sure if it might be better to just copy all the scope flags instead of just the SCOPE_MALLOC one.

Also, I’m definitely not an expert on the internals of MRI so it’s possible this solution actually causes some other bugs.

Here’s the patch

1.8.7 and 1.9

This bug also occurs in 1.8.7 but not in 1.9.

bug filed

February 21, 2009

Ridiculous ruby meta programming hack

Filed under: bug, patch, ruby, security — coderrr @ 2:55 pm

Ruby 1.8.6 has a bug where Dir.glob will glob on a tainted string in $SAFE level 3 or 4 without raising a SecurityError as would be expected. You can see this from the following code:

lambda { $SAFE = 4; Dir.glob('/**') }.call # raises SecurityError
lambda { $SAFE = 4; Dir.glob(['/**']) }.call # => [ ... files ... ]

So I set out to fix this with pure ruby… and it ended up requiring some really crazy stuff. I’ll first show what I ended up with, then go through and explain why:

class Dir
  class << self
    safe_level_password ='/dev/urandom', 'r') { |f| }
    m = Dir.method(:glob)
    define_method :glob do |password, safe, *args|
      raise SecurityError if safe_level_password != password
      $SAFE = safe
      # pass along glob opts
      opts = args.last.is_a?(Fixnum) ? args.pop : [] { |arg|, *opts) }
    end
    eval %{
      alias safe_glob glob
      def glob(*a)
        safe_glob(#{safe_level_password.inspect}, $SAFE, *a)
      end
    }
  end
  freeze # freeze Dir so that no one can redefine safe_glob* to catch password
end

So first things first. The simple way to fix this bug is to alias glob, iterate over the array passed to the new glob and then call the original glob with one argument at a time, since we know it correctly checks taint with only one argument.

But wait, if we alias the original method then someone could still access the original and pass it an array. So we have to use a “secure” version of alias method chaining. Essentially, we capture the method object of the original method in the local scope, then we use define_method to overwrite the original method name with our new implementation and call the original method object which we have ‘closed’ over. This allows us access to the original method while preventing anyone else from doing so.

But there’s another problem. $SAFE levels are captured in blocks just as local variables. This means our define_method version of glob will always run at the $SAFE level it was defined in, namely $SAFE level 0. Meaning if you call Dir.glob from a $SAFE level 4 it will still get executed at level 0. This is of course the exact opposite of what we want. We are in a worse position now than before. Now we could call Dir.glob with a single tainted parameter and it would work.

How do we fix this? We need to use def to define the method so that the current $SAFE level 0 isn’t captured. Instead $SAFE will reflect the $SAFE level of the caller. But if we use def we can’t use the secure alias method technique.

One option is to have the define_method version set the $SAFE level explicitly before calling glob. But then we run into the problem of knowing what to set it to: there is no way of determining the $SAFE level of the caller without explicitly passing it in.

Ok, so what if we def a method which then calls the define_method method and passes its $SAFE level as an argument? Well then the problem is how do you give the def method access to the define_method method without giving other evil code access to the define_method method as well. Because then that evil code could just lie and pass a level of 0.

This is the crazy part. The way to prevent the evil method from executing the define_method method is to use a password!

Both the def and define_method methods can share a secret which is only available in the local scope where they are defined. Since the password is only a string we can use eval to create the def method and insert the password into it. As long as the define_method method verifies the secret is correct before continuing we can be sure the only method able to call it is the def method.
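Stripped of the $SAFE machinery, the password handshake looks like this (contrived example to show the shape of the trick):

```ruby
class Vault
  class << self
    # the secret lives only in this local scope
    password = rand(2**64).to_s

    # the define_method method demands the password
    define_method :guarded do |pw|
      raise SecurityError, "bad password" unless pw == password
      :secret_data
    end

    # the def method is built with eval so the password is baked
    # into its source; it is the only caller that knows it
    eval %{
      def open_vault
        guarded(#{password.inspect})
      end
    }
  end
end

p Vault.open_vault  # => :secret_data
```

Calling Vault.guarded directly with a guessed password raises SecurityError, while the eval-defined open_vault passes the check.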

I never thought I’d be sharing a secret between methods. I know this is a big house of cards. Can anyone figure out how to make it tumble? Or find a better way to fix the glob bug (without C)?

And yes, you must also do the same for Dir.[] but I left it out for sake of brevity.

BTW, here’s the patch if you actually care enough to recompile:

--- dir.c.orig	2009-02-21 21:49:09.000000000
+++ dir.c	2009-02-21 21:49:38.000000000
@@ -1659,7 +1659,7 @@
     for (i = 0; i < argc; ++i) {
 	int status;
 	VALUE str = argv[i];
-	StringValue(str);
+	SafeStringValue(str);
 	status = push_glob(ary, RSTRING(str)->ptr, flags);
 	if (status) GLOB_JUMP_TAG(status);

January 16, 2009

Monkey patching ActiveRecord to automatically release connections

Filed under: concurrency, patch, rails, ruby — coderrr @ 12:25 am

First let me say this is only really useful if you are using ActiveRecord in a non-Rails, multi-threaded context. As in a standalone Ruby app, or maybe even in a different web framework.

The point of this patch is to allow you to take advantage of ActiveRecord’s connection pool without having to ever deal with it explicitly. The biggest thing you have to worry about with the connection pool is releasing connections when you’re done with them, so that is what this patch does for you. This allows you to have large numbers of threads, even long running ones, using ActiveRecord without requiring an equally sized connection pool.

For example, let’s say you’ve setup your connection pool to allow 3 connections. Try to run this code:

ts = []
6.times do
  ts << { User.find(1) }
end
ts.each &:join

It’s going to take approximately 5 seconds to finish. Why? Because the first 3 threads take up all the connections from the pool. When the 4th thread tries to checkout a connection it waits up to wait_timeout (default 5 seconds) for a connection to become available. If it can’t get one within that amount of time it will checkin connections for dead threads and then try to checkout again. If you tried to do this with 7 threads you would get a “cannot obtain database connection” error because even after 5 seconds there wasn’t an available connection for the last thread.

As I showed in my previous posts you could solve this relatively easily by calling a cleanup method periodically or at the end of every thread. But how about this example:

6.times do do
    User.find(1)
    sleep 100 # or do something non ActiveRecord related for 100 seconds
  end
end

This one cannot be solved as easily. None of the threads die quickly so there are no connections to reclaim. This is where my patch comes in. Even though our threads are alive for 100 seconds we are really only using a connection for a small % of their lifespan. If your application fits this usage pattern then this patch is for you.

How it works
First we have the cleanup_connection method. Wrap any code with this method and it will ensure that your connection is checked back into the pool when the block completes:

        def cleanup_connection
          # nested calls are no-ops; only the outermost block releases
          return yield if Thread.current[:__AR__cleanup_connection]
          begin
            Thread.current[:__AR__cleanup_connection] = true
            yield
          ensure
            Thread.current[:__AR__cleanup_connection] = false
            ActiveRecord::Base.connection_pool.release_connection
          end
        end

The thread locals are used to make sure the connection won’t be released on nested calls. I will demonstrate the need for this protection in a second. But for now, let’s fix our previous example:

10.times do do
    User.cleanup_connection { User.find(1) }
    sleep 100
  end
end

Each thread will release its connection before the sleep and all the threads should complete their find statements very quickly. But who wants to call cleanup_connection all over their code? Luckily, you don’t have to. My patch wraps all the necessary ActiveRecord internals with it instead of you having to wrap your code.

The following example illustrates the need for nesting protection. Keep in mind my patch has wrapped both transaction and create with cleanup_connection.

User.transaction do
  User.create(:name => '1')
  User.create(:name => '2')
end

Without nesting protection the first create call would have released our connection after it completed. Which means another thread could have checked it out in between or we could have gotten a different connection for the second call. Either way, our transaction would have been messed up. In short, nesting protection allows you to wrap the smallest amount of code necessary rather than requiring you wrap your code at a very high level. This allows you to keep the connections checked out for as short a time as possible.

There is of course a performance hit for all the added method calls and the cost of constantly checking out/in connections to the pool, but it’s not much. On a query which takes 0.15 seconds to run here are the numbers:

1000x queries
without patch:
with patch:
with patch where all 1000 calls are nested under a single cleanup_connection to prevent checkin/checkout for every call:

So the penalty is about 3%. The actual overhead penalty is about 30% (for a query of 0.001 seconds), so the faster your queries, the closer you move toward that figure.

How to use it
Get it here. It’s a monkey patch so require it anywhere after ActiveRecord has been loaded.

I’ve done my best to try to find all the ActiveRecord methods that need to be wrapped but it’s possible that I have missed some. Because of this I monkey patch the connection method to raise an alert whenever it is called from outside of a cleanup_connection block. If you see one of these, you can look through the stack trace, determine which method needs to be wrapped and add it to the methods_to_wrap hash. After you are confident all necessary methods are patched you can remove the connection monkeypatch to speed things up a bit.

Rails core
I’ve submitted a patch to Rails for just the cleanup_connection portion. I think it would be a nice addition to the already existing but not very useful with_connection method. Feel free to comment or +/-1 the patch.

January 11, 2009

Ruby and mysqlplus select() deadlock

Filed under: bug, c, concurrency, patch, ruby — coderrr @ 7:07 pm

After setting up mysqlplus in my Rails project I ran into an interpreter wide deadlock in certain situations. I isolated it and tracked it down to two things…

… Before I continue the simple solution to this is to use the C implementation of async_query instead of the Ruby version.

class Mysql
  alias_method :query, :c_async_query
  # instead of
  # alias_method :query, :async_query
end

If you want more details read on…

1) Mysqlplus’ default async_query which is implemented in ruby:

  def async_query(sql, timeout = nil)
    select [ (@sockets ||= {})[socket] ||= ], nil, nil, timeout
  end

The send_query, get_result, and socket methods are C functions.
2) ActiveRecord::Base.clear_reloadable_connections! which is called after every Rails request in development mode:

# actually implemented in connection_pool.rb
      def clear_reloadable_connections!
        @reserved_connections.each do |name, conn|
          checkin conn
        end
        @reserved_connections = {}
        @connections.each do |conn|
          conn.disconnect! if conn.requires_reloading?
        end
        @connections = []
      end

The actual code we care about here is conn.disconnect! which will call mysqlplus’ disconnect method which is implemented as a C function. If we have Rails skip the call to disconnect!, no deadlock. Or if we use the C implementation of async_query, which is named c_async_query, no deadlock.

The issue has something to do with calling Ruby’s IO#select on a file descriptor which you are manipulating with native functions. While looking into it I found a separate but related issue. The bug and fix I show below does not actually resolve the deadlock I was running into, but solves a similar one. The mysqlplus deadlock actually does not occur during the async_query‘s IO#select but at some later point which I couldn’t exactly determine.

The condition can be reproduced by calling the native C function close() on a file descriptor which Ruby is currently IO#selecting.

require 'socket'
require 'rubygems'
require 'inline'

module C
  class << self
    inline do |builder|
      builder.c %q{
        static VALUE native_close(int s) {
          close(s);
          return Qnil;
        }
      }
    end
  end
end

Thread.new { loop { sleep 1; p 1 } }

Thread.new do
  loop do
    io = TCPSocket.new('', 80)
    fd = io.to_i
    Thread.new { sleep 0.5; C.native_close(fd) }
    # Thread.new { sleep 0.5; io.close }
    p :selecting!
    rdy = select [io]
    p :selected!
  end
end

sleep 99999

This example will produce a deadlock on select after the second thread calls native_close. If you swap the close lines so that you are closing the socket with Ruby’s close method instead of the native one, you won’t get a deadlock. After lots of debugging I narrowed it down to what seems to be a bug in Ruby’s rb_thread_schedule function:

        n = select(max+1, &readfds, &writefds, &exceptfds, delay_ptr);
        if (n < 0) {
            // select is returning -1 indicating an error
            int e = errno;

The deadlock is actually Ruby calling rb_thread_schedule over and over for the thread which is selecting, instead of deferring to let other threads run. I stuffed in a call to perror() and saw that the error is caused by a bad file descriptor. But for some reason Ruby doesn’t handle that error correctly by removing that fd from the fd_set. So I fixed it by going through each fd in the sets to determine whether any are bad and, if so, removing them:

Update: Simplified remove_bad_fds function thanks to costan recommending fcntl() over select()

        n = select(max+1, &readfds, &writefds, &exceptfds, delay_ptr);
        if (n < 0) {
            int e = errno;
// ...
            if (e == EBADF)
              remove_bad_fds(&th->readfds, &th->writefds, &th->exceptfds, max);

// ...

#include <fcntl.h>
static void
remove_bad_fds(fd_set *r, fd_set *w, fd_set *e, int max) {
  int fd;

  for (fd = 0; fd <= max; fd++)
    if (FD_ISSET(fd, r) || FD_ISSET(fd, w) || FD_ISSET(fd, e))
      if (fcntl(fd, F_GETFD) < 0 && errno == EBADF) {
        FD_CLR(fd, r);
        FD_CLR(fd, w);
        FD_CLR(fd, e);
      }
}

remove_bad_fds calls fcntl on each fd in the sets to determine whether it is bad. If so, it is removed from all three sets.
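The same EBADF behavior the C fix relies on is easy to observe from Ruby. A minimal sketch (not the patch itself): an fcntl on a descriptor that has been closed out from under you fails with Errno::EBADF, which is exactly the condition remove_bad_fds checks for:

```ruby
require 'fcntl'

r, w = IO.pipe
fd = r.fileno
r.close                    # the underlying descriptor is now invalid

begin
  # a raw fcntl on the stale fd fails with EBADF, just like the C-level check
  IO.for_fd(fd).fcntl(Fcntl::F_GETFD)
rescue Errno::EBADF
  puts "EBADF detected"
end
```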

bug filed

January 7, 2009

IO#readpartial now works in JRuby

Filed under: bug, java, patch, ruby — Tags: , , , — coderrr @ 9:33 pm

I was doing some concurrency benchmarks on various Rubys and I ran into a pretty nasty bug with IO#readpartial in JRuby. On EOF it would just return an empty string forever, instead of raising an EOFError as it should. I’m pretty surprised no one had run into this yet.
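The expected behavior is easy to check with a StringIO (a sketch; the original benchmarks used real sockets, but readpartial’s EOF contract is the same):

```ruby
require 'stringio'

io = StringIO.new("abc")
p io.readpartial(10)        # returns the available bytes: "abc"
begin
  io.readpartial(10)        # at EOF this must raise, not return ""
rescue EOFError
  puts "EOFError raised as expected"
end
```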

Since I thought this was pretty important I submitted a patch, they took notice, did some refactoring of their IO methods, and now it passes all of the readpartial rubyspecs. So you can expect a fully functioning IO#readpartial in JRuby 1.1.7. Good job JRuby guys!

December 23, 2008

Singleton recursion bug

Filed under: bug, patch, ruby — Tags: , , — coderrr @ 8:56 pm

What would you expect the following code to do?

require 'singleton'
class X
  include Singleton
  def initialize
    X.instance
  end
end
X.instance

Stack overflow, right? Wrong! In Ruby 1.8 it’ll hang forever because of the way thread-safety is handled with a while loop:

    def _instantiate?()
      while false.equal?(@__instance__)
        Thread.critical = false
        sleep(0.08)   # timeout
        Thread.critical = true
      end
      @__instance__
    end

Basically the first call to X.instance sets @__instance__ to false while it calls the initialize method to make sure other threads don’t call it at the same time. When initialize calls X.instance again it enters that while loop and sits there forever. This can make debugging a little tricky in a complex app where you might accidentally put a recursive call to instance 5 methods down the line and wonder why the hell your app isn’t doing anything. So watch out for this if you use Singleton.

If you wanna be really safe you can use my patched version which checks for recursive calls and raises an error. Or you can just throw this monkey patch somewhere in your app which prints a warning after it’s hung for 5 seconds:

require 'singleton'
require 'timeout'

class << Singleton
  module SingletonClassMethods
    def _instantiate?
      start_time = Time.now
      while false.equal?(@__instance__)
        Thread.critical = false
        sleep(0.08) # timeout
        Thread.critical = true
        if Time.now - start_time > 5
          @__once__ ||= ! $stderr.puts("Possible recursive call to instance in initialize", caller)
        end
      end
      @__instance__
    end
  end
end

In Ruby 1.9 Singleton is rewritten using a Mutex so you’ll get a nice deadlock message instead of the hang. In JRuby they use a Mutex too, but you won’t get any message about the deadlock. So effectively it will act the same as 1.8. My monkey patch won’t work for JRuby but the full patched version which I linked to on github will.
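The 1.9 failure mode is easy to demonstrate with a plain Mutex (a minimal sketch, not the Singleton code itself): re-locking a Mutex from the thread that already holds it raises ThreadError instead of spinning forever:

```ruby
m = Mutex.new
m.lock
begin
  m.lock   # recursive lock from the same thread
rescue ThreadError
  puts "ThreadError raised"
end
```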

December 4, 2008

Contributing to open source can be hard and frustrating

Filed under: bug, c, patch, ruby — Tags: , , , — coderrr @ 7:47 pm

Has anyone ever told you that you should try contributing to some open source projects? Maybe they said it as advice for improving your resume or maybe they offered it as a solution to boredom. Either way, contributing to open source isn’t always as easy as it sounds.

Disclaimer: I’m in no way bitching or complaining about the following projects/people. These are just some past experiences I wanted to share.

Here’s a list of a few open source projects I’ve written (useful) patches for which haven’t been merged into the original project.

First up: Hpricot
As documented by this post, over a year ago I patched the Hpricot C and Java extensions to support arbitrarily large HTML elements. Instead of throwing an error after a certain buffer size had been hit, it would increase the memory allocation size. I did so because I was writing software for someone which did some website crawling and could potentially (and actually did) run into very large elements on the web (which crashed Hpricot or threw an error). So it had a real-world use case; judging by the responses I’ve gotten since I wrote it, quite a few other people have had uses for it, and it just made sense. (Oh, it also didn’t affect performance.)

But to this day, over a year later, I haven’t been able to get _why to merge it into hpricot core. I created a patch, made sure all the tests passed, sent an email to the mailing list, got a ticket filed on their trac. I’ve even rewritten the patch once to make it work with his latest code on github. I even forked his repo, did a pull request, and messaged him. He even responded saying he’d get to it soon. Over a month later, still nothing. Now I understand the reality is that he’s just really busy, but my reality is that it really sucks. Also, PLEASE, don’t take this as bitching, I’m just telling a story, I have total love for _why.
Update: _why just merged my patch, yes!! :P

Number two:
I was using the seattle.rb project ImageScience to thumbnail images for one of my client’s apps. It crashed on BMPs, and since BMPs were one of my client’s requirements, I made a (tiny) patch to their C extension to get it working. Again, all the tests passed, I created a ticket, harassed some people on IRC, etc. Over a year later, nothing. I think someone even closed the bug without giving a reason. Again, judging from the number of downloads of the patched gem, people found real use in it (I’m trying to save myself from everyone thinking I write a bunch of useless patches).

Next up, Rails:
This was when Rails was just beginning their mission of “let’s be thread safe”. I really enjoy playing in the spaghetti of threaded programming so I decided to see if I could help and find some issues to patch. I found a race condition in their processing of partials and wrote a patch to fix it, along with a brute force test to produce the race condition. I posted it to the mailing list and a discussion ensued. The gist of it was that “we don’t like mutexes”, and my patch was never accepted, and never got any +1s. Well guess what they ended up doing eventually before rails was considered thread-safe… they added a mutex -_-. Needless to say, this turned me off to contributing to Rails. (In the latest version they actually did make some changes which render the mutex unnecessary but that is irrelevant for this story). Again please note, I’m not trying to bitch, I have nothing but love for the Rails guys, this is just my story.

For one of my contracts I reworked the FireWatir (automated UI testing) code to be thread-safe so that you could control multiple browsers at the same time in different threads. Something which is extremely helpful when you are trying to simulate multiple users simultaneously. Loads of people have used my patch to aid in their FireWatir testing. It passed the entire test suite. But even after multiple emails to the maintainer of the project’s repository (who actually used my patch in his projects) for some reason it never got merged.

So what’s the conclusion from all these stories? Even after doing all the “right things”: making sure all the tests pass, ensuring performance isn’t affected, and contacting the appropriate people through the appropriate channels, I’ve had many patches go unloved. It’s kind of disheartening, but I think what I’m learning is that you just have to be relentlessly persistent. Most people are just really busy and require a lot of external motivation if you want them to do things (which in the grand scheme of things are quite unimportant) for you. Of course at the same time you don’t wanna bug the shit out of someone by emailing them all the time, so…

I wonder if anyone has similar experiences or any advice on this sort of thing…
