coderrr

April 10, 2008

Let’s stop polluting the Thread.current hash

Filed under: ruby — Tags: — coderrr @ 10:55 pm

When writing libraries we often need “global”-ish variables. In Ruby, most people use class variables along with accessor methods to read and modify them, for example MyLibrary.my_variable. This is basically just namespacing a global:

class SomeLibrary
  def self.some_setting=(val)
    @@some_setting = val
  end

  def self.some_setting
    @@some_setting ||= :default
  end
end

When writing a thread-safe library, we sometimes need variables that are specific to the thread that’s running. Quite often we just end up using Thread.current[:my_variable] and wrapping it with accessors:

class ThreadedLibrary
  def self.some_setting=(val)
    Thread.current[:some_setting] = val
  end

  def self.some_setting
    Thread.current[:some_setting] ||= :default
  end
end

The problem is that this is the equivalent of using $my_variable, just in a threaded context. Some other library might come along and decide to use Thread.current[:my_variable] too, and now you’re screwed: each library silently overwrites the other’s value.
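To make the collision concrete, here is a minimal sketch. LibraryA, LibraryB and the :timeout key are hypothetical names made up for illustration; the point is only that two unrelated libraries picking the same key end up sharing one slot per thread:

```ruby
# Two hypothetical libraries that each picked the same "obvious" key.
class LibraryA
  def self.timeout=(val)
    Thread.current[:timeout] = val
  end

  def self.timeout
    Thread.current[:timeout] ||= 30
  end
end

class LibraryB
  def self.timeout=(val)
    Thread.current[:timeout] = val
  end

  def self.timeout
    Thread.current[:timeout] ||= 5
  end
end

LibraryA.timeout = 120
LibraryB.timeout = 1   # writes to the very same slot as LibraryA

LibraryA.timeout       # => 1, not 120
```

Neither library raises or warns, which is what makes this kind of bug hard to track down.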

I just noticed that the code for Time.zone in the latest version of Rails does this too, by using Thread.current[:time_zone]. I’m definitely not criticizing anyone though, because I did the exact same thing in my last post, lol. I’m just saying I think the use is somewhat prevalent.

Anyway, one solution is to use a class variable hash, which is keyed by the current thread. This is actually how ActiveRecord handles its per-thread connections. Something like this:

class BetterThreadedLibrary
  @@some_setting = Hash.new {|h,k| h[k] = :default }
  def self.some_setting=(val)
    @@some_setting[Thread.current.object_id] = val
  end
  def self.some_setting
    @@some_setting[Thread.current.object_id]
  end
end
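One caveat with this naive version is that entries are never removed when their threads finish. The following is a rough sketch of that effect; LeakyThreadedLibrary is the same class as above under a different name, and entry_count is a hypothetical helper added only so the example can look inside the hash:

```ruby
class LeakyThreadedLibrary
  @@some_setting = Hash.new { |h, k| h[k] = :default }

  def self.some_setting=(val)
    @@some_setting[Thread.current.object_id] = val
  end

  def self.some_setting
    @@some_setting[Thread.current.object_id]
  end

  # Hypothetical helper, not part of the technique: exposes the hash size.
  def self.entry_count
    @@some_setting.size
  end
end

# Run 100 short-lived threads one after another. Thread#join returns the
# thread, so the array keeps them referenced until we drop it below.
threads = Array.new(100) do |i|
  Thread.new { LeakyThreadedLibrary.some_setting = i }.join
end

threads = nil
GC.start

LeakyThreadedLibrary.entry_count  # => 100, one entry per finished thread
```

In a long-running process that keeps spawning threads, the hash just keeps growing.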

Since I think cattr_accessor from ActiveSupport is pretty great I thought it would be nice to have a threaded version of it. So here’s thread_local_accessor:

Update: Dan Kubb pointed out the original solution had a memory leak. As threads die and get GCed their key/values in the hash are never cleaned up. I have modified the solution to use a finalizer to take care of this.

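class Class
  # Defines a thread-local accessor pair on the class and on its instances.
  def thread_local_accessor name, options = {}
    m = Module.new
    m.module_eval do
      # Backing hash keyed by thread object_id. The default is returned without
      # being stored, so a thread that only reads never leaves an entry behind.
      class_variable_set :"@@#{name}", Hash.new {|h,k| options[:default] }
    end
    m.module_eval %{
      # Removes a thread's entry once the Thread object has been garbage collected.
      FINALIZER = lambda {|id| @@#{name}.delete id }

      def #{name}
        @@#{name}[Thread.current.object_id]
      end

      def #{name}=(val)
        # Register the finalizer only the first time this thread stores a value.
        ObjectSpace.define_finalizer Thread.current, FINALIZER  unless @@#{name}.has_key? Thread.current.object_id
        @@#{name}[Thread.current.object_id] = val
      end
    }

    # Make the accessors available as both instance and class methods.
    class_eval do
      include m
      extend m
    end
  end
end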

Here’s how it works:

require 'test/unit'

class ThreadedLib
  thread_local_accessor :some_setting, :default => :default
end

class TestThreadedClassAttrAccessor < Test::Unit::TestCase
  def test_that_it_works!
    instance = ThreadedLib.new

    ThreadedLib.some_setting = 5

    assert_equal 5, ThreadedLib.some_setting
    assert_equal 5, instance.some_setting

    Thread.new {
      instance.some_setting = 10

      assert_equal 10, ThreadedLib.some_setting
      assert_equal 10, instance.some_setting
    }.join

    Thread.new { assert_equal :default, ThreadedLib.some_setting }.join

    assert_equal 5, ThreadedLib.some_setting
  end
end
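For comparison, a lighter-weight alternative raised in the comments below is to keep using Thread.current but namespace the key with the library's name. This is only a sketch of that idea: NamespacedThreadedLibrary and the KEY constant are hypothetical names, and the approach reduces the chance of a collision rather than ruling it out:

```ruby
class NamespacedThreadedLibrary
  # Hypothetical key; any name unlikely to be chosen by another library works.
  KEY = :namespaced_threaded_library_some_setting

  def self.some_setting=(val)
    Thread.current[KEY] = val
  end

  def self.some_setting
    Thread.current[KEY] ||= :default
  end
end

NamespacedThreadedLibrary.some_setting = 5

NamespacedThreadedLibrary.some_setting                       # => 5
Thread.new { NamespacedThreadedLibrary.some_setting }.value  # => :default
```

The storage goes away with the thread itself, so there is nothing to clean up, at the cost of still living in what is effectively a shared global namespace.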

All suggestions and criticism welcome!

Comments

  1. I was asking about your technique on the #datamapper channel in irc, and someone pointed out that when the Thread exits the Hash won’t be garbage collected. For any long running processes this could be a problem. No one was 100% sure, but we all assume the Thread.current values would be GC’d at exit, although we could be wrong.

    What do you think about this, and do you think there is a work-around?

    Comment by Dan Kubb — April 11, 2008 @ 4:41 am

  2. I should probably clarify my comment a bit. I meant that when the Thread exits, the Hash entry that was keyed to the current Thread’s object_id will remain in memory as long as the Class is defined. For long running processes, this will usually be the case until it’s restarted. This technique will likely cause a memory leak over time.

    Comment by Dan Kubb — April 11, 2008 @ 5:10 am

  3. Doh, you’re totally right. That is a leak!

    ActiveRecord actually gets around this by calling a method (verify_active_connections!) on every request which goes through the hash and removes keys of threads that no longer exist.

    I’ll update the entry with a simpler solution which uses finalizers.

    Comment by coderrr — April 11, 2008 @ 6:44 am

  4. You still have a memory leak here. Finalizers in Ruby are not guaranteed to run and cannot be counted on. Also, the way you use lambda in the finalizer is tricky: lambda captures the surrounding binding and will keep your finalizer from being able to run, since it becomes a circular reference.

    Multi threading is hard, let’s go shopping ;)

    Comment by Ezra — April 13, 2008 @ 2:01 am

  5. Hey Ezra,

    Thanks for the comment.

    I’m not sure what you’re referring to when you say the finalizer won’t be run because of a circular reference. The finalizer does in fact run. You can test it by shoving p :blah into the lambda and then creating enough objects to invoke GC.

    Also, about finalizers not being guaranteed to run: I’m interested in seeing some info about this. I googled and couldn’t find anyone pointing out cases in which finalizers would never be run. I know that if you were to terminate a process it’s likely some finalizers wouldn’t be run, but in our case that’s not an issue, as we are only using them to free memory rather than to clean up a resource.

    Are you saying that ruby randomly decides sometimes it’s never going to call a finalizer for some GCed object? Or just that we won’t know _when_ a finalizer will be called?

    Also, I do realize a better way of doing this is to just use a single lambda as the finalizer for all the threads instead of creating a new one every time. But I don’t believe the original way has a memory leak as I have confirmed the finalizer lambdas are both called and GCed.

    This is of course unless there is some special edge case in which Ruby will never call a finalizer.

    Comment by coderrr — April 14, 2008 @ 10:36 am

  7. I have experience with threading in general but none with Ruby. My question: is “@@#{name}[Thread.current.object_id] = val” an atomic operation? I would imagine, judging by other hashing libraries, that it is non-atomic, so that if two threads called “#{name}=” at the same time and the first was undergoing a resize, the result would be Undefined Behavior. (Nerdy drinking game: a shot for every “undefined behavior” in the pthreads docs.)

    Thread-local storage is also theoretically very different from a simple hash. (As a practical example, on some systems thread-local values can be stored next to the CPU which uses them while a global hash must be synchronized across CPUs.) Also, more idealistically, thread-specific data should be completely invisible to other threads, a trait your proposed solution circumvents.

    I’m more tempted to just store data in Thread.current using my class’s name as a key. This strategy is officially supported by Ruby, it works, it avoids locking problems, it’s (potentially) faster, and it’s theoretically correct.

    Comment by Adam Hooper — April 16, 2009 @ 8:09 pm

  8. Hey Adam

    Thanks for the comment. The answer to your question depends on the ruby interpreter we’re talking about.

    In MRI (1.8) no C function will be interrupted by a thread switch unless it specifically makes a call to the ruby thread scheduling code. And since most of ruby’s core data structs are implemented in C they are thread-safe-ish. Meaning you can add unique keys to a hash w/o synchronizing it and they’ll all make it in safely.

    I’m not 100% sure on how the threading model of 1.9 works but I do believe the same holds for at least the core data structs. I have tested on a Hash in 1.9 and it appears to be safe.

    JRuby, on the other hand, runs things truly concurrently, and in parallel if possible, so Hash won’t be thread-safe in it. So my solution would need to use some synchronized version of Hash.

    In terms of your final point, I agree prefixing thread local keys is the most pragmatic solution. Although the main point of the article was just to point out that other than the implementation details of them, they are not much different than globals. And if ActiveRecord or any other major lib went around setting $time_zone or even $AR_time_zone globals I’m pretty sure tons of people would have a fit.

    Comment by coderrr — April 19, 2009 @ 4:47 pm

  9. [...] Second avenue: a class variable, but be careful, you have to be sure it is valid only for the thread. Wanting to use the Thread.current hash, I still took some advice from coderr to avoid abusing Thread.current [...]

    Pingback by La magie de ruby … | XykoX — May 26, 2011 @ 8:22 pm

  10. [...] works but setting a global on the current thread felt like a big hack. (Here's a good article on Thread.current ) Lucky for me Jose was willing to spend a little time working with me and the resulting code works [...]

    Pingback by Customizing Views for a Multi-Tenant Application Using Ruby on Rails Custom Resolvers | Nobody Listens Anyway — September 27, 2011 @ 3:32 pm


