When writing libraries we often need “global”ish variables. In Ruby, most people use class variables along with accessor methods to read and modify them, for example MyLibrary.my_variable. This is basically just namespacing a global:
class SomeLibrary
  def self.some_setting=(val)
    @@some_setting = val
  end
  def self.some_setting
    @@some_setting ||= :default
  end
end
When writing a thread-safe library sometimes we need variables which are specific to the thread that’s running. Quite often we just end up using Thread.current[:my_variable] and wrapping it with accessors:
class ThreadedLibrary
  def self.some_setting= val
    Thread.current[:some_setting] = val
  end
  def self.some_setting
    Thread.current[:some_setting] ||= :default
  end
end
The problem is that this is the equivalent of using $my_variable, just in a threaded context. Some other library might come along and decide to use Thread.current[:my_variable] too, and now you’re screwed.
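To make the collision concrete, here is a minimal sketch. LibraryA and LibraryB are hypothetical libraries invented for illustration; the only assumption is that both happen to pick the same Thread.current key.

```ruby
# Two hypothetical libraries that each chose the same Thread.current key.
module LibraryA
  def self.mode=(val)
    Thread.current[:mode] = val
  end

  def self.mode
    Thread.current[:mode]
  end
end

module LibraryB
  def self.mode=(val)
    Thread.current[:mode] = val
  end

  def self.mode
    Thread.current[:mode]
  end
end

LibraryA.mode = :strict
LibraryB.mode = :lenient  # silently overwrites LibraryA's value

puts LibraryA.mode.inspect  # LibraryA now sees LibraryB's value
```

Neither library did anything unusual, and nothing raises; the two settings simply share one slot.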
I just noticed that the code for Time.zone in the latest version of Rails does this too, by using Thread.current[:time_zone]. I’m definitely not criticizing anyone though, because I did the exact same thing in my last post, lol. I’m just saying I think the use is somewhat prevalent.
Anyway, one solution is to use a class variable hash, which is keyed by the current thread. This is actually how ActiveRecord handles its per-thread connections. Something like this:
class BetterThreadedLibrary
  @@some_setting = Hash.new {|h,k| h[k] = :default }
  def self.some_setting=(val)
    @@some_setting[Thread.current.object_id] = val
  end
  def self.some_setting
    @@some_setting[Thread.current.object_id]
  end
end
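A quick way to see the per-thread behaviour is to repeat the class and exercise it from two threads. This is only a sketch: entry_count is a hypothetical helper added here for inspection and is not part of the class above.

```ruby
class BetterThreadedLibrary
  @@some_setting = Hash.new { |h, k| h[k] = :default }

  def self.some_setting=(val)
    @@some_setting[Thread.current.object_id] = val
  end

  def self.some_setting
    @@some_setting[Thread.current.object_id]
  end

  # Hypothetical helper (not in the original class): number of stored entries.
  def self.entry_count
    @@some_setting.size
  end
end

BetterThreadedLibrary.some_setting = 5

# Thread#value joins the thread and returns the block's result.
seen_in_thread = Thread.new { BetterThreadedLibrary.some_setting }.value

puts seen_in_thread.inspect                      # the new thread gets the default
puts BetterThreadedLibrary.some_setting.inspect  # the main thread keeps its own value
puts BetterThreadedLibrary.entry_count           # the finished thread's entry is still stored
```

Note that merely reading the setting stores an entry, because the hash's default block assigns the key, and that entry stays in the hash after the thread has finished.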
Since I think cattr_accessor from ActiveSupport is pretty great I thought it would be nice to have a threaded version of it. So here’s thread_local_accessor:
Update: Dan Kubb pointed out the original solution had a memory leak. As threads die and get GCed their key/values in the hash are never cleaned up. I have modified the solution to use a finalizer to take care of this.
class Class
  def thread_local_accessor name, options = {}
    m = Module.new
    m.module_eval do
      class_variable_set :"@@#{name}", Hash.new {|h,k| h[k] = options[:default] }
    end
    m.module_eval %{
      FINALIZER = lambda {|id| @@#{name}.delete id }
      def #{name}
        @@#{name}[Thread.current.object_id]
      end
      def #{name}=(val)
        ObjectSpace.define_finalizer Thread.current, FINALIZER unless @@#{name}.has_key? Thread.current.object_id
        @@#{name}[Thread.current.object_id] = val
      end
    }
    class_eval do
      include m
      extend m
    end
  end
end
Here’s how it works:
require 'test/unit'
class ThreadedLib
  thread_local_accessor :some_setting, :default => :default
end
class TestThreadedClassAttrAccessor < Test::Unit::TestCase
  def test_that_it_works!
    instance = ThreadedLib.new
    ThreadedLib.some_setting = 5
    assert_equal 5, ThreadedLib.some_setting
    assert_equal 5, instance.some_setting
    Thread.new {
      instance.some_setting = 10
      assert_equal 10, ThreadedLib.some_setting
      assert_equal 10, instance.some_setting
    }.join
    Thread.new { assert_equal :default, ThreadedLib.some_setting }.join
    assert_equal 5, ThreadedLib.some_setting
  end
end
All suggestions and criticism welcome!
I was asking about your technique on the #datamapper channel in irc, and someone pointed out that when the Thread exits the Hash won’t be garbage collected. For any long running processes this could be a problem. No one was 100% sure, but we all assume the Thread.current values would be GC’d at exit, although we could be wrong.
What do you think about this, and do you think there is a work-around?
Comment by Dan Kubb — April 11, 2008 @ 4:41 am
I should probably clarify my comment a bit. I meant that when the Thread exits, the Hash entry that was keyed to the current Thread’s object_id will remain in memory as long as the Class is defined. For long running processes, this will usually be the case until it’s restarted. This technique will likely cause a memory leak over time.
Comment by Dan Kubb — April 11, 2008 @ 5:10 am
Doh, you’re totally right. That is a leak!
ActiveRecord actually gets around this by calling a method (verify_active_connections!) on every request which goes through the hash and removes keys of threads that no longer exist.
I’ll update the entry with a simpler solution which uses finalizers.
Comment by coderrr — April 11, 2008 @ 6:44 am
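The sweep approach mentioned in the comment above can be sketched roughly as follows. This is not ActiveRecord's code: SweptThreadedLibrary, sweep_dead_threads! and entry_count are hypothetical names, and the sketch assumes the hash is keyed by the Thread object itself (rather than its object_id) so that liveness can be checked with Thread#alive?. A Mutex guards the hash for interpreters that run threads in parallel.

```ruby
class SweptThreadedLibrary
  @settings = {}     # keyed by Thread object so liveness can be checked
  @lock = Mutex.new  # guards the hash against concurrent access

  class << self
    def some_setting=(val)
      @lock.synchronize { @settings[Thread.current] = val }
    end

    def some_setting
      @lock.synchronize { @settings.fetch(Thread.current, :default) }
    end

    # Drop entries whose thread has finished; meant to be called periodically.
    def sweep_dead_threads!
      @lock.synchronize { @settings.delete_if { |thread, _| !thread.alive? } }
    end

    # Hypothetical helper for inspection only.
    def entry_count
      @lock.synchronize { @settings.size }
    end
  end
end

SweptThreadedLibrary.some_setting = 5
Thread.new { SweptThreadedLibrary.some_setting = 10 }.join

before = SweptThreadedLibrary.entry_count
SweptThreadedLibrary.sweep_dead_threads!
after = SweptThreadedLibrary.entry_count

puts "entries before sweep: #{before}, after sweep: #{after}"
```

The trade-off is that the hash holds a reference to each Thread object until the next sweep, so dead threads are only reclaimed as often as the sweep runs.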
You still have a memory leak here. Finalizers in ruby are not guaranteed to run and cannot be counted on. Also, the way you use lambda in the finalizer is tricky: lambda captures the surrounding binding and will keep your finalizer from being able to run, since it becomes a circular reference.
Multi threading is hard, let’s go shopping ;)
Comment by Ezra — April 13, 2008 @ 2:01 am
Hey Ezra,
Thanks for the comment.
I’m not sure what you’re referring to when you say the finalizer won’t be run because of a circular reference. The finalizer does in fact run. You can test it by shoving p :blah into the lambda and then creating enough objects to invoke GC.
Also, about finalizers not being guaranteed to run: I’m interested in seeing some info about this. I googled and couldn’t find anyone pointing out cases in which finalizers would never be run. I know if you were to terminate a process it’s likely some finalizers wouldn’t be run, but in our case that’s not an issue as we are only using them to free memory, versus cleaning up a resource.
Are you saying that ruby randomly decides sometimes it’s never going to call a finalizer for some GCed object? Or just that we won’t know _when_ a finalizer will be called?
Also, I do realize a better way of doing this is to just use a single lambda as the finalizer for all the threads instead of creating a new one every time. But I don’t believe the original way has a memory leak as I have confirmed the finalizer lambdas are both called and GCed.
This is of course unless there is some special edge case in which Ruby will never call a finalizer.
Comment by coderrr — April 14, 2008 @ 10:36 am
I have experience with threading in general but none with Ruby. My question: is “@@#{name}[Thread.current.object_id] = val” an atomic operation? I would imagine, judging by other hashing libraries, that it is non-atomic, so that if two threads called “#{name}=” at the same time and the first was undergoing a resize, the result would be Undefined Behavior. (Nerdy drinking game: a shot for every “undefined behavior” in the pthreads docs.)
Thread-local storage is also theoretically very different from a simple hash. (As a practical example, on some systems thread-local values can be stored next to the CPU which uses them while a global hash must be synchronized across CPUs.) Also, more idealistically, thread-specific data should be completely invisible to other threads, a trait your proposed solution circumvents.
I’m more tempted to just store data in Thread.current using my class’s name as a key. This strategy is officially supported by Ruby, it works, it avoids locking problems, it’s (potentially) faster, and it’s theoretically correct.
Comment by Adam Hooper — April 16, 2009 @ 8:09 pm
Hey Adam
Thanks for the comment. The answer to your question depends on the ruby interpreter we’re talking about.
In MRI (1.8) no C function will be interrupted by a thread switch unless it specifically makes a call to the ruby thread scheduling code. And since most of ruby’s core data structs are implemented in C they are thread-safe-ish. Meaning you can add unique keys to a hash w/o synchronizing it and they’ll all make it in safely.
I’m not 100% sure on how the threading model of 1.9 works but I do believe the same holds for at least the core data structs. I have tested on a Hash in 1.9 and it appears to be safe.
JRuby, on the other hand, runs threads truly concurrently and in parallel where possible, and Hash won’t be thread safe in it. So my solution would need to use some synchronized version of Hash.
In terms of your final point, I agree prefixing thread local keys is the most pragmatic solution. The main point of the article, though, was just to point out that, implementation details aside, thread-local variables are not much different from globals. And if ActiveRecord or any other major lib went around setting $time_zone or even $AR_time_zone globals I’m pretty sure tons of people would have a fit.
Comment by coderrr — April 19, 2009 @ 4:47 pm
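For comparison, the prefixed-key approach discussed in the two comments above might look like the sketch below. NamespacedLibrary and the key name are hypothetical; the idea is only that a library-specific prefix makes a clash with another library's key unlikely, not impossible.

```ruby
class NamespacedLibrary
  # Hypothetical key: the library name is used as a prefix.
  KEY = :namespaced_library_some_setting

  def self.some_setting=(val)
    Thread.current[KEY] = val
  end

  def self.some_setting
    Thread.current[KEY] ||= :default
  end
end

NamespacedLibrary.some_setting = 5
Thread.current[:some_setting] = :someone_else  # an unrelated library's key

in_new_thread = Thread.new { NamespacedLibrary.some_setting }.value

puts NamespacedLibrary.some_setting.inspect  # unaffected by the other key
puts in_new_thread.inspect                   # each thread starts from the default
```

Because the value lives on the Thread object itself, it goes away with the thread, so no finalizer or sweep is needed.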
[...] Second approach: a class variable, but be careful, you have to be sure it is valid only for the thread; wanting to use the Thread.current hash, I still took advice from coderr to avoid abusing Thread.current [...]
Pingback by La magie de ruby … | XykoX — May 26, 2011 @ 8:22 pm
[...] works but setting a global on the current thread felt like a big hack. (Here's a good article on Thread.current ) Lucky for me Jose was willing to spend a little time working with me and the resulting code works [...]
Pingback by Customizing Views for a Multi-Tenant Application Using Ruby on Rails Custom Resolvers | Nobody Listens Anyway — September 27, 2011 @ 3:32 pm