Update: Patch accepted, bug fixed!
While writing some tests for my tunnel_splitter project I ran into this really weird bug. It’s hard to believe no one has run into this before. I’ll give you the isolated version (which took considerable time to narrow down).
a = 1
Object.send :define_method, :x do
lambda do # lambda is necessary to reproduce bug
p a # => 1
a = 2
p a # => 2
end.call
end
x(nil) # passing at least one arg is necessary to reproduce bug
p a # => 1
This is wrong. Instead of 1, 2, 1, the output should be 1, 2, 2.
Here’s another reproduction, closer to the actual way I encountered the bug. This one is sensitive to the timing of when threads are scheduled. By commenting out the p :blah line you can actually prevent the bug from happening. That makes this bug way more evil and caused me considerable pain.
def callblock; yield; end
a = 1
Object.send :define_method, :x do
p :blah # comment this line out and things will work as expected
begin
callblock do
raise 'blah'
end
rescue
p a
a = 2
p a
end
end
Thread.new{ x(nil) }.join
p a # => 1
Offender #1
eval.c:8626 proc_invoke()
if (_block.frame.argc && DMETHOD_P()) {
NEWOBJ(scope, struct SCOPE);
OBJSETUP(scope, tmp, T_SCOPE);
scope->local_tbl = _block.scope->local_tbl;
scope->local_vars = _block.scope->local_vars;
scope->flags |= SCOPE_CLONE;
_block.scope = scope;
}
If the block being invoked was a block passed to define_method (DMETHOD_P()) and it is being invoked with some number of args other than 0 (_block.frame.argc) then Ruby will “clone” the scope. Effectively, this doesn’t really do much. It creates a new scope struct, but then it points the local vars tables to the exact same place as the original, meaning if you modify a local in this scope, it will be reflected in the original scope. The issue is that it loses the flags of the original scope. We will see why this is important in a second.
Offender #2
eval.c:8214
static void
scope_dup(scope)
struct SCOPE *scope;
{
ID *tbl;
VALUE *vars;
scope->flags |= SCOPE_DONT_RECYCLE;
if (scope->flags & SCOPE_MALLOC) return;
if (scope->local_tbl) {
tbl = scope->local_tbl;
vars = ALLOC_N(VALUE, tbl[0]+1);
*vars++ = scope->local_vars[-1];
MEMCPY(vars, scope->local_vars, VALUE, tbl[0]);
scope->local_vars = vars;
scope->flags |= SCOPE_MALLOC;
}
}
scope_dup will copy the local_vars (local variables’ values) table into a new memory address. Whenever this is done, changes to the variables in this scope will NOT affect the original scope’s variables (unless they are explicitly copied/pointed back). Note that scope_dup will return right away and do nothing if the scope has the SCOPE_MALLOC flag set, which scope_dup sets after duping a scope, effectively making it perform only one dup per scope, even if called multiple times.
Offender #1 + Offender #2 = fail
Each of these pieces of code by themselves will not cause the bug to occur. But put them together, specifically the scope cloning conditional and then the scope_dup function, and voila, your weekend has become an MRI debug session.
The issue is that the proc_invoke() scope cloning conditional will not copy the parent scope’s SCOPE_MALLOC flag. Which means if you call scope_dup on a cloned scope you will end up duping the local variables even if they had already been duped once before. Essentially you will end up duping a scope’s local variables twice, which means the parent scope’s variables will not be affected at all by changes to the current scope’s local variables.
Solution
If we just copy the SCOPE_MALLOC flag when cloning a scope all our problems are solved:
...
scope->local_vars = _block.scope->local_vars;
scope->flags |= SCOPE_CLONE;
scope->flags |= (_block.scope->flags & SCOPE_MALLOC); // add this line
_block.scope = scope;
}
This way if a duped scope is cloned then duped, the local var tables will still be pointing to the same place.
By itself this actually causes ruby to segfault due to multiple free()s on the same memory.
gc.c:1271
if (!(RANY(obj)->as.scope.flags & SCOPE_CLONE) && vars[0] == 0)
RUBY_CRITICAL(free(RANY(obj)->as.scope.local_tbl));
if (RANY(obj)->as.scope.flags & SCOPE_MALLOC))
RUBY_CRITICAL(free(vars)); // double free segfault here!
We can modify the second conditional to make sure the scope doesn’t have the SCOPE_CLONE flag before free()ing.
if (!(RANY(obj)->as.scope.flags & SCOPE_CLONE) && vars[0] == 0)
RUBY_CRITICAL(free(RANY(obj)->as.scope.local_tbl));
if (RANY(obj)->as.scope.flags & SCOPE_MALLOC)) && !(RANY(obj)->as.scope.flags & SCOPE_CLONE))
RUBY_CRITICAL(free(vars));
Conclusion
I’m not sure if this is the best solution. I’m not even sure what the scope cloning conditional is needed for. I can’t think of a reason why you would need to make some modification to the scope of a block passed to define_method called with 1 or more arguments, which you wouldn’t need to make if it were called with 0 arguments. There probably is some reason though, and it’d be nice if someone could point out what it is. Also I’m not sure if it might be better to just copy all the scope flags instead of just the SCOPE_MALLOC one.
Also, I’m definitely not an expert on the internals of MRI so it’s possible this solution actually causes some other bugs.
Here’s the patch
1.8.7 and 1.9
This bug also occurs in 1.8.7 but not in 1.9.
