coderrr

September 20, 2010

socks_switch: switching between direct connection and SOCKS proxying based on ping time to host

Filed under: network, ruby — coderrr @ 7:32 pm

This little hack, named socks_switch, lets you connect directly to sites that are geographically close to you (and therefore have a low ping time) and proxy the rest through a SOCKS server. It’s built on top of this 7-line SOCKS 4 server hack, which is built on top of ProxyMachine, which is built on top of EventMachine.

It works by pinging each host the first time you connect to it. To avoid delaying connections, the first connection to each host is always proxied. Once the ping time is known, all subsequent connections are either direct or proxied depending on the ping threshold you set. Ping times expire after 10 minutes (or whatever you change the ping interval to) and hosts are re-pinged on the next connection. You can also specify a list of IPs that will always connect directly and therefore never be pinged or proxied.

This was created for an environment outside of the US, where SOCKS proxying through a US server typically increases the reliability and bandwidth of connections to US or nearby hosts. But when you are connecting to a geographically nearby server, proxying through the US is slower and less reliable than a direct connection. socks_switch aims to get the best of both worlds.

Here’s the code (or on github).

# socks_switch.rb
# to listen on 127.0.0.1:1080:
#   proxymachine -c socks_switch.rb -h 127.0.0.1 -p 1080 

require 'ipaddr'

# address to your main SOCKS server
SOCKS_SERVER = '1.2.3.4:1080'
# list of objects which will match IPs you never want to proxy
FORCE_DIRECT = ['127.0.0.1',
                IPAddr.new('192.168.0.0/16'),
                '074.125.000.000'..'074.125.255.255',
                /^10\./]
# anything pinging below this will connect directly
PING_THRESHOLD = 150
# re-ping hosts after this many seconds
PING_INTERVAL = 10*60

# "pings" a host by timing a TCP connect to the destination port (no ICMP involved)
class Pinger
  PINGS = {}

  class PingConn < EM::Connection
    def initialize(addr)
      @addr = addr
      @start = Time.now
    end

    def connection_completed
      PINGS[@addr] = (Time.now - @start) * 1000
      p [:ping, @addr, PINGS[@addr].to_i]
      EM.add_timer(PING_INTERVAL) { PINGS.delete @addr }
      close_connection
    end
  end

  def fast?(addr, port)
    key = "#{addr}:#{port}"
    return PINGS[key] < PING_THRESHOLD  if PINGS[key]
    PINGS[key] = PING_THRESHOLD  # prevent double pinging
    EM.connect addr, port, PingConn, key
    false
  end
end

pinger = Pinger.new

proxy do |data|
  next  if data.size < 9                   # wait until the full SOCKS4 request header has arrived
  # SOCKS4 request: version, command, dest port, dest IP (4 octets), null-terminated user id
  v, c, port, o1, o2, o3, o4, user = data.unpack("CCnC4a*")
  return { :close => "\0\x5b\0\0\0\0\0\0" }  if v != 4 or c != 1  # 0x5b = request rejected
  next  if ! idx = user.index("\0")        # user id not fully received yet
  addr = [o1,o2,o3,o4]*'.'
  if FORCE_DIRECT.any? {|r| r === addr } or pinger.fast?(addr, port)
    p [:direct, addr]
    { :remote => "#{addr}:#{port}", :reply => "\0\x5a\0\0\0\0\0\0", :data => data[idx+9..-1] }
  else
    p [:proxied, addr]
    { :remote => SOCKS_SERVER }
  end
end
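One detail worth pointing out: the FORCE_DIRECT check handles such different kinds of objects because it uses case equality (r === addr), and String, IPAddr, Range and Regexp each define === in a useful way. A few illustrative checks (the addresses here are made up):

require 'ipaddr'

addr = '192.168.1.5'
p '127.0.0.1' === addr                   # => false, plain string equality
p IPAddr.new('192.168.0.0/16') === addr  # => true, addr falls inside the subnet
p(/^10\./ === addr)                      # => false, regexp matched against the address text
p FORCE_DIRECT.any? {|r| r === addr }    # => true, so this address is never pinged or proxied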

January 10, 2010

Tunnel Splitter: Accelerating a single TCP connection over multiple ISPs

Filed under: network, ruby, tunnel_splitter — coderrr @ 12:15 pm

        LOCAL HOST                                          REMOTE HOST 

                                +--> (ISP1) ---+
                               /                \
(SOCKS client) -> (ts_client)  ---> (ISP2) ------ > (ts_server) -> (SOCKS server)
                               \                /
                                +--> (ISP3) ---+

About

At its essence, Tunnel Splitter is a Ruby and EventMachine based client/server pair which lets you split a single TCP connection into multiple parts and send them over multiple TCP connections. The client does the splitting, the server does the reconstruction. At first glance this doesn’t seem to be anything special, but it actually allows you to do something very cool: aggregate the bandwidth of multiple ISPs for a single TCP connection.

There are already plenty of load balancing routers out there. Although they do load balance connections between multiple ISPs, any single connection will only go out over a single ISP. Not to their discredit, it’s the best you can do with only a router. Tunnel splitter requires a little more: access to a server sitting on a connection twice as fast (up+down) as the sum of all your ISP connections, since every byte you transfer passes through that server twice, once coming in and once going back out. This is easily available nowadays. I currently run my tunnel splitter server on an $8/month VPS.

If it’s not already obvious, tunnel splitter can save you a lot of money if you need a fast connection. Getting a single 50Mbps connection to your house will cost a lot more than getting 6 15Mbps connections (or whatever speed your ISP provides). The increase in price as you need more and more bandwidth from a single ISP connection is almost never linear.

Tunnel splitter also provides transparent redundancy if ISPs go down. It uses a custom protocol on top of TCP to make sure all packets get delivered. If 10 packets have already been sent out through ISP1 before it is determined that it went down, they will be resent out through another link, so your tunneled TCP connection will not drop.
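Tunnel splitter’s actual wire protocol isn’t shown here, but purely to illustrate the resend-on-failure idea, here’s a rough sketch (class and method names are made up; the receiving side, which reassembles chunks in sequence order, is left out):

class ResendingSplitter
  def initialize(links)
    @links     = links  # link objects responding to #send_frame(seq, payload)
    @next_seq  = 0
    @in_flight = {}     # seq => [payload, link it was sent on]
  end

  # every chunk gets a sequence number and stays buffered until it is acked
  def send_chunk(payload)
    seq  = (@next_seq += 1)
    link = @links.sample               # naive per-chunk load balancing
    @in_flight[seq] = [payload, link]
    link.send_frame(seq, payload)
  end

  # the remote side confirmed receipt of this sequence number
  def ack(seq)
    @in_flight.delete(seq)
  end

  # a link died: resend its unacked chunks over the surviving links
  def link_down(dead_link)
    @links.delete(dead_link)
    @in_flight.each do |seq, (payload, link)|
      next  unless link.equal?(dead_link)
      new_link = @links.sample
      @in_flight[seq] = [payload, new_link]
      new_link.send_frame(seq, payload)
    end
  end
end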

By itself tunnel splitter only tunnels one local port to one remote host:port, which isn’t all that useful on its own. But put a SOCKS server at that remote host:port and you can tunnel any application over it which supports SOCKS proxying. Add a little routing magic (which I will discuss later) and you can proxy _anything_, even if it has no SOCKS support.

Usage

Now I’ll show how to download and run tunnel splitter. Either download or clone it from github. To run it you will need to install ruby and the eventmachine gem: gem install eventmachine.

Let’s assume you are connected to two ISPs, one on eth0 and one on eth1, that your server with the fast connection is reachable at tsplitterserver.mayhem.org, and that a SOCKS server is running on that machine listening on localhost port 1080. On the server you would simply run:

ruby ts_server.rb 0.0.0.0:6000 localhost:1080

This tells it to listen for tunnel splitter client connections on port 6000 and forward the tunneled connections to the SOCKS server on port 1080.

Client side we would run this:

ruby ts_client.rb localhost:1080 tsplitterserver.mayhem.org:6000:eth0~10 tsplitterserver.mayhem.org:6000:eth1~10
# note that the localhost:1080 on the client and the server are unrelated and do not actually need to match

This tells the tunnel splitter client to create 10 tunnels to the server over the eth0 interface and 10 tunnels to the server over the eth1 interface, and then wait for incoming connections on port 1080. When the tunnel splitter client receives connections on port 1080 it will split them up and load balance them across all 20 connections. The reason we create 10 tunnels over each ISP here is to make sure we are maxing out each ISP’s bandwidth. Some ISPs impose per-connection caps, so you can’t get your full bandwidth without using multiple connections. If you’re sure that your ISP doesn’t do this then you are fine with a single connection and can leave off the ~10.

Now if you point any SOCKS-capable app at localhost:1080 it will be using the aggregate bandwidth of both your ISPs. We’re done! Well, not quite. If you’re using Linux there is another issue we need to address first.

If you’re using OSX or another BSD (I’m assuming they’re the same, but I’ve only tested on OSX) then you can skip this next section.

Linux Routing

The way tunnel splitter makes connections over a specific ISP is to bind each connection’s source address to that ISP’s address. The problem is that Linux by default does not route packets based on source address. So with our current setup all the packets will go out over the same ISP and the other one will be totally unused. The simplest way to fix this is to install the shorewall firewall. You don’t need to use it as a firewall, but it provides the least involved solution (that I know of) for getting Linux to route based on source addresses. The instructions can be found in README.shorewall.

There _is_ a way to set up the needed routing without shorewall but I haven’t looked into it too much yet and it seems to be a little more complicated.
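For reference, the non-shorewall route goes through iproute2’s source-based routing: one routing table per ISP plus an ip rule that selects the table by source address. Roughly something like this (an untested sketch; the addresses, gateways and table numbers are made up, adjust them to your interfaces):

ip route add default via 192.168.1.1 dev eth0 table 100   # ISP1's gateway
ip route add default via 192.168.2.1 dev eth1 table 101   # ISP2's gateway
ip rule add from 192.168.1.2 table 100                    # 192.168.1.2 = eth0's address
ip rule add from 192.168.2.2 table 101                    # 192.168.2.2 = eth1's address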

BSD Routing

It works out of the box (at least on OSX).

Applications which don’t support SOCKS proxying

For these applications you can use transocks_em. It is a server which receives all connections from specific apps (via some routing magic) and then forwards them to a SOCKS proxy of your choice. I’ll make a post about it soon, so stay tuned.

Other Uses

Tunnel splitter can also be useful in single-ISP situations. As I mentioned above, some ISPs impose per-TCP-connection bandwidth limits. If you are just downloading a file you can often get around these with download acceleration software which makes multiple HTTP requests. That doesn’t work, however, for things like streaming video (YouTube, Hulu, etc). Tunnel splitter solves this problem because it allows a single TCP connection to use your ISP’s maximum total bandwidth.

The fact that connections will never be dropped by tunnel splitter can be useful as well. You can sleep or hibernate your computer, or all your internet connections can go down for any amount of time, and when you restore things your remote connection will still be active. This works because of a few things. First, whatever program you are proxying is actually connected to localhost (to the tunnel splitter client), so your ISPs going down doesn’t affect that connection. Second, the remote connection from the tunnel splitter server to the final destination is of course not affected by your computer’s local status. Third, it is only the tunnel splitter client and server which will detect that the tunnel connections have gone down, and as a side effect of the redundancy protocol they will keep retrying forever to re-establish them. Once they do reconnect, any pending data on either side will be sent across as if nothing ever happened. As far as the client you are proxying and the remote server you are proxying to are concerned, they have both had active TCP connections the entire time.

Of course, if you are proxying a protocol which does its own protocol-level connection health checking or pinging (IRC, for example), it will detect that there are no responses and most likely drop the connection. I have experimented a little with making both the tunnel splitter client and server protocol aware so they can respond to these pings, which would let even a protocol with connection health checks stay up indefinitely. I’m not sure if this is something people will really care much about though.

Reliability

Tunnel splitter is still very beta although I have been using it locally for all my web traffic and movie streaming for close to a year now and it’s been quite a while since I’ve run into any bugs.

Feedback

All comments, criticisms, and suggestions welcome. Also feel free to contact me at coderrr.contact@gmail.com

Check out the README for more info.

December 22, 2009

Get arbitrarily precise BigDecimals in Ruby for just one extra character

Filed under: ruby — coderrr @ 5:31 pm

Here’s a little hack to get arbitrarily precise decimal numbers in your Ruby code with relatively painless syntax.

require 'bigdecimal'

class Integer
  # turns calls like 3.f14159... into BigDecimal("3.14159...")
  def method_missing(m, *a, &b)
    return BigDecimal("#{self}.#$1")  if m.to_s =~ /^f(\d+)$/
    super
  end
end

imprecise_pi = 3.14159265358979323846264338327950288419716939937510582 
precise_pi   = 3.f14159265358979323846264338327950288419716939937510582

puts('%.60f' % imprecise_pi, precise_pi.to_s('F'))

# => 3.141592653589793115997963468544185161590576171875000000000000
# => 3.14159265358979323846264338327950288419716939937510582

Notice the ‘f’ after the decimal point.

November 6, 2009

SOCKS 4 server with ProxyMachine

Filed under: network, ruby — coderrr @ 4:15 am

I proxy almost all my web traffic through a SOCKS proxy. After going through a bunch of different servers, each with its own issues, I decided to just write my own. I ended up using ProxyMachine and it only took seven lines!

# socks4.rb
proxy do |data|
  next  if data.size < 9
  v, c, port, o1, o2, o3, o4, user = data.unpack("CCnC4a*")
  return { :close => "\x0\x5b\x0\x0\x0\x0\x0\x0" }  if v != 4 or c != 1
  next  if ! idx = user.index("\x0")
  { :remote => "#{[o1,o2,o3,o4]*'.'}:#{port}", :reply => "\x0\x5a\x0\x0\x0\x0\x0\x0", :data => data[idx+9..-1] }
end

run it with:

$ proxymachine -h 127.0.0.1 -p 1080 -c socks4.rb

* Note: this only covers half of the SOCKS4 protocol (the CONNECT command; BIND isn’t supported), but it’s the only half that pretty much anyone uses.
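A quick way to sanity check it, assuming your curl build is recent enough to have the --socks4 option:

$ curl --socks4 127.0.0.1:1080 http://example.com/

If the page comes back, the CONNECT path is working.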

July 25, 2009

More Ordered Hashes for Ruby 1.8

Filed under: ruby — coderrr @ 10:59 am

Here’s one more ParseTree-based technique for ordered hashes; it’ll be the last one, I promise. This one doesn’t require any Ruby2Ruby string-building/evaling craziness. The syntax is the same as in the previous ParseTree example:

hash = H{{ :key1 => value1, 'key2' => value2 }}

But with this implementation you can only use literals as keys, meaning strings, symbols, and numbers.

First we call the block with yield to get the original unordered hash. Next we use ParseTree just to find the keys of the hash, in order. Since we only support literals, we can take each key straight out of the s-expression; no conversion is needed. And we don’t need to look at the keys’ values in the s-expression since we already have them in the unordered hash. Finally, we create a new ordered hash and set each key in order, grabbing the value from the original (unordered) hash. This lets us keep the use of ParseTree’s s-expressions simple and avoid dealing with code generation (Ruby2Ruby) at all. It also gives about a 4x speedup over the previous technique.

Here’s the code:

# needs the ParseTree gem; an OrderedHash implementation is assumed to be loaded already
require 'rubygems'
require 'parse_tree'

class K;end
def H(&b)
  K.send :define_method, :x, &b
  sexp = ParseTree.translate(K, :x)
  hash_sexp = sexp[2][2]  # skip method definition stuff

  raise "block doesn't contain hash!"  if hash_sexp.first != :hash
  hash_sexp.slice! 0

  unordered_hash = yield

  oh = OrderedHash.new
  until hash_sexp.empty?
    (type, k), _ = hash_sexp.slice! 0, 2
    raise "bad key type: #{type}"  if ! [:lit, :str].include?(type)

    oh[k] = unordered_hash[k]
  end

  oh
end

This and alternative implementations are on my github.

Ordered Hashes for Ruby 1.8 using ParseTree

Filed under: ruby — coderrr @ 4:28 am

Update: Added another ParseTree based ordered hash technique here

Here’s another, less invasive (depending on how you look at it) way to get ordered hashes in ruby 1.8:

hash = H{{ :key => val, 'key' => val, 55 => val }}
method_needs_ordered_pairs {{ :key1 => :val1, :key2 => :val2 }}
method_needs_args_and_ordered_pairs arg1, arg2, H{{ :key1 => :val1, :key2 => :val2 }}

This technique uses ParseTree and Ruby2Ruby. The reason for the double curly braces ( {{}} ) is that we need a block for ParseTree to parse (ParseTree can’t just parse any argument unless we run it on the entire file). The first set of braces is the block, the second is the actual hash.

After ParseTree breaks the hash down into an s-expression, we use a slightly modified Ruby2Ruby to rebuild the hash representation using an OrderedHash class instead of Hash. Once we have the new representation as a string we eval it in the binding of the block, so that any method calls or local variables inside the hash definition will work as expected.

hash = H{{ :key => val, 'key' => val, 55 => val }}
# turns into
hash = OrderedHash.new.merge!(:key => val).merge!('key' => val).merge!(55 => val)
# note: this could be optimized to something like this
hash = (__tmp1 = OrderedHash.new;__tmp1[:key]=val;__tmp1['key']=val;__tmp1[55]=val;__tmp1)

The benefits of this approach over the previous are that it doesn’t require modifying any core classes and it will work with any type of object as a key. Of course using ParseTree and Ruby2Ruby makes it much slower, about 300x.

Code is on github at ordered_hash_syntax repo: implementation and test with examples of usage.

July 20, 2009

Ordered Hashes for Ruby 1.8

Filed under: ruby — coderrr @ 4:33 pm

For those of you who want a more declarative way to define ordered hashes in ruby 1.8 (perhaps for a DSL or something) here’s something I came up with:

class Object
  def >>(v)
    [self, v]
  end
end

def H(*pairs)
  pairs.inject(OrderedHash.new) {|oh, (k, v)| oh[k] = v; oh }
end

my_hash = H :some_key >> 'some value', 'nother key' >> another_value

Of course this won’t work for Fixnums as keys.
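The reason: Fixnum already defines >> as a right bit shift, so the Object#>> override above never gets a chance to run for integer keys:

5 >> 1          # => 2, Fixnum's own bit shift wins
5 >> 'a value'  # raises TypeError instead of building a [key, value] pair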

June 2, 2009

Fixing constant lookup in DSLs in Ruby 1.9.1

Filed under: ruby, ruby19 — coderrr @ 8:15 pm

Ruby 1.9 puts an extra hurdle in DSL development. Consider this simple DSL example which runs fine in 1.8:

class ProxyObject
  def initialize
    @o = Array.new
  end

  def method_missing(m, *a, &b)
    @o.send m, *a, &b
  end
end

def dsl(&b)
  ProxyObject.new.instance_eval &b
end

module A
  class B; end

  dsl do
    p Module.nesting # => [A]
    push B
    p pop # => A::B
  end
end

This DSL simply proxies to an Array object. In ruby 1.8 everything works; in particular, the constant B resolves to the class A::B. This constant resolution works because we are in the lexical scope of module A, as we can see from the module nesting. (For more info on module nesting see this post.)

Now run the code in ruby 1.9:

module A
  class B; end

  dsl do
    p Module.nesting # => [#<Class:#<ProxyObject:0x97beab8>>]
    push B  # => uninitialized constant ProxyObject::B
    p pop
  end
end

Now things break. Our module nesting no longer contains A because instance_eval has changed the nesting to the metaclass of our ProxyObject instance, which doesn’t help us at all. But wait… my last blog post to the rescue! In ruby 1.9 we can dynamically add modules to our lexical scope. So all we need to do is find the original module nesting of the block, then add those modules to our nesting, and then eval the block as we normally would:

def dsl(&b)
  modules = b.binding.eval "Module.nesting"
  Kernel.with_module(*modules) do
    ProxyObject.new.instance_eval &b
  end
end

module A
  class B; end

  dsl do
    p Module.nesting  # => [#<Class:#<ProxyObject:0x8d6d190>>, #<Class:A>, A, Kernel]
    push B
    p pop  # => A::B
  end
end

Please let me know if you see any issues with this approach or have other ideas.

Credit goes to Lourens for mentioning this problem with 1.9 to me.

May 18, 2009

Dynamically adding a constant nesting in Ruby 1.9

Filed under: ruby, ruby19 — coderrr @ 11:37 pm

Wanting to see if it was possible to somehow dynamically modify Module.nesting, I spent a long time hacking around in 1.8 looking for a way to do it, to no avail. In 1.9, though, it turns out to be somewhat trivial.

module Kernel
  # evaluates blk with each module in consts added to the lexical scope,
  # while preserving blk's original self
  def with_module(*consts, &blk)
    slf = blk.binding.eval('self')

    l = lambda { slf.instance_eval(&blk) }
    consts.reverse.inject(l) {|l, k| lambda { k.class_eval(&l) } }.call
  end
end

Allows you to do:

module X
  module Y
    module Z
    end
  end

  module Y2
    class Z2
    end
  end
end

x, @x = 5, 6
with_module(X) do
  p Y, Y2 # => X::Y, X::Y2
  with_module(Y, Y2) do
    p Z, Z2  # => X::Y::Z, X::Y2::Z2
    p x, @x  # => 5, 6
  end
end

Without losing the scope of your blocks.

Now should you ever do this? Probably not. It’s usually better to actually nest inside the modules you want in your lexical scope. Another technique, sometimes used in testing situations, is to shorten a long constant name with an abbreviation constant like:

XYZ10 = X::Y::Z::Ten
...
XYZ10.new

Please let me know if you think of a use case which one might consider “valid” or useful for with_module.

One interesting aspect of figuring out how to make this method work was discovering some special properties of the *eval() methods.

The following will not work:

class X
  class Y
  end
end
l = lambda { p Y }
X.class_eval { l.call }  #  uninitialized constant Y (NameError)

But this will

l = lambda { p Y }
X.class_eval(&l)  # => X::Y

And so will this

l = lambda { p Y }
s = self
X.class_eval { s.instance_eval(&l) }  # => X::Y

When you hand a block directly to class_eval rather than calling call on it yourself, constant lookup inside the block uses the lexical scope at the call site (plus the receiver) rather than the lexical scope in which the block was created.

module G
  class N; end
  $l = lambda { p N }
end
# using the lexical scope of the lambda when it was created
$l.call  # => G::N
# using the current lexical scope + X
X.class_eval &$l  # uninitialized constant X::N 

April 29, 2009

C function thread safety in Ruby 1.8

Filed under: c, concurrency, ruby — coderrr @ 4:47 pm

In ruby 1.8 (I’m speaking only of MRI here) you don’t have to worry about threads being context switched in the middle of (most) C functions. 1.8 switches threads in really only one way: a call to rb_thread_schedule(). This function is called from various other ruby internal functions and macros (CHECK_INTS), but as long as you aren’t hitting any of those in the C function in question, you won’t be switched.

You may know about the infamous SIGVTALRM that ruby sends to itself to schedule threads every 10ms. But all that signal handler does is set a global flag saying that it’s time to switch. CHECK_INTS uses the flag to determine whether or not to call rb_thread_schedule(). So that signal by itself does not actually switch threads.

signal(SIGVTALRM, catch_timer);
...
void catch_timer(int sig)
{
    if (!rb_thread_critical) {
        rb_thread_pending = 1;
    }
}
...
#define CHECK_INTS ...  if (rb_thread_pending) rb_thread_schedule(); ...

Now you DO have to worry about your C function being context switched if it calls back into methods on ruby objects: rb_call() calls CHECK_INTS once every 256 times. Another way to get switched is by using ruby’s IO functions or rb_thread_select(). But if your C function is primarily self-contained then you can rest assured that it will block the whole interpreter until it returns :)

This is one of, if not the only, reason that ruby’s primitive data structure operations (Array, Hash, String, etc) ARE thread-safe: they are implemented in C. If you were to reimplement all of Array’s, Hash’s, and String’s methods in ruby by directly translating the C code into ruby, you’d probably end up with non-thread-safe methods.
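A quick way to see the difference in practice (under MRI 1.8’s green threads; the second result is timing dependent and may vary between runs):

# Array#<< is a single C function call, so concurrent pushes can't interleave:
arr = []
(1..10).map { Thread.new { 100_000.times { arr << 1 } } }.each {|t| t.join }
p arr.size   # => 1000000, no lost elements

# A read-modify-write written in ruby can be switched out between the read and
# the write, so increments can get lost:
counter = 0
(1..10).map { Thread.new { 100_000.times { counter += 1 } } }.each {|t| t.join }
p counter    # often comes out below 1000000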

Even though 1.9 uses native threads I believe it acts in a similar manner, but that’s for another post.

And if someone knows more than me and I misrepresented something, please let me know!
