coderrr

May 25, 2012

What to do when Google PR0s your business

Filed under: google — Tags: — coderrr @ 3:59 pm

Update
Matt Cutts seems to have confirmed that this is just an issue with PR reporting and not an actual issue with our PR. He suggests using only the official Google Toolbar (for Internet Explorer) to check PageRank.

Today, we woke up to one of the most frightening scenarios imaginable for young entrepreneurs trying to bootstrap. Google had completely removed the PageRank rating from our site https://www.privateinternetaccess.com which is one of the most recommended VPN service providers. Yesterday, we were a PR6 site with incoming links from TorrentFreak, LifeHacker, Slashdot, Freenode.net, MIUI.us, Modaco.com, BlockExplorer.com, as well as many other high ranking sites. Today, we are a PR N/A site. The lowest possible rating.

After discovering this, we immediately began investigating the possible causes of this dreadful scenario. We checked multitudes of other websites’ PR ratings to make sure this wasn’t some obscure Google bug affecting the whole internet or some segment of it. We checked Google Webmaster Tools to see if there were any critical crawl errors on our site. We submitted a Reconsideration Request through Webmaster Tools indicating that we are a very legitimate site and could not identify any reason for the PR removal. We searched for an attack where someone might be spamming our links to intentionally hurt us. We found no evidence of this.

What changed recently?
– We had approximately 1 hour of downtime due to a Linode network outage in their New Jersey data center (NAC).
– We have had many new affiliates linking to us, some of which operate websites that are in the top 200 rankings of Alexa.
– We have had a high volume of incoming organic links from customers promoting our service to their respective forum communities and social circles.
– We recently had a post published on Falkvinge.net.
– And for the more conspiracy theory inclined, we decided not to go forward with a (Google Ventures backed) Trada marketing campaign. (Please note we’re not indicting Trada here, we’re just trying to be as thorough as possible.)

Researching what can cause a PR removal we found some of the reasons to be:
– Selling links
– Buying links
– Links from Bad Neighborhoods
– Web SPAM Links
We do not fall under these criteria. We only perform SEO in an ethical manner. We steer far clear of anything that would be deemed blackhat.

We’re really at a total loss as to what caused this and are quite afraid of how this will affect our business. While we currently still rank in VPN related searches, it seems that PR removal will slowly remove you from the search results over time. It can even result in a complete deletion from the index.

We aren’t looking to bash Google or their algorithm. We’re simply looking for some help in this issue and, hopefully, to provide valuable information to other entrepreneurs in case this happens to them.

If you have any information, or your name is Matt Cutts, please leave a comment or contact us at google@privateinternetaccess.com.

And a note to our existing customers – this will not in any way affect the quality of our service. We will continue to work hard on improving our service and providing you with the best possible user experience.

November 13, 2011

Simplified Summary of Microsoft Research’s Bitcoin Paper on Incentivizing Transaction Propagation

Filed under: bitcoin, slashdotted — Tags: , — coderrr @ 8:51 am

Shameless Plug: Hide your IP while connected to the Bitcoin P2P network with a VPN Service.

This is a very simplified summary of the Microsoft Research paper “On Bitcoin and Red Balloons”. This summary is meant for people who already understand how the Bitcoin network and protocol function. For an overview of that see the Bitcoin Wikipedia page.

The flaw pointed out in the paper is that there is a negative incentive for miners to forward Bitcoin transactions. By not forwarding, you increase the chance that you receive the transaction’s fee rather than another miner. This is not so much of an issue now, as the fees usually total much less than the 50BTC reward per block. But as the block reward diminishes in the future, this negative incentive may become more of an issue.

The paper’s proposed solution is to reward nodes who forward transactions as well as nodes who solve the block in which the transaction is included. Each transaction would have a chain of its forwarding nodes attached to it. When a miner solves a block all nodes in the chains that lead the transactions in that block to the miner would be rewarded. The issue with this is that a single node can forward to itself many times to illegitimately gain more of the reward. This is called a Sybil attack.

Their solution to the Sybil attack is to give 0 reward to all nodes in a chain of forwards if the length of that chain is greater than H. This gives a negative incentive to create fake forwards to yourself in an attempt to gain multiple rewards for a single transaction. Your best bet is to forward legitimately to other nodes and hope the transaction reaches a miner who solves it before the number of forwards exceeds H.
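A toy sketch of that rule may help. The paper actually analyzes a family of fee-splitting functions; the equal split, the value of H, and the node names below are all made up for illustration:

```ruby
H = 5  # maximum rewardable chain length (a protocol parameter)

# Split the fee equally among forwarders, but pay nothing if the chain
# is longer than H -- so padding the chain with fake self-forwards
# risks forfeiting the reward entirely.
def forwarding_rewards(chain, fee, h = H)
  return {} if chain.length > h
  share = fee / chain.length.to_f
  chain.each_with_object(Hash.new(0.0)) { |node, acc| acc[node] += share }
end

forwarding_rewards(%w[coderrr bob alice miner], 1.0)
# => {"coderrr"=>0.25, "bob"=>0.25, "alice"=>0.25, "miner"=>0.25}

forwarding_rewards(%w[coderrr eve eve eve eve eve miner], 1.0)
# => {} -- eve's Sybil padding pushed the chain past H
```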

The paper determines optimal strategies in terms of values for H and the functions to divide the fee between nodes in the chain. But this is all modeled on directed trees (which have no cycles) rather than a random graph (which is what the Bitcoin network is like in reality), so it’s unknown how well it would work in practice. They leave work on random graphs for future research.

To clear up some common questions

What prevents nodes from faking/stripping the forward chain so that they can pretend as if they were the only forwarder?

The paper proposes changing the protocol so that you must include the public key of the node you are forwarding the transaction to and then sign the message with your private key. So if the transaction was sending coins from coderrr -> sammy and the chain was coderrr, bob, alice, miner, it would look like this:

msg0 = sign_coderrr(coderrr->sammy, 1BTC, forwardto: bob)
msg1 = sign_bob(msg0, forwardto: alice)
msg2 = sign_alice(msg1, forwardto: miner)

Alice would not be able to recreate the initial message with herself in place of bob because she cannot sign as coderrr.
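The paper doesn’t mandate a concrete signature scheme, but the idea can be sketched with plain RSA signatures. The names and message format here are invented for illustration:

```ruby
require 'openssl'

coderrr = OpenSSL::PKey::RSA.new(2048)
bob     = OpenSSL::PKey::RSA.new(2048)
alice   = OpenSSL::PKey::RSA.new(2048)

# hop 0: coderrr commits to forwarding to bob by signing the payment
# details together with bob's public key
msg0 = "coderrr->sammy 1BTC forwardto:#{bob.public_key.to_pem}"
sig0 = coderrr.sign(OpenSSL::Digest::SHA256.new, msg0)

# hop 1: bob forwards to alice, signing the previous message + signature
msg1 = msg0 + sig0 + "forwardto:#{alice.public_key.to_pem}"
sig1 = bob.sign(OpenSSL::Digest::SHA256.new, msg1)

# alice cannot rewrite hop 0 to name herself instead of bob: a forged
# msg0 will not verify against coderrr's public key
forged = "coderrr->sammy 1BTC forwardto:#{alice.public_key.to_pem}"

coderrr.public_key.verify(OpenSSL::Digest::SHA256.new, sig0, msg0)    # genuine
coderrr.public_key.verify(OpenSSL::Digest::SHA256.new, sig0, forged)  # forged
```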

Please read the paper if you want details.

The goal here was just to summarize the paper to make it easier for people to get the gist of. I’m not arguing for or against the paper’s assumptions or conclusions.

Here is a bitcointalk forum discussion of how relevant this is to the actual Bitcoin network.

October 31, 2011

GitHub hack: speed up git push and git pull

Filed under: git, linux, ssh — Tags: , , — coderrr @ 7:27 pm

tldr

`git pull` and `git push` to github can be greatly sped up (especially for users with >100ms round trip times to github.com) by keeping an ssh master connection open to github.com at all times. To do this, add these two lines to your ~/.ssh/config

ControlMaster auto
ControlPath /tmp/ssh_mux_%h_%p_%r

and then leave `ssh git@github.com git-receive-pack your_github_username/your_repo` running in the background.

 

For users with high latency, pulls and pushes to github can be quite slow to start. For example, with an RTT to github.com of 250ms, `git pull/push` usually takes a minimum of 4.5s to tell you ‘Already up-to-date’. This is largely because git is using ssh, and establishing an ssh connection requires many round trips. How many round trips exactly? We could read the RFC and OpenSSH implementation details… or we could just check what actually happens.

`ssh -v` shows you what ssh is doing at each step, but it’s not timestamped. We can use this little script to prefix each line with a timestamp.

# time.rb - prefix each line of stdin with milliseconds elapsed since start
start = Time.now
puts "#{((Time.now - start) * 1000).to_i}\t#$_"  while $<.gets

To make it easier to determine whether time is spent on a network round-trip rather than client/server CPU time we can artificially increase the RTT to 1000ms using tc:

$ sudo tc qdisc add dev eth0 root netem delay 1000ms

Now we can look at the timestamped `ssh -v` output. I’ve annotated it to show where the round-trips occur.

$ ssh -v git@github.com echo hi 2>&1 | ruby time.rb

0	OpenSSH_5.5p1 Debian-4ubuntu6, OpenSSL 0.9.8o 01 Jun 2010
0	debug1: Reading configuration data /home/steve/.ssh/config
0	debug1: Reading configuration data /etc/ssh/ssh_config
0	debug1: Applying options for *
0	debug1: auto-mux: Trying existing master
0	debug1: Control socket "/tmp/ssh_mux_github.com_22_git" does not exist

DNS lookup

2331	debug1: Connecting to github.com [207.97.227.239] port 22.

1

3322	debug1: Connection established.
3322	debug1: identity file /home/steve/.ssh/id_rsa type 1
3322	debug1: Checking blacklist file /usr/share/ssh/blacklist.RSA-2048
3322	debug1: Checking blacklist file /etc/ssh/blacklist.RSA-2048
3322	debug1: identity file /home/steve/.ssh/id_rsa-cert type -1
3322	debug1: identity file /home/steve/.ssh/id_dsa type -1
3322	debug1: identity file /home/steve/.ssh/id_dsa-cert type -1

2

4318	debug1: Remote protocol version 2.0, remote software version OpenSSH_5.1p1 Debian-5github2
4318	debug1: match: OpenSSH_5.1p1 Debian-5github2 pat OpenSSH*
4318	debug1: Enabling compatibility mode for protocol 2.0
4318	debug1: Local version string SSH-2.0-OpenSSH_5.5p1 Debian-4ubuntu6
4318	debug1: SSH2_MSG_KEXINIT sent

3

5318	debug1: SSH2_MSG_KEXINIT received
5318	debug1: kex: server->client aes128-ctr hmac-md5 none
5318	debug1: kex: client->server aes128-ctr hmac-md5 none
5318	debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
5318	debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP

4/5 ( two round trips )

7335	debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
7335	debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY

6

8334	debug1: Host 'github.com' is known and matches the RSA host key.
8334	debug1: Found key in /home/steve/.ssh/known_hosts:1
8334	debug1: ssh_rsa_verify: signature correct
8334	debug1: SSH2_MSG_NEWKEYS sent
8334	debug1: expecting SSH2_MSG_NEWKEYS
8334	debug1: SSH2_MSG_NEWKEYS received
8334	debug1: Roaming not allowed by server
8334	debug1: SSH2_MSG_SERVICE_REQUEST sent

7/8 ( two round trips)

10350	debug1: SSH2_MSG_SERVICE_ACCEPT received

9

11344	debug1: Authentications that can continue: publickey
11344	debug1: Next authentication method: publickey
11344	debug1: Offering public key: /home/steve/.ssh/id_rsa

10

12376	debug1: Remote: Forced command: gerve coderrr

11

13398	debug1: Remote: Port forwarding disabled.
13398	debug1: Remote: X11 forwarding disabled.
13398	debug1: Remote: Agent forwarding disabled.
13398	debug1: Remote: Pty allocation disabled.
13398	debug1: Server accepts key: pkalg ssh-rsa blen 277

12

14398	debug1: Remote: Forced command: gerve coderrr

13

15420	debug1: Remote: Port forwarding disabled.
15420	debug1: Remote: X11 forwarding disabled.
15420	debug1: Remote: Agent forwarding disabled.
15420	debug1: Remote: Pty allocation disabled.
15420	debug1: Authentication succeeded (publickey).
15420	debug1: channel 0: new [client-session]
15420	debug1: setting up multiplex master socket
15420	debug1: channel 1: new [/tmp/ssh_mux_github.com_22_git]
15420	debug1: Entering interactive session.

14

16416	debug1: Sending environment.
16416	debug1: Sending env LANG = en_US.utf8
16416	debug1: Sending command: echo hi

15

17417	debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
17417	debug1: client_input_channel_req: channel 0 rtype eow@openssh.com reply 0
17417	Invalid command: 'echo hi'
17417	  You appear to be using ssh to clone a git:// URL.
17417	  Make sure your core.gitProxy config option and the
17417	  GIT_PROXY_COMMAND environment variable are NOT set.
17417	debug1: channel 0: free: client-session, nchannels 2
17417	debug1: channel 1: free: /tmp/ssh_mux_github.com_22_git, nchannels 1
17417	debug1: fd 1 clearing O_NONBLOCK
17417	Transferred: sent 2296, received 2952 bytes, in 2.0 seconds
17417	Bytes per second: sent 1151.0, received 1479.9
17417	debug1: Exit status 1

So we see a total of 15 round trips before github responds to the actual command we sent.
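As a quick sanity check, those round trips alone account for most of the ~4.5s minimum quoted earlier; DNS, the TCP handshake, and the transfer itself make up the rest:

```ruby
# 15 round trips at the 250ms RTT from the example above
round_trips = 15
rtt         = 0.250         # seconds

round_trips * rtt           # => 3.75 seconds of pure latency
```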

Luckily we can skip most of these by using ssh master connections. Just add this to the top of your ~/.ssh/config:

ControlMaster auto
ControlPath /tmp/ssh_mux_%h_%p_%r

Now as long as you have an ssh connection open to github.com it will be reused when starting a new ssh session. But how do we keep a connection open to github.com, since they don’t give you shell access? Well, we know from experience that a long `git pull` must be keeping the git/ssh session open for a while. So worst case, we could just continually do long `git pull`s in the background. But there’s gotta be a better way. Maybe starting a push but never sending any data will keep the session open indefinitely. How do we test this? First we need to find out what command git actually sends when doing a push. Let’s be lazy and not RTFM, but experiment instead.

If the GIT_SSH env var is set, the command it points to will be used instead of `ssh`. So let’s make a little script which writes the arguments passed to it to a file:

$ cat > git_out.rb
#!/usr/bin/env ruby
File.write("/tmp/git.txt", ARGV*" ")
$ chmod +x !$
$ GIT_SSH=./git_out.rb git push
fatal: The remote end hung up unexpectedly
$ cat /tmp/git.txt
git@github.com git-receive-pack 'coderrr/test.git'

Now let’s see what happens if we call `ssh git@github.com git-receive-pack coderrr/test.git` ourselves:

$ ssh git@github.com git-receive-pack coderrr/test
007ac836660a1d498131a934badab139fc0d347d2c29 refs/heads/master report-status delete-refs side-band-64k ofs-delta
003e4e454cba21ca64b1eda7d4042c9f86abf3987e8b refs/heads/stats
...
0000

The connection stays open! Now let’s check how fast `git push` is with our 1000ms latency.

$ time git push
Everything up-to-date

real	0m3.906s

Close the ssh connection and try again:

$ time git push
Everything up-to-date

real	0m23.402s

Down to 4 seconds from 23, not bad.

You can also use autossh to make sure the ssh connection reconnects in case it drops.

While this won’t be so useful for people in the US who have RTT to github of <50ms, it can be very helpful for people in other countries where RTTs are regularly more than 250ms.

Further investigation could include checking how long GitHub allows the git-receive-pack connection to remain open and possibly very slowly sending valid git protocol data into the git-receive-pack ssh connection to keep it open for longer periods of time.

Ethical implications of this are left as an exercise to the reader.

June 30, 2011

Patching The Bitcoin Client To Make It More Anonymous

Filed under: anonymity, bitcoin, cplusplus, patch — Tags: , , , — coderrr @ 5:18 pm

Shameless Plug: Hide your IP while connected to the Bitcoin P2P network with a VPN Service.

TLDR: this patch allows you to …
– see all addresses, including change
– see which addresses are linked together (does recursive expansion of address linkages)
– select which address(es) to send from, rather than letting the client choose (randomly) for you

Bitcoin is a decentralized, peer to peer, digital currency. It has been referred to as anonymous, pseudo-anonymous, pseudonymous (whatever that means), and not anonymous at all. It seems there is a lot of misinformation about exactly how anonymous it is and how its anonymity works. I’m going to try to explain part of that here and provide a solution to one of the current big killers of its anonymity.

When you receive coins at a new Bitcoin address, that is, one you’ve never used before, the fact that you control that address is completely unknown to anyone except the sender (and anyone the sender leaked that info to). And the sender may not even know your actual identity, depending on whether you revealed it to them. If you receive another payment at that same address then both the first and second payers will be able to see that both of them paid you at that address. This is due to how the Bitcoin block chain works and is why you are advised to create a new address for each new payment you wish to receive.

So assume you’ve created 100 addresses for 100 payments. Each of the 100 people knows they paid you once, but they don’t know that 99 other people paid you, how much those payments were, or how much you have in total. So you have revealed very little about yourself to anyone.

Now let’s say you want to _make_ some payments or even just re-organize your funds by moving them to another address. This is where things get tricky and you start losing anonymity. The official Bitcoin client picks coins from multiple addresses in a random fashion when making payments. So let’s say you have those 100 payments from 100 different people, each attached to their own address, sitting in your wallet and now you want to send Wikileaks some coins. The Bitcoin client might choose coins from 3 of those incoming payments to send out. Now all 3 of the people who sent you those payments know that you received at least 3 payments, how much they were for, and when you received them.

Let me give you a scarier example. Let’s say you have 1 million dollars worth of Bitcoin sitting in one address from some withdrawals on a Bitcoin exchange. Now let’s say you have an address you use for donations, and assume you’ve gotten at least one. The next time you want to send some coins to someone, your client may pick a few coins from your million dollar address and a few coins from your donation address. This is a big problem because it gives the people who’ve donated coins the knowledge that you are also in control of the million dollar address. Plus if your donation address is publicly associated with your identity not only the donors but anyone can go through the block explorer to see which other addresses you are in control of and what their balances are.
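The linkage effect can be simulated in a few lines. This is not the client’s actual coin-selection code; the addresses, balances, and selector logic below are invented to show the principle:

```ruby
# Toy wallet: one huge address plus small, publicly known ones
wallet = {
  "addr_exchange" => 1_000_000,  # the "million dollar" address
  "addr_donation" => 5,          # publicly associated with you
  "addr_change"   => 42,
}

# Naive selector: pick addresses in random order until the amount is covered,
# roughly what random coin selection ends up doing.
def select_inputs(wallet, amount, rng)
  picked = {}
  wallet.keys.shuffle(random: rng).each do |addr|
    break if picked.values.sum >= amount
    picked[addr] = wallet[addr]
  end
  picked
end

inputs = select_inputs(wallet, 1_000_001, Random.new(42))
# Every address used as an input is now publicly linked in the block chain.
# Since no single address covers the amount, at least two get linked.
inputs.keys
```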

Here is a related excerpt from the bitcoin wiki

… if one has bitcoins on several addresses, one can theoretically choose from which address to send the coins. Choosing personally generated coins or an address that you know doesn’t reveal information would protect you. Unfortunately, the default Bitcoin client doesn’t support this currently, so you must assume that your entire balance can identify you if any of the addresses can.

So what can you do about this? If you don’t have any Bitcoin yet then you can just make sure to use separate wallets for addresses you don’t want being mixed together. If you’re already in the position where you have public and private funds in the same wallet there’s not much you can do with the official Bitcoin client, other than not send coins to anyone (or yourself).

That’s why I’ve made a patch to the official client which allows you to send from _only_ a single specific address. Now you can be sure the only people who will ever know that you made that transaction are the ones who already knew about the address being under your control. If you did things right, this will only be a single person.

I’ve added a ‘Send From Address’ tab to the main interface. It actually contains information which was impossible to get from the client before. That is, every address in your wallet and the balance thereof. This includes addresses which were created for the change of your outgoing transactions. These were previously nowhere to be found in the client (even using the bitcoind RPC interface).

Simply choose the address you wish to send from and double click it. This will open the Send dialog with the Send From address filled in. If you try to send more coins than are available in that address the transaction will simply fail and you can try again. Leaving the Send From address blank will make the client behave normally and possibly pick coins from multiple addresses.

The second version of my Bitcoin client patch gives you a better view of your current address linkages. If any two or more addresses were used together for an outgoing transaction those will be considered linked. If any change is returned from an outgoing transaction that change address will be considered linked to all the originating addresses.

The ‘Send From Address’ tab now groups together linked addresses. Each group is separated by an empty line. I’ve also added a ‘Label’ column which will show you the label for the address if one has been set in the ‘Address Book’. Since your receiving addresses usually have labels this makes it easy to see which other addresses they have been linked to.

Sending from multiple addresses is now supported. Simply use the CTRL key to select multiple addresses then click the ‘Send’ button. The addresses will appear in the ‘Send From’ textbox separated by semicolons. Note, this DOES NOT guarantee all the addresses you selected will be used for the transaction. But it DOES guarantee that no unselected addresses will be used. As before, if you leave the ‘Send From’ field blank the client will fall back to its default behavior.

Version 3 of the patch now contains command line support for bitcoind:

bitcoin listaddressgroupings
bitcoin sendtoaddress <bitcoinaddress>[:<sendfromaddress1>[,<sendfromaddress2>[,...]]] <amount> [comment] [comment-to]

Add a +1 to the pull request if you believe this should be added to the official client: https://github.com/bitcoin/bitcoin/pull/415

My github bitcoin fork: https://github.com/coderrr/bitcoin/tree/v0.5.3+coderrr
The commits with the changes: https://github.com/coderrr/bitcoin/compare/v0.5.3…v0.5.3+coderrr
Compiled Windows 32bit client: https://github.com/coderrr/bitcoin/downloads
Compiled Linux 64bit client: https://github.com/coderrr/bitcoin/downloads

May 3, 2011

Beware of Thread#kill or your ActiveRecord transactions are in danger of being partially committed

Filed under: concurrency, patch, rails — Tags: , , — coderrr @ 3:25 pm

Firstly, before you all say you should never use Thread#kill and link to Headius’ post: this is not the same issue he explains, but it is somewhat related.

Secondly, before you ask why anyone would use Thread#kill anyway, or why we should worry about anyone who does: there is one scenario where Threads being #killed is quite likely, and there are probably a bunch of people who run into it without knowing. When a ruby process terminates, all Threads other than the main one are #killed (the main Thread gets an exception raised). So if you had a ruby process with a bunch of threads and this process received a TERM signal, all those Threads are going to get #killed unless you specifically handle the TERM signal and instruct/wait for all threads to finish. That is of course what you should do, but I am guessing there are plenty of people who don’t.

ActiveRecord transactions and Thread#kill

First let me say your ActiveRecord transactions could already be partially committed if you use Thread#kill or Thread#raise due to the reasons Headius explains. But the chance of this happening due to those reasons is lower than due to this new one. Here’s the ActiveRecord transaction code. I’ve posted the Rails 2.2 version because it’s much simpler and less code than the current 3.0 source. But 3.0 is still vulnerable to the same issue:

      def transaction(start_db_transaction = true)
        transaction_open = false
        begin
          if block_given?
            if start_db_transaction
              begin_db_transaction
              transaction_open = true
            end
            yield
          end
        rescue Exception => database_transaction_rollback
          if transaction_open
            transaction_open = false
            rollback_db_transaction
          end
          raise unless database_transaction_rollback.is_a? ActiveRecord::Rollback
        end
      ensure
        if transaction_open
          begin
            commit_db_transaction
          rescue Exception => database_transaction_rollback
            rollback_db_transaction
            raise
          end
        end
      end

Now imagine you’re using it like this

ActiveRecord::Base.transaction do
  user_1.update_attribute :something, 1
  ...
  user_n.update_attribute :something, n
end

If Thread#kill is called anywhere inside of that block the rest of the updates will not occur but the transaction will be committed.

How does it happen?

The reason this happens, and why it’s a little unexpected, is because Thread#kill will bypass all code _including_ rescue blocks and jump immediately to ensure blocks. Many people code with the assumption that if an ensure block is hit without a rescue block being hit, the code must have finished executing. And this is the assumption made in the ActiveRecord transaction code. Imagine the interpreter jumps from the middle of the block passed to #transaction (where Thread#kill was called) down to the start of the ensure block. transaction_open will evaluate to true (which it would not have were an exception raised, since the rescue block sets transaction_open to false) and commit_db_transaction will be called.

In contrast, for this to occur due to Headius’ issue, first an exception would have to be raised inside the transaction block and _then_ Thread#raise or Thread#kill would have to be called inside the rescue block before transaction_open is set to false, a much smaller window of attack.

Can this be fixed?

Yes and no. Setting a local variable to true after the yield statement is a simple way to determine if the block passed to yield ran to completion:

begin
  yield
  completed = true
ensure
  puts "entire block ran!"  if completed
end

This assumes the code in the block doesn’t use certain flow control statements. Those statements are return, break, next, retry, redo and throw. If one of these statements is used to break out of the block prematurely it will effectively act just like Thread.kill. The interpreter will immediately jump to the ensure block bypassing everything else including the local variable being set. Which means if you relied on the local variable being set to commit the transaction then any transaction block with one of those statements would always be rolled back.
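A tiny self-contained demonstration of that caveat (the method and symbol names are mine):

```ruby
# `break` out of the block jumps straight to the ensure with `completed`
# still false -- from the ensure block's point of view, indistinguishable
# from a Thread#kill.
def guarded(log)
  completed = false
  begin
    yield
    completed = true
  ensure
    log << (completed ? :commit : :rollback)
  end
end

log = []
guarded(log) { :work }   # block runs to completion
guarded(log) { break }   # break skips the `completed = true` line
log                      # => [:commit, :rollback]
```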

In MRI 1.8 and Rubinius Thread.current.status will be set to ‘aborting’ if the current thread is in the state of being killed. So we can detect if we are in the ensure block because of a Thread#kill rather than one of the flow control statements.

begin
  yield
ensure
  puts "entire block ran!"  if Thread.current.status != 'aborting'
end

The problem is this doesn’t work at all in 1.9 and isn’t consistent in JRuby. If all interpreters behaved as 1.8 does then fixing this issue would be trivial and have no downsides. Knowing whether you are in an ensure block prematurely because the programmer intended so via flow control statements versus due to an aborting thread seems fairly important. So I believe that 1.9 and JRuby should copy this behavior or at least provide some other similar mechanism.

The Patch

Here’s a patch which fixes the issue for MRI 1.8 and Rubinius:
https://github.com/coderrr/rails/commit/7f8ac269578e847329c7cfb2010961d3e447fd19

Thanks to Marcos Toledo for brainstorming the issue with me and helping me proofread.

January 10, 2011

String#slice!, Overflow Buffers, and Optimization Math

Filed under: ruby19 — Tags: — coderrr @ 5:57 pm
s="abc"
s.slice!(0,2) # => "ab"
s # => "c"

String#slice! is often used when parsing protocols: 1) read some data into a buffer string, 2) read bytes indicating how much data is in the current message, 3) slice! off that amount of data from the buffer, 4) do something with it, loop back to step 2 until buffer has no more full messages. In most situations the buffer always stays small because you parse every message out of the buffer each time you read data from the stream and usually you aren’t reading that many bytes from the stream at a time. But in some rare cases you may have to deal with larger buffers.

String#slice!’s runtime is approximately proportional to the length of the string being slice!d (the length of the resulting string is pretty much irrelevant). This can be a problem when you’re parsing many small messages out of a large string. For example, calling #slice! on a 1 meg string is about 10x slower than calling it on a 10k string. This means your parsing is going to be 10x slower if you’re parsing a single 1 meg buffer rather than a bunch of smaller 10k buffers. So how do we fix this?

An “overflow buffer”. Keep most of the data in a large overflow buffer which you transfer to a smaller active buffer as needed:

data = read_from_stream()
overflow_buffer << data
loop do
  # top up the small active buffer from the overflow buffer as needed
  buffer << overflow_buffer.slice!(0, BUFFER_SIZE)  if buffer.size < BUFFER_MIN_SIZE
  # parse messages out of buffer with buffer.slice!, breaking out of the
  # loop once neither buffer holds a complete message
end

This way you only call slice! on a large string a few times and most of the calls are on a relatively small string. The question now is what values you should choose for BUFFER_SIZE and BUFFER_MIN_SIZE. BUFFER_MIN_SIZE is easy: it should be the size of your largest possible message. Alternatively, you can use something smarter which copies data from the overflow buffer as needed based on your parsing logic.
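To make the loop concrete, here is a runnable toy version parsing length-prefixed messages. The message format, constants, and method name are invented for the example; only the two-buffer structure matters:

```ruby
BUFFER_SIZE     = 4096
BUFFER_MIN_SIZE = 256    # must be >= header size + largest possible message

# Parse messages of the form "<4-digit length><body>" out of a large
# overflow buffer via a small active buffer (consumes the overflow buffer).
def parse_messages(overflow_buffer)
  buffer, messages = "", []
  loop do
    # top up the small active buffer from the overflow buffer
    buffer << overflow_buffer.slice!(0, BUFFER_SIZE)  if buffer.size < BUFFER_MIN_SIZE
    break if buffer.size < 4                # no complete header left
    len = buffer[0, 4].to_i
    break if buffer.size < 4 + len          # trailing truncated message
    buffer.slice!(0, 4)                     # drop the header
    messages << buffer.slice!(0, len)
  end
  messages
end

data = 1000.times.map { |i| body = "msg#{i}"; "%04d%s" % [body.size, body] }.join
parse_messages(data).size  # => 1000
```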

Choosing BUFFER_SIZE is a little more difficult. It depends on the average amount you’re slice!ing off the buffer each time (SLICE_SIZE) which determines how many times you slice! it. It also depends on the average size of the overflow buffer (OVERFLOW_SIZE).

Let’s simply call these variables b (BUFFER_SIZE), s (SLICE_SIZE), and o (OVERFLOW_SIZE). Given s and o we want to find the optimal value of b. So we need to model the runtime of parsing the entire buffer in terms of those 3 variables. Since the runtime of slice! is approximately proportional to string length we can simply use string length as a replacement for runtime.

The runtime of parsing the entire buffer can be represented by two summations. First is the sum of the times you slice the overflow buffer:

sum i=0 to o/b, (o - b*i)

That is, we will slice the overflow buffer o/b times and each time the size of the overflow buffer will be b less than before.

The second is the sum of the times you slice the small buffer multiplied by the number of times you copy from the overflow buffer into the small buffer:

(o/b) * sum i=0 to b/s, (b - s*i)

Now we just take the derivative of these summations with respect to b, plug in whatever your values are for s and o and solve for 0.

d/db (sum (o-b*i), i=0 to o/b) + d/db (sum o/b*(b-s*i), i=0 to b/s) = 0, o = 1000000, s = 1000

So if your average maximum buffer size is 1 meg and your average slice size is 1k your optimal BUFFER_SIZE is about 10000*sqrt(10) or 31600. Doing some actual benchmarks I found the optimal to be around 40000 so the math comes pretty close to practice.
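Evaluating the two summations gives approximately o²/(2b) for the overflow side and o·b/(2s) for the small-buffer side, and setting the derivative of the total to zero yields b = sqrt(o·s). A quick numeric check of that closed form (grid and values are my own):

```ruby
o = 1_000_000.0  # OVERFLOW_SIZE
s = 1_000.0      # SLICE_SIZE

# approximate total runtime as a function of BUFFER_SIZE b
cost   = ->(b) { o**2 / (2 * b) + o * b / (2 * s) }
best_b = (1_000..100_000).step(100).min_by { |b| cost.(b) }

best_b  # => 31600, within one grid step of sqrt(10**9) ~= 31622.8
```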

WARNING: Don’t actually try to optimize your string slicing unless you know you actually have a bottleneck on String#slice! and that the bottleneck is due to the strings you’re slicing being large. Otherwise people will just laugh at you and call you a noob.

December 27, 2010

Canonical redirect pitfalls with HTTP Strict Transport Security and some solutions

Filed under: security — Tags: — coderrr @ 4:21 pm

tl;dr

There is a common pitfall when implementing HTTP Strict Transport Security on sites that 301 redirect from x.com -> www.x.com which leaves your users open to a MITM attack.  Paypal is an example of one of the sites affected by this issue.  There are some solutions but none are ideal.

HSTS

HTTP Strict Transport Security (HSTS) is a mechanism for websites to instruct browsers to only ever connect to them using an encrypted HTTPS connection.  It solves the problem of man in the middle (MITM) attacks (except on the browser’s first ever connection to the site), which could allow an attacker to keep the user from connecting to the secure version of the site and do evil things to them.

It’s generally very simple to set up.  You just 301 redirect any HTTP connections to the HTTPS version of your site and then you add an HTTP header (Strict-Transport-Security: max-age=TTL_IN_SECONDS) to every HTTPS response.  That’s it.  Note that you cannot put the HSTS header on unencrypted HTTP responses because this would allow an attacker to expire your HSTS policy prematurely.
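As a sketch, both halves of that setup fit in a small Rack-style middleware. The class name and structure are my own, and in practice you’d usually do this in the web server config instead:

```ruby
# Redirect plain HTTP to HTTPS, and add the HSTS header only on
# encrypted responses (never on plain HTTP ones, per the point above).
class StrictTransport
  ONE_YEAR = 31_536_000  # max-age in seconds; pick your own TTL

  def initialize(app, max_age = ONE_YEAR)
    @app, @max_age = app, max_age
  end

  def call(env)
    if env["rack.url_scheme"] == "https"
      status, headers, body = @app.call(env)
      headers["Strict-Transport-Security"] = "max-age=#{@max_age}"
      [status, headers, body]
    else
      # 301 any plain-HTTP request to the HTTPS version of the same URL
      location = "https://#{env['HTTP_HOST']}#{env['PATH_INFO']}"
      [301, { "Location" => location }, []]
    end
  end
end
```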

HSTS with Canonical Redirects

Because of that last point it gets a little trickier when dealing with a website that does canonical redirects.  A canonical redirect is just a permanent 301 redirect.  For example, yoursite.com 301 -> www.yoursite.com or www.yoursite.com 301 -> yoursite.com.  This is fairly standard these days for SEO purposes (go to paypal.com for an example).

Let’s say your canonical redirect is from yoursite.com to www.yoursite.com.  So you set up the http://yoursite.com -> https://www.yoursite.com and http://www.yoursite.com -> https://www.yoursite.com 301 redirects and you set the HSTS header on https://www.yoursite.com responses.  This protects any user who goes directly to www.yoursite.com in their browser.  But most often people will type in the domain without the www.  Those users will still be open to MITM attacks, as the http://yoursite.com -> https://www.yoursite.com redirect will always be unencrypted.  The HSTS headers on www.yoursite.com will not affect yoursite.com.

Paypal Fail

For a real world example of this let’s check out Paypal.  They were the “first major internet site” to start using HSTS.  To be fair they are still beta testing it (with a max-age of less than 10 minutes) and mention explicitly that it’s only enabled on www.paypal.com, although they don’t mention the ramifications of that.

Using the latest version of Google Chrome (which is the only non-beta browser that currently supports HSTS) go to paypal.com.  Then open Developer Tools and go to the Resources tab.  You’ll see an unencrypted 301 redirect from http://paypal.com to https://www.paypal.com.  The response from https://www.paypal.com will set the HSTS policy.  Now open a new tab and go to www.paypal.com.  Open Developer Tools again and you’ll see the browser goes directly to the encrypted https://www.paypal.com, HSTS in action!  Now open a new tab and go to paypal.com (without the www) again.  You’ll see this time Developer Tools shows there’s still an unencrypted 301 redirect, meaning an attacker can still get you with a MITM attack.  This is a perfect example of the canonical redirect pitfall with HSTS.  Another site with the exact same problem is pixi.me.  My guess is this error will repeat itself many times once more sites start implementing HSTS.

This issue doesn’t seem to be readily apparent to a lot of people, or at least they don’t mention it, even the people who should probably be most aware of it.  Here’s an excerpt from the Security Now podcast by Steve Gibson talking about Paypal’s HSTS:

“So, for example, if the user put in just http://www.paypal.com, or even http://paypal.com, if there is a Strict Transport Security token that has been received previously, the browser has permission to ignore that non-secure query request from the user and to transparently make it secure.”

http://www.grc.com/sn/sn-262.htm

And after reading Paypal’s HSTS announcement you’d assume that typing in paypal.com would be secure too.

Solution 1

To fix this we must set an HSTS policy on both yoursite.com and www.yoursite.com.  Here’s how:

1) Setup a 301 redirect from http://yoursite.com to https://yoursite.com
2) Setup a 301 redirect from https://yoursite.com to https://www.yoursite.com
3) Set an HSTS header on the redirect (or all responses) from https://yoursite.com
4) Setup a 301 redirect from http://www.yoursite.com to https://www.yoursite.com
5) Set an HSTS header on all responses from https://www.yoursite.com

So the redirect chains will look like this:

http://yoursite.com -> https://yoursite.com -> https://www.yoursite.com
OR
http://www.yoursite.com -> https://www.yoursite.com

And the request response cycle will look like this (for the first redirect chain):

- browser requests http://yoursite.com
- server redirects to https://yoursite.com

HTTP/1.1 301 Moved Permanently
Location: https://yoursite.com

- browser requests https://yoursite.com
- server redirects to https://www.yoursite.com and sets HSTS header

HTTP/1.1 301 Moved Permanently
Location: https://www.yoursite.com
Strict-Transport-Security: max-age=2592000

- browser stores HSTS policy for yoursite.com and will never connect to it without encryption for a month
- browser requests https://www.yoursite.com
- server sets HSTS header and serves up the standard homepage

HTTP/1.1 200 OK
Strict-Transport-Security: max-age=2592000
Content-type: text/html
...

- browser stores HSTS policy for www.yoursite.com and will never connect to it without encryption for a month

Now you have secured both users who type yoursite.com into the location bar and those who type www.yoursite.com.  The downside is that you require two HTTPS connections, which will be slower for the user and more expensive for your servers.  You’ve also still not secured the user who typed in www.yoursite.com and then a week later typed in yoursite.com.  That user’s first ever connection to yoursite.com will still be open to a MITM attack.
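The five steps above can be condensed into a single Rack-style app.  This is only a sketch, assuming the yoursite.com/www.yoursite.com names from the example and a one-month max-age; how you'd actually wire this up depends on your server:

```ruby
# Solution 1 as a plain Rack-style app (a sketch; the host names and the
# one-month max-age are illustrative assumptions, not a drop-in config).
HSTS = 'max-age=2592000'

app = lambda do |env|
  scheme, host, path = env.values_at('rack.url_scheme', 'HTTP_HOST', 'PATH_INFO')
  case [scheme, host]
  when ['http', 'yoursite.com']       # step 1: http -> https on the bare domain
    [301, { 'Location' => "https://yoursite.com#{path}" }, []]
  when ['https', 'yoursite.com']      # steps 2 and 3: redirect to www, set HSTS
    [301, { 'Location' => "https://www.yoursite.com#{path}",
            'Strict-Transport-Security' => HSTS }, []]
  when ['http', 'www.yoursite.com']   # step 4: http -> https on www
    [301, { 'Location' => "https://www.yoursite.com#{path}" }, []]
  else                                # step 5: HSTS on every https www response
    [200, { 'Content-Type' => 'text/html',
            'Strict-Transport-Security' => HSTS }, ['...']]
  end
end

status, headers, = app.call('rack.url_scheme' => 'https',
                            'HTTP_HOST' => 'yoursite.com',
                            'PATH_INFO' => '/')
# a 301 to https://www.yoursite.com/ that also carries the HSTS header
```

The key detail is the middle branch: the HSTS header rides along on the https://yoursite.com redirect itself, which is the step most setups miss.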

Solution 2

To secure both yoursite.com and www.yoursite.com on the initial connection to either, here’s what you would have to do.  First make sure you have the simple 301 redirects from http://yoursite.com or https://yoursite.com to https://www.yoursite.com, as most sites already do.  Then embed an invisible iframe on the landing page (or every page) of https://www.yoursite.com which loads https://yoursite.com/hsts.  This /hsts path would do nothing other than send back a blank page with the HSTS header set.  The /hsts response could also contain cache headers (Expires and Cache-Control: public) which would make the browser not re-request it for some amount of time less than your HSTS policy’s TTL.  This still has the issue of requiring two HTTPS connections every time a user types in yoursite.com.
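That /hsts responder might look something like the following.  This is a hypothetical Rack-style handler; the path, the TTL values, and the iframe snippet are all assumptions for illustration:

```ruby
# Hypothetical /hsts responder for Solution 2.  It would be embedded on
# https://www.yoursite.com pages via an invisible iframe, e.g.:
#   <iframe src="https://yoursite.com/hsts" style="display:none"></iframe>
HSTS_TTL  = 30 * 24 * 60 * 60   # 30-day HSTS policy
CACHE_TTL =  7 * 24 * 60 * 60   # browser re-fetches weekly, well under the policy

hsts = lambda do |env|
  [200,
   { 'Content-Type'              => 'text/html',
     'Strict-Transport-Security' => "max-age=#{HSTS_TTL}",
     'Cache-Control'             => "public, max-age=#{CACHE_TTL}" },
   ['']]
end

status, headers, = hsts.call({})
# the blank response carries both the HSTS policy and the cache lifetime
```

Keeping the cache TTL well under the HSTS TTL matters: the cached response must be refreshed (renewing the policy) before the policy itself can expire.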

How you actually implement any of the previous concepts depends entirely on your web server and framework.  This post is just meant to get the idea out there and make people aware of this potential pitfall.  See the HSTS Wikipedia article for some implementation examples of HSTS.

Considering site.com -> www.site.com canonical redirects are quite ubiquitous across the web it seems like something HSTS should deal with better.  There is an HSTS option called includeSubDomains which would handle www.site.com -> site.com canonical redirects.  But even that isn’t perfect because it would also enforce the HSTS policy on other, possibly unimportant, subdomains like blog.site.com.

Solution 3: Add a Hack

I wonder if it’s worth putting an explicit exception into HSTS which allows the www subdomain to set its parent domain’s HSTS policy.  Maybe only allowing it to set or increase the TTL of the policy, and never allowing it to decrease.  I think this would make HSTS much easier to implement for the vast majority of websites.

In my next post I’ll suggest some potential additions to HSTS which could make things simpler for implementers.

Thanks to Victor Costan for reviewing this post and helping me brainstorm.

September 20, 2010

socks_switch: switching between direct connection and SOCKS proxying based on ping time to host

Filed under: network, ruby — Tags: , — coderrr @ 7:32 pm

This little hack named socks_switch allows you to direct connect to sites which are geographically close to you (and therefore have a low ping time) and proxy the rest through a SOCKS server. It’s built on top of this 7 line SOCKS 4 server hack which is built on top of ProxyMachine which is built on top of EventMachine.

The way it works is by pinging each host the first time you connect to it. To keep from delaying connections, the first connection to each host will always be proxied. After the ping time is determined, all subsequent connections will either be direct or proxied depending on the ping threshold you set. Ping times expire after 10 minutes (or whatever you change the ping interval to) and hosts will be re-pinged on the next connection. You can also specify a list of IPs which will always direct connect and therefore never be pinged or proxied.

This was created for use outside the US, where SOCKS proxying through a US server typically increases the reliability and bandwidth of connections to US or adjacent hosts. But when you are connecting to a geographically nearby server, proxying through the US is slower and less reliable than a direct connection. socks_switch aims to get the best of both worlds.

Here’s the code (or on github).

# socks_switch.rb
# to listen on 127.0.0.1:1080:
#   proxymachine -c socks_switch.rb -h 127.0.0.1 -p 1080 

require 'ipaddr'

# address to your main SOCKS server
SOCKS_SERVER = '1.2.3.4:1080'
# list of objects which will match IPs you never want to proxy
FORCE_DIRECT = ['127.0.0.1',
                IPAddr.new('192.168.0.0/16'),
                '074.125.000.000'..'074.125.255.255',
                /^10\./]
# anything pinging below this will connect directly
PING_THRESHOLD = 150
# re-ping hosts after this many seconds
PING_INTERVAL = 10*60

class Pinger
  PINGS = {}

  class PingConn < EM::Connection
    def initialize(addr)
      @addr = addr
      @start = Time.now
    end

    def connection_completed
      PINGS[@addr] = (Time.now - @start) * 1000
      p [:ping, @addr, PINGS[@addr].to_i]
      EM.add_timer(PING_INTERVAL) { PINGS.delete @addr }
      close_connection
    end
  end

  def fast?(addr, port)
    key = "#{addr}:#{port}"
    return PINGS[key] < PING_THRESHOLD  if PINGS[key]
    PINGS[key] = PING_THRESHOLD  # prevent double pinging
    EM.connect addr, port, PingConn, key
    false
  end
end

pinger = Pinger.new

proxy do |data|
  next  if data.size < 9
  v, c, port, o1, o2, o3, o4, user = data.unpack("CCnC4a*")
  next({ :close => "\0\x5b\0\0\0\0\0\0" })  if v != 4 or c != 1
  next  if ! idx = user.index("\0")
  addr = [o1,o2,o3,o4]*'.'
  if FORCE_DIRECT.any? {|r| r === addr } or pinger.fast?(addr, port)
    p [:direct, addr]
    { :remote => "#{addr}:#{port}", :reply => "\0\x5a\0\0\0\0\0\0", :data => data[idx+9..-1] }
  else
    p [:proxied, addr]
    { :remote => SOCKS_SERVER }
  end
end

January 30, 2010

How to create a shortcut key to open a new terminal window in the current working directory on OSX

Filed under: osx — Tags: — coderrr @ 2:04 pm

Shameless Plug: Use a Mac OS X Compatible VPN Service to protect your privacy on the Internet.

In gnome-terminal I always used ctrl-shift-n to open a new terminal window in the current working directory (cwd). Having been forced to switch to OSX I was totally missing this functionality in the Terminal app. I often spawn a new terminal to quickly execute a command or series of commands and then close it. Not finding any satisfactory solutions online I hacked one together which works.

My Solution

First make sure the following checkbox is enabled: Terminal -> Preferences -> Window -> TTY Name

Now edit your ~/.bash_profile and add this line:

export PROMPT_COMMAND='pwd > /tmp/cwd.`ps -o tty= -p $$`'

Next create an applescript with the following content (if you don’t know how to create/compile an applescript just use the AppleScript Editor) and save it in your Scripts directory:

tell application "Terminal"
	set ttyname to (tty of (front window))
	set AppleScript's text item delimiters to "/"
	set cwd to do shell script "cat /tmp/cwd." & (item 3 of (text items of ttyname))
	activate
	do script "cd '" & cwd & "'"
end tell

Now that you have saved the script we only need to bind a key to it. This requires an extra application, as I don’t know of any stock way to do this in OSX. I use FastScripts to do this. I believe KeyboardMaestro also has this capability, as I’m sure lots of other apps I don’t know about do. Feel free to leave a comment with suggestions on better or freer ones.

With FastScripts you just drop down the menubar, which should already contain your script if you saved it in your Scripts directory. Find the script and cmd-click it; this will bring up a popup which allows you to assign a global shortcut.

How it works

The $PROMPT_COMMAND is executed before each prompt (i.e. after any directory change). pwd > /tmp/cwd.`ps -o tty= -p $$` writes out the current working directory to a temp file whose name contains the current TTY. So for each Terminal session there is a corresponding file which is kept up to date with that session’s current directory. The applescript grabs the TTY of the frontmost Terminal window and reads the corresponding cwd file to find the current directory. It then spawns a new terminal window and tells it to cd to that directory.

* note: if you are in a screen this will not work because the TTY will be different than the one in the Terminal’s window.

Let me know if you have a better solution, I’m sure there’s one out there.

January 10, 2010

Tunnel Splitter: Accelerating a single TCP connection over multiple ISPs

Filed under: network, ruby, tunnel_splitter — Tags: , , — coderrr @ 12:15 pm
        LOCAL HOST                                          REMOTE HOST 

                                +--> (ISP1) ---+
                               /                \
(SOCKS client) -> (ts_client)  ---> (ISP2) ------ > (ts_server) -> (SOCKS server)
                               \                /
                                +--> (ISP3) ---+

About

At its essence Tunnel Splitter is a Ruby and EventMachine based client/server pair which allows you to split a single TCP connection into multiple parts and send them over multiple TCP connections. The client does the splitting, the server does the reconstruction. At first glance this doesn’t seem to be anything special, but it actually allows you to do something very cool: aggregate the bandwidth of multiple ISPs for a single TCP connection.

There are already plenty of load balancing routers out there. Although they do load balance connections between multiple ISPs, any single connection will only be going out over a single ISP. Not to their discredit, it’s the best you can do with only a router. Tunnel splitter requires a little more. It requires that you have access to a server sitting on a connection twice as fast (up+down) as the sum of all your ISP connections. This is easily available nowadays. I currently run my tunnel splitter server on an $8/month VPS.

If it’s not already obvious, tunnel splitter can save you a lot of money if you need a fast connection. Getting a single 50Mbps connection to your house will cost a lot more than getting 6 15Mbps connections (or whatever speed your ISP provides). The increase in price as you need more and more bandwidth from a single ISP connection is almost never linear.

Tunnel splitter also provides transparent redundancy if ISPs go down. It uses a custom protocol on top of TCP to make sure all packets get delivered. If 10 packets have already been sent out through ISP1 before it is determined that it went down, they will be resent out through another link, so your tunneled TCP connection will not drop.
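The reassembly side of this can be illustrated with a toy framing scheme: tag each chunk with a sequence number so the far end can put the stream back together no matter which link each chunk arrived on. (This is purely illustrative; tunnel splitter's real wire protocol is its own and also handles acknowledgement and resending.)

```ruby
# Toy sketch of sequence-numbered framing for multi-link reassembly.
# Each frame is: 4-byte big-endian seq, 4-byte big-endian length, payload.
def frame(seq, payload)
  [seq, payload.bytesize].pack('NN') + payload
end

def unframe(bytes)
  seq, len = bytes.unpack('NN')
  [seq, bytes[8, len]]
end

# chunks sent over different links can arrive out of order...
frames = ['hel', 'lo ', 'world'].each_with_index.map { |c, i| frame(i, c) }

# ...and are put back in order by sequence number on the receiving side
rebuilt = frames.shuffle.map { |f| unframe(f) }
                .sort_by(&:first).map(&:last).join
# rebuilt == "hello world"
```

With framing like this, resending a lost chunk over a different ISP's link is safe: the receiver orders by sequence number, not by arrival.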

By itself tunnel splitter only tunnels one local port to one remote host:port. This isn’t that useful by itself. But when you put a SOCKS server at that remote host:port now you can tunnel any application over it which supports SOCKS proxying. Add a little routing magic (which I will discuss later) and you can proxy _anything_, even if it has no SOCKS support.

Usage

Now I’ll show how to download and run tunnel splitter. Either download or clone it from github. To run it you will need to install ruby and the eventmachine gem: gem install eventmachine.

Let’s assume you are connected to two ISPs, one on eth0 and one on eth1. And let’s say your server on a fast connection is running with the address tsplitterserver.mayhem.org. And let’s say you have a SOCKS server running on the server listening on localhost port 1080. On the server you would simply run:

ruby ts_server.rb 0.0.0.0:6000 localhost:1080

This tells it to listen for tunnel splitter client connections on port 6000 and forward the tunneled connections to the SOCKS server on port 1080.

Client side we would run this:

ruby ts_client.rb localhost:1080 tsplitterserver.mayhem.org:6000:eth0~10 tsplitterserver.mayhem.org:6000:eth1~10
# note that the localhost:1080 on the client and the server are unrelated and do not actually need to match

This tells the tunnel splitter client to create 10 tunnels to the server over the eth0 interface and 10 tunnels to the server over the eth1 interface, and then wait for incoming connections on port 1080. When the tunnel splitter client receives connections on port 1080 it will split them up and load balance them across all 20 connections. The reason we create 10 tunnels over each ISP here is to make sure we are maxing out each ISP’s bandwidth. Some ISPs impose per connection caps so you can’t get your full bandwidth without using multiple connections. If you’re sure that your ISP doesn’t do this then you are fine with just a single connection and can leave off the ~10.

Now if you point any SOCKS capable app to localhost:1080 it will be using the aggregate bandwidth of both your ISPs, we’re done! Not quite. If you’re using Linux there is another issue we need to address first.

If you’re using OSX or another BSD (I’m assuming they’re the same, but I’ve only tested on OSX) then you can skip this next section.

Linux Routing

The way tunnel splitter makes connections over a specific ISP is to bind the source address to the address of that ISP. The problem is that Linux by default does not route packets based on source address. So with our current setup all the packets will go out over the same ISP and the other one will be totally unused. The simplest way to fix this is to install the shorewall firewall. You don’t need to use it as a firewall but it provides the least involved solution (that I know of) to get Linux to route based on source addresses. The instructions can be found in README.shorewall.

There _is_ a way to set up the needed routing without shorewall but I haven’t looked into it too much yet and it seems to be a little more complicated.

BSD Routing

It works out of the box (at least on OSX).

Applications which don’t support SOCKS proxying

For these applications you can use transocks_em. It is a server which receives all connections from specific apps (via some routing magic) and then forwards them to a SOCKS proxy of your choice. I’ll make a post about it soon, so stay tuned.

Other Uses

Tunnel splitter can also be useful in single ISP situations. As I mentioned above some ISPs impose per TCP connection bandwidth limits. If you are just downloading a file you can often overcome these with download acceleration software which makes multiple HTTP requests. This doesn’t work however for things like streaming video (youtube, hulu, etc). Tunnel splitter solves this problem because it allows you to have a single TCP connection use your ISP’s maximum total bandwidth.

The fact that connections will never be dropped by tunnel splitter can be useful as well. You can sleep or hibernate your computer, or all your internet connections can go down for any amount of time, and when you restore things your remote connection will still be active. This works because of a few things. First, whatever program you are proxying is actually connected to localhost (to the tunnel splitter client), so your ISPs going down doesn’t affect that connection. Second, the remote connection from the tunnel splitter server to the final destination is of course not affected by your computer’s local status. Third, it is only the tunnel splitter client and server which will detect that the tunnel connections have gone down. But as a side effect of its redundancy protocol, it will keep retrying forever to reconnect the tunnel connections. Once they do reconnect, any pending data on either side will be sent across as if nothing ever happened. As far as the client you are proxying and the remote server you are proxying to are concerned, they have both had active TCP connections the entire time. Of course if you are proxying a protocol which does its own protocol level connection health checking or pinging (IRC), it will detect that there are no responses and most likely drop the connection. I have experimented a little with making both the tunnel splitter client and server protocol aware so they can respond to these pings, which would let even a protocol with connection health checks stay up indefinitely. Not sure if this is something people will really care much about though.

Reliability

Tunnel splitter is still very beta, although I have been using it locally for all my web traffic and movie streaming for close to a year now, and it’s been quite a while since I’ve run into any bugs.

Feedback

All comments, criticisms, and suggestions welcome. Also feel free to contact me at coderrr.contact@gmail.com

Check out the README for more info.
