Saturday, 28 January 2012

Escaping the Google cave

1. The problem with user tracking

A few years ago, a user complained that the Waf project was using Googlecode, and threatened to stop using Waf if it remained hosted on the Google server. I thought that this was paranoid at the time, and I just forgot about that request...

Now a few years have passed, and it is a bit late to move to Github. Also, tons of websites are now hosted by Google, and it would be impossible to avoid all of them. But this is not my main concern. Rather, I have fallen into the bad habit of logging in to my Google or Facebook accounts more often than I clear my cookies, and I am getting targeted content too often.

For instance I started to notice that it was much harder to obtain information from Google search. I would frequently find Python in all my search results. Searching for Java programming techniques would lead me to more Python sites. Searching for Scons or CMake would only lead me back to Waf. I would also get ads related to the contents of my emails. In other words, the Google tracking had started to create a convenient place where I would always find familiar information.

Since I was a child, I have always had the important feeling or belief that there exists a world independent of me, a reality that is worth exploring (solipsists may disagree). It is a virtue to try to know the world as it is and not as one would like it to be. The web is interesting because it gives an opportunity of getting other views easily, and to explore a world that is not limited or bound by a particular view, and I would like to keep it this way.

2. Filtering tracking websites


First of all, I believe that the "do not track" cookie is one of the most idiotic inventions created recently. If I were creating a website and someone set that cookie, I would really love to track that someone in the sneakiest way possible. Setting it is equivalent to wandering around with a big sign reading "kick me" stuck on your back.

One of the first things I tried was to block the scripts that report the pages I am visiting, for example Google Analytics. This is simple to do by editing the file /etc/hosts:

127.0.0.1 www.google-analytics.com
127.0.0.1 google-analytics.com
127.0.0.1 ssl.google-analytics.com

There are many more addresses to exclude, however, and this does not prevent Google from reading your mail. It is enough to log in once to Googlemail to get targeted ads and personalized content again.
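
Since the list of addresses grows quickly, the entries can be generated rather than typed by hand. The following is only a sketch: the function name hosts_entries is made up for this example, and it prints to standard output so that the result can be reviewed before anything is appended to /etc/hosts as root.

```shell
#!/bin/sh
# hosts_entries (hypothetical helper): print one "127.0.0.1 <host>" line
# for each host name given as an argument.
hosts_entries() {
    for host in "$@"; do
        printf '127.0.0.1 %s\n' "$host"
    done
}

# Example: the three Google Analytics hosts listed above.
hosts_entries www.google-analytics.com google-analytics.com ssl.google-analytics.com
```

Once the output looks right, it can be appended with something like "hosts_entries ... | sudo tee -a /etc/hosts".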

The filtering approach is also imperfect: if it becomes widespread, some websites will start breaking when the tracking is blocked. It would also be easy to test in JavaScript which hosts are blocked, and to use that list as a fingerprint of the user. This goes back to the principles of information theory: if you have a secret, it will leak eventually, however hard you try to keep it.

3. Setting up multiple identities

Trying to filter the websites is just too complicated, and removing HTTP cookies, Flash cookies, visited pages and website preferences, and changing the user-agent on top of that, is just too much of a hassle. Websites may also try cache timing attacks to get more information on you anyway.

Tor is nice but limited in terms of features and speed (no Flash, use the Tor browser, etc). Virtual machines are convenient but use a lot of resources; for example, Flash and JavaScript are too slow in them to be usable. I keep them for untrusted websites (and with Flash disabled anyway).

The best success I have had so far is by setting up multiple Linux user accounts and multiple identities. I keep my current account for normal stateless activities, and use the other accounts for stateful operations. For example, I created a user account named "google" for all googlemail and googlecode-related activities:

# useradd -g users google
# mkdir /home/google
# echo "export DISPLAY=:0" >> /home/google/.bashrc
# chown google /home/google/.bashrc /home/google

The current user account must allow the windows of the other user accounts to be displayed on the current Xorg session. Make sure to always use xhost +local:accountname, without a space after the plus sign, because a bare "xhost +" turns off access control for everyone:

echo "xhost +local:google" >> ~/.profile

To obtain any sound, it is necessary to tweak the pulseaudio settings. First, the file /etc/pulse/default.pa must be copied to ~/.pulse/default.pa and modified to allow connections from other user accounts:

> diff -urN /etc/pulse/default.pa ~/.pulse/default.pa 
--- /etc/pulse/default.pa       2011-10-30 03:59:03.000000000 +0100
+++ .pulse/default.pa   2011-12-01 00:34:34.537118644 +0100
@@ -158,3 +158,6 @@
 ### Make some devices default
 #set-default-sink output
 #set-default-source input
+
+load-module module-native-protocol-tcp auth-ip-acl=127.0.0.1
+

Then a new file must be added to each of the other user accounts, for example /home/google/.pulse/client.conf:

default-server = 127.0.0.1
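
The same file can be created for several accounts at once with a small loop. This is a minimal sketch under assumptions: the function name write_pulse_client_conf and the idea of passing home directories as arguments are invented for this example, and it needs enough privileges to write into those directories.

```shell
#!/bin/sh
# write_pulse_client_conf (hypothetical helper): create .pulse/client.conf
# in each home directory given as an argument, pointing the PulseAudio
# client at the server listening on the loopback address.
write_pulse_client_conf() {
    for home in "$@"; do
        mkdir -p "$home/.pulse" || return 1
        printf 'default-server = 127.0.0.1\n' > "$home/.pulse/client.conf" || return 1
    done
}

# Example (as root): write_pulse_client_conf /home/google
```

If this runs as root, the new .pulse directory should probably be handed over to the account afterwards with chown, as was done for the home directory above.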

After that, a web browser can be started easily:

sudo su
su - google
firefox
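
Typing these commands for every identity gets tedious. One possible shortcut, sketched here under the assumption that sudo is configured for the current account, is a small wrapper function. The name browse_as and the DRY_RUN variable are invented for this example, and DISPLAY=:0 is taken from the .bashrc line above.

```shell
#!/bin/sh
# browse_as (hypothetical helper): start a program as another local account.
# Usage: browse_as ACCOUNT PROGRAM [ARGS...]
# With DRY_RUN=1 the command is printed instead of being executed.
browse_as() {
    account=$1
    shift
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "sudo -u $account -H env DISPLAY=:0 $*"
    else
        # -u selects the target account, -H sets HOME to its home directory
        sudo -u "$account" -H env DISPLAY=:0 "$@"
    fi
}

# Example: only print what would be run for the "google" account.
DRY_RUN=1 browse_as google firefox
```

Passing DISPLAY through env means the wrapper does not depend on the target account's shell startup files being read.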

To make certain that I do not confuse the accounts (the web browser completion already helps a lot), I am also using different web browsers with different versions, different extensions (noflash, firebug, etc), different themes and different language settings. For example, looking at ads in Serbian or in Russian is fun. The Firefox themes (Personas) even have animations to help remember which window is which.

4 comments:

  1. Whenever I want to temporarily 'detach' from my google account (like if I want to do an unmodified search or login to a different account without logging out of my primary account) I just pop open an 'incognito' window in chrome. I'm sure firefox has a similar feature.

  2. The browser cache is still around, the browser settings are the same, and there is still only one directory for storing the flash cookies. The incognito mode will also forget the changes you made (website settings, passwords) after the tab is closed. This may be nice for hiding visited porn sites from your wife (or your parents), but not if you want to browse the web in different languages.

  3. Why do you say it is a bit late to move to Github?

    Also, consider Mercurial (and Bitbucket.org). You can import your entire Subversion history with the convert extension.

  4. (Haha, I just noticed this was January *2012*, not *2013*. Oh well, my question and suggestion still stand.)
