Fri 29 Sep 2006
Is there a better way to track return visitors without worrying about cookies being deleted?
According to this article, there may be…
Clearing cookies is not enough to save your privacy
"Clearing cookies may not be enough as you may think. Your browser's cache is a valuable store of information. A JavaScript .js file resource which is generated dynamically when requested can have embedded a unique tracking ID and can live permanently in your browser's cache when sent with the right HTTP cache-control headers. This JavaScript file can then be called by pages. The script is never re-requested, and hence keeps the unique ID, and it can call resources on the server-side to track you. They just need to associate this unique ID once with your account (when you login first time after the ID was created), and they can set cookies back again later and track you anyway. The result is that you can be tracked uniquely even past the point where you clear your cookies (i.e., as if you never cleared your cookies to generate fresh ones)."
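As a rough illustration of the technique the quote describes, here is a minimal Python sketch of how a server might build such a dynamically generated .js response. The function name `build_tracking_script`, the JavaScript variable `trackingId`, and the one-year lifetime are assumptions made for this example; the article does not give them.

```python
import uuid

# Assumed cache lifetime for this sketch: one year, in seconds.
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def build_tracking_script(visitor_id=None):
    """Return (headers, body) for a dynamically generated .js response.

    The body embeds a unique ID. The Cache-Control header asks the browser
    to keep the file for a long time, so the same ID is reused on later
    page views without the script being requested again.
    """
    if visitor_id is None:
        # Hypothetical ID scheme: a random 32-character hex string.
        visitor_id = uuid.uuid4().hex
    body = 'var trackingId = "%s";\n' % visitor_id
    headers = {
        "Content-Type": "application/javascript",
        "Cache-Control": "public, max-age=%d" % ONE_YEAR_SECONDS,
    }
    return headers, body


if __name__ == "__main__":
    headers, body = build_tracking_script()
    print(headers)
    print(body)
```

Any page that includes this script can then read `trackingId` and report it back to the server, which is the step that survives a cookie purge as long as the cache entry does.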
"The fundament of the meantime exploit is that the server wishes to `tag' the client with some information that will later be reported back, allowing the server to identify a chain. Cookies are a good approach to this, but their privacy implications are well known and so Bob requires a more surreptitious approach.
The HTTP cache-control headers are perfect for this: the data is provided by the server, stored but not verified by the client, and then provided verbatim back to the server on the next matching request.
Two headers in particular are useful: Last-Modified and ETag. Both are designed to help the client and server negotiate whether to use a cached copy or fetch the resource again.
The general approach of meantime is that rather than using the headers for their intended purpose, Bob's servers will instead send down a unique tag for the client.
Last-Modified is constrained to be a date, and therefore is somewhat inflexible. Nevertheless, the server can reasonably choose any second since the Unix epoch, which allows it to tag on the order of one billion distinct clients.
ETag allows an arbitrary short string to be stored and passed. It is not so commonly implemented in user agents at the moment, and so not such a good choice."
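The Last-Modified variant described in the quote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names and the `handle_request` function are hypothetical, and the tag is simply an integer number of seconds since the Unix epoch, as the quote suggests.

```python
from email.utils import formatdate, parsedate_to_datetime


def tag_to_last_modified(client_tag):
    """Encode an integer tag (seconds since the epoch) as an HTTP date."""
    return formatdate(client_tag, usegmt=True)


def last_modified_to_tag(if_modified_since):
    """Recover the integer tag from the date the browser echoes back."""
    return int(parsedate_to_datetime(if_modified_since).timestamp())


def handle_request(request_headers, next_tag):
    """Return (status, response_headers, tag) for one request.

    request_headers: dict of incoming headers (hypothetical shape).
    next_tag: the tag to hand out if this client has none yet.
    """
    echoed = request_headers.get("If-Modified-Since")
    if echoed is not None:
        # Returning visitor: the browser sent back the date we chose.
        return 304, {}, last_modified_to_tag(echoed)
    # New visitor: send the tag disguised as a modification date.
    return 200, {"Last-Modified": tag_to_last_modified(next_tag)}, next_tag


if __name__ == "__main__":
    status, headers, tag = handle_request({}, 1000)
    print(status, headers, tag)
    status, _, tag = handle_request(
        {"If-Modified-Since": headers["Last-Modified"]}, 1001
    )
    print(status, tag)
```

Answering 304 on the repeat visit keeps the browser's cached copy, and with it the tag, in place. The same idea carries over to ETag, where the tag would travel in the `If-None-Match` request header instead.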

September 29th, 2006 at 3:44 pm
Though this would work, I don’t see it as a reliable solution. Headers sent to a server can easily be spoofed. All it would take is one rotten apple gaming the system and your stats would become skewed.
This applies to programming as well: some headers cannot be trusted because they can easily be spoofed (Last-Modified, ETag, etc.) or changed with a few clicks in some Firefox extensions.
The way to track without cookies? Track on the server side. This solution is much better and more reliable (yes, there are still issues with it). Using an analytics program in conjunction with your server logs and a custom server-side tracker will yield the best results; relying on one source can lead to inconsistencies. Using several tools and comparing across them will help narrow things down.
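As a rough sketch of the log-based side of this, the Python below scans access log lines and reports visitors seen on more than one day. It assumes the Apache "combined" log format and treats an IP address plus user agent pair as one visitor; both are assumptions for illustration, and the pairing is only an approximation (shared proxies and changing IPs will distort it).

```python
import re
from collections import defaultdict

# Assumed input: Apache "combined" log format lines.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def return_visitors(lines):
    """Return the set of (ip, user_agent) pairs seen on more than one day."""
    days_seen = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        key = (match.group("ip"), match.group("agent"))
        days_seen[key].add(match.group("day"))
    return {key for key, days in days_seen.items() if len(days) > 1}


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [29/Sep/2006:15:44:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Firefox/1.5"',
        '203.0.113.5 - - [30/Sep/2006:09:10:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Firefox/1.5"',
        '198.51.100.7 - - [29/Sep/2006:16:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "MSIE 6.0"',
    ]
    print(return_visitors(sample))
```

Comparing a count like this against what a cookie-based analytics tool reports is one way to do the cross-checking between sources.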
To me, this seems too much like a hack rather than a reliable way to track, but I know that cookies aren’t reliable either, and there are drawbacks to server-side stats.
What if a user has JS disabled? That is quite possible with so many security threats surrounding the PC world, and it rules out many things (including some third-party analytics such as Google).
November 3rd, 2006 at 2:42 pm
I agree there is some discrepancy with users’ cookies, but they are still better than log file analyzers.
November 9th, 2006 at 12:21 pm
RE: Geoff
I would be interested to hear more of your reasoning behind that.
Cookies, something that can be spoofed by a simple telnet request (or hacked through multiple times), versus a log analyzer of your raw server logs (which you then have to have the knowledge to interpret). I could simply turn off my cookies while browsing, but I can’t turn off your raw server log. Which one should you put more faith in?
Having your eggs in more than one basket will give you a more refined picture, but I fail to see how cookies are a better approach than log files.