In my last blog post about browser fingerprinting for bot detection, I mentioned that this approach is vulnerable to replay attacks. Defending against replay attacks with client-side JS seems impossible, but that’s actually not always the case.
In 2016, members of Google’s anti-abuse team, led by Elie Bursztein, published an amazing paper named “Picasso: Lightweight Device Class Fingerprinting for Web Clients”. There are also great slides that summarize the paper. For some reason, this work gained little traction in the bot detection world, so I decided to publish a summary of it.
Picasso is a system that allows a server to identify the device class of a web client. A device class is defined as the combination of browser, OS, and graphics hardware. That is, Picasso is not intended to identify unique web visitors or bots specifically, but rather to distinguish, with high certainty, between different device classes.
However, this capability is important in the context of bot detection, as many bots lie about their underlying technology in the user agent string in order to appear legitimate and get targeted with high-paying ads. As illustrated in the paper:
In addition, Picasso’s threat model assumes the lying user agent is actively trying to circumvent the fingerprinting system and appear legitimate. So, how does this system work?
The basic principle behind it is to use the graphics rendering system of a device as a physically unclonable function. That is, the output of a web browser’s graphics, such as a canvas, depends on many different layers, from hardware (GPU), to low-level software (GPU driver, OS rendering), to high-level software (OS- and library-provided graphics APIs). This makes the output highly distinctive per device class, and allows accurate differentiation between them.
This principle is implemented as a challenge-response system, where the server sends the client a challenge composed of a random seed, a number of iterations N, and a set of graphical instructions such as quadratic curve, Bézier curve, circle, and font. The client then needs to render these graphical instructions and hash the output. The client is required to repeat this step for N iterations, hashing the canvas output along with the previous result.
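As a rough illustration of the iterated render-and-hash loop described above, here is a minimal Python sketch. It is not the paper’s code: a real client would draw on an HTML canvas and read its pixels back, whereas the hypothetical `render_stub` below just fakes that output, with the made-up `device_quirk` string standing in for whatever the GPU, driver and OS contribute to the rendering. The choice of SHA-256 and the names `solve_challenge` and `PRIMITIVES` are also assumptions.

```python
import hashlib
import random

# Primitive names taken from the description above; the encoding is made up.
PRIMITIVES = ("quadratic_curve", "bezier_curve", "circle", "font")


def render_stub(rng, device_quirk):
    """Stand-in for drawing one seeded primitive on a canvas and reading it back.

    A real client would return the canvas contents; here the device-specific
    part of the output is faked with the `device_quirk` string.
    """
    primitive = rng.choice(PRIMITIVES)
    coords = [rng.randrange(256) for _ in range(4)]
    return f"{primitive}:{coords}:{device_quirk}".encode()


def solve_challenge(seed, iterations, device_quirk):
    """Chain the hash of each rendering with the previous result, N times."""
    rng = random.Random(seed)  # the seed drives which primitives get drawn
    response = b""
    for _ in range(iterations):
        pixels = render_stub(rng, device_quirk)
        response = hashlib.sha256(pixels + response).digest()
    return response.hex()
```

In this sketch, the same seed on the same simulated device always produces the same response, while changing either the seed or the device changes it, which is the property the rest of the system relies on.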
The results of these challenges, as said, are unique per device class, but also per random seed. The random seed is used to prevent replay attacks: a different seed yields a different hash, so a previously generated valid hash cannot be replayed against a new challenge.
However, because of this, the system needs to be initialized with a bootstrap phase, in which valid challenge-response pairs are generated by trusted clients and then saved in a DB that maps challenge -> response -> device class.
After the bootstrap phase, an unlimited number of new challenges can be created using a verification method similar to the old reCAPTCHA: sending the client one known challenge (equivalent to the known control word), paired with an unknown challenge (equivalent to the unknown word). A correct response to the known challenge validates the response to the new challenge, and vice versa. In order to avoid pollution attacks, where attackers submit a correct response to the known challenge paired with a wrong response to the new challenge, a threshold mechanism is applied: a number of clients must give the same response to the unknown challenge before it is validated.
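To make the threshold idea more concrete, here is a small Python sketch of how such a pool of challenges could be grown. It is only an illustration under my own assumptions: the class name `ChallengePool`, the per-client deduplication, and the choice to key everything by (challenge, claimed device class) are not taken from the paper.

```python
from collections import defaultdict


class ChallengePool:
    """Grows the set of known challenges from paired known/unknown submissions."""

    def __init__(self, threshold):
        self.threshold = threshold
        # (challenge_id, device_class) -> validated response
        self.known = {}
        # (challenge_id, device_class) -> response -> ids of clients that gave it
        self.votes = defaultdict(lambda: defaultdict(set))

    def add_known(self, challenge_id, device_class, response):
        """Bootstrap phase: record a response produced by a trusted client."""
        self.known[(challenge_id, device_class)] = response

    def submit(self, client_id, device_class, known_id, known_resp, new_id, new_resp):
        """Return True if the client answered the known challenge correctly."""
        if self.known.get((known_id, device_class)) != known_resp:
            return False  # failed the control challenge: ignore the vote
        key = (new_id, device_class)
        if key in self.known:
            return True  # already validated, nothing left to learn
        voters = self.votes[key][new_resp]
        voters.add(client_id)  # a set, so repeat votes from one client count once
        if len(voters) >= self.threshold:
            # Enough distinct clients agree: promote to a known challenge.
            self.known[key] = new_resp
            del self.votes[key]
        return True
```

With a threshold of 3, for example, a lone attacker who answers the control challenge correctly but lies about the new one contributes a single vote for a wrong response, which never gets promoted unless enough other clients agree with it.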
Verifying a response is easy: all it takes is to look up the DB and check whether the device class associated with the response matches the declared user agent string.
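The lookup can be pictured with a few lines of Python. This is a toy sketch, not the paper’s implementation: the DB is a plain dict, the device class labels are made up, and I assume the device class claimed by the user agent string has already been extracted into `declared_class` by some other code.

```python
def verify(db, challenge_id, response, declared_class):
    """Compare the device class implied by the response with the declared one.

    `db` maps (challenge_id, response) to the device class that produced it.
    """
    actual_class = db.get((challenge_id, response))
    if actual_class is None:
        return "unknown"  # no trusted client ever produced this response
    return "match" if actual_class == declared_class else "mismatch"


# Hypothetical bootstrap data: two device classes answering the same challenge.
db = {
    ("c1", "9f2a"): "safari-ios-iphone",
    ("c1", "41bc"): "chrome-linux-swrender",
}
```

Here a client that declares itself as Safari on an iPhone but returns `41bc` for challenge `c1` would come back as a mismatch, since that response belongs to a different device class.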
The cool thing about this system is that it allows accurate differentiation not only between different OSes and browsers, but also between hardware stacks. For example, here are the rendering differences between Safari on a real iPhone and on an emulator:
To summarize, Picasso is a really cool challenge-response system that allows tamper-resistant device class identification of web clients. Great work, Bursztein et al.!
P.S. Why Picasso, you ask? Look at the rendering results of some challenges: