Cookies
One of the simplest ways to track a user is to set a cookie in the user’s browser. A cookie is a small piece of information that can be set by a website, and which the browser will send back to the site each time it requests any resource from that site. A website that a user visits can set a cookie itself. This kind of cookie is called a first-party cookie, since the user has explicitly visited the website that sets it. Websites can include resources that are loaded from other websites, and these other websites can set their own cookies when the browser loads the included resources. These other cookies are called third-party cookies.
Definition of a Cookie
We can blame Netscape Communications Corporation for inventing the online cookie in 1994 and applying for a patent on it in 1995.1 The original purpose of the cookie was to facilitate online shopping by enabling the creation of a shopping cart.2 Cookies were later standardized across browsers according to RFC 2109,3 RFC 2965,4 and RFC 6265.5 An RFC, or “Request for Comments,” document is the mechanism by which Internet standards typically evolve. RFCs are published by the Internet Engineering Task Force (IETF).
A cookie is a small piece of data containing name=value pairs. The server hosting a website, or even JavaScript code running within a web page, can set a cookie in the user’s browser. Each time the browser makes a request to the same website, it automatically transmits the cookie as part of the request.
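As a minimal sketch of the exchange just described, the following Python snippet uses the standard library's `http.cookies` module to build the header a server would send and to parse what the browser would send back. The cookie name `cart_id` and its value are hypothetical, and the "browser" is only simulated by a string.

```python
from http.cookies import SimpleCookie

# Server side: build the value of a Set-Cookie response header.
outgoing = SimpleCookie()
outgoing["cart_id"] = "abc123"        # hypothetical name=value pair
outgoing["cart_id"]["path"] = "/"     # send the cookie for every path on the site
set_cookie_header = outgoing["cart_id"].OutputString()

# Browser side (simulated): later requests carry only the name=value pair,
# in the Cookie request header.
cookie_header = "; ".join(f"{m.key}={m.value}" for m in outgoing.values())

# Server side again: parse the Cookie header from the incoming request.
incoming = SimpleCookie()
incoming.load(cookie_header)
cart_id = incoming["cart_id"].value
```

Attributes such as `Path` travel only from server to browser; the browser returns just the name and value.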
Cookies come in several varieties:
- First-Party Cookie: A cookie set by a website that the user explicitly visits. More precisely, a first-party cookie is one set by a website that the user’s browser visits directly. There are techniques available for creating first-party cookies from websites the user didn’t intend to visit.
- Third-Party Cookie: A cookie set by a website whose content is embedded in the page that the user actually visits. The user does not directly try to visit the site that sets the third-party cookie.
- Session Cookie: A cookie that expires when the user closes the browser, at which point it is removed from the user’s computer. Session cookies can be first-party or third-party.
- Persistent Cookie: A cookie that is saved on the user’s computer and is not automatically removed when the user closes the browser. Persistent cookies can be first-party or third-party.
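The two distinctions in the list above (who set the cookie, and how long it lives) are independent of each other, which a short Python sketch can make concrete. The `Cookie` class, the host names, and the exact-host comparison are all assumptions made for illustration; real browsers compare registrable domains and handle more attributes than this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cookie:
    name: str
    set_by: str                     # host that set the cookie
    max_age: Optional[int] = None   # lifetime in seconds; None if unspecified


def lifetime(cookie: Cookie) -> str:
    # No explicit lifetime means the cookie is discarded when the browser closes.
    return "session" if cookie.max_age is None else "persistent"


def party(cookie: Cookie, visited_host: str) -> str:
    # Simplification: exact host comparison instead of registrable domains.
    return "first-party" if cookie.set_by == visited_host else "third-party"


# Hypothetical visit: the user opens news.example, which embeds a widget
# served from tracker.example.
login = Cookie("session_id", set_by="news.example")
widget = Cookie("uid", set_by="tracker.example", max_age=365 * 24 * 3600)

labels = [(c.name, party(c, "news.example"), lifetime(c)) for c in (login, widget)]
```

Here `login` comes out as a first-party session cookie and `widget` as a third-party persistent cookie, but any of the four combinations is possible.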
Cookies as Tracking Devices
A site can track a user simply by setting a cookie in the user’s browser on the first visit. Inside that cookie, the site stores a unique identifier for that user. Each time the user loads anything from the same website in the future (until the cookie expires or is cleared), the unique identifier is sent to the site. As long as the unique identifier remains in place on the user’s computer, the user can be tracked. If the user has ever signed into the site with an email address or other identifier, the unique identifier in the cookie can be tied to a specific person, and that person can be tracked.
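The mechanism in the paragraph above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular site is implemented: the cookie name `uid`, the one-year lifetime, the `handle_request` function, and the in-memory `visits` log are all hypothetical.

```python
import uuid
from http.cookies import SimpleCookie

visits = {}  # unique identifier -> paths requested under that identifier


def handle_request(path, cookie_header=""):
    """Log one request; return (uid, Set-Cookie value or None)."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    if "uid" in jar:
        # Returning browser: the identifier arrives with the request.
        uid = jar["uid"].value
        set_cookie = None
    else:
        # First visit: mint an identifier and ask the browser to keep it.
        uid = uuid.uuid4().hex
        out = SimpleCookie()
        out["uid"] = uid
        out["uid"]["max-age"] = 365 * 24 * 3600
        set_cookie = out["uid"].OutputString()
    visits.setdefault(uid, []).append(path)
    return uid, set_cookie


# Simulated browser: the second request sends back the cookie from the first.
first_uid, first_header = handle_request("/home")
second_uid, second_header = handle_request("/products", f"uid={first_uid}")
```

After the two simulated requests, `visits` holds both paths under a single identifier, which is all that tracking requires.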
One example of how a user can be tracked even when signed out occurs with Facebook. If a user ever signs into Facebook (a service for which they must provide an email address upon registration), a unique identifier is set in a cookie provided by Facebook. Since many websites on the Internet include a Facebook “like” button, which is loaded from Facebook’s servers, any cookies that were set by Facebook are transmitted every time the “like” button loads. In this way, Facebook can track users across the Internet and tie them back to the real person behind the browser.
Note that Facebook is not alone here. Google runs one of the largest cross-site tracking operations through its Google Analytics platform. Google Analytics is used by many sites on the Internet, so if a user signs into Google and then later visits any website using Google Analytics, the technology enables Google to identify that person on every site that uses Analytics (whether or not Google actually does so is another matter). Although Facebook and Google are two prominent examples, other companies both large and small also participate in the surveillance economy and may employ similar techniques.
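To illustrate how an embedded resource lets one company follow a browser across unrelated sites, here is a small Python simulation of the tracker's side. It assumes the tracker learns the embedding page from the `Referer` request header, which is one common way but is an assumption of this sketch rather than something stated above; the identifier `u-42`, the site names, and the `widget_request` function are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit

history = defaultdict(list)  # tracker's identifier -> sites where the widget loaded


def widget_request(uid, referer):
    """Record one load of the embedded widget.

    uid     -- value of the tracker's own cookie, sent automatically by the browser
    referer -- URL of the page that embedded the widget (assumed to be available)
    """
    history[uid].append(urlsplit(referer).hostname)


# The same browser loads the widget while visiting two unrelated sites.
widget_request("u-42", "https://news.example/story/1")
widget_request("u-42", "https://shop.example/cart")
```

Neither `news.example` nor `shop.example` shares anything with the other, yet the tracker ends up with a combined browsing history for `u-42`.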
Users can manually clear their cookies, or they can configure their browsers to clear cookies automatically whenever the browser is closed. It is important to note that as of early 2023, none of the mainstream browsers (Chrome, Edge, Firefox, and Safari) default to clearing cookies automatically. Unless the user chooses a browser specially modified for privacy (such as LibreWolf), clearing cookies automatically requires the deliberate configuration of a browser setting.
An additional feature of most modern browsers is the ability to reject third-party cookies. However, JavaScript can be used to generate requests directly from the browser to arbitrary websites, thereby bypassing simple implementations of third-party cookie restrictions. The JavaScript code causes the browser to make a direct connection to the third-party website, allowing a first-party cookie to be set. The default rules for third-party cookies vary among the mainstream browsers, so some user intervention might be required to turn on third-party cookie blocking.
Zombie Cookies
Some website operators arrogantly believe that they are entitled to track users who choose to clear their cookies. This arrogance is driven by greed, since tracking users is a lucrative business. Consequently, web developers have misused other technologies in ways that are designed to make it harder for the user to remove tracking technologies. These approaches are highly unethical, since they intentionally bypass settings the user explicitly made (remember that clearing cookies automatically isn’t the default in mainstream browsers).
Technologies that utilize alternative mechanisms to ensure a user continues to be tracked are collectively called zombie cookies, even though several unrelated techniques are used to implement them, and there are no unified standards for defining them. One such technique is to collude with Internet Service Providers (ISPs) to supply a unique identifier for each subscriber. This identifier can then be used to re-generate any deleted cookies.6
By using other storage mechanisms in the browser, a website can store the same tracking information on a user’s computer that would otherwise be stored in a cookie. However, unlike a cookie, this information is harder to clear without additional effort on the user’s part. The resulting tracking mechanism is stored on the user’s computer like a cookie but does not follow the same Internet standards for cookies. An open-source JavaScript-based implementation of this type of zombie cookie is Samy Kamkar’s Evercookie.7
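The respawning idea can be shown with a toy Python model in which each browser storage area is just a dictionary. The storage names, the identifier `u-42`, and both functions are hypothetical, and the sketch does not reflect how Evercookie itself is written.

```python
def write_everywhere(uid, stores):
    """Copy the identifier into every available storage area."""
    for store in stores.values():
        store["uid"] = uid


def respawn(stores):
    """If any copy survived, restore the identifier everywhere and return it."""
    for store in stores.values():
        if "uid" in store:
            uid = store["uid"]
            write_everywhere(uid, stores)
            return uid
    return None  # every copy was cleared, so the identifier is gone


# Simulated browser with three independent storage areas.
browser = {"cookie": {}, "local_storage": {}, "cache": {}}
write_everywhere("u-42", browser)

browser["cookie"].clear()      # the user clears cookies only
restored = respawn(browser)    # the next page load puts the cookie back
```

The identifier is lost only when every storage area is cleared at the same time, which is why clearing cookies alone is not enough.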
Back when browsers widely supported plugins for handling different types of content, one way that zombie cookies could be created was to use Flash Player persistent storage to re-generate standard cookies every time the plugin was loaded. This approach led to litigation, since the use of Flash storage bypassed the user’s security and privacy settings.8 Today, a number of other browser storage areas can be misused to store tracking information, including HTML5 offline storage. The Hypertext Transfer Protocol (HTTP), which is the language spoken by the browser when communicating with a website, supports various cache control mechanisms that can also be abused for tracking. One such mechanism is the entity tag, or ETag.9 Avoiding this type of cookie requires configuring the browser to clear everything when it is closed, including all offline website data, cache, and other types of storage.
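A rough Python sketch of ETag abuse follows. It relies on standard HTTP behavior: a browser that has cached a resource with an `ETag` sends that tag back in an `If-None-Match` header when it revalidates the cached copy. The function name and the choice of a random hexadecimal identifier as the tag are assumptions made for illustration.

```python
import uuid


def serve_tracking_resource(request_headers):
    """Return (status, response_headers, uid) for one request."""
    tag = request_headers.get("If-None-Match")
    if tag is not None:
        # Revalidation of a cached copy: the tag itself identifies the browser.
        return 304, {"ETag": tag}, tag.strip('"')
    # First request: hand out a fresh identifier disguised as a cache validator.
    uid = uuid.uuid4().hex
    headers = {
        "ETag": f'"{uid}"',
        "Cache-Control": "private, no-cache",  # store it, but revalidate each time
    }
    return 200, headers, uid


# Simulated browser: the second request revalidates using the stored ETag.
status1, headers1, uid1 = serve_tracking_resource({})
status2, headers2, uid2 = serve_tracking_resource({"If-None-Match": headers1["ETag"]})
```

No cookie is involved at any point, so clearing cookies leaves this identifier intact until the browser cache is cleared as well.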
Certain web browsers also enable bypassing cookies by providing proprietary storage interfaces that can be abused by websites using JavaScript or WebAssembly. Some other browsers, particularly for mobile phones and similar devices, may transmit unique advertising identifiers with each Internet request. These unique identifiers are an out-of-the-box zombie mechanism, since they permit the regeneration of standard cookies (and the re-tracking of users) by matching the advertising ID.
Beyond the Cookie
As users become more privacy-aware and sue companies that bypass browser security mechanisms with zombie cookies, developers are looking for ways to accomplish the same user profiling in a way that minimizes liability. One of Google’s attempts at this approach is to have the browser track users’ behaviors and group each user into advertisement-targeting categories. Only the category is reported to websites, allowing for ads to be targeted at broad demographic groups. This approach, called Federated Learning of Cohorts (FLoC), was silently enabled on randomly selected Google Chrome installations (without user consent) in early 2021.10 Google slightly modified and re-branded this mechanism as the Topics API in 2022.11
While letting the browser determine the interest group of the user might sound like an improvement on paper, there are a number of issues with this approach. First, there is no mechanism for forcing websites to use browser-supplied categories instead of traditional tracking techniques. As a result, the category data sent to each website will simply add entropy that can improve the identification of unique users. Other issues include the control of categories by a single company and the potential for the browser to give away more sensitive information about the user (including protected attributes such as race, gender, or sexual orientation) than is widely achievable with the existing cookie-based model. For these reasons, the World Wide Web Consortium (W3C) rejected standardization of the Topics API in early 2023.12
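The entropy argument can be made concrete with a little arithmetic, sketched here in Python. All of the numbers are hypothetical and chosen only to keep the calculation simple, and the sketch assumes the two attributes are statistically independent, which real browser attributes often are not.

```python
import math


def surprisal_bits(fraction):
    """Bits of identifying information in a trait shared by this fraction of users."""
    return -math.log2(fraction)


# Hypothetical figures: a browser configuration shared by 1 user in 1,024
# and an interest category shared by 1 user in 256.
config_bits = surprisal_bits(1 / 1024)
category_bits = surprisal_bits(1 / 256)

# Assuming independence, the bits simply add.
total_bits = config_bits + category_bits

# Expected number of users who share both traits in a hypothetical
# population of 2**32 users.
population = 2 ** 32
before = population / 2 ** config_bits
after = population / 2 ** total_bits
```

Under these assumptions, adding the category shrinks the crowd a user hides in by a factor of 256, even though the category by itself identifies nobody.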
Legal Regulation
Finally, one might ask whether governments have done anything to address the issues presented by storing tracking cookies on users’ computers. The European Union’s General Data Protection Regulation (GDPR), which we will study later in this course, has been interpreted by a European court to require that users be given notice before a cookie is set in their browsers.13 As a result, annoying cookie messages now appear on websites everywhere, which probably does little to improve user privacy, since most users will just click “accept” to get rid of the message.
It appears unlikely that the United States will do much about online tracking at the federal level, considering that several of the major tracking companies – the so-called FANG group – are among the wealthiest corporations in the country. These companies actively engage in lobbying efforts to prevent the adoption of new privacy regulations and laws.14
References and Further Reading
1. Lou Montulli. “Persistent Client State in a Hypertext Transfer Protocol Based Client-Server Mechanism.” United States Patent 5,774,670. Available on Google Patents.
2. Ibid., Fig. 5.
3. D. Kristol and L. Montulli. “HTTP State Management Mechanism.” IETF RFC 2109. February 1997.
4. D. Kristol and L. Montulli. “HTTP State Management Mechanism.” IETF RFC 2965. October 2000.
5. A. Barth. “HTTP State Management Mechanism.” IETF RFC 6265. April 2011.
6. Julia Angwin and Mike Tigas. “Zombie Cookie: The Tracking Cookie That You Can’t Kill.” ProPublica. January 14, 2015.
7. Samy Kamkar. Evercookie.
8. OUT-LAW. “Web users sue companies claiming use of Flash cookies is a hack.” August 19, 2010.
9. R. Fielding (Ed.) and J. Reschke (Ed.). “Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests.” IETF RFC 7232. June 2014.
10. Bennett Cyphers. “Google Is Testing Its Controversial New Ad Targeting Tech in Millions of Browsers. Here’s What We Know.” Electronic Frontier Foundation. March 30, 2021.
11. Frederic Lardinois. “Google kills off FLoC, replaces it with Topics.” TechCrunch. January 25, 2022.
12. Thomas Claburn. “Shot down: Google’s grand fancy plan for pro-privacy targeted ads.” The Register. January 18, 2023.
13. Opinion of Advocate General Bobek delivered on 19 December 2018. Fashion ID GmbH & Co.KG v Verbraucherzentrale NRW eV. Available from EUR-Lex.
14. Carole Cadwalladr and Duncan Campbell. “Revealed: Facebook’s global lobbying against data privacy laws.” The Guardian. March 2, 2019.