Tor
Various methods exist for tracking a user’s Internet activities. Websites, service providers, law enforcement, and intelligence agencies can determine what a user is doing online if they monitor network traffic and create activity logs. One way for users to avoid such surveillance is by using anonymity software to hide their locations and activities. Tor is one such piece of software.
Anonymity on the Internet
Online activities generate logs that can be used to determine which sites a person visits. On each site, server logs record the actions a person takes. Using server logs alone, it is possible to associate an Internet Protocol (IP) address with a particular login and/or a set of actions taken on a website. Logs maintained by the Internet Service Provider (ISP) tie that IP address to an individual subscriber, which links the online activities to a specific location. A search of the computer(s) at that location can then link the activities to a unique individual. If encryption (HTTPS) is not used, the individual’s activities can also be viewed at the time of initial intercept, as shown in Figure 1.
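As a rough illustration of how server logs alone can tie an IP address to a login and a set of actions, the following Python sketch groups requests by client address. The log lines are invented, the addresses come from documentation-only ranges, and the Common Log Format layout is an assumption about how a given server might record requests.

```python
import re
from collections import defaultdict

# Hypothetical access-log lines (Common Log Format). The addresses, user
# name, and paths are invented for illustration.
SAMPLE_LOG = """\
203.0.113.7 - alice [10/Oct/2023:13:55:36 +0000] "POST /login HTTP/1.1" 200 512
203.0.113.7 - alice [10/Oct/2023:13:56:02 +0000] "GET /messages HTTP/1.1" 200 2048
198.51.100.23 - - [10/Oct/2023:13:57:10 +0000] "GET /index.html HTTP/1.1" 200 1024
203.0.113.7 - - [10/Oct/2023:14:01:44 +0000] "GET /search?q=illness HTTP/1.1" 200 4096
"""

LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" \d+ \d+'
)

def activity_by_ip(log_text):
    """Group requests by client IP and collect any login names seen."""
    profile = defaultdict(lambda: {"users": set(), "requests": []})
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        entry = profile[match["ip"]]
        if match["user"] != "-":
            entry["users"].add(match["user"])
        entry["requests"].append((match["time"], match["method"], match["path"]))
    return dict(profile)

profiles = activity_by_ip(SAMPLE_LOG)
for ip, entry in profiles.items():
    paths = [path for _, _, path in entry["requests"]]
    print(ip, sorted(entry["users"]), paths)
```

Note that the last request from 203.0.113.7 carries no login name, yet it still lands in the same profile as the earlier authenticated requests because the IP address is shared.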
It is difficult to stay anonymous online, as most sites implement tracking technologies that are specifically designed to link activities to individuals. If a website knows the identity of a person accessing a service, it becomes easier to target that individual with advertising that is customized for the person’s individual interests. Advertising technology companies work with various commercial websites to try to collect as much data as possible about individuals, including sensitive personal information.1
Some sites actively encourage users to share information, which is then assimilated into the advertising databases. For example, social media websites are engineered to extract information from their users, most of whom voluntarily supply copious details about themselves. When these data are coupled with logging and tracking data, it becomes easy to relate a person’s online identity to that individual’s real-life identity. Sites may attempt to reassure users with privacy policies and encrypted connections, but encryption protects only the contents of a connection, so it does little to prevent monitoring of a user’s browsing habits. As shown in Figure 2, both the user’s service provider and third parties can still determine where the user is going on the Internet, even if the exact details of the connection are hidden.
One might wonder why a person would want anonymity online in an age where people are used to sharing lots of data. While it is true that criminals have an obvious need for anonymity, there are also legitimate reasons for a law-abiding citizen to want to stay anonymous. For example, if a person simply wishes to avoid being tracked across the Internet by large corporations, an anonymity system can increase personal privacy. An anonymity system is also one way to escape the “filter bubble” of search results that major search engines create as they adapt to the user’s search history. Such anonymity could be beneficial for a person conducting research into a sensitive subject, such as a major illness, that the individual might not want incorporated into their regular search history.
Outside the United States, anonymity online could easily become a matter of life or death. In some countries, anonymity networks provide the only quasi-safe means of transferring information out of the country. Repressive, totalitarian regimes might otherwise successfully suppress reports of human rights violations. Furthermore, our own intelligence services may operate in such countries, or in others where the consequences of discovery could be dire. An anonymity network may be used to conceal such operations, using traffic from regular members of the public as cover so that use of the anonymity network does not, by itself, lead to discovery.
Finally, anonymity is important for law enforcement investigators working Internet-based cases. As previously noted, Web servers are able to log and record the IP addresses of any visitors. If the server is being operated by a criminal organization, the criminals will have access to the logs. By tracing the IP addresses of visitors using the Internet standard WHOIS protocol,2 the criminal organization could trace a visitor back to the law enforcement agency. The criminals would then know that law enforcement was monitoring their activity, and they could proceed to destroy evidence or relocate their operations. Once again, by hiding within an anonymity network that is open to anyone on Earth, law enforcement personnel are able to mask their identities and avoid tipping off the suspects that there is an ongoing investigation.
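To make the WHOIS step concrete, here is a minimal Python sketch of a query as RFC 3912 describes it: one line of text sent to TCP port 43, followed by reading until the server closes the connection. The default server name, the field names (`NetRange`, `NetName`, `OrgName`), and the sample response are assumptions chosen for illustration; real registries format their output differently.

```python
import socket

def build_whois_query(ip_address):
    """RFC 3912: the request is a single line of text ending in CR LF."""
    return ip_address.encode("ascii") + b"\r\n"

def whois_lookup(ip_address, server="whois.arin.net", timeout=10.0):
    """Send one query to TCP port 43 and read until the server closes.

    The default server is only an example; which registry holds the
    record depends on the address being looked up.
    """
    chunks = []
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(build_whois_query(ip_address))
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def registrant_fields(response, wanted=("NetRange", "NetName", "OrgName")):
    """Pick out 'Key: value' lines that name the holder of the address block."""
    fields = {}
    for line in response.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in wanted:
            fields.setdefault(key.strip(), value.strip())
    return fields

# Invented response text, so the parsing step can be shown without
# touching the network. A live lookup would be whois_lookup("192.0.2.1").
SAMPLE_RESPONSE = """\
# Invented response for illustration only
NetRange:       192.0.2.0 - 192.0.2.255
NetName:        EXAMPLE-NET
OrgName:        Example City Police Department
"""

print(registrant_fields(SAMPLE_RESPONSE))
```

A server operator who ran a lookup like this against each visitor address in the logs would see the organization name directly, which is the exposure the paragraph above describes.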
Onion Routing
One way to anonymize a person’s use of the Internet is to route their traffic through a series of intermediate systems between the user and the Web server to which the user is connecting. This technique, called onion routing, was developed at the United States Naval Research Laboratory in the mid-1990s.3 Researchers published the source code for an onion routing system under an open-source license, eventually creating The Tor Project.4
Tor (originally “The Onion Router”) uses layered encryption to send data through multiple intermediate systems, where each system only knows the addresses of the immediate previous and next systems in the chain.5 Whenever a user wishes to make a connection to a remote server, the user first connects to the Tor network, which routes the connection through a series of 3 intermediate systems before reaching the destination. As shown in Figure 3, the process begins with the user (Alice, in this example) starting Tor, which retrieves a list of Tor nodes from a directory server.
The Tor client randomly chooses an initial Tor node to which to connect. Once the Tor client has joined the Tor network, it builds a circuit through the network by routing the client’s traffic through a set of intermediate systems to the destination. Each connection within the Tor network is encrypted, and layers of encryption are used so that only the final intermediate system – called the exit node – has access to the unencrypted contents of the connection (Figure 4).
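The circuit-building step can be sketched in a few lines of Python. This is a simplified illustration, not Tor's actual path-selection algorithm: the `Relay` type, the directory contents, and the uniform random choice are all assumptions, and the real client applies additional weighting and constraints that are omitted here.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Relay:
    nickname: str       # hypothetical relay names
    address: str        # documentation-only IP addresses
    can_be_guard: bool  # suitable as the first hop
    can_be_exit: bool   # willing to forward traffic to the regular Internet

# Invented stand-in for the list retrieved from a directory server.
DIRECTORY = [
    Relay("relay1", "192.0.2.10", True, False),
    Relay("relay2", "192.0.2.20", True, False),
    Relay("relay3", "198.51.100.30", False, False),
    Relay("relay4", "198.51.100.40", False, False),
    Relay("relay5", "203.0.113.50", False, True),
    Relay("relay6", "203.0.113.60", False, True),
]

def build_circuit(directory, rng=random):
    """Pick three distinct relays: first hop, middle hop, exit node."""
    first = rng.choice([r for r in directory if r.can_be_guard])
    exit_node = rng.choice(
        [r for r in directory if r.can_be_exit and r != first]
    )
    middle = rng.choice(
        [r for r in directory if r not in (first, exit_node)]
    )
    return [first, middle, exit_node]

circuit = build_circuit(DIRECTORY, random.Random())
print(" -> ".join(relay.nickname for relay in circuit))
```

Passing a `random.Random` instance keeps the sketch easy to test with a fixed seed; a real client would draw from a cryptographically secure source.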
When the connection data (traffic) leaves Alice’s computer, it is protected with 3 layers of encryption. The first system to which Alice connects only has the necessary key to decrypt the outer layer. Once decrypted, the contents of the outer layer will include the address of the second intermediate system and a separately encrypted payload that will be sent to the second intermediate system. The system to which Alice connects directly has no way to decrypt this inner payload, since it lacks the proper key to be able to do so.
The second intermediate system in the chain (shown in Figure 4 as the middle computer with the green + sign on the screen) receives the connection from the first intermediate system. It does not know that the connection originally came from Alice. Instead, as far as this second node is concerned, the connection is coming from the first node. The connection carries the encrypted payload that was extracted and forwarded by the first machine. Unlike the first machine, this Tor node has the proper key to decrypt this payload, which it does. Inside the payload is the address of the next machine in the Tor circuit, along with yet another encrypted payload, which this second node does not have the proper key to decrypt. The second intermediate system forwards this innermost encrypted payload to the third machine in the circuit.
Upon reaching the third machine, the connection appears to be coming from the second intermediate machine. The third machine has the proper key to decrypt the payload sent from the second machine, which it does. There are no further layers of encryption: the third machine sends the unencrypted contents of the original traffic from Alice to the final destination (Bob, in the diagram). An eavesdropper could read this unencrypted data, but the origin of the traffic would appear to be the final node in the Tor circuit, not Alice.
This third machine in the circuit is called the exit node and routes traffic from the Tor network onto the regular Internet. Importantly, the third machine does not know the origin of the traffic, as it only knows the request came from the second machine. When the exit node receives a reply from the regular Internet server to which it has connected, it encrypts the reply with its own key and sends it back to the second machine in the circuit. The second machine adds its own layer of encryption and sends the reply to the first machine, which adds a third layer and returns the reply to Alice’s machine. The Tor client on Alice’s machine, which holds all three keys, is the only device that can remove every layer and read the reply.
At each step in a Tor circuit, one layer of encryption is removed. If an observer were to watch the whole process from outside the system, it would appear much like peeling the layers from an onion. An onion with its outermost layer removed still looks like an onion. Unless the observer had the ability to monitor each and every intermediate node in the circuit at the same time (acting as a so-called global adversary), it would not be easy to find the origin of the original connection (Alice).
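The outbound layering described above can be sketched in Python. This is a toy model that follows the simplified description in the text: the XOR keystream built from SHA-256 is a stand-in and is not secure, the JSON layer format is invented, and the hop names and `bob.example` destination are hypothetical. Tor's real protocol negotiates keys hop by hop and uses different cryptography and message formats.

```python
import hashlib
import json
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher (NOT secure): XOR data with SHA-256(key || offset) blocks.

    Applying it twice with the same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)

def wrap(key: bytes, next_hop: str, payload: bytes) -> bytes:
    """Add one layer: the next hop's address plus the still-opaque payload."""
    layer = json.dumps({"next": next_hop, "payload": payload.hex()}).encode()
    return keystream_xor(key, layer)

def peel(key: bytes, onion: bytes):
    """Remove one layer; only the holder of this layer's key can do so."""
    layer = json.loads(keystream_xor(key, onion).decode())
    return layer["next"], bytes.fromhex(layer["payload"])

# Alice shares a separate key with each of the three relays.
keys = {name: secrets.token_bytes(32) for name in ("entry", "middle", "exit")}
request = b"GET / HTTP/1.1"

# Alice wraps from the inside out: exit layer first, entry layer last.
onion = wrap(keys["exit"], "bob.example", request)
onion = wrap(keys["middle"], "exit", onion)
onion = wrap(keys["entry"], "middle", onion)

# Each relay peels exactly one layer and learns only the next hop.
hop, data, trace = "entry", onion, []
for _ in range(3):
    next_hop, data = peel(keys[hop], data)
    trace.append((hop, next_hop))
    hop = next_hop

print(trace)
print(data == request)
```

In this model the entry relay sees only the name of the middle relay, the middle relay sees only the name of the exit relay, and the plaintext request appears only after the exit relay removes the last layer.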
Given enough information about the Tor network, an observer without a simultaneous global view might be able to trace a connection back to its origin by using statistical methods to measure how many network packets were sent and received at multiple points within the network. To reduce the amount of data available for such analysis, Tor uses a circuit hopping system that creates new circuits periodically (Figure 5). This approach reduces the chance of de-anonymizing the user.
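One simple form of the statistical approach mentioned above is to compare packet counts per time window at two observation points and look for flows that rise and fall together. The Python sketch below uses the Pearson correlation coefficient for that comparison; the packet counts and flow names are invented, and real traffic-analysis attacks are considerably more sophisticated.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no timing pattern
    return cov / sqrt(var_x * var_y)

# Invented packets-per-second counts. The first series is what an
# observer sees entering the network from Alice; the others are flows
# seen leaving the network toward different destinations.
entry_alice = [12, 0, 3, 40, 38, 2, 0, 15]
exit_flows = {
    "flow_a": [11, 1, 3, 39, 37, 2, 1, 14],
    "flow_b": [5, 20, 18, 2, 0, 25, 22, 4],
    "flow_c": [9, 9, 10, 8, 9, 10, 9, 9],
}

scores = {name: pearson(entry_alice, counts) for name, counts in exit_flows.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

The shorter the time a circuit stays in use, the fewer windows an observer collects for any one pairing, which is one reason periodic circuit changes make this kind of matching harder.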
Effects of Tor
Without end-to-end encryption, such as HTTPS, it is still possible to intercept data as it leaves the Tor network (Figure 6). If the user were to transmit personally identifying information over Tor to an unencrypted destination (such as a regular HTTP website), the exit node or a third party monitoring the exit node’s traffic could identify the user.
For this reason, it is still necessary to use HTTPS connections to websites when using Tor, as the extra encryption between the browser and the final site helps protect the user from a malicious exit node or a third-party observer at the exit node (Figure 7).7 While the use of HTTPS makes it much more difficult to identify a Tor user, there are still potential ways to de-anonymize the user. For example, if the user logs into the website using credentials tied back to their identity, a warrant or subpoena for information from the site will identify the user. Even if the user does not log into the site using traceable credentials, they might still fail to use a separate contextual identity for separate activities using Tor. For example, a user might log into a system that reveals his or her identity while using the same Tor circuit to commit some illegal act. With sufficient surveillance, it might be possible to link the user to the illegal act even if the entire connection from the user to the final server cannot be traced.
Attacks on Tor
Government agencies and researchers are always trying to break Tor, looking for weaknesses in the system. Various agencies, such as the FBI and NSA, have an interest in breaking Tor in order to identify criminals who are using the system. Since a significant component of funding for Tor itself comes from the United States government, some agencies of the government are effectively trying to undo the work of other agencies, some of which have a vested interest in the security of Tor to keep agents alive in the field and collect valuable intelligence.9 This double position on Tor has become yet another example of our tax dollars in action.
One way in which the FBI and NSA have been successful in identifying Tor users is by exploiting the user’s Web browser. These types of attacks – dubbed Network Investigative Techniques (NITs) by the FBI10 and Computer Network Exploitation (CNE) by the NSA11 – do not compromise the Tor network directly. Instead, they target the endpoints of the connections, which are often the weakest links in the system. Attacks on the Tor network itself are difficult but not necessarily impossible. End-to-end traffic confirmation attacks, for example, might be feasible for an entity with sufficiently large surveillance of the global Internet (in other words, the NSA).
Exploits that target the Tor Browser, a special version of Firefox, have yielded successful results that have been covered in the press. For example, the EgotisticalGiraffe exploit from the NSA was used to identify an al-Qaeda member in 2013.12 Similar methods are believed to have been used in Operation Onymous, a multinational law enforcement operation in 2014 in which the FBI took part, which led to some media outlets incorrectly claiming that the entirety of Tor itself was compromised.13 Since there are always some legal and ethical concerns with deploying NITs and attacking potential suspects with malware, the FBI is not always successful with this approach. For example, in at least one known instance, they dismissed a charge rather than disclose the exact nature and source code of the NIT used to identify a user.14
Users can reduce the effectiveness of NITs by utilizing custom environments designed to limit the damage such malware can inflict. As of 2023, the Tails live distribution15 is probably the best-known of these environments. Instead of installing Tor on top of a regular operating system, the user boots the whole computer into the Tails environment, which runs off a USB flash drive or DVD media. Tails restricts all outgoing traffic to force it through the Tor connection, attempting to limit any side-channel communications that bypass Tor and identify the user. Even if an exploit inside Tails is successful, the environment randomizes the user’s network card physical address (Media Access Control (MAC) address) and does not store any data on the user’s hard drive unless explicitly requested to do so. Once the Tails environment is shut down, any forensically recoverable local evidence is quickly lost from system RAM. Consequently, it would be difficult to prove in court that a user was responsible for some act, unless a search warrant could be executed on the user’s physical system while Tails is active.
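The idea behind MAC address randomization can be shown with a short Python sketch. It only generates an address; it is not how Tails implements the feature, and applying the address to a network interface requires operating-system tools that are not shown. The function name is hypothetical.

```python
import secrets

def random_mac() -> str:
    """Generate a random MAC address that is safe to assign locally.

    In the first octet, bit 0 must be 0 (unicast, not multicast) and
    bit 1 is set to 1 to mark the address as locally administered
    rather than assigned by a hardware vendor.
    """
    octets = bytearray(secrets.token_bytes(6))
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{octet:02x}" for octet in octets)

mac = random_mac()
print(mac)
```

Because the address no longer matches the one burned into the network card, records of it on a local network cannot be matched to the physical hardware by address alone.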
Notes and References
- Gilad Edelman. “Why Don’t We Just Ban Targeted Advertising?” Wired. March 22, 2020.
- L. Daigle. WHOIS Protocol Specification. RFC 3912. Network Working Group.
- David M. Goldschlag, Michael G. Reed, and Paul F. Syverson. “Hiding Routing Information.” In: Information Hiding, 1996. Lecture Notes in Computer Science 1174. Springer-Verlag Berlin Heidelberg.
- The Tor Project. Tor: Overview.
- Image Credit: Electronic Frontier Foundation. License: CC-BY.
- Electronic Frontier Foundation. How HTTPS and Tor Work Together to Protect Your Anonymity and Privacy.
- Image Credit: Electronic Frontier Foundation (via The Tor Project). License: CC-BY.
- Patrick Howell O’Neill. “Former Tor developer created malware for the FBI to hack Tor users.” The Daily Dot. February 29, 2020.
- Iain Thomson. “Tor pedo’s torpedo torpedoed: FBI spyware crossed the line but was in good faith, say judges.” The Register. February 24, 2018.
- Bruce Schneier. “Attacking Tor: how the NSA targets users’ online anonymity.” The Guardian. October 4, 2013.
- Dan Goodin. “NSA repeatedly tries to unpeel Tor anonymity and spy on users, memos show.” Ars Technica. October 4, 2013.
- Kashmir Hill. “How Did The FBI Break Tor?” Forbes. November 7, 2014.
- Iain Thomson. “FBI let alleged pedo walk free rather than explain how they snared him.” The Register. January 6, 2017.