Hypertext Transfer Protocol
One common protocol used on the Internet – and in fact the same protocol that your browser used to retrieve this sentence – is the Hypertext Transfer Protocol (HTTP). HTTP is an example of an Application Layer (layer 7) protocol in the OSI model.
HTTP
As shown in Figure 1, the layer 7 protocol that enables websites to function is the Hypertext Transfer Protocol (HTTP). Several versions of HTTP are in use, including the original version 1.0,[1] version 1.1,[2] version 2,[3] and version 3.[4] HTTP versions 1.0 through 2 run over the Transmission Control Protocol (TCP)[5] at layer 4. HTTP version 3 instead runs over the QUIC protocol,[6] which is itself built on UDP.
As we’ve already noted, the Hypertext Transfer Protocol is the Application Layer protocol that is spoken by both the browser and the Web server. It is possible for the browser and server to encrypt this protocol using Transport Layer Security (TLS), which is the modern successor to Secure Sockets Layer (SSL). When HTTP runs over TLS, the combination is usually called HTTPS (HTTP Secure, or HTTP over TLS). However, it is important to note that the fundamental HTTP protocol is the same in either case. The HTTP messages are simply encrypted at one end and decrypted at the other when HTTPS is used.
HTTP is an example of a protocol designed for client-server communications. As illustrated in Figure 2, both the client and server have their own HTTP implementations. Since HTTP is a layer 7 protocol, it is the responsibility of both the client and server applications to implement the code (often via libraries) to “speak” HTTP.
Requests
An HTTP transaction begins with a request from the client to the server. The server receives the request, processes it, and transmits a response back to the client. For this reason, HTTP is an example of a request-response protocol. As is the case with other request-response protocols, it is the client’s responsibility to initiate each request-response cycle. Numerous HTTP clients are available for this purpose, including Web browsers, command-line transfer clients like curl[7] and Wget,[8] and software libraries that can be used in other programs. An example of a software library HTTP client can be found in Python Requests.[9]
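The request-response cycle can be tried without touching the network at large. The following is a minimal sketch that uses only Python's standard library (`http.server` and `http.client`) rather than the Requests library mentioned above. The handler class `HelloHandler` and the helper `fetch_once` are hypothetical names chosen for illustration, and the server listens only on the local machine at a port picked by the operating system.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical server side: answers every GET with a short text body."""

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()  # writes the blank line that ends the headers
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet


def fetch_once(path="/"):
    """Start a local server, make one request as a client, return the result."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # The client initiates the cycle; the server only ever responds.
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", path)
        response = conn.getresponse()
        result = (response.status, response.reason, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_once())
```

Both halves of the exchange are ordinary application code, which is what it means for HTTP to live at layer 7.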
Each HTTP request specifies a method that will be used. There are a number of methods defined in the HTTP protocol, but the primary ones used by browsers are GET and POST. When connecting to a website to request a page, the GET method is used. The POST method is normally used when submitting a form that contains data, but there is quite a bit of flexibility in website design. POST requests can be used for nonstandard purposes in some cases, and forms can use GET requests for submission.
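As a rough illustration of the difference between the two methods, the sketch below builds (but never sends) two requests with Python's standard `urllib`. The URL and form fields are made up for the example. It relies on the usual convention that a form submitted with GET carries its fields in the URL's query string, while a form submitted with POST carries them in the request body.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical URL and form fields, used only for illustration.
BASE_URL = "http://example.com/search"
form_fields = {"q": "http", "page": "1"}

# Form submitted with GET: the fields travel in the URL's query string.
get_request = Request(BASE_URL + "?" + urlencode(form_fields))

# Form submitted with POST: the same fields travel in the request body.
post_request = Request(BASE_URL, data=urlencode(form_fields).encode("ascii"))

# Nothing is sent over the network; we only inspect the request objects.
print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.data)
```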
Versions 1.0 and 1.1 of HTTP are implemented using plain text and are generally readable by humans (versions 2 and 3 use a binary framing instead). For example, the following request can be used to get the main page of my website:
GET /mmurphy2/index.html HTTP/1.1
Host: ww2.coastal.edu
The first line of this request specifies that the GET method is to be used. The path upon which the GET method should operate is the path to my home page, which is /mmurphy2/index.html on this server. The request is made using version 1.1 of HTTP. It was sent over a connection to TCP port 80 on the server, which is the standard port for unencrypted HTTP.
The second line of this request is a header, which provides additional information to the server about the request that is being made (and is therefore more precisely called a request header). In this case, version 1.1 of the HTTP protocol requires a header named Host with a value set to the hostname of the website to which the request is made. Multiple websites can be hosted from the same server, so it is necessary to identify the website that is being requested.
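To make the relationship between a URL and the request concrete, here is a small sketch that splits a URL into the pieces a client needs: the hostname (for the Host header), the TCP port to connect to, and the path (for the request line). The function name `request_target` and the `DEFAULT_PORTS` table are illustrative; port 443 for HTTPS is the standard default and is not taken from the example above.

```python
from urllib.parse import urlsplit

# Standard default ports: 80 for unencrypted HTTP, 443 for HTTPS.
DEFAULT_PORTS = {"http": 80, "https": 443}


def request_target(url):
    """Return (hostname, port, path) for an http:// or https:// URL."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]  # explicit port wins
    path = parts.path or "/"  # an empty path means the site root
    return parts.hostname, port, path


print(request_target("http://ww2.coastal.edu/mmurphy2/index.html"))
```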
Requests can include multiple headers, each of which is in a name: value format. The headers are followed by a blank line that indicates the end of the headers. It is possible for a request to contain a request body, which is data to be sent to the server, although none is shown in the above example.
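The layout just described (request line, headers, blank line, optional body) can be sketched as a small function that assembles the raw bytes of an HTTP/1.1 request. The name `build_request` is hypothetical. One detail that the listing above does not show is that HTTP/1.1 ends each line with a carriage return and line feed (`\r\n`). The automatic Content-Length header is a simplification for this example.

```python
def build_request(method, path, host, headers=None, body=b""):
    """Assemble the bytes of a simple HTTP/1.1 request (illustrative only)."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")  # each header is "name: value"
    if body:
        # Tell the server how many bytes of body follow the blank line.
        lines.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(lines) + "\r\n\r\n"  # the blank line ends the headers
    return head.encode("ascii") + body


print(build_request("GET", "/mmurphy2/index.html", "ww2.coastal.edu").decode("ascii"))
```

Called with the method, path, and host from the example above, this reproduces the two-line request followed by the terminating blank line.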
Responses
After receiving a request from a client, the server processes it and returns a response. The response to the request above looks like this:
HTTP/1.1 302 Found
Date: Fri, 17 Mar 2023 18:10:21 GMT
Location: https://ww2.coastal.edu/mmurphy2/index.html
Content-Length: 308
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="https://ww2.coastal.edu/mmurphy2/index.html">here</a>.</p>
<hr>
<address>Apache/2.4.41 (Ubuntu) Server at ww2.coastal.edu Port 80</address>
</body></html>
The first line of the response contains the HTTP version spoken by the server (also version 1.1 in this case). The 302 is an example of a response code (formally called a status code), which quickly tells the browser about the disposition of the request. A textual description follows the response code, which is mostly useful to humans debugging the protocol by hand. Common response codes include 200, 301, 302, 401, 403, 404, and 500. Code 200 (OK) is used when a resource has been located directly and is being returned to the client. Codes in the 300s are used for redirects. Code 401 is used when the server requires authentication, such as a password, before it will return the resource. Codes 403 and 404 are used when the browser doesn’t have authorization to access a particular page (403) or a page cannot be found at the path requested (404). If the server encounters an error of its own when handling the request, such as a programming bug, an error code in the 500s is used.
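The grouping of response codes by their first digit can be sketched with Python's standard `http.HTTPStatus` enumeration, which also supplies the textual description for each code. The function name `describe` and the category labels are illustrative wording, not official terminology.

```python
from http import HTTPStatus

# Illustrative labels for each class of code, keyed by the first digit.
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirect",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return the code, its standard textual description, and its class."""
    status = HTTPStatus(code)  # raises ValueError for an unknown code
    return f"{code} {status.phrase} ({CATEGORIES[code // 100]})"


for code in (200, 301, 302, 401, 403, 404, 500):
    print(describe(code))
```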
After the response code line, several response headers are sent back from the server. As shown in the above example, these include the date, the location to which the client is being redirected, the length of the response body, and the content type for the response. The length is given in bytes, while the content type contains two parts: the media type and the character encoding.
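The two parts of a content type can be separated with a few lines of Python. This sketch borrows the header parser from the standard `email` package, which understands the same `type/subtype; parameter=value` syntax. The helper name `split_content_type` is hypothetical.

```python
from email.message import Message


def split_content_type(value):
    """Split a Content-Type value into (media type, character encoding)."""
    msg = Message()
    msg["Content-Type"] = value
    # The charset is None when the header does not specify one.
    return msg.get_content_type(), msg.get_content_charset()


print(split_content_type("text/html; charset=iso-8859-1"))
print(split_content_type("image/png"))
```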
The server response uses a blank line to indicate the end of the response headers and the start of the response body. What is contained within the response body depends on the media type. In the above example, an HTML redirect page is returned to the browser. However, different kinds of files can be returned (for example, images, PDF documents, etc.) depending upon the request that the browser made.
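The role of the blank line can be seen by parsing a response by hand. The sketch below works on a shortened copy of the redirect response (fewer headers and an abbreviated body, so the numbers differ from the real exchange) and assumes HTTP/1.1's `\r\n` line endings. The function name `parse_response` is hypothetical, and real clients handle many cases that this ignores, such as repeated headers.

```python
# Shortened, hand-written copy of the redirect response for illustration.
RAW_RESPONSE = (
    b"HTTP/1.1 302 Found\r\n"
    b"Location: https://ww2.coastal.edu/mmurphy2/index.html\r\n"
    b"Content-Type: text/html; charset=iso-8859-1\r\n"
    b"\r\n"
    b"<h1>Found</h1>"
)


def parse_response(raw):
    """Split a raw HTTP/1.x response into its main parts."""
    # The first blank line separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


version, code, reason, headers, body = parse_response(RAW_RESPONSE)
print(version, code, reason)
print(headers["location"])
print(body)
```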
Media Types
Media types (also known as MIME types) tell the client what kind of data to expect.[10] Standard media types are registered with the Internet Assigned Numbers Authority (IANA).[11] Modern browsers are capable of handling and displaying many different types of media, which is part of the reason that they can be a privacy minefield. Common media types that browsers can handle include:
- Hypertext Markup Language (HTML) documents[12]
- Cascading Style Sheets (CSS)[13]
- JavaScript programs[14]
- WebAssembly programs[15]
- WebGL instructions[16]
- Various kinds of images, sound files, videos, PDF documents, and other embedded content
Web pages are largely composed of HTML documents with CSS used for styling purposes. The HTML document contains the structure and textual content of the page, while CSS tells the browser how to format HTML structures by applying colors, decorations, spacing, fonts, and other transformations. HTML documents might also embed images or programs for the browser to run, using technologies like JavaScript, WebAssembly, and WebGL.
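Servers commonly choose the media type to announce by looking at a file's extension. The sketch below uses Python's standard `mimetypes` module to do the same lookup. The helper name `media_type_for` and the example filenames are made up, and the `application/octet-stream` fallback for unknown files is a common convention rather than something stated above.

```python
import mimetypes


def media_type_for(filename):
    """Guess the media type a server might announce for a file."""
    media_type, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic "arbitrary bytes" type when nothing matches.
    return media_type or "application/octet-stream"


for name in ("index.html", "style.css", "logo.png"):
    print(name, "->", media_type_for(name))
```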
References and Further Reading
[1] Henrik Nielsen, Roy T. Fielding, and Tim Berners-Lee. “Hypertext Transfer Protocol – HTTP/1.0.” RFC 1945. May 1996.
[2] Roy T. Fielding, Mark Nottingham, and Julian Reschke. “HTTP/1.1.” RFC 9112. June 2022.
[3] Martin Thomson and Cory Benfield. “HTTP/2.” RFC 9113. June 2022.
[5] Wesley Eddy. “Transmission Control Protocol (TCP).” RFC 9293. August 2022.
[6] Jana Iyengar and Martin Thomson. “QUIC: A UDP-Based Multiplexed and Secure Transport.” RFC 9000. May 2021.
[10] Ned Freed and Nathaniel S. Borenstein. “Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types.” RFC 2046. November 1996.
[11] Internet Assigned Numbers Authority. Media Types.
[12] WHATWG. HTML Living Standard.
[13] World Wide Web Consortium. CSS Snapshot 2023. February 14, 2023.
[14] Ecma International. ECMA-262: ECMAScript 2022 language specification. June 2022.
[15] WebAssembly Community Group. WebAssembly Specification.
[16] Khronos Group. WebGL Overview.