What happens when you type www.google.com? (considering nothing is cached)


I am a Site Reliability Engineer with nearly 5 years of experience. I talk about Linux, Automation, Networking, and anything else related to tech and CS.

Ever wondered how the browser fetches a website from the internet? With over 40 years of development behind the internet, so much happens behind the scenes that it can seem overwhelming. Here's a step-by-step description of the phases your request goes through.

  • HSTS Lookup

    • The HSTS (HTTP Strict Transport Security) preload list contains sites that must only be accessed over HTTPS. The list is maintained by the Chromium project, and browsers ship with a copy of it that is updated regularly.

      * When the request is made from the browser, it first checks whether the host is present in the HSTS list. If the host is found, the request is made over HTTPS on port 443; otherwise, it is made over plain HTTP on port 80.

      * HSTS can also be enabled by the server or the application by sending the Strict-Transport-Security response header. HSTS protects against SSL stripping attacks, which are a form of man-in-the-middle attack.
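
To make the HSTS check a bit more concrete, here is a minimal Python sketch. The function names and the tiny `PRELOADED_HSTS_HOSTS` set are made up for illustration; a real browser consults a much larger preload list plus the policies it has learned from response headers.

```python
# Hypothetical stand-in for the preload list that ships with the browser.
PRELOADED_HSTS_HOSTS = {"example-bank.test"}


def parse_hsts_header(value):
    """Parse a Strict-Transport-Security header value into a dict."""
    policy = {"max_age": None, "include_subdomains": False, "preload": False}
    for directive in value.split(";"):
        directive = directive.strip().lower()
        if directive.startswith("max-age="):
            policy["max_age"] = int(directive.split("=", 1)[1].strip('"'))
        elif directive == "includesubdomains":
            policy["include_subdomains"] = True
        elif directive == "preload":
            policy["preload"] = True
    return policy


def choose_scheme_and_port(host, known_hsts_hosts=PRELOADED_HSTS_HOSTS):
    """Return (scheme, port) for a host that was typed without a scheme."""
    if host.lower() in known_hsts_hosts:
        return ("https", 443)
    return ("http", 80)
```

Calling `choose_scheme_and_port("example-bank.test")` gives `("https", 443)`, while a host that is not in the set falls back to `("http", 80)`.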

  • DNS Lookup

    • In the DNS lookup, the browser first checks its own DNS cache and the operating system's cache. If the host is not found there, the operating system's local hosts file is checked for an entry.
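
The hosts file check can be sketched in a few lines of Python. This is only an illustration: `lookup_in_hosts` and `SAMPLE_HOSTS` are hypothetical names, and the sample addresses come from the documentation range rather than from a real machine.

```python
def lookup_in_hosts(hosts_text, hostname):
    """Return the first IP mapped to hostname in hosts-file text, or None."""
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        if wanted in (name.lower() for name in names):
            return ip
    return None


# Made-up hosts file content, in the usual "IP name [alias...]" layout.
SAMPLE_HOSTS = """\
127.0.0.1   localhost
# 10.0.0.5  commented.example
192.0.2.10  intranet.example intranet
"""
```

If this returns `None` for the host you typed, the lookup has to leave the machine, which is where the next steps come in.
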
  • ARP Lookup

    • If the entry for the host is not found in the local hosts file, the computer has to ask a DNS server for the IP of the server hosting the website. To send that query off the local network, it first needs the MAC address of the router (the default gateway), which it gets through ARP.

      * First, the ARP cache is checked for the IP address of the router and its MAC address. If there is no entry, an ARP request is broadcast to the local network.

      * The router responds with its MAC address. That MAC address becomes the destination of the Ethernet frame that carries the IP packet to the router, while the IP packet itself stays addressed to the DNS server.

      * The router then uses its routing table to forward the packet toward the DNS server. In this step, the computer just needs the IP of the gateway and of the DNS server, which usually come from the network configuration (for example via DHCP). It then sends the DNS query as a UDP datagram to port 53, with a source port greater than 1023 (basically from the ephemeral port range), to get the IP of the target server. UDP is connectionless, so no connection is opened first.
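
For the curious, this is roughly what the DNS query inside that UDP datagram looks like. The sketch below builds a minimal A-record query in the standard DNS wire format; `build_dns_query` and the default `query_id` are my own choices for illustration, and real resolvers pick a random ID and set more options.

```python
import struct


def build_dns_query(hostname, query_id=0x1234):
    """Build a minimal DNS query for an A record."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Name: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (Internet).
    question = qname + struct.pack("!HH", 1, 1)
    return header + question


# Sending it would look something like this (resolver_ip is a placeholder):
#   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
#   sock.sendto(build_dns_query("www.google.com"), (resolver_ip, 53))
```

The operating system picks the ephemeral source port automatically when `sendto` is called on an unbound UDP socket.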

  • DNS Resolution

    • First, the query goes to the local resolver or the gateway, which checks its cache for the host entry.

      * If the resolver doesn't have it, it queries one of the root servers, which know the nameservers for each TLD. (There are 13 root server addresses, named a through m, each served by many machines around the world.)

      * The TLD nameservers then point further down the tree to the domain's authoritative nameservers. Once the query reaches an authoritative nameserver, it answers with the A record (or the AAAA record for IPv6), which holds the IP address of the target server.
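
The walk down the tree can be pictured with a toy model. Everything in this sketch is made up for illustration: the three dictionaries stand in for the root, TLD, and authoritative servers, and the IP is a documentation-range address, not Google's.

```python
# Toy referral data standing in for real nameservers.
ROOT = {"com.": "a TLD server for com."}
TLD = {"google.com.": "an authoritative server for google.com."}
AUTHORITATIVE = {"www.google.com.": "192.0.2.44"}  # illustrative A record


def resolve_iteratively(hostname):
    """Walk root -> TLD -> authoritative and return (ip, steps)."""
    fqdn = hostname.rstrip(".").lower() + "."
    labels = fqdn.rstrip(".").split(".")
    tld = labels[-1] + "."
    zone = ".".join(labels[-2:]) + "."
    steps = []
    if tld not in ROOT:
        return None, steps
    steps.append(f"root server refers us to {ROOT[tld]}")
    if zone not in TLD:
        return None, steps
    steps.append(f"TLD server refers us to {TLD[zone]}")
    ip = AUTHORITATIVE.get(fqdn)
    steps.append(f"authoritative server answers with A record {ip}")
    return ip, steps
```

A real resolver also caches every referral along the way, which is why most lookups never have to start from the root.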

  • TCP Handshake

    • Once the browser receives the IP address of the target server, it picks the port (443 or 80 based on the HSTS result, unless the URL specifies one) and starts the process of creating a TCP socket stream.

      * First, the request is handed to the Transport layer, where the TCP segment is built and the source port, destination port, and other fields are added to the header.

      * Then the segment is passed to the Network layer, which wraps it in an IP header containing the IP addresses of the sender and the destination server.

      * Then the packet is passed to the Data Link layer, where the MAC addresses of the NIC and the gateway are added in a frame header, and the frame is ready to be sent.

      * Once the packet leaves the local subnet, it travels across Autonomous Systems (AS) through their border routers, which exchange routes using the Border Gateway Protocol (BGP). With BGP, routers choose a path to the destination based on routing policy and attributes such as AS path length, so it is not always the shortest path. Communication between AS routers is largely based on trust.

      * One important field along the way is the TTL, which is decreased by one every time the packet passes through a router. If the TTL reaches 0, the packet is dropped. The first packet to reach the destination server this way is the SYN that starts the TCP handshake.

      * The handshake happens as follows:

        * The client chooses an Initial Sequence Number (ISN) and sends a SYN to the server to indicate its intent to connect.

        * The server then chooses its own ISN and replies with a SYN-ACK packet, whose acknowledgment number is the client's ISN plus 1.

        * The client replies with an ACK packet whose acknowledgment number is the server's ISN plus 1, and the connection is established.
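
The sequence and acknowledgment numbers in the handshake are easier to follow with actual values. Here is a small sketch that only does the arithmetic; `three_way_handshake` is a hypothetical helper and no packets are sent.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the (flags, seq, ack) tuples exchanged during a TCP handshake."""
    mod = 2 ** 32  # sequence numbers are 32-bit and wrap around
    syn = ("SYN", client_isn, None)
    syn_ack = ("SYN-ACK", server_isn, (client_isn + 1) % mod)
    ack = ("ACK", (client_isn + 1) % mod, (server_isn + 1) % mod)
    return [syn, syn_ack, ack]
```

With a client ISN of 1000 and a server ISN of 5000, the three packets are `("SYN", 1000, None)`, `("SYN-ACK", 5000, 1001)`, and `("ACK", 1001, 5001)`.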

  • TLS Connection

    • After a TCP connection is established between the client and the server, the client initiates the TLS connection by sending a ClientHello message to the server, along with the cipher suites and TLS versions it supports. (This happens only if HTTPS is requested.)

      * The server replies with a ServerHello message carrying the chosen cipher suite, TLS version, and compression method, followed by its Certificate. The certificate contains the server's public key.

      * The client verifies the certificate against its trusted CAs. If trust is established, it generates some random bytes, encrypts them using the public key, and sends them to the server. (This is the RSA key exchange of TLS 1.2 and earlier; TLS 1.3 uses an ephemeral Diffie-Hellman exchange instead.)

      * The server decrypts the random bytes using its private key, and both sides derive the same symmetric key from them.

      * The client and the server each send a ChangeCipherSpec message to signal that they are switching to the negotiated cipher suite and keys.

      * Finally, they exchange Finished messages, and from here on the data exchanged is encrypted using the symmetric key.
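
In practice you rarely implement these steps yourself. As a rough sketch, Python's standard `ssl` module performs the whole handshake when it wraps a TCP socket; the two helper names below are mine, and the second one needs network access to run.

```python
import socket
import ssl


def make_client_context():
    """Create a TLS client context that verifies certificates and hostnames."""
    return ssl.create_default_context()


def describe_tls_connection(host, port=443, timeout=5.0):
    """Connect, complete the TLS handshake, and report what was negotiated."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as tcp_sock:
        # wrap_socket runs the handshake; server_hostname is used for SNI
        # and for checking the name in the server's certificate.
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            return {"version": tls_sock.version(), "cipher": tls_sock.cipher()[0]}
```

Calling `describe_tls_connection("www.google.com")` should report the negotiated TLS version and cipher suite, and the exact values depend on your Python build and on the server.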

  • HTTP Protocol

    • Now, to access the webpage, the browser uses HTTP to fetch it.

      * The browser sends the HTTP request, which contains the method, the path, the HTTP version, and headers (including the Host header), to the server.

      * The request reaches the outermost point of the target network and is usually handled by a load balancer. The load balancer then forwards the request to the internal application server.

      * Once the request reaches the server, it breaks down the request into components such as:

        * Request Method

        * Request Path

        * Request Headers

        * IP, etc.

      * The web server, which is often something like Nginx or Apache in front of a PHP or Python application, talks to the application through an interface such as PHP-FPM or WSGI. The application generates the webpage and hands the response to the web server, which in turn sends it back to the client.

      * In the response, the server includes Headers, HTTP Status Code, HTTP Version, etc.
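
Here is a small sketch of what the request and the start of the response look like on the wire for HTTP/1.1. The helper names are hypothetical, and the parser is deliberately simplified (for example, it expects a reason phrase in the status line).

```python
def build_get_request(host, path="/"):
    """Build the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # method, path, and version
        f"Host: {host}",         # required header in HTTP/1.1
        "Connection: close",
        "",
        "",                      # blank line ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response_head(raw):
    """Split a raw response into (version, status_code, reason, headers)."""
    head, _, _body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, status_code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status_code), reason, headers
```

Note that the port never appears in the request line; it is only used to open the connection, and may show up in the Host header when it is not the default.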

  • Web Page Construction

    • The browser parses the HTML to build the DOM, applies the CSS, runs the JS, and then renders the resulting webpage.
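
To get a feel for what building the DOM means, here is a toy tree builder on top of Python's standard `html.parser`. `TreeBuilder` and `build_tree` are hypothetical names, and real browsers follow a far more detailed parsing algorithm with error recovery; this only shows the idea of turning tags into a tree.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (a partial list, for illustration).
VOID_TAGS = {"br", "img", "meta", "link", "input", "hr"}


class TreeBuilder(HTMLParser):
    """Turn a stream of tags and text into a nested dict tree."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#document", "children": []}
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self._stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, if there is one.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i]["tag"] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self._stack[-1]["children"].append({"tag": "#text", "text": data.strip()})


def build_tree(html):
    """Parse an HTML string and return the root of the toy tree."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root
```

Feeding it `<html><body><h1>Hi</h1></body></html>` gives a `#document` root with an `html` child, a `body` under that, and an `h1` holding a text node.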

Do check out - https://github.com/alex/what-happens-when