Domain Name System (DNS)

Every time you visit a website or send an email, your computer translates a human-readable name like "www.example.com" into an IP addresses like "192.0.2.1". Domain Name System (DNS) is what makes this possible. DNS is a distributed database, and is one of the internet's most critical infrastructure components. Among other things, it enables service mobility: a website can change servers and IP addresses without users noticing, because the DNS mapping can be updated.

How DNS Works

DNS is a hierarchical, distributed database that handles over a trillion queries per day. Instead of one central server knowing all domain names, which couldn't possibly scale, the work is distributed across millions of servers organized in a tree structure:

13 root servers (though there are actually many more physical servers).
Top-level domain (TLD) servers for .com, .org, .net, etc.
Authoritative name servers specific to each domain like example.com.
Recursive resolvers, such as your ISP's DNS server or public DNS like 8.8.8.8.

When you look up www.example.com, your computer asks a recursive resolver, which may:

ask a root server "where's .com?"
ask the .com server "where's example.com?"
ask the example.com server "what's www.example.com?"
return the answer to you

This process is called recursive resolution. It scales because no single server needs to know everything. Instead, each server knows its piece of the hierarchy and where to find the next level.

DNS would be unusably slow if every lookup required multiple round-trips through the hierarchy. The solution is caching: resolvers remember answers for a period referred to as Time to Live (TTL). Popular domains like wikipedia.org are cached at every level so that most queries are answered from cache in milliseconds. This ensures that root servers handle surprisingly few queries despite being at the top of the hierarchy. It is also why changing a DNS record doesn't take effect immediately: old values persist in caches until their TTL expires.

Implementation

DNS stores different types of records. Our implementation will focus on A and CNAME records, which handle most web traffic. As in previous chapters, we start by enumerating the different types of records:

class RecordType(Enum):
    """DNS record types."""

    A = "A"  # IPv4 address
    AAAA = "AAAA"  # IPv6 address
    CNAME = "CNAME"  # Canonical name (alias)
    NS = "NS"  # Name server
    MX = "MX"  # Mail exchange

and then define a DNS resource record:

@dataclass
class DNSRecord:
    """A DNS resource record."""

    name: str  # Domain name
    value: str  # IP address, domain name, etc.
    record_type: RecordType
    ttl: int = 3600  # Time to live in seconds

We also need to define the structure of queries and responses:

@dataclass
class DNSQuery:
    """A DNS query message."""

    query_id: int
    domain: str
    record_type: RecordType

@dataclass
class DNSResponse:
    """A DNS response message."""

    query_id: int
    domain: str
    record_type: RecordType
    records: list[DNSRecord]
    authoritative: bool = False
    from_cache: bool = False

The next step is to create the authoritative DNS server. The constructor registers the zone this server is authoritative for and creates its record store. add_record lets callers populate the zone before the simulation starts:

class AuthoritativeDNSServer(Process):
    """An authoritative DNS server for a specific zone."""

    def init(self, name: str, zone: str, request_queue: Queue):
        self.name = name
        self.zone = zone  # e.g., "example.com"
        self.request_queue = request_queue
        self.records: dict[tuple[str, RecordType], list[DNSRecord]] = {}
        self.queries_served = 0

    def add_record(self, record: DNSRecord):
        """Add a DNS record to this server's zone."""
        key = (record.name, record.record_type)
        if key not in self.records:
            self.records[key] = []
        self.records[key].append(record)

The run loop receives queries and builds responses. If the query domain ends in this server's zone, the server looks up the record directly; it also follows CNAME chains to resolve aliases to their target A records:

    async def run(self):
        """Process DNS queries."""
        while True:
            # Wait for a query
            client_queue, query = await self.request_queue.get()

            # Check if this query is for our zone
            if not query.domain.endswith(self.zone):
                # Not authoritative for this domain
                response = DNSResponse(
                    query_id=query.query_id,
                    domain=query.domain,
                    record_type=query.record_type,
                    records=[],
                    authoritative=False,
                )
            else:
                # Look up the record
                key = (query.domain, query.record_type)
                records = self.records.get(key, [])

                # Handle CNAME (alias) records
                if not records and query.record_type == RecordType.A:
                    cname_key = (query.domain, RecordType.CNAME)
                    cname_records = self.records.get(cname_key, [])
                    if cname_records:
                        # Follow the CNAME
                        canonical_name = cname_records[0].value
                        a_key = (canonical_name, RecordType.A)
                        records = self.records.get(a_key, [])
                        records = cname_records + records

                response = DNSResponse(
                    query_id=query.query_id,
                    domain=query.domain,
                    record_type=query.record_type,
                    records=records,
                    authoritative=True,
                )

            print(
                f"[{self.now:.3f}] {self.name}: Query for {query.domain} "
                f"({query.record_type.value}) -> {len(response.records)} record(s)"
            )

            # Simulate processing delay
            await self.timeout(0.01)

            # Send response
            await client_queue.put(response)
            self.queries_served += 1

The recursive resolver is more complex because it must cache and coordinate with authoritative servers. A CacheEntry wraps a record with its expiry time:

class CacheEntry:
    """A cached DNS record with expiration time."""

    def __init__(self, record: DNSRecord, expire_time: float):
        self.record = record
        self.expire_time = expire_time

    def is_expired(self, current_time: float) -> bool:
        """Check if this cache entry has expired."""
        return current_time >= self.expire_time

The resolver's constructor sets up the cache and tracks hit and miss statistics:

class RecursiveDNSResolver(Process):
    """A recursive DNS resolver with caching."""

    def init(
        self,
        name: str,
        request_queue: Queue,
        root_servers: dict[str, Queue],
    ):
        self.name = name
        self.request_queue = request_queue
        self.root_servers = root_servers  # zone -> server queue
        self.cache: dict[tuple[str, RecordType], list[CacheEntry]] = {}
        self.response_queue = Queue(self._env)
        self.queries_received = 0
        self.cache_hits = 0
        self.cache_misses = 0

The run loop checks the cache before forwarding to an authoritative server. On a cache miss it calls _resolve_recursive, then stores the results:

    async def run(self):
        """Process client DNS queries."""
        while True:
            # Wait for a query from a client
            client_queue, query = await self.request_queue.get()
            self.queries_received += 1

            print(
                f"[{self.now:.3f}] {self.name}: Received query for "
                f"{query.domain} ({query.record_type.value})"
            )

            # Check cache first
            cached_records = self._check_cache(query.domain, query.record_type)

            if cached_records:
                self.cache_hits += 1
                response = DNSResponse(
                    query_id=query.query_id,
                    domain=query.domain,
                    record_type=query.record_type,
                    records=cached_records,
                    authoritative=False,
                    from_cache=True,
                )
                print(f"[{self.now:.3f}] {self.name}: Cache HIT for {query.domain}")
            else:
                self.cache_misses += 1
                print(f"[{self.now:.3f}] {self.name}: Cache MISS for {query.domain}")
                # Recursively resolve
                response = await self._resolve_recursive(query)

                # Cache the results
                if response.records:
                    self._cache_records(response.records)

            # Send response to client
            await client_queue.put(response)

_check_cache filters out expired entries and returns None on a complete miss. _cache_records stores records with expiry times computed from each record's TTL:

    def _check_cache(
        self, domain: str, record_type: RecordType
    ) -> Optional[list[DNSRecord]]:
        """Check if we have a cached record."""
        key = (domain, record_type)
        if key not in self.cache:
            return None

        # Filter out expired entries
        valid_entries = [
            entry for entry in self.cache[key] if not entry.is_expired(self.now)
        ]

        if not valid_entries:
            del self.cache[key]
            return None

        self.cache[key] = valid_entries
        return [entry.record for entry in valid_entries]

    def _cache_records(self, records: list[DNSRecord]):
        """Add records to the cache."""
        for record in records:
            key = (record.name, record.record_type)
            if key not in self.cache:
                self.cache[key] = []

            expire_time = self.now + record.ttl
            self.cache[key].append(CacheEntry(record, expire_time))

_resolve_recursive finds the authoritative server whose zone suffix matches the query domain most specifically, then forwards the query and returns the response. Each cached entry has an expiration time based on the record's TTL.

    async def _resolve_recursive(self, query: DNSQuery) -> DNSResponse:
        """Recursively resolve a query by querying authoritative servers."""
        # Find the appropriate authoritative server
        # In real DNS, this involves walking the DNS hierarchy
        # For simplicity, we'll match the longest zone suffix

        matching_zone = None
        for zone in self.root_servers.keys():
            if query.domain.endswith(zone):
                if matching_zone is None or len(zone) > len(matching_zone):
                    matching_zone = zone

        if matching_zone is None:
            # No authoritative server found
            return DNSResponse(
                query_id=query.query_id,
                domain=query.domain,
                record_type=query.record_type,
                records=[],
                authoritative=False,
            )

        # Query the authoritative server
        server_queue = self.root_servers[matching_zone]
        await server_queue.put((self.response_queue, query))

        # Wait for response
        response = await self.response_queue.get()
        return response

Our simplified resolver directly contacts authoritative servers. Real DNS resolvers walk the hierarchy starting from root servers, but the principle is the same: cache aggressively, query only when necessary.

Clients

The DNS client wraps a response queue and a query counter:

class DNSClient(Process):
    """A DNS client that performs lookups."""

    def init(self, name: str, resolver_queue: Queue):
        self.name = name
        self.resolver_queue = resolver_queue
        self.response_queue = Queue(self._env)
        self.query_counter = 0
        self.queries_sent = 0

    async def run(self):
        """Override in subclass to perform lookups."""
        pass

lookup constructs a query, sends it to the resolver, and prints the response:

    async def lookup(self, domain: str, record_type: RecordType = RecordType.A):
        """Perform a DNS lookup."""
        self.query_counter += 1
        query = DNSQuery(
            query_id=self.query_counter, domain=domain, record_type=record_type
        )

        print(
            f"[{self.now:.3f}] {self.name}: Looking up {domain} ({record_type.value})"
        )

        # Send query to resolver
        await self.resolver_queue.put((self.response_queue, query))
        self.queries_sent += 1

        # Wait for response
        response = await self.response_queue.get()

        # Print results
        if response.records:
            cache_status = " (cached)" if response.from_cache else ""
            print(f"[{self.now:.3f}] {self.name}: Resolved {domain}{cache_status}:")
            for record in response.records:
                print(f"  {record.name} -> {record.value} (TTL: {record.ttl}s)")
        else:
            print(f"[{self.now:.3f}] {self.name}: No records found for {domain}")

        return response

Clients send queries to recursive resolvers and wait for responses. In real systems, clients also cache locally and can query multiple resolvers for redundancy.

Running a Simulation

Let's see DNS resolution in action by building a client:

class TestClient(DNSClient):
    """Test client that performs a series of lookups."""

    async def run(self):
        """Perform test lookups."""
        # Lookup example.com
        await self.lookup("www.example.com")
        await self.timeout(1.0)

        # Lookup again - should be cached
        await self.lookup("www.example.com")
        await self.timeout(1.0)

        # Lookup different domain
        await self.lookup("mail.example.com")
        await self.timeout(1.0)

        # Lookup from different zone
        await self.lookup("www.another.org")

We can now create the overall simulation:

def main():
    """Simulate DNS resolution with caching."""
    env = Environment()

    # Create authoritative servers for different zones
    example_queue = Queue(env)
    example_server = AuthoritativeDNSServer(
        env, "ns1.example.com", "example.com", example_queue
    )

    # Add records to example.com zone
    example_server.add_record(
        DNSRecord("www.example.com", RecordType.A, "192.0.2.1", ttl=300)
    )
    example_server.add_record(
        DNSRecord("mail.example.com", RecordType.A, "192.0.2.2", ttl=300)
    )
    example_server.add_record(
        DNSRecord("ftp.example.com", RecordType.CNAME, "www.example.com", ttl=300)
    )

    # Create authoritative server for another.org
    another_queue = Queue(env)
    another_server = AuthoritativeDNSServer(
        env, "ns1.another.org", "another.org", another_queue
    )
    another_server.add_record(
        DNSRecord("www.another.org", RecordType.A, "198.51.100.1", ttl=300)
    )

    # Create recursive resolver
    resolver_queue = Queue(env)
    resolver = RecursiveDNSResolver(
        env,
        "resolver.isp.net",
        resolver_queue,
        root_servers={"example.com": example_queue, "another.org": another_queue},
    )

    # Create test clients
    TestClient(env, "client1.local", resolver_queue)

    # Run simulation
    env.run(until=10)

    # Print statistics
    print("\n=== DNS Resolution Statistics ===")
    print(f"Example.com server: {example_server.queries_served} queries")
    print(f"Another.org server: {another_server.queries_served} queries")
    print("\nResolver:")
    print(f"  Total queries: {resolver.queries_received}")
    print(f"  Cache hits: {resolver.cache_hits}")
    print(f"  Cache misses: {resolver.cache_misses}")
    if resolver.queries_received > 0:
        hit_rate = (resolver.cache_hits / resolver.queries_received) * 100
        print(f"  Cache hit rate: {hit_rate:.1f}%")

The output shows that the first lookup of www.example.com is a cache miss, necessitating a query to the authoritative server, but the second lookup is a cache hit. Different domains are cache misses until they're cached, and the cache hit rate improves as the simulation runs. In other words, the first user to look up a domain pays the full resolution cost, but subsequent users (or repeated lookups) get instant responses.

[0.000] client1.local: Looking up www.example.com (A)
[0.000] resolver.isp.net: Received query for www.example.com (A)
[0.000] resolver.isp.net: Cache MISS for www.example.com
[0.000] ns1.example.com: Query for www.example.com (A) -> 0 record(s)
[0.010] client1.local: No records found for www.example.com
[1.010] client1.local: Looking up www.example.com (A)
[1.010] resolver.isp.net: Received query for www.example.com (A)
[1.010] resolver.isp.net: Cache MISS for www.example.com
[1.010] ns1.example.com: Query for www.example.com (A) -> 0 record(s)
[1.020] client1.local: No records found for www.example.com
[2.020] client1.local: Looking up mail.example.com (A)
[2.020] resolver.isp.net: Received query for mail.example.com (A)
[2.020] resolver.isp.net: Cache MISS for mail.example.com
[2.020] ns1.example.com: Query for mail.example.com (A) -> 0 record(s)
[2.030] client1.local: No records found for mail.example.com
[3.030] client1.local: Looking up www.another.org (A)
[3.030] resolver.isp.net: Received query for www.another.org (A)
[3.030] resolver.isp.net: Cache MISS for www.another.org
[3.030] ns1.another.org: Query for www.another.org (A) -> 0 record(s)
[3.040] client1.local: No records found for www.another.org

=== DNS Resolution Statistics ===
Example.com server: 3 queries
Another.org server: 1 queries

Resolver:
  Total queries: 4
  Cache hits: 0
  Cache misses: 4
  Cache hit rate: 0.0%

ex_hierarchy.py is a more comples simulation that demonstrate several key properties of DNS, including the use of ultiple resolvers, load distribution, and the effectiveness of multiple caches. Its output shows how the cache hit rate increases over time as more domains get cached.

Real-World Considerations

Our implementation simplifies several DNS complexities:

Hierarchy walking: Real resolvers start at root servers and work down the hierarchy. We skip directly to authoritative servers for clarity.
Multiple answers: DNS records can have multiple values (like multiple A records for load balancing). Real clients choose among them.
Negative caching: DNS caches "this domain doesn't exist" responses to avoid repeatedly querying for non-existent domains.
Anycast: Root servers use anycast (many servers sharing one IP address) to distribute load and improve availability.
TTL selection: Low TTLs mean changes propagate quickly but increase query load. High TTLs reduce load but slow down changes.

The most important omission in our implementation, however, is security. DNS was designed without it; DNSSEC adds cryptographic signatures to prevent spoofing and cache poisoning.