Domain Name System (DNS)
Every time you visit a website or send an email, your computer translates a human-readable name like "www.example.com" into an IP addresses like "192.0.2.1". Domain Name System (DNS) is what makes this possible. DNS is a distributed database, and is one of the internet's most critical infrastructure components. Among other things, it enables service mobility: a website can change servers and IP addresses without users noticing, because the DNS mapping can be updated.
How DNS Works
DNS is a hierarchical, distributed database that handles over a trillion queries per day. Instead of one central server knowing all domain names, which couldn't possibly scale, the work is distributed across millions of servers organized in a tree structure:
- 13 root servers (though there are actually many more physical servers).
- Top-level domain (TLD) servers for
.com,.org,.net, etc. - Authoritative name servers specific to each domain like
example.com. - Recursive resolvers, such as your ISP's DNS server or public DNS like 8.8.8.8.
When you look up www.example.com, your computer asks a recursive resolver, which may:
- ask a root server "where's
.com?" - ask the
.comserver "where'sexample.com?" - ask the
example.comserver "what'swww.example.com?" - return the answer to you
This process is called recursive resolution. It scales because no single server needs to know everything. Instead, each server knows its piece of the hierarchy and where to find the next level.
DNS would be unusably slow if every lookup required multiple round-trips through the hierarchy. The solution is caching: resolvers remember answers for a period referred to as Time to Live (TTL). Popular domains like wikipedia.org are cached at every level so that most queries are answered from cache in milliseconds. This ensures that root servers handle surprisingly few queries despite being at the top of the hierarchy. It is also why changing a DNS record doesn't take effect immediately: old values persist in caches until their TTL expires.
Implementation
DNS stores different types of records. Our implementation will focus on A and CNAME records, which handle most web traffic. As in previous chapters, we start by enumerating the different types of records:
class RecordType(Enum):
"""DNS record types."""
A = "A" # IPv4 address
AAAA = "AAAA" # IPv6 address
CNAME = "CNAME" # Canonical name (alias)
NS = "NS" # Name server
MX = "MX" # Mail exchange
and then define a DNS resource record:
@dataclass
class DNSRecord:
"""A DNS resource record."""
name: str # Domain name
value: str # IP address, domain name, etc.
record_type: RecordType
ttl: int = 3600 # Time to live in seconds
We also need to define the structure of queries and responses:
@dataclass
class DNSQuery:
"""A DNS query message."""
query_id: int
domain: str
record_type: RecordType
@dataclass
class DNSResponse:
"""A DNS response message."""
query_id: int
domain: str
record_type: RecordType
records: list[DNSRecord]
authoritative: bool = False
from_cache: bool = False
The next step is to create the authoritative DNS server.
The constructor registers the zone this server is authoritative for and creates its record store.
add_record lets callers populate the zone before the simulation starts:
class AuthoritativeDNSServer(Process):
"""An authoritative DNS server for a specific zone."""
def init(self, name: str, zone: str, request_queue: Queue):
self.name = name
self.zone = zone # e.g., "example.com"
self.request_queue = request_queue
self.records: dict[tuple[str, RecordType], list[DNSRecord]] = {}
self.queries_served = 0
def add_record(self, record: DNSRecord):
"""Add a DNS record to this server's zone."""
key = (record.name, record.record_type)
if key not in self.records:
self.records[key] = []
self.records[key].append(record)
The run loop receives queries and builds responses.
If the query domain ends in this server's zone,
the server looks up the record directly;
it also follows CNAME chains to resolve aliases to their target A records:
async def run(self):
"""Process DNS queries."""
while True:
# Wait for a query
client_queue, query = await self.request_queue.get()
# Check if this query is for our zone
if not query.domain.endswith(self.zone):
# Not authoritative for this domain
response = DNSResponse(
query_id=query.query_id,
domain=query.domain,
record_type=query.record_type,
records=[],
authoritative=False,
)
else:
# Look up the record
key = (query.domain, query.record_type)
records = self.records.get(key, [])
# Handle CNAME (alias) records
if not records and query.record_type == RecordType.A:
cname_key = (query.domain, RecordType.CNAME)
cname_records = self.records.get(cname_key, [])
if cname_records:
# Follow the CNAME
canonical_name = cname_records[0].value
a_key = (canonical_name, RecordType.A)
records = self.records.get(a_key, [])
records = cname_records + records
response = DNSResponse(
query_id=query.query_id,
domain=query.domain,
record_type=query.record_type,
records=records,
authoritative=True,
)
print(
f"[{self.now:.3f}] {self.name}: Query for {query.domain} "
f"({query.record_type.value}) -> {len(response.records)} record(s)"
)
# Simulate processing delay
await self.timeout(0.01)
# Send response
await client_queue.put(response)
self.queries_served += 1
The recursive resolver is more complex because it must cache and coordinate with authoritative servers.
A CacheEntry wraps a record with its expiry time:
class CacheEntry:
"""A cached DNS record with expiration time."""
def __init__(self, record: DNSRecord, expire_time: float):
self.record = record
self.expire_time = expire_time
def is_expired(self, current_time: float) -> bool:
"""Check if this cache entry has expired."""
return current_time >= self.expire_time
The resolver's constructor sets up the cache and tracks hit and miss statistics:
class RecursiveDNSResolver(Process):
"""A recursive DNS resolver with caching."""
def init(
self,
name: str,
request_queue: Queue,
root_servers: dict[str, Queue],
):
self.name = name
self.request_queue = request_queue
self.root_servers = root_servers # zone -> server queue
self.cache: dict[tuple[str, RecordType], list[CacheEntry]] = {}
self.response_queue = Queue(self._env)
self.queries_received = 0
self.cache_hits = 0
self.cache_misses = 0
The run loop checks the cache before forwarding to an authoritative server.
On a cache miss it calls _resolve_recursive, then stores the results:
async def run(self):
"""Process client DNS queries."""
while True:
# Wait for a query from a client
client_queue, query = await self.request_queue.get()
self.queries_received += 1
print(
f"[{self.now:.3f}] {self.name}: Received query for "
f"{query.domain} ({query.record_type.value})"
)
# Check cache first
cached_records = self._check_cache(query.domain, query.record_type)
if cached_records:
self.cache_hits += 1
response = DNSResponse(
query_id=query.query_id,
domain=query.domain,
record_type=query.record_type,
records=cached_records,
authoritative=False,
from_cache=True,
)
print(f"[{self.now:.3f}] {self.name}: Cache HIT for {query.domain}")
else:
self.cache_misses += 1
print(f"[{self.now:.3f}] {self.name}: Cache MISS for {query.domain}")
# Recursively resolve
response = await self._resolve_recursive(query)
# Cache the results
if response.records:
self._cache_records(response.records)
# Send response to client
await client_queue.put(response)
_check_cache filters out expired entries and returns None on a complete miss.
_cache_records stores records with expiry times computed from each record's TTL:
def _check_cache(
self, domain: str, record_type: RecordType
) -> Optional[list[DNSRecord]]:
"""Check if we have a cached record."""
key = (domain, record_type)
if key not in self.cache:
return None
# Filter out expired entries
valid_entries = [
entry for entry in self.cache[key] if not entry.is_expired(self.now)
]
if not valid_entries:
del self.cache[key]
return None
self.cache[key] = valid_entries
return [entry.record for entry in valid_entries]
def _cache_records(self, records: list[DNSRecord]):
"""Add records to the cache."""
for record in records:
key = (record.name, record.record_type)
if key not in self.cache:
self.cache[key] = []
expire_time = self.now + record.ttl
self.cache[key].append(CacheEntry(record, expire_time))
_resolve_recursive finds the authoritative server
whose zone suffix matches the query domain most specifically,
then forwards the query and returns the response.
Each cached entry has an expiration time based on the record's TTL.
async def _resolve_recursive(self, query: DNSQuery) -> DNSResponse:
"""Recursively resolve a query by querying authoritative servers."""
# Find the appropriate authoritative server
# In real DNS, this involves walking the DNS hierarchy
# For simplicity, we'll match the longest zone suffix
matching_zone = None
for zone in self.root_servers.keys():
if query.domain.endswith(zone):
if matching_zone is None or len(zone) > len(matching_zone):
matching_zone = zone
if matching_zone is None:
# No authoritative server found
return DNSResponse(
query_id=query.query_id,
domain=query.domain,
record_type=query.record_type,
records=[],
authoritative=False,
)
# Query the authoritative server
server_queue = self.root_servers[matching_zone]
await server_queue.put((self.response_queue, query))
# Wait for response
response = await self.response_queue.get()
return response
Our simplified resolver directly contacts authoritative servers. Real DNS resolvers walk the hierarchy starting from root servers, but the principle is the same: cache aggressively, query only when necessary.
Clients
The DNS client wraps a response queue and a query counter:
class DNSClient(Process):
"""A DNS client that performs lookups."""
def init(self, name: str, resolver_queue: Queue):
self.name = name
self.resolver_queue = resolver_queue
self.response_queue = Queue(self._env)
self.query_counter = 0
self.queries_sent = 0
async def run(self):
"""Override in subclass to perform lookups."""
pass
lookup constructs a query, sends it to the resolver, and prints the response:
async def lookup(self, domain: str, record_type: RecordType = RecordType.A):
"""Perform a DNS lookup."""
self.query_counter += 1
query = DNSQuery(
query_id=self.query_counter, domain=domain, record_type=record_type
)
print(
f"[{self.now:.3f}] {self.name}: Looking up {domain} ({record_type.value})"
)
# Send query to resolver
await self.resolver_queue.put((self.response_queue, query))
self.queries_sent += 1
# Wait for response
response = await self.response_queue.get()
# Print results
if response.records:
cache_status = " (cached)" if response.from_cache else ""
print(f"[{self.now:.3f}] {self.name}: Resolved {domain}{cache_status}:")
for record in response.records:
print(f" {record.name} -> {record.value} (TTL: {record.ttl}s)")
else:
print(f"[{self.now:.3f}] {self.name}: No records found for {domain}")
return response
Clients send queries to recursive resolvers and wait for responses. In real systems, clients also cache locally and can query multiple resolvers for redundancy.
Running a Simulation
Let's see DNS resolution in action by building a client:
class TestClient(DNSClient):
"""Test client that performs a series of lookups."""
async def run(self):
"""Perform test lookups."""
# Lookup example.com
await self.lookup("www.example.com")
await self.timeout(1.0)
# Lookup again - should be cached
await self.lookup("www.example.com")
await self.timeout(1.0)
# Lookup different domain
await self.lookup("mail.example.com")
await self.timeout(1.0)
# Lookup from different zone
await self.lookup("www.another.org")
We can now create the overall simulation:
def main():
"""Simulate DNS resolution with caching."""
env = Environment()
# Create authoritative servers for different zones
example_queue = Queue(env)
example_server = AuthoritativeDNSServer(
env, "ns1.example.com", "example.com", example_queue
)
# Add records to example.com zone
example_server.add_record(
DNSRecord("www.example.com", RecordType.A, "192.0.2.1", ttl=300)
)
example_server.add_record(
DNSRecord("mail.example.com", RecordType.A, "192.0.2.2", ttl=300)
)
example_server.add_record(
DNSRecord("ftp.example.com", RecordType.CNAME, "www.example.com", ttl=300)
)
# Create authoritative server for another.org
another_queue = Queue(env)
another_server = AuthoritativeDNSServer(
env, "ns1.another.org", "another.org", another_queue
)
another_server.add_record(
DNSRecord("www.another.org", RecordType.A, "198.51.100.1", ttl=300)
)
# Create recursive resolver
resolver_queue = Queue(env)
resolver = RecursiveDNSResolver(
env,
"resolver.isp.net",
resolver_queue,
root_servers={"example.com": example_queue, "another.org": another_queue},
)
# Create test clients
TestClient(env, "client1.local", resolver_queue)
# Run simulation
env.run(until=10)
# Print statistics
print("\n=== DNS Resolution Statistics ===")
print(f"Example.com server: {example_server.queries_served} queries")
print(f"Another.org server: {another_server.queries_served} queries")
print("\nResolver:")
print(f" Total queries: {resolver.queries_received}")
print(f" Cache hits: {resolver.cache_hits}")
print(f" Cache misses: {resolver.cache_misses}")
if resolver.queries_received > 0:
hit_rate = (resolver.cache_hits / resolver.queries_received) * 100
print(f" Cache hit rate: {hit_rate:.1f}%")
The output shows that the first lookup of www.example.com is a cache miss,
necessitating a query to the authoritative server,
but the second lookup is a cache hit.
Different domains are cache misses until they're cached,
and the cache hit rate improves as the simulation runs.
In other words,
the first user to look up a domain pays the full resolution cost,
but subsequent users (or repeated lookups) get instant responses.
[0.000] client1.local: Looking up www.example.com (A)
[0.000] resolver.isp.net: Received query for www.example.com (A)
[0.000] resolver.isp.net: Cache MISS for www.example.com
[0.000] ns1.example.com: Query for www.example.com (A) -> 0 record(s)
[0.010] client1.local: No records found for www.example.com
[1.010] client1.local: Looking up www.example.com (A)
[1.010] resolver.isp.net: Received query for www.example.com (A)
[1.010] resolver.isp.net: Cache MISS for www.example.com
[1.010] ns1.example.com: Query for www.example.com (A) -> 0 record(s)
[1.020] client1.local: No records found for www.example.com
[2.020] client1.local: Looking up mail.example.com (A)
[2.020] resolver.isp.net: Received query for mail.example.com (A)
[2.020] resolver.isp.net: Cache MISS for mail.example.com
[2.020] ns1.example.com: Query for mail.example.com (A) -> 0 record(s)
[2.030] client1.local: No records found for mail.example.com
[3.030] client1.local: Looking up www.another.org (A)
[3.030] resolver.isp.net: Received query for www.another.org (A)
[3.030] resolver.isp.net: Cache MISS for www.another.org
[3.030] ns1.another.org: Query for www.another.org (A) -> 0 record(s)
[3.040] client1.local: No records found for www.another.org
=== DNS Resolution Statistics ===
Example.com server: 3 queries
Another.org server: 1 queries
Resolver:
Total queries: 4
Cache hits: 0
Cache misses: 4
Cache hit rate: 0.0%
ex_hierarchy.py is a more comples simulation
that demonstrate several key properties of DNS,
including the use of ultiple resolvers,
load distribution,
and the effectiveness of multiple caches.
Its output shows how the cache hit rate increases over time as more domains get cached.
Real-World Considerations
Our implementation simplifies several DNS complexities:
-
Hierarchy walking: Real resolvers start at root servers and work down the hierarchy. We skip directly to authoritative servers for clarity.
-
Multiple answers: DNS records can have multiple values (like multiple A records for load balancing). Real clients choose among them.
-
Negative caching: DNS caches "this domain doesn't exist" responses to avoid repeatedly querying for non-existent domains.
-
Anycast: Root servers use anycast (many servers sharing one IP address) to distribute load and improve availability.
-
TTL selection: Low TTLs mean changes propagate quickly but increase query load. High TTLs reduce load but slow down changes.
The most important omission in our implementation, however, is security. DNS was designed without it; DNSSEC adds cryptographic signatures to prevent spoofing and cache poisoning.