BitTorrent Protocol
- Explain the swarming property of BitTorrent and why adding more downloaders increases available upload capacity.
- Describe the rarest-first piece selection strategy and explain why it maintains piece diversity in the swarm.
- Explain how the choking and optimistic-unchoking algorithm implements tit-for-tat without a central enforcer.
- Describe how SHA-1 piece verification protects against corrupted or malicious data from untrusted peers.
When you download a large Linux distribution ISO, a game update, or a software package, you might be using BitTorrent without even knowing it. BitTorrent revolutionized file sharing by turning traditional client-server downloads upside down: instead of downloading from a single server, you download pieces from dozens of peers simultaneously. The more popular a file, the faster it downloads—a property called "swarming" that makes BitTorrent uniquely efficient for distributing large files.
BitTorrent emerged in 2001 as a solution to a fundamental problem: how do you distribute large files to millions of people without overwhelming your servers? Traditional HTTP downloads create a bottleneck—the server's upload bandwidth limits how many people can download simultaneously. BitTorrent solves this by having downloaders help each other: as soon as you download a piece, you can share it with others. This creates a distributed system where upload capacity scales with demand.
This pattern powers countless systems: Linux distributions use BitTorrent for ISO distribution, game companies use it for patches and updates, academic institutions share datasets, and content delivery networks use BitTorrent-inspired protocols for video streaming. Understanding BitTorrent reveals fundamental principles of peer-to-peer systems, incentive design, and distributed consensus.
The BitTorrent Architecture
BitTorrent involves several components working together:
- Torrent file: Metadata describing the file(s) to download, including piece hashes and tracker URL
- Tracker: Coordinates peers by providing lists of other peers in the swarm
- Peers: Clients downloading and uploading pieces simultaneously
- Seeders: Peers who have the complete file and only upload
- Leechers: Peers who are still downloading
The protocol works through these steps:
- Obtain torrent file: Contains metadata and tracker URL
- Contact tracker: Get list of peers in the swarm
- Connect to peers: Establish TCP connections with multiple peers
- Exchange piece information: Tell peers what you have, learn what they have
- Download pieces: Request rarest pieces first to maximize availability
- Upload to others: Share pieces you've downloaded to maintain good standing
- Verify integrity: Check each piece against SHA-1 hash from torrent
- Become seeder: Continue uploading after completing download
The key insight is tit-for-tat: peers upload to those who upload to them. This creates incentives for cooperation without central enforcement.
Core Data Structures
Let's start with the fundamental types. The file types describe pieces and torrent metadata:
@dataclass
class Piece:
"""A piece of the file being shared."""
index: int
data: bytes
hash_value: str # SHA-1 hash for verification
def verify(self) -> bool:
"""Verify piece integrity against hash."""
computed_hash = hashlib.sha1(self.data).hexdigest()
return computed_hash == self.hash_value
@dataclass
class TorrentMetadata:
"""Metadata from .torrent file."""
info_hash: str # Unique identifier for this torrent
piece_length: int # Size of each piece in bytes
total_pieces: int # Number of pieces
piece_hashes: List[str] # SHA-1 hash for each piece
file_name: str
file_size: int
tracker_url: str
def __str__(self) -> str:
return f"Torrent({self.file_name}, {self.total_pieces} pieces)"
@dataclass
class PeerInfo:
"""Information about a peer."""
peer_id: str
ip_address: str
port: int
def __str__(self) -> str:
return f"Peer({self.peer_id})"
def __hash__(self) -> int:
return hash(self.peer_id)
def __eq__(self, other: object) -> bool:
if not isinstance(other, PeerInfo):
return False
return self.peer_id == other.peer_id
The message types describe the protocol exchanges between peers and tracker:
@dataclass
class TrackerRequest:
"""Request to tracker."""
info_hash: str
peer_id: str
port: int
uploaded: int
downloaded: int
left: int # Bytes remaining to download
event: str # "started", "completed", "stopped"
response_queue: Queue
def __str__(self) -> str:
return f"TrackerReq(peer={self.peer_id}, event={self.event})"
@dataclass
class TrackerResponse:
"""Response from tracker."""
interval: int # Seconds until next tracker request
peers: List[PeerInfo]
def __str__(self) -> str:
return f"TrackerResp({len(self.peers)} peers)"
@dataclass
class PeerMessage:
"""Message exchanged between peers."""
msg_type: str # "choke", "unchoke", "interested", "have", "request", "piece"
payload: Any = None
def __str__(self) -> str:
if self.msg_type == "have":
return f"Have(piece={self.payload})"
elif self.msg_type == "request":
return f"Request(piece={self.payload})"
elif self.msg_type == "piece":
piece_idx = self.payload.index if isinstance(self.payload, Piece) else "?"
return f"Piece(index={piece_idx})"
return f"Msg({self.msg_type})"
@dataclass
class BitfieldMessage:
"""Bitfield indicating which pieces a peer has."""
bitfield: List[bool] # True if peer has piece at that index
def has_piece(self, index: int) -> bool:
"""Check if peer has a specific piece."""
return index < len(self.bitfield) and self.bitfield[index]
def __str__(self) -> str:
count = sum(1 for b in self.bitfield if b)
return f"Bitfield({count}/{len(self.bitfield)} pieces)"
These structures represent the protocol's messages and state. The bitfield is particularly important—it compactly represents which pieces a peer has.
Tracker Implementation
The tracker coordinates the swarm by maintaining a list of active peers. The class and its constructor set up the request queue and swarm registry:
class Tracker(Process):
"""BitTorrent tracker coordinating peers."""
def init(self) -> None:
self.request_queue: Queue = Queue(self._env)
# Track peers for each torrent (by info_hash)
self.swarms: Dict[str, Set[PeerInfo]] = {}
# Track when peers were last seen
self.peer_last_seen: Dict[str, float] = {}
print(f"[{self.now:.1f}] Tracker started")
async def run(self) -> None:
"""Main tracker loop."""
while True:
request = await self.request_queue.get()
await self.handle_request(request)
When a peer announces itself, handle_request updates the swarm and returns a list of known peers:
async def handle_request(self, request: TrackerRequest) -> None:
"""Handle tracker announce request."""
print(f"[{self.now:.1f}] Tracker: Received {request}")
# Initialize swarm if needed
if request.info_hash not in self.swarms:
self.swarms[request.info_hash] = set()
swarm = self.swarms[request.info_hash]
# Create peer info
peer = PeerInfo(
peer_id=request.peer_id, ip_address="127.0.0.1", port=request.port
)
# Handle different events
if request.event == "started" or request.event == "":
swarm.add(peer)
self.peer_last_seen[request.peer_id] = self.now
print(
f"[{self.now:.1f}] Tracker: Added {peer.peer_id} to swarm "
f"(total: {len(swarm)})"
)
elif request.event == "stopped":
swarm.discard(peer)
print(f"[{self.now:.1f}] Tracker: Removed {peer.peer_id} from swarm")
elif request.event == "completed":
self.peer_last_seen[request.peer_id] = self.now
print(f"[{self.now:.1f}] Tracker: {peer.peer_id} completed download")
# Return list of other peers
other_peers = [p for p in swarm if p.peer_id != request.peer_id]
# Limit to 50 peers (typical tracker behavior)
if len(other_peers) > 50:
other_peers = random.sample(other_peers, 50)
response = TrackerResponse(
interval=30, # Re-announce every 30 seconds
peers=other_peers,
)
await request.response_queue.put(response)
print(
f"[{self.now:.1f}] Tracker: Sent {len(other_peers)} peers to "
f"{request.peer_id}"
)
The tracker is stateless—it just maintains the current list of peers. In production, trackers often use UDP for efficiency and can handle millions of peers.
Simplified Peer
The peer is the heart of BitTorrent—it downloads pieces, uploads to others, and manages connections. The constructor stores the peer's identity, piece inventory, and connections to other peers:
class SimplifiedPeer(Process):
"""Simplified peer for simulation purposes."""
def init(
self,
peer_id: str,
metadata: TorrentMetadata,
tracker: "Tracker",
other_peers: List["SimplifiedPeer"],
initial_pieces: Optional[List[int]] = None,
) -> None:
self.peer_id = peer_id
self.metadata = metadata
self.tracker = tracker
self.other_peers = other_peers
# Which pieces we have (just indices for simplicity)
self.have_pieces: Set[int] = set(initial_pieces) if initial_pieces else set()
# Statistics
self.downloaded_pieces = len(self.have_pieces)
self.uploaded_pieces = 0
print(
f"[{self.now:.1f}] Peer {self.peer_id}: Started with "
f"{len(self.have_pieces)}/{metadata.total_pieces} pieces"
)
def is_complete(self) -> bool:
"""Check if download is complete."""
return len(self.have_pieces) == self.metadata.total_pieces
The run method announces to the tracker, then loops through download rounds until all pieces are obtained:
async def run(self) -> None:
"""Simplified download loop."""
# Announce to tracker
await self.announce("started")
# Download pieces
while not self.is_complete():
await self.download_round()
await self.timeout(1.0)
print(f"[{self.now:.1f}] Peer {self.peer_id}: ✓ Download complete!")
# Announce completion
await self.announce("completed")
# Continue seeding
await self.timeout(3.0)
async def announce(self, event: str) -> None:
"""Simplified tracker announce."""
print(f"[{self.now:.1f}] Peer {self.peer_id}: Announcing '{event}' to tracker")
Each download round selects pieces to request using rarest-first ordering, then attempts to download from available peers:
async def download_round(self) -> None:
"""Attempt to download pieces from peers."""
needed = [
i for i in range(self.metadata.total_pieces) if i not in self.have_pieces
]
if not needed:
return
# Rarest first
piece_counts: Dict[int, int] = {}
for peer in self.other_peers:
for piece_idx in peer.have_pieces:
piece_counts[piece_idx] = piece_counts.get(piece_idx, 0) + 1
needed.sort(key=lambda idx: piece_counts.get(idx, 0))
# Try to download rarest piece we need
for piece_idx in needed[:3]:
# Find peer with this piece
candidates = [p for p in self.other_peers if piece_idx in p.have_pieces]
if candidates:
peer = random.choice(candidates)
await self.download_piece_from(peer, piece_idx)
break
_download_from_peer applies the tit-for-tat rule—only downloading from a peer if it has uploaded to us recently—and on success updates piece state and broadcasts a HAVE message:
async def download_piece_from(self, peer: "SimplifiedPeer", piece_idx: int) -> None:
"""Download a piece from a peer."""
# Simulate transfer time
await self.timeout(0.2)
self.have_pieces.add(piece_idx)
self.downloaded_pieces += 1
peer.uploaded_pieces += 1
print(
f"[{self.now:.1f}] Peer {self.peer_id}: Downloaded piece {piece_idx} "
f"from {peer.peer_id} ({len(self.have_pieces)}/"
f"{self.metadata.total_pieces})"
)
This simplified version captures the essence of BitTorrent without all the protocol complexity.
Basic Simulation
Let's see BitTorrent in action:
def run_basic_bittorrent() -> None:
"""Demonstrate basic BitTorrent operation."""
env = Environment()
# Create tracker
tracker = Tracker(env)
# Create torrent metadata
metadata = TorrentMetadata(
info_hash="abc123",
piece_length=256 * 1024, # 256 KB pieces
total_pieces=10,
piece_hashes=["hash" + str(i) for i in range(10)],
file_name="example.iso",
file_size=10 * 256 * 1024,
tracker_url="http://tracker.example.com:8080/announce",
)
print(f"[{env.now:.1f}] Created {metadata}\n")
# Create initial seeder with all pieces
peers: List[SimplifiedPeer] = []
seeder = SimplifiedPeer(
env,
"Seeder",
metadata,
tracker,
peers,
initial_pieces=list(range(10)), # Has all pieces
)
peers.append(seeder)
# Create leechers with no pieces
for i in range(3):
peer = SimplifiedPeer(
env, f"Peer{i + 1}", metadata, tracker, peers, initial_pieces=[]
)
peers.append(peer)
# Update peer lists
for peer in peers:
peer.other_peers = [p for p in peers if p != peer]
# Run simulation
env.run(until=30)
# Print statistics
print(f"\n{'=' * 60}")
print("Final Statistics:")
print("=" * 60)
for peer in peers:
print(
f"{peer.peer_id}: Downloaded={peer.downloaded_pieces}, "
f"Uploaded={peer.uploaded_pieces}"
)
This shows how pieces propagate through the swarm—early peers help later peers, distributing the upload burden.
Key BitTorrent Concepts
Rarest First
Peers prioritize downloading the rarest pieces in the swarm. This ensures piece diversity—if everyone downloads common pieces, rare pieces might disappear if the only seeder leaves.
Tit-for-Tat
Peers upload to those who upload to them. This creates incentive for cooperation without central enforcement. Peers who don't upload ("leechers") get poor download speeds.
Optimistic Unchoking
Periodically upload to a random peer who isn't uploading to you. This gives new peers a chance to join the ecosystem and discover faster peers.
Choking Algorithm
Peers limit uploads to their top 4-5 uploaders. This maximizes efficiency by focusing bandwidth on productive connections rather than spreading it thin.
ChokingPeer implements the full choking/unchoking algorithm:
@dataclass
class UploadRecord:
"""Tracks how much a peer has uploaded to us in the recent window."""
peer_id: str
bytes_received: int = 0 # bytes downloaded FROM this peer recently
is_choked: bool = True # are we choking this peer (refusing to upload)?
class ChokingPeer(Process):
"""Peer with a proper choking algorithm and piece verification.
Hidden complexity:
The real BitTorrent choking algorithm measures upload rate over a 20-second
rolling window, not total bytes. Peers that have good upload rates *recently*
are unchoked; peers whose rates drop are re-choked. This prevents a peer that
uploaded a lot in the past but is now idle from monopolizing upload slots.
We use a simpler sliding counter here; the structure is the same.
"""
def init(
self,
peer_id: str,
metadata: TorrentMetadata,
other_peers: List["SimplifiedPeer"],
initial_pieces: Optional[List[int]] = None,
) -> None:
self.peer_id = peer_id
self.metadata = metadata
self.other_peers = other_peers
self.have_pieces: Set[int] = set(initial_pieces or [])
self.upload_records: Dict[str, UploadRecord] = {}
self.optimistic_peer: Optional[str] = None
# Statistics
self.downloaded_pieces = len(self.have_pieces)
self.uploaded_pieces = 0
self.rejected_pieces = 0 # failed hash verification
for peer in other_peers:
self.upload_records[peer.peer_id] = UploadRecord(peer_id=peer.peer_id)
print(
f"[{self.now:.1f}] Peer {self.peer_id}: Started with "
f"{len(self.have_pieces)}/{metadata.total_pieces} pieces"
)
def is_complete(self) -> bool:
return len(self.have_pieces) == self.metadata.total_pieces
async def run(self) -> None:
"""Download loop with concurrent unchoke timer."""
_UnchokeTimer(self._env, self)
_OptimisticUnchokeTimer(self._env, self)
while not self.is_complete():
await self._download_round()
await self.timeout(1.0)
print(f"[{self.now:.1f}] Peer {self.peer_id}: Download complete!")
The rechoke method re-evaluates who gets upload slots based on recent upload history,
and rotate_optimistic gives new peers a chance to prove themselves:
def rechoke(self) -> None:
"""Re-evaluate which peers to unchoke based on upload rate.
The top MAX_UNCHOKED peers by bytes_received are unchoked.
All others are choked (unless they are the current optimistic unchoke).
After evaluating, reset the counters for the next interval.
"""
records = list(self.upload_records.values())
records.sort(key=lambda r: r.bytes_received, reverse=True)
for i, record in enumerate(records):
if record.peer_id == self.optimistic_peer:
record.is_choked = False # Always unchoke the optimistic slot.
else:
record.is_choked = i >= MAX_UNCHOKED
print(
f"[{self.now:.1f}] Peer {self.peer_id}: Rechoked. "
f"Unchoked: {[r.peer_id for r in records if not r.is_choked]}"
)
# Reset counters for next interval.
for record in records:
record.bytes_received = 0
def rotate_optimistic(self) -> None:
"""Rotate the optimistic unchoke to a randomly-chosen choked peer.
This gives new or slow peers a chance to prove they can upload.
Without optimistic unchoking, a peer that joined the swarm after the
top uploaders were established would never get unchoked and could
never start downloading.
"""
choked = [r.peer_id for r in self.upload_records.values() if r.is_choked]
if choked:
self.optimistic_peer = random.choice(choked)
if self.optimistic_peer in self.upload_records:
self.upload_records[self.optimistic_peer].is_choked = False
print(
f"[{self.now:.1f}] Peer {self.peer_id}: "
f"Optimistically unchoked {self.optimistic_peer}"
)
Piece verification checks each downloaded piece against the SHA-1 hash in the torrent metadata:
async def _download_round(self) -> None:
"""Download one piece using rarest-first selection."""
needed = [
i for i in range(self.metadata.total_pieces) if i not in self.have_pieces
]
if not needed:
return
# Rarest-first: count how many peers have each piece.
piece_counts: Dict[int, int] = {}
for peer in self.other_peers:
for idx in peer.have_pieces:
piece_counts[idx] = piece_counts.get(idx, 0) + 1
needed.sort(key=lambda idx: piece_counts.get(idx, 0))
for piece_idx in needed[:3]:
candidates = [
p for p in self.other_peers
if piece_idx in p.have_pieces
and self.upload_records.get(p.peer_id, UploadRecord(p.peer_id)).is_choked is False
]
if not candidates:
# No unchoked peer has this piece; try any peer.
candidates = [
p for p in self.other_peers if piece_idx in p.have_pieces
]
if candidates:
source = random.choice(candidates)
await self._download_and_verify(source, piece_idx)
# Record that this peer uploaded to us.
if source.peer_id in self.upload_records:
self.upload_records[source.peer_id].bytes_received += 1
source.uploaded_pieces += 1
break
async def _download_and_verify(
self, source: "SimplifiedPeer", piece_idx: int
) -> None:
"""Download a piece and verify its SHA-1 hash.
If verification fails the piece is discarded.
In real BitTorrent, a peer that repeatedly sends bad pieces is banned.
"""
await self.timeout(0.2) # simulate transfer time
# Simulate the piece data (in a real system this would be actual bytes).
fake_data = f"piece-{piece_idx}-from-{source.peer_id}".encode()
expected_hash = self.metadata.piece_hashes[piece_idx]
actual_hash = hashlib.sha1(fake_data).hexdigest()
if actual_hash == expected_hash:
self.have_pieces.add(piece_idx)
self.downloaded_pieces += 1
print(
f"[{self.now:.1f}] Peer {self.peer_id}: "
f"Verified piece {piece_idx} from {source.peer_id} "
f"({len(self.have_pieces)}/{self.metadata.total_pieces})"
)
else:
# Hash mismatch — corrupt or tampered piece.
self.rejected_pieces += 1
print(
f"[{self.now:.1f}] Peer {self.peer_id}: "
f"REJECTED piece {piece_idx} from {source.peer_id} "
f"(hash mismatch)"
)
End Game Mode
When almost complete, aggressively request remaining pieces from all peers. Cancel duplicate requests when pieces arrive. This prevents the last few pieces from taking disproportionately long.
DHT (Distributed Hash Table)
Modern BitTorrent doesn't require trackers. DHT creates a distributed database where peers can find other peers without a central server:
- Uses Kademlia algorithm
- Each peer stores information about nearby peers in ID space
- Provides tracker-like functionality in decentralized manner
- Resilient to tracker failures or censorship
Security and Privacy
BitTorrent has several security considerations:
Protocol Encryption: Obfuscates traffic to bypass ISP throttling
PEX (Peer Exchange): Peers share peer lists directly, reducing tracker dependency
Magnet Links: Reference torrents by info hash without needing .torrent file
Private Trackers: Require authentication, track ratios, encourage seeding
Copyright Concerns: BitTorrent is neutral technology but can distribute copyrighted content
Real-World Applications
Beyond file sharing, BitTorrent's principles appear in:
Content Delivery: Twitter uses BitTorrent-inspired Murder for deploying code to servers
Software Distribution: Linux distributions, game updates, software patches
Live Streaming: Peer-assisted streaming reduces server load
Blockchain: Bitcoin and Ethereum use gossip protocols inspired by P2P systems
IPFS: InterPlanetary File System uses BitTorrent-like chunking and distribution
Conclusion
BitTorrent demonstrates how to build efficient distributed systems through clever incentive design. The key principles are:
- Decentralization: No single point of failure or bottleneck
- Swarming: More demand creates more supply
- Incentive alignment: Tit-for-tat encourages cooperation
- Piece verification: Cryptographic hashes ensure integrity
- Rarest first: Maintains piece diversity in the swarm
These patterns extend beyond file sharing to distributed databases, content delivery networks, and peer-to-peer systems generally. Understanding BitTorrent provides insight into how to coordinate distributed resources without central control—a fundamental challenge in distributed systems.
Our simulation captures the essence of BitTorrent: pieces flowing through a swarm, rarest-first selection, and peers helping each other. While production BitTorrent adds complexity—TCP connection management, detailed choking algorithms, DHT—the core ideas we've demonstrated remain central to how BitTorrent achieves its remarkable efficiency.
Exercises
-
Run the basic simulation with 1 seeder and 4 peers. Look at the download times for each peer. Now change the simulation to start with 2 seeders. By how much does the total download time decrease? Why doesn't adding a second seeder halve the download time?
-
In
simplified_peer.py, the rarest-first selection sorts pieces by how many peers have them. Change the selection to most common first (reverse the sort order) and run the simulation. What happens to the swarm over time? Which pieces are at risk of disappearing if the only seeder leaves? -
The
ChokingPeerrejects pieces that fail hash verification. Modify the simulation to occasionally corrupt a piece (flip one byte) before the hash check. What fraction of downloads succeed on the first try? How many total download attempts are needed to get all pieces? -
Optimistic unchoking gives new peers a chance to start downloading. Simulate a swarm where all peers start at the same time with no pieces. With optimistic unchoking disabled, can the swarm make any progress? With it enabled, how many rounds does it take before the first piece propagates?
-
A "free-rider" downloads pieces but never uploads. In the choking algorithm, what happens to a free-rider's upload record? Will it ever be unchoked by the top-4 mechanism? Under what circumstances might it receive pieces anyway, and is this a design flaw?