Glossary

A

AP system: A distributed system that prioritizes availability and partition tolerance according to the CAP theorem, remaining operational during network partitions but potentially returning stale or inconsistent data.
associativity: The property of an operation where grouping does not affect the result, i.e., (a op b) op c = a op (b op c). Required for operations to be safely split across workers and recombined.
async context variables: Variables that store context information tied to the current asynchronous execution context rather than to a specific thread, allowing data such as trace IDs to flow automatically through async call chains.
at-least-once delivery: Message delivery guarantee ensuring every message is delivered one or more times. Requires acknowledgments and message redelivery on failure but may result in duplicates.
at-most-once delivery: Message delivery guarantee ensuring messages are delivered zero or one time, never duplicated but possibly lost. This is simple to implement but provides the weakest guarantee.
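
As an illustration of the associativity entry above, the following minimal sketch reduces a list in independent chunks and then combines the partial results. The helper name chunked_reduce and the chunk size are hypothetical, chosen only for this example; the point is that the chunked result matches a sequential reduction only when the operation is associative.

```python
from functools import reduce
import operator

def chunked_reduce(op, values, chunk_size):
    """Reduce each chunk separately, then combine the partial results.

    Hypothetical helper: each chunk stands in for the share of work
    a separate worker would handle. Assumes values is non-empty.
    """
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    partials = [reduce(op, chunk) for chunk in chunks]
    return reduce(op, partials)

values = [1, 2, 3, 4, 5, 6, 7]

# Addition is associative, so regrouping does not change the answer.
chunked_sum = chunked_reduce(operator.add, values, chunk_size=3)
sequential_sum = reduce(operator.add, values)

# Subtraction is not associative, so the chunked result differs.
chunked_diff = chunked_reduce(operator.sub, values, chunk_size=3)
sequential_diff = reduce(operator.sub, values)
```

Here chunked_sum equals sequential_sum, while chunked_diff and sequential_diff disagree, which is why non-associative operations cannot be split across workers this way.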

B

backoff multiplier: The factor by which the wait time is multiplied after each failed retry attempt. A multiplier of 2 produces exponential backoff, doubling the delay with each retry.
backpressure: Mechanism to prevent overwhelming a system by signaling upstream components to slow down when downstream components cannot keep up.
buffer: A temporary storage area that holds data while it is being transferred between components that operate at different speeds or rates.
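
The backpressure and buffer entries above can be sketched together with a bounded queue from Python's standard library. This is one simple way to signal a producer, not the only one; the function name try_send and the capacity of 2 are assumptions made for the example.

```python
import queue

# A bounded buffer between a producer and a slower consumer.
buffer = queue.Queue(maxsize=2)

def try_send(item):
    """Return True if the item was accepted, False if the buffer is full.

    A False result is the backpressure signal: the producer should
    slow down, retry later, or drop the item.
    """
    try:
        buffer.put_nowait(item)
        return True
    except queue.Full:
        return False

# The third send is rejected because the consumer has not caught up.
results = [try_send(i) for i in range(3)]

# Once the consumer drains one item, the producer may send again.
drained = buffer.get_nowait()
accepted_after_drain = try_send(3)
```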

C

cache miss: A failure to find requested data in a cache, requiring the more expensive operation of fetching it from the original source.
causal consistency: A consistency model where operations that are causally related are seen by all nodes in the same order, while unrelated operations may be observed in different orders on different nodes.
commutativity: The property of an operation where the order of operands does not affect the result, i.e., a op b = b op a. Together with associativity, it allows operations to be applied in any order and combined safely.
compensation: A corrective action taken to undo the effects of a previously completed step in a saga when a later step fails, restoring the system to a consistent state without requiring distributed transactions.
concurrency: The execution of multiple tasks that overlap in time, which may or may not run simultaneously on different processors. Concurrency is concerned with managing multiple tasks, while parallelism is concerned with executing them at the same instant.
conflict-free replicated data type (CRDT): A data structure designed for distributed systems that can be replicated across nodes and merged automatically without coordination, guaranteeing that all replicas converge to the same state.
consumer group: A set of message queue subscribers that share the workload. Messages are distributed among group members rather than duplicated to each.
contention: Competition between concurrent processes for access to a shared resource such as a lock, memory location, or I/O device, which can become a performance bottleneck.
context propagation: Passing trace IDs, span IDs, and other metadata between services so that operations can be correlated in distributed tracing.
CP system: A distributed system that prioritizes consistency and partition tolerance according to the CAP theorem, refusing to serve requests rather than returning potentially inconsistent data during network partitions.
critical section: A portion of code that accesses a shared resource and must not be executed by more than one process or thread at a time, requiring mutual exclusion to prevent data corruption.
cross-site request forgery (CSRF): An attack where a malicious site tricks a user's browser into sending an authenticated request to another site without the user's knowledge, exploiting the browser's automatic inclusion of session cookies.

D

dataclass: A Python class whose primary purpose is to hold data, created using the @dataclass decorator which automatically generates common methods such as __init__, __repr__, and __eq__.
decorator: A function that wraps another function or class to modify or extend its behavior without changing its source code.
decoupling: The design principle of reducing dependencies between components so that changes in one component do not require changes in others, enabling independent development, testing, and scaling.
delta: A record of the difference or change between two states, used in delta-based CRDTs to transmit only recent changes rather than the full state.
double-ended queue (deque): A data structure that supports efficient insertion and removal of elements at both ends, combining the properties of a stack and a queue.
divide and conquer: An algorithm design strategy that breaks a problem into smaller subproblems, solves each independently, and combines the results. MapReduce applies this pattern to distributed data processing.
Domain Name System (DNS): A hierarchical, distributed database that translates human-readable domain names such as "www.example.com" into IP addresses. DNS is a critical internet infrastructure component that enables service mobility and scales by distributing authority across millions of servers.
DNSSEC: DNS Security Extensions, a suite of protocols that add cryptographic signatures to DNS records, allowing resolvers to verify that responses are authentic and have not been tampered with, thereby preventing spoofing and cache poisoning attacks.
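
To make the dataclass entry above concrete, here is a small sketch using Python's standard dataclasses module. The class name Span and its fields are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Span:
    # Field names are illustrative assumptions, not a real tracing API.
    name: str
    trace_id: str
    duration_ms: float = 0.0

# __init__ is generated from the field declarations.
a = Span("checkout", "abc123", 12.5)
b = Span("checkout", "abc123", 12.5)

# __eq__ compares field by field; __repr__ lists the fields.
same = (a == b)
text = repr(a)
```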

E

eventual consistency: A consistency model where replicas may temporarily disagree but are guaranteed to converge to the same state if no new updates are made for a sufficient period.
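exactly-once delivery: Message delivery guarantee ensuring each message is processed exactly once. This is difficult to achieve in practice and is usually approximated by combining at-least-once delivery with idempotent processing or deduplication.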
exponential backoff: A retry strategy where the wait time between attempts increases exponentially after each failure, reducing load on an overloaded system and preventing retry storms.
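
The relationship between exponential backoff and the backoff multiplier can be sketched as a function that lists the wait before each retry. The function name backoff_delays and the optional max_delay cap are assumptions for this example; real retry libraries often add random jitter as well, which is omitted here.

```python
def backoff_delays(base_delay, multiplier, max_retries, max_delay=None):
    """Return the wait time (in seconds) before each retry attempt.

    Each delay is the previous one times the multiplier, optionally
    capped at max_delay (a hypothetical parameter for this sketch).
    """
    delays = []
    delay = base_delay
    for _ in range(max_retries):
        capped = delay if max_delay is None else min(delay, max_delay)
        delays.append(capped)
        delay *= multiplier
    return delays

# A multiplier of 2 doubles the delay after every failed attempt.
doubling = backoff_delays(0.5, 2, 5)

# With a cap, later delays stop growing.
capped = backoff_delays(0.5, 2, 5, max_delay=3.0)
```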

F

fan-out: Pattern where one message or request triggers multiple downstream operations, such as publishing to multiple subscribers or calling multiple services in parallel.
fault tolerance: The ability of a system to continue operating correctly in the presence of failures of some of its components.
fencing token: A monotonically increasing number issued with each lock grant that a protected resource uses to reject requests from stale lock holders, preventing split-brain data corruption.
future: An object representing the result of an asynchronous computation that may not have completed yet. The result can be retrieved once the computation finishes, allowing the caller to do other work in the meantime.
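
The fencing token entry above can be illustrated with a protected resource that remembers the highest token it has seen. The class FencedStore is a hypothetical in-memory stand-in for a storage service; it assumes tokens are issued elsewhere by the lock service.

```python
class FencedStore:
    """Hypothetical resource that rejects writes carrying stale tokens."""

    def __init__(self):
        self.highest_token = 0
        self.value = None

    def write(self, token, value):
        # A token lower than one already seen belongs to an older lock
        # grant, so the writer no longer holds the lock.
        if token < self.highest_token:
            return False
        self.highest_token = token
        self.value = value
        return True

store = FencedStore()
first = store.write(33, "from client A")   # A holds the lock with token 33
second = store.write(34, "from client B")  # lock expired; B was granted 34
stale = store.write(33, "late write by A") # A resumes after a pause
```

The late write from client A is refused, so the value written by client B survives.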

G

granularity: The size of the units into which work is divided. Fine-grained tasks are small and numerous while coarse-grained tasks are large and few. The right granularity balances the benefits of parallelism against the overhead of coordination.
grow-only counter: A CRDT that supports only increment operations, with each node tracking its own count and the total computed as the sum across all nodes.
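
A minimal sketch of the grow-only counter described above, written as a state-based CRDT. The class and method names are assumptions for illustration; the merge takes the per-node maximum, which makes it commutative, associative, and idempotent.

```python
class GCounter:
    """Illustrative grow-only counter; one slot per node."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}

    def increment(self, amount=1):
        # A node only ever increases its own slot.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self):
        # The total is the sum across all nodes.
        return sum(self.counts.values())

    def merge(self, other):
        # Per-node maximum: safe to apply in any order, any number of times.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

a = GCounter("a")
b = GCounter("b")
a.increment(2)
b.increment(3)

a.merge(b)
b.merge(a)
a.merge(b)  # merging again changes nothing
```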

H

happens-before relation: A partial ordering of events in a distributed system where event A happens-before event B if A could have causally influenced B. Used to reason about consistency and the ordering of concurrent operations.
hash code: A fixed-size integer derived from data by a hash function, used to quickly locate data in hash tables. Good hash functions distribute values uniformly and minimize collisions.
HTTP header: A key-value pair sent at the start of an HTTP request or response that provides metadata such as content type, authentication tokens, or caching directives.
HTTP status code: A three-digit number in an HTTP response indicating the result of the request. Codes in the 200s indicate success, 400s indicate client errors, and 500s indicate server errors.

I

idempotence: The property of an operation that produces the same result whether it is applied once or multiple times. Idempotent operations are safe to retry in at-least-once delivery systems.
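
One common way to make a handler idempotent is to remember which message IDs have already been processed. The sketch below assumes every message carries a unique ID; the names apply_deposit, processed_ids, and account are hypothetical, and a real system would keep this state in durable storage.

```python
processed_ids = set()
account = {"balance": 0}

def apply_deposit(message_id, amount):
    """Apply a deposit once, ignoring redelivered copies of the message."""
    if message_id in processed_ids:
        return False  # duplicate delivery: no effect
    processed_ids.add(message_id)
    account["balance"] += amount
    return True

first = apply_deposit("msg-1", 100)
duplicate = apply_deposit("msg-1", 100)  # redelivered after a lost ack
other = apply_deposit("msg-2", 50)
```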

J

JSON: JavaScript Object Notation, a lightweight text-based format for representing structured data as key-value pairs, arrays, and nested objects. Widely used for data exchange between web services.

K

L

last-write-wins register: A CRDT that resolves concurrent writes by keeping the value with the highest timestamp, discarding earlier writes. Simple to implement but may silently lose updates.
lease: A time-limited grant of a resource or lock that expires automatically if not renewed by the holder, allowing the system to recover from client failures without manual intervention.
lease-based lock: A distributed lock that is held for a limited duration and must be periodically renewed, so that the lock is automatically released if the holder crashes or becomes unreachable.
linearizability: A strong consistency model where every operation appears to take effect instantaneously at some point between its start and completion, making the system behave as if there were a single copy of the data.
livelock: A situation where two or more processes continually change state in response to each other without making progress, similar to deadlock but with processes that are not blocked, only unproductive.
load balancing: The distribution of incoming requests or work across multiple servers or workers to prevent any single component from becoming a bottleneck and to make efficient use of available resources.
logical clock: A mechanism, such as a Lamport clock or vector clock, that orders events in a distributed system without relying on physical time by assigning monotonically increasing counters to events.
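
As a sketch of the logical clock entry above, here is a minimal Lamport clock. The class and method names are assumptions for illustration; the rules are the standard ones: increment on each local event, and on receipt take the maximum of the local and received times plus one.

```python
class LamportClock:
    """Illustrative Lamport clock holding a single integer counter."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event or message send: advance the counter.
        self.time += 1
        return self.time

    def receive(self, remote_time):
        # Message receipt: jump past the sender's timestamp.
        self.time = max(self.time, remote_time) + 1
        return self.time

sender = LamportClock()
receiver = LamportClock()

sender.tick()                       # local event on the sender
sent_at = sender.tick()             # timestamp attached to the message
received_at = receiver.receive(sent_at)
```

The receive event is always stamped later than the send event, which matches the happens-before relation.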

M

message broker: A middleware component that receives messages from producers, stores them, and routes them to consumers. It decouples senders from receivers and may provide durability, ordering, and filtering.
microservice: An architectural style where an application is built as a collection of small, independently deployable services, each responsible for a specific business capability and communicating over a network.
mutex: A synchronization primitive that grants exclusive access to a shared resource, allowing only one thread or process to hold it at a time and blocking others until it is released.
mutual exclusion: The guarantee that only one process or thread accesses a shared resource at any given time, preventing race conditions and data corruption in concurrent systems.
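
The mutex and mutual exclusion entries above can be sketched with threading.Lock from Python's standard library. The shared dictionary and the function add_many are hypothetical; the lock guards the critical section so that no increments are lost.

```python
import threading

shared = {"counter": 0}
lock = threading.Lock()

def add_many(n):
    for _ in range(n):
        # Only one thread at a time may run this critical section.
        with lock:
            shared["counter"] += 1

threads = [threading.Thread(target=add_many, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```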

N

negative cache: A cache that stores records indicating that a lookup returned no result, such as a non-existent domain name, so that repeated queries for the same missing resource are answered locally rather than forwarded to authoritative servers.
negative feedback loop: A control mechanism where a system's output feeds back to reduce its input, stabilizing the system around a target state. Backpressure in distributed systems is an example of this pattern.
network partition: A failure that splits a distributed system into two or more groups of nodes that cannot communicate with each other, forcing a choice between consistency and availability according to the CAP theorem.
Network Time Protocol (NTP): A networking protocol for synchronizing clocks across computers on a network. NTP uses a hierarchy of time sources and adjusts for network delays to keep clocks accurate to within milliseconds.
NTP stratum: A level in the NTP hierarchy indicating how many hops a clock is from a reference time source. Stratum 0 devices are high-precision clocks such as atomic clocks or GPS receivers; stratum 1 servers synchronize directly from them, and each subsequent stratum synchronizes from the level above.

O

OAuth scope: A string that specifies the permissions an application is requesting from a user, limiting what the application can do with the access token it receives.
OAuth token: A credential issued by an authorization server that grants an application limited access to a user's resources on another service without exposing the user's password.
operation-based CRDT: A CRDT that replicates by broadcasting individual operations to all replicas. Requires operations to be commutative so they can be applied in any order.

P

partial order: A relation that is reflexive, antisymmetric, and transitive, but where not all pairs of elements are necessarily comparable. The happens-before relation is a partial order on events in a distributed system.
partition tolerance: The ability of a distributed system to continue operating correctly even when network partitions prevent some nodes from communicating with others.
positive-negative counter: A CRDT counter that supports both increment and decrement operations by maintaining one grow-only counter for increments and another for decrements, with the value being their difference.
priority queue: A data structure where each element has an associated priority and elements are removed in priority order rather than insertion order.
publish-subscribe: Messaging pattern where publishers send messages to topics and subscribers receive all messages from topics they're interested in. This pattern decouples senders from receivers.
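
The priority queue entry above can be sketched with the standard heapq module, which keeps the smallest item at the front of a list. The task names are made up, and the sketch assumes a lower number means higher priority.

```python
import heapq

tasks = []
# Each entry is (priority, description); insertion order is arbitrary.
heapq.heappush(tasks, (2, "write report"))
heapq.heappush(tasks, (1, "fix outage"))
heapq.heappush(tasks, (3, "refactor module"))

# Items come out in priority order, not insertion order.
order = [heapq.heappop(tasks)[1] for _ in range(3)]
```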

Q

R

recursive resolver: A DNS server that accepts queries from clients and resolves them on the client's behalf by walking the DNS hierarchy from root servers down to authoritative name servers, caching responses to answer future queries more quickly.
root server: One of the thirteen logical DNS servers at the top of the DNS hierarchy that know the authoritative name servers for each top-level domain. In practice, root servers are implemented as many physical servers distributed worldwide using anycast addressing.
root span: The first span in a trace, representing the top-level operation that initiated the request and serving as the root of the trace's span tree.
round-robin polling: A scheduling strategy that cycles through a list of items in fixed rotation, giving each one a turn in order to distribute work or checks evenly.

S

saga: A pattern in which a multi-step transaction is implemented as a sequence of local transactions, each with a compensating action to undo its effects if the overall transaction fails.
sampling: Recording only a fraction of traces in distributed tracing to reduce overhead and storage requirements.
schema: A formal description of the structure and types of data, used to validate that data conforms to an expected format. Common in databases and data interchange formats such as JSON and Avro.
semaphore: A synchronization primitive that controls access to a shared resource by maintaining a counter representing the number of available permits, allowing up to that many concurrent accessors.
sequential consistency: A consistency model where all operations appear to execute in a single total order that preserves the order in which each individual process issued its own operations.
single sign-on: An authentication scheme that allows a user to log in once and gain access to multiple applications or services without re-entering credentials for each one.
singleton: A design pattern that restricts a class to a single instance and provides a global access point to it.
span: A named, timed operation representing a single unit of work within a distributed trace. Spans can be nested to form a tree that represents the full execution of a request.
speculative execution: Running multiple instances of a task in parallel and using the result of whichever completes first, discarding the others, to reduce the impact of slow or failed workers.
split-brain scenario: A situation where two or more nodes in a distributed system each believe they hold exclusive control over a resource, typically caused by a network partition, leading to conflicting writes and data inconsistency.
state-based CRDT: A CRDT that replicates by periodically sending the full state to other replicas, which merge it using a commutative, associative, and idempotent merge function.
strong consistency: A consistency model where all reads return the most recently written value, as if the system had a single authoritative copy of the data.
strong eventual consistency: A consistency model guaranteeing that any two replicas that have received the same set of updates will have identical states, regardless of the order in which those updates arrived.
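
The saga entry above, together with compensation, can be sketched as a loop that runs local steps in order and, on failure, runs the compensations of the completed steps in reverse. The function run_saga and the order-processing steps are hypothetical; a real saga would persist its progress so it can resume after a crash.

```python
def run_saga(steps):
    """Run (action, compensate) pairs; undo completed steps on failure."""
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # Undo in reverse order, most recent step first.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True

log = []

def ship_order():
    raise RuntimeError("carrier unavailable")

steps = [
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (lambda: log.append("charge card"), lambda: log.append("refund card")),
    (ship_order, lambda: log.append("cancel shipment")),
]

succeeded = run_saga(steps)
```

Because the shipping step fails, the card is refunded and the stock released, and no compensation runs for the step that never completed.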

T

thread-local storage: A mechanism that provides each thread with its own private copy of a variable, preventing interference between threads without requiring synchronization.
time to live (TTL): A value attached to a DNS record or cached entry that specifies how long it may be stored before it must be discarded and re-fetched. Low TTLs allow changes to propagate quickly but increase query load; high TTLs reduce load but slow propagation of updates.
top-level domain: The rightmost label in a domain name, such as .com, .org, or .net, managed by a dedicated set of DNS servers that know the authoritative name servers for every domain registered under that suffix.
total order: A relation that is reflexive, antisymmetric, and transitive, and where every pair of elements is comparable. Unlike a partial order, every two elements can be ordered relative to each other.
trace: The complete journey of a request through a distributed system, identified by a unique trace ID and composed of multiple spans forming a tree.
trace collector: A service that receives spans from instrumented applications, assembles them into complete traces, and stores or forwards them to a backend for analysis and visualization.
Transmission Control Protocol (TCP): A network protocol that provides reliable, ordered delivery of data over the internet, built on top of the unreliable, best-effort Internet Protocol (IP). TCP uses sequence numbers, acknowledgments, retransmission, and sliding windows to handle packet loss, reordering, and congestion automatically.
trust anchor: A certificate or public key that is trusted unconditionally and serves as the root from which the validity of other certificates in a chain is established.

U

V

vector clock: A data structure used to capture causality in distributed systems, consisting of one counter per process. Comparing two vector clocks determines whether events are causally related or concurrent.
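
The comparison mentioned in the vector clock entry above can be sketched as follows. Clocks are represented here as plain dictionaries from process name to counter, with missing entries treated as zero; the function name compare and its string results are assumptions for this example.

```python
def compare(a, b):
    """Classify two vector clocks as equal, before, after, or concurrent."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happens-before b
    if b_le_a:
        return "after"       # b happens-before a
    return "concurrent"      # neither could have influenced the other

earlier = {"p1": 1, "p2": 0}
later = {"p1": 2, "p2": 1}
sibling = {"p1": 0, "p2": 2}

r1 = compare(earlier, later)
r2 = compare(later, earlier)
r3 = compare(earlier, sibling)
```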

W

work stealing: A scheduling strategy where each worker maintains a local task queue and idle workers take tasks from others' queues in order to minimize contention while balancing load.
