Sharding a database across eight servers introduces network latency to queries that would otherwise run locally on a single disk, making simple reads slower.

Most engineers shard a database to handle write volume or storage limits that exceed a single machine’s capacity. The architecture splits data horizontally across multiple nodes, often using a router or coordinator to direct traffic. A client sends a query to the coordinator. The coordinator hashes the request key to determine which shard holds the data. It forwards the request. The shard processes the query. The shard returns the result. The coordinator aggregates the response and sends it back to the client. This path is longer than a direct connection to a single database server. The Google Spanner paper documents this topology as a necessity for global scale, but it comes with a physical cost. Every hop adds milliseconds. Every hop adds points of failure.

The latency difference depends entirely on where the data lives relative to the query logic. A query targeting one shard is fast. A query requiring data from all shards is slow. The difference is not linear. The coordination overhead scales exponentially with the number of shards involved in a single transaction.

The latency breakdown

The following table compares three common query patterns on a cluster with 8 shards. The baseline is a single-node database running on local SSDs. The sharded cluster uses standard cloud networking with 1ms intra-region latency.

Query TypePathLocal ProcessingNetwork HopsTotal Latency
Single-Shard ReadClient → Router → Shard 12ms2ms4ms
Cross-Shard JoinClient → Router → Shard 1 & 24ms10ms14ms
Scatter-Gather CountClient → Router → All 8 Shards10ms80ms90ms

The numbers assume a standard cloud environment. The “Local Processing” time includes disk I/O and CPU for the query plan. The “Network Hops” time includes the round-trip time (RTT) for the router to talk to the shard(s) and back. In a single-node database, the “Network Hops” column is zero. The data is on the same machine. The CPU talks to the disk directly over a bus.

In the sharded setup, the router must resolve the key. This is a network call to the metadata service. Then the router calls the shard. The shard talks to its local disk. The result travels back up the chain. AWS DynamoDB documentation specifies that partition keys are hashed to determine storage node location. This hashing is efficient, but the physical distance remains. Even if the 8 shards are in the same data center, the network stack adds overhead. TCP handshakes, serialization, and deserialization consume time.

The coordination tax

The table shows why cross-shard operations are pathological. A single-shard read requires one network round trip from the router to the data node. A cross-shard join requires two. The router must fetch data from Shard 1, hold it in memory, fetch data from Shard 2, and perform the join operation in the router itself. This moves the CPU load from the storage nodes to the coordinator. The coordinator becomes the bottleneck.

CockroachDB documentation describes this as the “distributed join” problem. When the join happens on the router, the router must buffer the entire result set from Shard 1 before it can process Shard 2. If Shard 1 returns 1 million rows, the router must store 1 million rows in RAM. If the router runs out of memory, the query fails. The single-node database does not have this problem. The query engine and the storage engine are the same process. They stream data directly.

The Scatter-Gather pattern is worse. To count all rows in the system, the router sends a request to all 8 shards. Each shard counts its local rows. The 8 results return to the router. The router sums the numbers. The latency is dominated by the slowest shard. If one shard is on a noisy neighbor instance and takes 15ms instead of 5ms, the entire query waits 15ms. This is the “straggler” problem. The AWS DynamoDB documentation notes that partition keys must be chosen to avoid hotspots, but it does not eliminate the straggler latency in aggregate queries. The 90ms latency in the table represents the worst-case scenario where one shard is slow and the network is congested.

The math of scale

The tradeoff is throughput for latency. A single-node database might handle 5,000 queries per second before CPU saturation. Sharding across 8 nodes might allow 40,000 queries per second. The write capacity scales linearly. The read capacity scales linearly for single-key lookups. But the average latency increases by a factor of 2 to 4. The network is the limiting factor, not the disk.

Modern fiber optics transmit data at the speed of light, but network stacks add delay. A 1ms RTT in a data center is typical. A 10ms RTT across regions is typical. The table assumes 1ms. If the shards are distributed across availability zones, the latency doubles. If the shards are distributed across regions, the latency increases to 50ms or more. The Google Spanner paper discusses TrueTime, which adds synchronization latency to ensure consistency. This adds another layer of delay. The system waits for clocks to agree before committing a transaction. This is correct by design, but it is slow by physics.

The decision to shard is a decision to accept higher latency for higher capacity. If the application does not need 40,000 writes per second, the shard adds cost without benefit. The 2ms latency of a single-node read is superior to the 4ms latency of a sharded read. The 90ms latency of a scatter-gather query is often unacceptable for user-facing interfaces.

The decision point

The 4ms latency of a single-shard read is acceptable for most APIs. The 90ms latency of a scatter-gather count is not. The closer the query stays to the primary key, the closer the performance stays to a single node. The further the query drifts into joins or aggregations, the more the network cost dominates.

The 40ms difference between a single-shard read and a scatter-gather count is the price of distribution. It is the cost of moving data from a local bus to a network cable. If the application requires complex joins across user data and order data, and that data lives on different shards, the router must fetch both. This multiplies the network hops. The 14ms cross-shard join latency is the warning sign. It indicates that the data model does not fit the sharding key.

The math says single-node is faster for reads. The behavior says sharding is necessary for writes. The compromise is to keep read-heavy workloads on a single node or a read replica, and reserve sharding for write-heavy workloads. The 90ms latency is a bill for scale that most applications do not need to pay.