A kubectl apply command does not start a container. It files a request that passes through seven distinct steps, spread across six system components, before the first byte of application code executes.
The common mental model treats Kubernetes as a single engine where a command spins up a process. In reality, the system separates the control plane from the data plane. The control plane decides what should run, and the data plane makes it run. This separation introduces latency and specific failure points that are invisible when looking only at the final status.
Understanding the path from request to runtime reveals where delays occur. The sequence begins with validation and ends with a binary start signal. Between those points, state must be written, distributed, scheduled, and confirmed. Each step adds milliseconds to the total time, and each step is handled by a specific component of the Kubernetes project, which is hosted by the Cloud Native Computing Foundation.
The user sees a single command. The system sees a distributed transaction.
The control plane sequence
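The journey begins when the client tool sends the manifest, serialized as a JSON object, to the API server. This is not a direct instruction to a machine. It is a state change request that must be recorded before it can be acted upon. The API server authenticates the caller, checks permissions, runs admission control, and validates the object against the schema for its resource type (the built-in API schema, or a CustomResourceDefinition's schema for custom resources). Only then does it persist the desired state.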
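The gatekeeping order described above can be sketched as a toy request handler. This is a minimal illustration, not the real kube-apiserver: `ToyApiServer`, its fields, and the status strings are hypothetical names, admission control is omitted, and a plain dict stands in for etcd.

```python
from dataclasses import dataclass, field


class RequestRejected(Exception):
    """Raised when a stage refuses the request; nothing is persisted."""


@dataclass
class ToyApiServer:
    # Hypothetical stand-ins for the real authn/authz/schema machinery.
    known_users: set
    allowed: set            # (user, verb, kind) tuples
    required_fields: dict   # kind -> required top-level fields
    store: dict = field(default_factory=dict)  # stands in for etcd

    def handle_create(self, user, obj):
        # 1. Authentication: who is calling?
        if user not in self.known_users:
            raise RequestRejected("401 unauthenticated")
        # 2. Authorization: may this caller create this kind?
        kind = obj.get("kind")
        if (user, "create", kind) not in self.allowed:
            raise RequestRejected("403 forbidden")
        # 3. Schema validation: does the object have the expected shape?
        if kind not in self.required_fields:
            raise RequestRejected("404 unknown kind")
        missing = [f for f in self.required_fields[kind] if f not in obj]
        name = obj.get("metadata", {}).get("name")
        if missing or not name:
            raise RequestRejected("422 invalid object")
        # 4. Persistence: only now is the desired state recorded.
        key = f"{kind}/{name}"
        self.store[key] = obj
        return key
```

A rejected request leaves `store` untouched, which mirrors the point in the text: the state change is recorded only after every check has passed.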
Persistence happens in etcd, a distributed key-value store. The API server does not consider the request complete until etcd confirms that the write has been committed by a quorum (a majority) of its members. This ensures that if the API server restarts, the request is not lost. Once etcd confirms the write, the scheduler, which watches the API server for pods with no assigned node, picks up the new pod. It checks resource availability on the nodes and binds the pod to a specific machine.
The kubelet on that machine then receives the assignment. It does not start containers directly. It calls the Container Runtime Interface (CRI), a standardized gRPC API, to hand off the workload to the actual runtime, such as containerd or CRI-O. The runtime is the component that pulls the image layers, sets up the network namespace, and executes the binary.
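The hand-offs described so far can be traced in a few lines. The sketch below is a toy model under stated assumptions: `schedule` and `run_pipeline` are hypothetical names, the scoring rule (most free CPU wins) is a simplification of real scheduling, and the event strings are invented for illustration.

```python
def schedule(pod, nodes):
    """Pick the node with the most free CPU that can fit the pod (toy scoring)."""
    fits = {name: free for name, free in nodes.items() if free >= pod["cpu"]}
    if not fits:
        return None  # no node fits; the pod would stay Pending
    return max(fits, key=fits.get)


def run_pipeline(pod, nodes):
    """Return the ordered list of events a pod passes through in this toy model."""
    events = []
    store = {}  # stands in for etcd

    # API server persists the desired state before anything acts on it.
    store[pod["name"]] = dict(pod, node=None)
    events.append("persisted")

    # Scheduler sees an unbound pod and chooses a node.
    node = schedule(pod, nodes)
    if node is None:
        events.append("unschedulable")
        return events

    # The binding is itself a persisted state change.
    store[pod["name"]]["node"] = node
    events.append(f"bound:{node}")

    # Kubelet on that node notices the assignment and calls the runtime.
    events.append(f"kubelet@{node}:sync")
    events.append(f"runtime@{node}:start:{pod['image']}")
    return events
```

Calling `run_pipeline({"name": "web", "cpu": 1, "image": "nginx"}, {"a": 2, "b": 4})` yields a persisted event, a binding to node `b`, a kubelet sync, and a runtime start, in that order. No step runs until the one before it has recorded its result.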
The following table breaks down the seven distinct steps, the component responsible for each, and the persistence state at that moment.
| Step | Component | Action | Typical latency | Persistence state |
|---|---|---|---|---|
| 1 | kubectl client | Sends POST request to API | 50-100ms | None |
| 2 | kube-apiserver | Validates auth and schema | 20-50ms | None |
| 3 | etcd | Writes state to disk (quorum) | 50-200ms | Persistent |
| 4 | kube-scheduler | Detects the unbound pod and selects a node | 10-30ms | None |
| 5 | kube-apiserver | Updates pod with node name | 20-50ms | Persistent |
| 6 | kubelet | Syncs pod spec to local cache | 5-20ms | Local |
| 7 | containerd | Pulls image and starts process | 2000-6000ms | Local (runtime) |
The latency numbers are illustrative and vary significantly with network conditions and image size. Steps 1 through 6 are dominated by network round trips and, for the etcd write, by disk sync time. Step 7 is I/O-bound and depends on disk speed and bandwidth.
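Summing the ranges in the table makes the proportions concrete. The snippet below only adds up the table's own figures; the names `STEPS` and `span` are introduced here for illustration.

```python
# (step, component, low_ms, high_ms), copied from the table above.
STEPS = [
    (1, "kubectl client", 50, 100),
    (2, "kube-apiserver", 20, 50),
    (3, "etcd", 50, 200),
    (4, "kube-scheduler", 10, 30),
    (5, "kube-apiserver", 20, 50),
    (6, "kubelet", 5, 20),
    (7, "containerd", 2000, 6000),
]


def span(steps):
    """Return (sum of low estimates, sum of high estimates) in milliseconds."""
    return (sum(s[2] for s in steps), sum(s[3] for s in steps))


control_plane = span(STEPS[:6])   # steps 1-6
end_to_end = span(STEPS)          # steps 1-7

# Share of the total taken by step 7 at the low and high ends.
pull_share_low = STEPS[6][2] / end_to_end[0]
pull_share_high = STEPS[6][3] / end_to_end[1]

print(control_plane)   # (155, 450)
print(end_to_end)      # (2155, 6450)
print(round(pull_share_low, 2), round(pull_share_high, 2))  # 0.93 0.93
```

By these figures, steps 1 through 6 together cost between 155 ms and 450 ms, and step 7 accounts for roughly 93% of the total at both ends of the range.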
The tradeoff between consistency and speed
The seven-step sequence reveals a specific architectural tradeoff: consistency is prioritized over immediate execution. The system waits for etcd to confirm the write before the scheduler makes a decision. This prevents split-brain scenarios where two schedulers might assign the same pod to different nodes, or where a pod is started on a node that the API server no longer recognizes.
This design choice means the user cannot start a container faster than the etcd write time. A write commits once a majority of members has persisted it, so in a three-member cluster a single slow follower can be outvoted, but a slow leader, or two slow members, raises write latency for every user. etcd performance is widely regarded as a key bottleneck for large-scale cluster operations. Going by the table's estimate, the etcd write alone adds roughly 100ms of overhead before the scheduler even sees the request.
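How much a slow member hurts depends on whether the majority needs it. The sketch below is a deliberately simplified model of a quorum commit, not etcd's actual implementation: it assumes the leader persists locally while replicating in parallel, that each follower figure already includes the network round trip, and that the write commits once a majority (leader included) has persisted it. The function names are hypothetical.

```python
def quorum_size(members):
    """Smallest majority of a cluster with the given number of members."""
    return members // 2 + 1


def commit_latency_ms(leader_ms, follower_ms):
    """Time until a majority, counting the leader, has persisted the write.

    leader_ms:   leader's local persist time
    follower_ms: per-follower time to receive, persist, and acknowledge
    """
    followers_needed = quorum_size(1 + len(follower_ms)) - 1
    fastest = sorted(follower_ms)[:followers_needed]
    return max([leader_ms] + fastest)


print(commit_latency_ms(10, [12, 300]))   # 12: one slow follower is outvoted
print(commit_latency_ms(300, [10, 12]))   # 300: a slow leader delays every write
print(commit_latency_ms(10, [300, 300]))  # 300: two slow members block the quorum
```

Under this model a three-member cluster absorbs one slow follower, but any slowness on the leader or on a second member lands directly on every write.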
Conversely, the kubelet operates on a local loop. It does not wait on a timer to discover new work: it holds a watch on the API server and is notified when a pod is bound to its node. A periodic resync (the kubelet's syncFrequency setting, one minute by default) acts as a safety net. The small gap between the API server recording the binding and the kubelet reacting, plus the time the runtime spends pulling the image, is the reason a pod status might show “Pending” for a few seconds even after the scheduler has made a decision.
The separation also dictates failure modes. If the API server is healthy but the etcd cluster is unresponsive, write requests time out or return errors, because the desired state cannot be persisted. If the scheduler is down, pods remain unassigned. If the kubelet is down, the node is marked “NotReady” and no new workloads are assigned to it. Each component is a potential point of failure for a specific part of the lifecycle.
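The mapping from a failed component to the stage where a pod stalls can be written as a lookup in pipeline order. This is an illustrative sketch: `PIPELINE`, `first_stall`, and the symptom strings are hypothetical and summarize the paragraph above rather than any real diagnostic tool.

```python
# Components in the order a new pod depends on them, with the symptom
# a user would observe if that component is the first one unavailable.
PIPELINE = [
    ("kube-apiserver", "request never reaches the cluster"),
    ("etcd", "desired state is not persisted"),
    ("kube-scheduler", "pod is stored but has no node assigned"),
    ("kubelet", "node is NotReady; no new pods are placed on it"),
    ("container runtime", "pod is bound but its container is not started"),
]


def first_stall(healthy):
    """Return (component, symptom) for the earliest unhealthy stage, or None.

    healthy maps component name -> bool; missing entries count as healthy.
    """
    for component, symptom in PIPELINE:
        if not healthy.get(component, True):
            return component, symptom
    return None
```

For example, `first_stall({"kube-scheduler": False})` returns the scheduler entry, while an outage in both etcd and the kubelet reports etcd, because the request never gets far enough for the kubelet to matter.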
The Container Runtime Interface (CRI) adds another layer of indirection. The kubelet does not know how to start a container; it only knows how to talk to the CRI. This allows the system to swap containerd for CRI-O without changing the kubelet. However, this abstraction layer adds a small overhead to every container start, because each request is a gRPC call that crosses a local socket.
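The value of that indirection is easiest to see as an interface with interchangeable implementations. The sketch below is an analogy in plain Python, not the real CRI (which is a gRPC API): `ContainerRuntime`, `FakeContainerd`, `FakeCrio`, and `kubelet_start` are hypothetical names, and the two methods loosely echo the CRI's image-pull and container-start calls.

```python
from typing import Protocol


class ContainerRuntime(Protocol):
    """What the caller needs from any runtime (toy stand-in for the CRI)."""

    def pull_image(self, image: str) -> None: ...
    def start_container(self, name: str, image: str) -> str: ...


class FakeContainerd:
    def __init__(self):
        self.calls = []

    def pull_image(self, image):
        self.calls.append(("pull", image))

    def start_container(self, name, image):
        self.calls.append(("start", name))
        return f"containerd://{name}"


class FakeCrio:
    def __init__(self):
        self.calls = []

    def pull_image(self, image):
        self.calls.append(("pull", image))

    def start_container(self, name, image):
        self.calls.append(("start", name))
        return f"cri-o://{name}"


def kubelet_start(runtime: ContainerRuntime, name: str, image: str) -> str:
    """The caller depends only on the interface, never on a concrete runtime."""
    runtime.pull_image(image)
    return runtime.start_container(name, image)
```

`kubelet_start` is identical for both runtimes; only the object passed in changes. That is the property that lets one runtime be swapped for another without touching the caller.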
The shape of the delay
The total time to run is the sum of the control plane latency and the data plane execution. For a small image on a local network, the control plane steps (1-6) take roughly 150ms to 450ms, going by the ranges in the table. The data plane step (7) takes 2 to 6 seconds for a typical 200MB image. For a large image or a slow network, the data plane dominates the timeline even more.
If the cluster is under heavy load, the etcd write time can spike to 500ms. The scheduler queue can grow, adding 1-2 seconds of wait time. If the kubelet's watch on the API server is interrupted, the pod may wait until the connection is re-established before the node reacts. These delays are additive: 500ms of etcd latency, 2 seconds of queueing, and a 6-second image pull already total 8.5 seconds. A slow control plane combined with a slow network can therefore push startup time past 10 seconds, which can frustrate users expecting instant feedback.
The system is designed to be eventually consistent. It does not guarantee that a container starts within a fixed time window. It promises only to keep working until the actual state matches the request. This distinction is critical for debugging. When a container does not start, the investigation should follow the order of the table: check the API server logs for validation errors, check etcd for write failures, check the scheduler for binding issues, and check the kubelet for runtime errors.
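Part of that investigation can be read straight off the pod object. The sketch below takes a dict shaped like a pod's JSON and reports which stage to look at. The field names (`spec.nodeName`, `status.phase`, `status.containerStatuses[].state.waiting.reason`) are standard pod fields; the `triage` function and its messages are hypothetical. Validation and etcd failures are not covered, because in those cases the request fails and no pod object exists to inspect.

```python
IMAGE_PULL_REASONS = ("ErrImagePull", "ImagePullBackOff")


def triage(pod):
    """Map a pod's recorded state to the stage most likely holding it up."""
    spec = pod.get("spec", {})
    status = pod.get("status", {})

    # No node assigned yet: the scheduler has not bound the pod.
    if not spec.get("nodeName"):
        return "scheduler: pod has no node assigned"

    # Bound, with a container stuck in a waiting state.
    for cs in status.get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting")
        if waiting:
            reason = waiting.get("reason", "unknown")
            if reason in IMAGE_PULL_REASONS:
                return f"runtime: image pull failing ({reason})"
            return f"kubelet: container waiting ({reason})"

    if status.get("phase") == "Running":
        return "running"

    # Bound, but the node has reported nothing about its containers.
    return "kubelet: bound but no container status reported"
```

A pod with no `nodeName` points at the scheduler, while a bound pod whose container is waiting with `ImagePullBackOff` points at the runtime and the registry rather than the control plane.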
The closer
The math says the control plane overhead is at most roughly 450ms for a healthy cluster. The behavior says the network and image pull time are the variable costs that dominate the wait. If the etcd write takes 200ms, that time is non-negotiable overhead. The operator can optimize the image size to reduce the 6000ms pull time, but cannot optimize the 200ms etcd write without changing the cluster topology or its storage hardware. The tradeoff is fixed: consistency costs time, and speed costs reliability.