System Design Interview Questions

20+ Questions. Master scalability, reliability, and distributed architectures with detailed design breakdowns.


Large-Scale System Designs (Top 3)

1. Design Twitter (or a Feed/Timeline Service). Scale: Hard

Answer & Explanation

Key Challenge: The "Fanout" mechanism. Tweets need to reach millions of followers instantly.

  • Fanout-on-Write (Push Model): Used for typical users with modest follower counts. The tweet is immediately pushed into all followers' inboxes/timelines at write time. Ensures fast reads but incurs a high write load.
  • Fanout-on-Read (Pull Model): Used for users with massive follower counts (e.g., celebrities with millions of followers), where pushing to every inbox is prohibitively expensive. Their tweets are merged into the timeline dynamically when a follower loads the app. Saves write load but increases read latency. Production systems typically combine both in a hybrid approach.
  • Storage: Tweets stored in a distributed key-value store (NoSQL, like Cassandra). Timelines are cached in-memory (like Redis).
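The hybrid push/pull split can be sketched in a few lines. A minimal in-memory sketch, assuming a hypothetical follower-count threshold (`CELEBRITY_THRESHOLD`) that decides which model an author gets; a real system would back these structures with Redis and Cassandra as noted above:

```python
from collections import defaultdict, deque

# Hypothetical cutoff: authors above this follower count are served
# via fanout-on-read instead of fanout-on-write.
CELEBRITY_THRESHOLD = 10_000

followers = defaultdict(set)          # author_id -> follower ids
inboxes = defaultdict(deque)          # user_id -> precomputed timeline (push)
tweets_by_author = defaultdict(list)  # author_id -> own tweets (pull source)

def post_tweet(author, tweet):
    tweets_by_author[author].append(tweet)
    if len(followers[author]) < CELEBRITY_THRESHOLD:
        # Fanout-on-write: push into every follower's inbox at write time.
        for f in followers[author]:
            inboxes[f].appendleft(tweet)

def load_timeline(user, following):
    # Start from the precomputed inbox, then merge celebrity tweets
    # in at read time (fanout-on-read).
    timeline = list(inboxes[user])
    for author in following:
        if len(followers[author]) >= CELEBRITY_THRESHOLD:
            timeline.extend(tweets_by_author[author])
    return timeline
```

Note the trade-off made concrete: the push path pays O(followers) work per tweet, while the pull path defers that cost to every timeline read.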
2. Design a URL Shortening service like bit.ly. Scale: Medium

Answer & Explanation

Core Mechanism: Generating unique 6-8 character keys for a long URL and redirecting requests.

  • Key Generation: Instead of simple hashing (prone to collisions), use a **Distributed Counter Service** to generate unique numerical IDs. Convert the ID to a base62 string (0-9, a-z, A-Z) to get the short code.
  • Redirection: The service takes the short code, looks up the long URL in the database, and sends an HTTP redirect to the client: 301 (permanent) lets browsers cache the mapping and cuts server load, while 302 (temporary) routes every click back through the service, which is useful for analytics.
  • Database: Use a fast, highly available key-value store (like DynamoDB or Cassandra) for the (short_key, long_url) mapping. Reads greatly outweigh writes.
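Assuming the counter service hands out sequential integer IDs, the base62 conversion itself is straightforward; with 62 characters, a 7-character code covers 62^7 ≈ 3.5 trillion URLs. A minimal sketch:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base62(n: int) -> str:
    """Convert a counter-issued numeric ID into a short base62 code."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

def from_base62(code: str) -> int:
    """Invert the encoding to recover the numeric ID."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Because the mapping is a pure bijection, the database only needs to store (short_key, long_url); no collision handling is required, unlike hash-based schemes.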
3. Design Netflix (or a Video Streaming Service). Scale: Hard

Answer & Explanation

Key Challenge: Low-latency streaming and massive bandwidth requirements.

  • CDN: Use a proprietary or custom Content Delivery Network (like Netflix Open Connect) distributed globally to cache video chunks close to users. The vast majority of requests (90%+) should be served from the CDN edge rather than the origin.
  • Encoding/Transcoding: Videos must be transcoded into various bitrates, resolutions, and formats (HLS, DASH) for different devices and network conditions.
  • Recommendation Service: This runs offline/asynchronously using ML models to personalize the homepage queue.
  • Databases: Use NoSQL for user data and metadata (highly scalable, available).
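The transcoding step feeds directly into playback: the client's adaptive-bitrate (ABR) logic picks a rendition from the encoding ladder based on measured bandwidth. A minimal sketch, where the ladder values and the 0.8 safety headroom are illustrative assumptions, not real Netflix numbers:

```python
# Hypothetical encoding ladder: (resolution, bitrate in kbps).
LADDER = [
    ("240p", 400),
    ("480p", 1_000),
    ("720p", 3_000),
    ("1080p", 6_000),
    ("4K", 16_000),
]

def pick_rendition(bandwidth_kbps: float, headroom: float = 0.8):
    """Choose the highest-bitrate rendition that fits within a safety
    margin of the measured bandwidth, as a simple ABR player might."""
    usable = bandwidth_kbps * headroom
    best = LADDER[0]  # fall back to the lowest rung if nothing fits
    for name, kbps in LADDER:
        if kbps <= usable:
            best = (name, kbps)
    return best
```

Real players (HLS/DASH) re-run this decision per segment, switching rungs as network conditions change.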

Fundamental Concepts & Principles

4. Describe the CAP Theorem and its implications for distributed systems. Conceptual

Answer & Explanation

CAP Theorem: A distributed data store can only guarantee **two** of the following three properties simultaneously:

  • Consistency (C): All clients see the same data at the same time (e.g., all database replicas must agree).
  • Availability (A): Every request receives a non-error response, without guarantee that the response contains the most recent write.
  • Partition Tolerance (P): The system continues to operate despite arbitrary message loss or system failures (network partition).

Implication: Since network partitions are unavoidable in any real-world distributed system, P must always be tolerated, so the real choice is between **C and A** during a partition. Choosing **CP** (e.g., MongoDB, HBase) sacrifices availability during a partition (some requests fail or block). Choosing **AP** (e.g., Cassandra, DynamoDB) stays available but may serve stale data (eventual consistency).
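Dynamo-style stores make this C-vs-A trade tunable per request via quorums: with N replicas, a write quorum W, and a read quorum R, a read is guaranteed to overlap the latest write only when R + W > N. A small sketch of that rule (the N/W/R values below are illustrative):

```python
def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    # Read and write quorums must overlap in at least one replica
    # for every read to observe the latest acknowledged write.
    return r + w > n

# CP-leaning: overlapping quorums, but a partition can block the quorum.
assert is_strongly_consistent(3, 2, 2)

# AP-leaning: single-replica reads/writes stay fast and available,
# at the cost of possibly reading stale data.
assert not is_strongly_consistent(3, 1, 1)
```

Cassandra exposes exactly this dial through per-query consistency levels (ONE, QUORUM, ALL).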

5. How does Load Balancing work? Explain different algorithms (Round Robin, Least Connections, etc.). Conceptual

Answer & Explanation

Load Balancing: Distributing incoming network traffic across a group of backend servers to maximize throughput, minimize latency, and prevent any single server from becoming a single point of failure (SPOF).

Algorithms:

  • Round Robin: Distributes requests sequentially to each server. Simple, but ignores server capacity/load.
  • Least Connections: Directs traffic to the server with the fewest active connections. Works better than Round Robin when request durations vary.
  • Least Response Time: Directs traffic to the server with the lowest average response time and fewest active connections.
  • Hashing (IP Hash): Routes requests from the same client IP address to the same server. Important for session persistence.
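The three core algorithms above fit in a short sketch. This is a toy in-process model, not a real balancer; the server names are placeholders, and a production balancer would use a stable hash rather than Python's per-process `hash()`:

```python
import itertools

class RoundRobin:
    """Cycle through servers in order, ignoring their current load."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class LeastConnections:
    """Track active connections and route to the least-loaded server."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1

def ip_hash(client_ip, servers):
    # Same client IP always lands on the same server (session persistence).
    return servers[hash(client_ip) % len(servers)]
```

Note that Least Connections needs the balancer to observe connection lifecycles (`pick`/`release`), which is why it is usually an L4/L7 proxy feature rather than a DNS-level one.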
6. How do you handle database sharding? Explain different sharding keys. Conceptual

Answer & Explanation

Sharding: Horizontal partitioning of a large database into smaller, independent pieces (shards) across different database server instances. This is done when vertical scaling (larger CPU/RAM) is no longer viable.

Sharding Keys: The column used to determine which shard a row belongs to (also called the partition key):

  • User ID / Customer ID (Hash-Based): Hashing the ID ensures load is distributed evenly but makes range queries difficult.
  • Geolocation: Sharding based on geographical regions. Good for localized services.
  • Time-Based: Sharding by time range (e.g., all 2024 data on shard 1). Convenient for archiving old data, but the newest shard can become a write hotspot.

Challenge: Choosing the wrong key leads to "hotspots" or difficult cross-shard joins.
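Hash-based routing on a user ID can be sketched in a few lines. The shard count here is a hypothetical fixed value; note that changing it remaps almost every key, which is why production systems often use consistent hashing instead:

```python
import hashlib

NUM_SHARDS = 4  # hypothetical fixed shard count

def shard_for(user_id: str) -> int:
    # Hash the sharding key so rows spread evenly across shards.
    # A stable digest (md5 here) is used instead of Python's built-in
    # hash(), which is randomized per process.
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

Every row for a given user lands on one shard, so single-user queries hit one database; queries spanning many users fan out to all shards, which is the cross-shard join pain mentioned above.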
