
System Design Essentials

Core concepts and patterns for designing scalable systems

April 20, 2026
Updated regularly

Fundamental concepts and patterns for designing scalable, reliable systems.

Core Principles

Scalability

Vertical Scaling (Scale Up)

  • Add more CPU, RAM, or storage to existing machines
  • Simpler to implement
  • Has hardware limits
  • Single point of failure
Horizontal Scaling (Scale Out)

  • Add more machines to the system
  • Better fault tolerance
  • Virtually unlimited scaling
  • Requires load balancing
Reliability

    Availability = Uptime / (Uptime + Downtime)

    Availability        Downtime/Year    Use Case
    99% (2 nines)       3.65 days        Internal tools
    99.9% (3 nines)     8.76 hours       Standard services
    99.99% (4 nines)    52.56 minutes    Critical services
    99.999% (5 nines)   5.26 minutes     Mission critical
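
    The downtime figures in the table follow from the availability formula. A minimal sketch, assuming a 365-day year; the function name `downtime_per_year` is illustrative and not part of the original notes:

```python
from datetime import timedelta


def downtime_per_year(availability_percent: float) -> timedelta:
    """Allowed downtime in a 365-day year for a given availability percentage."""
    year = timedelta(days=365)
    # Downtime is the fraction of the year the system is NOT available.
    return year * (1 - availability_percent / 100)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        print(f"{pct}% -> {downtime_per_year(pct)}")
```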

    Performance

    Key Metrics:

  • Latency: Time to process a request
  • Throughput: Requests processed per second
  • Response Time: Latency + network delay
    System Components

    Load Balancer

    Distributes incoming traffic across multiple servers.

    Algorithms:

  • Round Robin: Distributes sequentially
  • Least Connections: Sends to server with fewest connections
  • IP Hash: Routes based on client IP
  • Weighted: Distributes based on server capacity
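
    Two of the algorithms listed above can be sketched in a few lines. This is an in-memory illustration only; the class names and the `pick`/`release` methods are assumptions made for the example, not the API of any real load balancer:

```python
import itertools


class RoundRobin:
    """Hand out servers in a fixed, repeating order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Send each request to the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        # Ties are broken by insertion order of the servers.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a connection to `server` finishes.
        self.active[server] -= 1
```

    Round robin needs no state about the servers, while least connections has to be told when a request finishes, which is why it suits workloads with uneven request durations.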
    Types:

  • Layer 4 (Transport): Routes based on IP/port
  • Layer 7 (Application): Routes based on HTTP headers/content
    Caching

    Store frequently accessed data in fast storage.

    Cache Levels:
    Client Cache → CDN → Application Cache → Database Cache
    

    Strategies:

    Cache-Aside (Lazy Loading)
    1. Check cache
    2. If miss, query database
    3. Store in cache
    4. Return data
    
    Write-Through
    1. Write to cache
    2. Write to database synchronously
    3. Return success
    
    Write-Behind (Write-Back)
    1. Write to cache
    2. Queue database write
    3. Write to database asynchronously
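
    The cache-aside steps above map almost line for line onto code. A minimal sketch, assuming the database is reachable through a caller-supplied function (`load_from_db` is a hypothetical name) and using a plain dict as the cache:

```python
class CacheAside:
    """Read path of the cache-aside (lazy loading) strategy."""

    def __init__(self, load_from_db):
        self._load = load_from_db  # hypothetical callable: key -> value
        self._cache = {}

    def get(self, key):
        if key in self._cache:        # 1. Check cache
            return self._cache[key]
        value = self._load(key)       # 2. If miss, query database
        self._cache[key] = value      # 3. Store in cache
        return value                  # 4. Return data

    def invalidate(self, key):
        # Typically called after the underlying record is updated.
        self._cache.pop(key, None)
```

    A real deployment would also attach an expiry to each entry so stale data eventually ages out.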
    

    Eviction Policies:

  • LRU (Least Recently Used)
  • LFU (Least Frequently Used)
  • FIFO (First In First Out)
  • TTL (Time To Live)
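
    As an illustration of the first policy, LRU can be built on `collections.OrderedDict`, which remembers insertion order and can move a key to the end cheaply. This is a sketch; the class and method names are chosen for the example:

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the oldest entry
```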
    Database

    SQL (Relational)

  • Strong consistency (ACID)
  • Complex queries (JOINs)
  • Fixed schema
  • Vertical scaling primarily
    NoSQL

  • Often eventual consistency (BASE), though some stores offer tunable or strong consistency
  • Flexible schema
  • Horizontal scaling
  • Specialized for specific use cases
    Types:

  • Document: MongoDB, Firestore
  • Key-Value: Redis, DynamoDB
  • Wide-Column: Cassandra, HBase
  • Graph: Neo4j, Neptune
    Message Queue

    Asynchronous communication between services.

    Benefits:

  • Decoupling services
  • Load buffering
  • Fault tolerance
  • Async processing
    Patterns:

  • Point-to-Point: One producer → One consumer
  • Pub/Sub: One producer → Multiple subscribers
  • Request-Reply: Two-way communication
    Examples:

  • RabbitMQ: Advanced routing, AMQP
  • Kafka: High throughput, event streaming
  • SQS: Managed, AWS-native
    Design Patterns

    Microservices vs Monolith

    Monolith:

  • Single deployable unit
  • Simpler development
  • Easier debugging
  • Tight coupling
    Microservices:

  • Independent services
  • Technology flexibility
  • Independent scaling
  • Complex operations
    API Gateway

    Single entry point for all client requests.

    Responsibilities:

  • Request routing
  • Authentication/Authorization
  • Rate limiting
  • Request/Response transformation
  • Logging and monitoring
    Service Discovery

    Client-Side Discovery:
    Client → Service Registry → Service Instance
    
    Server-Side Discovery:
    Client → Load Balancer → Service Registry → Service Instance
    

    Tools:

  • Consul: Service mesh, health checking
  • Eureka: Netflix service registry
  • etcd: Distributed key-value store
    Circuit Breaker

    Prevent cascading failures.

    States:

  • Closed: Normal operation
  • Open: Reject requests, return error
  • Half-Open: Test with limited requests
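    Implementation:
    import time

    class CircuitBreaker:
        def __init__(self, threshold=5, timeout=60):
            self.failure_count = 0
            self.threshold = threshold  # failures allowed before opening
            self.timeout = timeout      # seconds to stay open before a trial call
            self.state = "CLOSED"
            self.last_failure_time = None

        def call(self, func):
            if self.state == "OPEN":
                if time.time() - self.last_failure_time > self.timeout:
                    self.state = "HALF_OPEN"  # let one trial request through
                else:
                    raise Exception("Circuit breaker is OPEN")

            try:
                result = func()
            except Exception:
                self.on_failure()
                raise
            self.on_success()
            return result

        def on_success(self):
            # Assumed policy: any success closes the circuit and resets the count.
            self.failure_count = 0
            self.state = "CLOSED"

        def on_failure(self):
            # Assumed policy: a failed trial call, or reaching the threshold, opens the circuit.
            self.failure_count += 1
            self.last_failure_time = time.time()
            if self.state == "HALF_OPEN" or self.failure_count >= self.threshold:
                self.state = "OPEN"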
    

    Rate Limiting

    Control request rate to prevent abuse.

    Algorithms:

    Token Bucket
    - Bucket holds tokens
    - Tokens added at fixed rate
    - Request consumes token
    - Reject if no tokens available
    
    Leaky Bucket
    - Requests enter bucket
    - Process at fixed rate
    - Overflow requests rejected
    
    Fixed Window
    - Count requests per time window
    - Reset counter at window end
    - Simple but has boundary issues
    
    Sliding Window
    - Track requests with timestamps
    - Count in sliding time window
    - More accurate, higher memory
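
    The token bucket described above can be sketched as follows. The class name, the `allow` method, and the injectable `clock` parameter (which makes the example testable without sleeping) are assumptions for illustration:

```python
import time


class TokenBucket:
    """Token bucket: tokens refill at a fixed rate, each request spends one."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start with a full bucket
        self._clock = clock
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self._last) * self.rate)
        self._last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # reject: no tokens available
```

    Unlike a fixed window, the bucket permits short bursts up to `capacity` while still enforcing the average rate.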
    

    Data Management

    Database Sharding

    Split data across multiple databases.

    Strategies:

    Range-Based Sharding
    User ID 1-1000 → Shard 1
    User ID 1001-2000 → Shard 2
    
    Hash-Based Sharding
    Shard = hash(user_id) % num_shards
    
    Geographic Sharding
    US users → US Shard
    EU users → EU Shard
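
    The hash-based formula above is easy to try out, and doing so shows why rebalancing is hard. The sketch below assumes SHA-256 as the hash (Python's built-in `hash` is not stable across processes for strings); the function names are illustrative:

```python
import hashlib


def shard_for(user_id: int, num_shards: int) -> int:
    """Shard = hash(user_id) % num_shards, using a hash that is stable across runs."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(num_keys: int, old_shards: int, new_shards: int) -> float:
    """Fraction of keys that land on a different shard after resizing."""
    moved = sum(
        shard_for(key, old_shards) != shard_for(key, new_shards)
        for key in range(num_keys)
    )
    return moved / num_keys


if __name__ == "__main__":
    print(f"Keys moved going from 4 to 5 shards: {moved_fraction(10_000, 4, 5):.0%}")
```

    With plain modulo placement, most keys change shard whenever the shard count changes, so adding capacity means moving most of the data.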
    

    Challenges:

  • Complex queries across shards
  • Rebalancing data
  • Hotspot shards
    Database Replication

    Master-Slave (Primary-Replica)
    Write → Master → Replicate → Slaves
    Read → Slaves (load balanced)
    
    Master-Master (Multi-Master)
    Write → Master 1 ↔ Master 2
    - Active-active setup
    - Conflict resolution needed
    

    Benefits:

  • Read scalability
  • High availability
  • Geographic distribution
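    CAP Theorem

    When a network partition occurs, a distributed system must choose between consistency and availability, so it can fully guarantee at most 2 of these 3 properties: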

    C (Consistency)

  • All nodes see the same data
    A (Availability)

  • Every request gets a response
    P (Partition Tolerance)

  • The system keeps working despite network failures between nodes
    Real-world choices:

  • CP: MongoDB, HBase (favor consistency over availability during a partition)
  • AP: Cassandra, DynamoDB (favor availability over consistency by default)
  • CA: Traditional single-node RDBMS (assumes no partitions)
    System Design Process

    1. Requirements

    Functional:

  • What features does the system need?
  • What are the core user flows?
    Non-Functional:

  • Scale: Users, requests/sec, data size
  • Performance: Latency, throughput
  • Availability: Uptime requirements
  • Consistency: Strong vs eventual
    2. Capacity Estimation

    Example: URL Shortener

    Traffic:
    - 100M new URLs per month
    - Read:Write = 100:1
    - Write: 100M / (30 days × 86400 sec) ≈ 40 URLs/sec
    - Read: 40 × 100 = 4000 URLs/sec
    
    Storage:
    - 100M URLs × 12 months × 5 years = 6B URLs
    - Average URL size: 500 bytes
    - Total: 6B × 500 bytes = 3 TB
    
    Bandwidth:
    - Write: 40 URLs/sec × 500 bytes = 20 KB/sec
    - Read: 4000 URLs/sec × 500 bytes = 2 MB/sec
    
    Cache:
    - 80-20 rule: 20% URLs = 80% traffic
    - Daily reads: 4000 req/sec × 86400 sec ≈ 345M requests/day
    - Cache 20% of daily requests: 69M × 500 bytes ≈ 35 GB
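
    The estimate above can be reproduced as a short script, which makes it easy to change the inputs. The variable names are illustrative. Note that the script does not round the write rate up to 40/sec, so its read and cache figures come out slightly lower than the rounded ones in the notes (about 39 writes/sec and roughly 33 GB of cache):

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the URL shortener example.
new_urls_per_month = 100_000_000
read_write_ratio = 100
url_size_bytes = 500
retention_years = 5
hot_fraction = 0.2  # 80-20 rule: cache the hottest 20%

# Traffic
writes_per_sec = new_urls_per_month / (30 * SECONDS_PER_DAY)
reads_per_sec = writes_per_sec * read_write_ratio

# Storage
total_urls = new_urls_per_month * 12 * retention_years
storage_tb = total_urls * url_size_bytes / 1e12

# Bandwidth
write_kb_per_sec = writes_per_sec * url_size_bytes / 1e3
read_mb_per_sec = reads_per_sec * url_size_bytes / 1e6

# Cache
reads_per_day = reads_per_sec * SECONDS_PER_DAY
cache_gb = hot_fraction * reads_per_day * url_size_bytes / 1e9

if __name__ == "__main__":
    print(f"writes/sec: {writes_per_sec:.1f}, reads/sec: {reads_per_sec:.0f}")
    print(f"storage: {storage_tb:.1f} TB")
    print(f"bandwidth: {write_kb_per_sec:.1f} KB/s write, {read_mb_per_sec:.2f} MB/s read")
    print(f"cache: {cache_gb:.1f} GB")
```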
    

    3. API Design

    POST /api/v1/urls
      Body: { "long_url": "https://example.com/very/long/url" }
      Response: { "short_url": "https://short.ly/abc123" }
    
    GET /api/v1/urls/{short_code}
      Response: 302 Redirect to long_url
    
    DELETE /api/v1/urls/{short_code}
      Response: 204 No Content
    

    4. High-Level Design

    Client
      ↓
    CDN / Load Balancer
      ↓
    API Gateway
      ↓
    Application Servers (Stateless)
      ↓
    Cache (Redis)
      ↓
    Database (Primary + Replicas)
      ↓
    Object Storage (S3)
    

    5. Detailed Design

    Focus on:

  • Data models
  • Algorithms
  • Component interactions
  • Data flow
    6. Bottlenecks & Trade-offs

    Identify:

  • Single points of failure
  • Performance bottlenecks
  • Scalability limits
    Solutions:

  • Add redundancy
  • Implement caching
  • Scale horizontally
  • Optimize queries
    Common System Designs

    URL Shortener

    Key Components:

  • Short-code generator (hash or counter, Base62-encoded)
  • Database (URL mappings)
  • Cache (Hot URLs)
  • Analytics (Click tracking)
    Approach:
    1. Generate unique short code (hash or counter)
    2. Store mapping in database
    3. Cache popular URLs
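    4. Redirect with 301 (permanent, cacheable by browsers) or 302 (temporary, so every click reaches the server and can be tracked)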
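
    For the counter-based variant of step 1, the short code can simply be the counter value written in Base62. A minimal sketch; the alphabet ordering (digits, then lowercase, then uppercase) is an arbitrary choice for this example:

```python
import string

# 62 symbols: 0-9, a-z, A-Z. The order is a convention chosen for this sketch.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(n: int) -> str:
    """Encode a non-negative integer (for example a database counter) as Base62."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def base62_decode(code: str) -> int:
    """Inverse of base62_encode."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

    Six Base62 characters cover 62^6 (about 56.8 billion) values, which is more than the 6B URLs estimated in the capacity section.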
    

    Notification System

    Types:

  • Push notifications (mobile)
  • SMS
  • Email
  • In-app notifications
    Architecture:
    Event → Message Queue → Notification Service → Provider API
                                 ↓
                          User Preferences DB
    

    Rate Limiter

    Requirements:

  • Accurately limit requests
  • Low latency
  • Distributed
  • Fault tolerant
    Implementation:
    Client → Rate Limiter (Redis) → API Server
             - Store counters per user/IP
             - Sliding window algorithm
             - Return 429 if exceeded
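
    The sliding window logic in the diagram can be sketched in memory. This is a single-process stand-in: in the distributed setup shown above, the per-client timestamps would live in a shared store such as Redis rather than a local dict. The class name, `check` method, and injectable `clock` are assumptions for the example:

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self._clock = clock
        self._hits = defaultdict(deque)  # client id -> timestamps of accepted requests

    def check(self, client_id: str) -> int:
        """Return an HTTP-style status: 200 if allowed, 429 if the limit is exceeded."""
        now = self._clock()
        hits = self._hits[client_id]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return 429
        hits.append(now)
        return 200
```

    Storing one timestamp per accepted request is what makes this approach more accurate than a fixed window and also more memory-hungry.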
    

    News Feed

    Components:

  • Feed generation (push/pull/hybrid)
  • Ranking algorithm
  • Storage (posts, media)
  • Cache (user feeds)
    Fanout Approaches:

    Fanout on Write (Push)
    Post created → Write to all followers' feeds
    + Fast reads
    - Slow writes for popular users
    
    Fanout on Read (Pull)
    User requests feed → Fetch from followed users
    + Fast writes
    - Slow reads
    
    Hybrid
    - Push for regular users
    - Pull for celebrities
    - Best of both worlds
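
    A toy version of the hybrid approach, to show where push and pull meet. Everything here is illustrative: the `FeedService` class, its in-memory dicts, and the tiny `CELEBRITY_THRESHOLD` are assumptions for the sketch, and ordering uses a simple sequence number in place of a ranking algorithm:

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 2  # hypothetical cutoff, kept tiny for demonstration


class FeedService:
    def __init__(self):
        self.followers = defaultdict(set)  # author -> users following them
        self.following = defaultdict(set)  # user -> authors they follow
        self.posts = defaultdict(list)     # author -> [(seq, author, text)]
        self.inbox = defaultdict(list)     # user -> posts pushed at write time
        self._seq = 0

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def _is_celebrity(self, author):
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def post(self, author, text):
        self._seq += 1
        item = (self._seq, author, text)
        self.posts[author].append(item)
        if not self._is_celebrity(author):
            # Fanout on write: copy the post into every follower's feed.
            for user in self.followers[author]:
                self.inbox[user].append(item)

    def feed(self, user):
        # Start from pushed items, keyed by sequence number to avoid duplicates.
        items = {item[0]: item for item in self.inbox[user]}
        # Fanout on read: pull posts from followed celebrities at request time.
        for author in self.following[user]:
            if self._is_celebrity(author):
                items.update({item[0]: item for item in self.posts[author]})
        return [text for _, _, text in sorted(items.values(), reverse=True)]
```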
    

    Chat System

    Features:

  • One-on-one messaging
  • Group chat
  • Online status
  • Message persistence
    Technologies:

  • WebSocket (real-time bidirectional)
  • Message queue (delivery)
  • Database (message history)
  • Redis (online status)
    Best Practices

    1. Start Simple

    Begin with a monolith; split into microservices only when needed.

    2. Design for Failure

  • Assume components will fail
  • Implement retries with exponential backoff
  • Use circuit breakers
  • Add health checks
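
    Retries with exponential backoff, mentioned in the list above, can be sketched as a small helper. The function name and parameters are illustrative, and the random jitter is an addition beyond what the notes state (it is commonly used so that many clients do not retry in lockstep):

```python
import random
import time


def retry(func, attempts=5, base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call func(), retrying on any exception with exponential backoff.

    The delay ceiling doubles after each failure (base_delay, 2x, 4x, ...),
    is capped at max_delay, and the actual wait is a random value below it.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

    In practice only errors known to be transient (timeouts, connection resets) should be retried, and the operation should be safe to repeat.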
    3. Monitor Everything

    Metrics:

  • Request rate, latency, error rate
  • CPU, memory, disk usage
  • Database connections, query time
    Tools:

  • Prometheus: Metrics collection
  • Grafana: Visualization
  • ELK Stack: Logging
  • Jaeger: Distributed tracing
    4. Use Asynchronous Processing

  • Offload heavy tasks to background jobs
  • Improve user experience
  • Better resource utilization
    5. Security

  • Authentication (Who are you?)
  • Authorization (What can you do?)
  • Encryption (TLS, data at rest)
  • Rate limiting
  • Input validation
    Resources

    Books:

  • "Designing Data-Intensive Applications" by Martin Kleppmann
  • "System Design Interview" by Alex Xu

    Websites:

  • High Scalability
  • System Design Primer

    Practice:

  • Pramp
  • Exponent
