
Infrastructure

Planned infrastructure improvements: compute (Graviton, Reserved Instances), database (Aurora Serverless v2), cache (Valkey cluster), and cross-region disaster recovery.


Graviton Migration (r6g / m6g)

| Current | Target | Savings | Trigger |
|---|---|---|---|
| r6i.large ($0.152/hr) | r6g.large (~$0.122/hr) | ~20% | 10K+ users |
| m6i.xlarge ($0.24/hr) | m6g.xlarge (~$0.192/hr) | ~20% | 10K+ users |
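The savings column can be reproduced from the hourly prices in the table above. The sketch below is only arithmetic on those figures; the 730-hour month and the always-on assumption are mine, not the source's.

```python
# Hourly on-demand prices (USD) copied from the table above.
PRICES = {
    "r6i.large -> r6g.large": (0.152, 0.122),
    "m6i.xlarge -> m6g.xlarge": (0.24, 0.192),
}
HOURS_PER_MONTH = 730  # assumption: average month, instance running all the time


def savings(current: float, target: float) -> tuple[float, float]:
    """Return (fractional saving, USD saved per instance-month)."""
    fraction = (current - target) / current
    monthly = (current - target) * HOURS_PER_MONTH
    return fraction, monthly


for name, (cur, tgt) in PRICES.items():
    frac, monthly = savings(cur, tgt)
    print(f"{name}: {frac:.1%} saving, ${monthly:.2f} per instance-month")
```

Multiplying the per-instance figure by the fleet size at the 10K+ user tier gives a rough monthly total to weigh against the testing effort.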

Why Not Now?

Graviton (ARM64) requires validating that every Python dependency installs and runs on ARM, either from a prebuilt ARM64 wheel or by compiling from source. The Shioaji SDK currently distributes x86-only wheels, and the Gate.io SDK still needs testing on ARM. The migration is low-risk but requires dedicated testing effort.
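One way to start the validation is a smoke test, run inside an ARM64 container, that attempts to import each dependency. The sketch below is a minimal version of that idea. The import names `shioaji` and `gate_api` are assumptions about how the two SDKs are imported, and a successful import does not prove that the SDK behaves correctly on ARM.

```python
import importlib
import platform


def check_imports(module_names):
    """Try importing each module; return {name: None on success, else an error string}."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # ImportError, or OSError from a wrong-architecture native library
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # Assumed import names; replace with the project's real dependency list.
    modules = ["shioaji", "gate_api"]
    print(f"machine architecture: {platform.machine()}")
    for name, error in check_imports(modules).items():
        print(f"{name}: {'OK' if error is None else error}")
```

Running the same script in the existing x86 image gives a baseline to compare against.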

Steps:

  1. Build ARM64 Docker images in CI/CD
  2. Validate Shioaji SDK ARM compatibility (or use x86 emulation layer)
  3. Run parallel ARM + x86 capacity providers
  4. Gradually shift traffic to ARM instances
  5. Decommission x86 instances
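The gradual traffic shift could be planned as a sequence of relative weights for the x86 and ARM capacity providers. The sketch below assumes a linear schedule with equal steps. The function name, the step count, and the idea of expressing the split as two integer weights are illustrative assumptions, not something the plan specifies.

```python
def shift_schedule(steps: int, total_weight: int = 100):
    """Return (x86_weight, arm_weight) pairs that move all weight to ARM in equal steps."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    schedule = []
    for i in range(steps + 1):
        arm = round(total_weight * i / steps)
        schedule.append((total_weight - arm, arm))
    return schedule


# Four shifts: start fully on x86, end fully on ARM.
for x86, arm in shift_schedule(4):
    print(f"x86 weight={x86:3d}  arm weight={arm:3d}")
```

Each stage would be held until error rates and latency on the ARM instances match the x86 baseline before moving to the next pair of weights.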


Reserved Instances / Savings Plans

| Tier | Strategy | Estimated Savings |
|---|---|---|
| 1K users | 1-year RI for always-on baseline | ~37% on 2-3 instances |
| 5K users | 1-year RI for peak capacity | ~$680/month saved |
| 50K users | 3-year RI + Savings Plans | ~57% on all EC2 |
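A reservation only pays off if the instance runs for enough of the term. The sketch below works out the break-even utilization for the discount levels in the table, under the simplifying assumption that a reservation is paid for every hour of the term at the discounted rate while on-demand is paid only for hours actually used.

```python
def break_even_utilization(discount: float) -> float:
    """Fraction of hours an instance must run for a reservation to match on-demand cost.

    Reserved cost per hour of the term is (1 - discount) * rate; on-demand cost is
    utilization * rate. Setting the two equal gives utilization = 1 - discount.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


def cheaper_option(discount: float, utilization: float) -> str:
    """Return which pricing model costs less at the given utilization."""
    return "reserved" if utilization > break_even_utilization(discount) else "on-demand"


for label, discount in [("1-year RI (~37%)", 0.37), ("3-year RI + Savings Plans (~57%)", 0.57)]:
    print(f"{label}: pays off above {break_even_utilization(discount):.0%} utilization")
```

This is why the 1K tier reserves only the always-on baseline: capacity that runs around the clock sits well above the break-even point, while burst capacity may not.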

Aurora Serverless v2

Replace instance-based RDS with Aurora Serverless v2 for automatic scaling:

graph LR
    subgraph current["Current (Instance-Based)"]
        RDS1["db.t3.large<br/>Fixed capacity<br/>Manual scaling"]
    end

    subgraph future["Future (Serverless v2)"]
        RDS2["Aurora Serverless v2<br/>0.5 - 128 ACU<br/>Auto-scaling"]
    end

    current -->|"Migration at 1K users"| future

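Benefits:

  • Auto-scales ACUs (Aurora Capacity Units) with load
  • Scales down to the configured minimum capacity during off-hours (cost savings); with the 0.5 ACU minimum shown above the cluster never reaches zero, and pausing at 0 ACU is only possible on engine versions that support a 0 ACU minimum
  • No manual instance type changes
  • Seamless Multi-AZ failover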
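The capacity range in the diagram (0.5 to 128 ACU) maps to the `ServerlessV2ScalingConfiguration` parameter of the RDS API. The sketch below only builds and validates that parameter. The helper name and the cluster identifier in the comment are hypothetical, and the actual boto3 call is left commented out so the block runs without AWS credentials.

```python
def serverless_v2_scaling(min_acu: float, max_acu: float) -> dict:
    """Build the scaling keyword argument for an Aurora Serverless v2 cluster."""
    for value in (min_acu, max_acu):
        if value * 2 != int(value * 2):
            raise ValueError("ACU values must be multiples of 0.5")
    if not 0 <= min_acu <= max_acu:
        raise ValueError("need 0 <= min_acu <= max_acu")
    return {
        "ServerlessV2ScalingConfiguration": {
            "MinCapacity": min_acu,
            "MaxCapacity": max_acu,
        }
    }


params = serverless_v2_scaling(0.5, 128)
print(params)

# Hypothetical usage with boto3 (cluster identifier is a placeholder):
# import boto3
# boto3.client("rds").modify_db_cluster(
#     DBClusterIdentifier="example-cluster", ApplyImmediately=True, **params
# )
```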

Trade-off: at sustained high load, Serverless v2 capacity costs more per ACU-hour than equivalent provisioned instances covered by Reserved Instances.
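The trade-off can be framed as a break-even average load: below it Serverless v2 is cheaper, above it a reserved provisioned instance wins. The sketch below shows the calculation. Both prices in the example are placeholders, not quoted AWS rates, and should be replaced with the figures for the target region.

```python
HOURS_PER_MONTH = 730  # assumption: average month


def serverless_monthly_cost(avg_acu: float, price_per_acu_hour: float) -> float:
    """Monthly Serverless v2 compute cost at a given average ACU consumption."""
    return avg_acu * price_per_acu_hour * HOURS_PER_MONTH


def break_even_avg_acu(instance_monthly_cost: float, price_per_acu_hour: float) -> float:
    """Average ACU level at which Serverless v2 costs the same as the fixed instance."""
    return instance_monthly_cost / (price_per_acu_hour * HOURS_PER_MONTH)


# Placeholder inputs for illustration only.
acu_price = 0.12          # USD per ACU-hour (placeholder)
reserved_instance = 100.0  # USD per month for a reserved provisioned instance (placeholder)

threshold = break_even_avg_acu(reserved_instance, acu_price)
print(f"Serverless v2 is cheaper while average load stays below {threshold:.2f} ACU")
```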


ElastiCache Valkey Cluster Mode

| Scale | Current | Future |
|---|---|---|
| 1-10K | Serverless (auto-scaling) | Continue serverless |
| 50K+ | Serverless | Provisioned cluster (3-6 shards) |

At 50K+ users (17,500 concurrent workers), throughput is sustained and high, so provisioned cluster mode is expected to offer better price-performance than serverless.
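In cluster mode, each key is assigned to one of 16384 hash slots using CRC16 (XMODEM variant), and slots are distributed across shards. The sketch below computes a key's slot, including the hash-tag rule that lets related keys share a slot. The `shard_for_slot` helper assumes slots are split into contiguous, near-equal ranges, which is a common layout but not guaranteed for any particular cluster.

```python
import binascii

TOTAL_SLOTS = 16384


def key_slot(key: str) -> int:
    """Return the cluster hash slot for a key, honouring {hash tags}."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' is used.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # binascii.crc_hqx with initial value 0 is CRC16/XMODEM (polynomial 0x1021).
    return binascii.crc_hqx(data, 0) % TOTAL_SLOTS


def shard_for_slot(slot: int, shard_count: int) -> int:
    """Map a slot to a shard index, assuming contiguous near-equal slot ranges."""
    return slot * shard_count // TOTAL_SLOTS


for key in ["{user1000}.following", "{user1000}.followers", "session:abc"]:
    slot = key_slot(key)
    print(f"{key!r}: slot {slot}, shard {shard_for_slot(slot, 3)} of 3")
```

Keys that must be read or written together in one multi-key command need a shared hash tag, so key naming is worth reviewing before the move from serverless to a sharded cluster.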


Cross-Region Disaster Recovery

At 50K+ users, deploy cross-region backup:

  • Aurora Global Database (read replica in secondary region)
  • ElastiCache Global Datastore
  • S3 cross-region replication for backups
  • Route 53 failover routing
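For the Route 53 piece, failover routing uses a pair of records with the same name: one marked PRIMARY and one marked SECONDARY, with a health check on the primary. The sketch below builds such a change batch in the shape accepted by the Route 53 `ChangeResourceRecordSets` API. All hostnames, the health check ID, and the helper name are hypothetical, and no AWS call is made.

```python
def failover_record(name, target, role, set_id, health_check_id=None, ttl=60):
    """Build one UPSERT change for a failover-routed CNAME record."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "CNAME",
        "SetIdentifier": set_id,
        "Failover": role,
        "TTL": ttl,
        "ResourceRecords": [{"Value": target}],
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


# Placeholder names and IDs for illustration only.
change_batch = {
    "Changes": [
        failover_record("api.example.com.", "primary-lb.example.com", "PRIMARY",
                        "primary-region", health_check_id="hypothetical-health-check-id"),
        failover_record("api.example.com.", "secondary-lb.example.com", "SECONDARY",
                        "secondary-region"),
    ]
}
print(change_batch)
```

A short TTL keeps the switchover time low, at the cost of more DNS queries.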
