Infrastructure
Planned infrastructure improvements: compute (Graviton, Reserved Instances), database (Aurora Serverless v2), cache (Valkey cluster), and cross-region disaster recovery.
Graviton Migration (r6g / m6g)
| Current | Target | Savings | Trigger |
|---|---|---|---|
| r6i.large ($0.152/hr) | r6g.large (~$0.122/hr) | ~20% | 10K+ users |
| m6i.xlarge ($0.24/hr) | m6g.xlarge (~$0.192/hr) | ~20% | 10K+ users |
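The savings column can be reproduced from the hourly prices in the table. The following is a minimal sketch of that arithmetic; the 730-hour month is an assumption for converting hourly prices to a monthly figure, and the function name is illustrative only.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def savings(current_hourly: float, target_hourly: float) -> tuple[float, float]:
    """Return (percent saved, dollars saved per instance-month)."""
    delta = current_hourly - target_hourly
    pct = delta / current_hourly * 100
    monthly = delta * HOURS_PER_MONTH
    return pct, monthly


# Hourly prices copied from the table above.
PAIRS = {
    "r6i.large -> r6g.large": (0.152, 0.122),
    "m6i.xlarge -> m6g.xlarge": (0.24, 0.192),
}

for name, (current, target) in PAIRS.items():
    pct, monthly = savings(current, target)
    print(f"{name}: {pct:.1f}% saved, ${monthly:.2f} per instance-month")
```

Both pairs work out to roughly 20%, which matches the table. The absolute saving scales linearly with the number of instances running.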
Why Not Now?
Graviton (ARM64) requires validating that every Python dependency installs and runs on ARM. The Shioaji SDK currently distributes x86-only wheels, and the Gate.io SDK still needs testing on ARM. The migration is low-risk but requires dedicated testing effort.
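A first-pass compatibility check can be as simple as importing each dependency inside an ARM64 container. The sketch below assumes the import names `shioaji` and `gate_api`, which should be adjusted to match the project's actual requirements. A successful import only shows that the package and its native extensions load on the current architecture, not that its functionality has been verified.

```python
import importlib
import platform


def check_imports(module_names):
    """Try to import each module; map name -> error text (None on success)."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # ImportError or a native-library load failure
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # Assumed import names; adjust to match the project's requirements.
    modules = ["shioaji", "gate_api"]
    # platform.machine() reports "aarch64" on 64-bit ARM Linux.
    print(f"Architecture: {platform.machine()}")
    for name, error in check_imports(modules).items():
        print(f"{name}: {'OK' if error is None else error}")
```

Running this in both the x86 and ARM64 images gives a quick side-by-side view of which dependencies need attention.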
Steps:
1. Build ARM64 Docker images in CI/CD
2. Validate Shioaji SDK ARM compatibility (or use x86 emulation layer)
3. Run parallel ARM + x86 capacity providers
4. Gradually shift traffic to ARM instances
5. Decommission x86 instances
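The gradual traffic shift in step 4 can be planned as a series of relative weights between the two capacity providers. This is a minimal sketch, assuming the orchestrator accepts integer weights per provider. The function name and the number of stages are hypothetical.

```python
def shift_schedule(steps: int) -> list[tuple[int, int]]:
    """Return (arm_weight, x86_weight) pairs moving from 0/100 to 100/0."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    schedule = []
    for i in range(steps + 1):
        arm = round(100 * i / steps)
        schedule.append((arm, 100 - arm))
    return schedule


# Example: shift in four equal stages.
for arm, x86 in shift_schedule(4):
    print(f"ARM weight {arm:3d} / x86 weight {x86:3d}")
```

Each stage would be held long enough to compare error rates and latency between the two architectures before moving on.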
Reserved Instances / Savings Plans
| Tier | Strategy | Estimated Savings |
|---|---|---|
| 1K users | 1-year RI for always-on baseline | ~37% on 2-3 instances |
| 5K users | 1-year RI for peak capacity | ~$680/month saved |
| 50K users | 3-year RI + Savings Plans | ~57% on all EC2 |
Aurora Serverless v2
Replace instance-based RDS with Aurora Serverless v2 for automatic scaling:
graph LR
    subgraph current["Current (Instance-Based)"]
        RDS1["db.t3.large<br/>Fixed capacity<br/>Manual scaling"]
    end
    subgraph future["Future (Serverless v2)"]
        RDS2["Aurora Serverless v2<br/>0.5 - 128 ACU<br/>Auto-scaling"]
    end
    current -->|"Migration at 1K users"| future
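The ACU range in the diagram corresponds to the `ServerlessV2ScalingConfiguration` setting on an Aurora cluster. The sketch below only builds and validates that mapping. The helper name and the cluster identifier in the comment are placeholders, and the boto3 call is shown as a comment, not executed.

```python
def serverless_v2_scaling(min_acu: float, max_acu: float) -> dict:
    """Build the ServerlessV2ScalingConfiguration mapping for an Aurora cluster."""
    for value in (min_acu, max_acu):
        # ACU values are specified in half-step increments.
        if value < 0 or (value * 2) != int(value * 2):
            raise ValueError(
                f"ACU values must be non-negative multiples of 0.5, got {value}"
            )
    if min_acu > max_acu:
        raise ValueError("min_acu must not exceed max_acu")
    return {"MinCapacity": min_acu, "MaxCapacity": max_acu}


# Range taken from the diagram above.
scaling = serverless_v2_scaling(0.5, 128)

# Hypothetical usage with boto3 (cluster name is a placeholder):
#   rds = boto3.client("rds")
#   rds.modify_db_cluster(
#       DBClusterIdentifier="example-cluster",
#       ServerlessV2ScalingConfiguration=scaling,
#   )
print(scaling)
```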
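Benefits:
- Auto-scales ACUs (Aurora Capacity Units) with load
- Scales down to the configured minimum during off-hours for cost savings (0.5 ACU in the range above; pausing at 0 ACU requires a minimum capacity of 0 and an engine version that supports auto-pause)
- No manual instance type changes
- Seamless Multi-AZ failover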
Trade-off: Under sustained high load, Serverless v2 capacity costs more per ACU than equivalent provisioned capacity covered by Reserved Instances.
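The trade-off can be framed as a break-even point: the average ACU level above which a fixed-price instance becomes cheaper. The following sketch compares compute cost only, ignoring storage and I/O. The prices are placeholders for illustration and should be replaced with current regional pricing.

```python
def break_even_acu(instance_hourly: float, acu_hourly: float) -> float:
    """Average ACU level at which serverless compute cost equals the instance cost."""
    if acu_hourly <= 0:
        raise ValueError("acu_hourly must be positive")
    return instance_hourly / acu_hourly


def cheaper_option(avg_acu: float, instance_hourly: float, acu_hourly: float) -> str:
    """Compare hourly compute cost at a given average ACU consumption."""
    serverless_hourly = avg_acu * acu_hourly
    return "serverless" if serverless_hourly < instance_hourly else "provisioned"


# Placeholder prices for illustration only (not actual AWS pricing).
EXAMPLE_INSTANCE_HOURLY = 0.30  # effective hourly cost of a reserved instance
EXAMPLE_ACU_HOURLY = 0.15       # price per ACU-hour

threshold = break_even_acu(EXAMPLE_INSTANCE_HOURLY, EXAMPLE_ACU_HOURLY)
print(f"Break-even at an average of {threshold:.1f} ACU")
print(cheaper_option(1.0, EXAMPLE_INSTANCE_HOURLY, EXAMPLE_ACU_HOURLY))
print(cheaper_option(4.0, EXAMPLE_INSTANCE_HOURLY, EXAMPLE_ACU_HOURLY))
```

A workload that idles most of the day sits well below the threshold and favors serverless. One that runs near peak around the clock favors reserved capacity.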
ElastiCache Valkey Cluster Mode
| Scale | Current | Future |
|---|---|---|
| 1-10K | Serverless (auto-scaling) | Continue serverless |
| 50K+ | Serverless | Provisioned cluster (3-6 shards) |
At 50K+ users with 17,500 concurrent workers, provisioned cluster mode provides better price-performance than serverless at sustained high throughput.
Cross-Region Disaster Recovery
At 50K+ users, deploy cross-region backup:
- Aurora Global Database (read replica in secondary region)
- ElastiCache Global Datastore
- S3 cross-region replication for backups
- Route 53 failover routing
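For the Route 53 item, failover routing is expressed as a pair of record sets marked `PRIMARY` and `SECONDARY`, with a health check attached to the primary. The sketch below only builds the change batch. The helper names, hostnames, and health check ID are placeholders, and the boto3 call is shown as a comment.

```python
def failover_record(name, target, role, set_id, health_check_id=None, ttl=60):
    """Build one ResourceRecordSet for Route 53 failover routing."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "CNAME",
        "SetIdentifier": set_id,
        "Failover": role,
        "TTL": ttl,
        "ResourceRecords": [{"Value": target}],
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return record


def failover_change_batch(name, primary_target, secondary_target, primary_health_check_id):
    """Build a ChangeBatch that upserts the primary and secondary records."""
    records = [
        failover_record(name, primary_target, "PRIMARY", "primary", primary_health_check_id),
        failover_record(name, secondary_target, "SECONDARY", "secondary"),
    ]
    return {"Changes": [{"Action": "UPSERT", "ResourceRecordSet": r} for r in records]}


# All values below are placeholders.
batch = failover_change_batch(
    "api.example.com.",
    "primary.example.com",
    "secondary.example.com",
    "example-health-check-id",
)

# Hypothetical usage with boto3 (hosted zone ID is a placeholder):
#   route53 = boto3.client("route53")
#   route53.change_resource_record_sets(HostedZoneId="ZEXAMPLE", ChangeBatch=batch)
print(batch)
```

A low TTL keeps the window short during which clients continue to resolve the failed primary.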