Data Layer & Networking¶
The data layer is fully managed — no self-hosted databases, no Redis clusters to patch, no connection poolers to tune. ElastiCache Valkey Serverless handles all real-time state, Aurora PostgreSQL stores durable data, and RDS Proxy absorbs the connection churn from ephemeral ECS tasks.
ElastiCache Valkey Serverless¶
Valkey (Redis-compatible) serves as the real-time nervous system of the platform. Every sub-second interaction — worker heartbeats, order routing, session management — flows through it.
What It Stores¶
| Data Type | Key Pattern | TTL | Purpose |
|---|---|---|---|
| Worker heartbeat marks | worker:active:{user_id} | 30s | Presence detection. Refreshed every 5s by worker. |
| Request queues | trading:user:{user_id}:requests | None | BLPOP-based work queue. Worker pops orders in real time. |
| Response mailboxes | trading:response:{request_id} | 60s | API polls for worker's response after submitting an order. |
| Control messages | worker:control:{user_id}:messages | None | Credential reload commands without restart. |
| Worker metadata | worker:metadata:{user_id} | 30s | Task ARN, instance ID, launch time. |
| Session data | session:{session_id} | 24h | User session tokens and CSRF state. |
| Webhook replay hashes | webhook:replay:{hash} | 5min | Idempotency: reject duplicate webhook deliveries. |
| Rate limit counters | ratelimit:{user_id}:{endpoint} | Window | Sliding window rate limiting per user per endpoint. |
| ECS task cache | ecs:tasks:cache | 90s | Cached ECS task list to avoid API throttling. |
| Symbol caches | symbols:{broker}:{exchange} | 1h | Broker symbol/contract data. Avoids repeated API calls. |
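The heartbeat, request-queue, and response-mailbox patterns in the table can be exercised with a handful of commands. The following is a minimal sketch, assuming a redis-py-compatible client object that exposes `set`, `exists`, `rpush`, `blpop`, and `get`. The helper names and the JSON payload shape are illustrative assumptions, not the platform's actual code.

```python
import json

HEARTBEAT_TTL_S = 30   # TTL from the table; the worker refreshes every 5s
RESPONSE_TTL_S = 60    # response mailbox TTL from the table


def heartbeat_key(user_id):
    return f"worker:active:{user_id}"


def request_queue_key(user_id):
    return f"trading:user:{user_id}:requests"


def response_key(request_id):
    return f"trading:response:{request_id}"


def send_heartbeat(client, user_id):
    """Worker side: refresh the presence mark so it expires 30s after the last beat."""
    client.set(heartbeat_key(user_id), "1", ex=HEARTBEAT_TTL_S)


def worker_is_active(client, user_id):
    """API side: the worker counts as present while its heartbeat key exists."""
    return bool(client.exists(heartbeat_key(user_id)))


def submit_order(client, user_id, request_id, order):
    """API side: append the order to the user's request queue."""
    payload = json.dumps({"request_id": request_id, "order": order})
    client.rpush(request_queue_key(user_id), payload)


def process_next_order(client, user_id, handler, timeout_s=5):
    """Worker side: block on the queue, run handler, post the result to the mailbox.

    Returns the request_id that was handled, or None if the pop timed out.
    """
    popped = client.blpop(request_queue_key(user_id), timeout=timeout_s)
    if popped is None:
        return None
    _key, raw = popped
    request = json.loads(raw)
    result = handler(request["order"])
    client.set(response_key(request["request_id"]), json.dumps(result), ex=RESPONSE_TTL_S)
    return request["request_id"]


def read_response(client, request_id):
    """API side: poll the mailbox; None means the worker has not answered yet."""
    raw = client.get(response_key(request_id))
    return None if raw is None else json.loads(raw)
```

Because the heartbeat is refreshed every 5 seconds against a 30-second TTL, a worker has to miss several consecutive refreshes before the key expires and the API treats it as gone.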
Scaling Configuration¶
| Parameter | Minimum | Maximum | Scaling |
|---|---|---|---|
| Storage | 1 GB | 10 GB | Automatic |
| Compute | 1,000 ECPU | 100,000 ECPU | Automatic |
Why Serverless over Provisioned
Trading platforms have extreme traffic variance. During market hours (09:00–13:30 for Taiwan markets), thousands of workers send heartbeats every 5 seconds and orders flow continuously. After hours, traffic drops to near zero. Provisioned ElastiCache would have to be sized for peak and paid for 24/7. Valkey Serverless scales ECPU with demand, so off-hours cost falls to roughly the storage minimum.
Pricing Model¶
Valkey Serverless bills on two dimensions:
- Storage: $0.125/GB-hour for data stored
- ElastiCache Processing Units (ECPU): $0.0000098 per ECPU consumed
A simple command (GET, SET) that transfers up to 1 KB of data consumes 1 ECPU. Commands that transfer larger payloads or take more vCPU time (SORT, LRANGE on long lists) consume proportionally more ECPUs.
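The two billing dimensions combine into a simple daily estimate. The sketch below uses the rates quoted above and assumes, for illustration only, that each heartbeat costs exactly 1 ECPU and that heartbeats run only during the 4.5-hour market session. The function names are hypothetical.

```python
STORAGE_USD_PER_GB_HOUR = 0.125   # storage rate quoted above
USD_PER_ECPU = 0.0000098          # ECPU rate quoted above


def daily_heartbeat_ecpus(workers, market_hours=4.5, interval_s=5):
    """ECPUs per day from heartbeats alone, assuming 1 ECPU per heartbeat."""
    beats_per_worker = int(market_hours * 3600 / interval_s)
    return workers * beats_per_worker


def daily_cost_usd(storage_gb, ecpus):
    """Storage is billed for all 24 hours; compute only for ECPUs consumed."""
    storage = storage_gb * STORAGE_USD_PER_GB_HOUR * 24
    compute = ecpus * USD_PER_ECPU
    return storage + compute


if __name__ == "__main__":
    ecpus = daily_heartbeat_ecpus(workers=1000)
    print(f"trading day: ${daily_cost_usd(1, ecpus):.2f}")
    print(f"idle day:    ${daily_cost_usd(1, 0):.2f}")
```

Under these assumptions, 1,000 workers generate 3.24 million heartbeat ECPUs per trading day, and a day with no traffic costs only the 1 GB storage minimum. Order traffic, polling, and larger payloads would add to the compute term.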
RDS Aurora PostgreSQL¶
Aurora PostgreSQL is the system of record for all durable data. Everything that needs to survive a restart lives here.
What It Stores¶
| Table / Domain | Row Estimate (10K users) | Purpose |
|---|---|---|
| users | 10K | User accounts, preferences, subscription tier |
| trading_accounts | 30K (avg 3/user) | Broker credentials (AES-256-GCM encrypted), account config |
| orders | 1M+ / month | Order history, status, fills, timestamps |
| audit_logs | 5M+ / month | Every authenticated action with IP, user-agent, path |
| sessions | 10K active | Server-side session store |
| api_keys | 15K | Webhook tokens, API keys per trading account |
| subscription_plans | ~10 | Plan definitions, feature flags, limits |
Scaling Tiers¶
| Users | Instance | Configuration | Estimated Cost |
|---|---|---|---|
| 1–500 | db.t3.large | Single instance, daily snapshots | ~$120/mo |
| 500–1K | db.t3.large | Multi-AZ standby | ~$240/mo |
| 1K–5K | db.r6g.large | Multi-AZ + daily snapshots, enhanced monitoring | ~$400/mo |
| 5K–10K | db.r6g.xlarge | Multi-AZ + 1 read replica | ~$800/mo |
| 10K–50K | db.r6g.2xlarge | Multi-AZ + 2 read replicas | ~$1,800/mo |
| 50K+ | db.r6g.4xlarge | Multi-AZ + 2 read replicas + provisioned IOPS | ~$4,000/mo |
Aurora Advantages
Aurora storage grows automatically in 10 GB increments up to 128 TB with no downtime. Replication lag to read replicas is typically under 20 ms, and failover to a standby typically completes in under 30 seconds.
RDS Proxy¶
ECS tasks are ephemeral — workers start and stop constantly as users come and go. Without connection pooling, each task opening a PostgreSQL connection creates significant overhead (TLS handshake, authentication, memory allocation on the database).
Why RDS Proxy¶
| Problem | Without Proxy | With Proxy |
|---|---|---|
| Connection churn | New TCP + TLS for every task | Multiplexed over persistent pool |
| Connection limit | db.t3.large has ~680 max connections | App sees unlimited; proxy manages pool |
| Failover | App must detect and reconnect | Proxy handles transparently |
| Credential rotation | App restart required | Proxy picks up new credentials from Secrets Manager |
Configuration¶
| Parameter | Value |
|---|---|
| Max Connections | 80% of database max |
| Borrow Timeout | 120 seconds |
| Idle Timeout | 1800 seconds |
| Engine | PostgreSQL |
| Auth | Secrets Manager (auto-rotation capable) |
The proxy multiplexes application connections (potentially hundreds from ECS tasks) down to a smaller set of database connections, reusing them across requests. This is critical at scale — 30 workers per instance × 10 instances = 300 potential connections, but the database only sees ~50 active connections through the proxy.
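The connection arithmetic can be written out explicitly. This is a small sketch using the figures from this section (80% of database max, roughly 680 connections on db.t3.large, 30 workers on each of 10 instances, about 50 active database connections). The function names are hypothetical.

```python
def proxy_pool_limit(db_max_connections, max_connections_percent=80):
    """Database connections the proxy may open (80% of database max, per the table)."""
    return db_max_connections * max_connections_percent // 100


def client_connections(workers_per_instance, instances):
    """Upper bound on client connections if every worker holds one."""
    return workers_per_instance * instances


def multiplexing_ratio(client_conns, active_db_conns):
    """How many client connections share each active database connection."""
    return client_conns / active_db_conns


if __name__ == "__main__":
    clients = client_connections(workers_per_instance=30, instances=10)
    print("proxy pool limit:", proxy_pool_limit(680))
    print("client connections:", clients)
    print("multiplexing ratio:", multiplexing_ratio(clients, 50))
```

With these numbers the proxy may open up to 544 database connections, yet 300 client connections are served by about 50, a ratio of 6 to 1. The headroom matters more as worker counts grow past what the database could accept directly.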
VPC Architecture¶
flowchart LR
subgraph public["Public Subnets (AZ-a, AZ-b)"]
ALB["ALB"]
API["API Tasks"]
W["Worker Tasks"]
end
subgraph private["Private Subnets (AZ-a, AZ-b)"]
Proxy["RDS Proxy"]
RDS["Aurora Primary + Standby"]
Valkey["Valkey Endpoint"]
Lambda["Lambda ENIs"]
end
Internet["Internet"] --> ALB --> API
W -->|"Broker APIs"| Internet
API & W --> Proxy --> RDS
API & W & Lambda --> Valkey
Why Workers Need Public Subnets¶
Workers must reach external broker APIs (Shioaji, Fubon) over the public internet. Placing them in private subnets would require NAT Gateways, at $0.045/hr per AZ plus $0.045/GB processed. For a trading platform generating significant outbound traffic, NAT costs could exceed the EC2 instance costs. In public subnets, security groups that admit no inbound traffic give workers equivalent inbound protection without the NAT charges.
Security Group Rules¶
| Rule | Source | Destination | Port | Protocol |
|---|---|---|---|---|
| ALB → API | ALB SG | API SG | 8000 | TCP |
| API → Valkey | API SG | Valkey SG | 6379 | TCP |
| Worker → Valkey | Worker SG | Valkey SG | 6379 | TCP |
| Lambda → Valkey | Lambda SG | Valkey SG | 6379 | TCP |
| API → RDS Proxy | API SG | RDS SG | 5432 | TCP |
| Worker → RDS Proxy | Worker SG | RDS SG | 5432 | TCP |
| Worker → Internet | Worker SG | 0.0.0.0/0 | 443 | TCP (outbound) |
| API → Internet | API SG | 0.0.0.0/0 | 443 | TCP (outbound) |
Least Privilege
Only the ALB SG accepts traffic from 0.0.0.0/0, and only on ports 80 and 443. API tasks accept only port 8000 from the ALB SG. Workers accept nothing inbound; all their communication is outbound to Valkey, RDS Proxy, and broker APIs.
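The least-privilege claim can be checked mechanically against the rule table. The sketch below models the rules as plain data, adding the ALB's public 80/443 listeners described in the text. The `Rule` type and helper names are illustrative assumptions, and the check runs on this in-memory model rather than on live AWS resources.

```python
from dataclasses import dataclass

INTERNET = "0.0.0.0/0"


@dataclass(frozen=True)
class Rule:
    source: str
    destination: str
    port: int


# Rules from the table above, plus the ALB's public listeners described in the text.
RULES = [
    Rule(INTERNET, "ALB SG", 80),
    Rule(INTERNET, "ALB SG", 443),
    Rule("ALB SG", "API SG", 8000),
    Rule("API SG", "Valkey SG", 6379),
    Rule("Worker SG", "Valkey SG", 6379),
    Rule("Lambda SG", "Valkey SG", 6379),
    Rule("API SG", "RDS SG", 5432),
    Rule("Worker SG", "RDS SG", 5432),
    Rule("Worker SG", INTERNET, 443),
    Rule("API SG", INTERNET, 443),
]


def inbound_rules(rules, group):
    """All rules whose destination is the given security group."""
    return [r for r in rules if r.destination == group]


def groups_open_to_internet(rules):
    """Security groups that accept traffic directly from 0.0.0.0/0."""
    return sorted({r.destination for r in rules if r.source == INTERNET})


if __name__ == "__main__":
    print("open to internet:", groups_open_to_internet(RULES))
    print("worker inbound rules:", inbound_rules(RULES, "Worker SG"))
```

A check like this could run in CI against an exported rule list, so that an accidental public inbound rule on the worker or API group is caught before deployment.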
ALB + WAF¶
Application Load Balancer¶
| Parameter | Value |
|---|---|
| Scheme | Internet-facing |
| TLS | ACM-managed certificate (auto-renewal) |
| Health Check Path | /health |
| Health Check Interval | 15 seconds |
| Healthy Threshold | 2 consecutive passes |
| Unhealthy Threshold | 3 consecutive failures |
| Deregistration Delay | 120 seconds |
| Idle Timeout | 60 seconds |
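The interval and thresholds determine how quickly a target changes state. The sketch below replays a sequence of check results through a simplified consecutive-count state machine. It ignores the per-check timeout and the offset of the first check, so the timings are approximate. The function names are hypothetical.

```python
INTERVAL_S = 15
HEALTHY_THRESHOLD = 2     # consecutive passes to become healthy
UNHEALTHY_THRESHOLD = 3   # consecutive failures to become unhealthy


def seconds_to_flip(threshold, interval_s=INTERVAL_S):
    """Approximate time for a run of consecutive results to change target state."""
    return threshold * interval_s


def target_states(results, healthy=True):
    """Replay check results (True = pass) and return the state after each check."""
    streak = 0
    states = []
    for passed in results:
        if passed == healthy:
            streak = 0  # result agrees with the current state; reset the counter
        else:
            streak += 1
            needed = UNHEALTHY_THRESHOLD if healthy else HEALTHY_THRESHOLD
            if streak >= needed:
                healthy = not healthy
                streak = 0
        states.append(healthy)
    return states


if __name__ == "__main__":
    print("seconds to mark unhealthy:", seconds_to_flip(UNHEALTHY_THRESHOLD))
    print("seconds to mark healthy:", seconds_to_flip(HEALTHY_THRESHOLD))
    print(target_states([False, False, True, False, False, False, True, True]))
```

With the values in the table, a failing task is taken out of rotation after roughly 45 seconds and a recovered task returns after roughly 30 seconds. Two failures followed by a pass do not flip the state, because the failures must be consecutive.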
WAF Rules¶
| Rule | Type | Configuration | Purpose |
|---|---|---|---|
| Admin IP Restriction | IP Set | Allowlist of admin IPs | Block admin panel access from unknown IPs |
| TradingView IP Exemption | IP Set | TradingView webhook source IPs | Bypass rate limits for legitimate webhooks |
| Login Rate Limit | Rate-based | 100 requests / 5 min per IP | Prevent brute-force attacks on auth endpoints |
| Blanket Rate Limit | Rate-based | 2000 requests / 5 min per IP | General DDoS protection |
| AWS Managed — SQLi | Managed rule group | AWSManagedRulesSQLiRuleSet | SQL injection protection |
| AWS Managed — XSS | Managed rule group | AWSManagedRulesCommonRuleSet | Cross-site scripting, bad inputs |
| AWS Managed — IP Reputation | Managed rule group | AWSManagedRulesAmazonIpReputationList | Block known malicious IPs |
TradingView IP Exemption
TradingView sends webhooks from a known set of IP ranges. These are exempted from rate limiting but still pass through the SQLi/XSS rules. If TradingView changes its IP ranges, webhooks from the new addresses are subject to the standard rate limits until the IP set is updated.
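The interaction between the IP exemption and the two rate-based rules can be sketched as follows. This is a simplified model, not how AWS WAF is implemented: it uses an exact sliding window, whereas WAF's counting is approximate. The exempt range is a placeholder from the RFC 5737 documentation block, and the `/auth` path prefix is an assumption.

```python
import ipaddress
from collections import defaultdict, deque

WINDOW_S = 300          # both rate-based rules count over 5 minutes
BLANKET_LIMIT = 2000    # requests per window per IP
LOGIN_LIMIT = 100       # requests per window per IP on auth endpoints
LOGIN_PREFIX = "/auth"  # assumed path prefix for auth endpoints

# Placeholder documentation range, NOT a real TradingView range.
EXEMPT_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_exempt(ip):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in EXEMPT_NETWORKS)


class RateLimiter:
    def __init__(self):
        # (ip, scope) -> timestamps of requests still inside the window
        self._hits = defaultdict(deque)

    def allow(self, ip, path, now):
        """Return True if the request passes the rate-based rules."""
        if is_exempt(ip):
            return True  # exempt from rate limits only; SQLi/XSS rules still apply
        scopes = [("all", BLANKET_LIMIT)]
        if path.startswith(LOGIN_PREFIX):
            scopes.append(("login", LOGIN_LIMIT))
        for scope, _limit in scopes:
            hits = self._hits[(ip, scope)]
            while hits and hits[0] <= now - WINDOW_S:
                hits.popleft()  # drop requests that fell out of the window
            hits.append(now)    # blocked requests still count toward the rate
        return all(len(self._hits[(ip, scope)]) <= limit for scope, limit in scopes)
```

A request to an auth endpoint counts against both limits, so the tighter login limit trips first. An address outside the exempt set, including a new TradingView address not yet added to it, is limited like any other client.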
Other Services¶
KMS — Key Management¶
| Parameter | Value |
|---|---|
| Key Type | RSA-4096, asymmetric |
| Backing | HSM (hardware security module) |
| Purpose | Wrap per-user AES-256-GCM data encryption keys |
| Rotation | Manual (KMS automatic rotation is available only for symmetric encryption keys) |
Every trading account's broker credentials are encrypted with a unique AES-256-GCM key. That AES key is itself encrypted (wrapped) with the KMS RSA master key. Decryption requires both the encrypted data key (stored in the database) and KMS access (controlled by IAM policy). An attacker who compromises the database alone obtains only ciphertext and wrapped keys.
Secrets Manager¶
Stores application configuration: database credentials, API secrets, encryption parameters. Referenced by ECS task definitions and Lambda functions at startup. Supports automatic rotation.
ECR — Container Registry¶
| Repository | Purpose | Lifecycle |
|---|---|---|
| shioaji-api | API container image (FastAPI + Gunicorn) | Keep last 10 tagged images |
| shioaji-worker | Worker container image (broker SDKs) | Keep last 10 tagged images |
Images are built in CI/CD, scanned for vulnerabilities, and pushed to ECR. ECS pulls from ECR at task launch.
Route 53¶
DNS management for 4pass.io. A-record alias to the ALB. Health checks integrated with ALB health status.
VPC Endpoint — Lambda¶
| Parameter | Value |
|---|---|
| Type | Interface endpoint |
| Service | com.amazonaws.{region}.elasticache |
| AZs | 3 (matching Lambda subnets) |
Lambda functions run inside the VPC to access Valkey. The VPC endpoint provides private connectivity without routing through the internet, reducing latency and improving security.