
Data Layer & Networking

The data layer is fully managed — no self-hosted databases, no Redis clusters to patch, no connection poolers to tune. ElastiCache Valkey Serverless handles all real-time state, Aurora PostgreSQL stores durable data, and RDS Proxy absorbs the connection churn from ephemeral ECS tasks.


ElastiCache Valkey Serverless

Valkey (Redis-compatible) serves as the real-time nervous system of the platform. Every sub-second interaction — worker heartbeats, order routing, session management — flows through it.

What It Stores

| Data Type | Key Pattern | TTL | Purpose |
|---|---|---|---|
| Worker heartbeat marks | worker:active:{user_id} | 30s | Presence detection. Refreshed every 5s by worker. |
| Request queues | trading:user:{user_id}:requests | | BLPOP-based work queue. Worker pops orders in real-time. |
| Response mailboxes | trading:response:{request_id} | 60s | API polls for worker's response after submitting an order. |
| Control messages | worker:control:{user_id}:messages | | Credential reload commands without restart. |
| Worker metadata | worker:metadata:{user_id} | 30s | Task ARN, instance ID, launch time. |
| Session data | session:{session_id} | 24h | User session tokens and CSRF state. |
| Webhook replay hashes | webhook:replay:{hash} | 5min | Idempotency: reject duplicate webhook deliveries. |
| Rate limit counters | ratelimit:{user_id}:{endpoint} | Window | Sliding window rate limiting per user per endpoint. |
| ECS task cache | ecs:tasks:cache | 90s | Cached ECS task list to avoid API throttling. |
| Symbol caches | symbols:{broker}:{exchange} | 1h | Broker symbol/contract data. Avoids repeated API calls. |
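The heartbeat row in the table above can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's code: `FakeStore`, `beat`, and `is_online` are hypothetical names, and the in-memory store stands in for a Valkey client so the snippet runs without a server. With redis-py, the corresponding calls would be `client.set(key, value, ex=30)` and `client.exists(key)`.

```python
import time
from typing import Callable, Dict, Tuple

HEARTBEAT_TTL_S = 30      # TTL from the table above
HEARTBEAT_INTERVAL_S = 5  # refresh cadence from the table above


def heartbeat_key(user_id: int) -> str:
    """Build the presence key using the documented pattern."""
    return f"worker:active:{user_id}"


class FakeStore:
    """In-memory stand-in for SET-with-expiry and EXISTS (hypothetical helper)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}

    def set(self, name: str, value: str, ex: int) -> None:
        # Store the value together with its absolute expiry time.
        self._data[name] = (value, self._clock() + ex)

    def exists(self, name: str) -> bool:
        # A key is visible only until its expiry time is reached.
        entry = self._data.get(name)
        return entry is not None and self._clock() < entry[1]


def beat(store: FakeStore, user_id: int) -> None:
    """Worker side: refresh the presence mark every HEARTBEAT_INTERVAL_S."""
    store.set(heartbeat_key(user_id), "1", ex=HEARTBEAT_TTL_S)


def is_online(store: FakeStore, user_id: int) -> bool:
    """API side: the worker counts as present while the key has not expired."""
    return store.exists(heartbeat_key(user_id))
```

With a 30s TTL and a 5s refresh, a worker has to miss roughly five consecutive refreshes before it drops out of presence detection, which tolerates short network stalls without keeping dead workers visible for long.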

Scaling Configuration

| Parameter | Minimum | Maximum | Scaling |
|---|---|---|---|
| Storage | 1 GB | 10 GB | Automatic |
| Compute | 1,000 ECPU/s | 100,000 ECPU/s | Automatic |

Why Serverless over Provisioned

Trading platforms have extreme traffic variance. During market hours (09:00–13:30 for Taiwan markets), thousands of workers send heartbeats every 5 seconds and orders flow continuously. After hours, traffic drops to near-zero. Provisioned ElastiCache would require sizing for peak and paying 24/7. Valkey Serverless scales ECPU with demand — off-hours cost approaches the storage minimum only.

Pricing Model

Valkey Serverless bills on two dimensions:

  • Storage: $0.125/GB-hour for data stored
  • ElastiCache Processing Units (ECPU): $0.0000098 per ECPU consumed

An ECPU roughly maps to a simple command on data < 1 KB. Commands on larger payloads or complex operations (SORT, LRANGE on long lists) consume proportionally more ECPUs.
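The ECPU model described above can be turned into a rough load estimate. The sketch below is a back-of-the-envelope aid, not an official formula: `estimate_ecpus` and `heartbeat_ecpu_rate` are hypothetical helpers, 1 KB is assumed to be 1,024 bytes, a heartbeat is assumed to be a single small SET of about 64 bytes, and complex commands such as SORT are not modelled.

```python
def estimate_ecpus(payload_bytes: int) -> float:
    """Approximate ECPUs for one simple command.

    Assumption: commands moving up to 1 KB cost 1 ECPU, and larger
    payloads scale linearly with the kilobytes transferred.
    """
    if payload_bytes < 0:
        raise ValueError("payload_bytes must be non-negative")
    return max(1.0, payload_bytes / 1024)


def heartbeat_ecpu_rate(workers: int, interval_s: float = 5.0) -> float:
    """ECPUs per second spent on heartbeats alone (one small SET per beat)."""
    return workers * estimate_ecpus(64) / interval_s
```

For a hypothetical fleet of 5,000 workers, `heartbeat_ecpu_rate(5000)` comes to about 1,000 ECPUs per second before any order traffic is counted.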


RDS Aurora PostgreSQL

Aurora PostgreSQL is the system of record for all durable data. Everything that needs to survive a restart lives here.

What It Stores

| Table / Domain | Row Estimate (10K users) | Purpose |
|---|---|---|
| users | 10K | User accounts, preferences, subscription tier |
| trading_accounts | 30K (avg 3/user) | Broker credentials (AES-256-GCM encrypted), account config |
| orders | 1M+ / month | Order history, status, fills, timestamps |
| audit_logs | 5M+ / month | Every authenticated action with IP, user-agent, path |
| sessions | 10K active | Server-side session store |
| api_keys | 15K | Webhook tokens, API keys per trading account |
| subscription_plans | ~10 | Plan definitions, feature flags, limits |

Scaling Tiers

| Users | Instance | Configuration | Estimated Cost |
|---|---|---|---|
| 1–500 | db.t3.large | Single instance, daily snapshots | ~$120/mo |
| 500–1K | db.t3.large | Multi-AZ standby | ~$240/mo |
| 1K–5K | db.r6g.large | Multi-AZ + daily snapshots, enhanced monitoring | ~$400/mo |
| 5K–10K | db.r6g.xlarge | Multi-AZ + 1 read replica | ~$800/mo |
| 10K–50K | db.r6g.2xlarge | Multi-AZ + 2 read replicas | ~$1,800/mo |
| 50K+ | db.r6g.4xlarge | Multi-AZ + 2 read replicas + provisioned IOPS | ~$4,000/mo |

Aurora Advantages

Aurora's storage automatically grows from 10 GB to 128 TB with no downtime. Replication lag to read replicas is typically < 20ms. Failover to a standby completes in < 30 seconds.


RDS Proxy

ECS tasks are ephemeral — workers start and stop constantly as users come and go. Without connection pooling, each task opening a PostgreSQL connection creates significant overhead (TLS handshake, authentication, memory allocation on the database).

Why RDS Proxy

| Problem | Without Proxy | With Proxy |
|---|---|---|
| Connection churn | New TCP + TLS for every task | Multiplexed over persistent pool |
| Connection limit | db.t3.large has ~680 max connections | App connects to the proxy; proxy manages the database pool |
| Failover | App must detect and reconnect | Proxy handles transparently |
| Credential rotation | App restart required | Proxy picks up new credentials from Secrets Manager |

Configuration

| Parameter | Value |
|---|---|
| Max Connections | 80% of database max |
| Borrow Timeout | 120 seconds |
| Idle Timeout | 1800 seconds |
| Engine | PostgreSQL |
| Auth | Secrets Manager (auto-rotation capable) |

The proxy multiplexes application connections (potentially hundreds from ECS tasks) down to a smaller set of database connections, reusing them across requests. This is critical at scale — 30 workers per instance × 10 instances = 300 potential connections, but the database only sees ~50 active connections through the proxy.
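The arithmetic in the paragraph above can be checked with a small sketch. Both helpers are hypothetical, and the ~680 figure is the db.t3.large connection limit quoted in the comparison table, taken here as an input and not as a verified value.

```python
def client_connections(workers_per_instance: int, instances: int) -> int:
    """Upper bound on connections opened by worker tasks against the proxy."""
    return workers_per_instance * instances


def proxy_pool_cap(db_max_connections: int, max_connections_percent: int = 80) -> int:
    """Database connections the proxy may open under the 80% setting."""
    return db_max_connections * max_connections_percent // 100
```

With 30 workers on each of 10 instances, `client_connections(30, 10)` gives 300, while `proxy_pool_cap(680)` gives 544. The client side therefore fits under the cap even before multiplexing, and multiplexing is what brings the connections the database actually sees down to the ~50 mentioned above.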


VPC Architecture

flowchart LR
    subgraph public["Public Subnets (AZ-a, AZ-b)"]
        ALB["ALB"]
        API["API Tasks"]
        W["Worker Tasks"]
    end

    subgraph private["Private Subnets (AZ-a, AZ-b)"]
        Proxy["RDS Proxy"]
        RDS["Aurora Primary + Standby"]
        Valkey["Valkey Endpoint"]
        Lambda["Lambda ENIs"]
    end

    Internet["Internet"] --> ALB --> API
    W -->|"Broker APIs"| Internet
    API & W --> Proxy --> RDS
    API & W & Lambda --> Valkey

Why Workers Need Public Subnets

Workers must reach external broker APIs (Shioaji, Fubon) over the public internet. Placing them in private subnets would require NAT Gateways, at $0.045/hr per AZ plus $0.045/GB processed. For a trading platform generating significant outbound traffic, NAT costs could exceed the EC2 instance costs. Public subnets with security groups that allow no inbound traffic to workers give a comparable security posture without the NAT charges; the only remaining network cost is the hourly fee AWS bills for each public IPv4 address.
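To make the NAT Gateway comparison concrete, the quoted rates can be plugged into a small estimator. This is an illustrative sketch: `nat_monthly_cost` is a hypothetical helper, 730 hours per month is an assumed average, and the traffic volume in the usage note is made up for the example.

```python
HOURS_PER_MONTH = 730  # assumption: average month length used for estimates


def nat_monthly_cost(
    azs: int,
    gb_processed: float,
    hourly_rate: float = 0.045,  # $/hr per NAT Gateway, from the text above
    per_gb_rate: float = 0.045,  # $/GB processed, from the text above
) -> float:
    """Estimated monthly NAT Gateway bill with one gateway per AZ."""
    fixed = azs * HOURS_PER_MONTH * hourly_rate
    variable = gb_processed * per_gb_rate
    return round(fixed + variable, 2)
```

Two AZs cost about $65.70 per month before any traffic flows, and a hypothetical 1,000 GB of outbound data raises that to about $110.70.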

Security Group Rules

| Rule | Source | Destination | Port | Protocol |
|---|---|---|---|---|
| ALB → API | ALB SG | API SG | 8000 | TCP |
| API → Valkey | API SG | Valkey SG | 6379 | TCP |
| Worker → Valkey | Worker SG | Valkey SG | 6379 | TCP |
| Lambda → Valkey | Lambda SG | Valkey SG | 6379 | TCP |
| API → RDS Proxy | API SG | RDS SG | 5432 | TCP |
| Worker → RDS Proxy | Worker SG | RDS SG | 5432 | TCP |
| Worker → Internet | Worker SG | 0.0.0.0/0 | 443 | TCP (outbound) |
| API → Internet | API SG | 0.0.0.0/0 | 443 | TCP (outbound) |

Least Privilege

Only the ALB SG allows 0.0.0.0/0 inbound, and only on ports 80 and 443. API tasks accept port 8000 from the ALB SG alone. Workers accept nothing inbound: all of their communication is outbound, to Valkey, RDS Proxy, and the broker APIs.
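A policy like this is easy to verify mechanically. The sketch below transcribes the inbound side of the rules above into plain data and flags any group other than the ALB that is open to the world. `IngressRule`, `open_to_world`, and `groups_with_ingress` are hypothetical names, and a real check would read the rules from the AWS API or from infrastructure-as-code output instead of a hard-coded list.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class IngressRule:
    group: str   # security group receiving the traffic
    source: str  # source security group name or CIDR
    port: int


# Inbound rules transcribed from the table and paragraph above.
RULES: List[IngressRule] = [
    IngressRule("ALB SG", "0.0.0.0/0", 80),
    IngressRule("ALB SG", "0.0.0.0/0", 443),
    IngressRule("API SG", "ALB SG", 8000),
    IngressRule("Valkey SG", "API SG", 6379),
    IngressRule("Valkey SG", "Worker SG", 6379),
    IngressRule("Valkey SG", "Lambda SG", 6379),
    IngressRule("RDS SG", "API SG", 5432),
    IngressRule("RDS SG", "Worker SG", 5432),
]


def open_to_world(
    rules: List[IngressRule],
    allowed_groups: FrozenSet[str] = frozenset({"ALB SG"}),
) -> List[IngressRule]:
    """Return rules that accept 0.0.0.0/0 on a group not meant to be public."""
    return [
        r for r in rules
        if r.source == "0.0.0.0/0" and r.group not in allowed_groups
    ]


def groups_with_ingress(rules: List[IngressRule]) -> Set[str]:
    """Security groups that accept any inbound traffic at all."""
    return {r.group for r in rules}
```

Running `open_to_world(RULES)` on the transcribed rules returns an empty list, and `"Worker SG"` is absent from `groups_with_ingress(RULES)`, matching the claim that workers accept nothing inbound.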


ALB + WAF

Application Load Balancer

| Parameter | Value |
|---|---|
| Scheme | Internet-facing |
| TLS | ACM-managed certificate (auto-renewal) |
| Health Check Path | /health |
| Health Check Interval | 15 seconds |
| Healthy Threshold | 2 consecutive passes |
| Unhealthy Threshold | 3 consecutive failures |
| Deregistration Delay | 120 seconds |
| Idle Timeout | 60 seconds |
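The health check settings above imply how quickly the ALB reacts to a failing or recovering task. As a rough worked example (`detection_window_s` is a hypothetical helper that ignores check timeouts and request latency):

```python
def detection_window_s(interval_s: int, threshold: int) -> int:
    """Approximate seconds of consecutive results needed to flip target state."""
    return interval_s * threshold
```

With a 15-second interval, a task is marked unhealthy after about 45 seconds (three failures) and returns to service after about 30 seconds (two passes). The 120-second deregistration delay then gives in-flight requests time to drain.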

WAF Rules

| Rule | Type | Configuration | Purpose |
|---|---|---|---|
| Admin IP Restriction | IP Set | Allowlist of admin IPs | Block admin panel access from unknown IPs |
| TradingView IP Exemption | IP Set | TradingView webhook source IPs | Bypass rate limits for legitimate webhooks |
| Login Rate Limit | Rate-based | 100 requests / 5 min per IP | Prevent brute-force attacks on auth endpoints |
| Blanket Rate Limit | Rate-based | 2000 requests / 5 min per IP | General DDoS protection |
| AWS Managed - SQLi | Managed rule group | AWSManagedRulesSQLiRuleSet | SQL injection protection |
| AWS Managed - XSS | Managed rule group | AWSManagedRulesCommonRuleSet | Cross-site scripting, bad inputs |
| AWS Managed - IP Reputation | Managed rule group | AWSManagedRulesAmazonIpReputationList | Block known malicious IPs |

TradingView IP Exemption

TradingView sends webhooks from a known set of IP ranges. These are exempted from rate limiting but still pass through the SQLi/XSS rules. If TradingView changes its IP ranges, webhooks from the new addresses fall under the normal rate limits until the IP set is updated.
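The exemption logic can be sketched in a few lines. This is only a model of the decision, not WAF configuration: the CIDR blocks are placeholder documentation ranges (RFC 5737), not TradingView's real addresses, and `is_rate_limit_exempt` and `should_block` are hypothetical names.

```python
import ipaddress
from typing import Iterable

# Hypothetical placeholder ranges (RFC 5737 documentation blocks),
# NOT TradingView's real addresses.
EXEMPT_CIDRS = ["203.0.113.0/24", "198.51.100.0/24"]


def is_rate_limit_exempt(source_ip: str, cidrs: Iterable[str] = EXEMPT_CIDRS) -> bool:
    """True when the source address falls inside any exempt range."""
    ip = ipaddress.ip_address(source_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in cidrs)


def should_block(source_ip: str, requests_in_window: int, limit: int = 2000) -> bool:
    """Blanket rule: block once the 5-minute count exceeds the limit, unless exempt."""
    if is_rate_limit_exempt(source_ip):
        return False
    return requests_in_window > limit
```

An address outside the IP set is treated like any other client, so a burst of webhooks from a new, unlisted range would be throttled once it crosses the blanket limit.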


Other Services

KMS — Key Management

| Parameter | Value |
|---|---|
| Key Type | RSA-4096, asymmetric |
| Backing | HSM (hardware security module) |
| Purpose | Wrap per-user AES-256-GCM data encryption keys |
| Rotation | Manual (KMS automatic rotation applies only to symmetric keys) |

Every trading account's broker credentials are encrypted with a unique AES-256-GCM key. That AES key is itself encrypted (wrapped) with the KMS RSA master key. Decryption requires both the encrypted data key (stored in the database) and KMS access (controlled by IAM policy). Compromising the database alone reveals nothing.
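The envelope scheme described above can be illustrated with the `cryptography` package. This is a local sketch under stated assumptions, not the platform's implementation: a locally generated RSA key pair stands in for the KMS key (in KMS the private key never leaves the service, so unwrapping would be a KMS decrypt call), OAEP with SHA-256 is an assumed padding choice, and the function names and record layout are hypothetical.

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Assumed wrapping scheme: RSA-OAEP with SHA-256.
OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def encrypt_credentials(plaintext: bytes, wrapping_public_key) -> dict:
    """Encrypt with a fresh AES-256-GCM data key, then wrap that key with RSA."""
    data_key = AESGCM.generate_key(bit_length=256)  # unique per trading account
    nonce = os.urandom(12)                          # 96-bit GCM nonce
    ciphertext = AESGCM(data_key).encrypt(nonce, plaintext, None)
    wrapped_key = wrapping_public_key.encrypt(data_key, OAEP)
    # Only the wrapped key, nonce, and ciphertext would be stored in the database.
    return {"wrapped_key": wrapped_key, "nonce": nonce, "ciphertext": ciphertext}


def decrypt_credentials(record: dict, wrapping_private_key) -> bytes:
    """Unwrap the data key (the KMS step in production), then decrypt."""
    data_key = wrapping_private_key.decrypt(record["wrapped_key"], OAEP)
    return AESGCM(data_key).decrypt(record["nonce"], record["ciphertext"], None)
```

For a quick local round trip, generate a stand-in key with `rsa.generate_private_key(public_exponent=65537, key_size=2048)` and pass its `public_key()` to `encrypt_credentials`. The stored record alone is useless without access to the private key, which mirrors the point that a database compromise by itself reveals nothing.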

Secrets Manager

Stores application configuration: database credentials, API secrets, encryption parameters. Referenced by ECS task definitions and Lambda functions at startup. Supports automatic rotation.

ECR — Container Registry

| Repository | Purpose | Lifecycle |
|---|---|---|
| shioaji-api | API container image (FastAPI + Gunicorn) | Keep last 10 tagged images |
| shioaji-worker | Worker container image (broker SDKs) | Keep last 10 tagged images |

Images are built in CI/CD, scanned for vulnerabilities, and pushed to ECR. ECS pulls from ECR at task launch.

Route 53

DNS management for 4pass.io. A-record alias to the ALB. Health checks integrated with ALB health status.

VPC Endpoint — Lambda

| Parameter | Value |
|---|---|
| Type | Interface endpoint |
| Service | com.amazonaws.{region}.elasticache |
| AZs | 3 (matching Lambda subnets) |

Lambda functions run inside the VPC so they can reach the Valkey endpoint over private IP addresses. The interface endpoint adds private connectivity to the ElastiCache service API, so that traffic also stays off the public internet, reducing latency and improving security.