CI/CD & Deployment
Overview
4pass uses three GitHub Actions workflows for automated deployment to AWS ECS, with multi-stage Docker builds, OIDC authentication (no stored AWS credentials), and rolling deployments with health verification.
graph LR
Dev["Developer"] -->|git push / dispatch| GHA["GitHub Actions"]
GHA -->|OIDC Auth| AWS["AWS STS"]
GHA -->|Build + Push| ECR["Amazon ECR"]
GHA -->|Update Task Def| ECS["Amazon ECS"]
ECS -->|Rolling Deploy| Service["ECS Service"]
Service -->|Health Check| ALB["ALB /health"]
Deployment Pipelines
Pipeline Overview
| Workflow | Trigger | Target | Strategy |
|---|---|---|---|
| `deploy-api.yml` | Manual dispatch | ECS API Service | Rolling update, 10-min stability wait |
| `deploy-worker.yml` | Manual dispatch | ECR Worker Image | Image push only (workers are ephemeral) |
| `deploy-ec2.yml` | Manual dispatch | EC2 instances | Capacity provider update |
All workflows use workflow_dispatch (manual trigger) for controlled deployments. No auto-deploy on push — trading platforms require deliberate releases.
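A manual dispatch can also be issued programmatically through the GitHub REST API. The sketch below only builds the request and does not send it; the branch name `main` and the token placeholder are assumptions, and the repository name is taken from the OIDC trust policy described later on this page.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_dispatch_request(owner, repo, workflow_file, ref, token):
    """Build (but do not send) a workflow_dispatch request for one workflow file."""
    url = (
        f"{GITHUB_API}/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    # The request body names the branch or tag the workflow should run against.
    body = json.dumps({"ref": ref}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical usage: prepare a trigger for the API workflow on an assumed "main" branch.
request = build_dispatch_request(
    "fullpass-4pass", "4pass", "deploy-api.yml", "main", "<token>"
)
# urllib.request.urlopen(request) would send it; that call is left out on purpose.
```

Keeping the request construction separate from the network call makes it easy to review exactly what would be dispatched before anything is released.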
API Deployment Flow
sequenceDiagram
participant Dev as Developer
participant GHA as GitHub Actions
participant ECR as Amazon ECR
participant ECS as Amazon ECS
participant ALB as ALB Health Check
Dev->>GHA: Trigger deploy-api workflow
GHA->>GHA: Checkout code
GHA->>GHA: Build Vue.js frontend (npm build)
GHA->>GHA: Build multi-stage Docker image
GHA->>ECR: Push image (SHA tag + latest)
GHA->>ECS: Register new task definition
GHA->>ECS: Update service (rolling deploy)
loop Every 15s for 10 minutes
ECS->>ALB: Health check /health
ALB-->>ECS: 200 OK
end
GHA->>GHA: Verify running task uses new definition
GHA->>GHA: Verify /health returns 200
Key Features:
- Multi-stage build: Node.js builds Vue frontend → Python runtime with FastAPI
- OIDC authentication: No AWS access keys stored in GitHub; uses `aws-actions/configure-aws-credentials` with OIDC
- Rolling deployment: Minimum 100% healthy, maximum 200%, for zero downtime
- Circuit breaker: Auto-rollback on deployment failure
- Verification: Confirms new task definition is running + health endpoint responds
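The verification step can be pictured as a check over the output of the ECS `DescribeServices` API. This is a minimal sketch, not the workflow's actual script: it assumes the caller has already fetched one service description (for example with boto3), and the task definition ARN in the example is hypothetical.

```python
def deployment_is_stable(service, expected_task_definition):
    """Return True when only the expected task definition runs, at full count.

    `service` is one entry of the `services` list returned by ECS DescribeServices.
    """
    deployments = service.get("deployments", [])
    if len(deployments) != 1:
        # Either nothing is deployed or an older deployment is still draining.
        return False
    primary = deployments[0]
    return (
        primary.get("status") == "PRIMARY"
        and primary.get("taskDefinition") == expected_task_definition
        and primary.get("runningCount") == primary.get("desiredCount")
        # rolloutState is treated as COMPLETED when the field is absent.
        and primary.get("rolloutState", "COMPLETED") == "COMPLETED"
    )


# Hypothetical ARN and counts, for illustration only.
NEW_TASK_DEF = "arn:aws:ecs:us-east-1:123456789012:task-definition/api:42"
finished = {
    "deployments": [
        {
            "status": "PRIMARY",
            "taskDefinition": NEW_TASK_DEF,
            "desiredCount": 2,
            "runningCount": 2,
            "rolloutState": "COMPLETED",
        }
    ]
}
print(deployment_is_stable(finished, NEW_TASK_DEF))
```

Requiring exactly one deployment is a deliberately strict choice: during a rolling update ECS lists both the old and the new deployment, so the check only passes once the old tasks are gone.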
Docker Builds
API Image (Dockerfile)
graph TB
subgraph stage1["Stage 1: Frontend Build"]
Node["Node.js 20"] --> NPM["npm install + build"]
NPM --> Dist["dist/ (Vue SPA)"]
end
subgraph stage2["Stage 2: Python Runtime"]
Python["Python 3.11-slim"] --> Deps["pip install requirements.txt"]
Deps --> App["Copy app/ + frontend dist"]
App --> Health["Health check /health"]
end
Dist --> App
| Property | Value |
|---|---|
| Base image | python:3.11-slim |
| Frontend | Vue 3 SPA built with Vite |
| Server | Gunicorn + Uvicorn (16 workers, preload) |
| User | Non-root (appuser via gosu) |
| Secrets | AWS Secrets Manager loaded at startup via entrypoint.sh |
| Health check | curl /health |
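The server row of the table could correspond to a Gunicorn configuration file along these lines. Gunicorn config files are plain Python; the file name, the bind address, and the port are assumptions, since this page states only the worker count, the Uvicorn worker type, and preloading.

```python
# gunicorn.conf.py (hypothetical file name and location)

# Address and port are assumed; this page does not state them.
bind = "0.0.0.0:8000"

# 16 worker processes, each running the ASGI app through Uvicorn.
workers = 16
worker_class = "uvicorn.workers.UvicornWorker"

# Load the application once in the master process before forking workers.
preload_app = True
```

Preloading lets the workers share memory pages loaded before the fork, at the cost of needing a full restart to pick up code changes.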
Worker Image (Dockerfile.worker)
The worker image is aggressively optimized for fast startup and minimal footprint:
| Metric | Before | After | Improvement |
|---|---|---|---|
| Image size | 541 MB | 254 MB | 53% smaller |
| Startup time | 0.75s | 0.43s | 43% faster |
| AWS SDK | Full boto3 | Trimmed to ECS/Secrets Manager/STS/Logs/SQS only | ~60% smaller |
graph TB
subgraph stage1["Stage 1: Builder"]
B1["Install all dependencies"]
B1 --> B2["Compile .pyc bytecode"]
end
subgraph stage2["Stage 2: Optimizer"]
O1["Remove pip, setuptools, pygments"]
O1 --> O2["Trim botocore to 5 services"]
O2 --> O3["Strip __pycache__, tests, docs"]
end
subgraph stage3["Stage 3: Runtime"]
R1["python:3.11-slim base"]
R1 --> R2["Copy optimized site-packages"]
R2 --> R3["Pre-compiled bytecode"]
R3 --> R4["254 MB final image"]
end
stage1 --> stage2 --> stage3
Optimizations Applied:
- 3-stage build: Builder → Optimizer → Runtime
- Pre-compiled bytecode: `compileall` during build, not at runtime
- Trimmed boto3/botocore: Only ECS, Secrets Manager, STS, Logs, SQS services kept
- No pip/setuptools: Not needed at runtime
- No AWS CLI: Uses boto3 directly
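The trimming and pre-compilation steps might look roughly like the following when run inside the optimizer stage. This is a sketch under assumptions: the helper names are hypothetical, and the service directory names (`ecs`, `secretsmanager`, `sts`, `logs`, `sqs`) are how botocore names the five services listed above inside its `data/` directory.

```python
import compileall
import shutil
from pathlib import Path

# botocore keeps one model directory per service under botocore/data/.
KEEP_SERVICES = {"ecs", "secretsmanager", "sts", "logs", "sqs"}


def trim_botocore_data(data_dir, keep=KEEP_SERVICES):
    """Delete per-service model directories not in `keep`; return removed names."""
    removed = []
    for entry in sorted(Path(data_dir).iterdir()):
        # Top-level files (such as endpoints.json) are shared by all services
        # and are left in place; only unwanted service directories are removed.
        if entry.is_dir() and entry.name not in keep:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed


def precompile(site_packages):
    """Compile every .py file under `site_packages` to bytecode at build time."""
    return bool(compileall.compile_dir(str(site_packages), quiet=1))
```

A build step would call `trim_botocore_data` on the installed `botocore/data` directory and then `precompile` on the site-packages directory that is copied into the runtime stage.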
Authentication & Security
OIDC (No Stored Credentials)
sequenceDiagram
participant GHA as GitHub Actions
participant STS as AWS STS
participant IAM as IAM Role
GHA->>STS: AssumeRoleWithWebIdentity (OIDC token)
STS->>IAM: Verify trust policy (repo: fullpass-4pass/4pass)
IAM-->>STS: Temporary credentials (1 hour)
STS-->>GHA: Access Key + Secret + Session Token
GHA->>GHA: Use credentials for ECR, ECS operations
- No long-lived AWS credentials in GitHub Secrets
- IAM role trust policy restricts to specific repository
- Temporary credentials expire after 1 hour
- Follows AWS security best practices
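The repository restriction lives in the IAM role's trust policy. The sketch below builds such a policy as a Python dictionary; the account ID is a placeholder, and the `repo:<owner>/<name>:*` subject pattern (any branch or tag of the repository) is an assumption about how narrowly the real role is scoped.

```python
import json

OIDC_HOST = "token.actions.githubusercontent.com"


def build_trust_policy(account_id, repo):
    """Return a trust policy allowing GitHub Actions runs of `repo` to assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{OIDC_HOST}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # The token must be issued for AWS STS ...
                    "StringEquals": {f"{OIDC_HOST}:aud": "sts.amazonaws.com"},
                    # ... and must come from a workflow in this repository.
                    "StringLike": {f"{OIDC_HOST}:sub": f"repo:{repo}:*"},
                },
            }
        ],
    }


# Placeholder account ID; the repository name comes from the diagram above.
policy = build_trust_policy("123456789012", "fullpass-4pass/4pass")
print(json.dumps(policy, indent=2))
```

Tightening the `sub` condition to a single branch or environment would narrow the role further than the repository-wide pattern assumed here.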
Image Tagging
| Tag | Format | Purpose |
|---|---|---|
| SHA tag | `sha-abc1234` | Immutable, traceable to commit |
| `latest` | Always updated | Convenience for manual runs |
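As an illustration of the tagging scheme, the helper below derives both tags from a commit SHA. It assumes the SHA tag uses the first seven characters of the commit hash, which matches the `sha-abc1234` example but is not stated explicitly; the function name is hypothetical.

```python
def image_tags(commit_sha):
    """Return the immutable SHA tag and the moving `latest` tag for one build."""
    short_sha = commit_sha[:7]  # assumed length, based on the sha-abc1234 example
    return [f"sha-{short_sha}", "latest"]


print(image_tags("abc1234def5678900000000000000000000000000"))
```

Deployments that need to be traceable or reversible should reference the SHA tag, since `latest` is overwritten by every push.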
ECR Configuration
| Repository | Purpose | Lifecycle |
|---|---|---|
| `shioaji-api` | API + frontend image | Keep last 10 images |
| `shioaji-worker` | Trading worker image | Keep last 10 images |
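A "keep last 10 images" rule can be expressed as an ECR lifecycle policy document. The sketch below generates one in Python; the rule description and the choice of `tagStatus: any` are assumptions, since the actual policy for these repositories is not shown on this page.

```python
import json


def keep_last_n_policy(count=10):
    """Return an ECR lifecycle policy that expires all but the newest `count` images."""
    return {
        "rules": [
            {
                "rulePriority": 1,
                "description": f"Keep last {count} images",
                "selection": {
                    "tagStatus": "any",
                    "countType": "imageCountMoreThan",
                    "countNumber": count,
                },
                "action": {"type": "expire"},
            }
        ]
    }


# ECR expects the policy as a JSON string (the lifecyclePolicyText parameter).
policy_text = json.dumps(keep_last_n_policy(10))
print(policy_text)
```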
GitHub Pages custom domain (docs.4pass.io)
The documentation site is served at https://docs.4pass.io via GitHub Pages with a custom domain. Setup is split between AWS Route 53 (DNS) and GitHub (Pages config).
1. AWS Route 53 (this repo’s Terraform)
DNS for docs.4pass.io is managed in the same Route 53 hosted zone as 4pass.io. Terraform creates a CNAME record:
- Name: `docs.4pass.io` (record name in zone: `docs`)
- Target: `fullpass-4pass.github.io` (configurable via `github_pages_cname_target`)
After changing Terraform, apply and verify:
cd terraform && terraform plan -target=aws_route53_record.github_pages_docs
terraform apply -target=aws_route53_record.github_pages_docs
dig CNAME docs.4pass.io # should show fullpass-4pass.github.io
2. GitHub repository settings
In the repo that publishes the docs (e.g. fullpass-4pass/4pass or the repo that has “Pages” enabled):
- Go to Settings → Pages.
- Under Custom domain, enter: `docs.4pass.io`.
- Click Save. GitHub will verify DNS (CNAME to `fullpass-4pass.github.io`).
- Once verified, enable Enforce HTTPS if desired.
The repo must have a CNAME file in the branch/folder used for Pages (e.g. docs/CNAME with content docs.4pass.io). This repo already has docs/CNAME with that value.
3. Troubleshooting: “Your connection is not private” / ERR_CERT_COMMON_NAME_INVALID
If the browser shows a certificate error for docs.4pass.io, traffic is likely hitting the 4pass.io ALB (whose cert does not include docs.4pass.io). Fix:
- Verify DNS: From your machine, run:
  dig docs.4pass.io CNAME +short
  Expected: `fullpass-4pass.github.io.`
  If the output is empty (for example, because only an A record exists) or shows a different target, DNS is wrong.
- Remove any A/ALIAS record for `docs`: In Route 53 → Hosted zone 4pass.io, if there is an A or ALIAS record for `docs` or `docs.4pass.io` pointing at the ALB, delete it. Only the CNAME `docs` → `fullpass-4pass.github.io` should exist.
- Re-apply Terraform using the plan and apply commands from step 1 (the CNAME uses `name = "docs"` and `allow_overwrite = true`).
- Wait for GitHub: In Settings → Pages, “DNS Check in Progress” must change to a green check. Only then will GitHub issue the certificate and “Enforce HTTPS” become available. This can take a few minutes after DNS is correct.
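The DNS check in the first troubleshooting step can be scripted by interpreting the output of `dig docs.4pass.io CNAME +short`. The function below is a hypothetical helper that takes that output as a string; it does not run `dig` itself.

```python
def diagnose_cname(dig_short_output, expected="fullpass-4pass.github.io"):
    """Classify the output of `dig <name> CNAME +short` against the expected target."""
    answers = [
        line.strip().rstrip(".").lower()
        for line in dig_short_output.splitlines()
        if line.strip()
    ]
    if not answers:
        # A CNAME query prints nothing when only A or ALIAS records exist.
        return "no CNAME record found"
    if answers[0] == expected:
        return "ok"
    return f"unexpected target: {answers[0]}"


print(diagnose_cname("fullpass-4pass.github.io.\n"))
```

An empty result is the case to watch for: it usually means the name resolves through an A or ALIAS record, which is the situation the second troubleshooting step removes.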
4. Summary
| Where | What |
|---|---|
| Route 53 | CNAME docs (→ docs.4pass.io) → fullpass-4pass.github.io (Terraform: aws_route53_record.github_pages_docs) |
| GitHub | Pages → Custom domain: docs.4pass.io; after DNS check passes, enable Enforce HTTPS |
| Repo | docs/CNAME with docs.4pass.io (so builds keep the custom domain) |
Deployment Checklist
Pre-Deployment
- All tests passing locally
- Database migrations applied (`alembic upgrade head`)
- Environment variables updated in Secrets Manager
- Worker image compatible with new API (if breaking changes)
Post-Deployment
- Verify `/health` endpoint returns 200
- Check CloudWatch for error spikes
- Monitor SQS DLQ for failed messages
- Verify worker pool is healthy (maintenance Lambda logs)
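The first post-deployment item can be automated with a small polling loop. This sketch mirrors the 15-second interval and 10-minute window from the API deployment flow (40 attempts); the function names are hypothetical, and the health URL must be supplied by the caller because the public hostname is not given here.

```python
import time
import urllib.error
import urllib.request


def health_ok(url, timeout=5.0):
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection errors, timeouts, and non-2xx responses all count as unhealthy.
        return False


def wait_for_healthy(check, attempts=40, delay=15.0, sleep=time.sleep):
    """Call `check` until it returns True; return the attempt number, or None."""
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay)  # no sleep after the final failed attempt
    return None
```

A caller would pass something like `lambda: health_ok(health_url)` as `check`. Injecting `check` and `sleep` keeps the loop testable without a network or real delays.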