CI/CD & Deployment

Overview

4pass uses three GitHub Actions workflows to deploy to AWS (ECS, ECR, and EC2). The pipeline combines multi-stage Docker builds, OIDC authentication (no stored AWS credentials), and, for the API service, rolling deployments with health verification.

graph LR
    Dev["Developer"] -->|git push / dispatch| GHA["GitHub Actions"]
    GHA -->|OIDC Auth| AWS["AWS STS"]
    GHA -->|Build + Push| ECR["Amazon ECR"]
    GHA -->|Update Task Def| ECS["Amazon ECS"]
    ECS -->|Rolling Deploy| Service["ECS Service"]
    Service -->|Health Check| ALB["ALB /health"]

Deployment Pipelines

Pipeline Overview

| Workflow | Trigger | Target | Strategy |
|---|---|---|---|
| deploy-api.yml | Manual dispatch | ECS API service | Rolling update, 10-minute stability wait |
| deploy-worker.yml | Manual dispatch | ECR worker image | Image push only (workers are ephemeral) |
| deploy-ec2.yml | Manual dispatch | EC2 instances | Capacity provider update |

All workflows use workflow_dispatch (manual trigger) for controlled deployments. No auto-deploy on push — trading platforms require deliberate releases.
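A manual dispatch can be issued from the Actions tab, with gh workflow run deploy-api.yml, or through GitHub's REST endpoint for workflow dispatches. The sketch below only builds that REST request using the standard library; the owner/repo (fullpass-4pass/4pass, the repository named in the OIDC section), the main ref, and where the token comes from are assumptions, not this repo's documented procedure.

```python
import json
import urllib.request

API = "https://api.github.com"


def dispatch_request(owner: str, repo: str, workflow_file: str, ref: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the workflow_dispatch endpoint of one workflow file."""
    url = f"{API}/repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches"
    body = json.dumps({"ref": ref}).encode("utf-8")  # branch or tag the workflow runs on
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",  # assumption: a token allowed to run workflows
            "Content-Type": "application/json",
        },
    )
```

Sending it is urllib.request.urlopen(req); GitHub answers 204 No Content when the dispatch is accepted.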

API Deployment Flow

sequenceDiagram
    participant Dev as Developer
    participant GHA as GitHub Actions
    participant ECR as Amazon ECR
    participant ECS as Amazon ECS
    participant ALB as ALB Health Check

    Dev->>GHA: Trigger deploy-api workflow
    GHA->>GHA: Checkout code
    GHA->>GHA: Build Vue.js frontend (npm build)
    GHA->>GHA: Build multi-stage Docker image
    GHA->>ECR: Push image (SHA tag + latest)
    GHA->>ECS: Register new task definition
    GHA->>ECS: Update service (rolling deploy)
    loop Every 15s for 10 minutes
        ECS->>ALB: Health check /health
        ALB-->>ECS: 200 OK
    end
    GHA->>GHA: Verify running task uses new definition
    GHA->>GHA: Verify /health returns 200
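The verification tail of this flow can be sketched in plain Python. The 15-second, 10-minute cadence equals the defaults of the AWS CLI's aws ecs wait services-stable (polls every 15 seconds, gives up after 40 checks), which the workflow may well rely on; the callables below are hypothetical stand-ins for the real ECS and HTTP calls, injected so the logic can be exercised without AWS.

```python
import time
from typing import Callable


def wait_until_stable(
    is_stable: Callable[[], bool],
    interval_s: float = 15.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_stable() every interval_s seconds, for at most timeout_s seconds."""
    max_attempts = int(timeout_s // interval_s)  # 600 // 15 = 40 polls
    for attempt in range(max_attempts):
        if is_stable():
            return True
        if attempt < max_attempts - 1:  # no pointless sleep after the final poll
            sleep(interval_s)
    return False


def verify_rollout(running_task_def: str, new_task_def: str, health_status: int) -> bool:
    """Post-deploy checks: the running task uses the new definition and /health returned 200."""
    return running_task_def == new_task_def and health_status == 200
```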

Key Features:

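  • Multi-stage build: A Node.js stage builds the Vue frontend; a Python runtime stage serves it with FastAPI
  • OIDC authentication: No AWS access keys are stored in GitHub; the workflow obtains temporary credentials through aws-actions/configure-aws-credentials with OIDC
  • Rolling deployment: Minimum healthy 100%, maximum 200%, so old tasks keep serving until their replacements pass health checks (zero downtime)
  • Circuit breaker: The ECS deployment circuit breaker rolls back automatically when a deployment fails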
  • Verification: Confirms new task definition is running + health endpoint responds
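The rolling-deployment percentages translate into task counts as sketched below. The dictionary uses the key names of the ECS UpdateService deploymentConfiguration parameter (it could be passed as deploymentConfiguration= to boto3's ecs.update_service); whether the workflow sets these values that way or through Terraform is an assumption.

```python
import math
from typing import Tuple

# Assumed settings mirroring the bullets above; keys follow ECS deploymentConfiguration.
DEPLOYMENT_CONFIGURATION = {
    "deploymentCircuitBreaker": {"enable": True, "rollback": True},
    "minimumHealthyPercent": 100,
    "maximumPercent": 200,
}


def rolling_bounds(desired_count: int, config: dict = DEPLOYMENT_CONFIGURATION) -> Tuple[int, int]:
    """Return (minimum healthy tasks, maximum total tasks) allowed during a rolling deploy.

    ECS rounds the minimum up and the maximum down to whole tasks.
    """
    min_healthy = math.ceil(desired_count * config["minimumHealthyPercent"] / 100)
    max_total = math.floor(desired_count * config["maximumPercent"] / 100)
    return min_healthy, max_total
```

With a desired count of 2, ECS keeps at least 2 healthy tasks and may briefly run 4: new tasks start before old ones stop, trading temporary extra capacity for no gap in service.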

Docker Builds

API Image (Dockerfile)

graph TB
    subgraph stage1["Stage 1: Frontend Build"]
        Node["Node.js 20"] --> NPM["npm install + build"]
        NPM --> Dist["dist/ (Vue SPA)"]
    end

    subgraph stage2["Stage 2: Python Runtime"]
        Python["Python 3.11-slim"] --> Deps["pip install requirements.txt"]
        Deps --> App["Copy app/ + frontend dist"]
        App --> Health["Health check /health"]
    end

    Dist --> App

| Property | Value |
|---|---|
| Base image | python:3.11-slim |
| Frontend | Vue 3 SPA built with Vite |
| Server | Gunicorn + Uvicorn (16 workers, preload) |
| User | Non-root (appuser via gosu) |
| Secrets | AWS Secrets Manager, loaded at startup via entrypoint.sh |
| Health check | curl /health |

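The Server row corresponds roughly to a Gunicorn configuration file like the one below. The file name and bind address are assumptions; the worker count, the Uvicorn worker class, and the preload flag come from the table. Gunicorn config files are ordinary Python modules, so this is also valid Python.

```python
# gunicorn.conf.py (hypothetical file name)
bind = "0.0.0.0:8000"  # assumption: the container port the ALB target group uses
workers = 16  # from the table above
worker_class = "uvicorn.workers.UvicornWorker"  # Uvicorn's ASGI event loop inside each Gunicorn worker
preload_app = True  # import the app once in the master process, then fork the workers
```

It would be started with gunicorn -c gunicorn.conf.py app.main:app, where app.main:app is a hypothetical module path. Preloading shares imported code copy-on-write and surfaces import errors at boot, but anything opened at import time (database pools, SDK clients) is inherited by every forked worker and is safer created per worker.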
Worker Image (Dockerfile.worker)

The worker image is aggressively optimized for fast startup and minimal footprint:

| Metric | Before | After | Improvement |
|---|---|---|---|
| Image size | 541 MB | 254 MB | 53% smaller |
| Startup time | 0.75 s | 0.43 s | 43% faster |
| AWS SDK | Full boto3 | Trimmed to ECS/Secrets Manager/STS/Logs/SQS only | ~60% smaller |

graph TB
    subgraph stage1["Stage 1: Builder"]
        B1["Install all dependencies"]
        B1 --> B2["Compile .pyc bytecode"]
    end

    subgraph stage2["Stage 2: Optimizer"]
        O1["Remove pip, setuptools, pygments"]
        O1 --> O2["Trim botocore to 5 services"]
        O2 --> O3["Strip __pycache__, tests, docs"]
    end

    subgraph stage3["Stage 3: Runtime"]
        R1["python:3.11-slim base"]
        R1 --> R2["Copy optimized site-packages"]
        R2 --> R3["Pre-compiled bytecode"]
        R3 --> R4["254 MB final image"]
    end

    stage1 --> stage2 --> stage3

Optimizations Applied:

  • 3-stage build: Builder → Optimizer → Runtime
  • Pre-compiled bytecode: compileall during build, not at runtime
  • Trimmed boto3/botocore: Only ECS, Secrets Manager, STS, Logs, SQS services kept
  • No pip/setuptools: Not needed at runtime
  • No AWS CLI: Uses boto3 directly
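The botocore trim can be sketched as deleting every service model directory under botocore's data folder except the five kept services. The directory names (ecs, secretsmanager, sts, logs, sqs) are botocore's service identifiers; that the Optimizer stage works exactly this way, and that every subdirectory of data is a service model, are assumptions to check against Dockerfile.worker and the installed botocore version.

```python
import shutil
from pathlib import Path

# botocore service identifiers for the clients the worker creates (assumed from the list above)
KEEP_SERVICES = frozenset({"ecs", "secretsmanager", "sts", "logs", "sqs"})


def trim_botocore_data(data_dir: Path, keep: frozenset = KEEP_SERVICES) -> list:
    """Remove service model directories under data_dir that are not in keep.

    Plain files such as endpoints.json are left untouched. Returns removed names, sorted.
    """
    removed = []
    for entry in data_dir.iterdir():
        if entry.is_dir() and entry.name not in keep:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return sorted(removed)
```

In the optimizer stage the folder could be located with Path(botocore.__file__).parent / "data". Creating a client for a removed service would then fail with UnknownServiceError, so the keep list must track every boto3 client the worker uses.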

Authentication & Security

OIDC (No Stored Credentials)

sequenceDiagram
    participant GHA as GitHub Actions
    participant STS as AWS STS
    participant IAM as IAM Role

    GHA->>STS: AssumeRoleWithWebIdentity (OIDC token)
    STS->>IAM: Verify trust policy (repo: fullpass-4pass/4pass)
    IAM-->>STS: Temporary credentials (1 hour)
    STS-->>GHA: Access Key + Secret + Session Token
    GHA->>GHA: Use credentials for ECR, ECS operations

  • No long-lived AWS credentials in GitHub Secrets
  • IAM role trust policy restricts to specific repository
  • Temporary credentials expire after 1 hour
  • Follows AWS security best practices
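The repository restriction lives in the IAM role's trust policy. A typical GitHub OIDC trust policy looks like the one this function builds; the account ID is a placeholder, and the repo:OWNER/REPO:* subject pattern (any branch, tag, or environment) is an assumption, since the real role may pin a branch such as repo:fullpass-4pass/4pass:ref:refs/heads/main or a GitHub environment.

```python
import json

OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id: str, repo: str) -> dict:
    """IAM trust policy letting GitHub Actions workflows in one repository assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": f"arn:aws:iam::{account_id}:oidc-provider/{OIDC_HOST}"},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # The token must be minted for AWS STS...
                    "StringEquals": {f"{OIDC_HOST}:aud": "sts.amazonaws.com"},
                    # ...by a workflow running in this repository.
                    "StringLike": {f"{OIDC_HOST}:sub": f"repo:{repo}:*"},
                },
            }
        ],
    }


print(json.dumps(github_oidc_trust_policy("123456789012", "fullpass-4pass/4pass"), indent=2))
```

The workflow job also needs permissions: id-token: write, otherwise GitHub does not issue the OIDC token that configure-aws-credentials exchanges with STS.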

Image Tagging

| Tag | Format | Purpose |
|---|---|---|
| SHA tag | sha-abc1234 | Immutable, traceable to commit |
| latest | Always updated | Convenience for manual runs |

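The tagging scheme fits in a small function. The seven-character short SHA matches the sha-abc1234 example; the ECR registry host (account and region) below is a placeholder, and inside a workflow the full SHA would come from GitHub's GITHUB_SHA variable.

```python
def image_refs(commit_sha: str, registry: str, repository: str) -> list:
    """Return the two image references pushed per build: a sha- tag and latest."""
    if len(commit_sha) < 7:
        raise ValueError("expected a full or abbreviated commit SHA")
    base = f"{registry}/{repository}"
    return [f"{base}:sha-{commit_sha[:7]}", f"{base}:latest"]
```

Pointing the task definition at the sha- reference rather than latest keeps each deployment pinned to one build, so rolling back amounts to redeploying an earlier sha- tag.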
ECR Configuration

| Repository | Purpose | Lifecycle |
|---|---|---|
| shioaji-api | API + frontend image | Keep last 10 images |
| shioaji-worker | Trading worker image | Keep last 10 images |

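"Keep last 10 images" corresponds to an ECR lifecycle rule with countType imageCountMoreThan. The generator below produces that policy text; whether these repositories define it in Terraform (aws_ecr_lifecycle_policy) or elsewhere is not stated here, so treat it as an illustration of the rule rather than the deployed file.

```python
import json


def keep_last_n_policy(n: int = 10) -> str:
    """ECR lifecycle policy text that expires all but the n most recently pushed images."""
    policy = {
        "rules": [
            {
                "rulePriority": 1,
                "description": f"Keep last {n} images",
                "selection": {
                    "tagStatus": "any",  # count tagged and untagged images together
                    "countType": "imageCountMoreThan",
                    "countNumber": n,
                },
                "action": {"type": "expire"},
            }
        ]
    }
    return json.dumps(policy)
```

ECR lifecycle rules do not know which images ECS is running, so an image still referenced by the live task definition would expire once ten newer images are pushed; the limit assumes deploys follow pushes reasonably closely.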
GitHub Pages custom domain (docs.4pass.io)

The documentation site is served at https://docs.4pass.io via GitHub Pages with a custom domain. Setup is split between AWS Route 53 (DNS) and GitHub (Pages config).

1. AWS Route 53 (this repo’s Terraform)

DNS for docs.4pass.io is managed in the same Route 53 hosted zone as 4pass.io. Terraform creates a CNAME record:

  • Name: docs.4pass.io (record name in zone: docs)
  • Target: fullpass-4pass.github.io (configurable via github_pages_cname_target)

After changing Terraform, apply and verify:

cd terraform && terraform plan -target=aws_route53_record.github_pages_docs
terraform apply -target=aws_route53_record.github_pages_docs
dig CNAME docs.4pass.io   # should show fullpass-4pass.github.io
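The dig check can be scripted for CI or a quick local sanity check. This sketch shells out to the same dig command and compares the answer with the expected target, ignoring case and the trailing dot dig prints; it assumes dig is installed on the machine running it.

```python
import subprocess


def cname_matches(dig_short_output: str, expected: str) -> bool:
    """True if `dig +short` output is exactly one answer equal to expected."""
    answers = [line.strip().rstrip(".").lower() for line in dig_short_output.splitlines() if line.strip()]
    return answers == [expected.rstrip(".").lower()]


def check_docs_cname(name: str = "docs.4pass.io", expected: str = "fullpass-4pass.github.io") -> bool:
    """Run the same query as the manual step: dig CNAME <name> +short."""
    result = subprocess.run(["dig", "CNAME", name, "+short"], capture_output=True, text=True, check=True)
    return cname_matches(result.stdout, expected)
```

An empty answer also counts as a failure: if docs has an A or ALIAS record instead of a CNAME, dig CNAME +short prints nothing, which is the misconfiguration covered in the troubleshooting steps below.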

2. GitHub repository settings

In the repo that publishes the docs (e.g. fullpass-4pass/4pass or the repo that has “Pages” enabled):

  1. Go to Settings → Pages.
  2. Under Custom domain, enter: docs.4pass.io.
  3. Click Save. GitHub will verify DNS (CNAME to fullpass-4pass.github.io).
  4. Once verified, enable Enforce HTTPS if desired.

The repo must have a CNAME file in the branch/folder used for Pages (e.g. docs/CNAME with content docs.4pass.io). This repo already has docs/CNAME with that value.

3. Troubleshooting: “Your connection is not private” / ERR_CERT_COMMON_NAME_INVALID

If the browser shows a certificate error for docs.4pass.io, traffic is likely hitting the 4pass.io ALB (whose cert does not include docs.4pass.io). Fix:

  1. Verify DNS — From your machine: dig docs.4pass.io CNAME +short
    Expected: fullpass-4pass.github.io.
    If you see an A record or a different target, DNS is wrong.

  2. Remove any A/ALIAS record for docs: In Route 53 → Hosted zone 4pass.io, if there is an A or ALIAS record for docs or docs.4pass.io pointing at the ALB, delete it. Only the CNAME docs → fullpass-4pass.github.io should exist.

  3. Re-apply Terraform (CNAME uses name = "docs" and allow_overwrite = true):

    cd terraform && terraform apply -target=aws_route53_record.github_pages_docs
    

  4. Wait for GitHub — In Settings → Pages, “DNS Check in Progress” must change to a green check. Only then will GitHub issue the certificate and “Enforce HTTPS” become available. This can take a few minutes after DNS is correct.
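Step 4 can be checked without the browser by reading the Pages site object, for example with gh api repos/fullpass-4pass/4pass/pages. The field names used below (cname, https_certificate.state, https_certificate.domains) follow GitHub's REST "Get a GitHub Pages site" response as we understand it, and treating state "approved" as ready is an assumption; verify both against the current API reference.

```python
def can_enforce_https(site: dict, domain: str = "docs.4pass.io") -> bool:
    """Decide from a GitHub Pages site object whether Enforce HTTPS can be switched on."""
    cert = site.get("https_certificate") or {}  # absent or null until GitHub starts issuing
    return (
        site.get("cname") == domain
        and cert.get("state") == "approved"
        and domain in cert.get("domains", [])
    )
```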

4. Summary

| Where | What |
|---|---|
| Route 53 | CNAME docs (→ docs.4pass.io) → fullpass-4pass.github.io (Terraform: aws_route53_record.github_pages_docs) |
| GitHub | Pages → Custom domain: docs.4pass.io; after the DNS check passes, enable Enforce HTTPS |
| Repo | docs/CNAME containing docs.4pass.io (so builds keep the custom domain) |

Deployment Checklist

Pre-Deployment

  1. All tests passing locally
  2. Database migrations applied (alembic upgrade head)
  3. Environment variables updated in Secrets Manager
  4. Worker image compatible with the new API (if the API has breaking changes)

Post-Deployment

  1. Verify /health endpoint returns 200
  2. Check CloudWatch for error spikes
  3. Monitor SQS DLQ for failed messages
  4. Verify worker pool is healthy (maintenance Lambda logs)
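The health and DLQ items can be folded into one pass/fail check. The function takes the Attributes dictionary returned by boto3's sqs.get_queue_attributes(QueueUrl=..., AttributeNames=["ApproximateNumberOfMessages"]), whose values arrive as strings; comparing against a pre-deploy baseline is an assumed way to judge "failed messages", not a documented threshold.

```python
def dlq_backlog(attributes: dict) -> int:
    """Parse ApproximateNumberOfMessages from an SQS GetQueueAttributes result."""
    return int(attributes.get("ApproximateNumberOfMessages", "0"))


def post_deploy_ok(health_status: int, dlq_attributes: dict, baseline: int = 0) -> bool:
    """Pass only if /health returned 200 and the DLQ has not grown past its pre-deploy size."""
    return health_status == 200 and dlq_backlog(dlq_attributes) <= baseline
```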