06 — Development Implementation Steps
Version 1.2 · 2026-04-28 · Adventive Platform Engineering · Confidential
Operational, command-level walkthrough for executing the signed plan (`Adventive_Public_API_Cloudflare_Migration_Plan.pdf`, v2, 2026-04-22). Each phase is treated as a discrete unit of work with prerequisites, commands, deliverables, validation, and a six-pillar design callout (Redundancy · Resiliency · Disaster Recovery · Backup · Deployment Strategy · Observability).

Changelog:
- v1.2 (2026-04-28): Sizing revised: stg drops to 1 instance / 1 AZ (parity with dev); only prd retains multi-AZ + 2 replicas. Added §1.1.1 with concrete, ready-to-commit Image Builder component YAML and an `aws-ia/ec2-image-builder` Terraform reference so the pipeline build is copy-paste, not from-scratch.
- v1.1 (2026-04-28): Phase 1 rewritten: hand-launched EC2 + manual `cloudflared` install replaced with EC2 Image Builder + Auto Scaling Group + replica HA. Added “Tunnel maintenance and updates” section covering zero-downtime rolling updates for `cloudflared` and the underlying Linux. Tunnel-per-environment confirmed as a locked decision with rationale.
- v1.0 (2026-04-28): Initial implementation walkthrough.
Document scope
This chapter does not change the plan. It is the implementation surface for Phases 0 through 8 against the existing repository:
- Local repo: `/Users/jlambert/Repositories/GitHub/Adventive/adventive-public-api-worker/`
- Intended GitHub slug: `Adventive/adventive-public-api-worker`
- Implementation driver: Claude Code (Cowork’s role ends with this chapter)
- Plan reference: `../public-api-cf-migration/Adventive_Public_API_Cloudflare_Migration_Plan.pdf`
- Workers SOP: `../../platform/cloudflare/workers/Adventive_Cloudflare_Workers_SOP.pdf`
If a step in this chapter conflicts with the signed plan, the plan wins.
Open an ADR under decisions/ and update both before proceeding.
How to use this chapter
- Phases run in order. Each phase has explicit prerequisites; do not skip.
- Phases 0–3 and 8 are executed by Jeffrey directly (infrastructure, provisioning, cutover). Phases 4–7 are executed in Claude Code from the handoff package (`handoff/CLAUDE_CODE_KICKOFF.md`).
- Mandatory pause checkpoints: end of Phase 5 and end of Phase 6. Claude Code stops and waits for `continue`.
- Every command shown should be considered a first pass; verify against current Cloudflare docs (URL inline) before executing in production.
Pre-flight prerequisites (one-time)
The following must be true before Phase 0 begins. Treat this as a gate.
| Check | Verification | Owner |
|---|---|---|
| Cloudflare account on Workers Paid plan | dashboard → Workers → Plans (Hyperdrive + Durable Objects require Paid) | Jeffrey |
| All three zones active in Cloudflare | adventive.dev, adventivestg.com, adventive.com | Jeffrey |
| `wrangler` CLI installed and authed | `wrangler whoami` returns the platform service account | Jeffrey |
| Cloudflare API token (CI scope) | scoped to: Workers Scripts:Edit, Hyperdrive:Edit, Tunnel:Edit; stored in 1Password under “CF — adventive-public-api-worker CI” | Jeffrey |
| AWS access to Aurora VPCs (dev / stg / prd) | sufficient to launch EC2 instances in each via ASG | Jeffrey |
| Read-only Aurora user provisioned | per cluster (console, aggregate) — Hyperdrive will use these | Jeffrey |
| AWS EC2 Image Builder available in target region | dashboard → Image Builder; service-linked role created | Jeffrey |
| AWS Secrets Manager access for tunnel credentials | one secret per env: /adventive/cloudflared/dev, …/stg, …/prd | Jeffrey |
| AWS SSM Parameter Store available | for AMI version pointer at /adventive/cloudflared/ami-id-latest | Jeffrey |
| Terraform backend configured | S3 + DynamoDB lock table, scoped per env; module infra/cloudflared/ will live in this repo | Jeffrey |
| New Relic Workers + EC2 integration ready | account ID + license key; CloudWatch → NR log subscription operational | Jeffrey |
| Local repo present | ~/Repositories/GitHub/Adventive/adventive-public-api-worker/ with git init, wrangler.toml stub, package.json | Confirmed 2026-04-28 |
| `wrangler hyperdrive --help` works | minimum Wrangler 4.x | Jeffrey |
| Codeowners + branch protection ready (post-push) | Adventive/adventive-public-api-worker repo creation deferred to Phase 7 | Jeffrey |
Cloudflare references: Workers Plans · Wrangler install · API tokens
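The Wrangler rows of the gate can be checked mechanically. The following is a minimal sketch, assuming `wrangler --version` prints a semantic version somewhere in its output (the exact banner differs between releases). The helper names `wrangler_major` and `wrangler_version_ok` are illustrative and not part of Wrangler.

```shell
#!/usr/bin/env bash
# Sketch of a pre-flight check for the "minimum Wrangler 4.x" row.
# wrangler_major / wrangler_version_ok are hypothetical helper names.

# Print the major version found in text such as "wrangler 4.14.1" or "4.14.1".
wrangler_major() {
  printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1 | cut -d. -f1
}

# Succeed when the major version in the given text is at least 4.
wrangler_version_ok() {
  local major
  major=$(wrangler_major "$1")
  [ -n "${major}" ] && [ "${major}" -ge 4 ]
}
```

A possible gate line: `wrangler_version_ok "$(wrangler --version)" && wrangler hyperdrive --help >/dev/null`.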
Phase 0 — Pre-migration cleanup (existing PHP repo)
Touches only the existing CodeIgniter PHP application; no Cloudflare work. Done first to stop the codebase from drifting while the rewrite is in flight.
- In the legacy PHP repo, remove the `data_warehouse` block from `application/config/database.json` for all three environments (lines 99–121 dev, 260–282 prod, 377–399 stg). No model loads `$CI->load->database('data_warehouse')` (verified during plan authoring).
- Rotate every MySQL credential currently committed in plaintext in `database.json`. Move to environment variables or AWS Secrets Manager. Force-rotate even the dev set; assume the file’s history is compromised.
- Confirm Cloudflare DNS for the three zones resolves and the existing API hosts have low-TTL records (≤ 300 s) staged for cutover.
- Tag the legacy repo: `pre-cloudflare-migration-2026-04` so we have a known rollback point.
Validation
- `git grep data_warehouse` returns nothing in the PHP repo.
- All MySQL connection strings in CI/CD secret stores have been rotated; old values revoked at the Aurora cluster.
- `dig +noall +answer api.adventive.com` and the staging/dev equivalents return the current PHP origin with TTL ≤ 300 (`dig +short` omits the TTL column, so it cannot confirm the TTL on its own).
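The TTL check can be scripted so it is repeatable across the three zones. This is a sketch only: it assumes `dig` is installed and that answer lines follow the usual `name TTL class type data` layout; `max_ttl` and `ttl_ready` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: confirm every answer record for a host has TTL <= 300 s.
# max_ttl and ttl_ready are hypothetical helper names.

# Read `dig +noall +answer` output on stdin and print the largest TTL seen.
# Answer lines look like: "api.adventive.com. 300 IN A 203.0.113.10".
max_ttl() {
  awk 'NF >= 5 && $2 ~ /^[0-9]+$/ { if ($2 + 0 > max) max = $2 + 0 } END { print max + 0 }'
}

# Succeed when the host resolves and its largest TTL is within the limit.
ttl_ready() {
  local host=$1 limit=${2:-300} answer
  answer=$(dig +noall +answer "${host}")
  [ -n "${answer}" ] && [ "$(printf '%s\n' "${answer}" | max_ttl)" -le "${limit}" ]
}
```

Run `ttl_ready api.adventive.com || echo "TTL above 300 s"` and repeat for the staging and dev hosts.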
Six-pillar callout — Phase 0
| Pillar | Treatment |
|---|---|
| Redundancy | No new redundancy. Existing PHP origin remains the sole serving path. |
| Resiliency | Reduced exposure: removing dead data_warehouse config eliminates a config-load risk path. |
| Disaster Recovery | Git tag pre-cloudflare-migration-2026-04 is the rollback anchor for the entire migration. |
| Backup | Aurora automated backups already configured (verify retention ≥ 7 d on each cluster before proceeding). |
| Deployment Strategy | Routine PHP deploy via existing pipeline; no Cloudflare deploy in this phase. |
| Observability | Confirm legacy New Relic APM tags remain in place — they become the baseline that the new Worker is compared against during Phase 8. |
Phase 1 — Cloudflare Tunnel infrastructure
Bridge each Aurora VPC to Cloudflare’s edge so Hyperdrive can reach the private DB endpoints without a public listener. One dedicated tunnel per environment (3 tunnels total). Each tunnel is served by an Auto Scaling Group of `cloudflared` replicas built from an EC2 Image Builder AMI.
This phase is more elaborate than the original plan called for. The
upgrade exists because the tunnel is a foundational, long-lived dependency:
hand-rolled EC2 instances are unacceptable for something every Worker DB
call traverses. The whole pattern is captured in the standing memory
Cloudflare Tunnel infra pattern (Adventive standard) and will graduate to
docs/platform/cloudflare/tunnel/ once this rollout completes.
Cloudflare reference: Cloudflare Tunnel · cloudflared install · TCP routing · Run as replica
AWS reference: EC2 Image Builder · Auto Scaling Group instance refresh · SSM Parameter Store
1.0 — Architectural decisions (locked)
| Decision | Choice | Rationale |
|---|---|---|
| Tunnel separation | Three tunnels — one per env (dev, stg, prd) | Credential blast-radius isolation; rotate prod creds without touching dev/stg; per-env audit log; each tunnel’s config.yml only carries its own ingress; cost is identical (tunnels are free). |
| Compute platform | EC2 + ASG (sizing per-env, see Sizing row), not Fargate | Matches existing Adventive AWS footprint; single networking model with Aurora VPCs; SSM/CloudWatch agents already standardized. Re-evaluate if Adventive adopts ECS broadly. |
| AMI build | EC2 Image Builder, weekly cadence | Single mechanism updates BOTH cloudflared and the Linux base; reproducible, signed AMI; eliminates drift across env-specific instances. |
| HA model | Replica registration (multiple cloudflared instances ↔ same tunnel UUID) | Native Cloudflare feature; edge load-balances and fails over automatically; foundation for zero-downtime updates. |
| Sizing | dev & stg: 1 × t3.micro / 1 AZ; prd: 2 × t3.small / 2 AZs | dev and stg are non-customer-facing and tolerate brief downtime during update; cost-optimize there with t3.micro and a single AZ. Only prd carries customer traffic — it requires ≥ 2 replicas across 2 AZs for AZ-failure resilience and rolling-update headroom. |
| Credential storage | AWS Secrets Manager (primary) + Cloudflare Secrets Store (secondary) + 1Password (break-glass) | Three-tier; user-data pulls from Secrets Manager at boot; AMI itself is environment-agnostic. |
Why not one tunnel for dev+stg? Cost savings are zero (Cloudflare Tunnel is a free product). The savings argument is operational simplicity, but sharing a tunnel couples credential rotation, ingress edits, and audit logs across environments — that’s a real cost paid every time someone touches the dev config. Strict per-env separation is the standing default.
1.1 — Build the cloudflared AMI in EC2 Image Builder
Single Image Builder pipeline produces a versioned AMI consumed by all three environments.

```
EC2 Image Builder pipeline: adv-cflared-pipeline
├── Source AMI: Ubuntu 22.04 LTS (Canonical, latest)
├── Components (build phase):
│   ├── adv-cflared-base              (apt update; unattended-upgrades; clock sync)
│   ├── adv-cflared-cloudflared       (install cloudflared from pkg.cloudflare.com apt repo)
│   ├── adv-cflared-systemd-unit      (drop /etc/systemd/system/cloudflared.service.d/override.conf)
│   ├── adv-cflared-bootstrap-script  (drop /usr/local/sbin/cflared-bootstrap.sh)
│   ├── adv-cflared-cloudwatch-agent  (install + base config)
│   └── adv-cflared-ssm-agent         (verified present; SSM-managed)
├── Test phase:
│   ├── cloudflared --version returns ≥ 2025.x
│   ├── systemctl list-unit-files includes cloudflared.service
│   └── ssm-cli get-instance-information returns ManagedInstance
├── Distribution: shared AMI in target region (us-east-1 first; us-west-2 for DR)
├── Output:
│   └── SSM Parameter Store: /adventive/cloudflared/ami-id-latest
│       (Lambda hook updates this on successful pipeline run)
└── Schedule: cron(0 6 ? * TUE *)   # weekly Tuesday 06:00 UTC
```

The AMI does not contain tunnel credentials. Bootstrap happens at instance launch. The AMI itself is environment-agnostic and reusable across dev/stg/prd.
1.1.1 — Prebuilt components and Terraform reference (do not write from scratch)
The pipeline is not hand-authored. AWS publishes a maintained Terraform module that builds the pipeline, and most of the components above already exist as AWS-managed components. Only the `adv-cflared-cloudflared` and `adv-cflared-bootstrap-script` components are Adventive-specific; both have full source below.
Terraform module to use:
```hcl
module "cflared_pipeline" {
  source  = "aws-ia/ec2-image-builder/aws"
  version = "~> 1.5"

  # First-pass input names and version pin: confirm against the module's
  # published variables and releases before applying.
  pipeline_name       = "adv-cflared-pipeline"
  schedule_expression = "cron(0 6 ? * TUE *)"
  recipe_name         = "adv-cflared-recipe"
  recipe_version      = "1.0.0"
  parent_image        = "arn:aws:imagebuilder:us-east-1:aws:image/ubuntu-server-22-lts-x86/x.x.x"

  # AWS-managed components: reuse as-is, no editing needed
  managed_components = [
    "arn:aws:imagebuilder:us-east-1:aws:component/update-linux/x.x.x",
    "arn:aws:imagebuilder:us-east-1:aws:component/aws-cli-version-2-linux/x.x.x",
    "arn:aws:imagebuilder:us-east-1:aws:component/amazon-cloudwatch-agent-linux/x.x.x",
    "arn:aws:imagebuilder:us-east-1:aws:component/amazon-ssm-agent-linux/x.x.x",
    "arn:aws:imagebuilder:us-east-1:aws:component/cis-level-1-ubuntu-22-04-lts/x.x.x",
  ]

  # Adventive-authored components: both sources below
  custom_components = [
    aws_imagebuilder_component.cflared.arn,
    aws_imagebuilder_component.bootstrap.arn,
  ]

  output_ami_ssm_parameter   = "/adventive/cloudflared/ami-id-latest"
  ami_lifecycle_retain_count = 4
}
```

The module wires the EventBridge rule on pipeline success, publishes the AMI ID to SSM, and applies the AMI lifecycle policy. Module repo: https://github.com/aws-ia/terraform-aws-ec2-image-builder.
Custom component 1 — `adv-cflared-cloudflared.yml`:

```yaml
name: adv-cflared-cloudflared
description: Install cloudflared from the official Cloudflare APT repo.
schemaVersion: 1.0
phases:
  - name: build
    steps:
      - name: AddCloudflareAptRepo
        action: ExecuteBash
        inputs:
          commands:
            - sudo mkdir -p --mode=0755 /usr/share/keyrings
            - curl -fsSL https://pkg.cloudflare.com/cloudflare-main.gpg | sudo tee /usr/share/keyrings/cloudflare-main.gpg > /dev/null
            - echo 'deb [signed-by=/usr/share/keyrings/cloudflare-main.gpg] https://pkg.cloudflare.com/cloudflared jammy main' | sudo tee /etc/apt/sources.list.d/cloudflared.list
            - sudo apt-get update
      - name: InstallCloudflared
        action: ExecuteBash
        inputs:
          commands:
            - sudo apt-get install -y cloudflared
      - name: DropSystemdOverride
        action: CreateFile
        inputs:
          - path: /etc/systemd/system/cloudflared.service.d/override.conf
            permissions: 0644
            content: |
              [Service]
              ExecStart=
              ExecStart=/usr/bin/cloudflared tunnel --config /etc/cloudflared/config.yml run
              Restart=always
              RestartSec=5s
  - name: validate
    steps:
      - name: VersionCheck
        action: ExecuteBash
        inputs:
          commands:
            - cloudflared --version | grep -E '^cloudflared version 20'
      - name: UnitPresent
        action: ExecuteBash
        inputs:
          commands:
            - systemctl list-unit-files | grep -q '^cloudflared.service'
```

Custom component 2 — `adv-cflared-bootstrap-script.yml`:

```yaml
name: adv-cflared-bootstrap-script
description: Drop the env-resolving cflared-bootstrap.sh into /usr/local/sbin.
schemaVersion: 1.0
phases:
  - name: build
    steps:
      - name: DropBootstrap
        action: S3Download
        inputs:
          - source: s3://adventive-platform-artifacts/cflared/cflared-bootstrap.sh
            destination: /usr/local/sbin/cflared-bootstrap.sh
      - name: SetExec
        action: ExecuteBash
        inputs:
          commands:
            - sudo chmod 0755 /usr/local/sbin/cflared-bootstrap.sh
      - name: WireOnBoot
        action: CreateFile
        inputs:
          - path: /etc/systemd/system/cflared-bootstrap.service
            permissions: 0644
            content: |
              [Unit]
              Description=Adventive cloudflared bootstrap (resolve env, fetch creds)
              Before=cloudflared.service

              [Service]
              Type=oneshot
              ExecStart=/usr/local/sbin/cflared-bootstrap.sh

              [Install]
              WantedBy=cloudflared.service
      - name: EnableUnit
        action: ExecuteBash
        inputs:
          commands:
            - sudo systemctl enable cflared-bootstrap.service
```

What this means in practice: Phase 1 build is roughly one `terraform apply` once the bootstrap script is uploaded to S3 and the two YAML components are committed. The pipeline self-runs on its weekly schedule from that point forward.
1.2 — Provision tunnels in Cloudflare
Done once per env via Cloudflare API or `cloudflared tunnel create` from an admin workstation. Capture the tunnel UUID and credentials JSON.

```bash
# Run from a credentialed admin shell (not on the future ASG instances)
cloudflared tunnel login
cloudflared tunnel create adv-aurora-tunnel-dev
cloudflared tunnel create adv-aurora-tunnel-stg
cloudflared tunnel create adv-aurora-tunnel-prd
```

Each command emits a credentials JSON. Store each in all three tiers:

```bash
# AWS Secrets Manager (primary, used by user-data)
aws secretsmanager create-secret \
  --name /adventive/cloudflared/dev \
  --secret-string file://dev-creds.json

# Cloudflare Secrets Store (secondary, recovery)
wrangler secrets-store secret create \
  --store-id adv-cflared-secrets \
  --name tunnel-creds-dev \
  --value "$(cat dev-creds.json)"

# 1Password (tertiary, break-glass; manual)
```

Securely shred the local credentials JSON files. Repeat for stg and prd.
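The bootstrap script in §1.4 reads two keys, `config_yaml` and `credentials_json`, from a single secret per environment, whereas the command above stores only the raw credentials file. One way to reconcile the two, assuming the combined payload is the intended shape, is to build a JSON document carrying both keys before storing it. This sketch requires `jq` 1.6 or later (for `--rawfile`); the helper name and the local file names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: build the combined secret payload that cflared-bootstrap.sh reads.
# build_tunnel_secret is a hypothetical helper; the key names config_yaml and
# credentials_json are taken from the bootstrap script in section 1.4.

# Usage: build_tunnel_secret <config.yml> <creds.json>   (JSON on stdout)
build_tunnel_secret() {
  local config_file=$1 creds_file=$2
  # Both files are embedded as strings so the script can write them back verbatim.
  jq -n \
    --rawfile config "${config_file}" \
    --rawfile creds "${creds_file}" \
    '{config_yaml: $config, credentials_json: $creds}'
}
```

The result could then be stored with `aws secretsmanager put-secret-value --secret-id /adventive/cloudflared/dev --secret-string "$(build_tunnel_secret dev-config.yml dev-creds.json)"`.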
1.3 — Author the per-env tunnel config.yml
Stored in AWS Secrets Manager alongside the credentials so the bootstrap script can fetch both atomically.

```yaml
# adventive/cloudflared/dev — config.yml
tunnel: <dev-tunnel-uuid>
credentials-file: /etc/cloudflared/<dev-tunnel-uuid>.json
metrics: 0.0.0.0:2000   # exposed for ASG health check
ingress:
  - hostname: aurora-console-dev.internal.adventive.com
    service: tcp://console-cluster.dev.cluster.us-east-1.rds.amazonaws.com:3306
  - hostname: aurora-aggregate-dev.internal.adventive.com
    service: tcp://aggregate-cluster.dev.cluster.us-east-1.rds.amazonaws.com:3306
  - service: http_status:404
```

1.4 — Bootstrap script (baked into the AMI)
`/usr/local/sbin/cflared-bootstrap.sh` runs at boot via the `cflared-bootstrap.service` oneshot unit defined in §1.1.1, which is ordered before `cloudflared.service`. It queries instance metadata for the env tag, fetches the matching secret, and writes the config and credentials files; systemd then starts `cloudflared`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# IMDSv2: the launch template sets http_tokens = "required", so get a session token first.
IMDS="http://169.254.169.254/latest"
TOKEN=$(curl -fsS -X PUT "${IMDS}/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
imds() { curl -fsS -H "X-aws-ec2-metadata-token: ${TOKEN}" "${IMDS}/meta-data/$1"; }

ENV=$(imds tags/instance/adv:env)
REGION=$(imds placement/region)
SECRET="/adventive/cloudflared/${ENV}"

# Fetch the secret once; it carries both the config YAML and the credentials JSON.
PAYLOAD=$(aws secretsmanager get-secret-value \
  --region "${REGION}" \
  --secret-id "${SECRET}" \
  --query SecretString --output text)

install -d -m 0755 /etc/cloudflared
jq -r '.config_yaml' <<<"${PAYLOAD}" > /etc/cloudflared/config.yml

# config.yml is YAML, so read the tunnel UUID from its "tunnel:" key instead of using jq.
TUNNEL_ID=$(awk '$1 == "tunnel:" { print $2; exit }' /etc/cloudflared/config.yml)
jq -r '.credentials_json' <<<"${PAYLOAD}" > "/etc/cloudflared/${TUNNEL_ID}.json"

chmod 0600 /etc/cloudflared/*.json
```

1.5 — Launch Template + ASG (Terraform module)
A single Terraform module `infra/cloudflared/` parameterized by env is applied three times (dev, stg, prd workspaces). Key bits:
```hcl
locals {
  # Sizing per env: dev & stg are non-customer-facing (single AZ, t3.micro);
  # only prd carries customer traffic and gets multi-AZ + 2 replicas.
  is_prod     = var.env == "prd"
  instance_sz = local.is_prod ? "t3.small" : "t3.micro"
  asg_min     = local.is_prod ? 2 : 1
  asg_desired = local.is_prod ? 2 : 1
  asg_max     = local.is_prod ? 4 : 2
  asg_subnets = local.is_prod ? [var.subnet_az_a, var.subnet_az_b] : [var.subnet_az_a]
}

resource "aws_launch_template" "cflared" {
  name_prefix = "adv-cflared-${var.env}-"
  # AMI resolved from SSM at apply time
  image_id      = data.aws_ssm_parameter.cflared_ami.value
  instance_type = local.instance_sz

  iam_instance_profile {
    name = aws_iam_instance_profile.cflared.name
  }
  vpc_security_group_ids = [aws_security_group.cflared.id]

  metadata_options {
    http_tokens            = "required"
    instance_metadata_tags = "enabled"
  }

  tag_specifications {
    resource_type = "instance"
    tags = {
      "adv:env" = var.env
      Name      = "adv-cflared-${var.env}"
      OwnedBy   = "platform"
    }
  }
}

resource "aws_autoscaling_group" "cflared" {
  name                      = "adv-cflared-${var.env}"
  min_size                  = local.asg_min
  max_size                  = local.asg_max
  desired_capacity          = local.asg_desired
  vpc_zone_identifier       = local.asg_subnets
  health_check_type         = "EC2"
  health_check_grace_period = 180

  launch_template {
    id      = aws_launch_template.cflared.id
    version = "$Latest"
  }

  instance_refresh {
    strategy = "Rolling"
    preferences {
      # In dev/stg the ASG runs a single instance, so refresh launches a new
      # one before terminating the old (count goes 1 → 2 → 1). Replica HA
      # holds during the swap. In prd, with 2 replicas, count goes 2 → 3 → 2.
      # Verify against the AWS instance-refresh docs before relying on this:
      # launch-before-terminate normally needs min_healthy_percentage = 100
      # with max_healthy_percentage > 100, and auto_rollback may not be
      # accepted while the template version is "$Latest".
      min_healthy_percentage = 50
      instance_warmup        = 120
      auto_rollback          = true
    }
    triggers = ["launch_template"]
  }
}
```

The security group allows outbound 443/TCP to 0.0.0.0/0 (Cloudflare edge), outbound 3306/TCP to the two Aurora cluster SGs only, and no inbound rules. Cloudflare's tunnel documentation also lists outbound port 7844 (TCP and UDP) for `cloudflared` edge connections, so confirm whether that rule is needed before applying.
1.6 — Wire AMI auto-promotion
EventBridge rule fires on Image Builder pipeline success → Lambda updates `/adventive/cloudflared/ami-id-latest` → updates each environment’s launch-template version → triggers ASG instance refresh per env on a staggered schedule (dev Tue 07:00, stg Tue 09:00, prd Tue 11:00 UTC). Each refresh is gated by the prior env’s success (Step Functions state machine `adv-cflared-rolling-update`).
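The gating logic is easiest to reason about as a loop that stops at the first failed environment. The sketch below expresses that in shell for manual or emergency use; it is not the Step Functions definition, it ignores the staggered clock times, and `refresh_env`, `rollout`, and `POLL_SECONDS` are hypothetical names. The ASG naming follows `adv-cflared-<env>` from §1.5.

```shell
#!/usr/bin/env bash
# Sketch of the dev -> stg -> prd gating performed by the state machine.
# refresh_env, rollout and POLL_SECONDS are hypothetical names.

# Start an instance refresh for one env and poll until it reaches a final state.
refresh_env() {
  local asg="adv-cflared-$1" id status
  id=$(aws autoscaling start-instance-refresh \
    --auto-scaling-group-name "${asg}" \
    --query InstanceRefreshId --output text) || return 1
  while true; do
    status=$(aws autoscaling describe-instance-refreshes \
      --auto-scaling-group-name "${asg}" \
      --instance-refresh-ids "${id}" \
      --query 'InstanceRefreshes[0].Status' --output text) || return 1
    case "${status}" in
      Successful) return 0 ;;
      Failed|Cancelled|RollbackSuccessful|RollbackFailed) return 1 ;;
    esac
    sleep "${POLL_SECONDS:-30}"
  done
}

# Refresh each env in order; stop at the first failure so later envs are untouched.
rollout() {
  local env
  for env in "$@"; do
    echo "refreshing ${env}"
    refresh_env "${env}" || { echo "halted at ${env}"; return 1; }
  done
  echo "rollout complete"
}
```

Calling `rollout dev stg prd` mirrors the documented order; a failure in dev leaves stg and prd on their current AMI.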
Validation
- `aws imagebuilder list-image-pipelines` shows `adv-cflared-pipeline`.
- `/adventive/cloudflared/ami-id-latest` resolves to a current AMI.
- `aws autoscaling describe-auto-scaling-groups` returns three groups; each reports `HealthyInstances == DesiredCapacity` for ≥ 5 minutes.
- Cloudflare dashboard → Networks → Tunnels shows each tunnel status `Healthy` with the expected replica count (1 dev, 1 stg, 2 prd).
- From a peer EC2: `mysql -h aurora-console-prd.internal.adventive.com -u ro_user -p` connects via the tunnel.
- `wrangler hyperdrive` test connection (Phase 2 prerequisite) succeeds.
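The ASG convergence check above can be expressed as a single query per group. This is a sketch under the assumption that "healthy" means `HealthStatus` is `Healthy` and `LifecycleState` is `InService`; `asg_converged` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: compare in-service healthy instances with desired capacity for one ASG.
# asg_converged is a hypothetical helper name.

asg_converged() {
  local counts desired healthy
  # With --output text the two numbers are printed on one tab-separated line.
  counts=$(aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$1" \
    --query "AutoScalingGroups[0].[DesiredCapacity, length(Instances[?HealthStatus=='Healthy' && LifecycleState=='InService'])]" \
    --output text) || return 1
  desired=$(printf '%s\n' "${counts}" | awk '{ print $1 }')
  healthy=$(printf '%s\n' "${counts}" | awk '{ print $2 }')
  # An unknown group yields no second field, which is treated as not converged.
  [ -n "${healthy}" ] && [ "${desired}" = "${healthy}" ]
}
```

To cover the "≥ 5 minutes" condition, call `asg_converged adv-cflared-prd` repeatedly over that window (and likewise for dev and stg) and require every sample to pass.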
Deliverables
- Image Builder pipeline + 6 components committed under `infra/imagebuilder/`.
- Terraform module `infra/cloudflared/` with three workspaces applied.
- Three Cloudflare tunnels with their UUIDs and credentials secured in three credential tiers.
- ADR `decisions/2026-MM-DD-tunnel-infra-pattern.md` capturing the locked decisions in §1.0 and any deviations specific to Adventive’s environment.
- EventBridge → Lambda → Step Functions wiring for AMI auto-promotion.
Six-pillar callout — Phase 1
| Pillar | Treatment |
|---|---|
| Redundancy | prd ASG runs ≥ 2 replicas across 2 AZs; Cloudflare edge load-balances replica registrations on the same tunnel UUID; ASG self-replaces failed instances. dev/stg run a single replica in a single AZ — brief gap during instance replacement is acceptable for non-customer-facing traffic. |
| Resiliency | Outbound-only tunnel (no inbound SG rules on Aurora); cloudflared auto-reconnects to edge; ASG auto_rollback reverts a bad AMI promotion; instance refresh MinHealthyPercentage=50 guarantees ≥ 1 healthy replica throughout. |
| Disaster Recovery | RTO ≤ 5 min for instance loss (ASG auto-launch). RTO ≤ 30 min for AMI rollback (revert SSM param + manual instance refresh). Multi-region playbook: build AMI in us-west-2 monthly; instructions in the “Tunnel maintenance and updates” section. RPO = 0 (cloudflared is stateless). |
| Backup | Three-tier credential storage (Secrets Manager / CF Secrets Store / 1Password). Image Builder retains four most recent AMI revisions. Terraform state in S3 + DynamoDB lock. ASG and launch-template versions retained indefinitely. |
| Deployment Strategy | Weekly Image Builder run → SSM param update → Step Functions orchestrates env-staggered ASG instance refresh (dev → stg → prd, gated by prior success). Manual emergency promotion via terraform apply + ASG instance refresh trigger. |
| Observability | cloudflared metrics endpoint :2000 scraped by CloudWatch agent; logs forwarded CloudWatch → New Relic. Custom CloudWatch metrics: cflared_replica_count_per_tunnel, cflared_systemd_restart_count, imagebuilder_ami_age_days. Alarms: replica count below expected for 2 min (page); restart frequency > 3/h per instance (page); AMI age > 21 days (warn — pipeline failure). |
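The `imagebuilder_ami_age_days` metric reduces to date arithmetic on the AMI's creation timestamp. A minimal sketch, assuming GNU `date` (as shipped on Ubuntu) and an ISO 8601 creation date such as the one EC2 reports; `ami_age_days` and `ami_fresh` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: derive imagebuilder_ami_age_days from an AMI creation timestamp.
# ami_age_days and ami_fresh are hypothetical helpers; GNU date (-d) is assumed.

# Usage: ami_age_days <creation-date ISO 8601> [now-epoch-seconds]
ami_age_days() {
  local created_epoch now_epoch
  created_epoch=$(date -u -d "$1" +%s) || return 1
  now_epoch=${2:-$(date -u +%s)}
  echo $(( (now_epoch - created_epoch) / 86400 ))
}

# Succeed while the AMI is within the 21-day warning threshold.
ami_fresh() {
  [ "$(ami_age_days "$1" "${2:-}")" -le 21 ]
}
```

The creation date can be read with `aws ec2 describe-images --image-ids <ami-id> --query 'Images[0].CreationDate' --output text`, using the AMI ID held in `/adventive/cloudflared/ami-id-latest`.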
Tunnel maintenance and updates (cross-cutting, applies post-Phase 1)
This section describes how routine and unplanned changes to the tunnel infrastructure happen without taking the public API offline.
Routine: weekly Linux + cloudflared update (zero-downtime)
This is the default mechanism. No human in the loop.
- Tuesday 06:00 UTC: Image Builder pipeline runs; pulls current Ubuntu LTS, latest cloudflared from `pkg.cloudflare.com`, runs the build + test phases.
- Pipeline success: EventBridge fires; Lambda writes the new AMI ID to `/adventive/cloudflared/ami-id-latest`.
- Step Functions: `adv-cflared-rolling-update` orchestrates:
  - dev refresh at Tue 07:00 UTC. ASG instance refresh triggers because the launch template’s resolved AMI changed. With desired=1 and `MinHealthyPercentage=50`, ASG actually launches a new instance to bring count to 2, waits 120 s for warm-up, then terminates the old instance: momentarily 2 replicas, never 0.
  - stg refresh at Tue 09:00 UTC, gated by dev success. Same single-replica mechanic as dev: count goes 1 → 2 → 1.
  - prd refresh at Tue 11:00 UTC, gated by stg success. With desired=2, count goes 2 → 3 → 2 (one new replica added before the first old replica terminates), repeated for the second old replica.
- Each refresh: observability dashboards must show:
  - tunnel replica count never < 1
  - cloudflared restart frequency normal
  - Worker error rate flat

  Failure of any check pauses the Step Functions state machine and pages on-call.

Net effect: cloudflared and the Linux OS update weekly, with no operator action and no observable customer impact.
Unplanned: emergency cloudflared CVE
A vendor-disclosed CVE that pre-dates the next scheduled pipeline run:
- Trigger Image Builder pipeline manually: `aws imagebuilder start-image-pipeline-execution --image-pipeline-arn …`
- On success, manually advance the SSM param (or wait for the EventBridge handler).
- Manually invoke an `adv-cflared-rolling-update` Step Functions execution with input `{"acceleration": "fast"}`, which collapses inter-env wait to 15 minutes.
- Total RTO ~ 90 minutes from CVE disclosure to all envs patched.
Unplanned: cloudflared running but tunnel reports unhealthy
Health check signal mismatch: daemon up, but tunnel registration broken.
- CloudWatch alarm `cflared_replica_count_per_tunnel < expected` fires.
- ASG marks the affected instance unhealthy via custom health check (Lambda evaluates the metric and calls `SetInstanceHealth` with `Unhealthy`).
- ASG terminates and replaces the instance; a fresh boot rebuilds tunnel registration from scratch.
- If the condition repeats across replacements, page on-call to investigate tunnel-side issues at Cloudflare (likely creds rotated upstream or tunnel deleted).
Unplanned: AMI rollback
A new AMI passes Image Builder tests but fails post-deploy in dev (e.g., cloudflared regression Cloudflare didn’t catch).
- ASG `auto_rollback = true` reverts dev automatically when health checks fail. Step Functions halts the schedule before stg/prd refresh fires.
- On-call manually overwrites SSM param to the previous AMI ID: `aws ssm put-parameter --name /adventive/cloudflared/ami-id-latest --value <previous> --overwrite`.
- Manually trigger ASG instance refresh per env to converge on the pinned AMI.
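Finding the `<previous>` AMI ID by hand is error-prone during an incident. The sketch below reads it from the parameter's own history, assuming the pointer has been written at least twice and that the prior value is the one to restore; `second_latest` and `previous_ami` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: find the AMI ID that the SSM pointer held before its latest write.
# second_latest and previous_ami are hypothetical helper names.

# Print the second-to-last whitespace-separated value from stdin (oldest first).
# Fails when fewer than two values are present.
second_latest() {
  tr -s '[:space:]' '\n' |
    awk 'NF { prev = last; last = $0 } END { if (prev != "") print prev; else exit 1 }'
}

# List the parameter's historical values in version order and pick the prior one.
previous_ami() {
  aws ssm get-parameter-history \
    --name /adventive/cloudflared/ami-id-latest \
    --query 'sort_by(Parameters, &Version)[].Value' \
    --output text | second_latest
}
```

The output slots into the documented command: `aws ssm put-parameter --name /adventive/cloudflared/ami-id-latest --value "$(previous_ami)" --overwrite`.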
Routine: rotating tunnel credentials
Quarterly or post-incident:
- `cloudflared tunnel token --cred-file new-creds.json adv-aurora-tunnel-{env}` from admin shell.
- Update Secrets Manager secret in place (`aws secretsmanager update-secret --secret-id …`).
- Trigger ASG instance refresh for that env only (`aws autoscaling start-instance-refresh --auto-scaling-group-name adv-cflared-{env}`).
- New replicas pick up new creds at boot; old replicas drain. The tunnel UUID does not change, so Hyperdrive configurations remain valid.
Routine: changing tunnel ingress
Adding a new internal hostname (e.g., a future analytics replica):
- Edit the env’s Secrets Manager secret to append the new ingress block.
- Trigger ASG instance refresh for that env.
- New replicas come up with the updated config; cloudflared replicas process new ingress on next start.
Capacity scaling
If sustained CPU > 60 % on cloudflared instances (an indicator of high TCP throughput), bump `desired_capacity` and `max_size` in Terraform. The ASG launches additional replicas and the tunnel automatically uses them.
Phase 2 — Hyperdrive provisioning (6 configurations)
One Hyperdrive resource per (cluster, environment). Hyperdrive holds the DB credentials; the Worker never sees them.
Cloudflare reference: Hyperdrive overview · Create with Wrangler · Hyperdrive + Tunnel
```bash
# Console cluster: three environments
wrangler hyperdrive create adv-svc-public-api-console-dev \
  --connection-string="mysql://ro_user:DEV_PASS@aurora-console-dev.internal.adventive.com:3306/console"
wrangler hyperdrive create adv-svc-public-api-console-stg \
  --connection-string="mysql://ro_user:STG_PASS@aurora-console-stg.internal.adventive.com:3306/console"
wrangler hyperdrive create adv-svc-public-api-console-prd \
  --connection-string="mysql://ro_user:PRD_PASS@aurora-console-prd.internal.adventive.com:3306/console"

# Aggregate cluster: three environments
wrangler hyperdrive create adv-svc-public-api-aggregate-dev \
  --connection-string="mysql://ro_user:DEV_PASS@aurora-aggregate-dev.internal.adventive.com:3306/aggregate_ro"
wrangler hyperdrive create adv-svc-public-api-aggregate-stg \
  --connection-string="mysql://ro_user:STG_PASS@aurora-aggregate-stg.internal.adventive.com:3306/aggregate_ro"
wrangler hyperdrive create adv-svc-public-api-aggregate-prd \
  --connection-string="mysql://ro_user:PRD_PASS@aurora-aggregate-prd.internal.adventive.com:3306/aggregate_ro"
```

Each command returns a Hyperdrive ID. Capture all six and update `wrangler.toml` in the repo, replacing the six `REPLACE_WITH_*_ID` placeholders. Commit with message: `chore(infra): wire Hyperdrive IDs for console + aggregate (dev/stg/prd)`.
Validation
- `wrangler hyperdrive list` returns six configurations.
- A throwaway Worker with `DB_CONSOLE` and `DB_AGGREGATE` bindings can `SELECT 1` against each, which proves the path end-to-end (Worker → Hyperdrive → Tunnel → Aurora).
- `wrangler.toml` no longer contains `REPLACE_WITH_*_ID` placeholders; `wrangler deploy --dry-run --env dev` passes.
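The placeholder check lends itself to a small guard that can run locally or in CI before the dry-run deploy. A minimal sketch, assuming the placeholders follow the `REPLACE_WITH_*_ID` pattern in upper case; `no_placeholders` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: fail if wrangler.toml still contains REPLACE_WITH_*_ID placeholders.
# no_placeholders is a hypothetical helper name.

no_placeholders() {
  local file=${1:-wrangler.toml}
  if [ ! -f "${file}" ]; then
    echo "missing file: ${file}" >&2
    return 1
  fi
  # grep -n prints offending lines with their line numbers for quick fixing.
  if grep -nE 'REPLACE_WITH_[A-Z0-9_]*_ID' "${file}"; then
    echo "unresolved Hyperdrive placeholders in ${file}" >&2
    return 1
  fi
}
```

A typical use is `no_placeholders && wrangler deploy --dry-run --env dev`.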
Six-pillar callout — Phase 2
| Pillar | Treatment |
|---|---|
| Redundancy | Hyperdrive itself is a globally-distributed pool. Pair with the dual-AZ tunnel (Phase 1) for end-to-end redundancy. |
| Resiliency | Pool absorbs cold-start cost and handles transient DB blips. Set application-level timeout < 5 s; let Hyperdrive surface back-pressure. |
| Disaster Recovery | Hyperdrive can be re-provisioned in minutes using the captured wrangler hyperdrive create commands. Aurora point-in-time recovery is the upstream DR control. |
| Backup | Aurora automated backups + manual snapshot prior to first prod traffic. Hyperdrive stores no data — credentials are the only artifact (kept in Cloudflare Secrets Store). |
| Deployment Strategy | Hyperdrive ID changes are environment-scoped in wrangler.toml. Treat ID rotation (e.g., credential rotation) as a wrangler deploy --env <env> event with version routing fallback. |
| Observability | Enable Hyperdrive metrics in the Cloudflare dashboard; ship to New Relic via Logpush. Alarms: pool-acquire latency p95 > 100 ms, error rate > 0.5%. |
Phase 3 — Auth helper Worker (adv-svc-auth-helper)
The auth helper is a prerequisite for the public API Worker: Phase 4 cannot dry-run-deploy until the helper service exists at all three names.
Cloudflare reference: Service bindings · KV namespaces
- Create a sibling repo (or sub-folder under a `services/` monorepo; Jeffrey’s call): `Adventive/adventive-auth-helper-worker`. Same TypeScript/Hono pattern as the public API Worker.
- Provision three KV namespaces:

      wrangler kv namespace create kv-adv-svc-auth-helper-cache-dev
      wrangler kv namespace create kv-adv-svc-auth-helper-cache-stg
      wrangler kv namespace create kv-adv-svc-auth-helper-cache-prd

- Bindings (per env): `CACHE` (KV) + `DB_CONSOLE` (Hyperdrive; same console cluster as the public API).
- Implement: accept `X-Api-Key` + `X-Integration-Key` → 5-min KV cache lookup → on miss, query the `api` table via `DB_CONSOLE` → return `{ valid: boolean, accountId: number, rph: number }`.
- Smoke endpoint: `GET /__health` returns 200 with `COMMIT_SHA`.
- Deploy all three: `wrangler deploy --env dev`, `--env stg`, `--env prd`. Verify each `wrangler tail --env <env>` is quiet.
- From a one-off Worker (or `curl` to a private deploy URL), confirm a real key returns `valid: true` against each environment.
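
The lookup described in the Implement step is a cache-aside read. The sketch below shows one way it could look, under stated assumptions: `KvLike` models only the `get`/`put` subset of a KV binding, `KeyLookup` stands in for the real query against the `api` table, and the cache-key format is invented for illustration (a real helper would likely hash the keys instead of storing them raw).

```typescript
// Result shape returned by the helper, as specified in the Implement step.
interface AuthResult {
  valid: boolean;
  accountId: number;
  rph: number;
}

// Minimal subset of a KV binding; in the Worker this would be CACHE.
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

// Hypothetical DB lookup; the real helper queries the api table via DB_CONSOLE.
type KeyLookup = (apiKey: string, integrationKey: string) => Promise<AuthResult>;

const CACHE_TTL_SECONDS = 300; // the 5-minute cache from the step above

async function lookupAuth(
  cache: KvLike,
  queryDb: KeyLookup,
  apiKey: string,
  integrationKey: string,
): Promise<{ result: AuthResult; cacheHit: boolean }> {
  // Assumption: cache key derived from both headers (illustrative format).
  const cacheKey = `auth:${apiKey}:${integrationKey}`;
  const cached = await cache.get(cacheKey);
  if (cached !== null) {
    return { result: JSON.parse(cached) as AuthResult, cacheHit: true };
  }
  const result = await queryDb(apiKey, integrationKey);
  await cache.put(cacheKey, JSON.stringify(result), { expirationTtl: CACHE_TTL_SECONDS });
  return { result, cacheHit: false };
}

// In-memory stand-in for KV so the sketch runs outside the Workers runtime.
function memoryKv(): KvLike {
  const store = new Map<string, string>();
  return {
    get: async (key) => store.get(key) ?? null,
    put: async (key, value) => {
      store.set(key, value);
    },
  };
}
```

This version caches negative results (`valid: false`) for the same five minutes. Whether that is desirable after a key is newly issued is a design decision this chapter does not settle.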
Six-pillar callout — Phase 3
| Pillar | Treatment |
|---|---|
| Redundancy | Worker is globally replicated by default. KV is multi-region eventually-consistent — acceptable for 5-min TTL auth cache. |
| Resiliency | On a KV miss with the DB unreachable (Hyperdrive, tunnel, or Aurora failure), return 503 (not 401) so callers retry rather than treating an outage as auth failure. |
| Disaster Recovery | KV caches can be flushed and rebuilt in minutes. Helper is the smallest stateful surface; document its restoration as < 10 min. |
| Backup | KV cache is regenerable from the api table — no separate backup needed. The api table itself is covered by Aurora backup. |
| Deployment Strategy | Use wrangler deploy --env <env> with version routing; promote 0 % → 10 % → 100 % over a 30-min window for prd. |
| Observability | Emit structured logs (auth.lookup, auth.cache.hit, auth.cache.miss, auth.failure). Forward to New Relic via tail Worker. Alert on cache-miss rate > 25 % over 5 min (suggests rotation event or KV outage). |
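
The 503-versus-401 rule in the Resiliency row can be made explicit by separating "the key was rejected" from "the lookup could not be completed". This is a sketch only: the `AuthOutcome` variants, `HelperReply`, and the choice to answer 401 directly from the helper are assumptions, not the helper's confirmed contract.

```typescript
// Three distinct outcomes of one auth lookup (names are illustrative).
type AuthOutcome =
  | { kind: "valid"; accountId: number; rph: number }
  | { kind: "invalid" }
  | { kind: "unavailable"; reason: string };

interface HelperReply {
  status: 200 | 401 | 503;
  body: { valid: boolean; accountId?: number; rph?: number };
  retryable: boolean;
}

function toReply(outcome: AuthOutcome): HelperReply {
  switch (outcome.kind) {
    case "valid":
      return {
        status: 200,
        body: { valid: true, accountId: outcome.accountId, rph: outcome.rph },
        retryable: false,
      };
    case "invalid":
      // A definite "no" from the api table: retrying will not help.
      return { status: 401, body: { valid: false }, retryable: false };
    case "unavailable":
      // Cache miss plus unreachable DB: an outage, not an auth failure.
      return { status: 503, body: { valid: false }, retryable: true };
  }
}
```

Keeping the outage case as its own variant prevents a thrown database error from being collapsed into `valid: false` by accident.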
Phase 4 — Public API Worker scaffolding
This is the entry point for the Claude Code session. The kickoff prompt is `handoff/CLAUDE_CODE_KICKOFF.md`. Cowork’s job ends here; Claude Code drives.
- Confirm `~/Repositories/GitHub/Adventive/adventive-public-api-worker/` is initialized (`git init`) and that `CLAUDE.md` + `PLAN.md` are in the repo root (already true as of 2026-04-28).
- Install dependencies declared in `package.json`:

      cd ~/Repositories/GitHub/Adventive/adventive-public-api-worker
      npm install

- Author `tsconfig.json`, `vitest.config.ts`, `.dev.vars.example`, `.gitignore`, `.editorconfig`, `eslint.config.js`, `prettier.config.cjs` to project conventions.
- Author `openapi.yaml` (OpenAPI 3.1) covering every endpoint in the migration map, including `/v1.0/*` and `/v2.0/*` aliases. Keep field names byte-identical to the PHP responses.
- Generate types: `npx openapi-typescript openapi.yaml -o src/types/api.ts`.
- Stub `src/index.ts` with a Hono app exposing `/__health`, `/openapi.yaml`, and `/docs` (Redoc). Wire the env interface in `src/lib/env.ts` against the bindings declared in `wrangler.toml`.
- Validate the scaffold: `npm run typecheck && npm run dry-run:dev`.
Deliverables
- `openapi.yaml` complete and lint-clean.
- `src/index.ts` + `src/lib/env.ts` exporting a typed Hono app.
- Three green dry-run deploys (`dev`, `stg`, `prd`).
Six-pillar callout — Phase 4
| Pillar | Treatment |
|---|---|
| Redundancy | n/a (no traffic yet). |
| Resiliency | Establish error envelopes in src/lib/response.ts so every handler returns the same JSON error shape — uniform retry semantics for callers. |
| Disaster Recovery | git init + signed-commit hooks ensure the repo is recoverable from any developer’s local. Push to GitHub gated to end of Phase 4 once wrangler deploy --dry-run is green. |
| Backup | GitHub is the source of truth once pushed; until then, the local repo is single-host — keep an additional clone on iCloud-synced disk. |
| Deployment Strategy | No live deploys. Every commit must satisfy npm run typecheck && npm run dry-run:{dev,stg,prd} before merge. |
| Observability | Add src/lib/logger.ts (structured JSON, fields: request_id, account_id, endpoint, latency_ms, commit_sha) — every later phase emits through it. |
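
The logger contract in the Observability row could be sketched as follows. Only the five field names come from the table. The `level`, `message`, and `timestamp` fields, the `createLogger` factory, and the injectable sink are assumptions made for illustration.

```typescript
// Fields named in the Observability row.
interface LogFields {
  request_id: string;
  account_id: number | null; // null until auth has resolved
  endpoint: string;
  latency_ms: number;
  commit_sha: string;
}

type LogLevel = "info" | "warn" | "error";
type Sink = (line: string) => void;

// Per-request fields are bound once; per-event fields are passed on each call.
type RequestScope = Pick<LogFields, "request_id" | "commit_sha">;
type EventFields = Omit<LogFields, "request_id" | "commit_sha">;

function createLogger(scope: RequestScope, sink: Sink = console.log) {
  return function log(level: LogLevel, message: string, fields: EventFields): void {
    const record = {
      level,
      message,
      timestamp: new Date().toISOString(),
      ...scope,
      ...fields,
    };
    // One JSON object per line so a tail Worker can forward it unchanged.
    sink(JSON.stringify(record));
  };
}
```

A handler would create the logger once per request, for example `createLogger({ request_id, commit_sha })`, and pass it down so every later phase emits the same field set.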
Phase 5 — Core middleware (PAUSE checkpoint)
After Phase 5 ships, Claude Code stops and waits for an explicit “continue”.
Cloudflare reference: Durable Objects · DO alarms · Hyperdrive client libs
- `src/lib/db.ts`: Hyperdrive connection factory using `mysql2/promise`. Two functions: `getConsole(env)` and `getAggregate(env)`. Both return typed connections with a 5-second connect timeout and a 10-second query timeout. Both close in `ctx.waitUntil` on response.
- `src/lib/auth.ts`: Thin wrapper around the `AUTH` service binding. Throws `HttpError(401)` on `valid: false`. Caches the result on `c.set('auth', …)` for the request scope.
- `src/durable-objects/RateLimiter.ts`: DO class with `check(key, rph)` RPC. Increments hourly counter; returns 429 metadata when `count > rph`. Uses `state.storage.setAlarm(nextHourTopUTC)` to self-reset.
- CORS middleware: Hono `cors()` matched against the existing `CORS_ALLOWED_ORIGINS` var; preserve current allowed-headers list exactly.
- `src/lib/response.ts`: `jsonResponse`, `xmlResponse`, `csvResponse`, `HttpError`. Field name preservation is a contract; no camelCase renaming.
- `src/lib/validation.ts`: Date defaults (last 30 days, ET via `Intl.DateTimeFormat`), `format`, `data_connector`, `removeZeros`, `advertiser_id`, `from`, `to`, `version`.
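
The date-default rule in `src/lib/validation.ts` is easy to get wrong around midnight UTC, so a small sketch may help. It assumes "last 30 days" means `to` is today in `America/New_York` and `from` is 30 calendar days earlier; whether the PHP origin counts the window the same way should be confirmed against the fixtures. `easternDateParts`, `defaultDateRange`, and `resolveDateRange` are hypothetical names.

```typescript
interface DateRange {
  from: string; // YYYY-MM-DD
  to: string; // YYYY-MM-DD
}

// Calendar date in America/New_York for a given instant. formatToParts is
// used so the result does not depend on locale-specific separators.
function easternDateParts(now: Date): { year: number; month: number; day: number } {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone: "America/New_York",
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(now);
  const pick = (type: string): number => {
    const part = parts.find((p) => p.type === type);
    if (!part) throw new Error(`missing ${type} in formatted date`);
    return Number(part.value);
  };
  return { year: pick("year"), month: pick("month"), day: pick("day") };
}

function defaultDateRange(now: Date, days = 30): DateRange {
  const { year, month, day } = easternDateParts(now);
  // Day arithmetic on a date-only UTC value avoids DST edge cases.
  const end = new Date(Date.UTC(year, month - 1, day));
  const start = new Date(end.getTime() - days * 24 * 60 * 60 * 1000);
  const iso = (d: Date): string => d.toISOString().slice(0, 10);
  return { from: iso(start), to: iso(end) };
}

// Explicit query values win; missing ones fall back to the default window.
function resolveDateRange(query: { from?: string; to?: string }, now: Date): DateRange {
  const fallback = defaultDateRange(now);
  return { from: query.from ?? fallback.from, to: query.to ?? fallback.to };
}
```

Passing `now` in explicitly, instead of calling `new Date()` inside, keeps the function deterministic under Vitest.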
Per-commit gate:

    npm run typecheck && npm run test && \
      npm run dry-run:dev && npm run dry-run:stg && npm run dry-run:prd

Pause checkpoint
Claude Code posts a summary message and waits. The summary must include the gate results, the test count, the Worker bundle size for each env, and any TODO comments left in source.
Six-pillar callout — Phase 5
| Pillar | Treatment |
|---|---|
| Redundancy | DO storage is single-region per object; for a per-key counter that is correct (one counter, one home). KV (RATE_LIMIT_KV) is the documented fallback if DO becomes unavailable. |
| Resiliency | DO alarm() is the hourly reset mechanism — no cron worker needed. Failure to set an alarm degrades to “rolling 1-hour windows enforced lazily on next request.” |
| Disaster Recovery | DOs survive worker version rollback; the [[migrations]] tag in wrangler.toml is the contract. Never delete a class without a migration tag. |
| Backup | Counters are ephemeral; loss is acceptable (worst case: a key gets one extra free hour). No backup needed. Document this as a known one-time behavior change at cutover. |
| Deployment Strategy | DO migrations are global; treat them as schema changes — separate PR, separate review. Use Wrangler version routing on first deploy after a migration. |
| Observability | DO log events carry state.id, class_name, key_hash, count_after, and outcome (allowed/throttled). Forward via tail Worker → New Relic. Alert on throttle rate > 5 %. |
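
The counter logic behind `check(key, rph)` can be kept as a pure function so it is testable without the Workers runtime. The sketch below uses hour-aligned UTC windows and resets lazily when the stored window is stale, which is one possible reading of the degraded mode in the Resiliency row. `CounterState`, `LimitDecision`, and `checkLimit` are hypothetical names; the Durable Object would load the state from storage, call the function, persist `next`, and pass `resetAt` to its alarm.

```typescript
const HOUR_MS = 60 * 60 * 1000;

// Epoch milliseconds of the start of the current UTC hour.
function hourStartUTC(nowMs: number): number {
  return Math.floor(nowMs / HOUR_MS) * HOUR_MS;
}

// Epoch milliseconds of the next top of the hour in UTC.
function nextHourTopUTC(nowMs: number): number {
  return hourStartUTC(nowMs) + HOUR_MS;
}

interface CounterState {
  windowStart: number; // hour the count belongs to (epoch ms)
  count: number;
}

interface LimitDecision {
  outcome: "allowed" | "throttled";
  countAfter: number;
  resetAt: number; // when the window rolls over (epoch ms)
  next: CounterState; // state to persist
}

function checkLimit(stored: CounterState | undefined, rph: number, nowMs: number): LimitDecision {
  const windowStart = hourStartUTC(nowMs);
  // A stale or missing window counts as zero, so a missed alarm only delays
  // the reset until the next request arrives.
  const carried = stored && stored.windowStart === windowStart ? stored.count : 0;
  const countAfter = carried + 1;
  return {
    outcome: countAfter > rph ? "throttled" : "allowed",
    countAfter,
    resetAt: nextHourTopUTC(nowMs),
    next: { windowStart, count: countAfter },
  };
}
```

With `rph = 2`, the third request inside one hour is throttled and the first request after the hour boundary is allowed again.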
Phase 6 — Handler migration (PAUSE checkpoint)
Migrate one handler at a time, simplest first. Each handler ships with unit tests, a smoke probe, and a New Relic custom event.
Order of work (locked)
1. `credentialscheck`: auth-only; proves the service binding chain.
2. `advertisers`: `DB_CONSOLE`; two queries.
3. `campaigns`: `DB_CONSOLE`; nested queries (sites, placements, ad units, delivery groups).
4. `analytics` (v1 + v2): `DB_AGGREGATE`; dual-query (kpi + eng/quartile in v2).
5. `clickthroughs`: `DB_AGGREGATE`; same aggregate pattern.
6. `connector`: wraps analytics aggregated per advertiser.
7. `dataconnector`: account-wide scope + `convertForConnector`/`convertForConnectorNew`. Hardest; ship last.
Per-handler checklist
- Field-for-field response equivalence with the PHP origin (use `tests/fixtures/php-responses/*.json` captured during plan authoring).
- Versioned route registration: `/handler`, `/v1.0/handler`, and `/v2.0/handler` where the version param is supported.
- Format multiplexing via `format=json|xml|csv`; `php` and `serialized` return 410 Gone with a documentation link.
- Handler-level rate-limit check via the `RateLimiter` DO before any DB call.
- Logger emits `endpoint`, `version`, `format`, `account_id`, `query_ms`, `total_ms`, `cached` (always false for v1; placeholder for future).
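
Two checklist items, format multiplexing and versioned route registration, reduce to small pure helpers. The sketch below assumes a missing `format` parameter defaults to JSON and that an unrecognized value is a 400; neither default is stated in this chapter. `resolveFormat`, `versionedPaths`, and the `docsUrl` parameter are hypothetical.

```typescript
type SupportedFormat = "json" | "xml" | "csv";

type FormatDecision =
  | { ok: true; format: SupportedFormat }
  | { ok: false; status: 400 | 410; message: string };

const SUPPORTED_FORMATS: readonly SupportedFormat[] = ["json", "xml", "csv"];
const RETIRED_FORMATS: readonly string[] = ["php", "serialized"];

function resolveFormat(raw: string | undefined, docsUrl: string): FormatDecision {
  // Assumption: no format parameter means JSON.
  const requested = raw === undefined ? "json" : raw;
  const supported = SUPPORTED_FORMATS.find((f) => f === requested);
  if (supported !== undefined) {
    return { ok: true, format: supported };
  }
  if (RETIRED_FORMATS.indexOf(requested) !== -1) {
    // Retired formats answer 410 Gone and point at the documentation.
    return { ok: false, status: 410, message: `format=${requested} is no longer served; see ${docsUrl}` };
  }
  // Assumption: any other value is treated as a client error.
  return { ok: false, status: 400, message: `unknown format: ${requested}` };
}

// Paths to register for one handler: the bare path plus one alias per version.
function versionedPaths(handler: string, versions: readonly string[]): string[] {
  return [`/${handler}`, ...versions.map((v) => `/${v}/${handler}`)];
}
```

For example, `versionedPaths("advertisers", ["v1.0", "v2.0"])` yields the three paths a router would bind to the same handler function.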
Per-commit gate

    npm run lint && npm run typecheck && npm run test && \
      npm run dry-run:dev && npm run dry-run:stg && npm run dry-run:prd

Pause checkpoint
After all seven handlers are merged, Claude Code summarizes endpoint parity (matrix of endpoints × formats × versions × auth states) and waits.
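
One way to fill the parity matrix mechanically is to reduce each JSON response to a structural signature (field names and value types, no data) and compare the Worker's signature with the PHP fixture's. The sketch below is an assumption about how such a check might look, not the reconciler in `tests/parity/`. It sorts keys, so it ignores field order; drop the sort if order is part of the contract.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Reduce a JSON value to its structure: key names and value types only.
function shapeOf(value: Json): string {
  if (value === null) return "null";
  if (Array.isArray(value)) {
    // Distinct element shapes, sorted so row order and count do not matter.
    const shapes = Array.from(new Set(value.map((item) => shapeOf(item)))).sort();
    return `[${shapes.join("|")}]`;
  }
  if (typeof value === "object") {
    const fields = Object.entries(value)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([key, child]) => `${JSON.stringify(key)}:${shapeOf(child)}`);
    return `{${fields.join(",")}}`;
  }
  // "string", "number", or "boolean"
  return typeof value;
}

function sameShape(phpFixture: Json, workerResponse: Json): boolean {
  return shapeOf(phpFixture) === shapeOf(workerResponse);
}
```

The signature distinguishes `"1"` from `1`, which matters when a PHP response serializes numeric columns as strings and the Worker must do the same.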
Six-pillar callout — Phase 6
| Pillar | Treatment |
|---|---|
| Redundancy | Workers run globally; handlers are stateless except for the DO call. |
| Resiliency | Wrap every Hyperdrive call in a 10-second budget and a 1-retry policy with jittered backoff. Surface 504 on exceeded budget rather than hanging. |
| Disaster Recovery | Each handler must be independently deployable — keep coupling (shared state, shared mutable globals) at zero so a buggy handler can be reverted without touching the rest. |
| Backup | Snapshot a sample of PHP responses per endpoint in tests/fixtures/. These become the regression contract for the lifetime of the Worker. |
| Deployment Strategy | Handlers ship behind feature flags read from [vars] (e.g., HANDLER_ANALYTICS_V2_ENABLED). Cutover for a handler is a flag flip + Worker version pin, not a code change. |
| Observability | Per-handler New Relic custom event includes the response shape hash; mismatches against the PHP baseline alert immediately during dual-run. |
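
The Resiliency row (10-second budget, one retry, jittered backoff, 504 on an exhausted budget) can be sketched as a wrapper around any database call. Assumptions are labeled in the code: the base delay is not specified in this chapter, the operation is expected to honor the timeout it is handed (for example through a driver-level query timeout), and the clock, random source, and sleep are injectable purely for testing. `runWithBudget` is a hypothetical name.

```typescript
interface BudgetOptions {
  budgetMs: number; // total time allowed across all attempts
  retries: number; // extra attempts after the first
  baseDelayMs: number; // assumption: backoff base is not given in this chapter
  now?: () => number;
  random?: () => number;
  sleep?: (ms: number) => Promise<void>;
}

type BudgetResult<T> =
  | { ok: true; value: T; attempts: number }
  | { ok: false; status: 504; attempts: number };

async function runWithBudget<T>(
  operation: (timeoutMs: number) => Promise<T>,
  options: BudgetOptions,
): Promise<BudgetResult<T>> {
  const now = options.now ?? Date.now;
  const random = options.random ?? Math.random;
  const sleep =
    options.sleep ??
    ((ms: number) => new Promise<void>((resolve) => { setTimeout(resolve, ms); }));
  const deadline = now() + options.budgetMs;
  let attempts = 0;
  let lastError: unknown;

  while (attempts <= options.retries) {
    const remaining = deadline - now();
    // Budget spent: report 504 rather than starting another attempt.
    if (remaining <= 0) return { ok: false, status: 504, attempts };
    attempts += 1;
    try {
      // The operation receives what is left of the budget as its timeout.
      const value = await operation(remaining);
      return { ok: true, value, attempts };
    } catch (error) {
      lastError = error;
      if (attempts > options.retries) break;
      // Full jitter: a random delay up to the exponential cap, clipped to the budget.
      const backoff = random() * options.baseDelayMs * Math.pow(2, attempts - 1);
      await sleep(Math.min(backoff, Math.max(0, deadline - now())));
    }
  }
  // Attempts exhausted inside the budget: surface the real error.
  throw lastError;
}
```

A handler would call `runWithBudget((timeoutMs) => runQuery(timeoutMs), { budgetMs: 10_000, retries: 1, baseDelayMs: 100 })`, where `runQuery` is a placeholder for the handler's own query function, and map an `ok: false` result to a 504 response.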
Phase 7 — CI / CD pipeline and smoke tests
GitHub Actions wired to the repo (created at the start of this phase, not earlier).
Cloudflare reference: Wrangler in CI · Gradual deployments
- Create the GitHub repo `Adventive/adventive-public-api-worker`. Add the remote: `git remote add origin git@github.com:Adventive/adventive-public-api-worker.git`. Push `main`.
- Configure branch protection: required checks (`lint`, `typecheck`, `test`, `dry-run:dev`, `dry-run:stg`, `dry-run:prd`), required code owner review, dismiss stale reviews on push.
- Add `.github/workflows/ci.yml` with three jobs:
  - `pr-gate` (on `pull_request`): lint → typecheck → test → secret scan → all three dry-runs.
  - `deploy-stg` (on push to `main`): `wrangler deploy --env stg`, then `scripts/smoke.sh https://api.adventivestg.com`.
  - `deploy-prd` (on git tag `v*.*.*`): `wrangler deploy --env prd`, then `scripts/smoke.sh https://api.adventive.com`. Use gradual deployment: 10 % → 25 min wait → 100 %.
- Author `scripts/smoke.sh`. Required probes: `GET /__health` → 200, `GET /credentialscheck` (valid creds) → `{ status: true }`, `GET /credentialscheck` (invalid) → 401, `GET /advertisers` (valid) → 200 with at least one row.
- Wire the Cloudflare API token + account ID as GitHub repo secrets: `CF_API_TOKEN`, `CF_ACCOUNT_ID`. Restrict the token scope per pre-flight.
- Set up Wrangler `tail` integration with New Relic by deploying a tail Worker (`adv-svc-public-api-tail-{env}`) that POSTs to NR Logs API.
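
The required probes can be written down as data before `scripts/smoke.sh` is authored, which keeps the expectations reviewable in one place. The TypeScript sketch below models only the pass/fail rules. It assumes `/advertisers` returns a top-level JSON array, which should be checked against the PHP fixtures; `SmokeProbe`, `smokeProbes`, and `failedProbes` are hypothetical names.

```typescript
interface ProbeResponse {
  status: number;
  body: unknown;
}

interface SmokeProbe {
  id: string;
  path: string;
  credentials: "none" | "valid" | "invalid";
  passes: (response: ProbeResponse) => boolean;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

const smokeProbes: SmokeProbe[] = [
  { id: "health", path: "/__health", credentials: "none", passes: (r) => r.status === 200 },
  {
    id: "credentialscheck-valid",
    path: "/credentialscheck",
    credentials: "valid",
    passes: (r) => isRecord(r.body) && r.body["status"] === true,
  },
  {
    id: "credentialscheck-invalid",
    path: "/credentialscheck",
    credentials: "invalid",
    passes: (r) => r.status === 401,
  },
  {
    id: "advertisers",
    path: "/advertisers",
    credentials: "valid",
    // Assumption: the response body is a top-level array of rows.
    passes: (r) => r.status === 200 && Array.isArray(r.body) && r.body.length > 0,
  },
];

// Returns the ids of probes that failed or produced no response at all.
function failedProbes(responses: Record<string, ProbeResponse | undefined>): string[] {
  return smokeProbes
    .filter((probe) => {
      const response = responses[probe.id];
      return response === undefined || !probe.passes(response);
    })
    .map((probe) => probe.id);
}
```

Note that the invalid-credentials probe passes on a 401, so the smoke gate is "every probe meets its own expectation" and not "every probe returns 200".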
Six-pillar callout — Phase 7
| Pillar | Treatment |
|---|---|
| Redundancy | Two-tier deploy (stg → prd) ensures any regression is caught before production. Gradual deployment splits traffic between the previous and current version. |
| Resiliency | Smoke tests run post-deploy and fail the workflow on any probe that misses its expected result (200 for most probes, 401 for the invalid-credentials probe). Alarm if smoke fails more than once consecutively in stg (often signals an upstream problem, not code). |
| Disaster Recovery | Rollback procedure: wrangler rollback --env <env> (uses Cloudflare’s deployment history). Document RTO of 5 min for any single environment. Archive the previous tag’s bundle in R2 as belt-and-braces. |
| Backup | GitHub repo is the source of truth. Tag every prod release. Mirror main to an internal Cloudflare R2 bucket nightly via Action (regenerable, but cheap insurance). |
| Deployment Strategy | Gradual deployment for prd is mandatory: 10 % canary → 25 min health window → 100 %. CI auto-aborts on smoke failure. |
| Observability | New Relic deploy markers fire on every successful deploy event. Synthetic probes (every 60 s) hit /__health and /credentialscheck against all three envs. |
Phase 8 — Staging soak and DNS cutover
Tier-2 minimum soak: 2 hours on staging before any prod promotion.
Soak protocol (staging)
- Run a load profile against `api.adventivestg.com` matching the production p50 / p95 of the legacy API for the last 14 days. Source: New Relic transaction trace export.
- Compare response shapes endpoint-by-endpoint against the legacy PHP stage in real time (the dual-run reconciler from `tests/parity/`).
- Watch dashboards for the full 2-hour window:
  - Hyperdrive p95 acquire latency
  - DO `RateLimiter` invocation count and error rate
  - Worker CPU time p95 (must stay < 50 ms)
  - Auth helper cache-hit ratio (must stay > 70 %)
  - Tunnel `Up` status (must remain `Up` continuously)
- Run the Google Data Studio connector against `api.adventivestg.com/dataconnector` and confirm schema parity vs. legacy.
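
The three hard thresholds in the soak protocol (CPU p95 < 50 ms, cache-hit ratio > 70 %, tunnel continuously `Up`) can be evaluated over exported dashboard samples instead of by eye. The sketch below assumes samples are exported as plain objects; the `SoakSample` field names and `soakBreaches` are hypothetical and do not correspond to a New Relic or Cloudflare schema.

```typescript
interface SoakSample {
  atMs: number; // sample time, epoch milliseconds
  workerCpuP95Ms: number;
  authCacheHitRatio: number; // 0..1
  tunnelUp: boolean;
}

const SOAK_WINDOW_MS = 2 * 60 * 60 * 1000; // 2-hour minimum soak
const CPU_P95_LIMIT_MS = 50;
const CACHE_HIT_FLOOR = 0.7;

// Returns a human-readable list of breaches; an empty list means the soak passed.
function soakBreaches(samples: SoakSample[]): string[] {
  if (samples.length === 0) return ["no samples collected"];
  const breaches: string[] = [];
  const times = samples.map((s) => s.atMs);
  if (Math.max(...times) - Math.min(...times) < SOAK_WINDOW_MS) {
    breaches.push("samples cover less than the 2-hour window");
  }
  samples.forEach((s) => {
    if (!(s.workerCpuP95Ms < CPU_P95_LIMIT_MS)) {
      breaches.push(`CPU p95 ${s.workerCpuP95Ms} ms at ${s.atMs}`);
    }
    if (!(s.authCacheHitRatio > CACHE_HIT_FLOOR)) {
      breaches.push(`cache-hit ratio ${s.authCacheHitRatio} at ${s.atMs}`);
    }
    if (!s.tunnelUp) {
      breaches.push(`tunnel not Up at ${s.atMs}`);
    }
  });
  return breaches;
}
```

The output list can be pasted into the cutover ADR as the soak record, alongside the dashboards.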
DNS cutover (production)
- Confirm legacy origin TTL on `api.adventive.com` is ≤ 300 s (set in Phase 0).
- Tag a release: `git tag v1.0.0 && git push origin v1.0.0`. CI deploys prd at 10 % canary → 25 min → 100 %.
- Flip the Cloudflare route for `api.adventive.com/*` from the legacy origin to `adv-svc-public-api-prd`.
- Watch live dashboards for 60 minutes. Compare against the soak baseline.
- Keep the legacy CodeIgniter server running for 48 hours as fallback. Do not stop the PHP-FPM service.
- After the 48-hour fallback, decommission the legacy origin per the runbook (Chapter 04). Update `decisions/2026-MM-DD-cutover.md` with final timestamps.
Rollback (any time during cutover)
- Flip the Cloudflare route back to the legacy origin (single dashboard click).
- `wrangler rollback --env prd` to the previous version (5 minutes).
- Open an incident; do not re-attempt cutover until the trigger is root-caused.
Six-pillar callout — Phase 8
| Pillar | Treatment |
|---|---|
| Redundancy | Two parallel serving paths exist for 48 hours: legacy origin (warm fallback) and Worker (active). Rollback is a route flip. |
| Resiliency | Cutover happens during a low-traffic window (Tuesday 14:00–16:00 ET historically lowest per New Relic). 25-min canary phase absorbs unexpected regressions. |
| Disaster Recovery | Documented rollback steps above. RTO 5 min (route flip + rollback). RPO 0 (Worker is stateless except for DO counters, which are explicitly accepted as resettable at cutover). |
| Backup | Legacy origin remains running for 48 h. Aurora point-in-time recovery covers the underlying data. R2 nightly mirror of main covers code. |
| Deployment Strategy | Gradual deployment (10 % → 100 %) layered over DNS cutover. Two control surfaces: Worker version (per-request) and DNS route (per-region edge cache). |
| Observability | Dashboard kit pinned for the 48 h fallback window: Worker CPU, error rate, auth cache hit, Hyperdrive latency, tunnel up/down, NR synthetic probe. Page on any sustained breach > 2 min. |
Cross-cutting traceability
| Plan phase | Owner | Implementation environment | Output artifact | Pillar emphasis |
|---|---|---|---|---|
| 0 | Jeffrey | PHP repo (existing) | rotated creds, removed dead config, git tag | DR (anchor tag) |
| 1 | Jeffrey | AWS Image Builder + Cloudflare dashboard | Image Builder pipeline, 3 ASGs, 3 tunnels, ADR | Redundancy (dual-AZ prd) + Deployment (rolling AMI refresh) |
| 2 | Jeffrey | Wrangler CLI | 6 Hyperdrive IDs, wrangler.toml updated | Backup (creds in Cloudflare Secrets Store) |
| 3 | Jeffrey | New repo + Wrangler | auth-helper Worker × 3 envs | Resiliency (5xx on infra failure) |
| 4 | Claude Code | Local repo | scaffolded src/, openapi.yaml, dry-runs green | Observability (logger contract) |
| 5 | Claude Code | Local repo | middleware + DO + tests; PAUSE | Resiliency (timeouts, error envelope) |
| 6 | Claude Code | Local repo | 7 handlers + tests; PAUSE | Backup (PHP response fixtures) |
| 7 | Claude Code | GitHub Actions | CI green; gradual deploys configured | Deployment (10 % → 100 %) |
| 8 | Jeffrey | Cloudflare + AWS | DNS cutover; 48 h fallback | DR (rollback ≤ 5 min) |
Verification gates (must pass before next phase begins)
- After Phase 0: `git grep data_warehouse` empty; tag pushed.
- After Phase 1: `mysql -h aurora-{cluster}-{env}.internal.adventive.com` succeeds from a peer EC2.
- After Phase 2: `wrangler hyperdrive list` shows six configs; throwaway Worker `SELECT 1` passes for all six.
- After Phase 3: `curl https://adv-svc-auth-helper-{env}.adventive.workers.dev/__health` returns 200 across dev/stg/prd.
- After Phase 4: `npm run dry-run:{dev,stg,prd}` all green; `openapi.yaml` lints clean.
- After Phase 5 (PAUSE): unit tests cover auth, db, ratelimit, response, validation; DO migration tag committed.
- After Phase 6 (PAUSE): parity matrix shows 100 % field-for-field equivalence with PHP fixtures.
- After Phase 7: main branch protected; CI runs all gates; `smoke.sh` green against stg.
- After Phase 8: Soak baseline captured; cutover executed; 48 h fallback window logged in ADR.
Cloudflare documentation index
Consult these before executing any phase. The list below mirrors the canonical reference memory at `reference_cloudflare_documentation.md`.
Open items / follow-on ADRs
- `decisions/2026-MM-DD-tunnel-topology.md`: single vs dual-AZ per env (Phase 1 deliverable).
- `decisions/2026-MM-DD-rate-limit-fallback.md`: confirm whether `RATE_LIMIT_KV` fallback is wired or held in reserve (Phase 5).
- `decisions/2026-MM-DD-cutover.md`: actual cutover timestamps, soak metrics, fallback decommission timestamp (Phase 8).
- `decisions/2026-MM-DD-terraform-import.md`: when to import Hyperdrive + Tunnel into Terraform (deferred from Phase 1; revisit after first prod deploy).