06 — Development Implementation Steps

Version 1.2 · 2026-04-28 · Adventive Platform Engineering · Confidential

Operational, command-level walkthrough for executing the signed plan (Adventive_Public_API_Cloudflare_Migration_Plan.pdf, v2, 2026-04-22). Each phase is treated as a discrete unit of work with prerequisites, commands, deliverables, validation, and a six-pillar design callout (Redundancy · Resiliency · Disaster Recovery · Backup · Deployment Strategy · Observability).

Changelog:

  • v1.2 (2026-04-28): Sizing revised; stg drops to 1 instance / 1 AZ (parity with dev), and only prd retains multi-AZ + 2 replicas. Added §1.1.1 with concrete, ready-to-commit Image Builder component YAML and an aws-ia/ec2-image-builder Terraform reference so the pipeline build is copy-paste, not from-scratch.
  • v1.1 (2026-04-28): Phase 1 rewritten; hand-launched EC2 + manual cloudflared install replaced with EC2 Image Builder + Auto Scaling Group + replica HA. Added “Tunnel maintenance and updates” section covering zero-downtime rolling updates for cloudflared and the underlying Linux. Tunnel-per-environment confirmed as a locked decision with rationale.
  • v1.0 (2026-04-28): Initial implementation walkthrough.

This chapter does not change the plan. It is the implementation surface for Phases 0 through 8 against the existing repository:

  • Local repo: /Users/jlambert/Repositories/GitHub/Adventive/adventive-public-api-worker/
  • Intended GitHub slug: Adventive/adventive-public-api-worker
  • Implementation driver: Claude Code (Cowork’s role ends with this chapter)
  • Plan reference: ../public-api-cf-migration/Adventive_Public_API_Cloudflare_Migration_Plan.pdf
  • Workers SOP: ../../platform/cloudflare/workers/Adventive_Cloudflare_Workers_SOP.pdf

If a step in this chapter conflicts with the signed plan, the plan wins. Open an ADR under decisions/ and update both before proceeding.

  1. Phases run in order. Each phase has explicit prerequisites; do not skip.
  2. Phases 0–3 and 8 are executed by Jeffrey directly (infrastructure, provisioning, cutover). Phases 4–7 are executed in Claude Code from the handoff package (handoff/CLAUDE_CODE_KICKOFF.md).
  3. Mandatory pause checkpoints: end of Phase 5 and end of Phase 6. Claude Code stops and waits for continue.
  4. Treat every command shown as a first pass; verify it against current Cloudflare docs (URL inline) before executing in production.

The following must be true before Phase 0 begins. Treat this as a gate.

| Check | Verification | Owner |
| --- | --- | --- |
| Cloudflare account on Workers Paid plan | dashboard → Workers → Plans (Hyperdrive + Durable Objects require Paid) | Jeffrey |
| All three zones active in Cloudflare | adventive.dev, adventivestg.com, adventive.com | Jeffrey |
| wrangler CLI installed and authed | wrangler whoami returns the platform service account | Jeffrey |
| Cloudflare API token (CI scope) | scoped to: Workers Scripts:Edit, Hyperdrive:Edit, Tunnel:Edit; stored in 1Password under “CF — adventive-public-api-worker CI” | Jeffrey |
| AWS access to Aurora VPCs (dev / stg / prd) | sufficient to launch EC2 instances in each via ASG | Jeffrey |
| Read-only Aurora user provisioned | per cluster (console, aggregate); Hyperdrive will use these | Jeffrey |
| AWS EC2 Image Builder available in target region | dashboard → Image Builder; service-linked role created | Jeffrey |
| AWS Secrets Manager access for tunnel credentials | one secret per env: /adventive/cloudflared/dev, …/stg, …/prd | Jeffrey |
| AWS SSM Parameter Store available | for AMI version pointer at /adventive/cloudflared/ami-id-latest | Jeffrey |
| Terraform backend configured | S3 + DynamoDB lock table, scoped per env; module infra/cloudflared/ will live in this repo | Jeffrey |
| New Relic Workers + EC2 integration ready | account ID + license key; CloudWatch → NR log subscription operational | Jeffrey |
| Local repo present | ~/Repositories/GitHub/Adventive/adventive-public-api-worker/ with git init, wrangler.toml stub, package.json | Confirmed 2026-04-28 |
| wrangler hyperdrive --help works | minimum Wrangler 4.x | Jeffrey |
| Codeowners + branch protection ready (post-push) | Adventive/adventive-public-api-worker repo creation deferred to Phase 7 | Jeffrey |

Cloudflare references: Workers Plans · Wrangler install · API tokens


Phase 0 — Pre-migration cleanup (existing PHP repo)

Touches only the existing CodeIgniter PHP application; no Cloudflare work. Done first to stop the codebase from drifting while the rewrite is in flight.

  1. In the legacy PHP repo, remove the data_warehouse block from application/config/database.json for all three environments (lines 99–121 dev, 260–282 prod, 377–399 stg). No model loads $CI->load->database('data_warehouse') — verified during plan authoring.
  2. Rotate every MySQL credential currently committed in plaintext in database.json. Move to environment variables or AWS Secrets Manager. Force-rotate even the dev set; assume the file’s history is compromised.
  3. Confirm Cloudflare DNS for the three zones resolves and the existing API hosts have low-TTL records (≤ 300 s) staged for cutover.
  4. Tag the legacy repo: pre-cloudflare-migration-2026-04 so we have a known rollback point.

Validation:

  • git grep data_warehouse returns nothing in the PHP repo.
  • All MySQL connection strings in CI/CD secret stores have been rotated; old values revoked at the Aurora cluster.
  • dig +short api.adventive.com and the staging/dev equivalents return the current PHP origin with TTL ≤ 300.
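
The TTL check in the last bullet can be scripted. A minimal sketch, assuming the input is the text printed by `dig +noall +answer <host>` (`dig +short` prints only the record value, not the TTL). The function name, the sample hostnames, and the sample addresses are illustrative, not taken from the plan.

```python
def records_over_ttl(dig_answer: str, max_ttl: int = 300) -> list[str]:
    """Return the names of answer records whose TTL exceeds max_ttl.

    Expects the text printed by `dig +noall +answer <host>`, where each
    line is: name, TTL, class, type, value.
    """
    offenders = []
    for line in dig_answer.splitlines():
        fields = line.split()
        # Skip blank lines and the comment lines dig prefixes with ';'.
        if len(fields) < 5 or fields[0].startswith(";"):
            continue
        name, ttl = fields[0], int(fields[1])
        if ttl > max_ttl:
            offenders.append(name)
    return offenders


# Illustrative sample only; hostnames and addresses are placeholders.
sample = """\
api.adventive.com.      300   IN  A  192.0.2.10
api.adventivestg.com.   3600  IN  A  192.0.2.11
"""
print(records_over_ttl(sample))  # prints ['api.adventivestg.com.']
```

An empty result means every record is already at or below the cutover TTL.
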
| Pillar | Treatment |
| --- | --- |
| Redundancy | No new redundancy. Existing PHP origin remains the sole serving path. |
| Resiliency | Reduced exposure: removing dead data_warehouse config eliminates a config-load risk path. |
| Disaster Recovery | Git tag pre-cloudflare-migration-2026-04 is the rollback anchor for the entire migration. |
| Backup | Aurora automated backups already configured (verify retention ≥ 7 d on each cluster before proceeding). |
| Deployment Strategy | Routine PHP deploy via existing pipeline; no Cloudflare deploy in this phase. |
| Observability | Confirm legacy New Relic APM tags remain in place; they become the baseline that the new Worker is compared against during Phase 8. |

Phase 1 — Cloudflare Tunnel infrastructure

Bridge each Aurora VPC to Cloudflare’s edge so Hyperdrive can reach the private DB endpoints without a public listener. One dedicated tunnel per environment (3 tunnels total). Each tunnel is served by an Auto Scaling Group of cloudflared replicas built from an EC2 Image Builder AMI.

This phase is more elaborate than the original plan called for. The upgrade exists because the tunnel is a foundational, long-lived dependency: every Worker DB call traverses it, so hand-rolled EC2 instances are unacceptable. The whole pattern is captured in the standing memory “Cloudflare Tunnel infra pattern (Adventive standard)” and will graduate to docs/platform/cloudflare/tunnel/ once this rollout completes.

Cloudflare reference: Cloudflare Tunnel · cloudflared install · TCP routing · Run as replica

AWS reference: EC2 Image Builder · Auto Scaling Group instance refresh · SSM Parameter Store

| Decision | Choice | Rationale |
| --- | --- | --- |
| Tunnel separation | Three tunnels, one per env (dev, stg, prd) | Credential blast-radius isolation; rotate prod creds without touching dev/stg; per-env audit log; each tunnel’s config.yml only carries its own ingress; cost is identical (tunnels are free). |
| Compute platform | EC2 + ASG (sizing per-env, see Sizing row), not Fargate | Matches existing Adventive AWS footprint; single networking model with Aurora VPCs; SSM/CloudWatch agents already standardized. Re-evaluate if Adventive adopts ECS broadly. |
| AMI build | EC2 Image Builder, weekly cadence | Single mechanism updates BOTH cloudflared and the Linux base; reproducible, signed AMI; eliminates drift across env-specific instances. |
| HA model | Replica registration (multiple cloudflared instances ↔ same tunnel UUID) | Native Cloudflare feature; edge load-balances and fails over automatically; foundation for zero-downtime updates. |
| Sizing | dev & stg: 1 × t3.micro / 1 AZ; prd: 2 × t3.small / 2 AZs | dev and stg are non-customer-facing and tolerate brief downtime during update; cost-optimize there with t3.micro and a single AZ. Only prd carries customer traffic, so it requires ≥ 2 replicas across 2 AZs for AZ-failure resilience and rolling-update headroom. |
| Credential storage | AWS Secrets Manager (primary) + Cloudflare Secrets Store (secondary) + 1Password (break-glass) | Three-tier; user-data pulls from Secrets Manager at boot; AMI itself is environment-agnostic. |

Why not one tunnel for dev+stg? Cost savings are zero (Cloudflare Tunnel is a free product). The only remaining argument is operational simplicity, but sharing a tunnel couples credential rotation, ingress edits, and audit logs across environments, which is a real cost paid every time someone touches the dev config. Strict per-env separation is the standing default.

1.1 — Build the cloudflared AMI in EC2 Image Builder

Single Image Builder pipeline produces a versioned AMI consumed by all three environments.

EC2 Image Builder pipeline: adv-cflared-pipeline
├── Source AMI: Ubuntu 22.04 LTS (Canonical, latest)
├── Components (build phase):
│ ├── adv-cflared-base (apt update; unattended-upgrades; clock sync)
│ ├── adv-cflared-cloudflared (install cloudflared from pkg.cloudflare.com apt repo)
│ ├── adv-cflared-systemd-unit (drop /etc/systemd/system/cloudflared.service.d/override.conf)
│ ├── adv-cflared-bootstrap-script (drop /usr/local/sbin/cflared-bootstrap.sh)
│ ├── adv-cflared-cloudwatch-agent (install + base config)
│ └── adv-cflared-ssm-agent (verified present; SSM-managed)
├── Test phase:
│ ├── cloudflared --version returns ≥ 2025.x
│ ├── systemctl list-unit-files includes cloudflared.service
│ └── ssm-cli get-instance-information returns ManagedInstance
├── Distribution: shared AMI in target region (us-east-1 first; us-west-2 for DR)
├── Output:
│ └── SSM Parameter Store: /adventive/cloudflared/ami-id-latest
│ (Lambda hook updates this on successful pipeline run)
└── Schedule: cron(0 6 ? * TUE *) # weekly Tuesday 06:00 UTC

The AMI does not contain tunnel credentials; bootstrap happens at instance launch, so the AMI itself is environment-agnostic and reusable across dev/stg/prd.

1.1.1 — Prebuilt components and Terraform reference (do not write from scratch)

The pipeline is not hand-authored. AWS publishes a maintained Terraform module that builds the pipeline, and most of the components above already exist as AWS-managed components. Only the adv-cflared-cloudflared and adv-cflared-bootstrap-script components are Adventive-specific — both have full source below.

Terraform module to use:

infra/imagebuilder/main.tf

module "cflared_pipeline" {
  source  = "aws-ia/ec2-image-builder/aws"
  version = "~> 1.5"

  pipeline_name       = "adv-cflared-pipeline"
  schedule_expression = "cron(0 6 ? * TUE *)"
  recipe_name         = "adv-cflared-recipe"
  recipe_version      = "1.0.0"
  parent_image        = "arn:aws:imagebuilder:us-east-1:aws:image/ubuntu-server-22-lts-x86/x.x.x"

  # AWS-managed components: reuse as-is, no editing needed
  managed_components = [
    "arn:aws:imagebuilder:us-east-1:aws:component/update-linux/x.x.x",
    "arn:aws:imagebuilder:us-east-1:aws:component/aws-cli-version-2-linux/x.x.x",
    "arn:aws:imagebuilder:us-east-1:aws:component/amazon-cloudwatch-agent-linux/x.x.x",
    "arn:aws:imagebuilder:us-east-1:aws:component/amazon-ssm-agent-linux/x.x.x",
    "arn:aws:imagebuilder:us-east-1:aws:component/cis-level-1-ubuntu-22-04-lts/x.x.x",
  ]

  # Adventive-authored components: both sources below
  custom_components = [
    aws_imagebuilder_component.cflared.arn,
    aws_imagebuilder_component.bootstrap.arn,
  ]

  output_ami_ssm_parameter   = "/adventive/cloudflared/ami-id-latest"
  ami_lifecycle_retain_count = 4
}

The module wires the EventBridge rule on pipeline success, publishes the AMI ID to SSM, and applies the AMI lifecycle policy. Module repo: https://github.com/aws-ia/terraform-aws-ec2-image-builder.

Custom component 1 — adv-cflared-cloudflared.yml:

infra/imagebuilder/components/cflared.yml

name: adv-cflared-cloudflared
description: Install cloudflared from the official Cloudflare APT repo.
schemaVersion: 1.0
phases:
  - name: build
    steps:
      - name: AddCloudflareAptRepo
        action: ExecuteBash
        inputs:
          commands:
            - sudo mkdir -p --mode=0755 /usr/share/keyrings
            - curl -fsSL https://pkg.cloudflare.com/cloudflare-main.gpg | sudo tee /usr/share/keyrings/cloudflare-main.gpg > /dev/null
            - echo 'deb [signed-by=/usr/share/keyrings/cloudflare-main.gpg] https://pkg.cloudflare.com/cloudflared jammy main' | sudo tee /etc/apt/sources.list.d/cloudflared.list
            - sudo apt-get update
      - name: InstallCloudflared
        action: ExecuteBash
        inputs:
          commands:
            - sudo apt-get install -y cloudflared
      # NOTE: this step writes only a drop-in. It assumes a base
      # cloudflared.service unit exists; confirm the apt package provides one,
      # otherwise the UnitPresent check in the validate phase will fail.
      - name: DropSystemdOverride
        action: CreateFile
        inputs:
          - path: /etc/systemd/system/cloudflared.service.d/override.conf
            permissions: 0644
            content: |
              [Service]
              ExecStart=
              ExecStart=/usr/bin/cloudflared tunnel --config /etc/cloudflared/config.yml run
              Restart=always
              RestartSec=5s
  - name: validate
    steps:
      - name: VersionCheck
        action: ExecuteBash
        inputs:
          commands:
            - cloudflared --version | grep -E '^cloudflared version 20'
      - name: UnitPresent
        action: ExecuteBash
        inputs:
          commands:
            - systemctl list-unit-files | grep -q '^cloudflared.service'

Custom component 2 — adv-cflared-bootstrap-script.yml:

infra/imagebuilder/components/bootstrap.yml

name: adv-cflared-bootstrap-script
description: Drop the env-resolving cflared-bootstrap.sh into /usr/local/sbin.
schemaVersion: 1.0
phases:
  - name: build
    steps:
      - name: DropBootstrap
        action: S3Download
        inputs:
          - source: s3://adventive-platform-artifacts/cflared/cflared-bootstrap.sh
            destination: /usr/local/sbin/cflared-bootstrap.sh
      - name: SetExec
        action: ExecuteBash
        inputs:
          commands:
            - sudo chmod 0755 /usr/local/sbin/cflared-bootstrap.sh
      - name: WireOnBoot
        action: CreateFile
        inputs:
          - path: /etc/systemd/system/cflared-bootstrap.service
            permissions: 0644
            content: |
              [Unit]
              Description=Adventive cloudflared bootstrap (resolve env, fetch creds)
              Before=cloudflared.service
              [Service]
              Type=oneshot
              ExecStart=/usr/local/sbin/cflared-bootstrap.sh
              [Install]
              WantedBy=cloudflared.service
      - name: EnableUnit
        action: ExecuteBash
        inputs:
          commands:
            - sudo systemctl enable cflared-bootstrap.service

What this means in practice: Phase 1 build is roughly one terraform apply once the bootstrap script is uploaded to S3 and the two YAML components are committed. The pipeline self-runs on its weekly schedule from that point forward.

Tunnel creation is done once per env, via the Cloudflare API or cloudflared tunnel create from an admin workstation. Capture the tunnel UUID and credentials JSON for each tunnel.

# Run from a credentialed admin shell (not on the future ASG instances)
cloudflared tunnel login
cloudflared tunnel create adv-aurora-tunnel-dev
cloudflared tunnel create adv-aurora-tunnel-stg
cloudflared tunnel create adv-aurora-tunnel-prd

Each command emits a credentials JSON. Store each in all three tiers:

# AWS Secrets Manager (primary; used by user-data)
aws secretsmanager create-secret \
  --name /adventive/cloudflared/dev \
  --secret-string file://dev-creds.json

# Cloudflare Secrets Store (secondary; recovery)
wrangler secrets-store secret create \
  --store-id adv-cflared-secrets \
  --name tunnel-creds-dev \
  --value "$(cat dev-creds.json)"

# 1Password (tertiary; break-glass; manual)

Securely shred the local credentials JSON files. Repeat for stg and prd.

1.3 — Author the per-env tunnel config.yml

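The config is stored in the same AWS Secrets Manager secret as the credentials so the bootstrap script can fetch both atomically. The bootstrap script reads two keys from that secret, config_yaml and credentials_json, so the secret created in the previous step must be updated to a JSON object carrying both keys.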

# /adventive/cloudflared/dev: config.yml
tunnel: <dev-tunnel-uuid>
credentials-file: /etc/cloudflared/<dev-tunnel-uuid>.json
metrics: 0.0.0.0:2000 # exposed for ASG health check
ingress:
  - hostname: aurora-console-dev.internal.adventive.com
    service: tcp://console-cluster.dev.cluster.us-east-1.rds.amazonaws.com:3306
  - hostname: aurora-aggregate-dev.internal.adventive.com
    service: tcp://aggregate-cluster.dev.cluster.us-east-1.rds.amazonaws.com:3306
  - service: http_status:404
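
Because the bootstrap script reads the config_yaml and credentials_json keys from one secret, the secret value has to be assembled in that shape before it is stored. The following is a minimal sketch of that assembly, not the plan's tooling: both function names are hypothetical, the all-zero UUID and the credential values are placeholders, and the TunnelID field name is the one the bootstrap script in 1.4 looks up.

```python
import json


def render_config(tunnel_id: str, routes: dict[str, str]) -> str:
    """Render a cloudflared config.yml for one environment.

    routes maps each ingress hostname to its tcp:// origin. The catch-all
    http_status:404 rule is always appended as the last ingress entry.
    """
    lines = [
        f"tunnel: {tunnel_id}",
        f"credentials-file: /etc/cloudflared/{tunnel_id}.json",
        "metrics: 0.0.0.0:2000",
        "ingress:",
    ]
    for hostname, service in routes.items():
        lines.append(f"  - hostname: {hostname}")
        lines.append(f"    service: {service}")
    lines.append("  - service: http_status:404")
    return "\n".join(lines) + "\n"


def build_secret_payload(config_yaml: str, credentials: dict) -> str:
    """Bundle config and credentials under the keys the bootstrap script reads."""
    tunnel_id = credentials["TunnelID"]
    # Guard against pairing one env's config with another env's credentials.
    if f"tunnel: {tunnel_id}\n" not in config_yaml:
        raise ValueError("config.yml and credentials refer to different tunnels")
    return json.dumps(
        {"config_yaml": config_yaml, "credentials_json": json.dumps(credentials)}
    )


# Placeholder values for illustration only.
tid = "00000000-0000-0000-0000-000000000000"
config = render_config(
    tid,
    {
        "aurora-console-dev.internal.adventive.com":
            "tcp://console-cluster.dev.cluster.us-east-1.rds.amazonaws.com:3306",
    },
)
payload = build_secret_payload(config, {"TunnelID": tid, "TunnelSecret": "placeholder"})
```

The resulting string is what would be passed as the secret value, for example through `--secret-string` on `aws secretsmanager update-secret`.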

1.4 — Bootstrap script (baked into the AMI)

/usr/local/sbin/cflared-bootstrap.sh runs at boot via the cflared-bootstrap.service oneshot unit, which is ordered before cloudflared.service. It queries instance metadata for the env tag, fetches the matching secret, and writes the config and credentials files; systemd then starts cloudflared.

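#!/usr/bin/env bash
# Resolve this instance's env, fetch its tunnel secret, write cloudflared files.
set -euo pipefail
umask 077

# IMDSv2: the launch template sets http_tokens = "required", so every
# metadata request must carry a session token.
IMDS="http://169.254.169.254/latest"
TOKEN=$(curl -sf -X PUT "${IMDS}/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
imds() { curl -sf -H "X-aws-ec2-metadata-token: ${TOKEN}" "${IMDS}/meta-data/$1"; }

ENV=$(imds "tags/instance/adv:env")
REGION=$(imds "placement/region")

# Fetch the secret once; it holds both config_yaml and credentials_json.
SECRET_JSON=$(aws secretsmanager get-secret-value \
  --region "${REGION}" \
  --secret-id "/adventive/cloudflared/${ENV}" \
  --query SecretString --output text)

mkdir -p /etc/cloudflared
jq -r '.config_yaml' <<<"${SECRET_JSON}" > /etc/cloudflared/config.yml

# The tunnel UUID is read from the credentials JSON (TunnelID); config.yml is
# YAML, which jq cannot parse.
CREDS=$(jq -r '.credentials_json' <<<"${SECRET_JSON}")
TUNNEL_ID=$(jq -r '.TunnelID' <<<"${CREDS}")
printf '%s\n' "${CREDS}" > "/etc/cloudflared/${TUNNEL_ID}.json"
chmod 0600 /etc/cloudflared/*.json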

1.5 — Launch Template + ASG (Terraform module)

A single Terraform module infra/cloudflared/ parameterized by env is applied three times (dev, stg, prd workspaces). Key bits:

locals {
  # Sizing per env: dev & stg are non-customer-facing (single AZ, t3.micro);
  # only prd carries customer traffic and gets multi-AZ + 2 replicas.
  is_prod     = var.env == "prd"
  instance_sz = local.is_prod ? "t3.small" : "t3.micro"
  asg_min     = local.is_prod ? 2 : 1
  asg_desired = local.is_prod ? 2 : 1
  asg_max     = local.is_prod ? 4 : 2
  asg_subnets = local.is_prod ? [var.subnet_az_a, var.subnet_az_b] : [var.subnet_az_a]
}

resource "aws_launch_template" "cflared" {
  name_prefix = "adv-cflared-${var.env}-"
  # AMI resolved from SSM at apply time
  image_id      = data.aws_ssm_parameter.cflared_ami.value
  instance_type = local.instance_sz

  iam_instance_profile {
    name = aws_iam_instance_profile.cflared.name
  }

  vpc_security_group_ids = [aws_security_group.cflared.id]

  # HCL takes one argument per line; semicolons are not valid separators.
  metadata_options {
    http_tokens            = "required"
    instance_metadata_tags = "enabled"
  }

  tag_specifications {
    resource_type = "instance"
    tags = {
      "adv:env" = var.env
      Name      = "adv-cflared-${var.env}"
      OwnedBy   = "platform"
    }
  }
}

resource "aws_autoscaling_group" "cflared" {
  name                      = "adv-cflared-${var.env}"
  min_size                  = local.asg_min
  max_size                  = local.asg_max
  desired_capacity          = local.asg_desired
  vpc_zone_identifier       = local.asg_subnets
  health_check_type         = "EC2"
  health_check_grace_period = 180

  launch_template {
    id = aws_launch_template.cflared.id
    # Pin a numbered version: auto_rollback is not supported with $Latest.
    version = aws_launch_template.cflared.latest_version
  }

  # A launch template change always triggers a refresh, so no explicit
  # triggers list is needed.
  instance_refresh {
    strategy = "Rolling"
    preferences {
      # Launch-before-terminate needs min_healthy_percentage = 100 plus a
      # max_healthy_percentage above 100. dev/stg (1 instance) go 1 → 2 → 1;
      # prd (2 replicas) goes 2 → 3 → 2. Replica HA holds during the swap.
      min_healthy_percentage = 100
      max_healthy_percentage = local.is_prod ? 150 : 200
      instance_warmup        = 120
      auto_rollback          = true
    }
  }
}

The security group allows outbound 443/TCP to 0.0.0.0/0 (Cloudflare edge), outbound 3306/TCP to the two Aurora cluster SGs only, and no inbound rules.

EventBridge rule fires on Image Builder pipeline success → Lambda updates /adventive/cloudflared/ami-id-latest → updates each environment’s launch-template version → triggers ASG instance refresh per env on a staggered schedule (dev Tue 07:00, stg Tue 09:00, prd Tue 11:00 UTC). Each refresh is gated by the prior env’s success (Step Functions state machine adv-cflared-rolling-update).

Validation:

  • aws imagebuilder list-image-pipelines shows adv-cflared-pipeline.
  • /adventive/cloudflared/ami-id-latest resolves to a current AMI.
  • aws autoscaling describe-auto-scaling-groups returns three groups; each reports HealthyInstances == DesiredCapacity for ≥ 5 minutes.
  • Cloudflare dashboard → Networks → Tunnels shows each tunnel status Healthy with the expected replica count (1 dev, 1 stg, 2 prd).
  • From a peer EC2: mysql -h aurora-console-prd.internal.adventive.com -u ro_user -p connects via the tunnel.
  • wrangler hyperdrive test connection (Phase 2 prerequisite) succeeds.

Deliverables:

  • Image Builder pipeline + 6 components committed under infra/imagebuilder/.
  • Terraform module infra/cloudflared/ with three workspaces applied.
  • Three Cloudflare tunnels with their UUIDs and credentials secured in three credential tiers.
  • ADR decisions/2026-MM-DD-tunnel-infra-pattern.md capturing the locked decisions in §1.0 and any deviations specific to Adventive’s environment.
  • EventBridge → Lambda → Step Functions wiring for AMI auto-promotion.

| Pillar | Treatment |
| --- | --- |
| Redundancy | prd ASG runs ≥ 2 replicas across 2 AZs; Cloudflare edge load-balances replica registrations on the same tunnel UUID; ASG self-replaces failed instances. dev/stg run a single replica in a single AZ; a brief gap during instance replacement is acceptable for non-customer-facing traffic. |
| Resiliency | Outbound-only tunnel (no inbound SG rules on Aurora); cloudflared auto-reconnects to edge; ASG auto_rollback reverts a bad AMI promotion; instance refresh launches replacements before terminating old instances, so ≥ 1 healthy replica remains throughout. |
| Disaster Recovery | RTO ≤ 5 min for instance loss (ASG auto-launch). RTO ≤ 30 min for AMI rollback (revert SSM param + manual instance refresh). Multi-region playbook: build AMI in us-west-2 monthly; instructions in §1.7 maintenance section. RPO = 0 (cloudflared is stateless). |
| Backup | Three-tier credential storage (Secrets Manager / CF Secrets Store / 1Password). Image Builder retains four most recent AMI revisions. Terraform state in S3 + DynamoDB lock. ASG and launch-template versions retained indefinitely. |
| Deployment Strategy | Weekly Image Builder run → SSM param update → Step Functions orchestrates env-staggered ASG instance refresh (dev → stg → prd, gated by prior success). Manual emergency promotion via terraform apply + ASG instance refresh trigger. |
| Observability | cloudflared metrics endpoint :2000 scraped by CloudWatch agent; logs forwarded CloudWatch → New Relic. Custom CloudWatch metrics: cflared_replica_count_per_tunnel, cflared_systemd_restart_count, imagebuilder_ami_age_days. Alarms: replica count below expected for 2 min (page); restart frequency > 3/h per instance (page); AMI age > 21 days (warn; indicates pipeline failure). |
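
The three alarm thresholds in the Observability row can be expressed as one evaluation function. This is a sketch for reasoning about the thresholds, not the CloudWatch alarm definitions themselves; the TunnelSnapshot type, its field names, and the one-sample-per-minute assumption are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TunnelSnapshot:
    expected_replicas: int
    # One replica-count sample per minute, oldest first (assumed cadence).
    replica_counts: list[int]
    # Restart count over the last hour, keyed by instance ID.
    restarts_last_hour: dict[str, int]
    ami_age_days: int


def evaluate_alarms(snap: TunnelSnapshot) -> list[tuple[str, str]]:
    """Return (severity, reason) pairs for every threshold breached."""
    alarms = []
    # Replica count below expected for 2 consecutive minutes: page.
    last_two = snap.replica_counts[-2:]
    if len(last_two) == 2 and all(c < snap.expected_replicas for c in last_two):
        alarms.append(("page", "replica count below expected for 2 min"))
    # More than 3 restarts per hour on any instance: page.
    for instance_id, restarts in sorted(snap.restarts_last_hour.items()):
        if restarts > 3:
            alarms.append(("page", f"{instance_id} restarted {restarts} times in 1 h"))
    # AMI older than 21 days suggests the weekly pipeline is failing: warn.
    if snap.ami_age_days > 21:
        alarms.append(("warn", "AMI age > 21 days"))
    return alarms


degraded = TunnelSnapshot(
    expected_replicas=2,
    replica_counts=[2, 1, 1],
    restarts_last_hour={"i-0aaa": 4, "i-0bbb": 0},
    ami_age_days=22,
)
healthy = TunnelSnapshot(1, [1, 1], {}, 3)
```

Here `evaluate_alarms(degraded)` yields two page entries and one warn entry, while `evaluate_alarms(healthy)` yields an empty list.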

Tunnel maintenance and updates (cross-cutting, applies post-Phase 1)

This section describes how routine and unplanned changes to the tunnel infrastructure happen without taking the public API offline.

Routine: weekly Linux + cloudflared update (zero-downtime)

This is the default mechanism. No human in the loop.

  1. Tuesday 06:00 UTC — Image Builder pipeline runs; pulls current Ubuntu LTS, latest cloudflared from pkg.cloudflare.com, runs the build + test phases.
  2. Pipeline success — EventBridge fires; Lambda writes the new AMI ID to /adventive/cloudflared/ami-id-latest.
  3. Step Functions: adv-cflared-rolling-update orchestrates:
    • dev refresh at Tue 07:00 UTC. ASG instance refresh triggers because the launch template’s resolved AMI changed. With desired=1 and the refresh configured to launch before terminating, ASG launches a new instance to bring the count to 2, waits 120 s for warm-up, then terminates the old instance: momentarily 2 replicas, never 0.
    • stg refresh at Tue 09:00 UTC, gated by dev success. Same single-replica mechanic as dev: count goes 1 → 2 → 1.
    • prd refresh at Tue 11:00 UTC, gated by stg success. With desired=2, count goes 2 → 3 → 2 (one new replica added before the first old replica terminates), repeated for the second old replica.
  4. Each refresh: observability dashboards must show:
    • tunnel replica count never < 1
    • cloudflared restart frequency normal
    • Worker error rate flat

     Failure of any check pauses the Step Functions state machine and pages on-call.
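
The gating behaviour in steps 3 and 4 reduces to a short loop: refresh one env, run its checks, and stop at the first failure. The sketch below illustrates that control flow only; the real orchestration is the adv-cflared-rolling-update state machine, and the function name and the two injected callables are hypothetical stand-ins for the ASG refresh call and the dashboard checks.

```python
from typing import Callable

ENV_ORDER = ["dev", "stg", "prd"]


def rolling_update(
    refresh: Callable[[str], None],
    checks_pass: Callable[[str], bool],
) -> dict[str, str]:
    """Refresh each env in order; halt at the first env whose checks fail."""
    status = {env: "skipped" for env in ENV_ORDER}
    for env in ENV_ORDER:
        refresh(env)
        if not checks_pass(env):
            status[env] = "failed"
            break  # later envs stay "skipped"; paging happens out of band
        status[env] = "refreshed"
    return status


refreshed: list[str] = []
result = rolling_update(
    refresh=refreshed.append,
    checks_pass=lambda env: env != "stg",  # simulate a stg check failure
)
print(result)  # prints {'dev': 'refreshed', 'stg': 'failed', 'prd': 'skipped'}
```

In this simulated run prd is never touched, which is the property the gating is meant to guarantee.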

Net effect: cloudflared and the Linux OS update weekly, with no operator action and no observable customer impact.

When a vendor-disclosed CVE must be patched before the next scheduled pipeline run:

  1. Trigger Image Builder pipeline manually: aws imagebuilder start-image-pipeline-execution --image-pipeline-arn …
  2. On success, manually advance the SSM param (or wait for the EventBridge handler).
  3. Manually invoke adv-cflared-rolling-update Step Functions execution with input {"acceleration": "fast"} which collapses inter-env wait to 15 minutes.
  4. Total RTO ~ 90 minutes from CVE disclosure to all envs patched.
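
To make the effect of the "fast" acceleration concrete, the sketch below compares the earliest start time per env under both modes. It assumes the wait is measured between consecutive env starts and that each start is still gated on the previous env succeeding; the function name and the example date are illustrative.

```python
from datetime import datetime, timedelta, timezone

ENV_ORDER = ["dev", "stg", "prd"]


def refresh_plan(start: datetime, fast: bool = False) -> dict[str, datetime]:
    """Earliest start time per env, counted from the first (dev) refresh.

    Routine runs space the envs 2 h apart (07:00, 09:00, 11:00 UTC);
    the "fast" acceleration collapses that wait to 15 minutes.
    """
    gap = timedelta(minutes=15) if fast else timedelta(hours=2)
    return {env: start + i * gap for i, env in enumerate(ENV_ORDER)}


kickoff = datetime(2026, 4, 29, 14, 0, tzinfo=timezone.utc)
fast_plan = refresh_plan(kickoff, fast=True)
routine_plan = refresh_plan(kickoff)
```

With the fast plan, prd can start 30 minutes after dev instead of 4 hours after it; the remainder of the ~90-minute RTO is the pipeline build and the refreshes themselves.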

Unplanned: cloudflared running but tunnel reports unhealthy

Health check signal mismatch — daemon up, but tunnel registration broken.

  1. CloudWatch alarm cflared_replica_count_per_tunnel < expected fires.
  2. ASG marks the affected instance unhealthy via custom health check (Lambda evaluates the metric and calls SetInstanceHealth with Unhealthy).
  3. ASG terminates and replaces the instance — fresh boot rebuilds tunnel registration from scratch.
  4. If condition repeats across replacements, page on-call to investigate tunnel-side issues at Cloudflare (likely creds rotated upstream or tunnel deleted).
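
Step 2 can be sketched as a pure decision function that the Lambda would wrap. How the set of registered replicas is obtained is left as an input here (an assumption), and `set_health` is a hypothetical callable; in a real Lambda it would wrap the Auto Scaling SetInstanceHealth API named in step 2.

```python
from typing import Callable


def mark_unregistered(
    asg_instances: list[str],
    registered: set[str],
    set_health: Callable[[str, str], None],
) -> list[str]:
    """Flag in-service instances that have no live tunnel registration."""
    flagged = [i for i in asg_instances if i not in registered]
    for instance_id in flagged:
        # The ASG replaces any instance reported as Unhealthy.
        set_health(instance_id, "Unhealthy")
    return flagged


calls: list[tuple[str, str]] = []
flagged = mark_unregistered(
    asg_instances=["i-0aaa", "i-0bbb"],
    registered={"i-0aaa"},
    set_health=lambda instance_id, status: calls.append((instance_id, status)),
)
print(flagged)  # prints ['i-0bbb']
```

Keeping the decision separate from the API call makes the rule testable without AWS credentials.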

When a new AMI passes Image Builder tests but fails post-deploy in dev (e.g., a cloudflared regression Cloudflare didn’t catch):

  1. ASG auto_rollback = true reverts dev automatically when health checks fail. Step Functions halts the schedule before stg/prd refresh fires.
  2. On-call manually overwrites SSM param to the previous AMI ID: aws ssm put-parameter --name /adventive/cloudflared/ami-id-latest --value <previous> --overwrite.
  3. Manually trigger ASG instance refresh per env to converge on the pinned AMI.
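
Step 2 needs the previous AMI ID. A small sketch of picking it from the retained images, assuming the input is a list shaped like the `Images` entries returned by `aws ec2 describe-images` (`ImageId` plus an ISO 8601 `CreationDate`, which sorts correctly as text). The function name and the sample IDs are hypothetical.

```python
def previous_ami(images: list[dict], current_id: str) -> str:
    """Pick the newest image built before the one currently promoted."""
    ordered = sorted(images, key=lambda img: img["CreationDate"], reverse=True)
    ids = [img["ImageId"] for img in ordered]
    idx = ids.index(current_id)
    if idx + 1 >= len(ids):
        raise LookupError("no older AMI retained to roll back to")
    return ids[idx + 1]


# Placeholder image records for illustration.
retained = [
    {"ImageId": "ami-0003", "CreationDate": "2026-04-28T06:41:12.000Z"},
    {"ImageId": "ami-0001", "CreationDate": "2026-04-14T06:40:02.000Z"},
    {"ImageId": "ami-0002", "CreationDate": "2026-04-21T06:39:55.000Z"},
]
print(previous_ami(retained, "ami-0003"))  # prints ami-0002
```

The returned ID is the value that would go into the `--value` argument of the put-parameter command in step 2.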

Credential rotation, quarterly or post-incident:

  1. cloudflared tunnel token --cred-file new-creds.json adv-aurora-tunnel-{env} from admin shell.
  2. Update Secrets Manager secret in place (aws secretsmanager update-secret --secret-id …).
  3. Trigger ASG instance refresh for that env only (aws autoscaling start-instance-refresh --auto-scaling-group-name adv-cflared-{env}).
  4. New replicas pick up new creds at boot; old replicas drain. The tunnel UUID does not change — Hyperdrive configurations remain valid.

Adding a new internal hostname (e.g., a future analytics replica):

  1. Edit the env’s Secrets Manager secret to append the new ingress block.
  2. Trigger ASG instance refresh for that env.
  3. New replicas come up with the updated config; each cloudflared replica applies the new ingress rules when it starts.

If sustained CPU > 60 % on cloudflared instances (an indicator of high TCP throughput), bump desired_capacity and max_size in Terraform — ASG launches additional replicas; tunnel automatically uses them.



Phase 2 — Hyperdrive provisioning (6 configurations)

One Hyperdrive resource per (cluster, environment). Hyperdrive holds the DB credentials; the Worker never sees them.

Cloudflare reference: Hyperdrive overview · Create with Wrangler · Hyperdrive + Tunnel

# DEV_PASS / STG_PASS / PRD_PASS are placeholders for the read-only user's password.
# For origins reached through a tunnel, Cloudflare's Hyperdrive + Tunnel guide
# also calls for Access service-token flags; confirm the exact flags there
# before running these commands.

# Console cluster: three environments
wrangler hyperdrive create adv-svc-public-api-console-dev \
  --connection-string="mysql://ro_user:DEV_PASS@aurora-console-dev.internal.adventive.com:3306/console"
wrangler hyperdrive create adv-svc-public-api-console-stg \
  --connection-string="mysql://ro_user:STG_PASS@aurora-console-stg.internal.adventive.com:3306/console"
wrangler hyperdrive create adv-svc-public-api-console-prd \
  --connection-string="mysql://ro_user:PRD_PASS@aurora-console-prd.internal.adventive.com:3306/console"

# Aggregate cluster: three environments
wrangler hyperdrive create adv-svc-public-api-aggregate-dev \
  --connection-string="mysql://ro_user:DEV_PASS@aurora-aggregate-dev.internal.adventive.com:3306/aggregate_ro"
wrangler hyperdrive create adv-svc-public-api-aggregate-stg \
  --connection-string="mysql://ro_user:STG_PASS@aurora-aggregate-stg.internal.adventive.com:3306/aggregate_ro"
wrangler hyperdrive create adv-svc-public-api-aggregate-prd \
  --connection-string="mysql://ro_user:PRD_PASS@aurora-aggregate-prd.internal.adventive.com:3306/aggregate_ro"

Each command returns a Hyperdrive ID. Capture all six and update wrangler.toml in the repo, replacing the six REPLACE_WITH_*_ID placeholders. Commit with message: chore(infra): wire Hyperdrive IDs for console + aggregate (dev/stg/prd).

Validation:

  • wrangler hyperdrive list returns six configurations.
  • A throwaway Worker with DB_CONSOLE and DB_AGGREGATE bindings can SELECT 1 against each — proves end-to-end (Worker → Hyperdrive → Tunnel → Aurora).
  • wrangler.toml no longer contains REPLACE_WITH_*_ID placeholders; wrangler deploy --dry-run --env dev passes.
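
The placeholder check in the last bullet is easy to automate before a dry-run deploy. A minimal sketch: the regular expression follows the REPLACE_WITH_*_ID pattern named above, while the function name, the exact placeholder spelling in the sample, and the sample Hyperdrive ID are illustrative.

```python
import re

# Matches tokens such as REPLACE_WITH_CONSOLE_DEV_ID.
PLACEHOLDER = re.compile(r"REPLACE_WITH_\w+_ID")


def unresolved_placeholders(wrangler_toml: str) -> list[str]:
    """Return every REPLACE_WITH_*_ID token still present, in file order."""
    return PLACEHOLDER.findall(wrangler_toml)


# Illustrative fragment; binding names come from this chapter, IDs are made up.
sample_toml = """\
[[env.dev.hyperdrive]]
binding = "DB_CONSOLE"
id = "REPLACE_WITH_CONSOLE_DEV_ID"

[[env.dev.hyperdrive]]
binding = "DB_AGGREGATE"
id = "0123456789abcdef0123456789abcdef"
"""
print(unresolved_placeholders(sample_toml))  # prints ['REPLACE_WITH_CONSOLE_DEV_ID']
```

An empty list means all six IDs have been wired in.
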
| Pillar | Treatment |
| --- | --- |
| Redundancy | Hyperdrive itself is a globally-distributed pool. Pair with the dual-AZ tunnel (Phase 1) for end-to-end redundancy. |
| Resiliency | Pool absorbs cold-start cost and handles transient DB blips. Set application-level timeout < 5 s; let Hyperdrive surface back-pressure. |
| Disaster Recovery | Hyperdrive can be re-provisioned in minutes using the captured wrangler hyperdrive create commands. Aurora point-in-time recovery is the upstream DR control. |
| Backup | Aurora automated backups + manual snapshot prior to first prod traffic. Hyperdrive stores no data; credentials are the only artifact (kept in Cloudflare Secrets Store). |
| Deployment Strategy | Hyperdrive ID changes are environment-scoped in wrangler.toml. Treat ID rotation (e.g., credential rotation) as a wrangler deploy --env <env> event with version routing fallback. |
| Observability | Enable Hyperdrive metrics in the Cloudflare dashboard; ship to New Relic via Logpush. Alarms: pool-acquire latency p95 > 100 ms, error rate > 0.5%. |

Phase 3 — Auth helper Worker (adv-svc-auth-helper)

The auth helper is a prerequisite for the public API Worker — Phase 4 cannot dry-run-deploy until the helper service exists at all three names.

Cloudflare reference: Service bindings · KV namespaces

  1. Create a sibling repo (or sub-folder under a services/ monorepo — Jeffrey’s call): Adventive/adventive-auth-helper-worker. Same TypeScript/Hono pattern as the public API Worker.

  2. Provision three KV namespaces:

    wrangler kv namespace create kv-adv-svc-auth-helper-cache-dev
    wrangler kv namespace create kv-adv-svc-auth-helper-cache-stg
    wrangler kv namespace create kv-adv-svc-auth-helper-cache-prd
  3. Bindings (per env): CACHE (KV) + DB_CONSOLE (Hyperdrive — same console cluster as the public API).

  4. Implement: accept X-Api-Key + X-Integration-Key → 5-min KV cache lookup → on miss, query api table via DB_CONSOLE → return { valid: boolean, accountId: number, rph: number }.

  5. Smoke endpoint: GET /__health returns 200 with COMMIT_SHA.

  6. Deploy all three: wrangler deploy --env dev, --env stg, --env prd. Verify each wrangler tail --env <env> is quiet.

  7. From a one-off Worker (or curl to a private deploy URL), confirm a real key returns valid: true against each environment.
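
The lookup flow in step 4 is sketched below in Python purely to show the control flow; the helper itself is a TypeScript/Hono Worker. The class and parameter names are hypothetical, an in-memory dict stands in for the KV namespace, and `lookup` stands in for the query against the api table. Caching negative results, the zeroed fields for an invalid key, and the 503 body are assumptions; the 503 status itself follows the Resiliency rule for this phase.

```python
import time
from typing import Callable, Optional

CACHE_TTL_SECONDS = 300  # the 5-minute cache window from step 4


class AuthHelper:
    def __init__(
        self,
        lookup: Callable[[str, str], Optional[dict]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup  # stand-in for the api-table query
        self._clock = clock
        self._cache: dict[tuple[str, str], tuple[float, dict]] = {}

    def check(self, api_key: str, integration_key: str) -> tuple[int, dict]:
        """Return (HTTP status, body) for one key pair."""
        cache_key = (api_key, integration_key)
        now = self._clock()
        hit = self._cache.get(cache_key)
        if hit is not None and now < hit[0]:
            return 200, hit[1]
        try:
            row = self._lookup(api_key, integration_key)
        except ConnectionError:
            # DB unreachable: 503 so callers retry rather than treat it as a bad key.
            return 503, {"error": "auth backend unavailable"}
        if row is None:
            body = {"valid": False, "accountId": 0, "rph": 0}
        else:
            body = {"valid": True, "accountId": row["accountId"], "rph": row["rph"]}
        self._cache[cache_key] = (now + CACHE_TTL_SECONDS, body)
        return 200, body


lookups: list[str] = []


def fake_lookup(api_key: str, integration_key: str) -> Optional[dict]:
    lookups.append(api_key)
    return {"accountId": 42, "rph": 1000} if api_key == "good" else None


now = [0.0]
helper = AuthHelper(fake_lookup, clock=lambda: now[0])
first = helper.check("good", "int")
second = helper.check("good", "int")  # served from cache, no second lookup
now[0] = 301.0
third = helper.check("good", "int")  # cache expired, lookup runs again
```

After these three calls `lookups` holds two entries, which shows that the second call never reached the database.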

| Pillar | Treatment |
| --- | --- |
| Redundancy | Worker is globally replicated by default. KV is multi-region eventually-consistent, which is acceptable for a 5-min TTL auth cache. |
| Resiliency | On Hyperdrive miss + KV miss + DB unreachable, return 503 (not 401) so callers retry rather than treating an outage as auth failure. |
| Disaster Recovery | KV caches can be flushed and rebuilt in minutes. Helper is the smallest stateful surface; document its restoration as < 10 min. |
| Backup | KV cache is regenerable from the api table, so no separate backup is needed. The api table itself is covered by Aurora backup. |
| Deployment Strategy | Use wrangler deploy --env <env> with version routing; promote 0 % → 10 % → 100 % over a 30-min window for prd. |
| Observability | Emit structured logs (auth.lookup, auth.cache.hit, auth.cache.miss, auth.failure). Forward to New Relic via tail Worker. Alert on cache-miss rate > 25 % over 5 min (suggests rotation event or KV outage). |

This is the entry point for the Claude Code session. The kickoff prompt is handoff/CLAUDE_CODE_KICKOFF.md. Cowork’s job ends here; Claude Code drives.

  1. Confirm ~/Repositories/GitHub/Adventive/adventive-public-api-worker/ has git init and that CLAUDE.md + PLAN.md are in the repo root (already true as of 2026-04-28).

  2. Install dependencies declared in package.json:

    cd ~/Repositories/GitHub/Adventive/adventive-public-api-worker
    npm install
  3. Author tsconfig.json, vitest.config.ts, .dev.vars.example, .gitignore, .editorconfig, eslint.config.js, prettier.config.cjs to project conventions.

  4. Author openapi.yaml (OpenAPI 3.1) covering every endpoint in the migration map, including /v1.0/* and /v2.0/* aliases. Keep field names byte-identical to the PHP responses.

  5. Generate types: npx openapi-typescript openapi.yaml -o src/types/api.ts.

  6. Stub src/index.ts with a Hono app exposing /__health, /openapi.yaml, and /docs (Redoc). Wire the env interface in src/lib/env.ts against the bindings declared in wrangler.toml.

  7. Validate the scaffold: npm run typecheck && npm run dry-run:dev.

  • openapi.yaml complete and lint-clean.
  • src/index.ts + src/lib/env.ts exporting a typed Hono app.
  • Three green dry-run deploys (dev, stg, prd).
| Pillar | Treatment |
| --- | --- |
| Redundancy | n/a (no traffic yet). |
| Resiliency | Establish error envelopes in src/lib/response.ts so every handler returns the same JSON error shape, giving callers uniform retry semantics. |
| Disaster Recovery | git init + signed-commit hooks ensure the repo is recoverable from any developer’s local clone. Push to GitHub gated to end of Phase 4 once wrangler deploy --dry-run is green. |
| Backup | GitHub is the source of truth once pushed; until then, the local repo is single-host, so keep an additional clone on iCloud-synced disk. |
| Deployment Strategy | No live deploys. Every commit must satisfy npm run typecheck && npm run dry-run:{dev,stg,prd} before merge. |
| Observability | Add src/lib/logger.ts (structured JSON, fields: request_id, account_id, endpoint, latency_ms, commit_sha); every later phase emits through it. |
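A structured logger with the fields listed in the Observability row might look roughly like this. It is a sketch under assumptions: `createLogger`, the `level` and `event` arguments, and the extra `ts` timestamp field are illustrative and not part of the plan.

```typescript
// Field names come from the Observability row; everything else is a hypothetical shape.
interface LogFields {
  request_id: string;
  account_id: number | null; // null before authentication has run
  endpoint: string;
  latency_ms: number;
  commit_sha: string;
}

type Level = "info" | "warn" | "error";

// Returns a log function bound to one commit SHA.
// The sink is injectable so tests can capture lines instead of printing them.
function createLogger(commitSha: string, sink: (line: string) => void = console.log) {
  return function log(
    level: Level,
    event: string,
    fields: Omit<LogFields, "commit_sha">,
  ): void {
    // Assumption: a `ts` timestamp is added for convenience; the plan does not list it.
    const record = { level, event, ...fields, commit_sha: commitSha, ts: new Date().toISOString() };
    sink(JSON.stringify(record)); // one JSON object per line
  };
}

// Usage: bind once per Worker instance, then emit from handlers.
const exampleLog = createLogger("0000000", () => {});
exampleLog("info", "request.done", {
  request_id: "req-1",
  account_id: null,
  endpoint: "/__health",
  latency_ms: 3,
});
```

Binding `commit_sha` once keeps every line attributable to a deploy without each handler having to pass it.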

Phase 5 — Core middleware (PAUSE checkpoint)


After Phase 5 ships, Claude Code stops and waits for an explicit instruction to continue.

Cloudflare reference: Durable Objects · DO alarms · Hyperdrive client libs

  1. src/lib/db.ts: Hyperdrive connection factory using mysql2/promise. Two functions: getConsole(env) and getAggregate(env). Both return typed connections with a 5-second connect timeout and a 10-second query timeout. Both connections are closed inside ctx.waitUntil once the response has been returned.
  2. src/lib/auth.ts: Thin wrapper around the AUTH service binding. Throws HttpError(401) on valid: false. Caches the result on c.set('auth', …) for the request scope.
  3. src/durable-objects/RateLimiter.ts: DO class with check(key, rph) RPC. Increments hourly counter; returns 429 metadata when count > rph. Uses state.storage.setAlarm(nextHourTopUTC) to self-reset.
  4. CORS middleware: Hono cors() matched against the existing CORS_ALLOWED_ORIGINS var; preserve current allowed-headers list exactly.
  5. src/lib/response.ts: jsonResponse, xmlResponse, csvResponse, HttpError. Field name preservation is a contract; no camelCase renaming.
  6. src/lib/validation.ts: Date defaults (last 30 days, ET via Intl.DateTimeFormat), format, data_connector, removeZeros, advertiser_id, from, to, version.
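The date defaults in item 6 could be computed along these lines. This is only a sketch: `easternDate` and `defaultDateRange` are hypothetical names, and reading "last 30 days" as "today in ET minus 30 calendar days" is an assumption that should be checked against the PHP origin.

```typescript
// Returns the Eastern Time calendar date for a given instant, as YYYY-MM-DD.
function easternDate(instant: Date): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone: "America/New_York",
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(instant);
  const pick = (type: string): string => parts.find((p) => p.type === type)?.value ?? "";
  return `${pick("year")}-${pick("month")}-${pick("day")}`;
}

// Default window: `to` is today in ET, `from` is 30 calendar days earlier.
// Assumption: the PHP origin may count the window inclusively; verify against fixtures.
function defaultDateRange(now: Date = new Date()): { from: string; to: string } {
  const to = easternDate(now);
  // Do the day arithmetic on a UTC-midnight date built from the ET calendar date,
  // so daylight-saving transitions cannot shift the result by a day.
  const [y, m, d] = to.split("-").map(Number);
  const fromUtc = new Date(Date.UTC(y, m - 1, d - 30));
  return { from: fromUtc.toISOString().slice(0, 10), to };
}
```

For the instant 2026-04-28T02:30:00Z, which is still the evening of April 27 in Eastern Time, this returns from 2026-03-28 and to 2026-04-27.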

Per-commit gate:

npm run typecheck && npm run test && \
npm run dry-run:dev && npm run dry-run:stg && npm run dry-run:prd

Claude Code posts a summary message and waits. The summary must include the gate results, the test count, the Worker bundle size for each env, and any TODO comments left in source.

| Pillar | Treatment |
| --- | --- |
| Redundancy | DO storage is single-region per object; for a per-key counter that is correct (one counter, one home). KV (RATE_LIMIT_KV) is the documented fallback if DO becomes unavailable. |
| Resiliency | DO alarm() is the hourly reset mechanism, so no cron worker is needed. Failure to set an alarm degrades to “rolling 1-hour windows enforced lazily on next request.” |
| Disaster Recovery | DOs survive worker version rollback; the [[migrations]] tag in wrangler.toml is the contract. Never delete a class without a migration tag. |
| Backup | Counters are ephemeral; loss is acceptable (worst case: a key gets one extra free hour). No backup needed. Document this as a known one-time behavior change at cutover. |
| Deployment Strategy | DO migrations are global; treat them as schema changes, with a separate PR and a separate review. Use Wrangler version routing on first deploy after a migration. |
| Observability | DO log records carry state.id, class_name, key_hash, count_after, and outcome (allowed/throttled). Forward via tail Worker → New Relic. Alert on throttle rate > 5 %. |
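The counter and reset logic behind check(key, rph) can be sketched without the Durable Object runtime. The `CounterStore` interface below is a simplified, synchronous stand-in (real DO storage is asynchronous), and `onAlarm` is a hypothetical name for what the alarm handler would do.

```typescript
// Simplified stand-in for Durable Object storage; a Map<string, number> satisfies it.
interface CounterStore {
  get(key: string): number | undefined;
  set(key: string, value: number): unknown;
  clear(): void;
}

interface RateDecision {
  allowed: boolean;
  count: number;
  resetAt: number; // epoch ms of the next top of the hour (UTC)
}

const HOUR_MS = 60 * 60 * 1000;

// Epoch milliseconds for the start of the next UTC hour.
function nextHourTopUTC(nowMs: number): number {
  return Math.floor(nowMs / HOUR_MS) * HOUR_MS + HOUR_MS;
}

// Increments the hourly counter; the request is throttled once count exceeds rph.
function check(store: CounterStore, key: string, rph: number, nowMs: number): RateDecision {
  const count = (store.get(key) ?? 0) + 1;
  store.set(key, count);
  return { allowed: count <= rph, count, resetAt: nextHourTopUTC(nowMs) };
}

// What the alarm handler would do at the top of the hour: drop every counter.
function onAlarm(store: CounterStore): void {
  store.clear();
}
```

In the real class, `resetAt` is the value that would be passed to state.storage.setAlarm, and it also gives callers a natural value for a retry hint in the 429 metadata.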

Phase 6 — Handler migration (PAUSE checkpoint)


Migrate one handler at a time, simplest first. Each handler ships with unit tests, a smoke probe, and a New Relic custom event.

  1. credentialscheck: auth-only; proves the service binding chain.
  2. advertisers: DB_CONSOLE; two queries.
  3. campaigns: DB_CONSOLE; nested queries (sites, placements, ad units, delivery groups).
  4. analytics (v1 + v2): DB_AGGREGATE; dual-query (kpi + eng/quartile in v2).
  5. clickthroughs: DB_AGGREGATE; same aggregate pattern.
  6. connector: wraps analytics aggregated per advertiser.
  7. dataconnector: account-wide scope + convertForConnector / convertForConnectorNew. Hardest; ship last.
  • Field-for-field response equivalence with the PHP origin (use tests/fixtures/php-responses/*.json captured during plan authoring).
  • Versioned route registration: register /handler, /v1.0/handler, and /v2.0/handler wherever the version param is supported.
  • Format multiplexing via format=json|xml|csv; php and serialized return 410 Gone with a documentation link.
  • Handler-level rate-limit check via the RateLimiter DO before any DB call.
  • Logger emits endpoint, version, format, account_id, query_ms, total_ms, cached (always false for v1 — placeholder for future).
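The route-registration and format rules above could be expressed as two small pure functions. This is a sketch: `resolveFormat` and `routeVariants` are hypothetical names, and both the json default for a missing parameter and the 400 for an unrecognised value are assumptions the plan does not spell out.

```typescript
type Format = "json" | "xml" | "csv";

type FormatDecision =
  | { status: 200; format: Format }
  | { status: 410; message: string }
  | { status: 400; message: string };

const RETIRED_FORMATS: string[] = ["php", "serialized"];
const SUPPORTED_FORMATS: Format[] = ["json", "xml", "csv"];

// Resolves the `format` query parameter.
// Assumptions: a missing parameter defaults to json; an unknown value is a 400.
function resolveFormat(raw: string | undefined, docsUrl: string): FormatDecision {
  const value = (raw ?? "json").toLowerCase();
  if (RETIRED_FORMATS.includes(value)) {
    // Retired formats answer 410 Gone with a pointer to the documentation.
    return { status: 410, message: `format=${value} is no longer supported; see ${docsUrl}` };
  }
  const match = SUPPORTED_FORMATS.find((f) => f === value);
  return match !== undefined
    ? { status: 200, format: match }
    : { status: 400, message: `unknown format: ${value}` };
}

// Expands one handler name into every path it should be registered under.
function routeVariants(handler: string, supportsVersion: boolean): string[] {
  const base = `/${handler}`;
  return supportsVersion ? [base, `/v1.0${base}`, `/v2.0${base}`] : [base];
}
```

Keeping both as pure functions lets the parity matrix (endpoints × formats × versions) be generated from the same code that registers the routes.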
npm run lint && npm run typecheck && npm run test && \
npm run dry-run:dev && npm run dry-run:stg && npm run dry-run:prd

After all seven handlers are merged, Claude Code summarizes endpoint parity (matrix of endpoints × formats × versions × auth states) and waits.

| Pillar | Treatment |
| --- | --- |
| Redundancy | Workers run globally; handlers are stateless except for the DO call. |
| Resiliency | Wrap every Hyperdrive call in a 10-second budget and a 1-retry policy with jittered backoff. Surface 504 on exceeded budget rather than hanging. |
| Disaster Recovery | Each handler must be independently deployable; keep coupling (shared state, shared mutable globals) at zero so a buggy handler can be reverted without touching the rest. |
| Backup | Snapshot a sample of PHP responses per endpoint in tests/fixtures/. These become the regression contract for the lifetime of the Worker. |
| Deployment Strategy | Handlers ship behind feature flags read from [vars] (e.g., HANDLER_ANALYTICS_V2_ENABLED). Cutover for a handler is a flag flip + Worker version pin, not a code change. |
| Observability | Per-handler New Relic custom event includes the response shape hash; mismatches against the PHP baseline alert immediately during dual-run. |
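The Resiliency row (10-second budget, one retry, jittered backoff, 504 on overrun) could be wrapped in a helper like the one below. It is a sketch: `withBudget`, `BudgetExceededError`, and the 200 ms base delay are illustrative choices, and the helper does not cancel the underlying query when the budget expires.

```typescript
// Error type the caller can map to a 504 response.
class BudgetExceededError extends Error {
  readonly status = 504;
  constructor(budgetMs: number) {
    super(`query budget of ${budgetMs} ms exceeded`);
    this.name = "BudgetExceededError";
  }
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `op` with at most one retry, all inside a single time budget.
// Hypothetical helper: names and the default base delay are assumptions.
async function withBudget<T>(
  op: () => Promise<T>,
  budgetMs: number = 10000,
  baseDelayMs: number = 200,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new BudgetExceededError(budgetMs)), budgetMs);
  });

  const attempts = async (): Promise<T> => {
    try {
      return await op();
    } catch {
      // Full jitter: wait a random delay in [0, baseDelayMs) before the single retry.
      await sleep(Math.random() * baseDelayMs);
      return op();
    }
  };

  try {
    // Whichever settles first wins; the budget covers both attempts and the backoff.
    return await Promise.race([attempts(), deadline]);
  } finally {
    clearTimeout(timer);
  }
}
```

Sharing one budget across both attempts keeps the worst-case handler latency bounded at the budget itself, not at twice the per-query timeout.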

Phase 7 — CI / CD pipeline and smoke tests


GitHub Actions is wired to the repo, which is created at the start of this phase, not earlier.

Cloudflare reference: Wrangler in CI · Gradual deployments

  1. Create the GitHub repo Adventive/adventive-public-api-worker. Add the remote: git remote add origin git@github.com:Adventive/adventive-public-api-worker.git. Push main.
  2. Configure branch protection: required checks (lint, typecheck, test, dry-run:dev, dry-run:stg, dry-run:prd), required code owner review, dismiss stale reviews on push.
  3. Add .github/workflows/ci.yml with three jobs:
    • pr-gate (on pull_request): lint → typecheck → test → secret scan → all three dry-runs.
    • deploy-stg (on push to main): wrangler deploy --env stg, then scripts/smoke.sh https://api.adventivestg.com.
    • deploy-prd (on git tag v*.*.*): wrangler deploy --env prd, then scripts/smoke.sh https://api.adventive.com. Use gradual deployment: 10 % → 25 min wait → 100 %.
  4. Author scripts/smoke.sh. Required probes: GET /__health → 200, GET /credentialscheck (valid creds) → { status: true }, GET /credentialscheck (invalid) → 401, GET /advertisers (valid) → 200 with at least one row.
  5. Wire the Cloudflare API token + account ID as GitHub repo secrets: CF_API_TOKEN, CF_ACCOUNT_ID. Restrict the token scope per pre-flight.
  6. Set up Wrangler tail integration with New Relic by deploying a tail Worker (adv-svc-public-api-tail-{env}) that POSTs to NR Logs API.
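The probe set in step 4 is easier to review as a table of expected statuses. The sketch below is written in TypeScript for consistency with the rest of the repo, although scripts/smoke.sh itself is a shell script. `buildProbes`, `runSmoke`, and `FetchLike` are hypothetical names, the credential headers are supplied by the caller, a 200 for valid credentialscheck is assumed, and the body checks ({ status: true }, at least one row) are left out.

```typescript
interface Probe {
  name: string;
  path: string;
  headers?: Record<string, string>;
  expectStatus: number;
}

// Minimal fetch shape so a fake can be injected in tests.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ status: number }>;

// Probe table mirroring step 4. Only status codes are checked in this sketch.
function buildProbes(valid: Record<string, string>, invalid: Record<string, string>): Probe[] {
  return [
    { name: "health", path: "/__health", expectStatus: 200 },
    // Assumption: valid credentials answer 200; the plan only states the body.
    { name: "credentialscheck valid", path: "/credentialscheck", headers: valid, expectStatus: 200 },
    { name: "credentialscheck invalid", path: "/credentialscheck", headers: invalid, expectStatus: 401 },
    { name: "advertisers valid", path: "/advertisers", headers: valid, expectStatus: 200 },
  ];
}

// Runs every probe in order and returns the names of those whose status did not match.
async function runSmoke(baseUrl: string, probes: Probe[], fetchFn: FetchLike): Promise<string[]> {
  const failures: string[] = [];
  for (const probe of probes) {
    const res = await fetchFn(baseUrl + probe.path, { headers: probe.headers ?? {} });
    if (res.status !== probe.expectStatus) failures.push(probe.name);
  }
  return failures;
}
```

Note that the invalid-credentials probe passes on 401, so a smoke run is judged per probe against its expected status, not on a blanket "everything returns 200" rule.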
| Pillar | Treatment |
| --- | --- |
| Redundancy | Two-tier deploy (stg → prd) ensures any regression is caught before production. Gradual deployment splits traffic between the previous and current version. |
| Resiliency | Smoke tests run post-deploy and fail the workflow when any probe returns an unexpected status (the invalid-credentials probe expects 401; the others expect 200). Alert if smoke fails twice in a row in stg, which often signals an upstream problem rather than a code defect. |
| Disaster Recovery | Rollback procedure: wrangler rollback --env <env> (uses Cloudflare’s deployment history). Document RTO of 5 min for any single environment. Archive the previous tag’s bundle in R2 as belt-and-braces. |
| Backup | GitHub repo is the source of truth. Tag every prod release. Mirror main to an internal Cloudflare R2 bucket nightly via Action (regenerable, but cheap insurance). |
| Deployment Strategy | Gradual deployment for prd is mandatory: 10 % canary → 25 min health window → 100 %. CI auto-aborts on smoke failure. |
| Observability | New Relic deploy markers fire on every successful deploy event. Synthetic probes (every 60 s) hit /__health and /credentialscheck against all three envs. |

Tier-2 minimum soak: 2 hours on staging before any prod promotion.

  1. Run a load profile against api.adventivestg.com matching the production p50 / p95 of the legacy API for the last 14 days. Source: New Relic transaction trace export.
  2. Compare response shapes endpoint-by-endpoint against the legacy PHP stage in real time (the dual-run reconciler from tests/parity/).
  3. Watch dashboards for the full 2-hour window:
    • Hyperdrive p95 acquire latency
    • DO RateLimiter invocation count and error rate
    • Worker CPU time p95 (must stay < 50 ms)
    • Auth helper cache-hit ratio (must stay > 70 %)
    • Tunnel Up status (must remain Up continuously)
  4. Run the Google Data Studio connector against api.adventivestg.com /dataconnector and confirm schema parity vs. legacy.
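The pass/fail thresholds from the dashboard list in step 3 can be encoded so the soak verdict is mechanical. This is a sketch with hypothetical names (`SoakSample`, `soakBreaches`, `soakPassed`); Hyperdrive acquire latency and the DO error rate are omitted because the list above gives no numeric limit for them.

```typescript
// One dashboard reading during the soak window.
interface SoakSample {
  workerCpuP95Ms: number;
  authCacheHitRatio: number; // fraction between 0 and 1
  tunnelUp: boolean;
}

// Returns the thresholds breached by one sample; an empty list means healthy.
function soakBreaches(s: SoakSample): string[] {
  const breaches: string[] = [];
  if (s.workerCpuP95Ms >= 50) breaches.push("worker CPU p95 not below 50 ms");
  if (s.authCacheHitRatio <= 0.7) breaches.push("auth cache-hit ratio not above 70 %");
  if (!s.tunnelUp) breaches.push("tunnel not Up");
  return breaches;
}

// The soak passes only if there is at least one sample and every sample is clean.
function soakPassed(samples: SoakSample[]): boolean {
  return samples.length > 0 && samples.every((s) => soakBreaches(s).length === 0);
}
```

Treating an empty sample list as a failure guards against declaring the soak green when the dashboards were simply not recording.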
Cutover steps:

  1. Confirm legacy origin TTL on api.adventive.com is ≤ 300 s (set in Phase 0).
  2. Tag a release: git tag v1.0.0 && git push origin v1.0.0. CI deploys prd at 10 % canary → 25 min → 100 %.
  3. Flip the Cloudflare route for api.adventive.com/* from the legacy origin to adv-svc-public-api-prd.
  4. Watch live dashboards for 60 minutes. Compare against the soak baseline.
  5. Keep the legacy CodeIgniter server running for 48 hours as fallback. Do not stop the PHP-FPM service.
  6. After the 48-hour fallback, decommission the legacy origin per the runbook (Chapter 04). Update decisions/2026-MM-DD-cutover.md with final timestamps.
Rollback steps:

  1. Flip the Cloudflare route back to the legacy origin (single dashboard click).
  2. Run wrangler rollback --env prd to return to the previous version (5 minutes).
  3. Open an incident; do not re-attempt cutover until the trigger is root-caused.
| Pillar | Treatment |
| --- | --- |
| Redundancy | Two parallel serving paths exist for 48 hours: legacy origin (warm fallback) and Worker (active). Rollback is a route flip. |
| Resiliency | Cutover happens during a low-traffic window (Tuesday 14:00–16:00 ET historically lowest per New Relic). 25-min canary phase absorbs unexpected regressions. |
| Disaster Recovery | Documented rollback steps above. RTO 5 min (route flip + rollback). RPO 0 (Worker is stateless except for DO counters, which are explicitly accepted as resettable at cutover). |
| Backup | Legacy origin remains running for 48 h. Aurora point-in-time recovery covers the underlying data. R2 nightly mirror of main covers code. |
| Deployment Strategy | Gradual deployment (10 % → 100 %) layered over DNS cutover. Two control surfaces: Worker version (per-request) and DNS route (per-region edge cache). |
| Observability | Dashboard kit pinned for the 48 h fallback window: Worker CPU, error rate, auth cache hit, Hyperdrive latency, tunnel up/down, NR synthetic probe. Page on any sustained breach > 2 min. |

| Plan phase | Owner | Implementation environment | Output artifact | Pillar emphasis |
| --- | --- | --- | --- | --- |
| 0 | Jeffrey | PHP repo (existing) | rotated creds, removed dead config, git tag | DR (anchor tag) |
| 1 | Jeffrey | AWS Image Builder + Cloudflare dashboard | Image Builder pipeline, 3 ASGs, 3 tunnels, ADR | Redundancy (dual-AZ prd) + Deployment (rolling AMI refresh) |
| 2 | Jeffrey | Wrangler CLI | 6 Hyperdrive IDs, wrangler.toml updated | Backup (creds in Cloudflare Secrets Store) |
| 3 | Jeffrey | New repo + Wrangler | auth-helper Worker × 3 envs | Resiliency (5xx on infra failure) |
| 4 | Claude Code | Local repo | scaffolded src/, openapi.yaml, dry-runs green | Observability (logger contract) |
| 5 | Claude Code | Local repo | middleware + DO + tests; PAUSE | Resiliency (timeouts, error envelope) |
| 6 | Claude Code | Local repo | 7 handlers + tests; PAUSE | Backup (PHP response fixtures) |
| 7 | Claude Code | GitHub Actions | CI green; gradual deploys configured | Deployment (10 % → 100 %) |
| 8 | Jeffrey | Cloudflare + AWS | DNS cutover; 48 h fallback | DR (rollback ≤ 5 min) |

Verification gates (must pass before next phase begins)

  • After Phase 0: git grep data_warehouse empty; tag pushed.
  • After Phase 1: mysql -h aurora-{cluster}-{env}.internal.adventive.com succeeds from a peer EC2.
  • After Phase 2: wrangler hyperdrive list shows six configs; throwaway Worker SELECT 1 passes for all six.
  • After Phase 3: curl https://adv-svc-auth-helper-{env}.adventive.workers.dev/__health returns 200 across dev/stg/prd.
  • After Phase 4: npm run dry-run:{dev,stg,prd} all green; openapi.yaml lints clean.
  • After Phase 5 (PAUSE): unit tests cover auth, db, ratelimit, response, validation; DO migration tag committed.
  • After Phase 6 (PAUSE): parity matrix shows 100 % field-for-field equivalence with PHP fixtures.
  • After Phase 7: main branch protected; CI runs all gates; smoke.sh green against stg.
  • After Phase 8: Soak baseline captured; cutover executed; 48 h fallback window logged in ADR.

Consult these before executing any phase. The list below mirrors the canonical reference memory at reference_cloudflare_documentation.md.

| Topic | URL |
| --- | --- |
| Workers | https://developers.cloudflare.com/workers/ |
| Wrangler CI/CD | https://developers.cloudflare.com/workers/wrangler/ci-cd/ |
| Versions and gradual deployments | https://developers.cloudflare.com/workers/configuration/versions-and-deployments/gradual-deployments/ |
| Hyperdrive | https://developers.cloudflare.com/hyperdrive/ |
| Hyperdrive + private DB | https://developers.cloudflare.com/hyperdrive/configuration/connect-to-private-database/ |
| Tunnel | https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/ |
| Tunnel TCP routing | https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/use-cases/tcp/ |
| Durable Objects | https://developers.cloudflare.com/durable-objects/ |
| DO alarms | https://developers.cloudflare.com/durable-objects/api/alarms/ |
| KV | https://developers.cloudflare.com/kv/ |
| Service bindings | https://developers.cloudflare.com/workers/runtime-apis/bindings/service-bindings/ |
| Workers Logs / Tail | https://developers.cloudflare.com/workers/observability/logs/ |
| Logpush | https://developers.cloudflare.com/logs/logpush/ |
| Workers Analytics Engine | https://developers.cloudflare.com/analytics/analytics-engine/ |
| Secrets Store | https://developers.cloudflare.com/secrets-store/ |
| Terraform provider | https://developers.cloudflare.com/terraform/ |
  • decisions/2026-MM-DD-tunnel-topology.md — single vs dual-AZ per env (Phase 1 deliverable).
  • decisions/2026-MM-DD-rate-limit-fallback.md — confirm whether RATE_LIMIT_KV fallback is wired or held in reserve (Phase 5).
  • decisions/2026-MM-DD-cutover.md — actual cutover timestamps, soak metrics, fallback decommission timestamp (Phase 8).
  • decisions/2026-MM-DD-terraform-import.md — when to import Hyperdrive + Tunnel into Terraform (deferred from Phase 1; revisit after first prod deploy).