06 — Development Implementation Steps

Version 1.2 · 2026-04-28 · Adventive Platform Engineering · Confidential

Operational, command-level walkthrough for executing the signed plan (Adventive_Public_API_Cloudflare_Migration_Plan.pdf, v2, 2026-04-22). Each phase is treated as a discrete unit of work with prerequisites, commands, deliverables, validation, and a six-pillar design callout (Redundancy · Resiliency · Disaster Recovery · Backup · Deployment Strategy · Observability).

Changelog:
v1.2 (2026-04-28): Sizing revised: stg drops to 1 instance / 1 AZ (parity with dev); only prd retains multi-AZ + 2 replicas. Added §1.1.1 with concrete, ready-to-commit Image Builder component YAML and an aws-ia/ec2-image-builder Terraform reference so the pipeline build is copy-paste, not from-scratch.
v1.1 (2026-04-28): Phase 1 rewritten: hand-launched EC2 + manual cloudflared install replaced with EC2 Image Builder + Auto Scaling Group + replica HA. Added “Tunnel maintenance and updates” section covering zero-downtime rolling updates for cloudflared and the underlying Linux. Tunnel-per-environment confirmed as a locked decision with rationale.
v1.0 (2026-04-28): Initial implementation walkthrough.

Document scope

This chapter does not change the plan. It is the implementation surface for Phases 0 through 8 against the existing repository:

Local repo: /Users/jlambert/Repositories/GitHub/Adventive/adventive-public-api-worker/
Intended GitHub slug: Adventive/adventive-public-api-worker
Implementation driver: Claude Code (Cowork’s role ends with this chapter)
Plan reference: ../public-api-cf-migration/Adventive_Public_API_Cloudflare_Migration_Plan.pdf
Workers SOP: ../../platform/cloudflare/workers/Adventive_Cloudflare_Workers_SOP.pdf

If a step in this chapter conflicts with the signed plan, the plan wins. Open an ADR under decisions/ and update both before proceeding.

How to use this chapter

Phases run in order. Each phase has explicit prerequisites; do not skip.
Phases 0–3 and 8 are executed by Jeffrey directly (infrastructure, provisioning, cutover). Phases 4–7 are executed in Claude Code from the handoff package (handoff/CLAUDE_CODE_KICKOFF.md).
Mandatory pause checkpoints: end of Phase 5 and end of Phase 6. Claude Code stops and waits for continue.
Treat every command shown as a first pass; verify it against the current Cloudflare docs (linked inline) before executing in production.

Pre-flight prerequisites (one-time)

The following must be true before Phase 0 begins. Treat this as a gate.

Check	Verification	Owner
Cloudflare account on Workers Paid plan	dashboard → Workers → Plans (Hyperdrive + Durable Objects require Paid)	Jeffrey
All three zones active in Cloudflare	`adventive.dev`, `adventivestg.com`, `adventive.com`	Jeffrey
`wrangler` CLI installed and authed	`wrangler whoami` returns the platform service account	Jeffrey
Cloudflare API token (CI scope)	scoped to: Workers Scripts:Edit, Hyperdrive:Edit, Tunnel:Edit; stored in 1Password under “CF — adventive-public-api-worker CI”	Jeffrey
AWS access to Aurora VPCs (dev / stg / prd)	sufficient to launch EC2 instances in each via ASG	Jeffrey
Read-only Aurora user provisioned	per cluster (`console`, `aggregate`) — Hyperdrive will use these	Jeffrey
AWS EC2 Image Builder available in target region	dashboard → Image Builder; service-linked role created	Jeffrey
AWS Secrets Manager access for tunnel credentials	one secret per env: `/adventive/cloudflared/dev`, `…/stg`, `…/prd`	Jeffrey
AWS SSM Parameter Store available	for AMI version pointer at `/adventive/cloudflared/ami-id-latest`	Jeffrey
Terraform backend configured	S3 + DynamoDB lock table, scoped per env; module `infra/cloudflared/` will live in this repo	Jeffrey
New Relic Workers + EC2 integration ready	account ID + license key; CloudWatch → NR log subscription operational	Jeffrey
Local repo present	`~/Repositories/GitHub/Adventive/adventive-public-api-worker/` with git init, wrangler.toml stub, package.json	Confirmed 2026-04-28
`wrangler hyperdrive --help` works	minimum Wrangler 4.x	Jeffrey
Codeowners + branch protection ready (post-push)	`Adventive/adventive-public-api-worker` repo creation deferred to Phase 7	Jeffrey

Cloudflare references: Workers Plans · Wrangler install · API tokens

Phase 0 — Pre-migration cleanup (existing PHP repo)

Touches only the existing CodeIgniter PHP application; no Cloudflare work. Done first to stop the codebase from drifting while the rewrite is in flight.

Steps

In the legacy PHP repo, remove the data_warehouse block from application/config/database.json for all three environments (lines 99–121 dev, 260–282 prod, 377–399 stg). No model loads $CI->load->database('data_warehouse') — verified during plan authoring.
Rotate every MySQL credential currently committed in plaintext in database.json. Move to environment variables or AWS Secrets Manager. Force-rotate even the dev set; assume the file’s history is compromised.
Confirm Cloudflare DNS for the three zones resolves and the existing API hosts have low-TTL records (≤ 300 s) staged for cutover.
Tag the legacy repo: pre-cloudflare-migration-2026-04 so we have a known rollback point.

Validation

git grep data_warehouse returns nothing in the PHP repo.
All MySQL connection strings in CI/CD secret stores have been rotated; old values revoked at the Aurora cluster.
dig +noall +answer api.adventive.com and the staging/dev equivalents return the current PHP origin with TTL ≤ 300 (dig +short prints only the record data, not the TTL).
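
The TTL check above can be scripted so it is repeatable across the three zones. The sketch below is written in TypeScript to match the Worker codebase and parses answer lines in the format that `dig +noall +answer` prints. The function names and the sample record are illustrative assumptions, not part of the plan; only the 300 s ceiling comes from the step above.

```typescript
// Sketch only: flags DNS answer records whose TTL exceeds the cutover ceiling.
// Input is the text printed by `dig +noall +answer <host>`, one record per line:
//   api.adventive.com.   300   IN   A   203.0.113.10

interface DnsAnswer {
  name: string;
  ttl: number;
  type: string;
  data: string;
}

function parseDigAnswers(output: string): DnsAnswer[] {
  const answers: DnsAnswer[] = [];
  for (const line of output.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith(";")) continue; // blanks and dig comments
    // name, TTL, class, type, record data (whitespace or tab separated)
    const m = /^(\S+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(.+)$/.exec(trimmed);
    if (m === null) continue;
    const [, name, ttl, , type, data] = m;
    answers.push({ name, ttl: Number(ttl), type, data });
  }
  return answers;
}

// Returns the records that would block a fast cutover (TTL above maxTtl seconds).
function recordsOverTtl(output: string, maxTtl = 300): DnsAnswer[] {
  return parseDigAnswers(output).filter((a) => a.ttl > maxTtl);
}
```

An empty result from `recordsOverTtl` for each host means the low-TTL staging is in place; any returned record names the entry that still needs its TTL lowered.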

Six-pillar callout — Phase 0

Pillar	Treatment
Redundancy	No new redundancy. Existing PHP origin remains the sole serving path.
Resiliency	Reduced exposure: removing dead `data_warehouse` config eliminates a config-load risk path.
Disaster Recovery	Git tag `pre-cloudflare-migration-2026-04` is the rollback anchor for the entire migration.
Backup	Aurora automated backups already configured (verify retention ≥ 7 d on each cluster before proceeding).
Deployment Strategy	Routine PHP deploy via existing pipeline; no Cloudflare deploy in this phase.
Observability	Confirm legacy New Relic APM tags remain in place — they become the baseline that the new Worker is compared against during Phase 8.

Phase 1 — Cloudflare Tunnel infrastructure

Bridge each Aurora VPC to Cloudflare’s edge so Hyperdrive can reach the private DB endpoints without a public listener. One dedicated tunnel per environment (3 tunnels total). Each tunnel is served by an Auto Scaling Group of cloudflared replicas built from an EC2 Image Builder AMI.

This phase is more elaborate than the original plan called for. The upgrade exists because the tunnel is a foundational, long-lived dependency: hand-rolled EC2 instances are unacceptable for something every Worker DB call traverses. The whole pattern is captured in the standing memory Cloudflare Tunnel infra pattern (Adventive standard) and will graduate to docs/platform/cloudflare/tunnel/ once this rollout completes.

Cloudflare reference: Cloudflare Tunnel · cloudflared install · TCP routing · Run as replica

AWS reference: EC2 Image Builder · Auto Scaling Group instance refresh · SSM Parameter Store

1.0 — Architectural decisions (locked)

Decision	Choice	Rationale
Tunnel separation	Three tunnels — one per env (dev, stg, prd)	Credential blast-radius isolation; rotate prod creds without touching dev/stg; per-env audit log; each tunnel’s `config.yml` only carries its own ingress; cost is identical (tunnels are free).
Compute platform	EC2 + ASG (sizing per-env, see Sizing row), not Fargate	Matches existing Adventive AWS footprint; single networking model with Aurora VPCs; SSM/CloudWatch agents already standardized. Re-evaluate if Adventive adopts ECS broadly.
AMI build	EC2 Image Builder, weekly cadence	Single mechanism updates BOTH cloudflared and the Linux base; reproducible, signed AMI; eliminates drift across env-specific instances.
HA model	Replica registration (multiple cloudflared instances ↔ same tunnel UUID)	Native Cloudflare feature; edge load-balances and fails over automatically; foundation for zero-downtime updates.
Sizing	dev & stg: 1 × t3.micro / 1 AZ; prd: 2 × t3.small / 2 AZs	dev and stg are non-customer-facing and tolerate brief downtime during update; cost-optimize there with `t3.micro` and a single AZ. Only prd carries customer traffic — it requires ≥ 2 replicas across 2 AZs for AZ-failure resilience and rolling-update headroom.
Credential storage	AWS Secrets Manager (primary) + Cloudflare Secrets Store (secondary) + 1Password (break-glass)	Three-tier; user-data pulls from Secrets Manager at boot; AMI itself is environment-agnostic.

Why not one tunnel for dev+stg? Cost savings are zero (Cloudflare Tunnel is a free product). The savings argument is operational simplicity, but sharing a tunnel couples credential rotation, ingress edits, and audit logs across environments — that’s a real cost paid every time someone touches the dev config. Strict per-env separation is the standing default.

1.1 — Build the cloudflared AMI in EC2 Image Builder

Single Image Builder pipeline produces a versioned AMI consumed by all three environments.

EC2 Image Builder pipeline: adv-cflared-pipeline
├── Source AMI: Ubuntu 22.04 LTS (Canonical, latest)
├── Components (build phase):
│   ├── adv-cflared-base               (apt update; unattended-upgrades; clock sync)
│   ├── adv-cflared-cloudflared        (install cloudflared from pkg.cloudflare.com apt repo)
│   ├── adv-cflared-systemd-unit       (drop /etc/systemd/system/cloudflared.service.d/override.conf)
│   ├── adv-cflared-bootstrap-script   (drop /usr/local/sbin/cflared-bootstrap.sh)
│   ├── adv-cflared-cloudwatch-agent   (install + base config)
│   └── adv-cflared-ssm-agent          (verified present; SSM-managed)
├── Test phase:
│   ├── cloudflared --version           returns ≥ 2025.x
│   ├── systemctl list-unit-files       includes cloudflared.service
│   └── ssm-cli get-instance-information returns ManagedInstance
├── Distribution: shared AMI in target region (us-east-1 first; us-west-2 for DR)
├── Output:
│   └── SSM Parameter Store: /adventive/cloudflared/ami-id-latest
│         (Lambda hook updates this on successful pipeline run)
└── Schedule: cron(0 6 ? * TUE *)   # weekly Tuesday 06:00 UTC

The AMI does not contain tunnel credentials; bootstrap happens at instance launch. The AMI itself is environment-agnostic and reusable across dev/stg/prd.

1.1.1 — Prebuilt components and Terraform reference (do not write from scratch)

The pipeline is not hand-authored. AWS publishes a maintained Terraform module that builds the pipeline, and most of the components above already exist as AWS-managed components. Only the adv-cflared-cloudflared and adv-cflared-bootstrap-script components are Adventive-specific — both have full source below.

Terraform module to use:

module "cflared_pipeline" {
  source  = "aws-ia/ec2-image-builder/aws"
  version = "~> 1.5"

  pipeline_name = "adv-cflared-pipeline"
  schedule_expression = "cron(0 6 ? * TUE *)"
  recipe_name   = "adv-cflared-recipe"
  recipe_version = "1.0.0"
  parent_image   = "arn:aws:imagebuilder:us-east-1:aws:image/ubuntu-server-22-lts-x86/x.x.x"

  # AWS-managed components — reuse as-is, no editing needed
  managed_components = [
    "arn:aws:imagebuilder:us-east-1:aws:component/update-linux/x.x.x",
    "arn:aws:imagebuilder:us-east-1:aws:component/aws-cli-version-2-linux/x.x.x",
    "arn:aws:imagebuilder:us-east-1:aws:component/amazon-cloudwatch-agent-linux/x.x.x",
    "arn:aws:imagebuilder:us-east-1:aws:component/amazon-ssm-agent-linux/x.x.x",
    "arn:aws:imagebuilder:us-east-1:aws:component/cis-level-1-ubuntu-22-04-lts/x.x.x",
  ]

  # Adventive-authored components — both sources below
  custom_components = [
    aws_imagebuilder_component.cflared.arn,
    aws_imagebuilder_component.bootstrap.arn,
  ]

  output_ami_ssm_parameter = "/adventive/cloudflared/ami-id-latest"
  ami_lifecycle_retain_count = 4
}

The module wires the EventBridge rule on pipeline success, publishes the AMI ID to SSM, and applies the AMI lifecycle policy. Module repo: https://github.com/aws-ia/terraform-aws-ec2-image-builder.

Custom component 1 — adv-cflared-cloudflared.yml:

name: adv-cflared-cloudflared
description: Install cloudflared from the official Cloudflare APT repo.
schemaVersion: 1.0
phases:
  - name: build
    steps:
      - name: AddCloudflareAptRepo
        action: ExecuteBash
        inputs:
          commands:
            - sudo mkdir -p --mode=0755 /usr/share/keyrings
            - curl -fsSL https://pkg.cloudflare.com/cloudflare-main.gpg | sudo tee /usr/share/keyrings/cloudflare-main.gpg > /dev/null
            - echo 'deb [signed-by=/usr/share/keyrings/cloudflare-main.gpg] https://pkg.cloudflare.com/cloudflared jammy main' | sudo tee /etc/apt/sources.list.d/cloudflared.list
            - sudo apt-get update
      - name: InstallCloudflared
        action: ExecuteBash
        inputs:
          commands:
            - sudo apt-get install -y cloudflared
      - name: DropSystemdOverride
        action: CreateFile
        inputs:
          - path: /etc/systemd/system/cloudflared.service.d/override.conf
            permissions: 0644
            content: |
              [Service]
              ExecStart=
              ExecStart=/usr/bin/cloudflared tunnel --config /etc/cloudflared/config.yml run
              Restart=always
              RestartSec=5s
  - name: validate
    steps:
      - name: VersionCheck
        action: ExecuteBash
        inputs:
          commands:
            - cloudflared --version | grep -E '^cloudflared version 20'
      - name: UnitPresent
        action: ExecuteBash
        inputs:
          commands:
            - systemctl list-unit-files | grep -q '^cloudflared.service'

Custom component 2 — adv-cflared-bootstrap-script.yml:

name: adv-cflared-bootstrap-script
description: Drop the env-resolving cflared-bootstrap.sh into /usr/local/sbin.
schemaVersion: 1.0
phases:
  - name: build
    steps:
      - name: DropBootstrap
        action: S3Download
        inputs:
          - source: s3://adventive-platform-artifacts/cflared/cflared-bootstrap.sh
            destination: /usr/local/sbin/cflared-bootstrap.sh
      - name: SetExec
        action: ExecuteBash
        inputs:
          commands:
            - sudo chmod 0755 /usr/local/sbin/cflared-bootstrap.sh
      - name: WireOnBoot
        action: CreateFile
        inputs:
          - path: /etc/systemd/system/cflared-bootstrap.service
            permissions: 0644
            content: |
              [Unit]
              Description=Adventive cloudflared bootstrap (resolve env, fetch creds)
              Before=cloudflared.service
              [Service]
              Type=oneshot
              ExecStart=/usr/local/sbin/cflared-bootstrap.sh
              [Install]
              WantedBy=cloudflared.service
      - name: EnableUnit
        action: ExecuteBash
        inputs:
          commands:
            - sudo systemctl enable cflared-bootstrap.service

What this means in practice: Phase 1 build is roughly one terraform apply once the bootstrap script is uploaded to S3 and the two YAML components are committed. The pipeline self-runs on its weekly schedule from that point forward.

1.2 — Provision tunnels in Cloudflare

Done once per env via Cloudflare API or cloudflared tunnel create from an admin workstation. Capture the tunnel UUID and credentials JSON.

# Run from a credentialed admin shell (not on the future ASG instances)
cloudflared tunnel login
cloudflared tunnel create adv-aurora-tunnel-dev
cloudflared tunnel create adv-aurora-tunnel-stg
cloudflared tunnel create adv-aurora-tunnel-prd

Each command emits a credentials JSON. Store each in all three tiers:

# AWS Secrets Manager (primary — used by user-data)
aws secretsmanager create-secret \
  --name /adventive/cloudflared/dev \
  --secret-string file://dev-creds.json

# Cloudflare Secrets Store (secondary — recovery)
wrangler secrets-store secret create \
  --store-id adv-cflared-secrets \
  --name tunnel-creds-dev \
  --value "$(cat dev-creds.json)"

# 1Password (tertiary — break-glass; manual)

Securely shred the local credentials JSON files. Repeat for stg and prd.

1.3 — Author the per-env tunnel `config.yml`

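Stored in the same AWS Secrets Manager secret as the credentials so the bootstrap script can fetch both atomically. The bootstrap script in §1.4 reads that secret as a JSON object with two keys, config_yaml and credentials_json, so once the config is written, update the secret created in §1.2 into that shape.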

# adventive/cloudflared/dev — config.yml
tunnel: <dev-tunnel-uuid>
credentials-file: /etc/cloudflared/<dev-tunnel-uuid>.json
metrics: 0.0.0.0:2000   # exposed for ASG health check
ingress:
  - hostname: aurora-console-dev.internal.adventive.com
    service: tcp://console-cluster.dev.cluster.us-east-1.rds.amazonaws.com:3306
  - hostname: aurora-aggregate-dev.internal.adventive.com
    service: tcp://aggregate-cluster.dev.cluster.us-east-1.rds.amazonaws.com:3306
  - service: http_status:404

1.4 — Bootstrap script (baked into the AMI)

/usr/local/sbin/cflared-bootstrap.sh runs at boot from the cflared-bootstrap.service oneshot unit defined in §1.1.1, which is ordered before cloudflared.service. It queries instance metadata for the env tag, fetches the matching secret, and writes the config and credentials files; systemd then starts cloudflared.

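#!/usr/bin/env bash
# Requires curl, jq, and the AWS CLI in the AMI.
set -euo pipefail

# IMDSv2: the launch template sets http_tokens = "required", so every
# metadata request needs a session token.
IMDS="http://169.254.169.254/latest"
TOKEN=$(curl -sf -X PUT "${IMDS}/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
imds() { curl -sf -H "X-aws-ec2-metadata-token: ${TOKEN}" "${IMDS}/meta-data/$1"; }

ENV=$(imds tags/instance/adv:env)
REGION=$(imds placement/region)
SECRET="/adventive/cloudflared/${ENV}"

# Fetch the secret once; it carries both config_yaml and credentials_json.
SECRET_JSON=$(aws secretsmanager get-secret-value \
  --region "${REGION}" \
  --secret-id "${SECRET}" \
  --query SecretString --output text)

install -d -m 0755 /etc/cloudflared
umask 077
jq -r '.config_yaml' <<<"${SECRET_JSON}" > /etc/cloudflared/config.yml

# The tunnel UUID is read from the credentials JSON (TunnelID); config.yml is
# YAML, so jq cannot parse it.
CREDS=$(jq -r '.credentials_json' <<<"${SECRET_JSON}")
TUNNEL_ID=$(jq -r '.TunnelID' <<<"${CREDS}")
printf '%s\n' "${CREDS}" > "/etc/cloudflared/${TUNNEL_ID}.json"

chmod 0600 /etc/cloudflared/*.json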
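
Because the bootstrap script depends on the exact shape of the secret, a pre-upload check on the admin workstation can catch a mismatch before any instance boots. The following is a minimal sketch under stated assumptions: the key names config_yaml and credentials_json and the TunnelID field are taken from the script above, while the function name, the result shape, and the regex-based reading of config.yml (instead of a YAML parser) are illustrative choices.

```typescript
// Sketch only: validates a Secrets Manager SecretString before it is uploaded.
// It checks the two keys the bootstrap script reads and that the tunnel UUID
// agrees between config.yml and the credentials JSON.

interface SecretCheck {
  ok: boolean;
  problems: string[];
}

function checkTunnelSecret(secretString: string): SecretCheck {
  let parsed: unknown;
  try {
    parsed = JSON.parse(secretString);
  } catch {
    return { ok: false, problems: ["secret is not valid JSON"] };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { ok: false, problems: ["secret is not a JSON object"] };
  }
  const secret = parsed as Record<string, unknown>;
  const problems: string[] = [];

  const configYaml = secret["config_yaml"];
  if (typeof configYaml !== "string") problems.push("config_yaml missing or not a string");

  // credentials_json may be a nested object or a JSON-encoded string; accept both.
  let creds: unknown = secret["credentials_json"];
  if (typeof creds === "string") {
    try {
      creds = JSON.parse(creds);
    } catch {
      creds = undefined;
    }
  }
  const tunnelId =
    typeof creds === "object" && creds !== null
      ? (creds as Record<string, unknown>)["TunnelID"]
      : undefined;
  if (typeof tunnelId !== "string") problems.push("credentials_json has no TunnelID");

  if (typeof configYaml === "string" && typeof tunnelId === "string") {
    const tunnel = /^tunnel:\s*(\S+)/m.exec(configYaml)?.[1];
    const credFile = /^credentials-file:\s*(\S+)/m.exec(configYaml)?.[1];
    if (tunnel !== tunnelId) problems.push("config tunnel does not match TunnelID");
    if (credFile !== `/etc/cloudflared/${tunnelId}.json`) {
      problems.push("credentials-file does not match the path the bootstrap script writes");
    }
  }
  return { ok: problems.length === 0, problems };
}
```

Running this against the JSON document before `aws secretsmanager update-secret` keeps a typo in the UUID from surfacing only as a failed first boot.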

1.5 — Launch Template + ASG (Terraform module)

A single Terraform module infra/cloudflared/ parameterized by env is applied three times (dev, stg, prd workspaces). Key bits:

locals {
  # Sizing per env: dev & stg are non-customer-facing (single AZ, t3.micro);
  # only prd carries customer traffic and gets multi-AZ + 2 replicas.
  is_prod      = var.env == "prd"
  instance_sz  = local.is_prod ? "t3.small" : "t3.micro"
  asg_min      = local.is_prod ? 2 : 1
  asg_desired  = local.is_prod ? 2 : 1
  asg_max      = local.is_prod ? 4 : 2
  asg_subnets  = local.is_prod ? [var.subnet_az_a, var.subnet_az_b] : [var.subnet_az_a]
}

resource "aws_launch_template" "cflared" {
  name_prefix = "adv-cflared-${var.env}-"
  # AMI resolved from SSM at apply time
  image_id      = data.aws_ssm_parameter.cflared_ami.value
  instance_type = local.instance_sz

  iam_instance_profile { name = aws_iam_instance_profile.cflared.name }
  vpc_security_group_ids = [aws_security_group.cflared.id]
  metadata_options {
    http_tokens            = "required"
    instance_metadata_tags = "enabled"
  }

  tag_specifications {
    resource_type = "instance"
    tags = { "adv:env" = var.env, Name = "adv-cflared-${var.env}", OwnedBy = "platform" }
  }
}

resource "aws_autoscaling_group" "cflared" {
  name                = "adv-cflared-${var.env}"
  min_size            = local.asg_min
  max_size            = local.asg_max
  desired_capacity    = local.asg_desired
  vpc_zone_identifier = local.asg_subnets
  health_check_type           = "EC2"
  health_check_grace_period   = 180

  launch_template {
    id      = aws_launch_template.cflared.id
    version = "$Latest"
  }

  instance_refresh {
    strategy = "Rolling"
    preferences {
      # Launch-before-terminate: min 100 % plus a max above 100 % makes the ASG
      # add a replacement before it removes an old instance, so dev/stg go
      # 1 → 2 → 1 and prd goes 2 → 3 → 2. With min_healthy_percentage = 50
      # alone, a single-instance group terminates first and leaves a gap.
      min_healthy_percentage = 100
      max_healthy_percentage = local.is_prod ? 150 : 200
      instance_warmup        = 120
      auto_rollback          = true
    }
    triggers = ["launch_template"]
  }
}

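The security group has no inbound rules. Outbound, it allows 7844/TCP and 7844/UDP to 0.0.0.0/0 (the port cloudflared uses to connect to the Cloudflare edge), 443/TCP to 0.0.0.0/0 (AWS APIs, the apt repo, and Cloudflare API calls), and 3306/TCP to the two Aurora cluster SGs only.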

1.6 — Wire AMI auto-promotion

EventBridge rule fires on Image Builder pipeline success → Lambda updates /adventive/cloudflared/ami-id-latest → updates each environment’s launch-template version → triggers ASG instance refresh per env on a staggered schedule (dev Tue 07:00, stg Tue 09:00, prd Tue 11:00 UTC). Each refresh is gated by the prior env’s success (Step Functions state machine adv-cflared-rolling-update).

Validation

aws imagebuilder list-image-pipelines shows adv-cflared-pipeline.
/adventive/cloudflared/ami-id-latest resolves to a current AMI.
aws autoscaling describe-auto-scaling-groups returns three groups; in each, the number of instances with HealthStatus Healthy and LifecycleState InService equals DesiredCapacity for ≥ 5 minutes.
Cloudflare dashboard → Networks → Tunnels shows each tunnel status Healthy with the expected replica count (1 dev, 1 stg, 2 prd).
From a peer EC2: mysql -h aurora-console-prd.internal.adventive.com -u ro_user -p connects via the tunnel.
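The first wrangler hyperdrive create in Phase 2 succeeds. Wrangler has no separate connection-test subcommand; Hyperdrive checks that it can reach the database when the configuration is created, so a successful create is the end-to-end test.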
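
The ASG check in this list can be evaluated mechanically from the CLI's JSON output. The sketch below assumes the output of `aws autoscaling describe-auto-scaling-groups --output json` has been parsed into an object; the interfaces cover only the fields used here, and the function name is hypothetical.

```typescript
// Sketch only: lists Auto Scaling groups whose healthy, in-service instance
// count is below DesiredCapacity. Interfaces model a subset of the
// describe-auto-scaling-groups response.

interface AsgInstance {
  InstanceId: string;
  HealthStatus: string;   // "Healthy" or "Unhealthy"
  LifecycleState: string; // e.g. "InService", "Pending", "Terminating"
}

interface AsgGroup {
  AutoScalingGroupName: string;
  DesiredCapacity: number;
  Instances: AsgInstance[];
}

interface DescribeAsgOutput {
  AutoScalingGroups: AsgGroup[];
}

function groupsBelowCapacity(output: DescribeAsgOutput): string[] {
  return output.AutoScalingGroups.filter((group) => {
    const healthy = group.Instances.filter(
      (i) => i.HealthStatus === "Healthy" && i.LifecycleState === "InService",
    ).length;
    return healthy < group.DesiredCapacity;
  }).map((group) => group.AutoScalingGroupName);
}
```

Polling this for five minutes and requiring an empty list each time matches the validation criterion; a non-empty list names the environment to investigate.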

Deliverables

Image Builder pipeline definition plus the two Adventive-authored components from §1.1.1 committed under infra/imagebuilder/.
Terraform module infra/cloudflared/ with three workspaces applied.
Three Cloudflare tunnels with their UUIDs and credentials secured in three credential tiers.
ADR decisions/2026-MM-DD-tunnel-infra-pattern.md capturing the locked decisions in §1.0 and any deviations specific to Adventive’s environment.
EventBridge → Lambda → Step Functions wiring for AMI auto-promotion.

Six-pillar callout — Phase 1

Pillar	Treatment
Redundancy	prd ASG runs ≥ 2 replicas across 2 AZs; Cloudflare edge load-balances replica registrations on the same tunnel UUID; ASG self-replaces failed instances. dev/stg run a single replica in a single AZ — brief gap during instance replacement is acceptable for non-customer-facing traffic.
Resiliency	Outbound-only tunnel (no inbound SG rules on Aurora); cloudflared auto-reconnects to edge; ASG `auto_rollback` reverts a bad AMI promotion; launch-before-terminate instance refresh (§1.5) keeps ≥ 1 healthy replica throughout.
Disaster Recovery	RTO ≤ 5 min for instance loss (ASG auto-launch). RTO ≤ 30 min for AMI rollback (revert SSM param + manual instance refresh). Multi-region playbook: build AMI in us-west-2 monthly; instructions in §1.7 maintenance section. RPO = 0 (cloudflared is stateless).
Backup	Three-tier credential storage (Secrets Manager / CF Secrets Store / 1Password). Image Builder retains four most recent AMI revisions. Terraform state in S3 + DynamoDB lock. ASG and launch-template versions retained indefinitely.
Deployment Strategy	Weekly Image Builder run → SSM param update → Step Functions orchestrates env-staggered ASG instance refresh (dev → stg → prd, gated by prior success). Manual emergency promotion via `terraform apply` + ASG instance refresh trigger.
Observability	cloudflared metrics endpoint `:2000` scraped by CloudWatch agent; logs forwarded CloudWatch → New Relic. Custom CloudWatch metrics: `cflared_replica_count_per_tunnel`, `cflared_systemd_restart_count`, `imagebuilder_ami_age_days`. Alarms: replica count below expected for 2 min (page); restart frequency > 3/h per instance (page); AMI age > 21 days (warn — pipeline failure).

Tunnel maintenance and updates (cross-cutting, applies post-Phase 1)

This section describes how routine and unplanned changes to the tunnel infrastructure happen without taking the public API offline.

Routine: weekly Linux + cloudflared update (zero-downtime)

This is the default mechanism. No human in the loop.

Tuesday 06:00 UTC — Image Builder pipeline runs; pulls current Ubuntu LTS, latest cloudflared from pkg.cloudflare.com, runs the build + test phases.
Pipeline success — EventBridge fires; Lambda writes the new AMI ID to /adventive/cloudflared/ami-id-latest.
Step Functions: adv-cflared-rolling-update orchestrates:
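- dev refresh at Tue 07:00 UTC. ASG instance refresh triggers because the launch template’s resolved AMI changed. With desired=1, the refresh has to run launch-before-terminate (MinHealthyPercentage=100 with MaxHealthyPercentage above 100; with MinHealthyPercentage=50 alone, a single-instance group terminates first): the ASG launches a new instance to bring the count to 2, waits 120 s for warm-up, then terminates the old instance. Momentarily 2 replicas, never 0.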
- stg refresh at Tue 09:00 UTC, gated by dev success. Same single-replica mechanic as dev: count goes 1 → 2 → 1.
- prd refresh at Tue 11:00 UTC, gated by stg success. With desired=2, count goes 2 → 3 → 2 (one new replica added before the first old replica terminates), repeated for the second old replica.
Each refresh — observability dashboards must show:
- tunnel replica count never < 1
- cloudflared restart frequency normal
- Worker error rate flat
Failure of any check pauses the Step Functions state machine and pages on-call.

Net effect: cloudflared and the Linux OS update weekly, with no operator action and no observable customer impact.
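
The three per-refresh checks can be expressed as a single gate that the state machine evaluates before moving to the next environment. This is a sketch under assumptions: the sample shape, the function name, and the error-rate tolerance are hypothetical, and "normal" restart frequency is read as at most 3 per hour per instance, borrowed from the Phase 1 paging threshold.

```typescript
// Sketch only: decides whether an environment's refresh passed its
// observability checks. All names and the tolerance default are illustrative.

interface RefreshSample {
  replicaCount: number;    // replicas registered on the tunnel at sample time
  restartsPerHour: number; // cloudflared systemd restarts, per instance
  workerErrorRate: number; // fraction of Worker requests that errored (0..1)
}

interface GateResult {
  pass: boolean;
  failures: string[];
}

function evaluateRefreshGate(
  samples: RefreshSample[],
  baselineErrorRate: number,
  errorRateTolerance = 0.005, // assumed allowance above baseline for "flat"
  maxRestartsPerHour = 3,     // mirrors the paging threshold in the Phase 1 callout
): GateResult {
  const failures: string[] = [];
  if (samples.length === 0) {
    failures.push("no samples collected during the refresh");
  }
  if (samples.some((s) => s.replicaCount < 1)) {
    failures.push("tunnel replica count dropped below 1");
  }
  if (samples.some((s) => s.restartsPerHour > maxRestartsPerHour)) {
    failures.push("cloudflared restart frequency above normal");
  }
  if (samples.some((s) => s.workerErrorRate > baselineErrorRate + errorRateTolerance)) {
    failures.push("Worker error rate rose above baseline");
  }
  return { pass: failures.length === 0, failures };
}
```

A result with `pass: false` corresponds to the pause-and-page path; the `failures` array gives on-call the specific check that tripped.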

Unplanned: emergency cloudflared CVE

A vendor-disclosed CVE that pre-dates the next scheduled pipeline run:

Trigger Image Builder pipeline manually: aws imagebuilder start-image-pipeline-execution --image-pipeline-arn …
On success, manually advance the SSM param (or wait for the EventBridge handler).
Manually invoke adv-cflared-rolling-update Step Functions execution with input {"acceleration": "fast"} which collapses inter-env wait to 15 minutes.
Total RTO ~ 90 minutes from CVE disclosure to all envs patched.

Unplanned: cloudflared running but tunnel reports unhealthy

Health check signal mismatch — daemon up, but tunnel registration broken.

CloudWatch alarm cflared_replica_count_per_tunnel < expected fires.
ASG marks the affected instance unhealthy via custom health check (Lambda evaluates the metric and calls SetInstanceHealth with Unhealthy).
ASG terminates and replaces the instance — fresh boot rebuilds tunnel registration from scratch.
If condition repeats across replacements, page on-call to investigate tunnel-side issues at Cloudflare (likely creds rotated upstream or tunnel deleted).

Unplanned: AMI rollback

A new AMI passes Image Builder tests but fails post-deploy in dev (e.g., cloudflared regression Cloudflare didn’t catch).

ASG auto_rollback = true reverts dev automatically when health checks fail. Step Functions halts the schedule before stg/prd refresh fires.
On-call manually overwrites SSM param to the previous AMI ID: aws ssm put-parameter --name /adventive/cloudflared/ami-id-latest --value <previous> --overwrite.
Manually trigger ASG instance refresh per env to converge on the pinned AMI.

Routine: rotating tunnel credentials

Quarterly or post-incident:

cloudflared tunnel token --cred-file new-creds.json adv-aurora-tunnel-{env} from admin shell.
Update Secrets Manager secret in place (aws secretsmanager update-secret --secret-id …).
Trigger ASG instance refresh for that env only (aws autoscaling start-instance-refresh --auto-scaling-group-name adv-cflared-{env}).
New replicas pick up new creds at boot; old replicas drain. The tunnel UUID does not change — Hyperdrive configurations remain valid.

Routine: changing tunnel ingress

Adding a new internal hostname (e.g., a future analytics replica):

Edit the env’s Secrets Manager secret to append the new ingress block.
Trigger ASG instance refresh for that env.
New replicas come up with the updated config; cloudflared replicas process new ingress on next start.

Capacity scaling

If sustained CPU > 60 % on cloudflared instances (an indicator of high TCP throughput), bump desired_capacity and max_size in Terraform — ASG launches additional replicas; tunnel automatically uses them.

Phase 2 — Hyperdrive provisioning (6 configurations)

One Hyperdrive resource per (cluster, environment). Hyperdrive holds the DB credentials; the Worker never sees them.

Cloudflare reference: Hyperdrive overview · Create with Wrangler · Hyperdrive + Tunnel

Steps

# Console cluster — three environments
wrangler hyperdrive create adv-svc-public-api-console-dev \
  --connection-string="mysql://ro_user:DEV_PASS@aurora-console-dev.internal.adventive.com:3306/console"
wrangler hyperdrive create adv-svc-public-api-console-stg \
  --connection-string="mysql://ro_user:STG_PASS@aurora-console-stg.internal.adventive.com:3306/console"
wrangler hyperdrive create adv-svc-public-api-console-prd \
  --connection-string="mysql://ro_user:PRD_PASS@aurora-console-prd.internal.adventive.com:3306/console"

# Aggregate cluster — three environments
wrangler hyperdrive create adv-svc-public-api-aggregate-dev \
  --connection-string="mysql://ro_user:DEV_PASS@aurora-aggregate-dev.internal.adventive.com:3306/aggregate_ro"
wrangler hyperdrive create adv-svc-public-api-aggregate-stg \
  --connection-string="mysql://ro_user:STG_PASS@aurora-aggregate-stg.internal.adventive.com:3306/aggregate_ro"
wrangler hyperdrive create adv-svc-public-api-aggregate-prd \
  --connection-string="mysql://ro_user:PRD_PASS@aurora-aggregate-prd.internal.adventive.com:3306/aggregate_ro"

Each command returns a Hyperdrive ID. Capture all six and update wrangler.toml in the repo, replacing the six REPLACE_WITH_*_ID placeholders. Commit with message: chore(infra): wire Hyperdrive IDs for console + aggregate (dev/stg/prd).

Validation

wrangler hyperdrive list returns six configurations.
A throwaway Worker with DB_CONSOLE and DB_AGGREGATE bindings can SELECT 1 against each — proves end-to-end (Worker → Hyperdrive → Tunnel → Aurora).
wrangler.toml no longer contains REPLACE_WITH_*_ID placeholders; wrangler deploy --dry-run --env dev passes.
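
The placeholder check can be automated as a pre-commit or CI step. The sketch below scans the text of wrangler.toml for anything matching the REPLACE_WITH_*_ID pattern named above; the function name and the specific placeholder spelling in the usage note are assumptions, since the stub's exact placeholder names are not listed in this chapter.

```typescript
// Sketch only: returns the distinct REPLACE_WITH_*_ID placeholders still
// present in a wrangler.toml, sorted for stable output.

function findPlaceholders(toml: string): string[] {
  const matches = toml.match(/REPLACE_WITH_[A-Z0-9_]*_ID/g) ?? [];
  return Array.from(new Set(matches)).sort();
}
```

For example, a file that still contains `id = "REPLACE_WITH_CONSOLE_DEV_ID"` (a hypothetical placeholder name) yields a one-element array, and the check should fail the commit until the array is empty.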

Six-pillar callout — Phase 2

Pillar	Treatment
Redundancy	Hyperdrive itself is a globally-distributed pool. Pair with the Phase 1 tunnel replicas (dual-AZ in prd only) for end-to-end redundancy.
Resiliency	Pool absorbs cold-start cost and handles transient DB blips. Set application-level timeout < 5 s; let Hyperdrive surface back-pressure.
Disaster Recovery	Hyperdrive can be re-provisioned in minutes using the captured `wrangler hyperdrive create` commands. Aurora point-in-time recovery is the upstream DR control.
Backup	Aurora automated backups + manual snapshot prior to first prod traffic. Hyperdrive stores no data — credentials are the only artifact (kept in Cloudflare Secrets Store).
Deployment Strategy	Hyperdrive ID changes are environment-scoped in `wrangler.toml`. Treat ID rotation (e.g., credential rotation) as a `wrangler deploy --env <env>` event with version routing fallback.
Observability	Enable Hyperdrive metrics in the Cloudflare dashboard; ship to New Relic via Logpush. Alarms: pool-acquire latency p95 > 100 ms, error rate > 0.5%.
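
The application-level timeout from the Resiliency row can be applied with a small wrapper around any query promise. This is a minimal sketch: the helper name is hypothetical, and it only stops waiting for the result; it does not cancel the underlying query, which a real implementation may also want to do through the database driver.

```typescript
// Sketch only: rejects with a TimeoutError-named Error if `work` has not
// settled within `ms` milliseconds. The timer is cleared once either side wins.

function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`operation exceeded ${ms} ms`);
      err.name = "TimeoutError";
      reject(err);
    }, ms);
  });
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

A handler would wrap its database call as `withTimeout(runQuery(), 4500)`, where `runQuery` stands in for whatever query function the Worker uses, keeping the budget under the 5 s ceiling.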

Phase 3 — Auth helper Worker (`adv-svc-auth-helper`)

The auth helper is a prerequisite for the public API Worker — Phase 4 cannot dry-run-deploy until the helper service exists at all three names.

Cloudflare reference: Service bindings · KV namespaces

Steps

Create a sibling repo (or sub-folder under a services/ monorepo — Jeffrey’s call): Adventive/adventive-auth-helper-worker. Same TypeScript/Hono pattern as the public API Worker.

Provision three KV namespaces:

wrangler kv namespace create kv-adv-svc-auth-helper-cache-dev
wrangler kv namespace create kv-adv-svc-auth-helper-cache-stg
wrangler kv namespace create kv-adv-svc-auth-helper-cache-prd

Bindings (per env): CACHE (KV) + DB_CONSOLE (Hyperdrive — same console cluster as the public API).
Implement: accept X-Api-Key + X-Integration-Key → 5-min KV cache lookup → on miss, query api table via DB_CONSOLE → return { valid: boolean, accountId: number, rph: number }.
Smoke endpoint: GET /__health returns 200 with COMMIT_SHA.
Deploy all three: wrangler deploy --env dev, --env stg, --env prd. Verify each wrangler tail --env <env> is quiet.
From a one-off Worker (or curl to a private deploy URL), confirm a real key returns valid: true against each environment.
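
The lookup flow in the Implement step can be sketched independently of Hono and the Workers runtime. Everything below is an assumption made for illustration: the CacheLike interface stands in for the KV binding, ApiTableLookup stands in for the query against the api table via DB_CONSOLE, and the cache-key format and the choice to cache only valid results are not specified by the plan. A production version would likely hash the keys before using them as a cache key.

```typescript
// Sketch only: header values in, auth decision out. Returns 503 (not 401)
// when the database lookup throws, so callers retry instead of treating an
// outage as an auth failure.

interface AuthResult {
  valid: boolean;
  accountId: number;
  rph: number;
}

// Stand-in for the KV binding (get and put with a TTL in seconds).
interface CacheLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options: { expirationTtl: number }): Promise<void>;
}

// Stand-in for the api-table query; resolves null when no row matches.
type ApiTableLookup = (apiKey: string, integrationKey: string) => Promise<AuthResult | null>;

interface AuthResponse {
  status: 200 | 401 | 503;
  body: AuthResult | { error: string };
}

const CACHE_TTL_SECONDS = 300; // the 5-minute cache window
const INVALID: AuthResult = { valid: false, accountId: 0, rph: 0 };

async function authenticate(
  apiKey: string | null,
  integrationKey: string | null,
  cache: CacheLike,
  lookup: ApiTableLookup,
): Promise<AuthResponse> {
  if (!apiKey || !integrationKey) return { status: 401, body: INVALID };

  const cacheKey = `auth:${apiKey}:${integrationKey}`; // hypothetical key format
  const cached = await cache.get(cacheKey);
  if (cached !== null) return { status: 200, body: JSON.parse(cached) as AuthResult };

  let row: AuthResult | null;
  try {
    row = await lookup(apiKey, integrationKey);
  } catch {
    return { status: 503, body: { error: "auth backend unavailable" } };
  }
  if (row === null || !row.valid) return { status: 401, body: INVALID };

  // Only valid results are cached, so a cache hit always means status 200.
  await cache.put(cacheKey, JSON.stringify(row), { expirationTtl: CACHE_TTL_SECONDS });
  return { status: 200, body: row };
}
```

In the Worker, the route handler would read X-Api-Key and X-Integration-Key from the request, pass the CACHE binding and a DB_CONSOLE-backed lookup, and map the returned status and body onto the HTTP response.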

Six-pillar callout — Phase 3

Pillar	Treatment
Redundancy	Worker is globally replicated by default. KV is multi-region eventually-consistent — acceptable for 5-min TTL auth cache.
Resiliency	On Hyperdrive miss + KV miss + DB unreachable, return 503 (not 401) so callers retry rather than treating an outage as auth failure.
Disaster Recovery	KV caches can be flushed and rebuilt in minutes. Helper is the smallest stateful surface; document its restoration as < 10 min.
Backup	KV cache is regenerable from the `api` table — no separate backup needed. The `api` table itself is covered by Aurora backup.
Deployment Strategy	Use `wrangler deploy --env <env>` with version routing; promote 0 % → 10 % → 100 % over a 30-min window for prd.
Observability	Emit structured logs (`auth.lookup`, `auth.cache.hit`, `auth.cache.miss`, `auth.failure`). Forward to New Relic via tail Worker. Alert on cache-miss rate > 25 % over 5 min (suggests rotation event or KV outage).

Phase 4 — Public API Worker scaffolding

This is the entry point for the Claude Code session. The kickoff prompt is handoff/CLAUDE_CODE_KICKOFF.md. Cowork’s job ends here; Claude Code drives.

Steps

Confirm ~/Repositories/GitHub/Adventive/adventive-public-api-worker/ has git init and that CLAUDE.md + PLAN.md are in the repo root (already true as of 2026-04-28).

Install dependencies declared in package.json:

cd ~/Repositories/GitHub/Adventive/adventive-public-api-worker
npm install

Author tsconfig.json, vitest.config.ts, .dev.vars.example, .gitignore, .editorconfig, eslint.config.js, prettier.config.cjs to project conventions.
Author openapi.yaml (OpenAPI 3.1) covering every endpoint in the migration map, including /v1.0/* and /v2.0/* aliases. Keep field names byte-identical to the PHP responses.
Generate types: npx openapi-typescript openapi.yaml -o src/types/api.ts.
Stub src/index.ts with a Hono app exposing /__health, /openapi.yaml, and /docs (Redoc). Wire the env interface in src/lib/env.ts against the bindings declared in wrangler.toml.
Validate the scaffold: npm run typecheck && npm run dry-run:dev.
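
As a rough illustration of the env interface step, the sketch below types the bindings this chapter names (DB_CONSOLE, DB_AGGREGATE, AUTH, RATE_LIMIT_KV, CORS_ALLOWED_ORIGINS). It is a sketch under assumptions: the structural `*Like` types replace the official Workers type package so the block stands alone, `WorkerEnv` and `healthBody` are hypothetical names, and returning COMMIT_SHA from /__health mirrors the auth helper's smoke endpoint rather than a confirmed requirement for this Worker. The real binding list is whatever wrangler.toml declares.

```typescript
// Minimal structural stand-ins so the sketch compiles without external type packages.
interface HyperdriveLike { connectionString: string }
interface ServiceLike { fetch(input: string, init?: object): Promise<unknown> }
interface KvLike { get(key: string): Promise<string | null> }

// Hypothetical env shape; wrangler.toml remains the source of truth for binding names.
interface WorkerEnv {
  DB_CONSOLE: HyperdriveLike;
  DB_AGGREGATE: HyperdriveLike;
  AUTH: ServiceLike;
  RATE_LIMIT_KV: KvLike;
  CORS_ALLOWED_ORIGINS: string;
  COMMIT_SHA: string; // assumed to be injected as a var at deploy time
}

// Body for GET /__health; the field names here are illustrative.
function healthBody(env: Pick<WorkerEnv, "COMMIT_SHA">): { ok: true; commit_sha: string } {
  return { ok: true, commit_sha: env.COMMIT_SHA };
}
```

Typing the handler against `Pick<WorkerEnv, ...>` keeps each function's dependency on the environment explicit and easy to fake in tests.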

Deliverables

openapi.yaml complete and lint-clean.
src/index.ts + src/lib/env.ts exporting a typed Hono app.
Three green dry-run deploys (dev, stg, prd).

Six-pillar callout — Phase 4

Pillar	Treatment
Redundancy	n/a (no traffic yet).
Resiliency	Establish error envelopes in `src/lib/response.ts` so every handler returns the same JSON error shape — uniform retry semantics for callers.
Disaster Recovery	`git init` + signed-commit hooks ensure the repo is recoverable from any developer’s local clone. Push to GitHub gated to end of Phase 4 once `wrangler deploy --dry-run` is green.
Backup	GitHub is the source of truth once pushed; until then, the local repo is single-host — keep an additional clone on iCloud-synced disk.
Deployment Strategy	No live deploys. Every commit must satisfy `npm run typecheck && npm run dry-run:{dev,stg,prd}` before merge.
Observability	Add `src/lib/logger.ts` (structured JSON, fields: `request_id`, `account_id`, `endpoint`, `latency_ms`, `commit_sha`) — every later phase emits through it.
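
The logger contract in the Observability row could look roughly like this. It is a sketch only: the five fields come from the row above, while `formatLogLine`, the `ts`, `level`, and `event` fields, and the nullable `account_id` are assumptions added for illustration.

```typescript
// Fields named in the Phase 4 Observability row.
interface LogFields {
  request_id: string;
  account_id: number | null; // null before auth resolves (assumption)
  endpoint: string;
  latency_ms: number;
  commit_sha: string;
}

type LogLevel = "info" | "warn" | "error";

// Builds one structured JSON log line; the caller decides where to write it.
function formatLogLine(
  level: LogLevel,
  event: string,
  fields: LogFields,
  now: Date = new Date(),
): string {
  return JSON.stringify({ ts: now.toISOString(), level, event, ...fields });
}
```

A handler would then emit a line with something like `console.log(formatLogLine("info", "request.done", fields))`, which keeps every later phase on the same field names.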

Phase 5 — Core middleware (PAUSE checkpoint)

After Phase 5 ships, Claude Code stops and waits for explicit continue.

Cloudflare reference: Durable Objects · DO alarms · Hyperdrive client libs

Steps

src/lib/db.ts — Hyperdrive connection factory using mysql2/promise. Two functions: getConsole(env) and getAggregate(env). Both return typed connections with 5-second connect timeout and 10-second query timeout. Both close in ctx.waitUntil on response.
src/lib/auth.ts — Thin wrapper around the AUTH service binding. Throws HttpError(401) on valid: false. Caches the result on c.set('auth', …) for the request scope.
src/durable-objects/RateLimiter.ts — DO class with check(key, rph) RPC. Increments hourly counter; returns 429 metadata when count > rph. Uses state.storage.setAlarm(nextHourTopUTC) to self-reset.
CORS middleware — Hono cors() matched against the existing CORS_ALLOWED_ORIGINS var; preserve current allowed-headers list exactly.
src/lib/response.ts — jsonResponse, xmlResponse, csvResponse, HttpError. Field name preservation is a contract — no camelCase renaming.
src/lib/validation.ts — Date defaults (last 30 days, ET via Intl.DateTimeFormat), format, data_connector, removeZeros, advertiser_id, from, to, version.
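
The date-default rule in src/lib/validation.ts can be sketched as below. This is a sketch under assumptions: `easternDate` and `defaultDateRange` are hypothetical names, and the reading of "last 30 days" as `to` = today in Eastern Time and `from` = 30 calendar days earlier should be checked against the PHP behavior before it is relied on.

```typescript
// Returns the calendar date in America/New_York as YYYY-MM-DD for the given instant.
function easternDate(now: Date): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone: "America/New_York",
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(now);
  const pick = (partType: string): string =>
    parts.find((p) => p.type === partType)?.value ?? "";
  return `${pick("year")}-${pick("month")}-${pick("day")}`;
}

// Default range when `from` and `to` are omitted (assumed semantics, see lead-in).
function defaultDateRange(now: Date = new Date()): { from: string; to: string } {
  const to = easternDate(now);
  const [y, m, d] = to.split("-").map(Number);
  // Date.UTC normalizes out-of-range day values, so d - 30 rolls back across months.
  const fromDate = new Date(Date.UTC(y, m - 1, d - 30));
  return { from: fromDate.toISOString().slice(0, 10), to };
}
```

Working from the Eastern calendar date rather than the UTC instant avoids an off-by-one day for requests that arrive in the evening, Eastern Time.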

Per-commit gate:

npm run typecheck && npm run test && \
  npm run dry-run:dev && npm run dry-run:stg && npm run dry-run:prd

Pause checkpoint

Claude Code posts a summary message and waits. The summary must include the gate results, the test count, the Worker bundle size for each env, and any TODO comments left in source.

Six-pillar callout — Phase 5

Pillar	Treatment
Redundancy	DO storage is single-region per object; for a per-key counter that is correct (one counter, one home). KV (`RATE_LIMIT_KV`) is the documented fallback if DO becomes unavailable.
Resiliency	DO `alarm()` is the hourly reset mechanism — no cron worker needed. Failure to set an alarm degrades to “rolling 1-hour windows enforced lazily on next request.”
Disaster Recovery	DOs survive worker version rollback; the `[[migrations]]` tag in `wrangler.toml` is the contract. Never delete a class without a migration tag.
Backup	Counters are ephemeral; loss is acceptable (worst case: a key gets one extra free hour). No backup needed. Document this as a known one-time behavior change at cutover.
Deployment Strategy	DO migrations are global; treat them as schema changes — separate PR, separate review. Use Wrangler version routing on first deploy after a migration.
Observability	DO emits structured log events with the fields `state.id`, `class_name`, `key_hash`, `count_after`, and `outcome` (allowed/throttled). Forward via tail Worker → New Relic. Alert on throttle rate > 5 %.
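
The counter and reset behavior described for the RateLimiter can be illustrated with plain TypeScript. This is a sketch, not the Durable Object itself: `HourlyCounter` keeps state in memory where the real class would use DO storage and an alarm, `RateDecision` is a hypothetical shape for the 429 metadata, and the lazy reset shown here uses fixed top-of-hour windows as a simplification.

```typescript
const HOUR_MS = 60 * 60 * 1000;

// Epoch milliseconds of the next top-of-hour in UTC, the value an alarm would be set to.
function nextHourTopUTC(nowMs: number): number {
  return (Math.floor(nowMs / HOUR_MS) + 1) * HOUR_MS;
}

// Hypothetical shape of the metadata returned to the caller.
interface RateDecision {
  allowed: boolean;
  count: number;
  limit: number;
  resetAtMs: number;
  retryAfterSeconds: number;
}

class HourlyCounter {
  private counts = new Map<string, number>();
  private windowEndsAtMs = 0;

  check(key: string, rph: number, nowMs: number): RateDecision {
    // Lazy reset: covers the case where no alarm fired at the top of the hour.
    if (nowMs >= this.windowEndsAtMs) {
      this.counts.clear();
      this.windowEndsAtMs = nextHourTopUTC(nowMs);
    }
    const count = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, count);
    const allowed = count <= rph; // throttle when count > rph
    return {
      allowed,
      count,
      limit: rph,
      resetAtMs: this.windowEndsAtMs,
      retryAfterSeconds: allowed ? 0 : Math.ceil((this.windowEndsAtMs - nowMs) / 1000),
    };
  }
}
```

Passing `nowMs` in explicitly makes the hour boundary easy to exercise in unit tests without waiting on a real clock.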

Phase 6 — Handler migration (PAUSE checkpoint)

Migrate one handler at a time, simplest first. Each handler ships with unit tests, a smoke probe, and a New Relic custom event.

Order of work (locked)

credentialscheck — auth-only; proves the service binding chain.
advertisers — DB_CONSOLE; two queries.
campaigns — DB_CONSOLE; nested queries (sites, placements, ad units, delivery groups).
analytics (v1 + v2) — DB_AGGREGATE; dual-query (kpi + eng/quartile in v2).
clickthroughs — DB_AGGREGATE; same aggregate pattern.
connector — wraps analytics aggregated per advertiser.
dataconnector — account-wide scope + convertForConnector / convertForConnectorNew. Hardest; ship last.

Per-handler checklist

Field-for-field response equivalence with the PHP origin (use tests/fixtures/php-responses/*.json captured during plan authoring).
Versioned route registration: register /handler, /v1.0/handler, and /v2.0/handler wherever the version param is supported.
Format multiplexing via format=json|xml|csv; php and serialized return 410 Gone with a documentation link.
Handler-level rate-limit check via the RateLimiter DO before any DB call.
Logger emits endpoint, version, format, account_id, query_ms, total_ms, cached (always false for v1 — placeholder for future).
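
Two items from the checklist above, versioned route registration and format multiplexing, can be sketched as small pure functions. The names `versionedPaths`, `resolveFormat`, and `formatStatus` are hypothetical, and three behaviors are assumptions to verify against the PHP origin: defaulting to json when `format` is absent, case-insensitive matching, and answering an unrecognized format with 400.

```typescript
type OutputFormat = "json" | "xml" | "csv";

type FormatDecision =
  | { kind: "ok"; format: OutputFormat }
  | { kind: "gone"; requested: string }     // php / serialized
  | { kind: "invalid"; requested: string }; // anything else

// Paths to register for one handler, e.g. /analytics, /v1.0/analytics, /v2.0/analytics.
function versionedPaths(
  handler: string,
  versions: readonly string[] = ["v1.0", "v2.0"],
): string[] {
  return [`/${handler}`, ...versions.map((v) => `/${v}/${handler}`)];
}

function resolveFormat(raw: string | undefined): FormatDecision {
  const requested = (raw ?? "json").toLowerCase();
  if (requested === "json" || requested === "xml" || requested === "csv") {
    return { kind: "ok", format: requested };
  }
  if (requested === "php" || requested === "serialized") {
    return { kind: "gone", requested };
  }
  return { kind: "invalid", requested };
}

// HTTP status for each decision; 410 Gone is from the checklist, 400 is an assumption.
function formatStatus(decision: FormatDecision): number {
  switch (decision.kind) {
    case "ok":
      return 200;
    case "gone":
      return 410;
    case "invalid":
      return 400;
  }
}
```

Resolving the format before the rate-limit and DB steps lets a retired format fail fast without consuming quota.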

Per-commit gate

npm run lint && npm run typecheck && npm run test && \
  npm run dry-run:dev && npm run dry-run:stg && npm run dry-run:prd

Pause checkpoint

After all seven handlers are merged, Claude Code summarizes endpoint parity (matrix of endpoints × formats × versions × auth states) and waits.

Six-pillar callout — Phase 6

Pillar	Treatment
Redundancy	Workers run globally; handlers are stateless except for the DO call.
Resiliency	Wrap every Hyperdrive call in a 10-second budget and a 1-retry policy with jittered backoff. Surface 504 on exceeded budget rather than hanging.
Disaster Recovery	Each handler must be independently deployable — keep coupling (shared state, shared mutable globals) at zero so a buggy handler can be reverted without touching the rest.
Backup	Snapshot a sample of PHP responses per endpoint in `tests/fixtures/`. These become the regression contract for the lifetime of the Worker.
Deployment Strategy	Handlers ship behind feature flags read from `[vars]` (e.g., `HANDLER_ANALYTICS_V2_ENABLED`). Cutover for a handler is a flag flip + Worker version pin, not a code change.
Observability	Per-handler New Relic custom event includes the response shape hash; mismatches against the PHP baseline alert immediately during dual-run.
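
The Resiliency row's 10-second budget with one jittered retry could be wrapped in a helper along these lines. It is a sketch under assumptions: `withBudget`, `BudgetExceededError`, and `gatewayStatus` are hypothetical names, the 100 ms base delay and full-jitter formula are illustrative choices, and mapping other errors to 500 is an assumption.

```typescript
// Raised when the overall time budget is spent; a handler would surface it as 504.
class BudgetExceededError extends Error {
  constructor(budgetMs: number) {
    super(`exceeded ${budgetMs} ms budget`);
    this.name = "BudgetExceededError";
  }
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Runs `op` within one shared budget, retrying failed attempts with jittered backoff.
async function withBudget<T>(
  op: () => Promise<T>,
  budgetMs = 10_000,
  retries = 1,
  baseDelayMs = 100,
): Promise<T> {
  const deadline = Date.now() + budgetMs;
  let lastError: unknown = new BudgetExceededError(budgetMs);
  for (let attempt = 0; attempt <= retries; attempt++) {
    const remaining = deadline - Date.now();
    if (remaining <= 0) throw new BudgetExceededError(budgetMs);
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_resolve, reject) => {
      timer = setTimeout(() => reject(new BudgetExceededError(budgetMs)), remaining);
    });
    try {
      return await Promise.race([op(), timeout]);
    } catch (err) {
      if (err instanceof BudgetExceededError) throw err; // never retry past the budget
      lastError = err;
      if (attempt < retries) {
        const backoff = Math.random() * baseDelayMs * 2 ** attempt; // full jitter
        await sleep(Math.min(backoff, Math.max(0, deadline - Date.now())));
      }
    } finally {
      clearTimeout(timer);
    }
  }
  throw lastError;
}

// 504 for a spent budget; 500 for anything else is an assumption.
function gatewayStatus(err: unknown): number {
  return err instanceof BudgetExceededError ? 504 : 500;
}
```

The budget is shared across attempts, so a retry can never push a request past the 10-second ceiling.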

Phase 7 — CI / CD pipeline and smoke tests

GitHub Actions wired to the repo (created at the start of this phase, not earlier).

Cloudflare reference: Wrangler in CI · Gradual deployments

Steps

Create the GitHub repo Adventive/adventive-public-api-worker. Add the remote: git remote add origin git@github.com:Adventive/adventive-public-api-worker.git. Push main.
Configure branch protection: required checks (lint, typecheck, test, dry-run:dev, dry-run:stg, dry-run:prd), required code owner review, dismiss stale reviews on push.
Add .github/workflows/ci.yml with three jobs:
- pr-gate (on pull_request): lint → typecheck → test → secret scan → all three dry-runs.
- deploy-stg (on push to main): wrangler deploy --env stg, then scripts/smoke.sh https://api.adventivestg.com.
- deploy-prd (on git tag v*.*.*): wrangler deploy --env prd, then scripts/smoke.sh https://api.adventive.com. Use gradual deployment: 10 % → 25 min wait → 100 %.
Author scripts/smoke.sh. Required probes: GET /__health → 200, GET /credentialscheck (valid creds) → { status: true }, GET /credentialscheck (invalid) → 401, GET /advertisers (valid) → 200 with at least one row.
Wire the Cloudflare API token + account ID as GitHub repo secrets: CF_API_TOKEN, CF_ACCOUNT_ID. Restrict the token scope per pre-flight.
Set up Wrangler tail integration with New Relic by deploying a tail Worker (adv-svc-public-api-tail-{env}) that POSTs to NR Logs API.
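
scripts/smoke.sh itself is a shell script, but the required probes can be expressed as data to make the expected results explicit. The sketch below is illustrative only: `ProbeSpec`, `SMOKE_PROBES`, and `evaluateProbe` are hypothetical names, and two details are assumptions not stated in the step: that the valid-credentials probe returns HTTP 200, and that /advertisers returns a top-level JSON array.

```typescript
interface ProbeSpec {
  label: string;
  path: string;
  creds: "none" | "valid" | "invalid";
  expectStatus: number;
  bodyCheck?: (body: unknown) => boolean; // optional check on the parsed JSON body
}

interface ProbeResponse { httpStatus: number; body: unknown }

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

const SMOKE_PROBES: ProbeSpec[] = [
  { label: "health", path: "/__health", creds: "none", expectStatus: 200 },
  {
    label: "credentialscheck (valid)", path: "/credentialscheck", creds: "valid",
    expectStatus: 200, // assumed status
    bodyCheck: (b) => isRecord(b) && b.status === true,
  },
  { label: "credentialscheck (invalid)", path: "/credentialscheck", creds: "invalid", expectStatus: 401 },
  {
    label: "advertisers", path: "/advertisers", creds: "valid", expectStatus: 200,
    bodyCheck: (b) => Array.isArray(b) && b.length > 0, // assumed response shape
  },
];

// Returns null when the probe passes, otherwise a short failure description.
function evaluateProbe(spec: ProbeSpec, res: ProbeResponse): string | null {
  if (res.httpStatus !== spec.expectStatus) {
    return `${spec.label}: expected ${spec.expectStatus}, got ${res.httpStatus}`;
  }
  if (spec.bodyCheck && !spec.bodyCheck(res.body)) {
    return `${spec.label}: body check failed`;
  }
  return null;
}
```

Each probe carries its own expected status, so a 401 from the invalid-credentials probe counts as a pass rather than a failure.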

Six-pillar callout — Phase 7

Pillar	Treatment
Redundancy	Two-tier deploy (stg → prd) ensures any regression is caught before production. Gradual deployment splits traffic between the previous and current version.
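Resiliency	Smoke tests run post-deploy and fail the workflow when any probe deviates from its expected result (the invalid-credentials probe expects 401, so the check cannot be a blanket non-200 test). Alarm if smoke fails more than once consecutively in stg (often signals an upstream problem, not code).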
Disaster Recovery	Rollback procedure: `wrangler rollback --env <env>` (uses Cloudflare’s deployment history). Document RTO of 5 min for any single environment. Archive the previous tag’s bundle in R2 as belt-and-braces.
Backup	GitHub repo is the source of truth. Tag every prod release. Mirror `main` to an internal Cloudflare R2 bucket nightly via Action (regenerable, but cheap insurance).
Deployment Strategy	Gradual deployment for prd is mandatory: 10 % canary → 25 min health window → 100 %. CI auto-aborts on smoke failure.
Observability	New Relic deploy markers fire on every successful deploy event. Synthetic probes (every 60 s) hit `/__health` and `/credentialscheck` against all three envs.

Phase 8 — Staging soak and DNS cutover

Tier-2 minimum soak: 2 hours on staging before any prod promotion.

Soak protocol (staging)

Run a load profile against api.adventivestg.com matching the production p50 / p95 of the legacy API for the last 14 days. Source: New Relic transaction trace export.
Compare response shapes endpoint-by-endpoint against the legacy PHP stage in real time (the dual-run reconciler from tests/parity/).
Watch dashboards for the full 2-hour window:
- Hyperdrive p95 acquire latency
- DO RateLimiter invocation count and error rate
- Worker CPU time p95 (must stay < 50 ms)
- Auth helper cache-hit ratio (must stay > 70 %)
- Tunnel Up status (must remain Up continuously)
Run the Google Data Studio connector against api.adventivestg.com /dataconnector and confirm schema parity vs. legacy.
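
The three dashboard thresholds stated in the soak protocol (Worker CPU p95 below 50 ms, auth cache-hit ratio above 70 %, tunnel continuously Up) can be checked mechanically over exported samples. This is a sketch under assumptions: `SoakSample` and `soakBreaches` are hypothetical, the sample shape is invented for illustration, and the Hyperdrive and RateLimiter metrics are left out because this section gives no pass/fail threshold for them.

```typescript
// One observation from the soak window; field names are illustrative.
interface SoakSample {
  workerCpuP95Ms: number;
  authCacheHitRatio: number; // 0..1
  tunnelUp: boolean;
}

// Returns one message per breached threshold; an empty list means the soak gates held.
function soakBreaches(samples: readonly SoakSample[]): string[] {
  const breaches: string[] = [];
  samples.forEach((s, i) => {
    if (!(s.workerCpuP95Ms < 50)) {
      breaches.push(`sample ${i}: Worker CPU p95 ${s.workerCpuP95Ms} ms is not < 50 ms`);
    }
    if (!(s.authCacheHitRatio > 0.7)) {
      breaches.push(`sample ${i}: auth cache-hit ratio ${s.authCacheHitRatio} is not > 0.7`);
    }
    if (!s.tunnelUp) {
      breaches.push(`sample ${i}: tunnel not Up`);
    }
  });
  return breaches;
}
```

An empty result over the full 2-hour window is one input to the promotion decision; it does not replace the parity comparison or the connector check.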

DNS cutover (production)

Confirm legacy origin TTL on api.adventive.com is ≤ 300 s (set in Phase 0).
Tag a release: git tag v1.0.0 && git push origin v1.0.0. CI deploys prd at 10 % canary → 25 min → 100 %.
Flip the Cloudflare route for api.adventive.com/* from the legacy origin to adv-svc-public-api-prd.
Watch live dashboards for 60 minutes. Compare against the soak baseline.
Keep the legacy CodeIgniter server running for 48 hours as fallback. Do not stop the PHP-FPM service.
After the 48-hour fallback, decommission the legacy origin per the runbook (Chapter 04). Update decisions/2026-MM-DD-cutover.md with final timestamps.

Rollback (any time during cutover)

Flip the Cloudflare route back to the legacy origin (single dashboard click).
Run wrangler rollback --env prd to revert to the previous version (5 minutes).
Open an incident; do not re-attempt cutover until the trigger is root-caused.

Six-pillar callout — Phase 8

Pillar	Treatment
Redundancy	Two parallel serving paths exist for 48 hours: legacy origin (warm fallback) and Worker (active). Rollback is a route flip.
Resiliency	Cutover happens during a low-traffic window (Tuesday 14:00–16:00 ET historically lowest per New Relic). 25-min canary phase absorbs unexpected regressions.
Disaster Recovery	Documented rollback steps above. RTO 5 min (route flip + rollback). RPO 0 (Worker is stateless except for DO counters, which are explicitly accepted as resettable at cutover).
Backup	Legacy origin remains running for 48 h. Aurora point-in-time recovery covers the underlying data. R2 nightly mirror of `main` covers code.
Deployment Strategy	Gradual deployment (10 % → 100 %) layered over DNS cutover. Two control surfaces: Worker version (per-request) and DNS route (per-region edge cache).
Observability	Dashboard kit pinned for the 48 h fallback window: Worker CPU, error rate, auth cache hit, Hyperdrive latency, tunnel up/down, NR synthetic probe. Page on any sustained breach > 2 min.

Cross-cutting traceability

Plan phase	Owner	Implementation environment	Output artifact	Pillar emphasis
0	Jeffrey	PHP repo (existing)	rotated creds, removed dead config, git tag	DR (anchor tag)
1	Jeffrey	AWS Image Builder + Cloudflare dashboard	Image Builder pipeline, 3 ASGs, 3 tunnels, ADR	Redundancy (dual-AZ prd) + Deployment (rolling AMI refresh)
2	Jeffrey	Wrangler CLI	6 Hyperdrive IDs, wrangler.toml updated	Backup (creds in Cloudflare Secrets Store)
3	Jeffrey	New repo + Wrangler	auth-helper Worker × 3 envs	Resiliency (5xx on infra failure)
4	Claude Code	Local repo	scaffolded src/, openapi.yaml, dry-runs green	Observability (logger contract)
5	Claude Code	Local repo	middleware + DO + tests; PAUSE	Resiliency (timeouts, error envelope)
6	Claude Code	Local repo	7 handlers + tests; PAUSE	Backup (PHP response fixtures)
7	Claude Code	GitHub Actions	CI green; gradual deploys configured	Deployment (10 % → 100 %)
8	Jeffrey	Cloudflare + AWS	DNS cutover; 48 h fallback	DR (rollback ≤ 5 min)

Verification gates (must pass before next phase begins)

After Phase 0: git grep data_warehouse empty; tag pushed.
After Phase 1: mysql -h aurora-{cluster}-{env}.internal.adventive.com succeeds from a peer EC2.
After Phase 2: wrangler hyperdrive list shows six configs; throwaway Worker SELECT 1 passes for all six.
After Phase 3: curl https://adv-svc-auth-helper-{env}.adventive.workers.dev/__health returns 200 across dev/stg/prd.
After Phase 4: npm run dry-run:{dev,stg,prd} all green; openapi.yaml lints clean.
After Phase 5 (PAUSE): unit tests cover auth, db, ratelimit, response, validation; DO migration tag committed.
After Phase 6 (PAUSE): parity matrix shows 100 % field-for-field equivalence with PHP fixtures.
After Phase 7: main branch protected; CI runs all gates; smoke.sh green against stg.
After Phase 8: Soak baseline captured; cutover executed; 48 h fallback window logged in ADR.

Cloudflare documentation index

Consult these before executing any phase. The list below mirrors the canonical reference memory at reference_cloudflare_documentation.md.

Topic	URL
Workers	https://developers.cloudflare.com/workers/
Wrangler CI/CD	https://developers.cloudflare.com/workers/wrangler/ci-cd/
Versions and gradual deployments	https://developers.cloudflare.com/workers/configuration/versions-and-deployments/gradual-deployments/
Hyperdrive	https://developers.cloudflare.com/hyperdrive/
Hyperdrive + private DB	https://developers.cloudflare.com/hyperdrive/configuration/connect-to-private-database/
Tunnel	https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/
Tunnel TCP routing	https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/use-cases/tcp/
Durable Objects	https://developers.cloudflare.com/durable-objects/
DO alarms	https://developers.cloudflare.com/durable-objects/api/alarms/
KV	https://developers.cloudflare.com/kv/
Service bindings	https://developers.cloudflare.com/workers/runtime-apis/bindings/service-bindings/
Workers Logs / Tail	https://developers.cloudflare.com/workers/observability/logs/
Logpush	https://developers.cloudflare.com/logs/logpush/
Workers Analytics Engine	https://developers.cloudflare.com/analytics/analytics-engine/
Secrets Store	https://developers.cloudflare.com/secrets-store/
Terraform provider	https://developers.cloudflare.com/terraform/

Open items / follow-on ADRs

decisions/2026-MM-DD-tunnel-topology.md — single vs dual-AZ per env (Phase 1 deliverable).
decisions/2026-MM-DD-rate-limit-fallback.md — confirm whether RATE_LIMIT_KV fallback is wired or held in reserve (Phase 5).
decisions/2026-MM-DD-cutover.md — actual cutover timestamps, soak metrics, fallback decommission timestamp (Phase 8).
decisions/2026-MM-DD-terraform-import.md — when to import Hyperdrive + Tunnel into Terraform (deferred from Phase 1; revisit after first prod deploy).
