Public API Cloudflare Migration — As-Built Runbook

Document type: Implementation history + reference runbook
Last updated: 2026-04-29
Status: Phases 1–7 complete for dev. Public API Worker live at https://api.adventive.dev. Phase 8 (cutover) and stg/prd extension remain.

This document captures the actual sequence used to deploy Phase 1 of the Public API Cloudflare migration, including the gotchas hit and the fixes applied. It exists so that if everything had to be rebuilt — fresh AWS account, fresh Cloudflare account — a competent engineer could follow this end-to-end and arrive at the same working state.

It is a companion to 06-implementation-steps.md (the planning document). Where 06 describes intent, this document describes what was actually executed and observed.

1. Architecture summary

The deployment is composed of four cooperating systems:

EC2 Image Builder produces a hardened Ubuntu 22.04 AMI (adv-cflared-*) that has cloudflared pre-installed plus a boot-time bootstrap script. The pipeline runs weekly and publishes the AMI ID to SSM Parameter Store; that parameter is the source of truth for which AMI the tunnel fleet runs.
Cloudflare Tunnel is provisioned per environment via Terraform (currently dev only). The tunnel UUID, secret, and rendered config are stored in AWS Secrets Manager at /adventive/cloudflared/<env> in the JSON shape the AMI’s bootstrap script expects.
AWS Secrets Manager holds the tunnel credentials and config. The bootstrap script on every booting AMI instance reads from a path keyed by the instance’s adv:env IMDS tag, decoupling the AMI from any specific environment.
EC2 (Phase 1.5 — ASG) runs the cloudflared daemon. Each instance, on boot, resolves its adv:env tag, fetches the matching secret, writes the config + credentials to /etc/cloudflared/, then cloudflared.service starts and registers with Cloudflare’s edge.

The AMI is environment-agnostic. The runtime instance’s tag determines which env it joins. This is the standing pattern for all Adventive Cloudflare Tunnel deployments.

2. Reference data (current state)

Resource	Value
AWS account ID	`201161205241`
AWS region	`us-east-1`
Cloudflare account ID	`46a873457665355ba02a85e61d7200a7`
Cloudflare account name	Adventive Tech, Inc.
Dev DNS zone	`adventive.dev`
Dev zone ID (Cloudflare)	`c737bae5b535c0ec3daa72c809721e7d`
VPC	`vpc-e1636084`
Build subnet	`subnet-2ef28677`
Build security group	`sg-0968b424c847f142c` (egress: TCP 80 + 443)
Artifacts S3 bucket	`adventive-platform-artifacts`
Image Builder pipeline	`arn:aws:imagebuilder:us-east-1:201161205241:image-pipeline/adv-cflared-pipeline`
Current recipe version	`adv-cflared-recipe/1.0.2`
Current cflared component	`adv-cflared-cloudflared/1.0.2`
Current bootstrap component	`adv-cflared-bootstrap-script/1.0.0`
Current AMI	`ami-02f2f244d3dd56cb4` (built 2026-04-29 07:13 UTC)
SSM parameter	`/adventive/cloudflared/ami-id-latest`
Dev tunnel UUID	`cb0c2ef4-3426-4968-8175-a3a053ecc5ea`
Dev tunnel name	`adv-cflared-dev`
Dev tunnel hostname	`tunnel.adventive.dev` (CNAME, proxied)
Dev tunnel secret	`arn:aws:secretsmanager:us-east-1:201161205241:secret:/adventive/cloudflared/dev-7lUBaV`
Lambda function	`adv-cflared-publish-ami`
EventBridge rule	`adv-cflared-pipeline-success`
Dev ASG	`adv-cflared-dev` (1× t3.micro, single-AZ in `subnet-2ef28677`)
Dev runtime IAM role	`adv-cflared-runtime-dev`
Dev runtime SG	`adv-cflared-runtime-dev` (egress: TCP+UDP 7844, TCP 443; no inbound)
Dev launch template	`adv-cflared-dev`
Existing Warp tunnel (do not touch)	`cf-tunnel.us-east1.aws.adventive.com` (id `604363f7-e1b9-4dc7-a485-4a3df8b2f751`)
Dev RDS instance	`development.coi6rcntfbgg.us-east-1.rds.amazonaws.com:3306` (us-east-1a, MySQL)
Dev RDS security group	`adv-development-database` (`sg-0b940e4bfc388f9be`)
Dev databases on that instance	`console`, `aggregate`, `billing`, `vast`
Console Hyperdrive	`adv-svc-public-api-console-dev` (id `059838c4abb64a92a4aece2a6a533a29`)
Aggregate Hyperdrive	`adv-svc-public-api-aggregate-dev` (id `c1b18833b07347daa77b56a2d19ef508`)
Dev Access service token	`adv-hyperdrive-dev` (client_id `99fc0a3d3df90300224e08b37fd04f5b.access`)
Console DB hostname	`db-console-dev.adventive.dev` (CNAME, proxied)
Aggregate DB hostname	`db-aggregate-dev.adventive.dev` (CNAME, proxied)
Auth helper Worker	`adv-svc-auth-helper-dev`
Auth helper URL	`https://adv-svc-auth-helper-dev.adventive.workers.dev`
Auth helper repo	`~/Repositories/GitHub/Adventive/adventive-auth-helper-worker/`
Auth helper KV namespace	`kv-adv-svc-auth-helper-cache-dev` (id `22db913488484a46a7f60ebb9b8c1704`)
Cloudflare workers.dev subdomain	`adventive.workers.dev`
Public API Worker	`adv-public-api-dev` (live at `https://api.adventive.dev`)
Public API repo	`~/Repositories/GitHub/Adventive/adventive-public-api-worker/` (GitHub: `adventive/adventive-public-api-worker`, private)
Public API Postman collection	`postman/adventive-public-api.json` (28 requests, dev/stg/prd envs)

3. Prerequisites

Tools

AWS CLI v2 (aws --version)
Terraform 1.6+
jq
dig
curl
session-manager-plugin (for shell access to instances; on macOS: brew install --cask session-manager-plugin)
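A quick pre-flight check can confirm that the tools above are on PATH before starting. This is a minimal sketch; the helper name missing_tools is hypothetical and not part of the repo.

```shell
# Print the name of every listed tool that is not found on PATH.
# Prints nothing when everything is installed.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage (tool list taken from this section):
#   missing_tools aws terraform jq dig curl session-manager-plugin
```

An empty result means the prerequisites are present; version checks (AWS CLI v2, Terraform 1.6+) still need to be done by hand.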

AWS access

Authenticated AWS CLI with permission to create EC2, IAM, Secrets Manager, Image Builder, Lambda, EventBridge, SSM, S3 resources

Cloudflare access

Account-owned API token at https://dash.cloudflare.com → Manage Account → Account API Tokens, with policies:
- Entire Account scope: Cloudflare One Connector: cloudflared → Edit
- Specified Domains scope (adventive.dev only): DNS → Edit, Zone → Read
Exported as CLOUDFLARE_API_TOKEN in any shell that runs Terraform on the cloudflare-tunnels module

Git

Repo: Adventive/adventive-platform-infra
Local clone path: ~/Repositories/GitHub/Adventive/adventive-platform-infra/

4. Phase 1.1 — AMI build pipeline

4.1 Network preparation

The build instance runs in a VPC that already exists. We did not create a new VPC. The script scripts/setup-builder-network.sh is idempotent and handles:

Verifying the VPC exists and resolving its CIDR
Creating (or finding existing) security group adv-imagebuilder-builder in vpc-e1636084 with egress on TCP 443 (HTTPS — pkg.cloudflare.com, S3, AWS APIs) and TCP 80 (apt mirror redirects). No inbound rules.
Listing all subnets in the VPC, classifying each as public/private/isolated by inspecting their route tables, and printing the values for infra/imagebuilder/terraform.tfvars

To reproduce:

cd ~/Repositories/GitHub/Adventive/adventive-platform-infra/scripts
VPC_ID=vpc-e1636084 bash setup-builder-network.sh

The script prints two tfvars-formatted lines: build_subnet_id and build_security_group_ids.
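The public/private/isolated classification can be sketched as follows. This is not the logic from setup-builder-network.sh, only an illustration that assumes the usual convention: a default route to an internet gateway (igw-*) means public, a default route to a NAT gateway (nat-*) means private, and anything else is isolated. Both function names are hypothetical.

```shell
# Classify a subnet from the target of its 0.0.0.0/0 route.
# $1 = gateway or NAT gateway ID of the default route, or empty if none.
classify_default_route() {
  case "$1" in
    igw-*) echo public ;;
    nat-*) echo private ;;
    *)     echo isolated ;;
  esac
}

# Look up the default-route target for one subnet (requires AWS credentials).
# Subnets without an explicit association fall back to the VPC's main route
# table, which this sketch does not handle.
default_route_target() {
  aws ec2 describe-route-tables \
    --filters "Name=association.subnet-id,Values=$1" \
    --query "RouteTables[0].Routes[?DestinationCidrBlock=='0.0.0.0/0'] | [0].[GatewayId, NatGatewayId]" \
    --output text | tr '\t' '\n' | grep -v '^None$' | head -n 1
}

# Usage:
#   classify_default_route "$(default_route_target subnet-2ef28677)"
```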

4.2 S3 artifacts bucket

Created manually before terraform apply:

aws s3 mb s3://adventive-platform-artifacts --region us-east-1

This bucket holds:

cflared/cflared-bootstrap.sh — the runtime bootstrap script (uploaded once before first build)
imagebuilder-logs/cflared/... — per-build logs (written by the build instance during each pipeline run)

The bootstrap script lives at scripts/cflared-bootstrap.sh in the repo. Upload to S3 with:

aws s3 cp scripts/cflared-bootstrap.sh s3://adventive-platform-artifacts/cflared/cflared-bootstrap.sh

If the script content changes, re-upload to the same key. The Image Builder component adv-cflared-bootstrap-script will pull the latest version on each build.

4.3 Image Builder components (YAML)

Two custom components live in infra/imagebuilder/components/:

cflared.yml — installs cloudflared from Cloudflare’s APT repo and drops a primary systemd unit at /etc/systemd/system/cloudflared.service. Critical: the cloudflared deb package does not ship a systemd unit file. We write our own with Type=simple, --no-autoupdate, and Requires=cflared-bootstrap.service so cloudflared cannot start before the bootstrap finishes writing config.

bootstrap.yml — drops the cflared-bootstrap.sh script from S3 to /usr/local/sbin/, installs jq, and creates the cflared-bootstrap.service systemd unit (Type=oneshot, runs before cloudflared.service).

Components are immutable per version. Any YAML edit requires bumping the version attribute in components.tf AND the recipe version in pipeline.tf. Old versions accumulate because AWS retains build history; both resources set lifecycle { create_before_destroy = true } so plan/apply doesn’t trip on AWS rejecting deletes of versions that downstream images still reference.

4.4 Terraform module — `infra/imagebuilder/`

Files:

versions.tf — Terraform 1.6+, AWS provider ~> 5.50, archive provider, default tags (adv:owner, adv:project, adv:repo, adv:module)
variables.tf — region, artifacts_bucket, build_subnet_id, build_security_group_ids, ssm_parameter_name, schedule_expression, log_retention_days
locals.tf — Ubuntu 22.04 AMI lookup (Canonical 099720109477), AWS-managed Image Builder component data sources
iam.tf — builder instance role + instance profile (3 managed policies + inline S3 read/write for cflared/* and imagebuilder-logs/*), publish-ami Lambda role
components.tf — two aws_imagebuilder_component resources sourcing from components/*.yml, both with lifecycle { create_before_destroy = true }
pipeline.tf — recipe (with create_before_destroy), infrastructure config (t3.small, references var.build_subnet_id), distribution config, pipeline with cron cron(0 6 ? * TUE *)
ami_publish.tf — SSM parameter (ignore_changes = [value, description]), Lambda function (Python 3.12), EventBridge rule + target + Lambda permission
outputs.tf — pipeline_arn, recipe_arn, ssm_parameter_name, parent_ami_id, etc.
lambda/publish_ami_to_ssm.py — Lambda that handles EC2 Image Builder Image State Change events, extracts AMI ID, writes to SSM
terraform.tfvars — actual values (gitignored)

4.5 Apply procedure

cd ~/Repositories/GitHub/Adventive/adventive-platform-infra/infra/imagebuilder

terraform init
terraform plan -out=tfplan
terraform apply tfplan

First apply creates ~16 resources. None touch existing AWS resources outside the new adv-cflared-* namespace and the adventive-platform-artifacts bucket policies on the builder role.

4.6 Triggering a build

Pipeline runs on its weekly schedule, but you can fire one manually:

aws imagebuilder start-image-pipeline-execution \
  --image-pipeline-arn $(terraform output -raw pipeline_arn)

Build duration: 6–10 minutes total (BUILD workflow ~6 min, TEST workflow ~3 min).

4.7 Validation

After a successful build:

# Image is AVAILABLE and has an AMI ID
aws imagebuilder get-image \
  --image-build-version-arn arn:aws:imagebuilder:us-east-1:201161205241:image/adv-cflared-recipe/1.0.2/<N> \
  --query 'image.{state:state.status,ami:outputResources.amis[0].image}' --output table

# SSM parameter holds that AMI ID
aws ssm get-parameter --name /adventive/cloudflared/ami-id-latest \
  --query 'Parameter.{value:Value,modified:LastModifiedDate}' --output table

If SSM doesn’t update automatically (see deficiency #21), publish manually:

LATEST_AMI=$(aws imagebuilder get-image \
  --image-build-version-arn arn:aws:imagebuilder:us-east-1:201161205241:image/adv-cflared-recipe/1.0.2/<N> \
  --query 'image.outputResources.amis[0].image' --output text)
aws ssm put-parameter --name /adventive/cloudflared/ami-id-latest \
  --value "$LATEST_AMI" --type String --overwrite \
  --description "adv-cflared AMI manually published"
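With --output text, a missing field comes back as the literal string None, which put-parameter would store without complaint. A small guard can catch that before publishing. This is a sketch only; is_ami_id is a hypothetical helper, and the pattern assumes the standard ami- prefix followed by 8 or 17 hex characters.

```shell
# Return 0 only if $1 looks like an AMI ID (ami- plus 8 or 17 hex chars).
is_ami_id() {
  printf '%s\n' "$1" | grep -Eq '^ami-([0-9a-f]{8}|[0-9a-f]{17})$'
}

# Usage: refuse to publish anything that is not an AMI ID.
#   is_ami_id "$LATEST_AMI" || { echo "unexpected AMI value: $LATEST_AMI" >&2; exit 1; }
```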

4.8 Gotchas encountered (so you don’t repeat them)

one() rejected multiple ARNs. AWS-managed Image Builder components (update-linux, aws-cli-version-2-linux, etc.) accumulate versions over time. Using Terraform’s one() function on the aws_imagebuilder_components.X.arns set fails when two or more versions exist. Fix: reverse(sort(tolist(...)))[0] to take the highest-version ARN deterministically.
Two managed components don’t exist with the names I guessed. amazon-ssm-agent-linux and cis-level-1-ubuntu-22-04-lts returned empty result sets in us-east-1. We dropped them from the recipe; Ubuntu 22.04 ships SSM agent via snap, and CIS hardening was nice-to-have. Confirm exact names via aws imagebuilder list-components --owner Amazon if you want to add them back.
CreateFile does not auto-create parent dirs in Image Builder workflows. Initially set up a systemd override at /etc/systemd/system/cloudflared.service.d/override.conf, but the .d parent dir was missing on a fresh install. Then realized the deeper issue: cloudflared deb has no base unit file to override. Fix: drop a primary unit at /etc/systemd/system/cloudflared.service instead.
Image Builder recipes/components are immutable per version, and old versions can’t be deleted while images reference them. Default Terraform behavior is destroy-then-create on a version bump, which fails when prior failed/successful builds hold references. Fix: lifecycle { create_before_destroy = true } on both aws_imagebuilder_component.cflared and aws_imagebuilder_image_recipe.cflared. Old versions accumulate harmlessly in AWS as orphaned history.
Build instance’s IAM role needs S3 PutObject for the imagebuilder-logs/* prefix — the Image Builder TOE (Task Orchestrator and Executor) uploads per-step logs there. We initially granted only Read on cflared/*. Symptom: first build failed with “User is not authorized to perform: s3:PutObject” trying to upload D0__update-linux__1.0.2_1.yml.
EventBridge rule’s source-pipeline-arn filter doesn’t match Image State Change events. That field is on a different event type (Pipeline Execution Status Change) which uses different statuses (COMPLETED/FAILED). Image State Change carries image-arn, not source-pipeline-arn. Fix: prefix-match on image-arn instead. Note: even after this fix, builds 1.0.2/2 and 1.0.2/3 did not trigger the Lambda — the actual event field shape differs from documentation. Tracked as task #21.
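For the first gotcha, the highest-version ARN can be cross-checked from the CLI. The sketch below is not the Terraform fix itself; highest_version_arn is a hypothetical helper that uses sort -V (version sort). One thing to keep in mind: Terraform’s sort() orders strings lexically, so it would rank 1.0.10 below 1.0.9, whereas a version sort does not.

```shell
# Read component ARNs on stdin (one per line), print the one with the
# highest version. sort -V compares the numeric version fields, so
# 1.0.10 ranks above 1.0.9.
highest_version_arn() {
  sort -V | tail -n 1
}

# Usage (requires AWS credentials):
#   aws imagebuilder list-components --owner Amazon \
#     --filters name=name,values=update-linux \
#     --query 'componentVersionList[].arn' --output text \
#     | tr '\t' '\n' | highest_version_arn
```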

5. Phase 1.2 — Cloudflare Tunnel + Secrets Manager

5.1 API token creation

Account-owned token via dashboard at Manage Account → Account API Tokens → Create Token. Name: adv-platform-infra-tunnels-dev. Two policies on the same token:

Policy scope	Permission group	Access
Entire Account	Cloudflare One Connector: cloudflared	Edit
Specified Domains → adventive.dev	DNS	Edit
Specified Domains → adventive.dev	Zone	Read

Critical: the tunnel permission lives at account scope. If you only set up “Specified Domains” scope, the tunnel permission group is invisible — that’s why scope matters here.

Verify with:

export CLOUDFLARE_API_TOKEN='<the cfat_... token>'

curl -sS "https://api.cloudflare.com/client/v4/accounts/46a873457665355ba02a85e61d7200a7/cfd_tunnel" \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  | jq '{success, count: (.result | length), errors}'
# Expected: { "success": true, "count": <N>, "errors": [] }

curl -sS "https://api.cloudflare.com/client/v4/zones/c737bae5b535c0ec3daa72c809721e7d/dns_records?per_page=1" \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  | jq '{success, errors}'
# Expected: { "success": true, "errors": [] }

5.2 Terraform module — `infra/cloudflare-tunnels/`

Files:

versions.tf — Terraform 1.6+, cloudflare ~> 4.50, aws ~> 5.50, random ~> 3.6
variables.tf — cloudflare_account_id, environments map (with zone_id + apex per env), tunnel_subdomain (default tunnel), tunnel_name_prefix (default adv-cflared), secret_path_prefix (default /adventive/cloudflared)
tunnels.tf — for each env: random_bytes (32-byte secret), cloudflare_zero_trust_tunnel_cloudflared, cloudflare_record (CNAME, proxied), aws_secretsmanager_secret, aws_secretsmanager_secret_version
outputs.tf — tunnel_ids, tunnel_names, tunnel_hostnames, secret_arns, secret_names
terraform.tfvars — currently dev only; adding stg/prd later is a tfvars edit, no code change

The secret payload structure exactly matches what cflared-bootstrap.sh expects:

{
  "config_yaml": "tunnel: <uuid>\ncredentials-file: /etc/cloudflared/<uuid>.json\nno-autoupdate: true\nmetrics: 0.0.0.0:2000\n\ningress:\n  - hostname: tunnel.<apex>\n    service: http_status:503\n  - service: http_status:404\n",
  "credentials_json": "{\"AccountTag\":\"...\",\"TunnelSecret\":\"...\",\"TunnelID\":\"...\"}"
}

The ingress uses a 503 placeholder until real services are added. Replace via secret version bump when there’s an origin to route to.
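Because config_yaml is stored as a JSON-escaped string, it is easier to read when expanded. The sketch below prints the same placeholder config for a given tunnel UUID and apex domain. In the module this text is rendered by Terraform; render_config_yaml is a hypothetical helper used only to show the shape.

```shell
# Print the placeholder config for a tunnel UUID ($1) and apex domain ($2),
# matching the config_yaml shape shown above.
render_config_yaml() {
  printf 'tunnel: %s\n' "$1"
  printf 'credentials-file: /etc/cloudflared/%s.json\n' "$1"
  printf 'no-autoupdate: true\n'
  printf 'metrics: 0.0.0.0:2000\n\n'
  printf 'ingress:\n'
  printf '  - hostname: tunnel.%s\n' "$2"
  printf '    service: http_status:503\n'
  printf '  - service: http_status:404\n'
}

# Usage:
#   render_config_yaml cb0c2ef4-3426-4968-8175-a3a053ecc5ea adventive.dev
```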

5.3 Apply procedure

cd ~/Repositories/GitHub/Adventive/adventive-platform-infra/infra/cloudflare-tunnels

cp terraform.tfvars.example terraform.tfvars   # then fill in real IDs
export CLOUDFLARE_API_TOKEN='<cfat_... token>'

terraform init
terraform plan -out=tfplan
terraform apply tfplan

Five resources per env. None touch existing tunnels (the pre-existing Warp tunnel cf-tunnel.us-east1.aws.adventive.com is not affected because we create with a different name).

5.4 Validation

# Tunnel registered
TUNNEL_ID=$(terraform output -json tunnel_ids | jq -r '.dev')
curl -sS "https://api.cloudflare.com/client/v4/accounts/46a873457665355ba02a85e61d7200a7/cfd_tunnel/$TUNNEL_ID" \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" | jq '.result | {id, name, status, deleted_at}'

# DNS resolves through Cloudflare
dig +short tunnel.adventive.dev

# Secret payload self-consistent
aws secretsmanager get-secret-value --secret-id /adventive/cloudflared/dev \
  --query SecretString --output text \
  | jq '{tunnel_id_in_creds: (.credentials_json | fromjson | .TunnelID), config_starts_with_tunnel: (.config_yaml | startswith("tunnel: "))}'

5.5 Gotchas encountered

Cloudflare provider v4 schema field is secret, not tunnel_secret. Initially used tunnel_secret = ... per a hallucinated argument name; provider v4.52.7 rejects with “argument is not expected here”. Fix: secret = random_bytes.tunnel_secret[each.key].base64.
/user/tokens/verify endpoint doesn’t validate account-owned tokens — that endpoint is user-token specific. Account-owned tokens (the cfat_* prefix) authenticate fine on real API endpoints but return success: false on the user-token verify endpoint. Don’t use that endpoint as a token-validity test for account-owned tokens.
The pre-existing Warp tunnel uses the same Cloudflare account. Be careful not to delete it during cleanup. Filter by name (adv-cflared-*) when scripting tunnel operations.

6. End-to-end smoke test (Phase 1.1 + 1.2 validation)

We manually launched a single EC2 instance from the AMI to validate that the entire chain works. This was a one-shot test, not part of the persistent infrastructure; Phase 1.5 replaces it with a proper ASG.

6.1 IAM role + instance profile (manual)

ROLE_NAME="adv-cflared-tunnel-smoketest"
SECRET_ARN_PATTERN="arn:aws:secretsmanager:us-east-1:201161205241:secret:/adventive/cloudflared/dev-*"

aws iam create-role --role-name "$ROLE_NAME" \
  --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]}'

aws iam attach-role-policy --role-name "$ROLE_NAME" \
  --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore

aws iam put-role-policy --role-name "$ROLE_NAME" --policy-name read-cflared-dev-secret \
  --policy-document "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Action\":\"secretsmanager:GetSecretValue\",\"Resource\":\"$SECRET_ARN_PATTERN\"}]}"

aws iam create-instance-profile --instance-profile-name "$ROLE_NAME"
aws iam add-role-to-instance-profile --instance-profile-name "$ROLE_NAME" --role-name "$ROLE_NAME"
sleep 10  # IAM eventual consistency

6.2 Launch parameters

aws ec2 run-instances \
  --image-id $(aws ssm get-parameter --name /adventive/cloudflared/ami-id-latest --query 'Parameter.Value' --output text) \
  --instance-type t3.micro \
  --subnet-id subnet-2ef28677 \
  --security-group-ids sg-0968b424c847f142c \
  --iam-instance-profile Name=adv-cflared-tunnel-smoketest \
  --metadata-options 'HttpTokens=required,HttpEndpoint=enabled,InstanceMetadataTags=enabled' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=adv-cflared-tunnel-smoketest},{Key=adv:env,Value=dev},{Key=adv:project,Value=cloudflare-tunnel}]'

Two non-obvious requirements:

InstanceMetadataTags=enabled — without this, the bootstrap script can’t read adv:env from IMDS, can’t resolve which secret to fetch, and the tunnel never starts. This is per-instance, not part of the AMI.
The adv:env tag must be present at launch — the bootstrap script reads it via IMDS; missing tag → bootstrap exits 1.

6.3 Expected boot sequence

The bootstrap completes in ~25 seconds after the instance reaches running:

cloud-init finishes
cflared-bootstrap.service starts, fetches IMDS tag, fetches Secrets Manager value, writes /etc/cloudflared/config.yml and /etc/cloudflared/<tunnel-uuid>.json
cflared-bootstrap.service finishes (Type=oneshot, RemainAfterExit=yes)
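cloudflared.service starts (Requires=cflared-bootstrap.service ties it to the bootstrap unit; Requires= alone does not order units in systemd, so the sequencing relies on the bootstrap unit being ordered before cloudflared.service)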
cloudflared dials Cloudflare edge on TCP 7844 (or UDP 7844 / QUIC), authenticates with the tunnel ID + secret, registers as a connector
Cloudflare’s tunnel status flips from inactive to healthy
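The bootstrap portion of the sequence above can be sketched as follows. This is a hypothetical illustration, not the contents of cflared-bootstrap.sh: the function names are invented, the file modes are assumptions, and it assumes curl, jq, and an AWS CLI with a configured region are available on the instance.

```shell
# Hypothetical sketch of the boot-time flow; NOT the real cflared-bootstrap.sh.
IMDS=http://169.254.169.254/latest

# Map an environment name to its Secrets Manager path.
secret_id_for_env() {
  printf '/adventive/cloudflared/%s\n' "$1"
}

# Read the adv:env tag via IMDSv2 (needs InstanceMetadataTags=enabled).
read_env_tag() {
  token=$(curl -sSf -X PUT "$IMDS/api/token" \
    -H 'X-aws-ec2-metadata-token-ttl-seconds: 300') || return 1
  curl -sSf "$IMDS/meta-data/tags/instance/adv:env" \
    -H "X-aws-ec2-metadata-token: $token"
}

# Fetch the secret, then write config + credentials for cloudflared.
bootstrap() {
  env_name=$(read_env_tag) || { echo 'adv:env tag not readable' >&2; return 1; }
  secret=$(aws secretsmanager get-secret-value \
    --secret-id "$(secret_id_for_env "$env_name")" \
    --query SecretString --output text) || return 1
  tunnel_id=$(printf '%s' "$secret" | jq -r '.credentials_json | fromjson | .TunnelID')
  install -d -m 0750 /etc/cloudflared
  printf '%s' "$secret" | jq -r '.config_yaml' > /etc/cloudflared/config.yml
  printf '%s' "$secret" | jq -r '.credentials_json' > "/etc/cloudflared/$tunnel_id.json"
  chmod 0600 "/etc/cloudflared/$tunnel_id.json"
}
```

Returning non-zero when the tag cannot be read matches the "missing tag → bootstrap exits 1" behaviour noted in §6.2.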

6.4 Validation

# Tunnel goes healthy within ~60s of instance running
INSTANCE_ID=<from run-instances output>
for i in {1..15}; do
  STATUS=$(curl -sS "https://api.cloudflare.com/client/v4/accounts/46a873457665355ba02a85e61d7200a7/cfd_tunnel/cb0c2ef4-3426-4968-8175-a3a053ecc5ea" \
    -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" | jq -r '.result.status')
  echo "[$(date +%H:%M:%S)] tunnel status: $STATUS"
  [ "$STATUS" = "healthy" ] && break
  sleep 10
done

6.5 Critical SG gap discovered

The build SG (sg-0968b424c847f142c) only opens TCP 80 + 443. Cloudflare Tunnel uses TCP+UDP 7844 to dial the edge. Without that egress, cloudflared timed out repeatedly:

ERR Unable to establish connection with Cloudflare edge
error="DialContext error: dial tcp 198.41.200.13:7844: i/o timeout"

Fix during smoke test:

aws ec2 authorize-security-group-egress --group-id sg-0968b424c847f142c \
  --ip-permissions \
    'IpProtocol=tcp,FromPort=7844,ToPort=7844,IpRanges=[{CidrIp=0.0.0.0/0,Description="cloudflared edge TCP"}]' \
    'IpProtocol=udp,FromPort=7844,ToPort=7844,IpRanges=[{CidrIp=0.0.0.0/0,Description="cloudflared edge UDP/QUIC"}]'

This rule was reverted after the smoke test. The runtime SG in Phase 1.5 owns the 7844 egress rule — the build SG only needs 80/443 for AMI builds. Two distinct SGs for two distinct workloads.

6.6 Diagnostics via SSM (no shell plugin needed)

If the tunnel fails to register, fetch service journal logs via aws ssm send-command (works without session-manager-plugin):

CMD_ID=$(aws ssm send-command \
  --instance-ids "$INSTANCE_ID" \
  --document-name AWS-RunShellScript \
  --parameters 'commands=[
    "echo === cflared-bootstrap.service ===",
    "journalctl -u cflared-bootstrap.service --no-pager -n 100",
    "echo === cloudflared.service ===",
    "journalctl -u cloudflared.service --no-pager -n 60",
    "ls -la /etc/cloudflared/"
  ]' --query 'Command.CommandId' --output text)
sleep 8
aws ssm get-command-invocation --command-id "$CMD_ID" --instance-id "$INSTANCE_ID" \
  --query '{Status: Status, StdOut: StandardOutputContent}' --output text
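The fixed sleep 8 can return before the command has finished. A polling loop is one alternative; the sketch below uses hypothetical helper names and treats Success, Failed, Cancelled, and TimedOut as final states.

```shell
# Return 0 when an SSM command invocation status is final.
is_terminal_status() {
  case "$1" in
    Success|Failed|Cancelled|TimedOut) return 0 ;;
    *) return 1 ;;   # Pending, InProgress, Delayed, Cancelling, or empty
  esac
}

# Poll until the invocation finishes (up to ~60s), then print its status.
# $1 = command ID, $2 = instance ID. Requires AWS credentials.
wait_for_command() {
  tries=0
  while [ "$tries" -lt 30 ]; do
    status=$(aws ssm get-command-invocation --command-id "$1" --instance-id "$2" \
      --query Status --output text 2>/dev/null)
    if is_terminal_status "$status"; then echo "$status"; return 0; fi
    tries=$((tries + 1))
    sleep 2
  done
  echo "timed out waiting for $1" >&2
  return 1
}

# Usage:
#   wait_for_command "$CMD_ID" "$INSTANCE_ID"
```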

6.7 Smoke test teardown

INSTANCE_ID=<the smoke test instance ID>
ROLE=adv-cflared-tunnel-smoketest

aws ec2 terminate-instances --instance-ids $INSTANCE_ID
aws ec2 wait instance-terminated --instance-ids $INSTANCE_ID

aws iam remove-role-from-instance-profile --instance-profile-name $ROLE --role-name $ROLE
aws iam delete-instance-profile --instance-profile-name $ROLE
aws iam detach-role-policy --role-name $ROLE \
  --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
aws iam delete-role-policy --role-name $ROLE --policy-name read-cflared-dev-secret
aws iam delete-role --role-name $ROLE

aws ec2 revoke-security-group-egress --group-id sg-0968b424c847f142c \
  --ip-permissions \
    'IpProtocol=tcp,FromPort=7844,ToPort=7844,IpRanges=[{CidrIp=0.0.0.0/0}]' \
    'IpProtocol=udp,FromPort=7844,ToPort=7844,IpRanges=[{CidrIp=0.0.0.0/0}]'

7. Operational procedures

7.1 Bumping an Image Builder component

Edit the YAML in infra/imagebuilder/components/. Then in the same commit:

Bump the matching component’s version in components.tf (e.g., 1.0.2 → 1.0.3)
Bump the recipe’s version in pipeline.tf to match (recipes are immutable per version)
terraform plan -out=tfplan && terraform apply tfplan

The lifecycle { create_before_destroy = true } on both resources handles the version transition cleanly. Old versions remain in AWS as orphan history (free).
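The version arithmetic is trivial but easy to mistype. A minimal sketch, assuming plain MAJOR.MINOR.PATCH strings with no leading zeros; bump_patch is a hypothetical helper, not something in the repo.

```shell
# Print the next patch version for a MAJOR.MINOR.PATCH string.
bump_patch() {
  prefix=${1%.*}    # everything before the last dot, e.g. 1.0
  patch=${1##*.}    # everything after the last dot, e.g. 2
  printf '%s.%s\n' "$prefix" "$((patch + 1))"
}

# Usage:
#   bump_patch 1.0.2
```

Component and recipe versions are tracked separately (see §2), so each should be bumped from its own current value.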

7.2 Manually publishing an AMI to SSM

Until the EventBridge auto-publish bug is fixed (task #21), publish manually after each build:

LATEST_AMI=$(aws imagebuilder get-image \
  --image-build-version-arn <image-build-version-arn from list-image-build-versions> \
  --query 'image.outputResources.amis[0].image' --output text)

aws ssm put-parameter --name /adventive/cloudflared/ami-id-latest \
  --value "$LATEST_AMI" --type String --overwrite \
  --description "adv-cflared AMI manually published"

The Lambda is wired correctly and works on direct invocation:

aws lambda invoke --function-name adv-cflared-publish-ami \
  --cli-binary-format raw-in-base64-out \
  --payload '{"detail":{"image-arn":"<image-build-version-arn>","state":{"status":"AVAILABLE"}}}' \
  /tmp/lambda-out.json
cat /tmp/lambda-out.json

7.3 Cleaning up old AMIs

Until the lifecycle policy lands (task #20), periodic manual sweep:

KEEP_AMI=$(aws ssm get-parameter --name /adventive/cloudflared/ami-id-latest --query 'Parameter.Value' --output text)
for ami in $(aws ec2 describe-images --owners self --filters 'Name=tag:adv:image,Values=cflared' --query 'Images[].ImageId' --output text); do
  if [ "$ami" = "$KEEP_AMI" ]; then continue; fi
  SNAPS=$(aws ec2 describe-images --image-ids "$ami" --query 'Images[0].BlockDeviceMappings[?Ebs].Ebs.SnapshotId' --output text)
  aws ec2 deregister-image --image-id "$ami"
  for snap in $SNAPS; do aws ec2 delete-snapshot --snapshot-id "$snap"; done
done

7.4 Adding a new environment to the tunnels module

In infra/cloudflare-tunnels/terraform.tfvars:

environments = {
  dev = { zone_id = "c737bae5b535c0ec3daa72c809721e7d", apex = "adventive.dev" }
  stg = { zone_id = "<adventivestg.com zone ID>", apex = "adventivestg.com" }
}

terraform plan -out=tfplan should show 5 new resources (1 random_bytes + 1 tunnel + 1 CNAME + 1 secret + 1 secret_version). Then run terraform apply tfplan.

The API token must have permissions on the new zone. If using the dev-only token, create a new token with broader scope or extend the existing one.

8. Known deficiencies (tracked as backlog)

ID	Description	Impact	Mitigation
#20	No Image Builder lifecycle policy — old AMIs accumulate	Cost: ~$1/month per orphan AMI’s snapshot	Manual sweep script in §7.3; full fix is `aws_imagebuilder_lifecycle_policy`
#21	EventBridge → Lambda auto-publish doesn’t fire on Image State Change events	New AMIs don’t update SSM automatically	Manual `aws ssm put-parameter` per §7.2; root cause is event field shape mismatch
Token type	Cloudflare auth uses user-owned token, not account-owned	Token tied to user account; dies if user leaves	Migrate to account-owned token when its UI properly exposes Tunnel permission
Local TF state	Both modules use local state	State files contain tunnel secrets; state lives on developer’s laptop	Mandatory before stg/prd: encrypted S3 backend + DynamoDB locking; backend stub already in `versions.tf`
Build SG dual-purpose	(resolved in Phase 1.5 design)	—	Phase 1.5 ASG module owns the runtime SG with 7844 egress

9. Phase 1.5 — Per-env ASG module

Module location: infra/cflared-asg/. Replaces the manual smoke test with a declarative, self-healing fleet.

9.1 Files

versions.tf — Terraform 1.6+, AWS provider ~> 5.50, default tags, S3 backend stub (commented)
variables.tf — aws_region, ssm_ami_parameter, secret_path_prefix, environments map (per-env: subnet_ids, instance_type, desired_capacity, min_healthy_percentage, health_check_grace_period)
iam.tf — per-env runtime role (EC2 trust) + AmazonSSMManagedInstanceCore + inline secretsmanager:GetSecretValue on /adventive/cloudflared/<env>-* + instance profile
security_group.tf — per-env runtime SG (no inbound) + 3 egress rules: TCP 7844, UDP 7844, TCP 443
launch_template.tf — image_id from SSM (data.aws_ssm_parameter.ami_id.insecure_value), IMDSv2 required, instance_metadata_tags=enabled, instance/volume tag specifications including adv:env
asg.tf — ASG with rolling instance refresh policy, min_healthy_percentage=100 for dev (launch new before terminating old), references LT via latest_version
outputs.tf — ami_id_used, asg_names, launch_template_ids, launch_template_versions, runtime_role_arns, runtime_security_group_ids, instance_profile_names
terraform.tfvars — dev only: subnet_ids = ["subnet-2ef28677"], instance_type = "t3.micro", desired_capacity = 1

9.2 Apply procedure

cd ~/Repositories/GitHub/Adventive/adventive-platform-infra/infra/cflared-asg

terraform init
terraform plan -out=tfplan
terraform apply tfplan

Creates 10 resources for dev: 1 IAM role + 1 instance profile + 1 inline policy + 1 managed policy attachment + 1 SG + 3 SG egress rules + 1 LT + 1 ASG.

9.3 As-built validation (2026-04-29)

ASG: adv-cflared-dev
AMI in use: ami-02f2f244d3dd56cb4
Instance: i-060419d0a98c98422 (Healthy / InService)

Tunnel status progression after terraform apply:

[12:57:14] down
[12:57:24] down
[12:57:34] down
[12:57:44] down
[12:57:55] down
[12:58:05] healthy   ← registered

~60 seconds from instance launch to tunnel registration — comparable to the manual smoke test.

down (vs inactive) is the right state when the tunnel has prior connection history. inactive would mean Cloudflare has never seen a connector; down means it’s seen one before but it’s currently absent.

9.4 Gotchas encountered

data.aws_ssm_parameter.value is sensitive by default. SSM parameters can hold SecureString secrets, so the AWS provider marks .value sensitive even when the parameter is type=String. This makes the image_id field on the launch template render as (sensitive value) and breaks any output that references it. Fix: use data.aws_ssm_parameter.ami_id.insecure_value (sibling attribute, only valid for type=String, returns the same content without the sensitive flag). Applied in both launch_template.tf and outputs.tf.
VPC has no private subnets. Survey of vpc-e1636084 showed all 6 subnets are public (IGW route, no NAT). Pragmatic decision: use the public subnet-2ef28677 for dev with strict SG (no inbound). Tracked as deficiency for prd planning. See §8 and project_phase15_subnet_deficiency.md in memory.
min_healthy_percentage=100 for desired=1 ASG is the right setting for dev. Forces ASG to launch the replacement first, wait healthy, then terminate the old instance — briefly running 2 instances during refresh, achieving zero downtime. With =50 on a single-instance ASG, AWS rounds 50% of 1 down to 0 and may terminate before launching, causing brief downtime. Prd with desired=2 will use =50 correctly (1 always healthy).
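The capacity arithmetic behind the last point can be written out. The sketch assumes the round-down behaviour described above, which comes from this document rather than from AWS documentation; min_healthy_instances is a hypothetical helper.

```shell
# Instances that must stay healthy during an instance refresh, using the
# round-down behaviour described above (an assumption from this document).
# $1 = desired capacity, $2 = min_healthy_percentage
min_healthy_instances() {
  echo $(( $1 * $2 / 100 ))
}

# Usage:
#   min_healthy_instances 1 100   # dev as built
#   min_healthy_instances 1 50    # the setting to avoid on a single instance
#   min_healthy_instances 2 50    # planned prd shape
```

A result of 0 means the ASG is allowed to terminate the only instance before its replacement is healthy.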

9.5 Operational implications

Rolling AMI updates: when Image Builder publishes a new AMI to SSM, run terraform plan/apply in this module. Plan shows LT image_id change → new LT version → ASG instance refresh → rolling replacement.
Self-healing: if the instance crashes or fails health checks, ASG launches a replacement automatically. No human action needed.
Tunnel state continuity: Cloudflare tunnel UUIDs and secrets persist across instance replacements (they live in the cloudflare-tunnels module’s state). Multiple cloudflared connectors (replicas) can register against the same tunnel UUID, so instance churn doesn’t affect the tunnel’s identity in Cloudflare’s view.

10. Phase 2 — Hyperdrive provisioning

Module location: infra/cloudflare-hyperdrive/. Creates Cloudflare Hyperdrive configs that the Public API Worker will bind to for reaching the dev RDS through the existing tunnel.

10.1 Architecture

Worker → Hyperdrive → Cloudflare edge → Access (service token auth)
       → Tunnel → cloudflared on adv-cflared-dev (ASG)
       → TCP 3306 → RDS MySQL (development.coi6rcntfbgg…)

Hyperdrive resolves the public hostname (db-<db>-dev.adventive.dev) through Cloudflare; the Access application protecting that hostname requires a service-token credential which Hyperdrive presents on every connection; cloudflared then proxies the TCP stream through the tunnel to the actual RDS endpoint.

Two Hyperdrives created:

adv-svc-public-api-console-dev → MySQL database console
adv-svc-public-api-aggregate-dev → MySQL database aggregate

10.2 Module dependencies (changes made to upstream modules)

infra/cloudflare-tunnels/ was refactored to drive ingress + DNS records from a per-env list in tfvars rather than a single hardcoded ingress. Each entry creates a CNAME under the env’s apex zone and adds an ingress stanza to the tunnel’s config_yaml. Existing single-tunnel resource keys (["dev"]) became compound keys (["dev/tunnel"]); the resource refresh during plan handled this without a destroy.

infra/cflared-asg/ added one egress rule on the runtime SG: TCP 3306 to 0.0.0.0/0. Without it, cloudflared attempts to dial the RDS hostname but the runtime SG drops the outbound SYN, so the connection never opens.

10.3 New module: infra/cloudflare-hyperdrive/

Files:

versions.tf — Cloudflare ~> 4.50, AWS ~> 5.50, default tags
variables.tf — environments map (non-sensitive structure: rds_sg_id, rds_host, databases) + database_passwords (sensitive; keyed [env][db]). Split because Terraform forbids sensitive values in for_each keys.
locals.tf — flattens (env, db) pairs for uniform iteration; data source looking up the cflared runtime SG by name.
secrets.tf — Secrets Manager secret + version per (env, db). JSON shape: {username, password, host, port, database}. Hyperdrive doesn’t read from Secrets Manager directly; this is for any other tooling that wants the canonical home for these creds.
access.tf — one service token per env + one Access application per (env, db) hostname + one Access policy per app allowing the env’s service token (decision = "non_identity").
hyperdrive.tf — cloudflare_hyperdrive_config per (env, db). Note origin = { ... } (attribute, not block) and port MUST be omitted when access_client_id/access_client_secret are set (the cloudflared ingress determines the port).
db_security_group.tf — adds an ingress rule to the existing RDS SG allowing 3306 from the runtime SG (referenced by ID via the data source).
outputs.tf — Hyperdrive IDs (used in wrangler.toml), service token client IDs, secret ARNs.
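
The split between a non-sensitive structure map and an identically keyed password map, together with the (env, db) flattening in locals.tf, can be illustrated outside HCL. The TypeScript below is only a sketch of the data shape, not the module's code: the `env/db` key format, the sample host and SG ID, and every type and function name are assumptions made for illustration.

```typescript
// Hypothetical shapes mirroring variables.tf: the structure is non-sensitive,
// and passwords live in a separate map keyed identically by [env][db].
interface EnvStructure {
  rds_sg_id: string;
  rds_host: string;
  databases: string[];
}
type Environments = Record<string, EnvStructure>;
type DatabasePasswords = Record<string, Record<string, string>>;

interface FlatPair {
  key: string; // assumed compound key format, e.g. "dev/console"
  env: string;
  db: string;
  host: string;
}

// Flatten (env, db) pairs from the non-sensitive map only, so the iteration
// keys never depend on secret values.
function flattenPairs(environments: Environments): FlatPair[] {
  const pairs: FlatPair[] = [];
  for (const [env, cfg] of Object.entries(environments)) {
    for (const db of cfg.databases) {
      pairs.push({ key: `${env}/${db}`, env, db, host: cfg.rds_host });
    }
  }
  return pairs;
}

// Look the password up by the same (env, db) coordinates at use time.
function passwordFor(pair: FlatPair, passwords: DatabasePasswords): string {
  const value = passwords[pair.env]?.[pair.db];
  if (value === undefined) {
    throw new Error(`no password supplied for ${pair.key}`);
  }
  return value;
}

// Placeholder values, not the real dev settings.
const exampleEnvironments: Environments = {
  dev: {
    rds_sg_id: "sg-example",
    rds_host: "db.example.internal",
    databases: ["console", "aggregate"],
  },
};
const examplePairs = flattenPairs(exampleEnvironments);
```

Because the keys come only from the structure map, a missing password surfaces as an explicit error rather than as a silently skipped pair.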

10.4 Apply procedure

cd ~/Repositories/GitHub/Adventive/adventive-platform-infra/infra/cloudflare-tunnels
terraform plan -out=tfplan && terraform apply tfplan

cd ../cflared-asg
terraform plan -out=tfplan && terraform apply tfplan

aws autoscaling start-instance-refresh --auto-scaling-group-name adv-cflared-dev \
  --preferences '{"MinHealthyPercentage": 100, "InstanceWarmup": 90}'
# Wait for refresh to complete; the new instance picks up the new tunnel ingress

cd ../cloudflare-hyperdrive
terraform init
terraform plan -out=tfplan && terraform apply tfplan

The instance refresh between cflared-asg and cloudflare-hyperdrive is mandatory — Hyperdrive validates the database connection at creation time, and that validation goes through the live cloudflared instance. If the instance still has the old config without the database ingress rules, Hyperdrive create fails with 404 Not Found (2015).

10.5 Token permissions (cumulative through Phase 2)

The adv-platform-infra-tunnels-dev Cloudflare API token now has these policies:

Scope	Permission group	Access
Entire Account	Cloudflare One Connector: cloudflared	Edit
Entire Account	Access: Apps and Policies	Edit
Entire Account	Access: Service Tokens	Edit
Entire Account	Hyperdrive	Edit
Specified Domains → adventive.dev	DNS	Edit
Specified Domains → adventive.dev	Zone	Read

10.6 Gotchas encountered

Sensitive variables can’t be used in for_each. Marking a variable sensitive = true makes everything derived from it sensitive, including its keys. Terraform forbids sensitive resource addresses (would leak structure metadata). Fix: split into a non-sensitive structure variable + a sensitive password-only map keyed identically.
Cloudflare provider v4: origin is an attribute, not a block. origin = { ... } (with =), not origin { ... }.
origin.port cannot coexist with origin.access_client_id. When using Access service-token auth, port routing is determined by the cloudflared ingress rule’s TCP service URL, not by the Hyperdrive config. Drop port from the origin attribute when using Access.
Non-existent MySQL users on RDS trigger AuthSwitchRequest. Hyperdrive doesn’t support AuthSwitch, so if the username in the Hyperdrive config doesn’t exist on the database, the error reads as a Hyperdrive-side limitation rather than “user not found.” Always verify the user exists first with SELECT User, Host, plugin FROM mysql.user.
Token permission granularity is finer than expected. Cloudflare splits Access into Access: Apps and Policies (apps + policies) and Access: Service Tokens (service tokens) — these are separate permission groups. Hyperdrive needs its own permission group too. All three must be added to the token before the Phase 2 module applies cleanly.
Hyperdrive validates the DB connection at creation. This is a feature: terraform apply won’t lie to you about connectivity. If the tunnel ingress is wrong, MySQL credentials are wrong, or the SG path is broken, cloudflare_hyperdrive_config create will fail and you’ll know immediately. The flip side: the tunnel’s runtime instance must be running the current ingress config before applying this module.

10.7 Worker integration (Phase 4 territory)

When the Public API Worker is built (Phase 4), its wrangler.toml binds the Hyperdrive IDs:

[[hyperdrive]]
binding = "DB_CONSOLE"
id = "059838c4abb64a92a4aece2a6a533a29"

[[hyperdrive]]
binding = "DB_AGGREGATE"
id = "c1b18833b07347daa77b56a2d19ef508"

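Standard mysql2/promise connection inside the Worker, as sketched during Phase 2. Phase 3 later found that the connection-string form has to be replaced by an options object carrying disableEval: true (see §11.7):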

const c = await mysql.createConnection(env.DB_CONSOLE.connectionString);
const [rows] = await c.query('SELECT 1');

11. Phase 3 — Auth helper Worker

Repo: ~/Repositories/GitHub/Adventive/adventive-auth-helper-worker/. New TypeScript+Hono Worker that any Adventive Worker can call over a service binding to validate X-Api-Key + X-Integration-Key against the console.api table.

11.1 Architecture

caller Worker --(service binding "AUTH")--> adv-svc-auth-helper-dev
                                                 │
                                                 ├── KV CACHE (5-min TTL) ── hit ──► return
                                                 │
                                                 └── miss
                                                       │
                                                       └─► Hyperdrive DB_CONSOLE
                                                              │
                                                              └─► Cloudflare edge → Access (service token) → Tunnel
                                                                     │
                                                                     └─► cloudflared on adv-cflared-dev → RDS console.api
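
The lookup order in the diagram is a cache-aside pattern, which can be sketched as follows. This is a simplified, synchronous illustration rather than the helper's actual code: an in-memory Map stands in for the KV namespace, the clock is passed in, and the database call is an injected function. Real KV and Hyperdrive calls are asynchronous, and whether the real helper also caches negative results is not stated here; this sketch caches whatever the loader returns. All names are hypothetical.

```typescript
interface AuthResult {
  valid: boolean;
  accountId: number;
  rph: number;
}

interface CacheEntry {
  value: AuthResult;
  expiresAt: number; // epoch milliseconds
}

const TTL_MS = 5 * 60 * 1000; // 5-minute TTL from the diagram

// Cache-aside lookup: return a fresh cached entry if one exists, otherwise
// load from the database stand-in and store the result with a TTL.
function lookupAuth(
  cache: Map<string, CacheEntry>,
  cacheKey: string,
  now: number,
  loadFromDb: () => AuthResult,
): { result: AuthResult; source: "cache" | "db" } {
  const entry = cache.get(cacheKey);
  if (entry !== undefined && entry.expiresAt > now) {
    return { result: entry.value, source: "cache" };
  }
  const value = loadFromDb();
  cache.set(cacheKey, { value, expiresAt: now + TTL_MS });
  return { result: value, source: "db" };
}
```

Passing the clock and the loader as arguments keeps the TTL behaviour testable without a live KV namespace or database.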

11.2 Endpoints

GET /__health → {status, commit_sha, environment}
POST /auth with X-Api-Key + X-Integration-Key headers → {valid, accountId, rph, name} or {valid:false} on bad keys, 503 on DB unreachable

11.3 Repo layout

adventive-auth-helper-worker/
├── package.json                  hono + mysql2 deps; npm scripts for typecheck/deploy/tail
├── tsconfig.json                 strict, Workers types
├── wrangler.toml                 [env.dev] only; stg/prd stubbed
├── src/
│   ├── env.ts                    Bindings interface (DB_CONSOLE Hyperdrive + CACHE KV)
│   ├── logger.ts                 Structured JSON logger; events match plan vocabulary
│   ├── auth.ts                   SHA-256 cache key + KV lookup + mysql2 query
│   └── index.ts                  Hono app
├── tests/                        Stub vitest; full suite is Phase 4 territory
├── README.md
├── RUNBOOK.md
└── CHANGELOG.md

11.4 SQL query (verbatim from legacy)

SELECT account_id, name, rph
FROM api
WHERE int_key = ? AND api_key = ? AND !is_deleted
LIMIT 1

The source was the legacy CodeIgniter Api_model::validateKeys(). The legacy code also runs UPDATE api SET r_count = r_count + 1 for rate limiting; per the plan, that responsibility moves to a Durable Object in the Public API Worker (Phase 5), so the auth helper itself is a pure validator.

11.5 Deploy procedure

cd ~/Repositories/GitHub/Adventive/adventive-auth-helper-worker
npm install
npm run typecheck
npx wrangler deploy --env dev

After deploy, tail logs:

npx wrangler tail --env dev

11.6 Smoke test results (validated 2026-04-29)

GET  /__health                               → 200 OK with commit_sha
POST /auth (real keys)                       → {valid:true, accountId:246, rph:500, name:"CLOUDFLAREDEVAPITEST"}
POST /auth (real keys, second hit)           → same JSON; logs show auth.cache.hit (no DB query)
POST /auth (bogus keys)                      → {valid:false, accountId:0, rph:0}

11.7 Gotchas encountered

mysql2 + Cloudflare Workers requires disableEval: true. mysql2 uses eval/new Function for SQL parser optimization by default; CF Workers’ V8 isolate blocks runtime code generation (Code generation from strings disallowed for this context). Fix: pass connection config as an object (host/user/password/database/port + disableEval: true) instead of a connection string. The connection string path doesn’t accept extra options.
wrangler calls /memberships even with account-owned tokens, and that call fails even though whoami succeeds. Tracked Cloudflare quirk. Workaround: use wrangler login for OAuth-based local dev, as an auth context separate from the API token Terraform uses. The two coexist; switch by setting/unsetting CLOUDFLARE_API_TOKEN.
Cloudflare account workers.dev subdomain is unique per account and not obvious from the dashboard. Pull from wrangler deploy output (printed at end of every deploy) or from the Worker’s page on dash.cloudflare.com. For Adventive: adventive.workers.dev.
KV binding name vs namespace title. When you create a KV namespace via wrangler kv namespace create kv-adv-svc-auth-helper-cache-dev, wrangler suggests a binding name matching the title. We use CACHE as the binding (clean, code-side), with the descriptive name as the namespace title. The two are independent.
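
The disableEval fix above amounts to building an options object from the Hyperdrive binding's individual fields. A minimal sketch, assuming the binding exposes host, port, user, password, and database; the interface and function names are hypothetical, and the types are declared locally so the block stands alone instead of importing Workers or mysql2 types.

```typescript
// Local stand-in for the fields read from a Hyperdrive binding; in a real
// Worker the binding type comes from the Workers type definitions.
interface HyperdriveLike {
  host: string;
  port: number;
  user: string;
  password: string;
  database: string;
}

// Subset of mysql2 connection options used in this sketch.
interface MysqlConnectionConfig extends HyperdriveLike {
  disableEval: true;
}

// Build an options object instead of passing connectionString, so that
// disableEval can be set alongside the connection fields.
function toMysqlConfig(binding: HyperdriveLike): MysqlConnectionConfig {
  return {
    host: binding.host,
    port: binding.port,
    user: binding.user,
    password: binding.password,
    database: binding.database,
    disableEval: true,
  };
}

// Usage inside a Worker (not executed here; requires mysql2/promise):
//   const conn = await mysql.createConnection(toMysqlConfig(env.DB_CONSOLE));
//   const [rows] = await conn.query("SELECT 1");
const sampleConfig = toMysqlConfig({
  host: "hyperdrive.example", // placeholder values
  port: 3306,
  user: "u",
  password: "p",
  database: "console",
});
```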

11.8 Token permissions added in Phase 3

The adv-platform-infra-tunnels-dev token didn’t get new permissions; we used wrangler login (OAuth) for Worker deploys instead. If we later want CI to deploy the auth helper, we’ll need to add to that token (or a CI-specific token):

Permission group	Access
Workers Scripts	Edit
Workers KV Storage	Edit
User Details	Read (so wrangler’s `/memberships` probe succeeds)

12. Phases 4–7 — Public API Worker (driven by Claude Code)

Repo: ~/Repositories/GitHub/Adventive/adventive-public-api-worker/. Live: https://api.adventive.dev.

12.1 Architecture (in one paragraph)

Hono router on Cloudflare Workers. Auth via service binding to adv-svc-auth-helper-dev. Two Hyperdrive MySQL connections — DB_CONSOLE (campaign / advertiser / placement structure) and DB_AGGREGATE (kpi / engagement / quartile / clickthrough metrics). Rate limiting via a RateLimiter Durable Object per API key, hourly window, alarm-based reset. All DB queries use db.query() not db.execute() (Hyperdrive rejects COM_STMT_PREPARE). placement_name is a computed CASE expression joining ad_html5.ad_name or asset.asset_name, not a real column.
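
The hourly-window limit with an alarm-based reset can be sketched as a plain counter. This is an illustration only, not the RateLimiter Durable Object itself: storage, alarm scheduling, and the per-key object lookup are omitted, and the class and method names are hypothetical.

```typescript
const WINDOW_SECONDS = 3600; // hourly window

class HourlyCounter {
  private count = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit; // requests per hour allowed for this key
  }

  // Counts the request and returns true while the key is under its limit.
  tryConsume(): boolean {
    if (this.count >= this.limit) {
      return false;
    }
    this.count += 1;
    return true;
  }

  // Stand-in for the alarm handler: clears the counter for the next window.
  onAlarm(): void {
    this.count = 0;
  }
}
```

In the real Durable Object the alarm would presumably be scheduled WINDOW_SECONDS after the window opens; here the reset is simply called by hand.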

12.2 Endpoints (all 7 verified working against real dev DBs)

Method	Path	Returns
GET	`/credentialscheck`	`{ status: true }`
GET	`/advertisers`	`{ data: [{id, name}] }`
GET	`/advertisers/:id`	`{ data: AdvertiserDetail }` (with contacts[])
GET	`/campaigns`	`{ data: [CampaignListItem] }` (no default date filter)
GET	`/campaigns/:id`	`{ data: CampaignDetail }` (sites→placements→ad_units→delivery_groups)
GET	`/analytics/:campaignId`	`{ data: CampaignAnalytics }` (scalar, not array)
GET	`/clickthroughs`	`{ data: [ClickthroughRow] }` (requires advertiser_id; 4-month default)
GET	`/connector`	`{ data: [ConnectorRowV1|V2] }` (requires advertiser_id; 2 bulk queries)
GET	`/dataconnector`	`{ data: [ConnectorRowV1|V2] }` (account-wide; 2 bulk queries)

All routes accept /v{N}.{x}/ prefix. Major version ≥ 2 activates v2 schema (adds engagement + video quartile data on analytics / connector / dataconnector).
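
One way to picture the prefix rule is a small parser that strips the version and picks a schema. This is a sketch under assumptions, not the Worker's router: it assumes the minor part is numeric, and that a path without a prefix is treated as v1. The function and type names are hypothetical.

```typescript
interface VersionedPath {
  major: number; // 1 when no prefix is present (assumption)
  schema: "v1" | "v2";
  rest: string; // path with the version prefix removed
}

function parseVersionPrefix(path: string): VersionedPath {
  // Matches "/v<major>.<minor>/..." and captures the remainder of the path.
  const match = /^\/v(\d+)\.(\d+)(\/.*)$/.exec(path);
  if (match === null) {
    return { major: 1, schema: "v1", rest: path };
  }
  const major = Number(match[1]);
  return {
    major,
    schema: major >= 2 ? "v2" : "v1", // major version 2 or higher selects v2
    rest: match[3] ?? "/",
  };
}
```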

12.3 Error format

RFC 7807. Body: { status, title, detail }. Content-Type: application/problem+json. 429 responses include Retry-After: 3600.
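
The error envelope described above can be sketched as a small builder. The function and type names are hypothetical and the Worker's real helper may differ; only the body fields, the content type, and the Retry-After value come from this section.

```typescript
interface ProblemResponse {
  status: number;
  headers: Record<string, string>;
  body: { status: number; title: string; detail: string };
}

// Build an RFC 7807 style error response; 429 responses also carry
// Retry-After: 3600 to match the hourly rate-limit window.
function problem(status: number, title: string, detail: string): ProblemResponse {
  const headers: Record<string, string> = {
    "Content-Type": "application/problem+json",
  };
  if (status === 429) {
    headers["Retry-After"] = "3600";
  }
  return { status, headers, body: { status, title, detail } };
}
```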

12.4 Per-commit gate

npm run typecheck
npm run test
npx wrangler deploy --dry-run --env dev

Set CLOUDFLARE_ACCOUNT_ID=46a873457665355ba02a85e61d7200a7 in the environment for both the dry run and the real deploy.

12.5 Gotchas captured during Phases 4–7

Cloudflare does NOT auto-create DNS records when a Worker route deploys. Manually add an AAAA record pointing at 100:: (proxied) for each custom hostname before the first deploy to that env. ADR at decisions/2026-04-29-worker-dns.md in the public-api-worker repo. This will hit again at stg/prd setup — pre-create api.adventivestg.com and api.adventive.com AAAA records before deploying.
mysql2 is pinned; do not upgrade without testing. Hyperdrive compatibility is the constraint. Pin notes are in src/lib/db.ts. The disableEval: true requirement (already documented in memory and in §11.7) still applies.
Hyperdrive rejects COM_STMT_PREPARE. Always use db.query(), never db.execute(). Documented in the public API Worker but worth surfacing for any new MySQL Worker.
AUTH helper interface contract: POST /auth with X-Api-Key + X-Integration-Key headers → { valid, accountId, rph, name } (see §11.2). The auth helper is in a separate repo (adventive-auth-helper-worker); its valid:false (200) and DB-unreachable (503) semantics MUST be respected on the consumer side. Public API Worker translates valid:false to its own 401 response shape for end users.
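
The consumer-side handling of the auth helper's replies can be sketched as a pure mapping. The valid:false to 401 translation comes from the contract above; treating every non-200 reply as "unavailable" and surfacing it as 503 is an assumption of this sketch, as are all type and function names.

```typescript
interface AuthHelperReply {
  httpStatus: number; // status returned by the auth helper
  valid?: boolean;
  accountId?: number;
  rph?: number;
}

type AuthOutcome =
  | { kind: "ok"; accountId: number; rph: number }
  | { kind: "reject"; status: 401 }
  | { kind: "unavailable"; status: 503 };

function interpretAuthReply(reply: AuthHelperReply): AuthOutcome {
  // Anything other than a 200 (including the helper's 503) is treated as an
  // outage rather than as a bad key. This is an assumption of the sketch.
  if (reply.httpStatus !== 200) {
    return { kind: "unavailable", status: 503 };
  }
  if (
    reply.valid === true &&
    reply.accountId !== undefined &&
    reply.rph !== undefined
  ) {
    return { kind: "ok", accountId: reply.accountId, rph: reply.rph };
  }
  // valid:false arrives with HTTP 200 and becomes the caller-facing 401.
  return { kind: "reject", status: 401 };
}
```

Keeping the two failure kinds distinct means a database outage is never reported to an API client as an invalid key.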

12.6 Outstanding for stg / prd extension

Per Claude Code’s handoff:

Phase 1–3 extension to stg / prd: Hyperdrive provisioning (4 more configs: console / aggregate × stg / prd), service binding deploys of adv-svc-auth-helper-stg / adv-svc-auth-helper-prd, AAAA records for api.adventivestg.com and api.adventive.com, RDS-side SG ingress for the runtime SGs that don’t exist yet.
Phase 8 (cutover): Traffic move from PHP app to this Worker.
Public API Worker stg / prd deploy: wrangler deploy --env stg / --env prd only after Hyperdrive IDs are filled in wrangler.toml and the upstream infra is up.

12.7 Post-Phase-7 hardening cycle (2026-04-29)

After the initial Phase 7 deploy, the following landed in subsequent commits per Claude Code’s report:

OpenAPI spec compliance fixes (analysis ADR, schema sync)
RFC 7807 error envelopes
Banner headers on responses
Flexible version routing (/v{N}.{x}/ prefix matching)
Compliance gap closure: Retry-After on 429, removeZeros v1 query parameter, analytics scalar rather than array
Bulk connector implementation (2 bulk queries)
Postman collection at postman/adventive-public-api.json with all 28 requests + pre-request auth scripting

13. Outstanding work (tracked as task backlog)

Item	Owner	Notes
Image Builder lifecycle policy	task #20	Auto-delete old AMIs
EventBridge → Lambda auto-publish	task #21	Runtime event-shape mismatch
Account-owned token migration	memory `project_phase12_token_deficiency`	Wait for Cloudflare UI fix
Local TF state → encrypted S3 backend	runbook §8	Required before stg/prd
Public subnet → NAT + private subnet	memory `project_phase15_subnet_deficiency`	Required before prd
`-ro` suffix vs actual user names	runbook §8	Cosmetic; reflects future-cluster-split intent
Looker / Data Studio integration guide	task #33	End-user docs
TapClicks integration guide	task #34	End-user docs
Google Sheets integration guide	task #35	End-user docs
Windsor.ai integration guide	task #41	End-user docs — marketing data pipeline / connector platform
`developer.adventive.com` docs site	task #36	Public-facing developer portal
WAF policies on Public API	task #37	Layered defense beyond per-key rate limit
New Relic observability + alerting	task #38	Per standing observability memory
Integrations admin-dashboard screen	task #39	Product idea
Adventive MCP server	task #40	Product idea
Stg / prd extension of Phases 1–3	not yet ticketed	Multi-phase workstream
Phase 8 — cutover	planning doc	After stg / prd burn-in

14. Change log

Date	Change
2026-04-29	Initial document. Phase 1.1 + 1.2 complete and validated. Smoke test torn down. Phase 1.5 starting.
2026-04-29	Phase 1.5 complete — `infra/cflared-asg/` applied, ASG `adv-cflared-dev` running one t3.micro from `ami-02f2f244d3dd56cb4`, tunnel `adv-cflared-dev` registered as `healthy` with Cloudflare. Phase 1 fully done.
2026-04-29	Phase 2 complete — `cloudflare-tunnels` refactored for multi-ingress, `cflared-asg` added 3306 egress, new `cloudflare-hyperdrive` module created. Two Hyperdrive configs (console + aggregate) for dev validated end-to-end. Worker bindings ready for Phase 4.
2026-04-29	Phase 3 complete — `adventive-auth-helper-worker` repo scaffolded, KV namespace provisioned, deployed to dev, smoke-tested against real and bogus key pairs. Public API Worker can now consume `AUTH` service binding in Phase 4.
2026-04-29	Phases 4–7 complete (Claude Code) — Public API Worker deployed live at `https://api.adventive.dev`. All 7 endpoint handlers verified against real dev DBs. RFC 7807 errors, /v{N}.{x}/ version routing, Postman collection. Backlog updated with documentation, WAF, observability, product items.
2026-04-29	Engineering review presentation delivered: `Adventive_Public_API_DEV_Status.pptx` (18 slides, Adventive engineering style). Save point — public API workstream paused for the night. Resume tomorrow on stg/prd extension, WAF, or any of the backlog items.
