04 — Runbook
On-call quick reference
Section titled “On-call quick reference”| Role | Name | Contact |
|---|---|---|
| Primary on-call | Jeffrey Lambert | Slack: @jeffrey |
| Secondary on-call | Patrick | Slack: @patrick |
| Escalation (Stripe issues) | Stripe Support | dashboard.stripe.com → Support |
| Escalation (Mailgun issues) | Mailgun Support | app.mailgun.com → Support |
| Escalation (Cloudflare issues) | Cloudflare Support | dash.cloudflare.com → Support |
Quick checks:
# Worker healthcurl -s https://billing-worker.adventive.com/health | jq .
# Real-time logswrangler tail adventive-billing-worker --env production --format pretty
# Stripe webhook delivery status# Stripe Dashboard → Developers → Webhooks → endpoint → Event deliveriesDay-to-day procedures
Section titled “Day-to-day procedures”Monthly reconciliation (after each billing cycle)
Section titled “Monthly reconciliation (after each billing cycle)”The daily Cron Trigger Worker runs reconciliation automatically at 06:00 UTC. After a billing cycle closes (typically month-end), run a manual verification:
# Trigger reconciliation manually (admin endpoint — CF Access protected)curl -s https://billing-worker.adventive.com/admin/reconcile \ -H "CF-Access-Client-Id: ${SERVICE_TOKEN_ID}" \ -H "CF-Access-Client-Secret: ${SERVICE_TOKEN_SECRET}" | jq .
# Expected: {"status":"clean","accounts_checked":N,"drift_count":0}# If drift_count > 0: see "Symptom: Reconciliation drift" incident playbookCheck Mailgun delivery for the billing cycle:
- Mailgun Dashboard → Logs → filter by
billing-noreply@notify.adventive.comand date range - All invoiced accounts should show a delivered event for
invoice_neworinvoice_notify
Deploy a hotfix
Section titled “Deploy a hotfix”# 1. Create hotfix branch from maingit checkout main && git pullgit checkout -b hotfix/description
# 2. Make the fix, commit, pushgit commit -m "fix(billing): ..."git push origin hotfix/description
# 3. Open PR → staging firstgit checkout staging && git merge hotfix/description && git push# GitHub Actions deploys to staging automatically
# 4. Verify fix on staging (use Stripe test mode + Stripe CLI to trigger events)stripe trigger invoice.finalized --override invoice:customer=cus_test123
# 5. Merge to main → production deploygit checkout main && git merge staging && git push# GitHub Actions deploys to production
# 6. Monitor: tail logs for 10 minutes post-deploywrangler tail adventive-billing-worker --env production --format prettyRoll back the billing Worker
Section titled “Roll back the billing Worker”# See available deploymentswrangler deployments list --env production
# Roll back to previous deploymentwrangler rollback --env production
# Or roll back to a specific deploymentwrangler rollback [deployment-id] --env production
# Verifycurl -s https://billing-worker.adventive.com/health | jq .versionAfter rollback, Stripe will continue delivering webhook events. Unprocessed events from the failed deployment will be retried by Stripe automatically for up to 3 days.
Rotate a Worker secret
Section titled “Rotate a Worker secret”# Example: rotating Stripe keywrangler secret put STRIPE_SECRET_KEY --env production# Enter new value when prompted
wrangler secret put STRIPE_SECRET_KEY --env staging
# Verify by triggering a Stripe-dependent endpointcurl -s https://billing-worker.adventive.com/health | jq .stripe_connectedNo Worker redeploy required — secrets are fetched per-request.
Rotate the Stripe webhook signing secret
Section titled “Rotate the Stripe webhook signing secret”If the Stripe webhook endpoint is deleted and re-registered, or if the signing secret is rotated via Stripe Dashboard:
- Stripe Dashboard → Developers → Webhooks → click endpoint → Reveal signing secret
- Copy the new secret
- Update the Wrangler secret:
Terminal window wrangler secret put STRIPE_WEBHOOK_SECRET --env productionwrangler secret put STRIPE_WEBHOOK_SECRET --env staging - Verify: trigger a test event from Stripe Dashboard → endpoint should return 200
Resend an invoice email
Section titled “Resend an invoice email”If an operator reports an invoice email was not received:
# Admin resend endpoint (CF Access protected)curl -s -X POST https://billing-worker.adventive.com/admin/resend-invoice \ -H "CF-Access-Client-Id: ${SERVICE_TOKEN_ID}" \ -H "CF-Access-Client-Secret: ${SERVICE_TOKEN_SECRET}" \ -H "Content-Type: application/json" \ -d '{"invoice_id": "in_1THM9xCSGXf35AB6gDkwmE5y"}' | jq .This re-fetches the invoice from Stripe, re-renders the PDF, and re-sends via Mailgun. It does NOT create a new Stripe invoice event.
Issue a refund
Section titled “Issue a refund”# Via Stripe Dashboard: Dashboard → Payments → find charge → Refund# Stripe issues the refund; billing Worker does not need to act# Acodei picks up the refund event and creates a QuickBooks credit note
# For a partial refund or credit note via Stripe CLI:stripe refunds create --charge ch_xxx --amount 5000 # amount in cents
# Confirm: check Acodei sync within 15 minutes of Stripe refundHandle a dispute
Section titled “Handle a dispute”- Stripe Dashboard → Disputes → find the dispute
- Review the evidence Stripe requests (invoice PDF, delivery confirmation, account records)
- Upload evidence via Stripe Dashboard — deadline shown on dispute detail page
- Invoice PDF available in R2:
r2://adventive-invoices/{acct_id}/{inv_uuid}.pdf - Mailgun delivery log: Dashboard → Logs → search by recipient email for delivery confirmation
- After dispute closes: Stripe resolves automatically; Acodei picks up the outcome event
Update a customer’s payment method
Section titled “Update a customer’s payment method”- Direct the customer to the Stripe Customer Portal (if B.9 is enabled):
https://billing.stripe.com/p/login/{portal_token} - If Customer Portal is not yet enabled: use Stripe Dashboard → Customers → find customer → Payment methods → Add payment method → share Stripe Checkout setup link with customer
Investigate a QuickBooks / Acodei drift
Section titled “Investigate a QuickBooks / Acodei drift”If an operator reports that QuickBooks totals don’t match Stripe:
# Check the reconciliation endpoint outputcurl -s https://billing-worker.adventive.com/admin/reconcile \ -H "CF-Access-Client-Id: ${SERVICE_TOKEN_ID}" \ -H "CF-Access-Client-Secret: ${SERVICE_TOKEN_SECRET}" | jq .
# Check Acodei dashboard for sync errors:# Acodei → Connected accounts → Adventive → Event log → filter by errorCommon causes: Stripe event not yet picked up by Acodei (5–15 min delay), Acodei auth token expired (re-connect in Acodei settings), QuickBooks API rate limit (resolves automatically).
Incident playbooks
Section titled “Incident playbooks”Symptom: Stripe webhook not delivered (endpoint failing)
Section titled “Symptom: Stripe webhook not delivered (endpoint failing)”Likely cause: Worker returning non-2xx, or Worker throwing an unhandled exception.
Diagnosis:
# 1. Check Stripe webhook delivery log# Stripe Dashboard → Developers → Webhooks → endpoint → Event deliveries# Look for: 5xx responses, timeouts, or connection refused
# 2. Check Worker logs for the errorwrangler tail adventive-billing-worker --env production --format pretty
# 3. Check Worker error rate in Cloudflare Dashboard# Workers & Pages → adventive-billing-worker → Metrics → ErrorsFix:
- If Worker throwing: fix the exception in code; hotfix deploy
- If Worker returning 400 (signature mismatch): verify
STRIPE_WEBHOOK_SECRETmatches the endpoint’s current signing secret (see “Rotate the Stripe webhook signing secret”) - If Worker unavailable: Cloudflare infrastructure issue — check cloudflare.com/status; Stripe will retry for up to 3 days
Verification: Stripe Dashboard → Event deliveries → redeliver the failed event → should show 200 response.
Symptom: Invoice email not delivered
Section titled “Symptom: Invoice email not delivered”Likely cause: Mailgun API failure, Mailgun API key expired, or PDF render failure (email suppressed when PDF is missing).
Diagnosis:
# 1. Check Worker logs around the time of invoice.finalized eventwrangler tail adventive-billing-worker --env production --format pretty \ --search "invoice.finalized"
# 2. Check Mailgun delivery log# Mailgun Dashboard → Logs → search by recipient email or message-id
# 3. Check if PDF exists in R2# Cloudflare Dashboard → R2 → adventive-invoices → browse for {acct_id}/{inv_uuid}.pdfFix:
- If Mailgun API key expired: rotate
MAILGUN_API_KEY(see “Rotate a Worker secret”); resend via/admin/resend-invoice - If PDF render failed: see “Symptom: PDF render failed”
- If Mailgun shows
bouncedorfailed: check recipient email address is correct in Stripe Customer record; update if needed - If Mailgun shows
suppressed: customer previously unsubscribed — contact them via alternative channel; do not send via Mailgun until they re-subscribe
Verification: Resend via /admin/resend-invoice; confirm Mailgun shows delivered for the message.
Symptom: PDF render failed
Section titled “Symptom: PDF render failed”Likely cause: Cloudflare Browser Rendering timeout, HTML template error, or Browser Rendering concurrency limit hit.
Diagnosis:
# 1. Check Worker logs for pdf_render_failed eventwrangler tail adventive-billing-worker --env production --format pretty \ --search "pdf_render_failed"
# 2. Check Analytics Engine for pdf_render_failed events# Workers Analytics Engine dashboard → filter by event=pdf_render_failed
# 3. Attempt manual render for the specific invoicecurl -s -X POST https://billing-worker.adventive.com/admin/render-pdf \ -H "CF-Access-Client-Id: ${SERVICE_TOKEN_ID}" \ -H "CF-Access-Client-Secret: ${SERVICE_TOKEN_SECRET}" \ -H "Content-Type: application/json" \ -d '{"invoice_id": "in_xxx"}' | jq .Fix:
- If template error (malformed HTML, missing variable): fix Handlebars template and redeploy
- If timeout: invoice data may be unusually large; check if specific invoice has an extreme number of line items
- If concurrency limit: Browser Rendering allows 2 simultaneous renders; if >2 invoices are finalizing simultaneously, renders queue automatically — wait and retry via resend endpoint
- If Browser Rendering service unavailable: check developers.cloudflare.com/browser-rendering for service status; Stripe will hold the webhook event for retry
Verification: Confirm PDF appears in R2 at {acct_id}/{inv_uuid}.pdf; resend invoice email.
Symptom: Reconciliation drift detected
Section titled “Symptom: Reconciliation drift detected”Likely cause: Metering event double-counted, usage record missed, Stripe invoice total doesn’t match billing_invoice DB, or rounding difference.
Diagnosis:
# 1. Get drift detailscurl -s https://billing-worker.adventive.com/admin/reconcile?detail=true \ -H "CF-Access-Client-Id: ${SERVICE_TOKEN_ID}" \ -H "CF-Access-Client-Secret: ${SERVICE_TOKEN_SECRET}" | jq .
# 2. Compare: Stripe invoice for the drifting accountstripe invoices retrieve in_xxx | jq .amount_due
# 3. Compare: billing_invoice DB for the same account + period# (Direct DB query via secure bastion)
# 4. Check metering events for the account in questionstripe usage_records list --subscription_item si_xxx | jq .Fix:
- If usage double-counted: find the duplicate Usage Record; issue a Stripe Credit Note for the overcharge; fix the idempotency key logic in billing Worker
- If usage missed: report missing usage via
/usageendpoint for the affected account and period; Stripe will include on the next invoice - If rounding: document as acceptable if < $0.01; if systematic, fix rounding in billing Worker
- During dual-run (cohorts not yet fully migrated): drift may indicate legacy billing_service also ran for the same account — investigate double-billing risk first
Verification: Re-run reconciliation; confirm drift_count = 0 for affected account.
Symptom: Failed payment — customer’s invoice not collected
Section titled “Symptom: Failed payment — customer’s invoice not collected”Likely cause: Stripe Smart Retries exhausted, card declined, or payment method expired.
Diagnosis:
# 1. Check Stripe invoice statusstripe invoices retrieve in_xxx | jq '{status, amount_due, amount_paid, next_payment_attempt}'
# 2. Check Smart Retries schedule# Stripe Dashboard → Billing → Revenue recovery → Smart Retries configuration# Stripe Dashboard → Customers → {customer} → Invoices → see retry scheduleFix:
- If Smart Retries still running: no action needed; Stripe will retry per configured schedule; dunning emails sent automatically
- If Smart Retries exhausted (invoice status =
uncollectible): contact customer via Slack / direct outreach; ask them to update payment method via Customer Portal (B.9) or Stripe Checkout setup link - If customer updates card: Stripe Dashboard → manually retry collection on the invoice
Verification: Invoice status changes from open to paid in Stripe Dashboard after successful collection.
Symptom: Customer reports incorrect invoice amount
Section titled “Symptom: Customer reports incorrect invoice amount”Likely cause: Usage record incorrect, wrong Stripe Price tier applied, or managed service job amount wrong.
Diagnosis:
# 1. Pull full invoice line itemsstripe invoices retrieve in_xxx --expand='lines' | jq .lines.data[]
# 2. Compare metering data to source (Redshift impression counts for the period)# Ask Patrick to run: SELECT SUM(impressions) FROM stats WHERE acct_id=... AND period=...
# 3. Check which Price tier was applied# Look at lines[].price.tiers for the metered impression linesFix:
- If usage record wrong: issue a Stripe Credit Note for the overcharge; correct the metering pipeline for future periods
- If Price tier wrong: verify the Stripe Price object’s tier thresholds match
account_plan_usagefor this account’s plan - If managed service job wrong: issue a Stripe Credit Note; correct the InvoiceItem amount for the specific job
Verification: Customer acknowledges corrected invoice; Credit Note sent to customer via Mailgun if applicable.
Symptom: Acodei not syncing to QuickBooks
Section titled “Symptom: Acodei not syncing to QuickBooks”Likely cause: Acodei OAuth token expired, QuickBooks API rate limit, or Acodei configuration drift.
Diagnosis:
- Acodei dashboard → Connected accounts → Adventive → check last sync timestamp
- Acodei → Event log → look for sync errors or auth failures
- QuickBooks → check for duplicate entries (indicates sync ran twice)
Fix:
- If OAuth expired: reconnect Acodei to QuickBooks via Acodei settings (OAuth re-authorization flow)
- If rate limited: Acodei retries automatically; wait 30–60 minutes
- If configuration drift: compare Acodei field mappings against the expected Stripe event shape; update mappings in Acodei settings
Verification: Acodei → Event log shows successful sync for recent Stripe events; QuickBooks shows new entries.
Known issues & workarounds
Section titled “Known issues & workarounds”| Issue | Workaround | Permanent fix |
|---|---|---|
| Browser Rendering concurrency limit (2 simultaneous renders) | Renders queue naturally; ~200 invoices/month at non-simultaneous pacing is fine | If batch volume spikes, implement a queue (Workers Queue) to serialize render requests |
| PHPExcel still used for month-end Excel export | Legacy admin continues to generate until Cron Trigger Worker replacement is live | Port month-end summary to CSV via Cron Trigger Worker (Phase 6+) |
| Stripe Smart Retries not yet configured | Operators manually track failed invoices until Smart Retries is configured in B.3 | Configure Smart Retries as part of B.3; enable at least 3 retry attempts on configurable schedule |
| Historical invoice PDF unavailable in R2 pre-B.4 | PDFs still served from billing.adventivecdn.com for pre-migration invoices | Batch-copy historical PDFs from S3 to R2 as part of B.4 setup |
14-year billing_invoice history not in Stripe | Legacy DB remains accessible read-only for historical lookups | Export to R2 / archive format before legacy DB decommission (post-B.8) |
| Custom dunning still in admin during parallel run | Legacy admin handles dunning for non-migrated cohorts; billing Worker handles migrated cohorts | Retire at B.8 when 100% of customers are on Stripe Billing |