From Jupyter Notebook to AWS: Deploying an A/B Test Duration Calculator

Helena | Feb 25, 2026

I built an A/B test duration calculator in a Jupyter notebook. It worked great locally: plug in your baseline conversion rate, expected lift, and daily traffic, and it would tell you exactly how many days to run your test. The problem was it lived entirely on my machine, which meant it was useful to exactly one person: me.

So I decided to turn it into a real, deployed web tool. What followed was a proper end-to-end ML deployment project: FastAPI, Docker, AWS ECS Fargate, a load balancer, HTTPS, and an automated CI/CD pipeline. This post walks through what I built, the decisions I made, and the things that went wrong along the way.

If you just want to see the thing, go to datahook.co.uk/ab-calculator.


What the calculator does

The calculator uses a two-proportion z-test to determine how long an A/B test needs to run to detect a given effect at a specified power and significance level. You give it:

  • Your baseline conversion rate
  • The minimum relative lift you want to detect
  • Your daily traffic, allocation, and split ratio
  • Target statistical power and significance level

It returns the required sample sizes for control and treatment, and translates that into a number of days based on your traffic. It also renders two Plotly charts: a sensitivity curve showing days needed across a range of effect sizes, and a power heatmap showing how power varies with sample size and lift.
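The maths behind this is compact enough to sketch. Below is a minimal, stdlib-only version of the sample-size and duration calculation, assuming a 50/50 split and equal group sizes; the real calculator also handles allocation and split ratio, and these function names are illustrative rather than the actual code:

```python
# Per-group sample size for a two-proportion z-test, then days from traffic.
from math import ceil
from statistics import NormalDist

def required_sample_size(baseline, rel_lift, alpha=0.05, power=0.8):
    """Per-group sample size to detect a relative lift over a baseline rate."""
    p1 = baseline
    p2 = baseline * (1 + rel_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

def days_needed(n_per_group, daily_traffic, allocation=1.0):
    """Days to collect both groups, given traffic entering the test."""
    return ceil(2 * n_per_group / (daily_traffic * allocation))

# Detecting a 10% relative lift on a 5% baseline at 5,000 visitors/day
n = required_sample_size(baseline=0.05, rel_lift=0.10)
days = days_needed(n, daily_traffic=5000)
```

Small effects dominate the result: halving the detectable lift roughly quadruples the required sample size, which is exactly what the sensitivity curve visualises.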


The architecture

The final setup looks like this:

Browser (Hugo site on datahook.co.uk)
    ↓ HTTPS
AWS Application Load Balancer (api.datahook.co.uk)
    ↓ HTTP
AWS ECS Fargate (FastAPI container)
    ↓
AWS ECR (Docker image registry)

GitHub → GitHub Actions → ECR → ECS (on every push)

Each layer has a specific job. The Hugo site is just static HTML and JavaScript: it calls the API and renders the results. The load balancer provides a stable HTTPS endpoint. ECS Fargate runs the container without me having to manage servers. ECR stores the Docker images. And GitHub Actions ties it all together so deploying new code is just a git push.


Step 1: Getting the notebook logic out of the notebook

The first real piece of work was extracting the calculator logic into a clean Python module. Notebooks are great for exploration but terrible for deployment: global variables, cells that depend on each other’s state, and no clear inputs or outputs.

The key was identifying a single entry point function that takes all the inputs and returns all the outputs. Everything else got stripped away: the print statements, the chart code, the markdown explanations. What remained were two core functions and a run_calculation() wrapper that became the bridge between the notebook and the API.

This step is worth taking seriously. A clean calculator.py with well-defined inputs and outputs makes every subsequent step easier.
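As a sketch of what that entry point can look like: a pure function, all inputs as arguments, all outputs in one dict, nothing printed. The signature and field names here are illustrative, not the actual calculator.py:

```python
# Hypothetical shape of the single entry point in calculator.py.
from math import ceil
from statistics import NormalDist

def run_calculation(baseline, rel_lift, daily_traffic,
                    allocation=1.0, alpha=0.05, power=0.8):
    """Pure function: no globals, no prints, no chart code."""
    p2 = baseline * (1 + rel_lift)
    z = NormalDist().inv_cdf
    n = ceil((z(1 - alpha / 2) + z(power)) ** 2
             * (baseline * (1 - baseline) + p2 * (1 - p2))
             / (p2 - baseline) ** 2)
    days = ceil(2 * n / (daily_traffic * allocation))
    # Everything downstream (API, charts) consumes this one dict
    return {"n_control": n, "n_treatment": n, "days": days}
```

Because the function is pure, it can be unit-tested in isolation and called identically from a notebook, a test suite, or an API handler.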


Step 2: Wrapping it in FastAPI

FastAPI was the natural choice here. It’s fast, it generates interactive API documentation automatically, and Pydantic handles input validation out of the box. The API ended up with two endpoints:

  • POST /calculate: takes the calculator inputs, returns results as JSON
  • GET /health: returns {"status": "ok"}, used by AWS for health checks

The Pydantic input model enforces constraints automatically: baseline conversion rate must be between 0 and 1, daily traffic must be positive, and so on. This means the API fails fast with a clear error message rather than returning nonsense results.


Step 3: Dockerising

The Dockerfile uses a two-stage build. The first stage installs all the Python dependencies. The second stage copies only the installed packages across, leaving behind pip caches and build tools. This keeps the final image lean.

Two things I learned here that I’ll always do going forward:

Pin your dependency versions. fastapi==0.115.5 instead of just fastapi. Unpinned dependencies mean your build might work today and break in three months when a new version ships.

Use a non-root user. Adding a dedicated appuser and switching to it before the CMD is a small change that reflects real production practice.
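Put together, a two-stage Dockerfile along these lines covers all three points; the base image, paths, and module name here are assumptions, not the exact production file:

```dockerfile
# Stage 1: install pinned dependencies into an isolated prefix
FROM python:3.12-slim AS builder
COPY requirements.txt .
RUN pip install --prefix=/install -r requirements.txt

# Stage 2: copy only the installed packages; pip caches and
# build tools stay behind in the builder stage
FROM python:3.12-slim
COPY --from=builder /install /usr/local
COPY app/ /app
WORKDIR /app

# Run as a dedicated non-root user
RUN useradd --create-home appuser
USER appuser
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```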


Step 4: The AWS deployment journey

This is where things got interesting.

My first attempt was AWS App Runner, which promises to be the simplest way to run a container on AWS. After several hours of health check failures, stuck deployments, and OPERATION_IN_PROGRESS states I couldn’t cancel, I gave up on it. App Runner abstracts away too much: when something goes wrong, there’s very little visibility into why.

I switched to ECS Fargate, which exposes all the underlying components explicitly. More setup, but much easier to debug. The components you create are:

  • ECS Cluster: a logical grouping for your services
  • Task Definition: the blueprint specifying which image to run, how much CPU and memory it gets, and which port it listens on
  • ECS Service: keeps your task running and restarts it if it crashes
  • VPC, subnets, security groups: the networking layer that controls what can talk to what

There was also an AWS account quota issue: new accounts have a Fargate vCPU limit of zero, which means no tasks can be placed at all. This required raising a support ticket. Worth knowing before you start.


Step 5: The load balancer and HTTPS

ECS Fargate tasks get a new public IP every time they restart. Without a load balancer, that means updating your frontend every time your container redeploys: obviously not sustainable.

An Application Load Balancer sits in front of the ECS service and provides a stable DNS name. ECS automatically registers new task IPs with the load balancer’s target group, so the DNS name always resolves correctly regardless of what’s happening behind it.

HTTPS came via AWS Certificate Manager, which provides free SSL certificates for domains you own. The validation process involves adding a CNAME record to your DNS: GoDaddy warned it could take 48 hours, but it resolved in a few hours.

The final result is a stable https://api.datahook.co.uk endpoint that the Hugo frontend can call safely from an HTTPS page without triggering mixed content errors.


Step 6: CI/CD with GitHub Actions

The final piece was automating the deployment so that pushing code to GitHub triggers a full rebuild and redeploy automatically.

The workflow does three things:

  1. Builds the Docker image
  2. Tags it with the git commit SHA and pushes it to ECR
  3. Updates the ECS task definition with the new image tag and forces a redeployment

A few things in the workflow are worth explaining. The paths filter means the workflow only runs when files in app/, Dockerfile, or requirements.txt change. Making changes to the Hugo page won’t trigger a redeployment.

Tagging images with the commit SHA rather than just latest means every deployed version is traceable and rollbacks are straightforward.
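A condensed sketch of such a workflow using the official aws-actions steps; the cluster, service, container, and repository names are placeholders:

```yaml
name: deploy-api
on:
  push:
    branches: [main]
    paths: ["app/**", "Dockerfile", "requirements.txt"]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: eu-west-2
      - id: ecr
        uses: aws-actions/amazon-ecr-login@v2
      - name: Build and push, tagged with the commit SHA
        run: |
          IMAGE=${{ steps.ecr.outputs.registry }}/ab-calculator:${{ github.sha }}
          docker build -t "$IMAGE" .
          docker push "$IMAGE"
      - name: Point the task definition at the new image
        id: render
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: task-definition.json
          container-name: api
          image: ${{ steps.ecr.outputs.registry }}/ab-calculator:${{ github.sha }}
      - name: Deploy to ECS and wait for stability
        uses: aws-actions/amazon-ecs-deploy-task-definition@v2
        with:
          task-definition: ${{ steps.render.outputs.task-definition }}
          service: ab-calc-service
          cluster: ab-calc-cluster
```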


What I’d do differently

Start with ECS, skip App Runner. App Runner’s simplicity is a trap when things go wrong. ECS gives you more to configure upfront but much better observability.

Check account quotas before starting. The Fargate vCPU quota issue cost hours. A quick check of Service Quotas at the start would have flagged it immediately.


The result

The calculator is live at datahook.co.uk/ab-calculator. Any change to the API code is automatically built, pushed to ECR, and deployed to ECS with a single git push. The Hugo frontend calls the live API over HTTPS and renders the results and charts in the browser.

The full stack (FastAPI, Docker, ECS Fargate, ALB, ACM, GitHub Actions) mirrors how real ML models get deployed in production environments. That was the point: build something small enough to actually finish, but use the real tools.

P.S. Yes, I know Streamlit exists. I regret nothing.