Why A/B test
Three concrete payoffs:
- Better open rates on this campaign — the winning variant gets the bigger remaining audience if you’re using progressive winner rollout, or you learn enough to pick the right one next time.
- Compounding intuition — after a dozen tests, you start to see what kind of subjects work for your audience (specific numbers vs curiosity, short vs descriptive, branded vs personal).
- Cheap insight — no separate tooling, no research budget. The cost is “type a second subject line”.
How the flow works
The mechanics in Hiveku:
- You write two subject variants (A and B) when composing a campaign.
- The audience gets randomly split 50/50 at send time. Each contact is assigned a variant deterministically based on contact ID, so re-tests on the same audience see consistent splits (see the sketch after this list).
- Both variants send at the same time (so time-of-day isn’t a confound).
- After 48 hours, Hiveku looks at open rate by variant. If the difference is statistically significant, the winner is identified and labeled. If not, the campaign report says “no significant difference” and you pick whichever you preferred.
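Hiveku doesn’t publish its exact assignment function, but the deterministic split described above can be pictured as hashing the contact ID and bucketing on the result. A minimal sketch of the idea (the function name and hashing scheme are illustrative, not Hiveku’s implementation):

```python
import hashlib

def assign_variant(contact_id: str) -> str:
    """Deterministically assign a contact to subject variant A or B.

    Hashing the contact ID instead of drawing a random number at send time
    means the same contact lands in the same bucket on every re-test.
    """
    digest = hashlib.sha256(contact_id.encode("utf-8")).hexdigest()
    # Treat the hash as an integer; its parity gives a roughly 50/50 split.
    return "A" if int(digest, 16) % 2 == 0 else "B"

print(assign_variant("contact_8412"))  # same letter for this ID on every run
```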
The flow
Compose your campaign
Build the campaign as you normally would: pick a template, pick an audience, write the body.
Toggle A/B test subjects
In the Subject section, toggle A/B test subjects. The single subject field becomes two — A and B — plus a preheader (which stays the same for both variants).
Keep the variants meaningfully different. If A is “What’s new in April” and B is “What’s new in April 2026”, you’ve learned nothing from the test. Better:
- A: What’s new in April
- B: 3 product updates worth your time
Schedule or send
Schedule like any other campaign. Both variants go out at the same time — there’s no winner-then-rollout phase by default.
If you’d rather use progressive winner rollout — send to 20% as a test, identify a winner, send the winning variant to the remaining 80% — toggle that mode in the campaign settings. It needs an audience over ~5,000 to produce reliable picks.
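The arithmetic behind that mode is simple. The 20%/80% fractions and the ~5,000 threshold come from the description above; the function below is only an illustration, not a Hiveku API:

```python
def plan_progressive_rollout(audience_size: int, test_fraction: float = 0.2) -> dict:
    """Sketch of the 20% test / 80% winner-rollout split described above."""
    if audience_size <= 5000:
        raise ValueError("Progressive rollout needs an audience over ~5,000.")
    test_pool = int(audience_size * test_fraction)  # contacts in the initial test
    per_variant = test_pool // 2                    # test pool split across A and B
    remainder = audience_size - test_pool           # receives the winning variant later
    return {"per_variant": per_variant, "winner_rollout": remainder}

print(plan_progressive_rollout(10_000))
# {'per_variant': 1000, 'winner_rollout': 8000}
```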
Wait 48 hours
Hiveku posts the winner automatically once the window closes. You’ll get an in-app notification and (optionally) an email summary.
The campaign report shows the breakdown:
| Variant | Sent | Opened | Open rate | p-value |
|---|---|---|---|---|
| A | 4,973 | 1,392 | 27.99% | — |
| B | 4,985 | 1,547 | 31.03% | 0.001 (winner) |
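Hiveku doesn’t document exactly which significance test it runs, but a standard two-proportion z-test reproduces a p-value very close to the one in the table above. A self-contained sketch for the curious:

```python
from math import sqrt, erfc

def two_proportion_p_value(opens_a: int, sent_a: int, opens_b: int, sent_b: int) -> float:
    """Two-sided two-proportion z-test comparing two open rates."""
    rate_a, rate_b = opens_a / sent_a, opens_b / sent_b
    pooled = (opens_a + opens_b) / (sent_a + sent_b)
    se = sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    z = (rate_b - rate_a) / se
    return erfc(abs(z) / sqrt(2))  # two-sided p-value from the normal distribution

print(round(two_proportion_p_value(1392, 4973, 1547, 4985), 3))  # ~0.001
```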
A p-value under 0.05 is the rule of thumb for significance.
Document the result
Add a one-line note in the campaign report: “Variant B (specific numbers + curiosity) won. Try this angle for next month.”
Over time, you build a library of what works. The coach references this history when drafting future variants — “last three tests showed specific numbers in the subject outperformed curiosity by 8% — leaning that way.”
How the AI coach helps
The coach is good at drafting variants that test meaningfully different angles, not surface tweaks:
- “draft two subject variants for this campaign — one curiosity, one specific.”
- “what subject patterns have worked best for our newsletter audience over the last six tests?”
- “this campaign is going to engaged customers. Suggest variants that lean on what they’ve responded to before.”
Statistical significance caveats
A few things to know about the math:
- Sample size matters. With under ~1,000 recipients per variant, even a 10% relative difference in open rate isn’t reliably significant (there’s a worked example after this list). Smaller campaigns will often end with “no significant difference” — that’s honest, not a bug.
- Open rate is noisy. Apple Mail Privacy Protection inflates opens by 20-40% for iOS-heavy audiences. The inflation is symmetric (both variants benefit equally), so winners are still meaningful, but absolute open rates are higher than reality.
- One test isn’t truth. A single result with p=0.04 is probably real, but not certain. Patterns over 5-10 tests are much more trustworthy than any single result.
- Don’t peek and stop early. Wait the full 48 hours. Stopping when you see the spread you wanted inflates false positives.
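To make the first caveat concrete, here is the same two-proportion test applied to a 10% relative lift (28% to 30.8% open rate) with 1,000 recipients per variant. The numbers are made up for illustration:

```python
from math import sqrt, erfc

# 10% relative lift (28% -> 30.8% open rate) with 1,000 recipients per variant.
opens_a, opens_b, n = 280, 308, 1000
pooled = (opens_a + opens_b) / (2 * n)
se = sqrt(pooled * (1 - pooled) * (2 / n))
z = (opens_b / n - opens_a / n) / se
print(round(erfc(abs(z) / sqrt(2)), 2))  # ~0.17: "no significant difference"
```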
What to test
A short list of subject angles worth testing systematically:
- Length — short (4-6 words) vs long (10-12 words).
- Specificity — “Three updates” vs “3 product updates from this week”.
- Curiosity vs clarity — “Something we’ve been quietly working on” vs “New dashboard is live”.
- Question vs statement — “Are you using audience filters?” vs “How to use audience filters.”
- Personal vs branded — first-person (“I built this for you”) vs third-person (“Acme launched…”).
Troubleshooting
No winner identified after 48 hours
Two reasons:
- Sample too small. Under ~500 delivered per variant, almost no difference clears the significance bar.
- Variants too similar. If both subjects produce essentially the same response, that’s also a real result — they’re equivalent for your audience.
Open rate is much higher in one variant but p-value is high
A high p-value means “this could be noise”. It’s common with small audiences. The campaign report flags this — the percent difference is real on this send but not necessarily reproducible.
A/B test toggle is disabled
A/B testing requires an audience over 200 (so each variant has at least 100). For smaller sends, you’d need a much bigger effect to detect anything — Hiveku just hides the toggle.
Can I A/B test more than just the subject?
Not yet on a campaign — only subject and preheader. For body or template testing, the recommended pattern is two campaigns to two halves of an audience, sent at the same time. The coach can help split an audience evenly.
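If you go the two-campaign route, the only mechanical step is producing two even halves of the audience. A plain illustration of that split (not a Hiveku API call — the coach can do the same thing inside the product):

```python
import random

def split_audience(contact_ids, seed=42):
    """Shuffle once with a fixed seed, then cut the contact list in half."""
    ids = list(contact_ids)
    random.Random(seed).shuffle(ids)       # fixed seed keeps the split reproducible
    midpoint = len(ids) // 2
    return ids[:midpoint], ids[midpoint:]  # audience for campaign 1, campaign 2

half_a, half_b = split_audience([f"contact_{i}" for i in range(10)])
print(len(half_a), len(half_b))  # 5 5
```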
Related
Campaigns
Where the A/B test toggle lives.
AI Email Coach
Drafts variants and remembers what’s worked.