The subject line is the single highest-leverage decision in any campaign. A/B testing it costs you nothing extra — you’d be sending the campaign anyway — and over time, it builds real intuition about what resonates with your audience. Hiveku’s A/B test flow is built into the campaign composer. This guide walks through the full loop and the caveats you should know about.

Why A/B test

Three concrete payoffs:
  • Better open rates on this campaign — the winning variant gets the bigger remaining audience if you’re using progressive winner rollout, or you learn enough to pick the right one next time.
  • Compounding intuition — after a dozen tests, you start to see what kind of subjects work for your audience (specific numbers vs curiosity, short vs descriptive, branded vs personal).
  • Cheap insight — no separate tooling, no research budget. The cost is “type a second subject line”.

How the flow works

The mechanics in Hiveku:
  1. You write two subject variants (A and B) when composing a campaign.
  2. The audience gets randomly split 50/50 at send time. Each contact is assigned a variant deterministically based on contact ID, so re-tests on the same audience see consistent splits.
  3. Both variants send at the same time (so time-of-day isn’t a confound).
  4. After 48 hours, Hiveku looks at open rate by variant. If the difference is statistically significant, the winner is identified and labeled. If not, the campaign report says “no significant difference” and you pick whichever you preferred.
The 48-hour window is a balance — long enough that most opens land, short enough to be useful for follow-up decisions.
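The deterministic split in step 2 is typically done by hashing the contact ID rather than drawing random numbers, so the same contact always lands in the same bucket on re-tests. Hiveku's actual implementation isn't documented here; this is a sketch of the idea using SHA-256, with the function name and scheme as illustrative assumptions:

```python
import hashlib

def assign_variant(contact_id: str) -> str:
    """Deterministically assign a contact to variant A or B.

    Hashing the contact ID (instead of random assignment) means
    re-running the test on the same audience yields the same split.
    """
    digest = hashlib.sha256(contact_id.encode("utf-8")).digest()
    return "A" if digest[0] % 2 == 0 else "B"

# Over a large audience the split converges to roughly 50/50:
counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[assign_variant(f"contact-{i}")] += 1
```

Because assignment depends only on the contact ID, adding or removing other contacts never reshuffles anyone's variant.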

The flow

1

Compose your campaign

Build the campaign as you normally would: pick a template, pick an audience, write the body.
2

Toggle A/B test subjects

In the Subject section, toggle A/B test subjects. The single subject field becomes two — A and B — plus a preheader, which stays the same for both variants.

Keep the variants meaningfully different. If A is “What’s new in April” and B is “What’s new in April 2026”, you’ve learned nothing from the test. Better:
  • A: What’s new in April
  • B: 3 product updates worth your time
The coach lints both subjects with the same length and spam-pattern checks as a non-A/B campaign.
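Length and spam-pattern checks like the ones the coach applies can be approximated with a few rules. The threshold and patterns below are illustrative assumptions, not Hiveku's actual lint rules:

```python
import re

def lint_subject(subject: str) -> list[str]:
    """Toy subject-line linter. The 60-character limit and the spam
    patterns are assumed for illustration, not Hiveku's real rules."""
    warnings = []
    if len(subject) > 60:
        warnings.append("over 60 characters: likely truncated on mobile")
    if subject.isupper():
        warnings.append("all caps reads as spam")
    if re.search(r"act now|100% free|!!!", subject, re.IGNORECASE):
        warnings.append("spam-trigger phrase detected")
    return warnings
```

Both variants run through the same checks, so a lint warning on one subject doesn't bias the test toward the other.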
3

Schedule or send

Schedule like any other campaign. Both variants go out at the same time — there’s no winner-then-rollout phase by default.
If you’d rather use progressive winner rollout — send to 20% as a test, identify a winner, send the winning variant to the remaining 80% — toggle that mode in the campaign settings. It needs an audience over ~5,000 to produce reliable picks.
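Progressive rollout extends the deterministic split to three buckets: a 20% test group divided between the variants, and an 80% holdout that later receives the winner. A sketch of that bucketing (names and hash scheme assumed for illustration):

```python
import hashlib

def rollout_bucket(contact_id: str, test_fraction: float = 0.2) -> str:
    """Place a contact in the test group (split evenly between variants)
    or the holdout that later receives the winning subject line."""
    digest = hashlib.sha256(contact_id.encode("utf-8")).digest()
    position = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    if position < test_fraction / 2:
        return "test-A"
    if position < test_fraction:
        return "test-B"
    return "holdout"

buckets = {"test-A": 0, "test-B": 0, "holdout": 0}
for i in range(20_000):
    buckets[rollout_bucket(f"contact-{i}")] += 1
```

The ~5,000 minimum exists because the test group is only 20% of the audience: with fewer contacts, each variant's test sample is too small for a reliable pick.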
4

Wait 48 hours

Hiveku posts the winner automatically once the window closes. You’ll get an in-app notification and (optionally) an email summary.

The campaign report shows the breakdown:
Variant   Sent     Opened   Open rate   p-value
A         4,973    1,392    27.99%      —
B         4,985    1,547    31.03%      0.001 (winner)
A p-value under 0.05 is the rule of thumb for significance.
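Hiveku doesn't document which test it runs, but a standard two-proportion z-test reproduces the p-value in the table above. A self-contained sketch using only the standard library:

```python
from math import sqrt, erfc

def two_proportion_p_value(opens_a: int, sent_a: int,
                           opens_b: int, sent_b: int) -> float:
    """Two-sided z-test for a difference in open rates between variants."""
    rate_a, rate_b = opens_a / sent_a, opens_b / sent_b
    pooled = (opens_a + opens_b) / (sent_a + sent_b)
    se = sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    z = (rate_b - rate_a) / se
    return erfc(abs(z) / sqrt(2))  # two-sided p-value

# The numbers from the report table above:
p = two_proportion_p_value(1392, 4973, 1547, 4985)  # ~0.001, under 0.05
```

A ~3-point spread on ~5,000 recipients per variant clears the 0.05 bar comfortably; the same spread on a few hundred recipients would not.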
5

Document the result

Add a one-line note in the campaign report: “Variant B (specific numbers + curiosity) won. Try this angle for next month.”

Over time, you build a library of what works. The coach references this history when drafting future variants — “last three tests showed specific numbers in the subject outperformed curiosity by 8% — leaning that way.”

How the AI coach helps

The coach is good at drafting variants that test meaningfully different angles, not surface tweaks:
  • “draft two subject variants for this campaign — one curiosity, one specific.”
  • “what subject patterns have worked best for our newsletter audience over the last six tests?”
  • “this campaign is going to engaged customers. Suggest variants that lean on what they’ve responded to before.”
The coach won’t suggest meaningless tweaks (capitalizing one word, swapping an emoji) because those don’t teach you anything.

Statistical significance caveats

A few things to know about the math:
  • Sample size matters. With under ~1,000 recipients per variant, even a 10% difference in open rate isn’t reliably significant. Smaller campaigns will often end with “no significant difference” — that’s honest, not a bug.
  • Open rate is noisy. Apple Mail Privacy Protection inflates opens by 20-40% for iOS-heavy audiences. The inflation is symmetric (both variants benefit equally), so winners are still meaningful, but absolute open rates are higher than reality.
  • One test isn’t truth. A single result with p=0.04 is probably real, but not certain. Patterns over 5-10 tests are much more trustworthy than any single result.
  • Don’t peek and stop early. Wait the full 48 hours. Stopping when you see the spread you wanted inflates false positives.

What to test

A short list of subject angles worth testing systematically:
  • Length — short (4-6 words) vs long (10-12 words).
  • Specificity — “Three updates” vs “3 product updates from this week”.
  • Curiosity vs clarity — “Something we’ve been quietly working on” vs “New dashboard is live”.
  • Question vs statement — “Are you using audience filters?” vs “How to use audience filters.”
  • Personal vs branded — first-person (“I built this for you”) vs third-person (“Acme launched…”).
After 5-10 tests, the patterns that work for your audience start to dominate. Trust that signal more than general best-practice posts on the internet.

Troubleshooting

Why didn’t my test produce a winner?

Two reasons:
  • Sample too small. Under ~500 delivered per variant, almost no difference clears the significance bar.
  • Variants too similar. If both subjects produce essentially the same response, that’s also a real result — they’re equivalent for your audience.
Pick whichever you preferred and try a more meaningfully different test next time.
What does a high p-value mean?

A high p-value means “this could be noise”. It’s common with small audiences. The campaign report flags it: the percent difference is real on this send, but not necessarily reproducible.
Is there a minimum audience size?

A/B testing requires an audience over 200 (so each variant has at least 100). For smaller sends you’d need a much bigger effect to detect anything, so Hiveku simply hides the toggle.
Can I test body content or templates?

Not yet within a single campaign: only the subject and preheader can be A/B tested. For body or template testing, the recommended pattern is two campaigns sent to two halves of an audience at the same time. The coach can help split an audience evenly.
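For the manual two-campaign pattern, the halves just need to be even, disjoint, and reproducible. A minimal sketch (the function name is illustrative, not a Hiveku API):

```python
def split_audience(contact_ids: list[str]) -> tuple[list[str], list[str]]:
    """Split a contact list into two even halves for a manual two-campaign
    body test; sorting first makes the split reproducible across exports."""
    ordered = sorted(contact_ids)
    return ordered[::2], ordered[1::2]

half_a, half_b = split_audience([f"contact-{i}" for i in range(1001)])
```

With an odd-sized audience the halves differ by one contact, which has no practical effect on the comparison.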

Campaigns

Where the A/B test toggle lives.

AI Email Coach

Drafts variants and remembers what’s worked.