How I Evaluate a Good Performance Creative
So many ads that “work” only work for a few days. The job is finding the ones that work for months.
I’ve run ~$150M in Meta spend and helped produce 20,000+ ads. I’ve reviewed way more than that.
I get asked “what makes a good ad” all the time. Here is my best advice.
First, here is the order in which I prioritize the data:
Scale. Then efficiency. Then diagnostics.
Once we establish the right evaluation flow we can talk evaluation specifics.
Don’t start with high ROAS:
Here’s what too many people do.
They open Ads Manager. They sort by ROAS. They look at the top ad. They look at hook rate. They look at hold rate. They look at CTR. They form an opinion about why it worked.
Then they do the same for the bottom ad and form an opinion about why it failed.
The problem: the bottom ad spent $200. The top ad spent $400. Neither has enough data to support an opinion about anything. Those conversions might have been flukes or early audiences.
You’re reading tea leaves at this point.
Here is the order of evaluation to follow before you start looking at ad specifics.
Layer 1: Scale
Before you look at any other metric, ask one question.
Did this ad get enough spend for the result to mean anything?
The threshold varies by account size, but my general rule: an ad needs to clear roughly 5-10x your CPA target before any of its numbers tell you anything. If your CPA target is $50, you want $250-$500 minimum on an ad before you call it a winner or a loser.
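As a rough sketch of that rule, the check below treats the 5-10x multiple as a parameter. The function name `clears_scale_threshold` and the default multiple of 5 are illustrative assumptions, not a Meta feature or a fixed standard.

```python
def clears_scale_threshold(spend: float, cpa_target: float, multiple: float = 5.0) -> bool:
    """Return True if the ad has spent at least `multiple` times the CPA target.

    `multiple` is an assumption: the rule of thumb above suggests 5 to 10.
    """
    if cpa_target <= 0:
        raise ValueError("cpa_target must be positive")
    return spend >= multiple * cpa_target


# With a $50 CPA target, the floor is $250 at 5x and $500 at 10x.
print(clears_scale_threshold(spend=200, cpa_target=50))               # False
print(clears_scale_threshold(spend=400, cpa_target=50))               # True
print(clears_scale_threshold(spend=400, cpa_target=50, multiple=10))  # False
```

Pick the multiple once per account and apply it to every ad, so a winner and a loser are held to the same bar.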
Meta’s algorithm spends the first chunk of an ad’s budget testing into warm audiences it already trusts. You’ll notice that you always get new ads pushed to you. Starting there is great because it gives the algo data, but it isn’t always indicative of true performance.
The real question isn’t “does this ad work on warm traffic?” (If it were, you could just run a promo.) The real question is “does this ad work on cold audiences Meta hasn’t found yet?”
You don’t find that out at $200 in spend. You find it out at $2,000 or $200,000.
Strong ROAS for 2 days then it dropped off? Just ignore that ad.
Layer 2: Efficiency
Once an ad has cleared the scale threshold, then you look at efficiency.
CPA. ROAS. Contribution margin. Whatever the unit economics metric is for your business.
This is where the ROAS intuition is helpful.
But be careful about being overly reliant on click-based tracking.
A problem-solution ad may drive better overall performance (as measured by incrementality) than a promotion ad without showing it in your multi-touch attribution (MTA). This is not an excuse to ignore those metrics, but a reason to measure more carefully.
Layer 3: Diagnostics
Only after an ad has scaled and proven its efficiency do you dig into the supporting metrics.
Hook rate. Hold rate. CTR. Average watch time. Add-to-cart rate. Landing page conversion.
These metrics don’t tell you whether an ad worked. They tell you why it worked and if it can be improved, so you know what to iterate on.
A 60% hook rate on an ad that didn’t get enough spend or perform means nothing. Same goes for hold rate. Same goes for CTR.
But once an ad has scaled and proven it works, the diagnostic metrics let you understand the formula. Then you can iterate intelligently.
This is the order: scale, efficiency, diagnostics. Skip any layer and you’ll waste time.
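The three layers can be sketched as a short gate sequence. This is a minimal illustration, assuming CPA is the efficiency metric and a 5x spend floor; the `Ad` class, the `evaluate` function, and the verdict strings are all hypothetical names, not part of any ads platform.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    spend: float
    conversions: int


def evaluate(ad: Ad, cpa_target: float, scale_multiple: float = 5.0) -> str:
    """Walk the three layers in order and report where the ad stops."""
    # Layer 1: scale. Below the spend floor, no other number is worth reading.
    if ad.spend < scale_multiple * cpa_target:
        return "insufficient spend: no verdict yet"

    # Layer 2: efficiency. CPA is used here; swap in ROAS or contribution margin.
    if ad.conversions == 0 or ad.spend / ad.conversions > cpa_target:
        return "scaled but inefficient"

    # Layer 3: diagnostics are only read for ads that pass both gates.
    return "scaled and efficient: review diagnostics"


print(evaluate(Ad("A", spend=200, conversions=6), cpa_target=50))
print(evaluate(Ad("B", spend=2000, conversions=20), cpa_target=50))
print(evaluate(Ad("C", spend=2000, conversions=50), cpa_target=50))
```

Ad A looks great on paper (about $33 per conversion) but never reaches the efficiency check, which is the point of putting scale first.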
What Makes an Ad Actually Compelling
Alright so what makes a compelling ad that can scale at high performance?
Here is what I’ve learned:
A compelling ad shares something genuinely unique about your brand, in an engaging way, and communicates the benefit the customer gets from it that is unique to you.
That’s it. That’s the pattern.
Everything else (format, length, USPs, etc.) shifts, in my experience.
Your job as a creative strategist is to figure out how to engage someone’s attention while telling that story. The best ads do it in three beats:
Engage and qualify in the first few seconds
Explain the differentiation and the benefit
Make it compelling to buy
Notice what’s not on that list. CTAs. Problem-solution structure. Specific formats. Those are all optional. They are helpful for specific ads, but they’re not required in my opinion.
Engage AND Qualify
This is the single biggest framing shift I push on clients.
It’s tempting to chase hook rate as if it’s the goal. You are paying for impressions, so higher hook rate equals better ad, right?
No. A high hook rate from the wrong audience is worse than a lower hook rate from the right one.
I’d much rather have an ad with a 30% hook rate that qualifies the right person than an ad with a 70% hook rate that engages everyone but qualifies no one.
A 70% hook rate that pulls in college students for a $200 supplement is a waste of spend. A 30% hook rate that pulls in 45-year-old men with joint pain for the same product is a winning ad.
The job of the first three seconds isn’t to maximize attention. It’s to maximize qualified attention. The right person, not the most people.
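To make the arithmetic concrete, here is a small sketch. It assumes hook rate means the share of impressions that watch past the first few seconds, and it introduces a hypothetical `qualified_share`: the fraction of hooked viewers who could plausibly buy. Ads Manager does not report that number, and the values below are invented purely for illustration.

```python
def qualified_attention_rate(hook_rate: float, qualified_share: float) -> float:
    """Share of impressions that are both hooked and a plausible buyer."""
    return hook_rate * qualified_share


# Hypothetical inputs: a broad hook that mostly attracts the wrong viewers,
# and a narrower hook that mostly attracts the right ones.
broad = qualified_attention_rate(hook_rate=0.70, qualified_share=0.05)
narrow = qualified_attention_rate(hook_rate=0.30, qualified_share=0.40)

print(f"broad hook:  {broad:.1%} of impressions are qualified attention")
print(f"narrow hook: {narrow:.1%} of impressions are qualified attention")
```

Under these made-up inputs the lower hook rate delivers more qualified attention per impression, which is the trade the paragraph above describes.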
You can engage and qualify in lots of ways:
Problem-solution engages with a pain point only your customer feels.
Founder story engages with a personal narrative that screens for the right audience.
Testimonial engages with the kind of person your customer wants to be.
All three work. The format can shift. The point is whether the first few seconds pull in the right person AND give them a reason to keep watching.
Once you figure that out, you can go back and optimize for engagement. But it’s secondary to qualification, not primary.
Differentiate, Benefit, Then Make It Compelling
Once you’ve engaged and qualified the right person, you have a few seconds to make them care.
Differentiation: what’s actually different about your product.
The test: swap your brand name for a competitor’s. Does the ad still work? If yes, you’re describing the category, not your product.
“Clean ingredients” is describing the category. “Only 4 ingredients, no seed oils, made in a facility we own” is differentiation. Specific. Verifiable. Hard to copy.
Benefit: what the customer actually gets.
Features are what the product does. Benefits are what the customer’s life looks like once they have it.
“High in protein” is a feature. “You’ll stop feeling hungry at 3pm” is a benefit. Founders love features because they built around them. Customers don’t translate features into benefits. They just scroll.
If you can’t draw a clear line from a feature to something concrete in the customer’s day, cut the feature.
Compelling to buy: why now.
I try to avoid using discounts and fake urgency to handle this. They work short term and train your customer to never pay full price.
The durable version is stakes. What does it cost the customer to keep doing what they’re doing? If the answer is “nothing really,” you have a positioning problem, not a creative problem.
The strongest versions are usually:
Cost of inaction (every month on the wrong solution is another month wasted)
Risk reversal (try it for 60 days, full refund)
Specific moment of relevance (a deadline the customer already has in their life)
Discounts close the deal. They don’t make the case.
When you put all three together cleanly, the ad starts to feel inevitable to the right customer. That’s why this is so powerful and why most of the best ads follow this pattern.
Building Specific Learnings: Evaluating Across Ads
Individual winners = signal. An ad that cleared the scale threshold and posted strong efficiency is real. You can study it. You have stat sig.
Individual losers = noise. A single ad can fail for ten different reasons. The hook was wrong. The hold was wrong. The CTA was bad. The landing page was off. The targeting got weird. You can’t tell which one from a sample of one.
Aggregate patterns across multiple ads = real learning. If five UGC ads with similar hooks all underperformed and five with different hooks all worked, that’s signal you can act on.
The practical rule: don’t take a definitive learning from a single losing ad. Build off what’s winning. Use losers as starting points for new tests, not as diagnostic puzzles to solve.
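One way to look for aggregate patterns is to tag each scaled ad and pool the results per tag. The sketch below is an assumption-heavy illustration: the rows, the hook tags, and the `min_ads` cutoff of 3 are all hypothetical, and pooled CPA stands in for whatever efficiency metric you use.

```python
from collections import defaultdict

# Hypothetical rows: (ad name, hook tag, spend, conversions).
ads = [
    ("ugc_01", "pain_point", 2400.0, 60),
    ("ugc_02", "pain_point", 1800.0, 45),
    ("ugc_03", "pain_point", 3000.0, 75),
    ("ugc_04", "discount", 2000.0, 25),
    ("ugc_05", "discount", 2500.0, 30),
    ("ugc_06", "discount", 1500.0, 20),
    ("ugc_07", "founder", 2200.0, 50),
]


def cpa_by_group(rows, min_ads=3):
    """Pooled CPA per tag, skipping tags with too few ads to be a pattern."""
    totals = defaultdict(lambda: [0.0, 0, 0])  # spend, conversions, ad count
    for _name, tag, spend, conversions in rows:
        totals[tag][0] += spend
        totals[tag][1] += conversions
        totals[tag][2] += 1

    result = {}
    for tag, (spend, conversions, count) in totals.items():
        if count < min_ads or conversions == 0:
            continue  # a single ad (or zero conversions) is not a pattern
        result[tag] = spend / conversions
    return result


print(cpa_by_group(ads))  # {'pain_point': 40.0, 'discount': 80.0}
```

The lone "founder" ad is left out on purpose: one ad is a sample of one, whichever way it performed.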
Variations vs. Concepts
When you find a winner, you have two ways to test off it.
Variations: Take the winner and make three similar versions. Give the same idea to a new creator, change the first 5 seconds (not just the first 1), or heavily extend the length. Keep the underlying concept the same.
This is your safest, highest-hit-rate testing. Most variations work because the concept is already proven.
Just don’t make variations like it’s 2024. Small hook changes or CTA changes don’t count anymore.
New concepts off the same insight: Take the core insight from a winner and build three completely different ads around it. Different visual style, different format, different angle, but the same underlying story about why the product matters.
Often these don’t even look like the original ad. But they’re built on what you learned from it.
We find a ton of winners through both paths. The first compounds your existing concept. The second extends what you can do with the insight underneath it.
If you’re only running variations, your account will eventually hit a ceiling. If you’re only running new concepts, you’re leaving easy wins on the table.
A Few More Common Mistakes
Over-indexing on warm audience performance. Your retargeting CPA is not your creative quality signal. It’s your customer base. New customer CPA on prospecting is the real test.
Defaulting to statics because they’re cheap. Statics work. They have a place. But if 80% of your output is static because video is hard, you’re capping your account’s ceiling.
Discount-led creative as the primary lever. It works in the short term and degrades your business in the long term. The brands that win are the ones that figured out how to make creative compelling without a price-led hook.
The “feel cool” trap. Sarcastic, hyper-stylized, snazzy copy that makes the founder feel like they have a cool brand. It rarely converts. The ads that win usually look more boring than the founder wants them to look.
If you’re falling into any of these, the fix isn’t more ads. It’s better evaluation of the ones you have, and a clearer view of what compelling actually looks like.
Putting It Together
Evaluating creative well is mostly about discipline.
Scale before efficiency. Efficiency before diagnostics. Engage AND qualify, not just engage. Differentiate, benefit, make it compelling: in that order. Build off winners with both variations and new concepts. Don’t try to diagnose losers.
And remember the thing this whole framework is solving for:
You’re not trying to find an ad that works for three days on warm audiences. You’re trying to find ads that consistently resonate with cold audiences at scale.

A $200 budget is not inherently noisy. It's a lot of impressions.
The issue is purely in people's interpretation (e.g., stories about the data).