Marketing Effectiveness: Measuring What Actually Works

March 2026 · 25 min read · Marketing Engineering

You are the CMO of a successful brand. You ask a simple question: what is the ROI of paid search? You receive five different answers. The performance team says cost per acquisition is £5 on a last-click basis. The attribution team revises that to £7.50. Your new MMM provider says £30, double the £15 reported by the previous MMM. And two years ago, a controlled incrementality experiment showed PPC was barely incremental at all: cost per incremental acquisition was over £50.

Five methods. Five answers. Thousands of pounds spent on effectiveness evaluation. No clarity on what should be a simple question.

This scenario, adapted from real engagements, is not an edge case. It is the norm. And it reveals something fundamental about marketing effectiveness that most organizations have not yet confronted: the problem is not technical. It is cultural.

Making effectiveness work requires a combination of capabilities and culture. The industry has overwhelmingly emphasized capabilities, debating which measurement technique is superior, chasing the chimera of a single source of truth. But fragmentation, with its messy measurement and organizational silos, has made the cultural dimension even more important than the technical one.

Why Culture Eats Technique for Breakfast

A large-scale study of over 70,000 global campaigns on Meta revealed something that should humble every measurement vendor in the industry: even accounting for firm size and industry, the best 10% of campaigns were at least five times more effective than the average. The variation was not explained by targeting data or platform algorithms. It was explained by advertiser-specific factors, including, critically, whether the organization actively learned from its results.

Advertisers who are active learners can improve ROI by 20 to 200%. Not through better models. Not through better data. Through a commitment to asking better questions and adjusting behavior based on what they find.

Effectiveness is about creating an evidence-based culture, enthusiastic about data and analytics, but designed to manage its blind spots and committed to learning.

The Causal Ladder: Not All Evidence Is Created Equal

If correlation is not causation, what is causality? Beyond marketing, causal measurement is one of the defining intellectual achievements of the last 25 years. The key notion is the counterfactual, a parallel universe identical to ours in all respects but one: we did not advertise. We will never know for certain what would have happened in that counterfactual world. But there is a ladder of techniques that progressively make bias less likely.

At the lowest rung, digital attribution observes touchpoints near conversion. Above it, Marketing Mix Modeling decomposes revenue across all drivers from historical data. Above that, controlled experiments construct a synthetic counterfactual through randomization. At the top, counterfactual simulation combines all the prior evidence to answer strategic what-if questions.

Figure 01 · The causal ladder: Not all evidence is created equal

Controlled experiments (the validator): Geo tests, holdouts and synthetic controls build a real counterfactual. The closest thing to causal truth.

Marketing Mix Modeling (the spine): Decomposes revenue across every driver from aggregated history. Answers cross-channel allocation, survives cookie loss.

Digital attribution (the tactical optimizer): Observes touchpoints near conversion. Useful within a channel, never for cross-channel allocation.

Causality is about the counterfactual: what would have happened without the spend. The three methods are not rivals, they are rungs. Each higher rung buys more confidence about cause at the cost of speed. The discipline is using each for the job it is built for.
Hierarchy of causal evidence

Three Core Techniques, Three Different Use Cases

Marketing Mix Modeling is the spine. It runs on aggregated, weekly historical data and answers the cross-channel budget allocation question. It does not need user-level tracking, which means it survives the death of cookies and other privacy changes.
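To make the mechanics less abstract, here is a minimal sketch of the idea, not a production model. It applies a geometric adstock (carry-over) transform to weekly spend and fits a single-channel regression by ordinary least squares. The data, the 0.5 decay rate, and the function names `adstock` and `fit_line` are all assumptions invented for this illustration; a real MMM would include many drivers, seasonality, and saturation effects.

```python
from statistics import mean

def adstock(spend, decay=0.5):
    """Geometric carry-over: each week retains `decay` of the prior week's effect."""
    carried, out = 0.0, []
    for s in spend:
        carried = s + decay * carried
        out.append(carried)
    return out

def fit_line(x, y):
    """Ordinary least squares for y = base + beta * x (one driver only)."""
    mx, my = mean(x), mean(y)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - beta * mx, beta

# Hypothetical weekly data: sales = 100 baseline + 2 per unit of adstocked spend.
spend = [10, 0, 20, 5, 0, 15, 30, 0]
sales = [100 + 2 * a for a in adstock(spend)]

base, beta = fit_line(adstock(spend), sales)
print(f"baseline={base:.1f}, sales per adstocked unit={beta:.2f}")
```

Because the toy sales series is built from the same adstocked spend, the fit recovers the baseline and the coefficient it was generated with. Real data is noisier, which is why the estimates need validating.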

Controlled experiments are the validator. Geo-paired tests, holdout cohorts, and synthetic controls measure the true causal lift of a channel by turning it off in one place and comparing against a matched control. They are slower and more expensive than MMM, but they produce the highest-confidence causal claim available.
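The arithmetic behind a holdout test can be sketched in a few lines. The example below uses a simple ratio-based projection from a matched control region, a simplified relative of difference-in-differences rather than a full synthetic-control method. All figures and the function name `incremental_conversions` are hypothetical.

```python
def incremental_conversions(pre_test, pre_control, post_test, post_control):
    """Estimate conversions lost by switching advertising off in a holdout region.

    The control region's post-period result, scaled by how the two regions
    compared before the test, stands in for the counterfactual.
    """
    scale = pre_test / pre_control          # relative size of the regions before the test
    counterfactual = post_control * scale   # projected holdout conversions with ads on
    return counterfactual - post_test       # shortfall attributed to the missing ads

# Hypothetical numbers: ads are switched off in the holdout region only.
lost = incremental_conversions(pre_test=1000, pre_control=2000,
                               post_test=940, post_control=2100)
spend_withheld = 5500.0  # assumed media spend saved in the holdout region, in GBP

print(f"incremental conversions: {lost:.0f}")
print(f"cost per incremental acquisition: GBP {spend_withheld / lost:.2f}")
```

With these made-up inputs, the holdout region loses 110 conversions for GBP 5,500 of withheld spend. A real test would also need a confidence interval around that estimate before it drives a budget decision.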

Attribution is the tactical optimizer. It ranks creative variants, keyword groups, and audiences within a single channel, calibrated against the higher-confidence methods above. It should never be used for cross-channel allocation.
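One simple way to read "calibrated against the higher-confidence methods" is a channel-level calibration factor: the share of attributed conversions that an experiment found to be incremental. The sketch below assumes that interpretation. The figures, the keyword groups, and the function name `calibration_factor` are invented for illustration.

```python
def calibration_factor(experiment_incremental, attributed_in_same_period):
    """Fraction of attributed conversions that the experiment found to be incremental."""
    return experiment_incremental / attributed_in_same_period

# Hypothetical: attribution credited 1,000 conversions to search during a test
# in which the holdout experiment measured only 250 incremental conversions.
factor = calibration_factor(250, 1000)

# Attributed conversions and spend per keyword group (made-up figures).
groups = {"brand": (600, 3000.0), "generic": (300, 4500.0), "competitor": (100, 2500.0)}
for name, (attributed, spend) in groups.items():
    calibrated = attributed * factor
    print(f"{name:<10} cost per calibrated conversion: GBP {spend / calibrated:.2f}")
```

A single channel-level factor rescales the costs without changing the ranking inside the channel. Separating groups that differ in incrementality, such as brand versus generic terms, would need group-level experiments.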

The MESI Framework

MESI is Model, Experiment, Simulate, Implement. A continuous loop, not a project. Start with a model (typically MMM) to map what matters based on historical data. Use the model to identify central decisions that lack evidence. Design experiments to discover new causal insights. Simulate the impact of proposed changes using combined evidence. Implement the best options, then validate with continued modeling and testing.

Figure 02 · MESI: A loop, not a project
Model → Experiment → Simulate → Implement (continuous loop)
Model, Experiment, Simulate, Implement. Effectiveness is not a report delivered once. It is a continuous loop: the model surfaces the decisions that lack evidence, experiments resolve them, simulation prices the options, and implementation produces the next round of data.
The MESI operating cycle

The Learning Agenda

A Learning Agenda is a structured program focused on filling critical knowledge gaps. Unlike a research plan, which manages analytics assets and vendor contracts, a Learning Agenda focuses on the central information that changes minds and shapes decisions. It is a commitment to experimentation, innovation, and change. It puts marketers in control of effectiveness investment rather than being led by the technical capabilities of individual vendors.

Going deeper

Start With the Decision, Not the Data

Most measurement programs begin with the data they happen to have and ask what it can tell them. The strongest ones invert this. They begin with the decisions the business is about to make, then ask what evidence would change those decisions, and only then decide what to measure. A Learning Agenda is simply that question, written down and prioritized: which uncertainties, if resolved, would change where the next dollar goes.

The discipline is ruthless prioritization. You cannot measure everything, and trying to is how teams produce dashboards no one acts on. Rank the open questions by how much the answer would move the budget and how feasible it is to learn. Work the top of that list. A single well-chosen experiment that resolves a high-stakes uncertainty is worth more than a hundred metrics that change no decision at all.
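The ranking step can be as simple as a scored list. The sketch below assumes a multiplicative score (budget at stake times feasibility). That is one plausible scoring rule, not one the framework prescribes, and the questions and figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class OpenQuestion:
    text: str
    budget_at_stake: float  # spend the answer could move (assumed, in GBP)
    feasibility: float      # 0..1, how practical it is to learn the answer

    @property
    def priority(self) -> float:
        # Assumed scoring rule: bigger stakes and easier learning rank higher.
        return self.budget_at_stake * self.feasibility

agenda = [
    OpenQuestion("Is branded search incremental?", 2_000_000, 0.9),
    OpenQuestion("Does TV drive long-term base sales?", 5_000_000, 0.3),
    OpenQuestion("Which display creative wins?", 100_000, 1.0),
]

ranked = sorted(agenda, key=lambda q: q.priority, reverse=True)
for q in ranked:
    print(f"{q.priority:>12,.0f}  {q.text}")
```

The creative question is the easiest to answer and ranks last, because resolving it moves very little budget.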

Long-Term Effects: The Half of the Story Most MMMs Miss

Long-term effects have two dimensions: duration and breadth. Half of advertising's impact typically occurs within three months; the other half between three and eighteen months. Standard MMM captures only the short-term half. Measuring the full return requires combining MMM with industry benchmarks, brand tracking, and simulation tools that explicitly model long-term value creation. For many brands, what is easily measurable is literally only half the story.
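To see what "half within three months" implies, the sketch below assumes the effect decays geometrically with a three-month half-life. The decay shape is an assumption chosen to match that rule of thumb; the text does not specify a curve, and real response curves vary by brand and category.

```python
def cumulative_share(months, half_life=3.0):
    """Share of total advertising effect realised by `months`, assuming
    geometric decay with the given half-life (an illustrative assumption)."""
    return 1.0 - 0.5 ** (months / half_life)

for m in (1, 3, 6, 12, 18):
    print(f"month {m:>2}: {cumulative_share(m):.1%} of total effect")

# Share of the total return visible to a model that only sees three months:
short_term_only = cumulative_share(3)
```

Under this assumed curve, dividing a measured three-month return by `short_term_only` gives a rough estimate of the full effect. That is a starting point for simulation, not a substitute for brand tracking or benchmarks.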

Figure 03 · Long-term effects: The half of the story most MMMs miss
Timeline in months (0, 3, 6, 9, 12, 15, 18): short-term effect versus long-term effect (often unmeasured).
Roughly half of advertising’s impact lands within three months. The other half accrues between three and eighteen, as brand memory compounds. A standard MMM, fit on a short window, captures only the first half and quietly under-credits brand building.
Illustrative response over time

The People Problem

The hardest part of effectiveness measurement is not the math. It is the organizational architecture: who owns the Learning Agenda, who has the authority to reallocate budget based on its findings, how internal effectiveness leadership coordinates with external specialist partners, and how the operating cadence converts measurement insight into decision change. Companies that get this architectural piece right consistently outperform those that buy a sophisticated MMM platform and assume the platform will fix the problem.

Going deeper

Effectiveness Compounds or It Decays

There is a reason the active learners described earlier in this piece pull away from their peers, and it is not that their individual studies are better. It is that learning compounds and reporting does not. Each experiment a learning organization runs sharpens the prior for the next. The measurement system grows more accurate, the questions get sharper, and the cost of the next answer falls. A reporting organization starts every quarter roughly where it started the last one.

This is the doctrine’s third commitment applied to measurement: build for compounding, not for events. A one-time effectiveness study is a sample, useful once and then stale. A learning loop is a distribution that keeps tightening. The gap between the two looks small in any single quarter, which is exactly why most organizations never close it, and it is decisive over a few years, which is exactly why the ones that do become impossible to catch.
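The "distribution that keeps tightening" can be illustrated with a textbook Bayesian update. The sketch below uses a normal-normal conjugate update, in which each experiment's estimate is combined with the current belief in proportion to its precision. The starting belief, the experiment readouts, and their variances are hypothetical, and nothing here is a prescribed model.

```python
def update(prior_mean, prior_var, estimate, estimate_var):
    """Normal-normal conjugate update: combine a prior belief with a new
    experimental estimate, weighting each by its precision (1 / variance)."""
    precision = 1.0 / prior_var + 1.0 / estimate_var
    mean = (prior_mean / prior_var + estimate / estimate_var) / precision
    return mean, 1.0 / precision

# Hypothetical belief about a channel's ROI, then three noisy experiment readouts.
mean, var = 2.0, 1.0
history = [var]
for estimate in (1.2, 1.5, 1.3):
    mean, var = update(mean, var, estimate, estimate_var=0.5)
    history.append(var)
    print(f"ROI belief: {mean:.2f} +/- {var ** 0.5:.2f}")
```

Each pass through the loop shrinks the variance, so the next decision starts from a sharper belief. A one-off study never gets past the first iteration.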

The takeaways
Scorecard: How effective is your effectiveness practice?

The best campaigns are not the ones with the best models, they are the ones run by the best learners. Five questions on whether your organization is built to learn.

1. How does your team treat measurement results?
2. What is your primary cross-channel measurement method?
3. How often do you run controlled experiments?
4. Do you have a written, prioritized Learning Agenda?
5. Do you measure the long-term effect of brand and upper-funnel media?

Indicative self-assessment, not a substitute for a diagnostic

Frequently Asked Questions on This Topic

What is marketing effectiveness measurement?

Marketing effectiveness measurement is the discipline of quantifying the causal impact of marketing activities on business outcomes: revenue, profit, market share, and brand equity. It goes beyond tracking clicks and impressions to establish what actually drove commercial results. The three core techniques are Marketing Mix Modeling (MMM), controlled experiments, and digital attribution, each with different strengths, limitations, and appropriate use cases.

Why do different measurement methods give different answers?

Because they measure different things. Last-click attribution measures who happened to click before converting; it says nothing about causation. MMM uses statistical regression on historical data to isolate each channel's marginal contribution. Experiments use controlled exposure to measure true incrementality. A single channel can show a £5 CPA in attribution, £15 in one MMM, £30 in another MMM, and £50 in an incrementality test. The variation is not error; it reflects fundamentally different methodological assumptions about what counts as an advertising effect.

What is incrementality and why does it matter?

Incrementality measures the additional business outcome caused by advertising, beyond what would have happened anyway. It is the gold standard of effectiveness measurement because it answers the counterfactual question: "What would have happened if we had not run this campaign?" Without incrementality measurement, organizations risk paying for conversions that would have occurred organically. The risk is particularly acute in branded search, where attribution models routinely overestimate impact by 200 to 300%.

What is the MESI framework?

MESI stands for Model, Experiment, Simulate, Implement, a structured approach to advertising effectiveness. Start with a model (typically MMM) to map what matters based on historical data. Use the model to identify central decisions that lack evidence. Design experiments to discover new causal insights. Simulate the impact of proposed changes using combined evidence. Implement the best options, then validate with continued modeling and testing. MESI creates a continuous learning loop rather than treating measurement as a one-off project.

How much should we invest in effectiveness measurement?

Most advertisers invest 1 to 5% of their media budget in measurement of some type. Research suggests this investment can improve advertising returns by 15 to 20% or more. The question is not whether to invest, but where to focus. A Learning Agenda helps prioritize: place the highest burden of proof on the riskiest decisions (strategic budget allocation), and accept lower evidence standards for tactical optimizations where the cost of being wrong is small.

What is a Learning Agenda and how does it differ from a research plan?

A Learning Agenda is a structured program focused on filling critical knowledge gaps that support the marketing plan. Unlike a research plan, which manages analytics assets, dashboards, and vendor contracts, a Learning Agenda focuses on the central information that changes minds and shapes decisions. It is a commitment to experimentation, innovation, and change. It puts marketers in control of effectiveness investment rather than being led by the technical capabilities of individual vendors.

Can Bayesian methods really improve MMM?

Yes, substantively. Bayesian MMM incorporates prior knowledge (known constraints about how media works, such as the impossibility of negative returns from TV spend) and produces probability distributions rather than point estimates. This makes models more robust with limited data, better at quantifying uncertainty, and better suited to budget optimization. However, Bayesian priors can also be misused to fix results. Sensitivity testing and experimental validation are essential regardless of methodology.

Should we build effectiveness capabilities internally or outsource them?

The answer depends on scale and ambition. For most organizations, the optimal model combines an internal effectiveness lead who owns the Learning Agenda with external specialist partners for complex modeling and experimental design. The internal lead ensures results are integrated into planning, challenges vendor assumptions, and maintains organizational learning. Pure outsourcing risks treating effectiveness as a deliverable rather than a capability. Pure insourcing requires specialist recruitment that is unrealistic for all but the largest advertisers.

How do we measure long-term advertising effects?

Long-term effects have two dimensions: duration (how long advertising works: typically half the impact occurs within three months and half between three and eighteen months) and breadth (how advertising creates value beyond direct sales, including price elasticity, distribution, and competitive defense). Standard MMM captures only short-term effects. Measuring the full return requires combining MMM with industry benchmarks, brand tracking, and simulation tools that explicitly model long-term value creation. For many brands, what is easily measurable is literally only half the story.

What role should attribution play in a modern measurement stack?

Attribution has the most limited use case of the three core techniques. Last-click attribution should rarely be used for anything beyond basic reporting. Multi-touch attribution can rank tactics within a platform or channel (such as keywords within search), but should never be used for cross-channel budget allocation. The appropriate hierarchy is: experiments for causal validation, MMM for cross-media budget allocation, and attribution only for intra-channel tactical ranking, and even then, calibrated against incrementality tests.