What Is a Semantic Layer? And Why AI Analytics Is Unreliable Without One

A semantic layer is the part of a data system that holds what your numbers mean. It stores the official definition of each metric (what counts as an "active user"), how your tables relate (which user row joins to which order), and which source is the source of truth when two tables disagree. It sits between raw database tables and the people or tools asking questions, so everyone counts the same thing the same way.
Key takeaways
- A semantic layer is the official, shared definition of your metrics, table relationships, and source of truth. It turns raw tables into business concepts a person or an AI can use without guessing.
- Without a semantic layer, an AI analytics tool guesses what your terms mean. It will write a query that runs cleanly, returns a confident number, and is measuring the wrong thing.
- A good semantic layer encodes three things: what each metric means, how the tables join, and which table wins when sources conflict.
- This is why Sundial runs its agents on a context layer instead of letting a model improvise. The agent looks up the real definition before it answers, then shows its work and a confidence signal.
What is a semantic layer?
The simplest way to picture a semantic layer is a shared dictionary that sits on top of your raw database. Raw tables are just columns and rows: a users table, an events table, a subscriptions table. None of them say what "active user" means or which date column to trust. A semantic layer adds that meaning. It says "active user means someone who took a core action in the last 28 days," it says "join the order to the user on user_id," and it says "when finance and product both have a revenue table, finance is truth."
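To make the "shared dictionary" picture concrete, here is a minimal Python sketch of the three statements above written down as data. The table names (`events`, `finance_revenue`, `orders`, `users`) and the `Metric` structure are assumptions made for illustration, not the format of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    definition: str    # plain-language meaning of the metric
    source_table: str  # table treated as truth for this metric (hypothetical names)


# Hypothetical entries mirroring the examples in the text.
METRICS = {
    "active_user": Metric(
        name="active_user",
        definition="took a core action in the last 28 days",
        source_table="events",
    ),
    "revenue": Metric(
        name="revenue",
        definition="revenue as recorded by finance",
        source_table="finance_revenue",
    ),
}

# (left table, right table) -> column used to join them
JOINS = {("orders", "users"): "user_id"}


def describe(metric_name: str) -> str:
    """Return the written-down meaning of a metric instead of leaving it implicit."""
    m = METRICS[metric_name]
    return f"{m.name}: {m.definition} (source of truth: {m.source_table})"


print(describe("active_user"))
print("orders join users on:", JOINS[("orders", "users")])
```

The point of the sketch is only that the meaning lives in one place that every dashboard, analyst, or agent can read.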
A semantic layer is not a dashboard and not a chart. It is the governed business logic that dashboards, AI agents, and analysts all read so they interpret the same data the same way.
Here is what happens without one. Our founders saw it at Meta: before metrics were standardized, teams had three different versions of "monthly active users" and five definitions of "session." Every team built its own dashboards and its own definitions, so the same question got different answers depending on whose chart you opened. Meetings turned into arguments about which number was right instead of what to do. The fix was a canonical set of a few hundred metrics, each with one definition, linked through the equations that govern the business. After that, everyone spoke the same metric language.
Semantic layer vs. data warehouse vs. BI tool
These three sit in a stack, and the semantic layer is the middle that most teams skip. A data warehouse (Snowflake, BigQuery, Redshift) stores the raw tables. A BI tool (Looker, Tableau, Power BI) draws the charts. The semantic layer is the meaning in between: the layer that says what a metric is before any chart gets drawn from it.
You may already know this layer by a product name. dbt's Semantic Layer, Looker's LookML, Cube, and MetricFlow are all ways to define metrics once and reuse them everywhere. They differ in detail, but the job is the same: stop every tool and team from redefining "revenue" on its own. The warehouse holds the data, the BI tool shows it, the semantic layer decides what it means.
Why does AI analytics need a semantic layer?
An AI analytics tool with no semantic layer does not know what your words mean, so it fills the gap with the most common pattern it saw in training. Ask it for "active users this month" and it has to guess: active how? Logged in, or took an action? This month meaning the calendar month, or the trailing 30 days? Which of your three user tables? The model picks the most statistically typical answer and presents it as the obvious one, without telling you it guessed.
The danger is not that the query fails. The danger is that it runs. A model can write SQL that executes cleanly, returns a precise-looking number, and is counting the wrong thing entirely. A wrong answer delivered with confidence is more dangerous than a slow one, because someone makes a decision on it before anyone catches the mistake.
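The sketch below reproduces the "active users this month" ambiguity on a tiny in-memory SQLite table. The schema, the rows, the fixed "today" date, and the choice of `purchase` as the core action are all made up for illustration. Each query is valid SQL and each returns a clean number, yet the three readings disagree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id INTEGER, event_type TEXT, event_date TEXT);
    INSERT INTO events VALUES
        (1, 'login',    '2024-06-03'),
        (2, 'login',    '2024-06-10'),
        (2, 'purchase', '2024-06-11'),
        (3, 'purchase', '2024-05-28'),
        (4, 'login',    '2024-06-12'),
        (5, 'login',    '2024-05-25');
""")

AS_OF = "2024-06-15"  # hypothetical "today"


def count_users(where_clause: str, params: tuple) -> int:
    """Count distinct users matching one interpretation of 'active'."""
    sql = f"SELECT COUNT(DISTINCT user_id) FROM events WHERE {where_clause}"
    return conn.execute(sql, params).fetchone()[0]


# Reading A: any event in the calendar month so far.
calendar_month_any_event = count_users(
    "event_date >= '2024-06-01' AND event_date <= ?", (AS_OF,)
)

# Reading B: a core action (assumed here to be a purchase) in the trailing 30 days.
trailing_30_core_action = count_users(
    "event_type = 'purchase' "
    "AND event_date > date(?, '-30 days') AND event_date <= ?",
    (AS_OF, AS_OF),
)

# Reading C: a login in the trailing 30 days.
trailing_30_login = count_users(
    "event_type = 'login' "
    "AND event_date > date(?, '-30 days') AND event_date <= ?",
    (AS_OF, AS_OF),
)

print(calendar_month_any_event, trailing_30_core_action, trailing_30_login)
# prints: 3 2 4
```

Nothing in the output says which reading was used. Only a written definition can settle that.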
An AI analytics tool with no governed definitions gives you a number with no way to tell if it is right.
This is the same failure mode as feeding an AI a single metric with no context. Tell a model "DAU/MAU is 80%" and nothing else and it does not know if the product has a hundred users or a hundred million, whether it is a game or a banking app. So it defaults to the typical case and builds its analysis on an assumption it never states. A semantic layer is how you stop the guessing at the source: the meaning is written down, so the agent looks it up instead of inventing it.
What does a good semantic layer include?
A semantic layer earns its keep by encoding three specific things, not by being clever. Each one closes a gap where an AI tool would otherwise guess wrong.
- Metric definitions. One definition, one owner, one source of truth per metric. "Revenue" is net of refunds and excludes test accounts. "New user" means registered and then took a first action. When the definition lives in the layer, the answer does not change based on who asked.
- Table relationships. How the entities connect: which key joins a user to an order, how a subscription rolls up to an account, which events feed which outcome. This is the difference between a query that double-counts and one that does not.
- Source of truth. When two tables hold the same fact and disagree, the layer says which one wins. Finance owns revenue, product owns engagement. No more two numbers in one meeting.
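The double-counting risk in the table relationships bullet can be shown in a few lines. This is a sketch under assumed names: the `orders` and `sessions` tables, their rows, and the `RELATIONSHIPS` registry are hypothetical. Joining orders to a table with several rows per user repeats each order, so the sum inflates even though the query runs without error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL);
    CREATE TABLE sessions (session_id INTEGER, user_id INTEGER);
    INSERT INTO orders VALUES (1, 10, 100.0), (2, 11, 50.0);
    INSERT INTO sessions VALUES (1, 10), (2, 10), (3, 10), (4, 11);
""")

# Naive query: each order row is repeated once per session of the same user.
naive_revenue = conn.execute(
    "SELECT SUM(o.amount) FROM orders o JOIN sessions s ON s.user_id = o.user_id"
).fetchone()[0]

# Query at the order grain: each order is counted exactly once.
order_grain_revenue = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Hypothetical registry: the layer records the cardinality of each relationship.
RELATIONSHIPS = {
    ("orders", "users"): "many_to_one",      # each order has exactly one user
    ("orders", "sessions"): "many_to_many",  # joining fans out order rows
}


def safe_to_sum(left: str, right: str) -> bool:
    """True only if joining `right` onto `left` cannot duplicate rows of `left`."""
    return RELATIONSHIPS.get((left, right)) in {"many_to_one", "one_to_one"}


print(naive_revenue, order_grain_revenue)  # prints: 350.0 150.0
print(safe_to_sum("orders", "users"), safe_to_sum("orders", "sessions"))  # prints: True False
```

A layer that records cardinality lets a tool check a join before it aggregates, rather than discovering the inflated total afterwards.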
The reason to keep this set small and stable is the same reason it works. In our experience, most businesses run on fewer than 100 metrics that explain about 90% of what is happening. The rest live in the long tail, useful for one-off investigations. A tight, well-defined core is something a person trusts and an AI can reason over. A swamp of two thousand undocumented metrics is neither.
How Sundial uses the semantic layer instead of guessing
Sundial puts an AI agent between you and your data, and that agent runs on a context layer rather than improvising. The context layer is the semantic layer plus the rest of the business knowledge the agent needs: not just metric definitions, joins, and source of truth, but also the relationships between business concepts. You ask a question in plain language. Before the agent answers, it looks up the real definition of each metric, how the tables relate, and which source is truth. Then it plans the investigation, runs the queries itself, checks its own work, and hands back an answer with the reasoning attached.
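The "look it up before answering" pattern can be sketched generically. This is not Sundial's implementation: `CONTEXT`, `UndefinedMetricError`, `Answer`, and the `run_query` callable are hypothetical names used only to illustrate the idea of refusing to guess and attaching the reasoning to the result.

```python
from dataclasses import dataclass, field
from typing import Callable


class UndefinedMetricError(LookupError):
    """Raised instead of guessing when a term has no governed definition."""


@dataclass
class Answer:
    value: float
    steps: list = field(default_factory=list)  # the reasoning handed back with the value


# Hypothetical context entries; the revenue definition mirrors the example in the text.
CONTEXT = {
    "revenue": {
        "definition": "net of refunds, excluding test accounts",
        "source": "finance_revenue",
    },
}


def answer(metric: str, run_query: Callable[[str], float]) -> Answer:
    """Look up the governed definition first; only then query the source of truth."""
    entry = CONTEXT.get(metric)
    if entry is None:
        raise UndefinedMetricError(f"no governed definition for {metric!r}")
    steps = [
        f"looked up {metric!r}: {entry['definition']}",
        f"queried source of truth: {entry['source']}",
    ]
    return Answer(value=run_query(entry["source"]), steps=steps)


# Usage with a stand-in query function that returns a fixed number.
result = answer("revenue", lambda table: 150.0)
print(result.value)
for step in result.steps:
    print("-", step)
```

The design choice worth noting is the error path: an undefined term stops the answer instead of producing a plausible number.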
What the agent can do depends on who is using it. For business users, the data consumers, it is read-only by default: they ask questions and get answers and cannot change the data. For data practitioners, the same agent does more. It is the one tool that can both help model the tables and run data-quality checks, so the practitioners build and maintain the context layer that the consumers rely on.
The work splits across four agents, each doing a job a human analyst does without thinking. The Quality agent checks whether the underlying data is right, fresh, and complete. The Modeling agent holds what the metrics and entities mean: the semantic layer itself. The Analysis agent runs the chain of queries and reasoning that gets to "why," following an analytics playbook, which is an encoded method for answering that kind of question. The Storytelling agent turns the result into something a decision-maker can act on.
This does not remove the human, but it does change the job, and a team likely needs fewer analysts than before. The agent can handle much of what analysts used to spend their days on: the repetitive pulls and the first-pass investigation. So the role shifts from reactively answering query requests to architecting the context: defining the metrics, the relationships, and the source of truth so that everyone's questions get high-quality answers. Humans stay in the loop on judgment and the highest-stakes calls.
Because the answer is grounded in your definitions and not the model's best guess, you can check it. Sundial shows its work, the steps and the queries, so a data team can audit how it got there. It gives a confidence signal, so a decision-maker knows a solid answer from a rough estimate. And it leaves an audit trail. Gamma's co-founder put it plainly: "We trust what's on Sundial more than we trust wading through our own tables."
When do you need a semantic layer?
You need one the moment more than one person or tool can ask the same question and get a different answer. The concrete triggers:
- Two teams define the same metric differently, so the CEO gets two numbers for "active customers" in one meeting.
- You are rolling out AI or text-to-SQL on top of the warehouse, and you cannot have it guess what your terms mean.
- You have duplicate dashboards that disagree, and nobody is sure which one to trust.
- You are opening self-serve analytics to non-analysts who do not know which table or definition is correct.
You can skip it when the stakes are low: a tiny team, one source table, no self-serve, and only exploratory questions. If one person writes every query and remembers every definition, the dictionary is in their head and that is fine for now. The cost shows up when that person leaves, or when a second team starts counting on its own.
The semantic layer is what moves you from "which number is right" to "why did it change"
A semantic layer changes the question your team asks. When everyone counts the same way, you stop relitigating definitions and start investigating causes. Meta's teams stopped asking "which number is right" and started asking "why did this number change, and what should we do." OpenAI ran the same investigation Sundial automates, tracing a conversion drop to a country, platform, or segment, in seconds instead of the one to two days of analyst effort it used to take. The change was not a faster chart. It was that the definitions underneath were shared and trusted, so the work moved straight to the answer.
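Once the definition of "conversion" is shared, tracing a drop to a segment is mostly arithmetic. The sketch below compares conversion rates by country across two periods and reports the segment with the largest decline. The numbers are invented for illustration, and the method is a generic one, not a description of how any company named above runs the analysis.

```python
# (period, country) -> (visitors, conversions); made-up illustrative numbers
DATA = {
    ("before", "US"): (1000, 100),
    ("before", "DE"): (500, 50),
    ("before", "IN"): (800, 80),
    ("after", "US"): (1000, 98),
    ("after", "DE"): (500, 20),
    ("after", "IN"): (800, 79),
}


def rate(period: str, country: str) -> float:
    """Conversion rate for one segment in one period."""
    visitors, conversions = DATA[(period, country)]
    return conversions / visitors


def biggest_drop(countries: list) -> tuple:
    """Return the country whose conversion rate fell the most, and the change."""
    changes = {c: rate("after", c) - rate("before", c) for c in countries}
    worst = min(changes, key=changes.get)
    return worst, changes[worst]


country, change = biggest_drop(["US", "DE", "IN"])
print(country, round(change, 3))  # prints: DE -0.06
```

The same loop can be repeated over platform or any other segment, which only works if every segment counts a conversion the same way.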
Common questions
Is a semantic layer the same as a data warehouse? No. The warehouse stores the raw tables. The semantic layer stores what those tables mean: the metric definitions, the joins, and which source is truth. They sit in the same stack, one on top of the other.
Does a semantic layer replace BI dashboards? No. Dashboards still draw the charts, and the semantic layer is the shared definition the dashboards read, so two charts of "revenue" agree instead of fighting. A dashboard is still the right, easy place to track a fixed KPI that everyone watches. What is changing is that more of the work moves to asking your data directly, in Slack or Teams, the way you would ask a human analyst, instead of hunting for the right chart. And this is not a choice between a dashboard and an AI analyst: Sundial has dashboard capabilities too, so you get the fixed dashboard views and the ask-anything investigation in one place.
Do AI models still need a semantic layer? Yes, more than ever. A model with no governed definitions guesses what your terms mean and presents the guess as fact. The semantic layer gives it the real definition to look up, so the answer is grounded and checkable.
Who owns the semantic layer? Whoever owns each metric. The principle is one definition, one owner, one source of truth per metric, so when a definition changes there is a person accountable for it.
How many metrics should be in it? In our experience, fewer than 100 for most companies, which tends to cover about 90% of the questions a business asks. Keep the core small and stable; let the long tail live in scratch analyses.
If you want AI analytics you can actually trust, this is the foundation: the meaning of your data, written down once, so the agent looks it up instead of guessing. That is what we build at Sundial.