How to Build a Trusted Analytics Chatbot Using Generative BI

Chatbots are becoming increasingly popular with users globally. In fact, they’re so popular that ChatGPT alone receives over 2.5 billion prompts per day1. The development of generative AI (GenAI) created the concept of generative BI, referring to the application of GenAI in business intelligence use cases. Bringing business intelligence and AI together opens the possibility for users to ask natural-language questions and receive meaningful visual and textual insights in return (see Figure 1). To build a reliable system, however, well-structured data and an effective AI agent, which produces appropriate and reliable visualisations, need to be in place. In this article, we will look at how a chatbot is built to help business users find the answers they need and reduce manual work for data visualisation experts. This is an introductory article on the topic. 

Figure 1. A diagram of user interaction with an agent 

The process starts with establishing a well-constructed data mart. This step helps to reduce errors, improve performance, and increase confidence in the results. The data that the bot queries, should follow standard data-quality rules. 

If we take the following case study as an example: we are a data visualisation expert working with a record label company that has two years of data on global music sales, records, and artist databases (see Figure 2). We need to ensure the data are clean and complete. There should be no missing time periods or unexpected NULLs. Data types should be correctly assigned and consistent, and duplicate records should be removed. As an extra step, data can be aggregated to speed up querying. For instance, daily transactions can be summed into daily and monthly sales totals by artist_id. This makes it much easier for the agent to answer questions such as “What were our physical sales in the UK last quarter?” or to look ahead and “Make a prognosis for sales in the UK next quarter.” Column names should be meaningful and follow rules agreed by your team or organisation. 

Figure 2. ER Diagram of an example database of a record label company 

Next, consider building a strong ontology. In our day-to-day lives, we refer to things using different synonyms and expressions. To help computers understand us, we need an ontology. Using our example, we might ask, “What were our physical sales in the UK last quarter?”. Sales could refer to the “revenue” or the “physical_sales” column. To remedy this confusion, we can add instructions for the computer in a separate file with column definitions to help ensure consistent answers. For instance, we could say that sales are equal to revenue, and physical sales refer to music sales in physical formats such as CDs, vinyl records, and cassette tapes. In addition to a description of each table and column, the ontology can provide key metric calculations, as well as business terms and acronym definitions. A list of sample natural-language questions and their corresponding SQL queries can be created in a vector database. 

With a robust data mart and ontology, we can focus on the core of generative AI: the text-to-SQL agent. The text-to-SQL agent is powered by an LLM that translates business language into technical queries. The model does not generate SQL freely; instead, it is guided by the schema, ontology, and validation rules. A user prompt could look like this: 

“You are an agent created to interact with a SQL database. Understand each question in its business context and use the company’s approved data definitions and schema. Rely on the semantic layer to interpret metrics and filters. Ask for clarification when a request is ambiguous. Generate accurate and efficient SQL using authorized tables and best practices, and do not access restricted data. Review all queries and results for correctness and reasonableness and flag unusual outputs or uncertainty before output. Select clear and appropriate visualisations based on established data visualisation principles and provide concise explanations that highlight.” 

If we ask the question, “What were our sales in the UK last quarter?”, the agent will look up the keywords: 

  • Sales 
  • UK 
  • Quarter 

It will then create a SQL query: 

SELECT SUM(p.revenue) 

FROM fact_music_performance p 

LEFT JOIN dim_country c 

ON p.country_id = c.country_id 

LEFT JOIN dim_date d 

ON p.date_id = d.date_id 

WHERE c.country_name = “United Kingdom” 

AND d.year = “2025” 

AND d.quarter = “Q4” 

If previous users have already asked the question, or it is stored as an example question, the agent will retrieve the answer directly, ensuring a reliable answer every time. When users ask questions that involve predictions, the agent follows a different path. It identifies what type of prediction needs to be made and links to a pre-trained model or statistical forecasts. The agent then applies the same validation checks before producing the output. 

Before retrieving results, the agent validates the SQL code, checking syntax, table access, and sensitive data. The agent also applies rules (e.g., revenue can’t be negative, IDs can’t have spaces, dates must be current or past) and, if needed, rewrites and reruns queries. This validation ensures accuracy and security of responses. The agent further checks for unusual answers, like abnormally low quarterly sales, and can alert users.  

Before delivering an answer, the text-to-SQL agent chooses how to present the result, using rules and learned patterns. Time-related terms prompt a line chart; category-based queries like “Top 5 most streamed artists” suggest a bar chart. The agent also considers axis types, data ranges, and previous user preferences from chat history when selecting visuals. 

Creating an AI agent might seem daunting; however, with the right steps, it can be built quickly and produce efficient, correct results, saving companies time and effort. For business users, such a tool provides faster access to insights. For data visualisation analysts, less time is spent on manual tasks, and more time is spent on the business problem, as well as on developing their expertise in AI. 

References:
1. Roth, E. (2025). OpenAI says ChatGPT users send over 2.5 billion prompts every day. [online] The Verge. Available at: https://www.theverge.com/news/710867/openai-chatgpt-daily-prompts-2-billion.

    Making Sense of SAP: 16 Questions on Data, BDC, and Modern Analytics

    We’ve been hearing the same thing more and more in conversations with customers and organisations navigating SAP: “SAP data is messy”, “What’s going on with SAP Business Data Cloud?”, “It feels complicated”.

    At the same time, SAP has been introducing major changes to how SAP data is accessed, governed, and analysed, particularly with SAP Business Data Cloud (BDC), Datasphere, and tighter integration with platforms like Snowflake and Databricks. For many organisations, it’s not always clear what problem SAP is actually trying to solve, or how these pieces fit together.

    To unpack this, I sat down with Jorel Digman, our Head of New Business, to answer the 16 questions we’re most often asked about SAP.

    1. What do most people misunderstand about SAP data? 

    Most people assume SAP data is messy. In reality, it is highly structured. Its complexity simply reflects the complex business processes it supports. 

    2. Is that a technology problem or more of a legacy problem? 

    It’s primarily a problem of limited understanding. SAP contains tens of thousands of tables because it models diverse business processes. Within their own domain, people usually understand the data well, but cross‑domain understanding is often limited. 

    3. When clients say their SAP data is a mess, what are they actually dealing with? 

    They are usually referring to how SAP data appears in analytics or management reports. By the time data reaches those reports, it has been extracted, transformed, and often mixed with other sources, sometimes even manually manipulated in Excel. The mess is typically in the extraction and modelling, not the SAP source data itself. 

    4. How would you explain SAP Business Data Cloud to a finance or operations leader? 

    SAP Business Data Cloud provides SAP‑authored reporting models built on deep understanding of SAP processes. These models get organisations most of the way to meaningful reporting out of the box. BDC also introduces AI‑enabled analytics, allowing faster insights with prebuilt models and AI features. 

    5. What problem is SAP trying to solve with BDC? 

    Traditionally, analytics systems were separate from SAP and required overnight data extractions. BDC makes SAP data instantly available for analytics, dramatically reducing time and effort to move data and enabling near real‑time insights. 

    6. What doesn’t BDC try to do? 

    BDC is not designed to be a repository for all enterprise data. It focuses on SAP sources, while enabling easy integration with external big‑data platforms such as Databricks and Snowflake. 

    7. BDC isn’t built for a single destination, right? 

    Correct. BDC integrates seamlessly with popular cloud data platforms like Snowflake and Databricks, giving customers flexibility to use BDC alongside their preferred data cloud environments. 

    8. How should customers think about Snowflake vs. Databricks in relation to BDC? 

    The choice typically comes down to cost and in‑house skill sets. Teams strong in Python and code‑first engineering may prefer Databricks; teams focused on SQL‑based warehousing may prefer Snowflake. From a BDC standpoint, both are supported equally well. 

    9. Where does the SAP Datasphere fit into all of this? 

    Datasphere focuses on preparing SAP data for analytics, machine learning, and AI. Workloads may run in Snowflake or Databricks, but Datasphere governs and models the SAP data and makes it available to those workloads. 

    10. Does Datasphere matter more once AI or advanced analytics enter the picture? 

    Yes. AI requires high‑quality, trusted data. Because SAP data represents critical business processes, it is essential for AI initiatives. Datasphere provides well-modelled, accessible SAP data to support AI workloads effectively. 

    11. What does zero‑copy data change in practice? 

    Zero‑copy data allows workloads to access data without moving or duplicating it. It enables rapid provisioning of non‑production environments for development or testing, often in minutes, without the overhead of traditional data cloning. 

    12. If an organisation is still on SAP ECC, do they need to wait to use BDC? 

    No. BDC works with ECC as well as S/4HANA. While S/4HANA provides additional features, ECC customers can already benefit from BDC’s predefined models and data‑management capabilities. 

    13. If an organisation has shifted to S/4HANA, what changes for their data architecture? 

    S/4HANA combined with BDC creates a fully integrated, real‑time data platform. This enables near real‑time analytics, embedded AI capabilities, and tighter coupling between operational systems and analytics. 

    14. What’s the biggest myth to clear up? 

    The myth that SAP data is difficult. With the right expertise and accelerators, both those built into BDC and those provided by partners, working with SAP data can be efficient and highly effective. 

    15. What is one thing companies walk away with from Snap Analytics’ SAP BDC Readiness Workshop? 

    A clear roadmap showing where BDC will deliver business value. The workshop highlights opportunities to move beyond traditional BI reporting and begin leveraging AI‑enabled analytics supported by BDC’s accelerators and governance features. 

    16. Is the retirement of BW 7.5 a good opportunity to evaluate future direction? 

    Yes. BDC’s bridge functionality allows organisations to maintain their existing BW backend while modernising the front end through BDC. This enables a controlled, phased migration from BW to BDC without a disruptive big‑bang transition.