Getting Data out of SAP C4C into Snowflake with Matillion using OData

The SAP Sales & Service Cloud (still widely referred to by its old name ‘SAP Cloud for Customer’, aka C4C) is SAP’s flagship application for managing customer interactions. For many enterprises, this is the primary source system for sales and service analytics. The C4C platform comes with no fewer than two integrated analytics solutions. Sooner or later, though, most sales and service managers have analytics requirements that go beyond the capability of the on-board analytics solutions. This happens when C4C data needs to be combined with data from other sources, or when C4C data needs to be made available to 3rd party tools, for example for machine learning or forecasting purposes. As C4C is a ‘Software as a Service’ (SaaS) platform (well, most of the time, anyway), the only way to get data out is by using the C4C APIs.

C4C comes with plenty of APIs, but not all of them are suitable for extracting data into a data platform. Furthermore, it can be a real challenge to get this configured correctly, particularly when building enterprise-grade data pipelines that require robust delta extraction logic. In this article, I will explore how to use the OData protocol to extract data from SAP C4C (in full or delta loads) and load it into Snowflake using Matillion.

For more about using OData to get data out of SAP, see my colleague Jan van Ansem’s blog here.

OData & C4C

An OData service is a web-based endpoint that exposes data in a standardized format, making it straightforward to consume by different applications. It uses a RESTful API and can be accessed using standard HTTP methods. SAP C4C uses the OData protocol to expose its data for consumption by external applications, ideal for integrating into an ELT (Extract, Load, Transform) pipeline.
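To illustrate what this looks like in practice, here is a minimal sketch of querying a C4C collection directly over HTTP with Python, using the service URL format covered later in this article. The collection name, subdomain and credentials are placeholders for illustration only; this is the same pattern that tools like Postman and Matillion generate for you.

import requests

# Base URL of the C4C OData service -- <myXXXXXX> is your C4C tenant subdomain
BASE_URL = "https://<myXXXXXX>.crm.ondemand.com/sap/c4c/odata/v1/c4codataapi"

# Request a small JSON sample from a collection (collection name is illustrative)
response = requests.get(
    f"{BASE_URL}/ServiceRequestCollection",
    params={"$format": "json", "$top": 5},
    auth=("MY_USER", "MY_PASSWORD"),  # basic authentication
)
response.raise_for_status()

# OData v2 responses wrap the records in d.results
for record in response.json()["d"]["results"]:
    print(record)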

First, it is important to distinguish between the two object types that exist in C4C: Data Sources and Collections.

  • Data Sources are designed to be connected to and reported on from within a BI tool, as the OData API pre-aggregates the data it returns for them. The OData API does not provide any server-side pagination for Data Sources, so these objects are not ideal for ELT data extraction. Note: client-side pagination is possible but not recommended, as outlined in the disadvantages section of the SAP help documentation (Client-Side Pagination | SAP Help Portal).
  • Collections are the underlying data objects holding the master and transactional data that are joined together to create the reporting Data Sources. Collections do support server-side pagination, which makes them suitable for ELT data extraction via the OData API (see the pagination sketch below).

As a result, I recommend using Collections rather than Data Sources for ELT data extraction. If you need to replicate the data structures of the standard Data Sources, you can do so by extracting the Collections and applying transformations in your data warehouse.
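As a rough sketch of how server-side pagination behaves on a Collection, the snippet below keeps following the next-page link returned by the service until all records have been retrieved. Assumptions: the response exposes the next page as a ‘__next’ URL (the OData V2 convention), and the collection name and credentials are placeholders; Matillion’s OData component handles this loop for you.

import requests

BASE_URL = "https://<myXXXXXX>.crm.ondemand.com/sap/c4c/odata/v1/c4codataapi"
AUTH = ("MY_USER", "MY_PASSWORD")

def fetch_collection(collection):
    """Yield every record in a collection, following the server-side pagination links."""
    url = f"{BASE_URL}/{collection}?$format=json"
    while url:
        response = requests.get(url, auth=AUTH)
        response.raise_for_status()
        payload = response.json()["d"]
        yield from payload["results"]
        # The service returns a '__next' link until the final page has been served
        url = payload.get("__next")

rows = list(fetch_collection("CorporateAccountCollection"))
print(f"Retrieved {len(rows)} records")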

Connection Testing

I always like to start in Postman to ensure that the API endpoint can be connected to and is returning data as expected. I suggest reading my previous blog here to understand how I use Postman to test and debug the OData APIs.

Matillion with OData

Matillion is an enterprise-grade, cloud-native ELT tool that connects to hundreds of source systems and APIs and loads data into your cloud data warehouse of choice. Matillion provides an OData extractor component which does most of the heavy lifting for us, such as formatting the API URL with parameters and filters, as well as handling the pagination logic.

Connecting in Matillion

In an OData component, add the base URL (i.e. up to but not beyond ‘…/c4codataapi’) to the ‘URL’ property, along with the basic authentication details (username and password). Note that we always recommend using a cloud key store integrated with Matillion to store passwords securely. The URL format for the C4C OData service is as follows, where <myXXXXXX> is your C4C subdomain:

https://<myXXXXXX>.crm.ondemand.com/sap/c4c/odata/v1/c4codataapi/

Next, we can select a Collection in the ‘Data Source’ property which will then allow you to choose from a list of fields in the ‘Data Selection’ property.

Select the fields you’d like to load and head to the ‘Sample’ tab, where you should be able to pull back some data using the ‘Data’ button. You can also get a row count of the table, which can be compared against a row count obtained in Postman, as I walk through in my previous blog here.
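If you prefer to do that reconciliation in a script rather than in Postman, below is a hedged sketch using the standard OData $count convention; the collection name and credentials are again placeholders.

import requests

BASE_URL = "https://<myXXXXXX>.crm.ondemand.com/sap/c4c/odata/v1/c4codataapi"

# /$count returns the total number of records in the collection as plain text,
# which can be compared with the row count reported by the Matillion component
response = requests.get(
    f"{BASE_URL}/CorporateAccountCollection/$count",
    auth=("MY_USER", "MY_PASSWORD"),
)
response.raise_for_status()
print(int(response.text))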

Filtering in Advanced Mode

Next, we can set the ‘Basic/Advanced’ Mode property to ‘Advanced’ which lets us hand-write a SQL query which Matillion translates into an OData API request. This allows us to select specific fields and apply filters via a where clause.

The ability to write our own where clause allows us to perform delta logic when loading incrementally from C4C using a timestamp filter. However, at Snap we’ve had an issue where the OData component could not correctly use a Snowflake timestamp to do delta loading from C4C. To overcome this issue, we needed to add two Connection Options to the OData component, described below.

The first option ensures that the component URL-encodes spaces as %20 instead of +, which is required by the OData API. The second option, DateTimeOffsetFormat, specifies the format of the timestamp that we pass into the where clause so that Matillion can interpret it and use it correctly in the API call. Even though the DateTimeOffsetFormat is yyyy-MM-dd'T'hh:mm:ss.fffffffZ, standard Snowflake timestamps can be passed in without modification, e.g. WHERE "EntityLastChangedOn" > '2022-12-01 01:16:48.666'.
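For illustration, the sketch below shows roughly the kind of OData request that results from such a where clause: the Snowflake timestamp becomes a $filter on EntityLastChangedOn, with spaces encoded as %20 rather than +. The exact literal syntax (datetimeoffset'…') follows the OData V2 convention for date-time fields and is an assumption here, as are the collection name and credentials; Matillion builds this URL for you.

import requests
from urllib.parse import quote

BASE_URL = "https://<myXXXXXX>.crm.ondemand.com/sap/c4c/odata/v1/c4codataapi"

# High-water mark from the last successful load, as stored in Snowflake
last_loaded = "2022-12-01 01:16:48.666"

# Convert to an ISO-style timestamp and build the delta filter expression
odata_ts = last_loaded.replace(" ", "T") + "Z"
filter_expr = f"EntityLastChangedOn gt datetimeoffset'{odata_ts}'"

# quote() encodes spaces as %20 (not +), mirroring the connection option above
encoded_filter = quote(filter_expr, safe="'")
url = f"{BASE_URL}/ServiceRequestCollection?$format=json&$filter={encoded_filter}"

response = requests.get(url, auth=("MY_USER", "MY_PASSWORD"))
response.raise_for_status()
changed_rows = response.json()["d"]["results"]
print(f"{len(changed_rows)} new or changed records since {last_loaded}")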

Once this is working correctly, this approach can be used to perform delta loads, loading only new or changed records and avoiding the need to reload entire tables each time. This means your ELT pipelines run faster and more cheaply. I hope this article helps you get your data out of C4C. Please feel free to reach out to me on LinkedIn here if you have any further questions.

5 main challenges getting data out of SAP and how to overcome them

One of the most common questions I get from clients is “why is getting data out of SAP so hard? Isn’t it just another source system?”. After pondering this question for a while, I thought I would list out the reasons, based on our numerous projects getting data from SAP into cloud platforms such as Snowflake. Once I’d started, I couldn’t stop. Here are my top five.

1. The data is complex

SAP systems are the nerve centres of global enterprises. Many business-critical processes are managed and controlled with SAP systems. Consequently, SAP systems contain the most treasured information large organisations have – namely their financial and operational data. As a result, these systems are complex both in the number of processes they support and in the volume of data they hold. A typical SAP system contains hundreds of thousands of tables, with many complex relationships between them.

2. SAP systems and SAP data are heavily governed

As the system holding the most crucial and sensitive data, it’s only right that there is a lot of governance and process in place to protect the data and the system itself. That means added time and complexity when trying to get the data out of SAP. Various SAP teams and stakeholders will often need to be involved to ensure the correct access is granted and that operational processes are not jeopardised by the process of ‘getting the data out’.

3. SAP at its core is old technology

SAP is 50 years old this year. When SAP started, memory was precious, to the extent that table names and field names were abbreviated to four and six characters respectively. SAP was originally developed in Germany, so the abbreviations are German abbreviations. Over time, SAP have created several metadata layers which can sometimes help to provide more descriptive names in a data model, but when you look at a system today, at its core you still find the incomprehensible abbreviations. This is why you need to be an SAP functional expert to understand how to both get to and make sense of the data that you need.

4. Lots of different options and frameworks for ‘getting the data’

SAP systems contain a wide variety of SAP-specific object types which are not immediately obvious to those who don’t come from an SAP background – extractors, ABAP reports, BADIs, iDocs, ADSOs, CompositeProviders and so on.

Lastly, to use capabilities such as the ODP framework with SAP extractors, you will need to set up various things on the SAP system itself. This will often require the help of SAP Basis teams, in particular to enable delta extraction for incremental loading – a must given the data volumes of some key sources such as GL line items. For more information on the SAP extraction options, please read this excellent blog from the SAP guru Jan van Ansem here.

5. Complex licensing model, which means options that are technically available may not be permitted under the license agreement

Often customers think that the easiest way to extract the data will be to connect to the HANA database and replicate the data they need. Whilst this is a relatively simple process, there are licensing constraints that prevent most organisations from doing so. Those that have a HANA Runtime license (which is the majority of clients) are not able to extract from the database layer and can only extract from the application layer. SAP has been known to sue for extremely large sums of money when its licensing constraints are broken by clients, as Anheuser-Busch found out to their peril.

Hopefully the above gives you an idea of why loading from SAP to cloud data platforms is not the same as loading from other source systems, and why it’s imperative to have people who are experts in both SAP systems and cloud data platforms and architectures. Luckily, at Snap we are a team of SAP data experts with a focus on modern cloud data technologies such as Snowflake and Matillion, and we have a range of accelerators to simplify the process of extracting your data from SAP systems. Please do reach out to us if you’re interested in maximising the value of your SAP data in the cloud.


Lessons Learned from a Sustainability ESG Reporting Project

What is ESG and EPR?

Organizations in the UK involved in packaging supply or importation must now adhere to the ‘Extended Producer Responsibility’ (EPR) regulation. This regulation carries significant weight as it is legally binding, and not complying could cause serious brand damage.

The processes related to managing packaging fall within the realm of ‘Environmental and Social Governance’ (ESG). Having recently completed an ESG reporting project for a global food manufacturer, I thought it useful to share some lessons learned here.

Companies are expected to provide evidence regarding the recyclable and non-recyclable components of their packaging. This requirement has been in effect since 2023, and achieving automation in this process necessitates a verified enterprise data set and a suitable platform for generating the required outputs.

Managers responsible for this task may find it daunting. Manual execution of the work is excessively time-consuming, labour-intensive, and susceptible to errors.

The data is complex and requires subject matter experts throughout the project.

The data required is complex. It entails product master data and bills of materials (BOMs) typically stored in SAP ERP systems. Additional data may be necessary from other packaging specification databases. Multiple versions of BOMs may exist, and packaging specifications reside within the system, incorporating various fields related to weight and dimensions. Some packaging items are composite in nature, consisting of both plastic and cardboard, requiring separation in calculations. Addressing all these factors requires careful consideration and understanding in collaboration with business subject matter experts (SMEs) and data owners.

Requirements will change. Be prepared to adapt.

The reporting output requirements are still unclear and evolving. 2023 marks the inaugural year for the formal collection of EPR data and reporting, but the precise details of what data should be reported and how are yet to be finalized. However, certain agreed classifications include:

  • Packaging activity: How the packaging is supplied.
  • Packaging type: Whether the packaging is household or non-household.
  • Packaging class: Whether the packaging is primary, secondary, shipment, or tertiary.
  • Packaging material and weight.

This calls for a solution that can swiftly adapt to changes, likely necessitating a platform separated from the strict internal SAP change control process.

Traditional methods of piecing together reports are inadequate. Attempting to manipulate sales volume data at the BOM component level for all products across multiple sites using MS Excel often results in unwieldy and unmanageable files.

The ideal solution for this scenario involves a cloud-based data warehouse with robust capabilities to handle substantial data volumes. It should be accompanied by an efficient ETL (Extract, Transform, Load) tool capable of seamlessly extracting and loading data from SAP and other databases. Additionally, a versatile toolkit enabling flexible manipulation of the data into reusable assets is crucial, and a data visualization tool is essential for presenting the data in diverse report formats.

Check the quality of your data.

Jumping directly to final report outputs will lead to frustration. It is crucial to comprehend the state of your data beforehand. Master data may be incomplete or inconsistent. Begin by creating master data reports that allow for comprehensive data analysis and filtering. Generate exception reports highlighting products with missing weight data, for instance. Correct any underlying data issues and then proceed to generate the required reports with specific calculations in a second phase.
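As a simple illustration of the kind of exception report meant here, the sketch below flags packaging components with missing weight data. The column names and values are made up for the example, and in practice this logic would typically run as a query or transformation in the data warehouse rather than in pandas.

import pandas as pd

# Illustrative extract of packaging master data -- column names are assumptions
packaging = pd.DataFrame({
    "material": ["P100", "P200", "P300"],
    "packaging_component": ["Bottle", "Label", "Carton"],
    "net_weight_g": [450.0, None, 32.5],
})

# Exception report: components where the weight is missing and needs correcting
exceptions = packaging[packaging["net_weight_g"].isna()]
print(exceptions)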

Choose the appropriate technology stack.

At Snap, we possess extensive experience in extracting data from SAP environments and combining SAP data with other sources in cloud data warehouses such as AWS Redshift, Snowflake, and Google BigQuery. We use best-of-breed cloud data platform tools such as Matillion to manage the data warehouse processes effectively, and modern BI platforms to provide actionable insights in the ESG context. We also have expertise in data visualization across a range of tools.

Find a partner with prior experience.

Collaborating with a team that has gone through this process before will expedite your work and minimize risks. ESG responsibilities encompass more than just EPR and often require similar data transformation projects.

I hope this article helps you with your ESG reporting. If you would like to discuss your requirements and how to create a flexible reporting and analytics platform for ESG, then please contact us at Snap Analytics.