Matillion releases SAP ODP and Anaplan connectors. Here’s why it matters.

Matillion has always been clear about its ambition to become the leading ‘data productivity cloud’ provider. It forms the backbone of many data warehouses for global enterprises. Matillion offers user-friendly low-code ETL capabilities in an open and flexible environment, which makes it a popular choice for small and medium enterprises, and for supporting departmental requirements in larger enterprises. To become the data integrator of choice for global data warehouse implementations, one challenge remained: the ability to integrate SAP data at scale. With the introduction of the SAP ODP connector, Matillion now ticks this box. The release of the Anaplan connector is the cherry on top.

Snap Analytics have worked very closely with Matillion on the development of the new SAP connector. I had the pleasure of working with the product development team and was involved in early testing. As part of the ‘Private Preview’ programme for the Anaplan connector, I ran a PoC for a global enterprise. I mention this to make it clear that I am not a ‘neutral observer’. But then, one rarely is. The views I express here are my own and I have not been paid for this article.

The new way of connecting Matillion to SAP: The SAP ODP Connector

SAP ODP stands for Operational Data Provisioning. It’s SAP’s way of providing data to third-party consumers, with rich metadata and context. There’s plenty of information about ODP on the internet, but I’ll just refer you to one of my previous blog posts if you want to read up on it. The Matillion ODP connector uses a native SAP RFC connection to connect to ODP. The key benefits of using the Matillion SAP ODP connector for getting data out of SAP and onto the data cloud are:

  • No configuration is required on the SAP side. You can immediately use all existing ODP data sources, including (but not limited to) extraction-enabled CDS views, HANA Calculation Views and the SAP ‘BW’ extractors (also known as S-API extractors).
  • SAP handles deltas for delta-enabled data sources.
  • The data source includes metadata and is provided in the context of a business transaction, rather than as a technical view of the data.

There are still other connectors available in Matillion to connect to SAP. Are they still relevant?

The SAP NetWeaver Query connector is still useful for quickly connecting to tables when no ODP data source is available. When getting data this way, you will get the raw table and field names, so you will have to do more work to prepare the data for consumption.

The generic database query connector lets you get data out of SAP in the same format as the NetWeaver Query connector. However, you need an enterprise license for the database underpinning your SAP system, and the price tag for this is prohibitive for most customers.

In recent years, Snap Analytics have implemented several cloud data warehouse solutions for our customers using the Matillion OData component to get data out of SAP. This also leverages the ODP framework, so it is a reasonable solution, but it requires configuration on the SAP side (NetWeaver Gateway), and OData connections seem less performant than RFC connections. As the ODP connector has access to the same data sources as the OData connection (within the ODP framework), there is no longer a use case for the OData connector.

Screenshot of a Matillion job with four connection types to SAP. Use the ones with the SAP logo for the best experience.

Can the ODP connector cope with high volumes and near-real time requirements?

There are inherent limitations in the ODP framework: SAP did not design it with high-volume/low-latency scenarios in mind. Some vendors have developed proprietary code that they deploy directly on the SAP system to replicate high volumes of SAP data in near-real time to a target environment. Products like SNP Glue, Theobald and Fivetran probably beat the ODP framework in terms of throughput. Whether you need these SAP specialist tools depends very much on your business requirements and data strategy. Bear in mind that you would still need a tool for data transformation, and in many cases adding an SAP specialist tool would increase the complexity of the overall solution.

Where does Anaplan fit in?

Matillion has always made it easy to connect to a wide variety of data sources. Unfortunately, until now there was no predefined connector for Anaplan. Many enterprises use Anaplan as the planning application for ‘SAP’ data. You could get data out of Anaplan using Matillion, but you had to write your own Python code for it, which was rather cumbersome.

Code snippet – connecting to Anaplan before Matillion had a standard connector.
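For illustration, below is a simplified sketch of the kind of script that was needed, assuming basic authentication against Anaplan’s public v2 integration API; the workspace, model and export IDs (and credentials) are hypothetical placeholders, not values from a real project.

```python
import requests

# NOTE: illustrative sketch only. All IDs and credentials below are
# hypothetical placeholders; this assumes Anaplan's v2 integration API.
AUTH_URL = "https://auth.anaplan.com/token/authenticate"
API_BASE = "https://api.anaplan.com/2/0"
WORKSPACE_ID = "your-workspace-id"
MODEL_ID = "your-model-id"
EXPORT_ID = "your-export-id"

# 1. Exchange username/password for a session token
auth_resp = requests.post(AUTH_URL, auth=("user@example.com", "password"))
auth_resp.raise_for_status()
token = auth_resp.json()["tokenInfo"]["tokenValue"]
headers = {"Authorization": f"AnaplanAuthToken {token}"}

# 2. Trigger the export as an asynchronous task
task_resp = requests.post(
    f"{API_BASE}/workspaces/{WORKSPACE_ID}/models/{MODEL_ID}"
    f"/exports/{EXPORT_ID}/tasks",
    headers={**headers, "Content-Type": "application/json"},
    json={"localeName": "en_US"},
)
task_resp.raise_for_status()
print("Export task started:", task_resp.json()["task"]["taskId"])

# 3. Once the task completes, the exported file can be downloaded chunk
#    by chunk from the files endpoint (polling and download omitted here).
```

On top of this you still had to poll the task status, stitch file chunks together and handle errors, which is exactly the boilerplate the new connector removes.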

Now, all you need is your Anaplan username and password (or OAuth token), and you can simply navigate to your view or export dataset using the dropdown options for the Anaplan Workspace, Model and so on.

Hooray – the Anaplan connector is here – simply use the drop-down menus to connect to your Export or View.

So why do these new connectors matter?

Business users need easy access to all relevant data in the business, regardless of where the data originates. Too often it takes too long to connect a new source system to a data platform, and business users create a shadow IT solution instead. Connecting standard business applications to a data platform should be plug and play. With Matillion’s standard connectors this is often the case (there were nearly 100 connectors the last time I counted). SAP and Anaplan are business-critical applications for many enterprises, and with these new connectors Matillion can now truly become the leading ‘data productivity cloud’ provider.

Useful links

Unleash the Power of SAP Data With Matillion’s SAP ODP Connector

Anaplan Connector for Matillion: Next-Level Forecasts and Planning

4 ways to connect SAP to Matillion

Credit: Title image by Freepik

Testing OData APIs for Data Acquisition in Postman

Enterprises are relying more and more on Software as a Service (SaaS) applications to run their business processes. Sooner or later, the data held in SaaS applications needs to be integrated into an enterprise data warehouse, so it can be combined with data from other sources, enriched, and prepared for analytics and machine learning use cases. The interface for connecting to SaaS applications is the API (Application Programming Interface): APIs act as the data source for ELT (Extract, Load, Transform) processes into a data warehouse.

Configuring API calls correctly for ELT can be challenging, especially when performing delta loads to load only new or changed data. I always like to start in Postman to ensure that the endpoint is in fact working and returning data as expected. In this blog, I will discuss how to use Postman to connect to and test the APIs used for data extraction, so that it is easier to integrate them into ELT processes. Although this blog focuses mainly on the SAP C4C OData API, the same principles apply across many different APIs. For some background: an OData service is a web-based endpoint that exposes data in a standardised format using a RESTful API, making it straightforward to consume from different applications using standard HTTP methods. If you would like to find out more about OData and how it differs from REST APIs, you will find a helpful blog post here.

Postman – the easy way of testing APIs

Postman is a powerful development tool for building and testing APIs, with an intuitive interface for creating and managing API requests. Many different properties can be included in requests, such as a request body, headers and different authentication types, for testing any number of different APIs. Responses can also be visualised within the tool to see what an API returns as a result of your request. For help getting started, I suggest visiting the Postman documentation here: https://learning.postman.com/docs/getting-started/overview/

Getting started in Postman

Firstly, create a new GET request in Postman and enter your API’s URL. This usually points to the API object or table that you are loading; I will be using the RegisteredProductCollection C4C collection (table) throughout this blog. The API will likely require some form of authentication, which will need to be configured. If your API service accepts basic authentication, select this in the ‘Authorization’ tab and enter the username and password. Other authentication methods include API keys, bearer tokens, and OAuth. If unsure, consult your specific API’s documentation for how best to authenticate.

Sending this request should return a page of data (1000 rows is the default for the C4C API) in the response pane. If you receive a response, congratulations: your API is returning data! However, it is important to check that the returned data is in the expected format.
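If you later want to reproduce the same request outside Postman, for example in a pipeline script, a minimal Python equivalent might look like the sketch below. The tenant hostname is a hypothetical placeholder, and basic authentication is assumed, as configured above.

```python
import requests

# Hypothetical C4C tenant URL; replace with your own system's endpoint
BASE_URL = "https://my123456.crm.ondemand.com/sap/c4c/odata/v1/c4codataapi"

resp = requests.get(
    f"{BASE_URL}/RegisteredProductCollection",
    auth=("username", "password"),           # basic authentication, as in Postman
    headers={"Accept": "application/json"},  # request JSON instead of the Atom/XML default
)
resp.raise_for_status()

# OData v2 services such as C4C wrap results in a "d" envelope;
# the first page holds up to the default 1000 rows
rows = resp.json()["d"]["results"]
print(len(rows), "rows returned")
```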

Using $count and $filter options to speed up your process

Another useful request I often make is a count of the entire table, by adding ‘/$count’ to the end of the request URL. This helps to identify whether my ELT pipeline is loading the correct number of rows.

Next, we’ll apply a delta filter to bring back only records where the ‘EntityLastChangedOn’ field is greater than, for example, 2023-03-26 14:00:00. To add a filter, head to the ‘Params’ tab, add the key ‘$filter’ and the value “EntityLastChangedOn gt datetimeoffset’2023-03-26T14:00:00.0Z'” (without the enclosing double quotes), as seen below. Note that if you are following on from the previous count request, you’ll need to remove the ‘/$count’ from the end of the request. Consult your specific API’s documentation on how to correctly apply filters in your requests.

Lastly, we can combine the count and the filter requests to get a count of the records that match the filter by adding ‘/$count’ into the URL after the Collection name and before the ‘$filter…’ as seen below.
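To tie the three techniques together, here is a sketch of the count, delta filter, and combined count-plus-filter requests in Python; the tenant URL and credentials are again hypothetical placeholders.

```python
import requests

BASE_URL = "https://my123456.crm.ondemand.com/sap/c4c/odata/v1/c4codataapi"
AUTH = ("username", "password")
DELTA = "EntityLastChangedOn gt datetimeoffset'2023-03-26T14:00:00.0Z'"

# 1. Count of the whole collection: append /$count to the collection name
total = requests.get(f"{BASE_URL}/RegisteredProductCollection/$count", auth=AUTH)
print("Total rows:", total.text)  # $count returns a plain-text number

# 2. Delta load: only records changed after the given timestamp
delta = requests.get(
    f"{BASE_URL}/RegisteredProductCollection",
    params={"$filter": DELTA},
    auth=AUTH,
    headers={"Accept": "application/json"},
)
print("Changed rows on this page:", len(delta.json()["d"]["results"]))

# 3. Combined: /$count sits after the collection name, $filter stays a parameter
changed = requests.get(
    f"{BASE_URL}/RegisteredProductCollection/$count",
    params={"$filter": DELTA},
    auth=AUTH,
)
print("Changed rows in total:", changed.text)
```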

With these useful API requests, you can now test and debug whether your APIs have been correctly integrated into your ELT pipelines. I hope you found this useful, and please feel free to reach out to me on LinkedIn here if you have any further questions.

For the full C4C OData documentation, see their GitHub page here where you can learn about row limits ($top), offsets ($skip) and more advanced filtering.
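As a closing example, $top and $skip can drive a simple pagination loop, sketched below under the same assumptions as the earlier snippets (hypothetical tenant URL, basic authentication).

```python
import requests

BASE_URL = "https://my123456.crm.ondemand.com/sap/c4c/odata/v1/c4codataapi"
AUTH = ("username", "password")
PAGE_SIZE = 1000

all_rows, skip = [], 0
while True:
    page = requests.get(
        f"{BASE_URL}/RegisteredProductCollection",
        params={"$top": PAGE_SIZE, "$skip": skip, "$format": "json"},
        auth=AUTH,
    )
    page.raise_for_status()
    rows = page.json()["d"]["results"]
    all_rows.extend(rows)
    if len(rows) < PAGE_SIZE:  # a short page means we have reached the end
        break
    skip += PAGE_SIZE

print("Fetched", len(all_rows), "rows in total")
```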

ChatGPT in the work environment

Chances are you have used, or at least heard anecdotes about, the now household name ChatGPT since its phenomenally popular release in November 2022. For those of you who might have missed it, here are some quick key facts:

  • ChatGPT is a language model created by OpenAI, which is used for natural language processing tasks.
  • It uses huge amounts of pre-training data such as books, articles, and websites, which allows it to understand natural language patterns and generate realistic text with a high degree of accuracy.
  • ChatGPT made headline news upon release, growing its user base to 1 million users within 5 days and 100 million within 2 months, as this article from the Guardian shows.
  • GPT-4 (the model behind the new version of ChatGPT) was released in March 2023 for subscribers only. It is a more advanced multimodal model which can accept image as well as text input, and was trained on a much larger amount of data.

Working in industries like data analytics, the power and potential behind AI tools like ChatGPT sparks two simultaneous but contradictory thoughts:

  • This is awesome! How can I utilise it in my workplace?
  • … So, how long before a chatbot replaces me?

This blog will signpost a selection of useful ways to enlist ChatGPT in the workplace, and some considerations around the limitations or risks surrounding ChatGPT.

Increase Productivity

The catalogue of ideas for how ChatGPT can benefit a business is essentially limitless. The overarching theme across use cases is that they allow employees to automate tasks, improve accuracy, work more efficiently, reduce expenses, and free up time to focus on and develop other areas of the business.

Feedback from employees within our organisation has shed light on the benefits of ChatGPT in the workplace. Several have found that ChatGPT significantly reduces the time they spend writing code, stating that it often produces cleaner and better-structured code than theirs. Others have noted its time-saving impact, particularly when writing documentation: you feed in the key information and ChatGPT produces an outline in half the original time. I have personally found ChatGPT most beneficial for stripping back and simplifying my written language. When communicating complicated technical jargon or processes, leveraging ChatGPT to remove unnecessary and flowery language can significantly improve communication with colleagues and customers alike.

Here are some more quick ideas of how ChatGPT can be utilised on a technical level:

  • Debugging, writing, and explaining code.
  • Adding comments to code, creating data dictionaries, and optimising queries.
  • Providing insights on datasets.
  • Writing or proof-reading documentation.

Business use cases more generally can include:

  • Writing emails.
  • Social media posts and content creation (quick creativity inspiration).
  • Producing training material.
  • Understanding customer feedback and making decisions based on it.
  • Writing policy or strategy documentation succinctly and clearly.

Risks and Limitations

Before we get too carried away with its array of powerful features, it is important to remember that ChatGPT should be treated like a co-pilot or a peer reviewer. Relying solely on ChatGPT can lead to issues being overlooked, with associated complications.

Alongside its success, a host of security, privacy, and ethical concerns have been raised, alerting users to the potential shortcomings of ChatGPT. This was particularly pertinent after a major privacy breach in March 2023, when several users’ personal chat histories were leaked. Moreover, in April 2023 Italy banned the use of ChatGPT, stating that it violated several GDPR regulations; namely, misinformation about data collection and the absence of parental consent or age-control systems.

ChatGPT has been known to produce factual inaccuracies and nonsensical, or in some circumstances even abusive and biased, responses. Don’t forget to fact-check its responses: its knowledge is limited to September 2021, and it does not have access to current events unless it is fed that information. Beyond the workplace, schools and wider communities have a responsibility to guide people through using AI, which can easily be misused without proper education.

Final Verdict

The key take-away here is to soar high as a business with ChatGPT as your co-pilot. Take advantage of its powerful features to increase productivity and save time. And why stop at ChatGPT? There is a plethora of highly useful AI tools available that offer alternative features which may be more suitable for your business needs. But be mindful of what the tool is and is not appropriate for. Stay informed and curious about developments in AI more broadly, and about how these systems handle your data. Finally, certainly don’t train it to write your blog posts.