Enterprises are relying more and more on Software as a Service (SaaS) applications to run their business processes. Sooner or later, the data held in SaaS applications needs to be integrated into an enterprise data warehouse, so it can be combined with data from other sources, enriched, and prepared for analytics and machine learning use cases. The interface to connect to SaaS applications is the API (Application Programming Interface). They act as a data source for ELT (Extract, Load, Transfer) processes into a data warehouse.
Configuring API calls correctly for ELT can be challenging, especially when performing delta loads to load only new or changed data. I always like to start in Postman to ensure that the endpoint is in fact working and returning data as expected. In this blog, I will discuss how to use Postman to connect to and test the APIs used for data extraction so that it is easier to integrate them into ELT processes. Although this blog will focus mainly on the SAP C4C OData API, the same principles will apply across many different APIs. For some background, an OData service is a web-based endpoint that exposes data in a standardised format using a RESTful API, making it straightforward to consume by different applications using standard HTTP methods. If you would like to find out more about OData and how it differs to REST APIs, you will find a helpful blog post here.
Postman – the easy way of testing APIs
Postman is a powerful development tool for building and testing APIs with an intuitive interface for building and managing API requests. Many different properties can be included in requests such as a request body, headers and different authentication types for testing all number of different APIs. Furthermore, responses can be visualised from within the tool to see what APIs return as a result of your requests. For help getting started, I suggest visiting the Postman documentation here: https://learning.postman.com/docs/getting-started/overview/
Getting started in Postman
Firstly, create a new GET Request in Postman and enter your API’s URL. This usually points to the API object or table that you are loading; I will be using the RegisteredProductCollection C4C collection (table) throughout this blog. The API will likely require some form of authentication which will need to be configured. If your API service accepts basic authentication, select this in the ‘Authentication’ tab and ente the username and password. Other authentication methods include API keys, bearer tokens, and OAuth. If unsure, please consult your specific API’s documentation for how best to authenticate to the API.
Sending this request should return a page (1000 rows is default for the C4C API) of data in the response pane. If you receive a response, then congratulations, your API is returning data! However, it is very important to check that the returned data is in the expected format.
Using $count and $filter options to speed up your process
Another useful request I often make is a count of the entire table by adding ‘/$count’ to the end of the request. This helps to identify whether my ELT pipeline is indeed loading the correct amount of rows.
Next, we’ll apply a delta filter to only bring back records where the ‘EntityLastChangedOn’ field is greater than, for example, 2023-03-26 14:00:00. To add a filter, head to the ‘Params’ tab, add the key ‘$filter’ and the value “EntityLastChangedOn gt datetimeoffset’2023-03-26T14:00:00.0Z'” (without the enclosing double-quotes) as seen below. Note, if following on from the previous count request, you’ll need to remove the ‘/$count’ from the end of the request. You will need to consult your specific API’s documentation on how to correctly apply filters in your request.
Lastly, we can combine the count and the filter requests to get a count of the records that match the filter by adding ‘/$count’ into the URL after the Collection name and before the ‘$filter…’ as seen below.
With these useful API requests, you can now test and debug whether your APIs have been correctly integrated into your ELT pipelines. I hope you found this useful, and please feel free to reach out to me on LinkedIn here if you have any further questions.
For the full C4C OData documentation, see their GitHub page here where you can learn about row limits ($top), offsets ($skip) and more advanced filtering.