Expanding SAP Systems with AWS - Part IV
In the previous parts of this series, we discussed:
· The Technical / Data Architecture and SAP OData Services
· Configuring a Site-to-Site VPN
· Configuring your SAP System for Extraction
This week, we walk through the steps involved in configuring Amazon AppFlow to ingest data from your on-premises SAP system.
Creating a Connection
Before we can create a flow that transports data from your SAP system to the cloud, we must create a connection. Amazon AppFlow supports several pre-built connectors, such as SAP OData, Zendesk and Amazon Redshift, to name a few. The connector we focus on in this blog is the SAP OData connector. When you choose to create a new SAP OData connection, the pop-up displayed below appears. In this pop-up, you can enter the details specific to your client system. Once you press “Connect”, AWS runs several pre-flight checks to ensure that a connection can be established between Amazon AppFlow and your on-premises SAP system.
There are two points of interest in the pop-up displayed below. Firstly, there is PrivateLink, which provides private connectivity between supported AWS services (such as AppFlow) and your on-premises networks. In other words, by using PrivateLink, we can avoid exposing our SAP data to the public internet. The second point of interest is the choice of authentication method: Amazon AppFlow supports both basic authentication (username and password) and OAuth 2.0.
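The connection entered in the pop-up can also be created programmatically. The sketch below builds the request for the `create_connector_profile` call in boto3 (the AWS SDK for Python) for an SAP OData connection using basic authentication, and switches the connection to private mode when a PrivateLink endpoint service name is supplied. It only builds a dictionary, so it runs without AWS credentials. The host URL, service path, user, client number and endpoint service name are hypothetical placeholders, and OAuth 2.0 credentials are not covered here.

```python
from typing import Optional


def build_sap_odata_profile(
    profile_name: str,
    host_url: str,
    service_path: str,
    port: int,
    client_number: str,
    username: str,
    password: str,
    privatelink_service_name: Optional[str] = None,
) -> dict:
    """Return keyword arguments for boto3's appflow create_connector_profile."""
    properties = {
        "applicationHostUrl": host_url,
        "applicationServicePath": service_path,
        "portNumber": port,
        "clientNumber": client_number,  # the three-digit SAP client, e.g. "100"
    }
    if privatelink_service_name:
        # Route traffic over AWS PrivateLink instead of the public internet
        properties["privateLinkServiceName"] = privatelink_service_name

    return {
        "connectorProfileName": profile_name,
        "connectorType": "SAPOData",
        "connectionMode": "Private" if privatelink_service_name else "Public",
        "connectorProfileConfig": {
            "connectorProfileProperties": {"SAPOData": properties},
            "connectorProfileCredentials": {
                "SAPOData": {
                    # Basic authentication: SAP username and password
                    "basicAuthCredentials": {"username": username, "password": password}
                }
            },
        },
    }


profile = build_sap_odata_profile(
    profile_name="sap-odata-demo",
    host_url="https://sap.example.internal",  # hypothetical host
    service_path="/sap/opu/odata/sap/API_BUSINESS_PARTNER",  # hypothetical service
    port=443,
    client_number="100",
    username="APPFLOW_USER",  # hypothetical technical user
    password="change-me",
    privatelink_service_name="com.amazonaws.vpce.eu-west-1.vpce-svc-0123456789abcdef0",  # hypothetical
)
```

With AWS credentials configured, the result could then be submitted with `boto3.client("appflow").create_connector_profile(**profile)`.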
Creating a Flow
This is where the magic happens. A flow is responsible for getting your data from A to B while applying the necessary filters, transformations and validations in transit. In the AppFlow console, select “Create Flow” and the window displayed below appears. Here you can give your flow a name, description and appropriate tags, and customize the encryption settings.
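The name, description, tags and encryption settings from this first window correspond to top-level parameters of boto3's `create_flow` call. The helper below collects them. The name check uses the pattern `[a-zA-Z0-9][\w!@#.-]+` with a 256-character limit, which is our reading of the AppFlow API reference and worth confirming there; the flow name and tag values are hypothetical.

```python
import re
from typing import Dict, Optional

# Flow-name pattern as we understand it from the AppFlow API reference (assumption)
FLOW_NAME_PATTERN = re.compile(r"[a-zA-Z0-9][\w!@#.-]+")


def flow_settings(
    name: str,
    description: str,
    tags: Optional[Dict[str, str]] = None,
    kms_arn: Optional[str] = None,
) -> dict:
    """Return the top-level create_flow arguments set on the first wizard page."""
    if not FLOW_NAME_PATTERN.fullmatch(name) or len(name) > 256:
        raise ValueError(f"invalid flow name: {name!r}")
    settings = {"flowName": name, "description": description}
    if tags:
        settings["tags"] = dict(tags)
    if kms_arn:
        # Customer managed key; without it AppFlow uses its own managed KMS key
        settings["kmsArn"] = kms_arn
    return settings


settings = flow_settings(
    "sap-business-partners-to-s3",
    "Extract business partners from SAP via OData",
    tags={"project": "sap-aws-blog"},  # hypothetical tag
)
```

Leaving `kms_arn` unset mirrors keeping the default encryption setting in the console.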
Next, we configure the source and destination details. In the source details section, we select the SAP OData connection created in the first part of this blog, and then choose the appropriate SAP OData object and sub-object.
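In the API, the source details become the `sourceFlowConfig` argument of `create_flow`. The sketch assumes that the object path is the OData service path followed by the entity set name, which is how we understand the console's object and sub-object selection. The values actually available for your system can be listed with boto3's `list_connector_entities` call. The service and entity set names below are hypothetical.

```python
def object_path(service_path: str, entity_set: str) -> str:
    """Join an OData service path and an entity set name (assumed path format)."""
    return service_path.rstrip("/") + "/" + entity_set.lstrip("/")


def sap_odata_source(profile_name: str, service_path: str, entity_set: str) -> dict:
    """Return a sourceFlowConfig that reads one entity set through an SAP OData connection."""
    return {
        "connectorType": "SAPOData",
        "connectorProfileName": profile_name,  # the connection created earlier
        "sourceConnectorProperties": {
            "SAPOData": {"objectPath": object_path(service_path, entity_set)}
        },
    }


source = sap_odata_source(
    "sap-odata-demo",
    "/sap/opu/odata/sap/API_BUSINESS_PARTNER/",  # hypothetical service
    "A_BusinessPartner",  # hypothetical entity set
)
```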
The destination section is slightly more involved, as it requires a location suitable to house the ingested SAP data. For the purposes of this demonstration, the data will be stored in an Amazon S3 bucket. However, other destinations are available, such as Amazon Redshift, Amazon RDS for PostgreSQL and SAP OData.
In the destination section, you can also choose whether to catalogue your data with the AWS Glue Data Catalog, which lets you discover and access your data from AWS analytics and machine learning services. This section is also where you specify your preferred file format (CSV, JSON or Parquet).
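The destination choices map onto two `create_flow` arguments: an entry in `destinationFlowConfigList` for the S3 bucket and file format, and `metadataCatalogConfig` for the optional Glue Data Catalog registration. A minimal sketch, assuming an existing bucket, IAM role and Glue database; the bucket, prefix, role ARN and database names are all hypothetical.

```python
FILE_TYPES = {"CSV", "JSON", "PARQUET"}  # formats offered for S3 destinations


def s3_destination(bucket: str, prefix: str, file_type: str = "PARQUET") -> dict:
    """Return one destinationFlowConfigList entry that writes to Amazon S3."""
    file_type = file_type.upper()
    if file_type not in FILE_TYPES:
        raise ValueError(f"unsupported file type: {file_type}")
    return {
        "connectorType": "S3",
        "destinationConnectorProperties": {
            "S3": {
                "bucketName": bucket,
                "bucketPrefix": prefix,
                "s3OutputFormatConfig": {"fileType": file_type},
            }
        },
    }


def glue_catalog(role_arn: str, database: str, table_prefix: str) -> dict:
    """Return a metadataCatalogConfig that registers the output in the Glue Data Catalog."""
    return {
        "glueDataCatalog": {
            "roleArn": role_arn,  # role AppFlow assumes to write catalog entries
            "databaseName": database,
            "tablePrefix": table_prefix,
        }
    }


destination = s3_destination("my-sap-landing-bucket", "business-partners", "parquet")
catalog = glue_catalog("arn:aws:iam::123456789012:role/AppFlowGlueRole", "sap_raw", "bp_")
```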
Now it is time to determine when your flow will run. With the SAP OData connector there are two options: on demand or on a schedule. For the purposes of this blog series, I will be using “Run on demand”. However, “Run flow on schedule” unlocks more advanced AppFlow features, such as incremental (delta) transfers.
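For reference, an on-demand flow expressed as a single `create_flow` request might look like the sketch below: `triggerType` is `OnDemand` and, mirroring “Map all fields directly”, a single `Map_all` task passes every field through. This is our assumption of a minimal request rather than a verified template. The profile, object path and bucket names are hypothetical, and boto3 is only imported when the request is actually submitted.

```python
def on_demand_flow_request(flow_name: str, profile_name: str, object_path: str,
                           bucket: str, prefix: str) -> dict:
    """Assemble create_flow keyword arguments for an on-demand SAP OData to S3 flow."""
    return {
        "flowName": flow_name,
        # "Run on demand": no schedule, the flow runs only when started explicitly
        "triggerConfig": {"triggerType": "OnDemand"},
        "sourceFlowConfig": {
            "connectorType": "SAPOData",
            "connectorProfileName": profile_name,
            "sourceConnectorProperties": {"SAPOData": {"objectPath": object_path}},
        },
        "destinationFlowConfigList": [{
            "connectorType": "S3",
            "destinationConnectorProperties": {"S3": {
                "bucketName": bucket,
                "bucketPrefix": prefix,
                "s3OutputFormatConfig": {"fileType": "PARQUET"},
            }},
        }],
        # Pass every source field through unchanged (assumed Map_all task shape)
        "tasks": [{
            "sourceFields": [],
            "connectorOperator": {"SAPOData": "NO_OP"},
            "taskType": "Map_all",
            "taskProperties": {},
        }],
    }


def submit(request: dict) -> dict:
    """Create the flow in AppFlow; requires boto3 and AWS credentials."""
    import boto3  # imported here so building the request needs no SDK
    return boto3.client("appflow").create_flow(**request)


request = on_demand_flow_request(
    "sap-business-partners-to-s3",
    "sap-odata-demo",
    "/sap/opu/odata/sap/API_BUSINESS_PARTNER/A_BusinessPartner",  # hypothetical
    "my-sap-landing-bucket",
    "business-partners",
)
```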
The next window allows you to map the fields within your data, either manually or by uploading a .csv file containing the field mappings. In this blog, I use the manual approach, which is simple because we can ask AWS to “Map all fields directly”. Once you are happy with the mappings, move on to the next window.
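If you prefer the .csv route, the mapping can also be expressed as explicit tasks. The sketch below reads a hypothetical two-column file (source field, destination field, no header row) and turns it into `Map` tasks, preceded by a `PROJECTION` filter task listing the selected source fields, which is how we understand AppFlow expects explicit mappings to be declared. The exact CSV layout the console accepts may differ, and the field names are hypothetical.

```python
import csv
import io
from typing import List


def mapping_tasks_from_csv(csv_text: str) -> List[dict]:
    """Build AppFlow tasks from 'source_field,destination_field' rows."""
    pairs = [
        (row[0].strip(), row[1].strip())
        for row in csv.reader(io.StringIO(csv_text))
        if len(row) >= 2 and row[0].strip()  # skip blank or malformed rows
    ]
    # Projection task: tells AppFlow which source fields to read (assumed requirement)
    tasks = [{
        "sourceFields": [src for src, _ in pairs],
        "connectorOperator": {"SAPOData": "PROJECTION"},
        "taskType": "Filter",
        "taskProperties": {},
    }]
    # One Map task per row: copy the source field into the named destination field
    for src, dst in pairs:
        tasks.append({
            "sourceFields": [src],
            "connectorOperator": {"SAPOData": "NO_OP"},
            "destinationField": dst,
            "taskType": "Map",
            "taskProperties": {},
        })
    return tasks


tasks = mapping_tasks_from_csv("BusinessPartner,partner_id\nCountry,country\n")
```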
Here is where you can add filters to your data flow. Amazon AppFlow transfers only the records that meet the filter criteria. For example, you may only wish to analyse customers based in Ireland; configuring that filter here avoids transferring more data to your destination than is required.
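In the flow definition, a filter such as “country equals Ireland” becomes a `Filter` task. A sketch, assuming the SAP entity exposes a field called `Country` holding ISO codes such as `IE` (both hypothetical). The operators listed are a subset of those AppFlow offers for the SAP OData connector, and the `DATA_TYPE` and `VALUE` property keys reflect our understanding of how the console records a single-value comparison.

```python
# A subset of the comparison operators AppFlow offers for SAP OData filters
SUPPORTED = {"EQUAL_TO", "NOT_EQUAL_TO", "CONTAINS", "GREATER_THAN", "LESS_THAN"}


def filter_task(field: str, operator: str, value: str, data_type: str = "string") -> dict:
    """Return a Filter task: only records matching field <operator> value are transferred."""
    if operator not in SUPPORTED:
        raise ValueError(f"unsupported operator: {operator}")
    return {
        "sourceFields": [field],
        "connectorOperator": {"SAPOData": operator},
        "taskType": "Filter",
        "taskProperties": {"DATA_TYPE": data_type, "VALUE": value},
    }


# Only transfer customers based in Ireland (hypothetical field name and value)
irish_customers = filter_task("Country", "EQUAL_TO", "IE")
```

The resulting dictionary would be appended to the `tasks` list of a `create_flow` request alongside the mapping tasks.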
Lastly, we come to the review screen. If everything checks out, select “Create Flow” and your flow will be created in Amazon AppFlow. To test it, select “Run Flow”. If everything is configured correctly, the run should succeed and the number of records transferred is displayed at the top of the screen.
This information can also be seen by navigating to the “Run History” tab:
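The “Run Flow” button and the record count can also be reproduced with boto3's `start_flow` and `describe_flow_execution_records` calls. The sketch below starts an on-demand run and polls the run history until that execution reaches a final status. The client is passed in, so it can be a `boto3.client("appflow")` or a stand-in for testing; the polling interval and timeout are arbitrary choices.

```python
import time
from typing import Optional

FINISHED = {"Successful", "Error", "Canceled"}  # executionStatus values that end a run


def find_finished_run(response: dict, execution_id: str) -> Optional[dict]:
    """Return the execution record for execution_id if it has finished, else None."""
    for run in response.get("flowExecutions", []):
        if run.get("executionId") == execution_id and run.get("executionStatus") in FINISHED:
            return run
    return None


def run_and_wait(client, flow_name: str, poll_seconds: float = 15,
                 timeout_seconds: float = 1800) -> dict:
    """Start an on-demand flow and block until its execution record reports a final status."""
    execution_id = client.start_flow(flowName=flow_name)["executionId"]
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        kwargs = {"flowName": flow_name}
        while True:  # walk every page of the run history
            response = client.describe_flow_execution_records(**kwargs)
            run = find_finished_run(response, execution_id)
            if run is not None:
                return run
            if not response.get("nextToken"):
                break
            kwargs["nextToken"] = response["nextToken"]
        time.sleep(poll_seconds)
    raise TimeoutError(f"flow {flow_name} did not finish within {timeout_seconds} s")
```

For example, `run_and_wait(boto3.client("appflow"), "sap-business-partners-to-s3")["executionResult"]["recordsProcessed"]` would give the number of records processed by that run (the flow name is hypothetical).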
Scheduling a Flow
Flows can also be scheduled, and the process outlined above does not change. Instead of selecting “Run on demand”, we select “Run flow on schedule” and provide the frequency, start and end times, along with the desired transfer mode:
· Full Transfer: Transfer all records every time a flow runs.
· Incremental Transfer: Transfer new or changed records via the use of a delta token.
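On the API side, these scheduling options become a `Scheduled` trigger configuration, where “Full Transfer” corresponds to the `Complete` data pull mode and “Incremental Transfer” to `Incremental`. A sketch under those assumptions; the schedule expression `rate(1hours)` reflects our understanding of AppFlow's syntax and should be checked against the documentation, and the dates are placeholders.

```python
from datetime import datetime, timezone


def scheduled_trigger(expression: str, start: datetime, end: datetime,
                      incremental: bool = True, tz: str = "UTC") -> dict:
    """Return a triggerConfig for a flow that runs on a schedule between start and end."""
    if end <= start:
        raise ValueError("schedule end time must be after the start time")
    return {
        "triggerType": "Scheduled",
        "triggerProperties": {
            "Scheduled": {
                "scheduleExpression": expression,  # frequency (assumed rate(...) syntax)
                # Incremental Transfer -> "Incremental"; Full Transfer -> "Complete"
                "dataPullMode": "Incremental" if incremental else "Complete",
                "scheduleStartTime": start,
                "scheduleEndTime": end,
                "timezone": tz,
            }
        },
    }


trigger = scheduled_trigger(
    "rate(1hours)",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 3, 31, tzinfo=timezone.utc),
)
```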
For the purposes of this blog, I selected “Incremental Transfer”. The effect of this choice is visible under “Run History”: in the image below, the first run transferred a total of 7854 records, but the following runs transferred zero records. This is because nothing changed in the data source between runs.
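The same run history can be pulled programmatically to confirm incremental behaviour, with the first run reporting the full extract and later runs reporting zero records while the source is unchanged. The sketch pages through `describe_flow_execution_records` using `nextToken` and reduces each run to its ID, status and `recordsProcessed`. The client is passed in, and any flow name you use with it is your own.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_run_history(client, flow_name: str) -> Iterator[dict]:
    """Yield every execution record for a flow, following nextToken pagination."""
    kwargs = {"flowName": flow_name}
    while True:
        response = client.describe_flow_execution_records(**kwargs)
        yield from response.get("flowExecutions", [])
        token = response.get("nextToken")
        if not token:
            return
        kwargs["nextToken"] = token


def records_per_run(runs: Iterable[dict]) -> List[Tuple[str, str, int]]:
    """Summarise runs as (executionId, executionStatus, recordsProcessed)."""
    return [
        (
            run.get("executionId", ""),
            run.get("executionStatus", ""),
            run.get("executionResult", {}).get("recordsProcessed", 0),
        )
        for run in runs
    ]
```

Printing `records_per_run(iter_run_history(boto3.client("appflow"), flow_name))` gives a compact text version of the “Run History” tab.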
We hope you have enjoyed the fourth blog in this series and follow along as we continue the process of helping you add value to your data housed in SAP.
If you or your colleagues have further questions or queries, please do not hesitate to contact us at firstname.lastname@example.org