SAP Datasphere and Databricks: A Game-Changing Partnership (Part II)
In our last blog, we discussed various use cases for the partnership between SAP Datasphere and Databricks, including how data can be extracted from SAP Datasphere and utilized by Databricks. In this blog, I want to explore the reverse: how we can consume data housed in Databricks from within SAP Datasphere.
The ‘Why’?
There are several reasons why you might want to consume data from Databricks in Datasphere. Databricks, with its best-in-class lakehouse, can store a wide variety of data, and companies may want to use it to enrich the data they hold with these additional sources. By using the approach I will describe below, we can bring this value back to Datasphere without replicating the enriched data. Similarly, companies may wish to run machine learning experiments or workflows in Databricks, and the results of those workflows need to make their way back to Datasphere for analytics and reporting.
‘How’ can I get data from Databricks to SAP Datasphere?
SAP and Databricks have, through some clever innovations, made the integration process as easy as possible. To enable this connection, the SAP Data Provisioning Agent (DP Agent) must be set up and the CamelJDBC adapter configured. Once configured, we can deploy a remote table within SAP Datasphere that points to our desired dataset in Databricks.
First, to enable this connection we must configure a SQL Warehouse within Databricks. It is this warehouse that will act as the compute service whenever SAP Datasphere queries the data. In this instance, I have opted for a serverless warehouse for simplicity.
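The warehouse can be created in a couple of clicks from the SQL Warehouses page in the Databricks UI. If you prefer to script it, here is a minimal sketch using the Databricks SQL Warehouses REST API; the workspace URL, token, and warehouse settings are placeholders, and the field names reflect the API as I understand it, so do check them against the API reference for your workspace.

```python
import requests

# Placeholders -- substitute your own workspace URL and personal access token
WORKSPACE_URL = "https://dbc-xxxxxxxx-xxxx.cloud.databricks.com"
DATABRICKS_TOKEN = "<personal-access-token>"

# Create a small serverless SQL warehouse (SQL Warehouses API: POST /api/2.0/sql/warehouses)
response = requests.post(
    f"{WORKSPACE_URL}/api/2.0/sql/warehouses",
    headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
    json={
        "name": "datasphere-integration-wh",   # hypothetical warehouse name
        "cluster_size": "2X-Small",
        "min_num_clusters": 1,
        "max_num_clusters": 1,
        "auto_stop_mins": 10,
        "enable_serverless_compute": True,
        "warehouse_type": "PRO",               # serverless warehouses use the Pro type
    },
    timeout=30,
)
response.raise_for_status()
print("Created warehouse:", response.json().get("id"))
```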
Once the warehouse has been created, we can write our data to the Databricks catalog. Here I have written the results of my machine learning model, which predicts customer churn, to a table conveniently called “customer_churn”.
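Writing the model output is a one-liner from a Databricks notebook. A minimal sketch, assuming the churn predictions are already in a Spark DataFrame called predictions_df and that a Unity Catalog catalog and schema (the hypothetical main.default below) are available to write to:

```python
# Runs inside a Databricks notebook, where `spark` is already available.
# `predictions_df` is assumed to hold the churn model output
# (e.g. customer_id plus a churn_probability column).

(
    predictions_df
    .write
    .mode("overwrite")                            # replace the table on each scoring run
    .saveAsTable("main.default.customer_churn")   # catalog.schema.table -- adjust to yours
)

# Quick sanity check that the table landed
spark.sql("SELECT COUNT(*) FROM main.default.customer_churn").show()
```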
Before we leave Databricks, we need one more piece of information to configure the connection between Datasphere and Databricks. Navigate to the warehouse you created in Step 1 >> Connection details and make a note of the JDBC URL.
Now we can turn our attention to SAP Datasphere. The first step is to create the connection to Databricks: Connections >> Select your desired Space >> Create. A new pop-up will appear; enter the necessary information. The JDBC URL we collected in Step 3 can be pasted into the “JDBC URL” field. If you use a Databricks developer token (personal access token) to authenticate this connection, the username should be set to “token” and the token itself added to the password field.
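For reference, the values I ended up entering look roughly like the sketch below. The hostname, warehouse ID, and token are placeholders, and the exact shape of your JDBC URL may differ slightly depending on the driver version, so always copy the real URL from the warehouse’s Connection details tab rather than building it by hand.

```python
# Illustrative values only -- copy the real JDBC URL from "Connection details" in Databricks.
datasphere_connection = {
    # Typical Databricks JDBC URL shape (hostname and warehouse ID are placeholders)
    "jdbc_url": (
        "jdbc:databricks://dbc-xxxxxxxx-xxxx.cloud.databricks.com:443/default;"
        "transportMode=http;ssl=1;AuthMech=3;"
        "httpPath=/sql/1.0/warehouses/<warehouse-id>"
    ),
    # With a Databricks personal access (developer) token, the username is the
    # literal string "token" and the password is the token itself.
    "user": "token",
    "password": "<personal-access-token>",
}
```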
With the connection established, we can now jump over to the Data Builder area within SAP Datasphere. Select “Import Remote Table” from the import drop-down menu.
Select the Databricks connection we created in Step 4.
Select the table(s) you wish to import from the Databricks Catalog.
Finally, select “Import and Deploy”:
Once the remote table has been successfully deployed, you can view its data by selecting the “Data Preview” button. If the data is visible, you have successfully consumed data from Databricks using SAP Datasphere. Happy modelling!
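If you want to cross-check what the Data Preview shows against the source, a quick query from the Databricks side does the job. A small sketch using the databricks-sql-connector package; the hostname, HTTP path, token, and table name are the same placeholder values used above.

```python
# pip install databricks-sql-connector
from databricks import sql

# Placeholders -- use the same warehouse details that back your Datasphere connection
with sql.connect(
    server_hostname="dbc-xxxxxxxx-xxxx.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<personal-access-token>",
) as connection:
    with connection.cursor() as cursor:
        # Pull a handful of rows to compare with the Datasphere Data Preview
        cursor.execute("SELECT * FROM main.default.customer_churn LIMIT 10")
        for row in cursor.fetchall():
            print(row)
```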
SeaPark currently has a fully operational demo showing how this connection works, both from SAP Datasphere to Databricks and vice versa. This demo also highlights how machine learning can be implemented in Databricks to gain further insights into customer data (customer churn). For a sneak peek or assistance setting up your connection contact dearbhla.keenan@seaparkconsultancy.com.