Quick Start Tutorial for a Standalone Data Lake



PUBLIC
SAP HANA Cloud, Data Lake
Document Version: 1.0.0 – 2022-06-21

© 2022 SAP SE or an SAP affiliate company. All rights reserved.

Content

1 Quick Start Tutorial for a Standalone Data Lake
2 Prerequisite: Set Up the Tutorial Components
2.1 Create a Data Lake Instance
2.2 Add a Client Trust and Authorization to Data Lake Files
2.3 Download Sample TPCH Data
2.4 Review the Data Structure
2.5 Add Data to Data Lake Files Using the SAP HANA Database Explorer
2.6 Create a New User
2.7 Connect as the New User
3 Tutorial: Use SAP HANA Cloud, Data Lake
3.1 Create Tables in Data Lake Relational Engine
3.2 Load Data from Data Lake Files
3.3 Query Data Lake Relational Engine
4 Tutorial: Use SQL on Files for Data Lake Relational Engine
4.1 Create a Remote Server
4.2 Create a Schema
4.3 Create SQL on Files Tables
4.4 Add Datasources to the SQL on Files Tables
4.5 Create Virtual Tables in Data Lake Relational Engine
4.6 Query the Virtual Table in Data Lake Relational Engine


1 Quick Start Tutorial for a Standalone Data Lake
Experience using data lake Relational Engine, data lake Files, and SQL on Files for SAP HANA Cloud, data lake. SAP HANA Cloud, data lake is composed of two key components: the data lake Relational Engine component, and the default data lake Files component.

Prerequisites
A global account and subaccount exist following the data lake specifications outlined in the SAP Business Technology Platform Cockpit onboarding documentation. These specifications are required to use data lake in a Cloud Foundry space.
Tutorial Overview
1. Prerequisite: Set Up the Tutorial Components
   Create a data lake instance with the data lake Relational Engine component enabled, set up data lake Files for your data lake instance, and load data into data lake Files.
2. Tutorial: Use SAP HANA Cloud, Data Lake
   Load data into data lake Relational Engine tables from data lake Files or a hyperscaler, and query the data.
3. Tutorial: Use SQL on Files for Data Lake Relational Engine
   Set up SQL on Files to query data from data lake Files.


2 Prerequisite: Set Up the Tutorial Components
Set up the components required for the tutorials: create a data lake instance with data lake Relational Engine enabled, set up data lake Files for your data lake instance, load data into data lake Files, and create a new user.
1. Create a Data Lake Instance
   Use SAP HANA Cloud Central to create a data lake instance with the data lake Relational Engine component enabled.
2. Add a Client Trust and Authorization to Data Lake Files
   Create a certificate to identify you as a trusted user for the data lake Files component of your data lake instance, and an authorization to allow you to manipulate the TPCH data that will be stored in data lake Files.
3. Download Sample TPCH Data
   Download the sample tutorial data from GitHub.
4. Review the Data Structure
   Review the sample tutorial data to familiarize yourself with the columns and data types in each file.
5. Add Data to Data Lake Files Using the SAP HANA Database Explorer
   Add a data lake Files instance to the SAP HANA Database Explorer and upload TPCH files to the instance.
6. Create a New User
   Create a new data lake user and grant this user the permissions needed to create SQL on Files tables and run queries.
7. Connect as the New User
   Connect as the new data lake user tutorial_user in SAP HANA Database Explorer.

2.1 Create a Data Lake Instance
Use SAP HANA Cloud Central to create a data lake instance with the data lake Relational Engine component enabled.
Context
In this task, you'll learn how to create a data lake instance from the SAP Business Technology Platform Cockpit.


Procedure
1. In SAP Business Technology Platform Cockpit, navigate to your Cloud Foundry space and click Create.
2. Choose SAP HANA Cloud, Data Lake and then click Next Step.

3. Specify the details for your data lake instance:
   a. For Location, select the Organization and Space where you want to create your tutorial instance.
   b. The Paid Tier license is automatically selected.
   c. For Basics, give the instance an Instance Name and optional Description. Enter DL_RE_TUTORIAL as the Instance Name and then click Next Step.
      The instance name must start and end with alphanumeric characters and can include "-", "_", and "." characters.
4. For Allowed connections, select Allow all IP addresses and then click Next Step.
5. Specify data lake Relational Engine settings for your data lake instance:
   a. The Data Lake Relational Engine component for your data lake instance is automatically enabled.
   b. For Credentials, enter a password for the HDLADMIN user. Confirm the password by re-entering it. Make note of this password, since you'll need it later in this tutorial.
      The HDLADMIN user is the default user created when a new data lake Relational Engine database is created. This administrator account is automatically created with the instance.
      Note
      The password must have at least 8 characters and comprise at least one uppercase letter, two lowercase letters, and at least one number. The password must not include the user name, the characters ' " ` \ ; [ ], or control characters, such as newline, backspace, tab.
   c. For Size, accept the default values for Coordinator, Workers, and Storage.
   d. For Storage Service, accept the default Storage Service Type and then click Next Step.
6. Specify advanced settings for your data lake instance:
   a. For Initialization Mode, select Configure to be most compatible with SAP IQ.


   b. For General Options, accept the default values for Collation, Case Sensitive, and Blank Padding.
   c. For NChar Options, accept the default values for NChar Collation and NChar Case Sensitivity.
   d. The Backup Database Automatically option for your data lake instance is automatically enabled.
7. Click Review and Create.
8. Review the new instance settings, and then click Create Instance.
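The naming and password rules quoted in the procedure above can be checked locally before you type values into the wizard. The following Python sketch is illustrative only. The function names are hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, the user-name check is assumed to be case-insensitive, and the space character is assumed to be allowed. The service itself may enforce additional rules that this tutorial does not state.

```python
import re

# Assumption: only the rules quoted in this tutorial are checked here.
# Start and end with an ASCII letter or digit; "-", "_", "." allowed in between.
INSTANCE_NAME_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?")

# Characters the tutorial says a password must not include.
FORBIDDEN_PASSWORD_CHARS = set("'\"`\\;[]")


def is_valid_instance_name(name: str) -> bool:
    """Return True if the name satisfies the quoted instance-name rule."""
    return INSTANCE_NAME_RE.fullmatch(name) is not None


def password_problems(password: str, user_name: str = "HDLADMIN") -> list[str]:
    """Return a list of rule violations; an empty list means no rule is broken."""
    problems = []
    if len(password) < 8:
        problems.append("fewer than 8 characters")
    if sum(c.isupper() for c in password) < 1:
        problems.append("no uppercase letter")
    if sum(c.islower() for c in password) < 2:
        problems.append("fewer than two lowercase letters")
    if not any(c.isdigit() for c in password):
        problems.append("no number")
    # Assumption: the user-name check ignores case.
    if user_name.lower() in password.lower():
        problems.append("contains the user name")
    if any(c in FORBIDDEN_PASSWORD_CHARS for c in password):
        problems.append("contains a forbidden character")
    # ASCII control characters such as newline, backspace, and tab.
    if any(ord(c) < 32 or ord(c) == 127 for c in password):
        problems.append("contains a control character")
    return problems


print(is_valid_instance_name("DL_RE_TUTORIAL"))  # prints True
```

Returning a list of problems rather than a single boolean makes it easier to see which rule a candidate password breaks.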
Results
A data lake instance is created and appears in the list of SAP HANA Cloud Central instances. You can monitor the progress of the creation process on the overview page.
Note
Instance creation can take some time to complete, depending on your environment. Instance creation is finished when the status displays Running. Click Refresh to update the status of the instance.
Next Steps
You’re now ready to connect to your data lake Relational Engine instance so you can begin using the instance. You'll need SAP HANA Cloud Central in the next step of this tutorial, so don't close the browser tab.
Task overview: Prerequisite: Set Up the Tutorial Components
Next task: Add a Client Trust and Authorization to Data Lake Files
Related Information
Creating Data Lake Instances


2.2 Add a Client Trust and Authorization to Data Lake Files
Create a certificate to identify you as a trusted user for the data lake Files component of your data lake instance, and an authorization to allow you to manipulate the TPCH data that will be stored in data lake Files.

Prerequisites
● You've completed all previous tasks in this tutorial.
● You have one of the following:
  ○ A valid certificate.
  ○ OpenSSL installed.

Context
In this task, you'll learn how to add trusts and authorizations to data lake Files. A trust configures which certificate authorities and intermediaries your data lake Files should use when reached by a client. An authorization assigns roles with privileges to trusted clients in data lake Files using patterns. Authorizations allow you to define what actions specific clients are allowed to do once their trust is validated. Authorizations are given by matching the client's subject to a pattern and assigning roles to that client.

Procedure
1. If you have a valid certificate, skip to Step 2. Otherwise, create one using OpenSSL.
   a. Execute the following command in a terminal on your platform:
      openssl req -new -newkey rsa:4096 -days 3650 -nodes -x509 -subj "/C=/ST=/L=/O=/CN=tutorialuser" -keyout newkey.pem -out newcert.pem
2. Navigate to data lake Files for your data lake instance.
   a. Open SAP HANA Cloud Central.
   b. From your data lake instance, open the Actions menu.
   c. Select Manage Configuration.
   d. On the configurations page, select Edit.
3. Create a trust for data lake Files.


   a. On the configurations page, navigate to the Data Lake Files section.
   b. In the Trusts section, select Add.
   c. Create an alias for your certificate.
   d. Add your certificate by either uploading the file or pasting the certificate string into the text box.
   e. Apply your changes.
4. Create an authorization for data lake Files.
   a. In the Authorizations section, select Add.
   b. Rank the authorization as 1.
   c. Select the user role in the drop-down menu to add the role to your authorization.
   d. In the Pattern text box, enter the pattern for your certificate.
      For information on patterns, see Manage Authorizations in Data Lake Files. If you created a certificate following the instructions in Step 1, use the pattern CN=tutorialuser.
5. Save your changes.

Task overview: Prerequisite: Set Up the Tutorial Components
Previous task: Create a Data Lake Instance
Next task: Download Sample TPCH Data
Related Information
Managing Data Lake Files
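
The pattern CN=tutorialuser used in the authorization comes from the common name in the -subj value passed to OpenSSL in Step 1. The short Python sketch below only illustrates that relationship for the simple subject string used in this tutorial. The function names are hypothetical, and escaped "/" characters and repeated attributes are not handled. The full pattern syntax is described in Manage Authorizations in Data Lake Files and is not modeled here.

```python
def parse_subject(subj: str) -> dict[str, str]:
    """Split an OpenSSL -subj string such as "/C=/ST=/CN=name" into fields.

    Simplification: escaped "/" characters and repeated attributes
    are not handled.
    """
    fields = {}
    for part in subj.split("/"):
        if not part:
            continue  # skip the empty piece before the leading "/"
        key, _, value = part.partition("=")
        fields[key] = value
    return fields


def cn_pattern(subj: str) -> str:
    """Build the "CN=<name>" string from the subject's common name."""
    common_name = parse_subject(subj).get("CN", "")
    if not common_name:
        raise ValueError("subject has no common name (CN)")
    return f"CN={common_name}"


print(cn_pattern("/C=/ST=/L=/O=/CN=tutorialuser"))  # prints CN=tutorialuser
```

If you supplied your own certificate instead, its subject will differ, so take the pattern from that certificate's subject rather than from this example.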


2.3 Download Sample TPCH Data
Download the sample tutorial data from GitHub.
Context
In this task, you'll download the sample data that will be used throughout these tutorials. You don't need a GitHub account to download this data. The sample data is a subset of TPCH, a publicly available, business-oriented database from TPC with industry-wide relevance.
Procedure
1. Go to the public SAP GitHub sample data repository: hana-cloud-relational-data-lake-onboarding.
2. Open the TPCH folder and select one of the following four files.
   ○ customer.tbl
   ○ nation.tbl
   ○ region.tbl
   ○ supplier.tbl
3. Click Raw.
4. Using the settings in your browser, save the raw data to your machine.
   Note
   Ensure the file saves to your machine as the .tbl file type. Remove any additional file extension that is appended to the filename shown in Step 2. For example, if the file saves as xxx.tbl.txt, remove the appended .txt from the filename.
5. Repeat steps 2 to 4 to download the remaining files.

Task overview: Prerequisite: Set Up the Tutorial Components
Previous task: Add a Client Trust and Authorization to Data Lake Files

Next task: Review the Data Structure
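
If your browser appended an extra extension such as .txt to the downloaded files, as described in the note in Step 4, you can rename them by hand or with a small script. The following Python sketch is one possible way to do it. The function name is hypothetical, and the folder you pass in is whichever location you saved the files to.

```python
from pathlib import Path

# The four sample files named in this tutorial.
EXPECTED = ("customer.tbl", "nation.tbl", "region.tbl", "supplier.tbl")


def fix_tbl_extensions(folder: Path) -> list[str]:
    """Rename files such as customer.tbl.txt to customer.tbl.

    Returns the expected file names that are still missing afterwards.
    """
    for name in EXPECTED:
        target = folder / name
        if target.exists():
            continue  # already saved with the correct name
        # Look for the same name with an extra suffix appended.
        for candidate in sorted(folder.glob(name + ".*")):
            if candidate.is_file():
                candidate.rename(target)
                break
    return [name for name in EXPECTED if not (folder / name).exists()]
```

Calling the function with the download folder, for example fix_tbl_extensions(Path.home() / "Downloads") if that is where the files were saved, returns an empty list once all four files are present with the .tbl extension.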

2.4 Review the Data Structure
Review the sample tutorial data to familiarize yourself with the columns and data types in each file.
Context
In a real-world scenario, you would already be familiar with the data you want to move into data lake. For the purposes of this tutorial, review the provided data set to familiarize yourself with the contents in order to identify:
● The names of the tables you'll create.
● The number of columns and the column names.
● The information in each column and its data type.
The table structure you define in the CREATE TABLE statements is based on the structure of this sample data.

Procedure
1. Open the location on your machine where you saved the TPCH files. Open one of the four files using your preferred text viewing application.
   ○ customer.tbl
   ○ nation.tbl
   ○ region.tbl
   ○ supplier.tbl
2. Analyze the structure of the table so you understand the column names and data types for your upcoming CREATE TABLE and LOAD TABLE statements.
3. Repeat steps 1 and 2 for the remaining files.
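As an alternative to reading each file by eye, a short script can summarize the column count and a rough type for each column. The Python sketch below rests on assumptions this tutorial does not state: that the .tbl files are pipe-delimited with a trailing "|" on each row (the usual TPC-H layout) and that they contain no header row. The function names are hypothetical, and the sample rows are made up for illustration and are not taken from the tutorial data.

```python
import re

INT_RE = re.compile(r"-?[0-9]+")
DEC_RE = re.compile(r"-?[0-9]+\.[0-9]+")


def split_row(line: str, delimiter: str = "|") -> list[str]:
    """Split one row and drop the empty field left by a trailing delimiter."""
    fields = line.rstrip("\r\n").split(delimiter)
    if fields and fields[-1] == "":
        fields.pop()
    return fields


def guess_type(values: list[str]) -> str:
    """Return a rough type label for the values of one column."""
    if all(INT_RE.fullmatch(v) for v in values):
        return "integer"
    if all(INT_RE.fullmatch(v) or DEC_RE.fullmatch(v) for v in values):
        return "decimal"
    return f"text, max length {max(len(v) for v in values)}"


def describe_columns(lines, delimiter: str = "|") -> list[str]:
    """Return one type label per column for the given rows."""
    rows = [split_row(line, delimiter) for line in lines if line.strip()]
    if not rows:
        return []
    widths = {len(row) for row in rows}
    if len(widths) != 1:
        raise ValueError(f"inconsistent column counts: {sorted(widths)}")
    return [guess_type(list(column)) for column in zip(*rows)]


# Made-up rows in the assumed layout, for illustration only.
sample = ["1|EXAMPLE ONE|12.50|\n", "2|EXAMPLE TWO|7.00|\n"]
print(describe_columns(sample))
# prints ['integer', 'text, max length 11', 'decimal']
```

To inspect a real file, pass an open file object, for example describe_columns(open("nation.tbl", encoding="utf-8")). The labels are only a starting point for choosing column types in your CREATE TABLE statements. If a file turns out to have a header row, every column is reported as text.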
Task overview: Prerequisite: Set Up the Tutorial Components
Previous task: Download Sample TPCH Data
Next task: Add Data to Data Lake Files Using the SAP HANA Database Explorer
