Upload files from CentOS to Azure Data Lake Gen2 with azcopy

While creating TPC-H data for another article, "Create a test environment using TPC-H (Synapse SQL pool)", I uploaded the files from CentOS with azcopy. This article describes that method.

1. Creating a storage account and container

First, create the storage account and container to upload to.

1-1. Select a storage account in the Azure portal


1-2. Add a storage account


1-3. Enter the required information on the Basics tab

In this example, everything except the resource group, storage account name, and replication setting is left at its default.

1-4. Select the required settings on the Networking tab

This time I left the network settings at their defaults.

1-5. Select data protection

Nothing is configured here this time.

1-6. Enable Data Lake Storage Gen2 on the Advanced tab


1-7. Review and create the storage account

Click the Create button on the Review + create tab to create the storage account.

1-8. Creating a container

After the storage account is created, create a container inside it. Select + Container, enter a name, and create it. I created a container called azcopytest.
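The same storage account and container can also be created from the command line. The sketch below assumes the Azure CLI (`az`) is installed and logged in; the resource group, account, and location names are placeholders, not values from this article, and the commands are echoed so the sketch is safe to run without an Azure subscription (drop `echo` to execute them for real).

```shell
# Placeholders -- substitute your own names.
RG="tpch-rg"
ACCOUNT="tpchdatalake01"
CONTAINER="azcopytest"
LOCATION="japaneast"

# --hns true enables the hierarchical namespace, i.e. Data Lake Storage Gen2.
echo az storage account create \
  --name "$ACCOUNT" --resource-group "$RG" --location "$LOCATION" \
  --sku Standard_LRS --kind StorageV2 --hns true

# --auth-mode login uses your Azure AD credentials instead of the account key.
echo az storage container create \
  --name "$CONTAINER" --account-name "$ACCOUNT" --auth-mode login
```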

2. Storage account access control (IAM) settings

IAM settings are required to access Blobs, and azcopy authenticates using the role assigned here. Without this setting, azcopy fails with an error such as 403 This request is not authorized to perform this operation using this permission.

2-1. Select the storage account you created earlier


2-2. Add role assignment from access control (IAM)


For the role, select the required permission, such as Storage Blob Data Contributor, and specify the user to assign it to.
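The portal step above can also be sketched with the Azure CLI. The assignee and subscription path below are placeholders, and the command is echoed so it can be reviewed without an Azure login (drop `echo` to run it).

```shell
ASSIGNEE="user@example.com"   # placeholder: UPN or object ID of the user
# Placeholder scope: the storage account created earlier.
SCOPE="/subscriptions/<subscription-id>/resourceGroups/tpch-rg/providers/Microsoft.Storage/storageAccounts/tpchdatalake01"

# Storage Blob Data Contributor grants the blob read/write access azcopy needs.
echo az role assignment create \
  --assignee "$ASSIGNEE" \
  --role "Storage Blob Data Contributor" \
  --scope "$SCOPE"
```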

3. Install azcopy on CentOS

First, download azcopy with wget.

$ wget https://azcopyvnext.azureedge.net/release20200818/azcopy_linux_amd64_10.6.0.tar.gz

After downloading, extract the archive and change into the extracted directory.

$ tar xvfz azcopy_linux_amd64_10.6.0.tar.gz
$ cd azcopy_linux_amd64_10.6.0

4. Log in with azcopy

You need to log in with azcopy before uploading files.

4-1. Check the tenant ID

You need to enter the tenant ID when logging in with azcopy. You can check it in the Azure portal from Azure AD, under Tenant information.

4-2. Login with azcopy

Log in from CentOS as follows.

$ ./azcopy login --tenant-id "<Tenant ID>"

When you run it, azcopy prints a URL and a device code. Open the URL in a browser and enter the code on the screen that appears.

If the login is successful, a "succeeded" message is output.
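As an aside, the device-code flow above needs a browser. For scripted runs, azcopy also supports a non-interactive service principal login; the IDs below are placeholders, and the login command is echoed so the sketch runs without real credentials.

```shell
# azcopy reads the service principal's client secret from this
# environment variable rather than a command-line flag.
export AZCOPY_SPA_CLIENT_SECRET="<client-secret>"

echo ./azcopy login --service-principal \
  --application-id "<application-id>" \
  --tenant-id "<tenant-id>"
```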

5. Upload to Blob with azcopy

Upload to Blob using azcopy's copy command.

$ ./azcopy copy "Local file name" "https://<Storage account name>.blob.core.windows.net/<Container name>"

To upload multiple files, you can use a wildcard such as *.

$ ./azcopy copy "Local directory/*" "https://<Storage account name>.blob.core.windows.net/<Container name>"
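The destination in the commands above is just the container URL built from the storage account and container names. The sketch below shows that construction with placeholder names, plus the --recursive flag needed when uploading a whole directory tree rather than top-level files (the azcopy command is echoed so the sketch runs without credentials).

```shell
ACCOUNT="tpchdatalake01"   # placeholder storage account name
CONTAINER="azcopytest"     # placeholder container name
DEST="https://${ACCOUNT}.blob.core.windows.net/${CONTAINER}"
echo "$DEST"

# A wildcard only matches files at the top level; to upload a whole
# directory tree, pass the directory itself together with --recursive.
echo ./azcopy copy "./tpch-data" "$DEST" --recursive
```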

6. Bonus

Once the upload is complete, you can load the data into the Azure Synapse Analytics SQL pool using PolyBase or similar tools. The method is described in another article, so please refer to it if you like: I tried to populate the Synapse SQL pool with PolyBase
