Azure Data Factory and Copy Wizard – Hands-on Activity
In my previous post I covered an overview of Azure Data Factory along with relevant bits & pieces like pipelines, activities, datasets, and linked services. We have already seen that Azure Data Factory lets us create, orchestrate, and monitor data pipelines over diverse data sources in a flexible way.
In other words, you can connect to your on-premises SQL Server, Azure databases, tables, or blobs, and create data pipelines that transform the data to suit your requirements using Hive and Pig scripting, custom C# processing, and so on.
Next, in this article we will walk through some hands-on activities on the Azure cloud –
- Create a Storage Account and a Blob Container
- Create an Azure SQL Database
- Provision an Azure Data Factory
- Use the Azure Data Factory Copy Wizard to Copy the Data
Prerequisites
Before moving ahead, we need a few prerequisites to accomplish these Azure Data Factory hands-on activities on the Azure cloud.
- An Azure subscription; if you don't have an account, sign up for a free Azure account - https://azure.microsoft.com/en-gb/free/
- The lab files required for this task, downloadable from the GitHub repository - https://github.com/rajendraazure/azure-data-factory-copy-wizard.
Create a Storage Account and a Blob Container
In this walkthrough, you will provision a couple of Azure resources: an Azure Storage account, an Azure SQL Database instance, and an Azure Data Factory. The idea behind this exercise: you will upload a file to Azure Blob Storage and, using the Azure Data Factory Copy Wizard, load the same data into an Azure SQL Database.
To begin, create a Storage Account and a Blob Container where the source data for the data pipeline will be stored.
Log in to the Azure portal https://portal.azure.com/
STEP – 1: Launch Storage Account
On the hub menu, click All services and type Storage in the Filter box.
The list of resources filters as you type; click Storage accounts in the list when it appears.
STEP – 2: Submit details for a Storage Account
After you select Storage accounts, the Storage accounts dashboard loads. Other storage accounts may already exist there, but you need to create a new one for this specific task.
On the Storage accounts menu, click Add; the Create storage account blade loads, prompting you for the details below.
On the Basics tab, under Project Details, make sure the correct subscription is selected, then choose an existing Resource group or create a new one using Create new.
I am going with the previously created resource group ‘raj-resource’.
Next, under Instance Details, enter a valid Storage account name and choose a Location; I used the name ‘adfdemostorageac’ and left the other defaults as they were. In brief, the following settings are required to create a storage account instance –
- Subscription – The name of your Azure subscription (free, pay-as-you-go, or another subscription type).
- Resource group – The name of the resource group used to organize related resources; you can either create a new resource group or choose an existing one.
- Storage account name – A globally unique name for the storage account.
- Location – The region where your storage account data will be stored.
- Performance – Simply go with the default selection, Standard.
- Account kind – Again, go with the default selection, StorageV2 (general purpose v2).
- Replication – Choose an appropriate replication strategy; I will go with Locally-redundant storage (LRS).
- Access tier (default) – Simply go with the default selection, Hot.
STEP – 3: Create the Storage Account
Once you are sure all the details and configuration you provided are correct, go ahead and click the Review + create button at the bottom of the Create storage account blade.
As soon as you click Review + create, validation starts; after successful validation, a summary appears along with the Create button on the Create storage account blade.
If you are satisfied with the summary, proceed by clicking the Create button to create the storage account. It can take a few minutes for your storage account to be deployed; you can view the notification in the Azure portal.
You will be notified once the storage account is created successfully.
Congratulations, the storage account is deployed!! 😊
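By the way, the portal is not the only route. As an illustrative aside (my own addition, not part of the original lab), here is a minimal sketch of the same step with the Azure SDK for Python; it assumes the azure-identity and azure-mgmt-storage packages, an already authenticated environment, and placeholder values for the subscription ID and region.

from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

subscription_id = "<your-subscription-id>"  # placeholder, not from the lab
client = StorageManagementClient(DefaultAzureCredential(), subscription_id)

# Mirrors the portal settings above: StorageV2 kind, Standard performance, LRS replication.
poller = client.storage_accounts.begin_create(
    "raj-resource",        # the resource group used in this walkthrough
    "adfdemostorageac",    # the storage account name used in this walkthrough
    {
        "location": "eastus",  # assumed region; use the one you picked in the portal
        "kind": "StorageV2",
        "sku": {"name": "Standard_LRS"},
    },
)
account = poller.result()  # blocks until the deployment completes
print(account.provisioning_state)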
STEP – 4: Add a container
After the storage account is created, browse to it and load its blade in the Azure portal. Here you can see all the details and options under sections like Essentials and Services.
Time to add a container under the storage account; click Blobs, which will load the containers blade.
Since you do not have any containers yet, the list is empty; click + Container on the menu to launch the New container blade with the following properties –
- Name – The name of the container.
- Public access level – Simply go with the default selection, Private (no anonymous access).
Click the OK button to create the container; a notification will appear about the validation and deployment. Shortly you will see the new entry in the container list.
Congratulations, the container is created!! 😊
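Again purely as an aside (the lab itself uses the portal), the container can also be created from code; a small sketch with the azure-storage-blob Python package, assuming you copy the connection string from the storage account's Access keys blade. The container name adfcontainer matches the one used later in this walkthrough.

from azure.storage.blob import BlobServiceClient

conn_str = "<your-storage-connection-string>"  # from the Access keys blade; placeholder
service = BlobServiceClient.from_connection_string(conn_str)

# With no public_access argument, the container is private (no anonymous
# access), matching the default chosen in the portal above.
container = service.create_container("adfcontainer")
print(container.container_name)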
STEP – 5: Set up an Azure SQL Database
In this demo activity, the data pipeline will copy the source data to an Azure SQL Database. SQL databases are hosted on servers, so you will have to create both a database and a server to host it.
In the Microsoft Azure portal, click + Create a resource on the Hub menu and select SQL Database from the Popular tab on the New blade. Alternatively, you can select Databases from the Azure Marketplace tab or search from the search box. Provide the following details –
- Database name – The name of the database.
- Subscription – The name of your Azure subscription (free, pay-as-you-go, or another subscription type).
- Resource group – The name of the resource group used to organize related resources; you can either create a new resource group or choose an existing one.
- Select source – Simply go with the default selection, Blank database.
STEP – 6: Configure an Azure SQL Server
Next is the Server property; here you configure the server settings to create a new server with the following values –
- Server name – The name of the DB server (a unique name).
- Server admin login – An administrator login name of your choice.
- Password – A strong password for the above administrator.
- Confirm password – Confirm the same password.
- Location – Ideally, select the same location as your storage account.
- Allow Azure services to access server – Keep the default selection, Selected.
- Advanced Data Security – Keep the default selection, Not now.
If all the information is correct, proceed by clicking the Select button. It will create the DB server with the defined properties. Then continue providing the remaining properties for the SQL database on the SQL Database blade –
- Server – Now configured with the newly created server name.
- Want to use SQL elastic pool? – Since this is a demo task, go with Not now, the default selection.
STEP – 7: Configure Pricing tier and Create SQL Database
After submitting the Server and SQL elastic pool settings, notice the Pricing tier, which is pre-selected with the Standard tier and its DTU details. The default selection is enough for this demo activity.
If you need a different configuration, select another one and click the Apply button on the Configure blade.
Then continue providing the remaining properties for the SQL database on the SQL Database blade –
- Pricing tier – Now configured with the selected tier.
- Collation – Simply go with the default selection, SQL_Latin1_General_CP1_CI_AS.
Now that almost all the properties have been filled in, proceed by clicking the Create button to create the SQL Database. After validation, it can take a few minutes, depending on the configuration, for your SQL database to be deployed.
In the Azure portal, you can view the notification, and you will be notified once the SQL database is created successfully. Once deployed, you can verify the newly created SQL database as a new entry on the SQL databases blade.
Congratulations, an Azure SQL Database is created!! 😊
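For completeness, here is a hedged sketch of the same server and database provisioning with the azure-mgmt-sql Python package. This is illustrative only, with placeholder names and credentials, and it leans on the SDK defaults rather than reproducing every portal setting.

from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient

subscription_id = "<your-subscription-id>"  # placeholder
client = SqlManagementClient(DefaultAzureCredential(), subscription_id)

# Create the logical server first (STEP 6), then the database (STEPs 5 and 7).
client.servers.begin_create_or_update(
    "raj-resource",            # resource group
    "<your-server-name>",      # must be globally unique
    {
        "location": "eastus",  # ideally the same region as the storage account
        "administrator_login": "<admin-login>",
        "administrator_login_password": "<strong-password>",
    },
).result()

# The portal's "Allow Azure services to access server" option corresponds to a
# firewall rule whose start and end addresses are both 0.0.0.0.
client.firewall_rules.create_or_update(
    "raj-resource", "<your-server-name>", "AllowAllWindowsAzureIps",
    {"start_ip_address": "0.0.0.0", "end_ip_address": "0.0.0.0"},
)

db = client.databases.begin_create_or_update(
    "raj-resource", "<your-server-name>", "<your-database-name>",
    {"location": "eastus"},    # add a "sku" entry here to pick a specific pricing tier
).result()
print(db.status)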
STEP – 8: Create an Azure Data Factory
With the data stores for this demo task finalized, you can go ahead and create an Azure Data Factory.
In the Microsoft Azure portal, on the Hub menu, click + Create a resource and then click Analytics under the Azure Marketplace tab, which will list all the analytics services. Click Data Factory on the Featured tab to launch the Data Factory blade.
On the New data factory blade, provide the following details about the data factory to proceed –
- Name – A unique name for the Azure Data Factory.
- Subscription – The name of your Azure subscription (free, pay-as-you-go, or another subscription type).
- Resource group – The name of the resource group used to organize related resources; you can either create a new resource group or choose an existing one.
- Version – Select Version 1, though V2 is available.
- Location – Ideally, select the same location you specified for your storage account (if it is not available, select any other location).
After filling in all the properties, click the Create button to create the Azure Data Factory. After validation, the deployment will be processed, which may take a few minutes.
Meanwhile, you can view the notification, and you will be notified once the Data Factory is created successfully. Once deployed, you can verify the newly created Azure Data Factory under All resources.
STEP – 9: Upload a Data File to the Blob Container
In this part of the hands-on activity, you will use the wizard to copy data from your Azure Blob storage account to your Azure SQL database, so you need a simple data file in the blob container.
You may have noticed the sample file in the prerequisites section mentioned at the beginning; go to that GitHub location and download the sample file - https://github.com/rajendraazure/azure-data-factory-copy-wizard.
You can review the data file, which consists of a few rows of dates and amounts. Now upload this file to the blob container created earlier. Move to the blob container blade and click Upload on the menu.
Make sure to fill in the appropriate details while uploading the data file to the blob container –
- Files – The file name will be displayed once you browse to the data file.
- Overwrite if files already exist – Select the box.
Under the Advanced options, provide a few more details –
- Authentication type – Simply go with the default selection, SAS.
- Blob type – Block blob, the default selection.
- Block size – Go with the default selection, 4 MB.
- Upload to folder – Provide a folder name (here, Data).
Once you have filled in all the details, click the Upload button to load the data file from your local machine to the Azure Blob Container.
Soon the data file will be uploaded to the container; refresh the blob container blade and you will see the new data file under the Data folder.
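If you would rather script the upload, here is a minimal sketch with azure-storage-blob (my own illustration; the connection string placeholder comes from the Access keys blade). Note that blob containers are flat, so the Data/ prefix in the blob name plays the role of the folder.

from azure.storage.blob import BlobServiceClient

conn_str = "<your-storage-connection-string>"  # placeholder
container = BlobServiceClient.from_connection_string(conn_str) \
                             .get_container_client("adfcontainer")

# The "Data/" prefix acts as the upload folder; overwrite=True mirrors the
# "Overwrite if files already exist" checkbox in the portal.
with open("transactions.txt", "rb") as f:
    container.upload_blob("Data/transactions.txt", f, overwrite=True)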
STEP – 10: Create a Table in the Database
Since in this demo task we need to copy the transaction data to a table, you must create an empty table named transactions in the Azure SQL Database.
Move to All resources and select the Azure SQL Database you created. You can either connect to this database remotely (a pyodbc sketch follows the validation query below) or use the Azure web-based query interface. Click Query editor (preview) on the database blade. It will launch a login blade; submit the same administrator credentials you specified while provisioning the SQL database.
With the correct credentials, it will log on and promptly load the query blade, where you can see all the DB objects on the left side and a new query window on the right side.
In the query editor, write the following T-SQL query to create a table named transactions in your SQL database –
CREATE TABLE transactions
(
    Id INT IDENTITY,
    Tdate DATE,
    Amount DECIMAL
);
Next, click Run to execute the T-SQL statement.
You can validate the table with a SELECT query or by expanding the Tables node on the left side.
SELECT * FROM transactions;
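As mentioned above, you can also connect remotely instead of using the web-based Query editor. Here is a minimal sketch with Python and pyodbc; it assumes the ODBC Driver 17 for SQL Server is installed and that your client IP is allowed through the server firewall. Server, database, and credentials are placeholders.

import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:<your-server-name>.database.windows.net,1433;"
    "Database=<your-database-name>;"
    "Uid=<admin-login>;Pwd=<strong-password>;"
    "Encrypt=yes;TrustServerCertificate=no;"
)
cursor = conn.cursor()

# Same DDL as in the Query editor above.
cursor.execute("CREATE TABLE transactions (Id INT IDENTITY, Tdate DATE, Amount DECIMAL);")
conn.commit()

# Validate: the freshly created table exists and is empty.
cursor.execute("SELECT * FROM transactions;")
print(cursor.fetchall())  # prints [] for the new, empty table
conn.close()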
STEP – 11: Launch the Azure Data Factory Copy Wizard
In this exercise, you will use the Copy Wizard to copy data from the file you uploaded to Azure Blob Storage in the previous activity to the recently created table in the Azure SQL Database.
Since the Data Factory instance has been provisioned, browse All resources and move to the blade for your data factory.
On the data factory blade you will notice a couple of action tiles; click Copy data (PREVIEW), which will open a new tab in the browser.
Here you can see the different stages of the wizard; start with the Properties page and fill in the appropriate details for the following properties –
- Task name – Name of the copy task (the Data Factory pipeline); required.
- Task description – Description of the task; optional.
- Task cadence or Task schedule – Depends on your need; the default selection is Run regularly on schedule, but since this is a demo exercise, Run once now is preferable.
- Based on the above selection, configure either the Expiration time or the Recurring pattern along with the Start and End date/time.
STEP – 12: Configure the Source (Linked Service)
After filling in these details, click the Next button to load the Source data store page, which specifies the source data store for the copy task. Here you get two tabs: FROM EXISTING CONNECTIONS and CONNECT TO A DATA STORE.
Under the CONNECT TO A DATA STORE tab, you can see a variety of data store connections (which are, in fact, Linked Services); select Azure Blob Storage and click the Next button.
Next, under the Source section, the Specify the Azure Blob storage account page loads; fill in the following parameters –
- Connection name – Name of the Azure Data Factory linked service.
- Account selection method – Go with the default selection, From Azure subscriptions, since the data file is in an Azure Blob Container.
- Azure subscription – The specific Azure subscription associated with the storage account.
- Storage account name – Select the storage account you provisioned earlier in this exercise.
After filling in the source information, click the Next button. It will load the Choose the input file or folder page, where you can select the exact folder and file.
Double-click the blob container you created previously, and then select the Data folder, which contains the transactions.txt file.
Once the Data folder is selected, click the Choose button and then the Next button. On the File format settings page that loads, wait a few seconds for the data to be read, and then verify the following details –
- File format – Text format
- Column delimiter – Comma (,)
- Row delimiter – Carriage Return + Line feed (\r\n)
- Skip line count – 0
- The first data row contains column names – Selected
- Treat empty column value as null – Selected
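As an optional sanity check of those same format assumptions, you can read the downloaded file locally before the wizard does; a quick sketch with pandas, assuming transactions.txt sits in your working directory.

import pandas as pd

# Comma-delimited with the column names in the first row, matching the
# wizard's File format settings above.
df = pd.read_csv("transactions.txt")
print(df.head())    # expect tdate and amount columns
print(df.dtypes)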
STEP – 13: Configure the Destination (Linked Service)
After ensuring that the rows of data in the Preview section look correct, go ahead and click the Next button, which will load the Destination data store page. Here, once again, you get the two tabs FROM EXISTING CONNECTIONS and CONNECT TO A DATA STORE, similar to the Source section you went through.
Under the CONNECT TO A DATA STORE tab, select the Azure SQL Database linked service and click the Next button.
Next, under the Destination section, the Specify the Azure SQL database page loads; fill in the following parameters –
- Connection name – Name of the Azure Data Factory linked service.
- Server/database selection method – Go with the default selection, From Azure subscriptions, since the destination table is in an Azure SQL Database.
- Azure subscription – The specific Azure subscription associated with the SQL database.
- Server name – Select the server you provisioned earlier in this exercise.
- Database name – Select the database you provisioned earlier in this exercise.
- User name – The administrator user name you set up while provisioning the SQL database.
- Password – The same admin password you specified while provisioning the SQL database.
After filling in the destination information, click the Next button. It will load the Table mapping page, where you can select the target table.
After selecting [dbo].[transactions] in the Destination list, click the Next button, which will load the Schema mapping page. On the Schema mapping page, make sure the following settings are configured correctly –
- Blob path: adfcontainer/Data/ < - > [dbo].[transactions]
- tdate (DateTime) < - > Tdate (DateTime) [Selected]
- amount (Double) < - > Amount (Decimal)
- Repeatability settings – None
STEP – 14: Configure Settings
After verifying how the source and destination columns are mapped, go ahead and click the Next button, which will load the Settings page.
Nothing specific needs to be configured in this section; just review the default settings.
STEP – 15: Summary & Deployment
After validating the Advanced settings and all the default values on the Performance settings page, click the Next button, which will load the Copy Data Summary page.
This page summarizes all the pipeline and linked service details –
- Properties – Task details (pipeline)
- Source – Connection (linked service)
- Destination – Connection (linked service)
- Copy settings
Once you have gone through the summary details and found nothing amiss, click the Next button, which will promptly start the validation and deployment. The deployment completes shortly.
Here you can monitor the execution of the copy pipeline you configured; click the Click here to monitor copy pipeline link to expand the RESOURCE EXPLORER. From here you can review and monitor all the bits & pieces of the Data Factory.
Verify that the Data Has Been Copied
In this exercise, the Copy Data wizard should have created a pipeline and run it to copy the transactions data from your Azure Blob store to the Azure SQL Database. You can verify whether the data was actually copied using the Query editor.
Move to the Azure SQL Database and launch the Query editor as you did previously, submitting the SQL login credentials.
Run the following query –
SELECT * FROM dbo.transactions;
Validate the result; it is the same data, copied from the data file in your Azure Blob Storage to the Azure SQL Database.
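The same check can be run over a remote connection, as in STEP – 10; a short pyodbc sketch with placeholder connection details.

import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:<your-server-name>.database.windows.net,1433;"
    "Database=<your-database-name>;"
    "Uid=<admin-login>;Pwd=<strong-password>;Encrypt=yes;"
)
for row in conn.cursor().execute("SELECT * FROM dbo.transactions;"):
    print(row)  # each row should show an Id, a Tdate, and an Amount
conn.close()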
Congratulations, the Copy Wizard exercise is completed!! 😊
In this article, therefore, we went through provisioning a couple of Azure resources and used the Azure Data Factory Copy Wizard to copy data from Azure Blob Storage to an Azure SQL Database. In brief, we covered the following hands-on activities on the Azure cloud –
- Create a Storage Account
- Create a Blob Container
- Create an Azure SQL Database
- Provision an Azure Data Factory
- Use the Azure Data Factory Copy Wizard to Copy the Data
Stay in touch for further posts.