TALK GRID (Cloud Trainer): February 2019

Monday, February 25, 2019

Azure Data Factory and Data Pipeline - Copy data from Azure Blob to Azure SQL

Azure Data Factory and Data Pipeline – Hands on activity

In reference to Azure Data Factory hands on activities, we already walked through in one of the previous post, provisioning an Azure Data Factory and copy data from a file in Azure Blob Storage to a table in an Azure SQL Database using Copy Wizard. Along with it, we covered some other activities simultaneously like creating a Storage Account and a Blob Container as well set up an Azure SQL Database in context of Copy Wizard task.

Next, in this series here we will extend some more hands on activities on top of Azure Data Factory, like –

Implement a data Pipeline.
Copy data from a file in Azure Blob Storage to a table in an Azure SQL Database using Pipeline.

Pre-requisites

Before moving ahead, we need some pre-requisites to accomplish this Azure Data Factory hands on activities on top of Azure Cloud.

Meanwhile, in previous article already we covered provisioning few resources, from now assuming you completed those. In case not covered, then please move to that article and do all preliminary hands on activities.

Azure subscription, if you don't have an account then sign up for a free Azure account - https://azure.microsoft.com/en-gb/free/
Download the required lab files, essential to this task from the GitHub repository - https://github.com/rajendraazure/azure-data-factory-copy-wizard.
Previously provisioned Storage Account and a Blob Container
Previously provisioned Azure SQL Database.
Previously provisioned Azure Data Factory.

Validate the SQL Database and prepare the Table

In this walk-through, you will consume all previously provisioned resources though better to validate before proceeding ahead. To avoid mix-up, you will have to prepare the SQL table again by deleting the database records, which were inserted by the Azure Data Factory Copy Wizard in the previous post.

STEP – 1: Launch SQL Database

In the Microsoft Azure portal, click the All resources from the Hub and click your Azure SQL Database.

It will load the SQL database blade for the selected database; you either can connect this database remotely or can go ahead in the Azure web-based query interface.

STEP – 2: Launch Query editor (preview)

Next, click the Query editor (preview) from the database blade. Subsequently, it will launch login blade and submit the same administrator credentials, which you had provided during the provision of SQL database.

If you provide the correct credentials, then it will log on and load the query blade promptly where you can see all DB objects in left side as well a new query window in the right side.

STEP – 3: Execute and prepare the SQL Table

In the query editor, write the following T-SQL query to truncate and prepare the dbo.transactions table in your SQL database –

TRUNCATE TABLE dbo.transactions;

Next, click Run to execute the T-SQL statement.

Run the following command to verify that the table is empty again -

SELECT * FROM dbo.transactions;

Validate the existing Linked Services

An Azure Data Factory uses linked services to access, transform, and store data. Even in the previous article, you applied the same using Copy Wizard.

Here in the case of the pipeline approach, you require linked services for the blob store account where the source data is stored as well the Azure SQL Database containing the table you want to load.

Move to the Microsoft Azure Portal.

STEP – 1: Launch Azure Data factory

In the Microsoft Azure portal, click the All resources from the Hub and click your Azure Data Factory.

It will load the Data factory blade for the selected Azure Data Factory; and required to click the Author and deploy tile.

STEP – 2: Validate existing Linked services

By clicking the Author and deploy tile, the next blade will be available along with all required configurable settings like linked services, datasets and pipelines, etc.

In the pane on the left, expand the first option Linked Services and you can notice two linked services named destination-sqldatabase-linked-service and source-blobstorage-linked-service are already defined for the blob store and SQL database.

In fact, the Copy Data wizard in the previous hands on activities created these linked services.

Next, click the blob linked service source-blobstorage-linked-service, the JSON definition should like something this –

{

"name": "source-blobstorage-linked-service",

"properties": {

"hubName": "democopydatafactory_hub",

"type": "AzureStorage",

"typeProperties": {

"connectionString": "DefaultEndpointsProtocol=https;AccountName=dfactorystorageaccount;AccountKey=**********"

}

Similarly, click the SQL linked service destination-sqldatabase-linked-service, the JSON definition should like something this –

{

"name": "destination-sqldatabase-linked-service",

"properties": {

"hubName": "democopydatafactory_hub",

"type": "AzureSqlDatabase",

"typeProperties": {

"connectionString": "Data Source=datafactorydbserver.database.windows.net;Initial Catalog=datafactorydb;Integrated Security=False;User ID=dbadmin;Password=**********;Connect Timeout=30;Encrypt=True"

}

Create and configure Datasets

You see that datasets define schema for the data flows, the data, which required extracting from a source, and the data to load into a destination or sink. In other words, a dataset is a named view or a data structure of data that refer the precise data you want to use with the associated activities as input as well as output.

Visit the precise post to know about all other relevant bits & pieces like pipelines, activities, datasets and linked services, etc.

Henceforth must create datasets to define the text file data source and the database table, which will be used in your pipeline.

STEP – 1: Create a source Dataset (Azure Blob)

Click the context menu More which will display a couple of options, click the New dataset, which further will display different options.

By clicking the New dataset a further context menu will be appeared with multiple options, then click the Azure Blob storage to create a new document for an Azure Blob store dataset.

STEP – 2: Configure source Dataset JSON

By clicking the Azure Blob storage, a new JSON document would be launched. In the new JSON document, replace the default code with the following code, which you can copy and paste from the transactions-txt.json file available in the GitHub repository –

{

"name": "transactions-txt",

"properties": {

"structure": [

{

"name": "tdate",

"type": "Datetime"

{

"name": "amount",

"type": "Decimal"

}

"type": "AzureBlob",

"linkedServiceName": "source-blobstorage-linked-service",

"typeProperties": {

"folderPath": "dfactorycontainer/data/",

"format": {

"type": "TextFormat",

"columnDelimiter": ",",

"firstRowAsHeader": true

}

"external": true,

"availability": {

"frequency": "Minute",

"interval": 15

}

The custom JSON defines a schema for comma-delimited text containing a tdate column of DateTime values, and an amount column of Decimal values. Along with it, the data is available in the dfactorycontainer/data folder of the blob storage account defined by the source-blobstorage-linked-service linked service, and new data will be available every fifteen minutes.

Once you validate the JSON parameters and values, deploy the new dataset definition to your Azure Data Factory by clicking the Deploy button that appears in the top menu.

Sooner you will be notified once the new dataset is created successfully. Even you can verify the same thing, by selecting a new dataset named transactions-txt that exist under the Datasets option from the left pane.

STEP – 3 : Create and configure destination Dataset (Azure SQL)

Similar to source dataset, required to create and configure the destination dataset for the Azure SQL Database. Once again, click the context menu More and subsequently click the New dataset from the displayed option. Next then click Azure SQL to create a new JSON document for an Azure SQL Database dataset.

In the new JSON document, replace the default code with the following code, which you can copy and paste from the dbo-transactions.json file available in the GitHub repository –

{

"name": "dbo-transactions",

"properties": {

"type": "AzureSqlTable",

"linkedServiceName": "destination-sqldatabase-linked-service",

"structure": [

{

"name": "tdate",

"type": "Datetime"

{

"name": "amount",

"type": "Decimal"

}

"typeProperties": {

"tableName": "dbo.transactions"

"availability": {

"frequency": "Minute",

"interval": 15

}

The custom JSON defines a schema for a table named dbo.transactions containing a tdate column of DateTime values, and an amount column of Decimal values in the database defined by the destination-sqldatabase-linked-service linked service. The dataset can be updated every fifteen minutes.

Once you validate the JSON parameters and values, deploy the new dataset definition to your Azure Data Factory by clicking the Deploy button that appears in the top menu.

Sooner you will be notified once the new dataset is created successfully. Even you can verify the same thing, by selecting a new dataset named dbo.transactions that exist under the Datasets option from the left pane.

Create and configure Pipeline

You know that a pipeline encapsulates a data flow that includes copying the data and transforming the data as we move it from one place to another. Since now that you have defined the linked services and datasets for your data flow, you can create a pipeline to encapsulate it.

In other words, a pipeline defines a series of actions that use linked services to operate on datasets. In this case, your pipeline will consist of a single action to copy data from the source-blobstorage-linked-service linked service to the destination-sqldatabase-linked-service linked service.

You can visit the precise post to know about all other relevant bits & pieces like pipelines, activities, datasets and linked services, etc.

STEP – 1: Create a Pipeline

Click the context menu More and subsequently click the New pipeline from the all available options.

STEP – 2: Configured Pipeline JSON

In the new JSON document, replace the default code with the following code, which you can copy and paste from the copytrans.json file available in the GitHub repository –

{

"name": "copytrans",

"properties": {

"activities": [{

"type": "Copy",

"typeProperties": {

"source": {

"type": "BlobSource",

"recursive": false

"sink": {

"type": "SqlSink",

"writeBatchSize": 0,

"writeBatchTimeout": "00:00:00"

"translator": {

"type": "TabularTranslator",

"columnMappings": "tdate:tdate,amount:amount"

}

"inputs": [{

"name": "transactions-txt"

}],

"outputs": [{

"name": "dbo-transactions"

}],

"name": "CopyTransData"

}],

"pipelineMode": "OneTime"

}

The custom JSON defines a pipeline containing a Copy action to copy the transaction-txt dataset in the blob store to the dbo.transactions table in the SQL database.

Once you validate the JSON parameters and values, deploy the new pipeline definition to your Azure Data Factory by clicking the Deploy button that appears in the top menu.

Sooner you will be notified once the new pipeline is created successfully. Even you can verify the same thing, by selecting new pipeline named copytrans that exist under the Pipelines option from the left pane.

Validate the Pipeline Status

Now all datasets and pipeline have deployed successfully, time to validate and monitor the pipeline status using the Azure portal.

Move to your Azure Data Factory blade, and click the Pipelines tile.

By clicking the Pipelines tile, promptly the pipelines blade will be launched. Here you can notice two pipelines exist, one was created using the Copy Data Wizard previously and along with it you can see the newly created copytrans pipeline with a Pipeline State of Running.

Next, click the copytrans pipeline, and observe the state of the activities it contains. Might be initially it contain no activities, as the datasets are validated and the pipeline is prepared.

For the moment close the Pipelines blade, and wait for a few minutes, approx. 15 minutes. Later on, reopen the Pipelines blade and the copytrans blade to view its state. When the pipeline has been run successfully, its State in the copytrans blade will be indicated as Ready.

Along with it, under the Pipelines blade its Pipeline State will be set to Completed simultaneously.

Verify the Data has been copied at Azure SQL end

Now onward your pipeline has run without any exception, time to verify that the data has been copied to a SQL database. Return to the Query editor (preview) from the Azure SQL database blade and submit the administrator credentials.

Post login, run the following command to verify that data has been inserted into the table -

SELECT * FROM dbo.transactions;

If you verify the table, you can see it contains the same transaction data, which has been copied from the text file, exist in the Azure Blob store.

Congratulation, Data Pipeline exercise completed!! 😊

As a result, we covered one more hand on activity in the context of Azure Data Factory using Data Pipeline to copy data from an Azure Blob Storage to an Azure SQL Database. Here we brushed up some previous accomplishments like pipeline, linked service and datasets, etc. as well did some JSON codes.

Keep visiting for further posts.

Friday, February 08, 2019

Clone an Azure Virtual Machine using Snapshot option

Clone an Azure VM

We already talked about an Azure Virtual Machine as well how to clone or create a managed image of an Azure Virtual Machine. Undoubtedly, in some precise scenario, it is essential to keep the exact configuration and details as exist in the original VM, here cloning fulfil the desired need.

We discussed that, the introduction of managed disks, makes quite feasible to simplify working with Azure Virtual Machines (VM) and Virtual Machine Scale Sets (VMSS). Now cloning or creating a managed image of an Azure VM is realistic and can be achieved by different ways, using –

Generalized
Snapshot
Site Recovery

In one of the previous article, we already covered how to clone an Azure Virtual Machine using Generalized option. Generalization has been a preparation process of a virtual machine creating an image, through a tool called as Sysprep that located under system32/sysprep.

However, we have noticed that post execution of Sysprep utility over an Azure VM, that VM would be considered as generalized for capturing the image, the VM gets deallocated and you cannot login to it again. In other words, the process of generalizing a Virtual Machine is not reversible and here overcome the situation Snapshot appears.

In this post, we will talk and walk through how to clone an existing Azure VM and keep old VM in working condition, i.e. both VM would be in running mode.

Snapshot an Azure Virtual Machine

In fact, a snapshot is a full, read-only copy of a Virtual Hard Drive (VHD) belongs to an Azure Virtual machine. Purpose behind taking snapshot of an OS or a data disk VHD to use as a backup, or to troubleshoot Azure Virtual Machine (VM) issues.

Using snapshot, you can create an Azure Virtual Machine by creating a managed disk from the snapshot and then attaching the new managed disk as the OS disk. You can either use the Azure Portal or instead, achieve the same through PowerShell.

In this hand on an activity, you will create a cloned VM using following steps as –

Validate existing VM
Create Snapshot from the VM
Create Managed Disk from the Snapshot
Create/Restore VM from the Managed Disk

Note: If you are moving ahead to use the snapshot to create a new VM, it is preferred that you quickly shut down the VM before taking a snapshot, to clear out any processes that are in progress.

Pre-requisites

Before proceeding, we need some pre-requisites to accomplish this Cloning VM task using Azure Portal.

Azure subscription, if you don't have an account then sign up for a free Azure account - https://azure.microsoft.com/en-gb/free/
An existing Windows Azure virtual machine (VM).

STEP – 1: Validate an Azure Virtual Machine (VM)

It is essential to exist a Windows based Azure Virtual Machine (VM) to accomplish this demo task, login to Azure portal https://portal.azure.com/.

On the left Hub menu, click All resources and select the existing virtual machine, verify the VM is either running or stopped.

STEP – 2: Move to attached Disk

Next, required to select the Disks option under the Settings section that will load the existing virtual machine disk blade. Here you can see the OS disk with brief details like size, storage type etc.

Now launch the OS disk blade by clicking the disk link, which appears in the OS disk list. Here you can see the brief details under the Overview section.

STEP – 3: Create a Snapshot

Post launches the OS disk blade, you can notice a + Create snapshot option exist on the menu tab, that is being used to create a snapshot.

By clicking the + Create snapshot option, the Create snapshot blade will be loaded and essential to fill the following parameters to proceed further –

Name – The name of snapshot.
Subscription – The name of your Azure subscription, it would be either free, paid or specific subscription etc.
Resource group – The name of the resource group used to organize related resources, you can either create a new resource or choose an existing one.
Location – The region where to store your snapshot information, it would be pre-selected by default.
Account Type – Select the Account type to use to store the snapshot. Go ahead with by default option as Standard HDD, unless you need the snapshot to be stored on a high-performing disk.

Once you are done with all required properties, proceed by clicking the Create button to create the snapshot. Promptly the process will be started and complete the creation of the snapshot, the same you can validate in the Notifications section.

STEP – 4: Validate the newly created Snapshot

In previous steps we have created the snapshot of the Azure Virtual Machine’s OS disk, now validate the snapshot. Go to Resource groups and here you can see a new entry of the newly created snapshot of the disk.

Along with this, you can notice the existing OS disk is also exist, that means we can still log on the old VM, will validate later.

By clicking the newly created snapshot, you can launch the Snapshot blade and verify the details like source disk, size, account type etc.

Congratulations, snapshot has been created!! 😊

STEP – 5: Create a Managed Disk

Post creation of a snapshot, essential to create a Managed disk so that further you can create a virtual machine (VM) from the disk accordingly.

Move to Hub menu and click the All services, which will launch the All service blade. Here in the All services search box, enter disk that will display the list of available disks.

Select the Disks that you would like to use; most probably, the first option is preferable, which will load the Disks blade.

Here you can see there is only one disk in the list, which is in fact the disk belongs to the old existing virtual machine. Now similar to this disk need to create a new managed disk, click the + Add button from the top menu that will load the Create managed disk where essential to fill the following parameters to proceed further –

Name – The name of your new managed disk.
Subscription – The name of your Azure subscription, it would be either free, paid or specific subscription etc.
Resource group – The name of the resource group used to organize related resources, you can either create a new resource or choose an existing one.
Location – The region where to store your storage account information, prefer to go with default-selected location.
Availability zone – Since this is demo task and it is not required so go with default-selected option as None.
Account Type – Select the Account type to use to store the snapshot. Go ahead with by default option as Standard HDD, unless you need the snapshot to be stored on a high-performing disk.
Source type – Here make sure the Snapshot should be selected.
Source snapshot – Select the created snapshot from the Source snapshot drop-down list.
Size – Size of disk as per need, by default it would be pre-populated based on a snapshot.

If required, make any other adjustments as needed and then proceed by clicking the Create button to create the managed disk. Quickly the process will be started and complete the creation of managed disk, the same you can validate in the Notifications section.

Along with it, if you reload the Disks blade, you can see this new entry in the disk list.

STEP – 6: Restore (Create) Virtual Machine from the Managed Disk

Now you have the managed disk VHD, next you can restore or better to say can create an Azure Virtual Machine (VM) using the same managed disk.

Time to load the Disks blade again, which list all available managed disks.

You can notice the newly created managed disk, proceed ahead by clicking the same and load the Disk blade.

In the Overview page, ensure that Disk state appears as Unattached. If it is not, you might need to either detach the disk from the source. Here from the top menu, click the + Create VM to proceed creating a new VM.

It will load the Create a virtual machine blade the time you click the + Create VM button inside the disk blade in the previous step.

On the Basics tab, under Project Details, make sure the following parameters need to be filled up correctly as –

Subscription – The name of your Azure subscription, it would be either free, paid or specific subscription etc.
Resource group – The name of the resource group used to organize related resources, you can either create a new resource or choose an existing one.

STEP – 7: Provide specifics for Instance Details

Under the Instance Details section of Create a virtual machine blade, provide following property details in the context of new VM as –

Virtual machine name – The name new VM which is being created using the snapshot.
Region – Since this is based on image so most probably it would be pre-selected, go with the default-selection.
Availability options – Go ahead with by default selection as No infrastructure redundancy required.
Image – You can see the earlier creared managed disk is already pre-selected, go with this selection.

If you create a fresh VM then most probably the size would be pre-selected with Standard D2s v3 2 vcpus, 8 GB memory options (might be variations in some scenario), but here it appears blank and asking for Select size due to selection of image/managed disk.

Next, click the Select size link, it will load the Select a VM size blade where you can choose an appropriate size for the new VM.

Once you select the desired size, click the Select button which will lead to back to Instance Details with selected option of Size.

STEP – 8: Provide specifics for Inbound Port rules

Under Inbound Port Rules - Public inbound ports, choose Allow selected ports and then select RDP from the drop-down.

Along with under the Save Money – Already have a Windows license, go with the default selection as No.

Next, proceed ahead by clicking the Review + create button at the bottom of the page, promptly the validation will be started.

STEP – 9: Review and Creating the VM

On the Create a virtual machine page, post validation success you can see the details for the VM you are about to create from the managed disk (based on snapshot).

If you feel each information is fine and ready, then click the Create button. It will take a few minutes for your VM to be deployed. Sooner you will be notified once the cloned VM is created successfully.

The new virtual machine using the cloned image is deployed successfully, time to connect the VM now.

STEP – 10: Connect to new Virtual Machine

Move to newly created VM Overview dashboard and copy the Public IP address or can click the Connect button from the menu bar to log on the VM.

As usual, you have both options to log on the VM using either mstsc utility and copied the public IP address or go ahead with the downloaded RDP file, I am moving with mstsc.

Note: Here the same administrator credentials will be required, which was provided during the provisioning of old virtual machine (VM).

Once you get connected with newly created Azure VM, here you can notice the same Operating Software, all other applications and settings are appearing which were part of previous VM.

Congratulation, Clone to an Azure VM using Snapshot done!! 😊

STEP – 11: Connect to old Virtual Machine

Since we created the new virtual machine based on Snapshot and Managed Disk option, respectively, henceforward still we can access the old virtual machine. In other words, still the old virtual machine is running.

Let us connect the same, move to old VM Overview dashboard and copy the Public IP address or can click the Connect button from the menu bar to log on the VM.

As usual, I am moving with mstsc utility and copied the public IP address to connect to the virtual machine.

Post get connected with the old Azure VM, here you can notice nothing has been changed neither Operating Software nor all other applications and settings.

Therefore, in previous article, we have seen that using sysprep.exe which is a part of Generalization process, the virtual machine (VM) is considered Generalized and cannot be restarted or access the same. In brief, using Generalized option we can create the cloned VM, but the old VM get deallocated so further you cannot access the old VM.

However, using the Snapshot option, we clone an existing Azure VM and keep old VM in working condition, i.e. both VM would be in running mode.

Keep visiting for further posts.

Pages

Monday, February 25, 2019

Azure Data Factory and Data Pipeline - Copy data from Azure Blob to Azure SQL

Azure Data Factory and Data Pipeline – Hands on activity

Pre-requisites

Validate the SQL Database and prepare the Table

STEP – 1: Launch SQL Database

STEP – 2: Launch Query editor (preview)

STEP – 3: Execute and prepare the SQL Table

Validate the existing Linked Services

STEP – 1: Launch Azure Data factory

STEP – 2: Validate existing Linked services

Create and configure Datasets

STEP – 1: Create a source Dataset (Azure Blob)

STEP – 2: Configure source Dataset JSON

STEP – 3 : Create and configure destination Dataset (Azure SQL)

Create and configure Pipeline

STEP – 1: Create a Pipeline

STEP – 2: Configured Pipeline JSON

Validate the Pipeline Status

Verify the Data has been copied at Azure SQL end

Friday, February 08, 2019

Clone an Azure Virtual Machine using Snapshot option

Clone an Azure VM

Snapshot an Azure Virtual Machine

Pre-requisites

STEP – 1: Validate an Azure Virtual Machine (VM)

STEP – 2: Move to attached Disk

STEP – 3: Create a Snapshot

STEP – 4: Validate the newly created Snapshot

STEP – 5: Create a Managed Disk

STEP – 6: Restore (Create) Virtual Machine from the Managed Disk

STEP – 7: Provide specifics for Instance Details

STEP – 8: Provide specifics for Inbound Port rules

STEP – 9: Review and Creating the VM

STEP – 10: Connect to new Virtual Machine

STEP – 11: Connect to old Virtual Machine