Azure service cloud summarized: Part I

Practicing DatScy
6 min read · Apr 24, 2023

Over the past few years, Data Science has migrated from individual computers to service cloud platforms. One can only train and manage so many algorithms/commands on a single computer, so it is attractive to use a service cloud platform with more compute, storage, and deployment options. Learning the framework of a service cloud platform is time consuming and frustrating because there is a lot of new information from many different computing fields (computer science/databases, software engineering/development, data science/scientific engineering & computing/research), and it is difficult to truly understand the functionalities if you do not read about them a lot and try them yourself.

I just finished learning Azure’s service cloud platform using Coursera and the Microsoft Learning Path for Data Science. The Coursera class is direct and to the point, and gives concrete instructions on how to use the Azure Portal interface, Databricks, and the Python SDK; if you know nothing about Azure and need to use the platform right away, I highly recommend this course. But it does not cover all of the different functionalities and services, like Data Factory/Linked Services/Synapse Analytics (how to combine and manage databases, ETL), Cognitive Services/Form Recognizer (how to do image, text, and audio processing), IoT, deployment, and GitHub Actions (running Azure scripts from GitHub). I highly recommend finding the learning track for your job and completing all of the modules; it gives a full understanding of the features on the platform. It will take a couple of months, but it is worth it!

In this post I list six important services that I think could help in performing Data Science tasks quickly. In my last consulting job, I was asked to do tasks that Data Factory and Form Recognizer (or their AWS/Amazon equivalents) can do easily. But since I did not know Azure or AWS, I was painfully re-coding them by hand with Python and pandas; knowing these services on the cloud platform could have saved me a lot of time, energy, and stress.

Quick Guide to create objects for 6 main types of projects

  1. Data Factory (Linked Services, DataSets, Activities) : Ingestion/importation of data from unstructured/unorganized sources (e.g., blob storage) to structured sources (e.g., SQL)
  2. Cognitive Services (Language, Speech, Vision, Decision) : Applications include text-to-speech, speech recognition, image-to-text (Form Recognizer), computer vision (object tracking and detection), face recognition, text analysis (chatbots, translation), content moderation/personalization, and anomaly detection
  3. Azure Synapse Analytics : Extract, Transform, Load (ETL) of ingested data
  4. Machine Learning (ML) Azure : Pre-process/transform features, create and monitor ML/DL models & parameters
  5. Azure (DevOps) Pipelines with GitHub : Launch and monitor an Azure project completely from scripts. GitHub is useful in comparison to DevOps Pipelines alone because many people can work on the project using branches, and new changes can be automatically run and tracked using versioning.
  6. Databricks Azure : A computing platform, like Machine Learning Azure, with optimized data management

Background information about how to create Azure objects

If you have never used Azure it will appear challenging at first, but there are 3 main steps to follow to set up each project:

  1. Create the objects needed for each project using the Azure Portal (the GUI/Marketplace), the command line (CLI), or the SDKs (Python, C#, Java, Go, etc).
  2. Collect the objects together by specifying authorization and locations (connection keys, IP addresses, etc); a small example is sketched after this list.
  3. Make a script (SDK, YAML) that launches the sequential process of running the objects, and specify where to put the output information/results (MLflow, dashboard, etc).
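
Step 2 usually amounts to looking up the keys and endpoints that let one object talk to another. A minimal sketch using the CLI, assuming a hypothetical storage account named mystorageacct12345 in a resource group named myresourcegroup:

```bash
# Retrieve the connection string for a storage account, so other objects
# (Data Factory, Synapse, an ML workspace) can authenticate to it
az storage account show-connection-string \
  --name mystorageacct12345 \
  --resource-group myresourcegroup \
  --output tsv

# Or list the access keys directly, if you prefer key-based authentication
az storage account keys list \
  --account-name mystorageacct12345 \
  --resource-group myresourcegroup \
  --output table
```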

In this post I will just outline what objects are needed for each type of project, and how to create the objects using the CLI. In my opinion, the CLI is more comprehensive than the SDKs in terms of object creation because it can be used to create any Azure object; there are some objects, like Machine Learning Studio, that cannot be created using the SDK, since the SDK was made only to reference an existing object. You can use the bash or PowerShell CLI; all the commands in this post are in bash.

Set up the terminal/CLI

Log in to the Azure Portal (https://portal.azure.com/) and click on Cloud Shell

OR

Install the CLI tool for your PC : https://learn.microsoft.com/en-us/cli/azure/install-azure-cli

Now you are ready to enter the commands! Once you're familiar with these commands, it is best practice to make a script (e.g., a .sh file for Linux) that creates all of these objects with one button push.

Good practices

  1. Log into Azure
  2. Identify your Azure subscription ID
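
Both steps can be done with a few short commands; the subscription name below is a hypothetical placeholder:

```bash
# 1. Log into Azure (opens a browser window for authentication)
az login

# 2. List your subscriptions and note the one you want to work in
az account list --output table

# Make that subscription the active one (replace with your subscription name or ID)
az account set --subscription "My-Subscription-Name"

# Confirm which subscription is currently active
az account show --output table
```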

Needed objects

For all projects you need the following objects:

  1. Resource Group
  2. Storage Account
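
A minimal sketch for creating both; the names and region are hypothetical placeholders (storage account names must be globally unique, lowercase, and alphanumeric):

```bash
# Create a resource group to hold all of the project's objects
az group create \
  --name myresourcegroup \
  --location eastus

# Create a general-purpose storage account inside the resource group
az storage account create \
  --name mystorageacct12345 \
  --resource-group myresourcegroup \
  --location eastus \
  --sku Standard_LRS
```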

[1] Data Factory

The purpose of Data Factory is to import many different data sources (databases, folder of videos or images, text files, etc) and then map them to an organized database. There are 3 important functions:

  1. Linked Services : Add your input and output data sources. For example, if you have a blob storage container (which is like a folder) holding hundreds of text files and you want to put them into one SQL database, you assign the input source to be your blob storage and the output source to be your SQL database.
  2. DataSets : A DataSet is a template of how you wish your output data to be organized; I actually had to code this by hand at a consulting job, and it is just organizing columns of data in a certain order, as one would do in pandas. Returning to the folder of hundreds of text files uploaded to blob storage, say you wish the data to be organized in the database with a certain column order. To specify it, you can either 1. use the Data Factory GUI and type in the column names in the desired order, or 2. run an SQL query that creates an empty table with the desired column order and then associate this table with your blob storage data in Data Factory.
  3. Activities : Apply additional transformations that you specify to the data in the output source.

Needed objects for the most classical case (the example of transferring a folder of hundreds of similarly formatted text files to an SQL database); a CLI sketch follows the list:

1. Data Factory
2. Input source (Create/Provision a container in a Storage Account)
3. Output source (Azure SQL Database)
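
A minimal sketch of creating these three objects with the CLI, following the quickstarts linked below. The names, region, and credentials are hypothetical placeholders, and the az datafactory commands require the datafactory CLI extension:

```bash
# Data Factory commands live in a CLI extension
az extension add --name datafactory

# 1. Create the Data Factory
az datafactory create \
  --factory-name mydatafactory \
  --resource-group myresourcegroup \
  --location eastus

# 2. Input source: a blob container in the storage account created earlier
az storage container create \
  --name inputtextfiles \
  --account-name mystorageacct12345

# 3. Output source: a logical SQL server plus a database
az sql server create \
  --name mysqlserver12345 \
  --resource-group myresourcegroup \
  --location eastus \
  --admin-user sqladminuser \
  --admin-password "<a-strong-password>"

az sql db create \
  --name mydatabase \
  --server mysqlserver12345 \
  --resource-group myresourcegroup \
  --service-objective S0
```

The Linked Services and DataSets that map the container to the database can then be added in the Data Factory GUI, or with az datafactory linked-service create / az datafactory dataset create using JSON property files (see the Data Factory quickstart below).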

Information Sources:

  1. Data Factory creation — https://learn.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-azure-cli
  2. SQL server & Database creation — https://learn.microsoft.com/en-us/azure/azure-sql/database/single-database-create-quickstart?view=azuresql&tabs=azure-cli
  3. Blob storage creation — https://learn.microsoft.com/en-us/cli/azure/storage/blob?view=azure-cli-latest

[2] Cognitive Services

Information Sources:

  1. Cognitive Services creation — https://learn.microsoft.com/en-us/cli/azure/cognitiveservices/account?view=azure-cli-latest#az-cognitiveservices-account-create
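
A minimal sketch of creating a Cognitive Services resource with the CLI; the name, kind, and SKU below are hypothetical choices (the --kind flag selects either a multi-service account or a single service such as FormRecognizer):

```bash
# Create a multi-service Cognitive Services account
az cognitiveservices account create \
  --name mycognitiveservice \
  --resource-group myresourcegroup \
  --kind CognitiveServices \
  --sku S0 \
  --location eastus \
  --yes

# Retrieve the endpoint and keys needed to call the service from code
az cognitiveservices account show \
  --name mycognitiveservice \
  --resource-group myresourcegroup \
  --query properties.endpoint

az cognitiveservices account keys list \
  --name mycognitiveservice \
  --resource-group myresourcegroup
```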

[3] Azure Synapse Analytics

Information Sources:

  1. Azure Synapse Analytics creation — https://learn.microsoft.com/en-us/azure/synapse-analytics/quickstart-create-workspace-cli
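
A minimal sketch of creating a Synapse workspace with the CLI, following the quickstart linked above; names and credentials are hypothetical placeholders, and the workspace needs a Data Lake Storage Gen2 account and file system:

```bash
# Synapse requires a Data Lake Storage Gen2 account (hierarchical namespace enabled)
az storage account create \
  --name mysynapsestorage12345 \
  --resource-group myresourcegroup \
  --location eastus \
  --sku Standard_LRS \
  --enable-hierarchical-namespace true

# Create the Synapse workspace on top of that storage account
az synapse workspace create \
  --name mysynapseworkspace \
  --resource-group myresourcegroup \
  --storage-account mysynapsestorage12345 \
  --file-system synapsefilesystem \
  --sql-admin-login-user sqladminuser \
  --sql-admin-login-password "<a-strong-password>" \
  --location eastus
```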

[4] Machine Learning (ML) Studio Azure

Information Sources:

  1. ML Studio creation — https://learn.microsoft.com/fr-fr/cli/azure/ml/job?view=azure-cli-latest

You can use the Python SDK to run training and test scripts, and to create and train models. Be sure to create an Environment for the ML workspace; an environment defines the Python packages needed to run the training and test scripts on the mlcomputeclustername compute cluster.
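
A minimal CLI sketch of the surrounding objects (workspace, compute cluster, environment, and a job submission). The names and YAML files are hypothetical placeholders, and the az ml commands require the ml CLI extension:

```bash
# Azure Machine Learning commands live in a CLI extension
az extension add --name ml

# Create the ML workspace
az ml workspace create \
  --name mymlworkspace \
  --resource-group myresourcegroup

# Create the compute cluster that the training and test scripts will run on
az ml compute create \
  --name mlcomputeclustername \
  --type AmlCompute \
  --size Standard_DS3_v2 \
  --min-instances 0 \
  --max-instances 2 \
  --resource-group myresourcegroup \
  --workspace-name mymlworkspace

# Register an environment (the conda/docker definition in environment.yml is assumed)
az ml environment create \
  --file environment.yml \
  --resource-group myresourcegroup \
  --workspace-name mymlworkspace

# Submit a training job defined in a YAML file (job.yml is assumed)
az ml job create \
  --file job.yml \
  --resource-group myresourcegroup \
  --workspace-name mymlworkspace
```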

Outline of a Python SDK script to run a model

Note: not everything may be correct, but this is an outline for how to run and deploy an ML/DL model using the Python SDK.

Information Source:

  1. Build and Operate Machine Learning Solutions with Azure — https://www.coursera.org/learn/build-and-operate-machine-learning-solutions-with-azure

[5] Azure (DevOps) Pipelines with GitHub

Go to https://azure.microsoft.com/services/devops/?portal=true and select Start free with GitHub.

This creates a DevOps organization for your account at https://dev.azure.com/{Azureusername} and connects it to your GitHub account.

When you launch a pipeline from the Python SDK described above, you will see it here! You can monitor the run and manage changes to the deployed model from the different areas of the DevOps platform.
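
If you prefer to wire the pipeline up from the CLI instead of the portal, a minimal sketch follows; it assumes the azure-devops CLI extension, a hypothetical organization and project, and an azure-pipelines.yml file already committed to the GitHub repository:

```bash
# Azure DevOps commands live in a CLI extension
az extension add --name azure-devops

# Point the CLI at your DevOps organization and project
az devops configure --defaults \
  organization=https://dev.azure.com/Azureusername \
  project=MyProject

# Create a pipeline from a GitHub repository that already contains
# an azure-pipelines.yml definition (you will be prompted to authorize GitHub)
az pipelines create \
  --name my-github-pipeline \
  --repository https://github.com/myuser/myrepo \
  --branch main \
  --yml-path azure-pipelines.yml
```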

[6] Databricks Azure

Information Sources:

  1. Databricks Azure creation — https://learn.microsoft.com/fr-fr/cli/azure/databricks?view=azure-cli-latest
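
A minimal sketch of creating an Azure Databricks workspace with the CLI; the names are hypothetical placeholders, and the az databricks commands require the databricks CLI extension:

```bash
# Databricks commands live in a CLI extension
az extension add --name databricks

# Create the Databricks workspace
az databricks workspace create \
  --name mydatabricksworkspace \
  --resource-group myresourcegroup \
  --location eastus \
  --sku standard
```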

I really hope that this short summary of how to use Azure helps!

Happy Practicing! 👋
