Secure Databricks Notebook Run in Azure Data Factory with Service Principal

We recently had a case where a customer was running Azure Databricks notebook from Azure Data Factory. They mentioned that the configuration used to run the notebook was not secure as it was associated with a user account (probably a service account). The username and password was known to multitude of users and it had already caused some trouble. The concept of a “service account” has been quite prevalent since the times of on-premise applications. The service account was a user account that was used to run various applications and nothing else. In the modern world of cloud services, service accounts are obsolete and should never be used.

He wondered if there was a better way to run such notebooks in a more secure way?

In a single line, the solution to this problem is running the notebook as a service principal (and not a service account).

Here is a detailed step by step guide on how to make this work for Azure Databricks notebooks running from Azure Data Factory:

  1. Generate a new Service Principal (SP) in Azure Entra ID and create a client secret
  2. Add this Service Principal to the Databricks workspace and grant “Cluster Creation” rights
  3. Grant this Service Principal User Access to the Databricks workspace as a user
  4. Generate a Databricks PAT for this Service Principal
  5. Create a new Linked Service in ADF using Access Token credentials of the Service Principal
  6. Create a new workflow and execute it

If you followed the above instructions, your Databricks notebook will be run as a Service Principal.

One thing to understand before going over the detailed steps is the fact that when you add a Databricks notebook as an activity in Azure Data Factory, you are creating an “ephemeral job” in Databricks in the background when the ADF pipeline is run. You can control what kind of job cluster gets created and the user credentials under which the job will be run based on the linked service settings. Let’s get started:

  1. Generate a new SP with client secret
  1. Go to portal.azure.com and search for “Microsoft Entra ID” in the search tab and click it
  2. Click on Add and select “App registration”
  3. Give a meaningful name and leave the rest as-is and click “register”
  4. You should now see something like below
  5. Now click on “Certificates & secrets” and under the “Client secrets” tab click “New client secret”
  6. Under the “Add a client secret” pop-up, give a meaningful description and choose the appropriate expiration window and click “create”. Once done you should see something like this under the “Client secrets” tab
  7. Now open a notepad and have all the information noted there.
    Host: Databricks workspace URL
    Azure_tenant_id and Azure_client_id will be what is shown in the image under point 4
    Azure_client_secret will be the value in the image under point 6

2. Add this Service Principal to the Databricks workspace and grant “Cluster Creation” rights

  1. Now go to your target workspace and click on your Name to expand the menu, click open the settings( this requires workspace admin privilege)
  2. Now click on” identity and access” and under service principals click on manage

Click on “add service principal” button
Now click on Add new
Select the “Microsoft Entra ID managed” radio button and populate with the Application ID(client id) and give the same name as shown in point 4 and click Add

Now provide “Allow cluster creation” permission to the service principal to ensure it can create a new job cluster when we run through an ADF and click “update”

Generate a Databricks PAT for this Service Principal

  1. If you are on Mac open the terminal and use the command “nano .databrickscfg” to open the databricks config file and paste the information from the notepad from point 8.
    If you are on Windows press windows + R and type “notepad
    %APPDATA%\databricks\databricks.cfg” and press enter. Now paste the information from the notepad from point 8 into the Databricks config file. You will also have to give a meaningful profile name to identify the workspace in the config file.

  2. The config file should look like below once you are done. [hdfc-dbx] is the profile name here.Save and exit.
  3. Now go to Databricks CLI and authenticate the service principal’s profile using a PAT token by typing “databricks configure”. It will prompt for the host workspace’s URL and PAT token, provide it, and press enter.
  4. Now goto databricks cli and generate a PAT token for the Service Principal using the command “databricks tokens create –lifetime-seconds 86400 -p <profile name>”. You should see a proper response like below:
  5. Now grab the “token value” and keep it in a notepad safely

Create a new Linked Service in ADF using Access Token credentials of the Service Principal

  1. Now go to ADF, click on Manage -> Linked services -> New
    Under New Linked Service popup click on “compute” and select “Azure Databricks” click continue
  2. Now provide a meaningful name to the linked service, select the proper Azure subscription, and your target workspace. Under Authentication Type leave the option as Access Token and under the access token field update the access token we generated in point 13.(This is being done as an interim solution, the best practice is to use an Azure Key Vault and pass the token through secret).
  3. Now select the cluster version as 13.3 LTS, Node type, Python version, and Worker options as relevant to current cluster configuration. You can select the UC Catalog Access Mode as “Assigned” for single user and “Shared” for shared access mode. Click on test connection to ensure everything is working as intended.

Create a new workflow and execute it

  1. Now create a pipeline select the linked service that you created and point the ADF to a notebook.
  2. Click “Debug” and the job should run successfully

If you open the job execution link, you can see the job has been run as a Service Principal under task details

This is indicated by the value against “Run as” which is that of the Service Principal.

Azure Databricks – What is it costing me?

A customer wanted to know the total cost of running Azure Databricks for them. They couldn’t understand what DBU (Databricks Units) was, given that is the pricing unit for Azure Databricks.

I will attempt to clarify DBU in this post.

DBU is just really an abstraction not related to any amount of compute or compute metrics. At the same time, it changes with the following factors:

  1. When there is a change in the kind of machine you are running the cluster on
  2. When there is change in the kind of workload (e.g. Jobs, All Purpose)
  3. The tier / capabilities of the workload (Standard / Premium)
  4. The kind of runtime (e.g. with or without Photon)

So, the DBUs really measure the consumption of resources. It is billed on a per second basis. The monetary value of these DBUs is called $DBU and is determined by a $ rate charged per DBU. Pricing rate per DBU is available here.

You are paying DBUs for the software that Databricks has made available for processing your Big Data workload. The hardware component needed to run the cluster is charged directly by Azure.

When you visit the Azure Pricing Calculator and look at the pricing for Azure Databricks, you would see that there are two distinct sections – one for compute and another for Databricks DBUs that specifies this rate.

The total price of running an Azure Databricks cluster is a combination of the above two sections.

Let us now understand if we are running a cluster of 6 machines what is the total cost likely to be:

For the All Purpose Compute running in West US in Standard tier with D3 v2, the total cost for compute (in PAYG model) is likely to be USD 1,222/ month.

From the calculator you can see that DBU consumption is 0.75 DBU with a rate of USD 0.4/ hr. So, the total cost of running 6 D3 v2 machines in the cluster will be $ 219 * 6 = $ 1,314.

The total cost hence would be USD 2,536.

From an optimization perspective, you can reduce your hardware compute by making reservations e.g. 1 yr or 3 yrs in Azure. DBUs from the Azure calculator doesn’t have any upfront commitment discounts available. However, they are available if you contact Databricks and request for discount on a fixed commitment.

If you are interested in saving costs while running Databricks clusters, here are two blog posts that you may find interesting:
https://medium.com/similarweb-engineering/how-we-cut-our-databricks-costs-by-50-7c60d6b6c069
https://www.databricks.com/blog/2022/10/18/best-practices-cost-management-databricks.html

Code-less way to authenticate to Azure Resource Manager API from Azure App Services

This is a guest post by Sujay Sarma:

Typical examples that show you how to connect from a web application to Azure Resource Manager API have you wading through configuring and meddling with OAuth and Owin, not to mention getting you confused between ADAL, MSAL and the different types of Active Directory tenants offered by Azure. We do not need ANY of that, especially if your web application is going to live on as an Azure App Service.

Teeth gnashing? Mouth Salivating?

dreamstime_xl_43928193The short answer is to use “Azure App Service Authentication”. And it is nothing new. It has been around since at least November 2014 (wow! a little over four years since!). At least for me, though I have seen it plenty of times while configuring my App Services in Azure, I have scarcely looked at what it can do. Until now.

A project I was working on for a client required authenticating Azure subscribers to the portal. Initially, I went with the regular walkthroughs. I went into my Azure Active Directory blade and under App Registrations, created a new app, secrets and so on. But I faced really strange issues: There were no issues for me (developer never faces issues and has zero bugs on their dev box, yeah?). But my client contact could not login. He was using a Hotmail login address. I had to add him as a Guest User in my Active Directory tenant! That was not going to be a viable action plan for any further step of the project.

The problem, I determined, was that folks on my tenant could log in, but not others. Strangely, another friend was able to login — his account was a custom domain hosted within another Azure Active Directory Tenant (he was using his Organization ID and apparently they were Azure subscribers as well).

So, I tried to use Azure B2C. This is an poorly documented system, where the current documentation and the portal’s user experience are so different. Not only that, there is a lot of confusing terminology used in the documents — and you have to register “apps” in at least three places, not to mention the Web App I was trying to configure! Short story: It was a mess!!!

I told everybody I was giving up on the issue. We would find some “manual” way to get people to authenticate. That was when, an unrelated Google search threw up the page on App Service Authentication. I sent the URL to my mobile to read it during dinner and turned off my computer for the day. Even after I had read the article in question, I only thought of writing a small POC to see what it could do. The next morning, I sat down to set it up. And boy, oh boy! was I in for a pleasant surprise!

To save people the trouble of having go through the same trial and error I did, here is a concise walkthrough of how to do it. I must thank Chris Gillum, whose 2016 blog post clued me into the right course correction to get everything working.

The Walkthrough

  1. Log in to the Azure Portal.
  2. If it is already on the menu on your left/right hand-side, use that. Otherwise, click “All Services” and search there. Go into Azure Active Directory”. If you’re having a hard time chasing it down, click here to go there directly (you maybe prompted to login).
  3. Now click on the “App registrations (Preview)” item. The official documentation follows the flow of going into the other “App registrations” — do not do that, that will end up giving you “OAuth v1.0” tokens. We need “OAuth v2.0” tokens. I found this out by trial and error. Click here to go to the right blade.
  4. Open a Notepad window.
  5. Along the top of the applications view, find the button that says “Endpoints” and click that. Towards the bottom of the pane full of URLs, find the one that says “WS-Federation sign-on endpoint” (third from the bottom at this time). Copy that FULL address and paste it into your Notepad. Now, in Notepad, carefully delete the “/wsfed” from the end of that address — be careful not to delete anything before the “/”. To be safe, you can hit CTRL+H, in Find, type “/wsfed”, leave Replace as blank and hit “Replace All”.
  6. Now hit “+ New Registration”. Enter any name. Be aware that what you enter here will be shown in big bold letters when Azure later asks the user trying to login for consent (the “…. is asking you for permission to access…” UI). Select the option “Accounts in any organizational directory”. Leave the “Redirect URI” blank for now, we will come back to it later. Click on Register.
  7. Once the Azure Portal tells you that the application was deployed successfully, find it again in the same “App Registrations (Preview)” screen and click on it to enter it.
  8. From the overview page, find the “Application (client) ID”. It will be a Guid. Hover on the value to make the “copy” icon appear. Click it to copy it.
  9. Switch to your Notepad window, type in “App ID”, hit ENTER and paste what you copied on step 7.
  10. OPTIONAL. Back on the Azure Portal, go into “Branding” and upload a logo. a picture of size 48×48 pixels works best. Anything else will cause the consent screen to appear in strange shapes and sizes. What you enter into the various URL fields there is not relevant — they will be used to show information links at the bottom of the consent screen. You may leave them blank or enter valid URLs into them — they need not even be on your website!
  11. IMPORTANT. Go into “Certificates & Secrets”. Under the “Client secrets” heading, click “+ New client secret”. Enter a name (does not matter, it is for your convenience), select an expiry value (“Never” is all you need). Click Add. When the new password is generated, it will be shown there. Again, hover on the value under the “VALUE” heading to make the little icon appear and copy it (if you don’t copy it fully, you will be in a world of pain).
  12. Switch to your Notepad window, type in “Secret”, hit ENTER and paste what you copied in step 10.
  13. IMPORTANT. Go into “API Permissions”. Click on “+ Add a permission”. Select “Microsoft Graph” (at the time of writing this, it is a large banner like button right on top of the list that appears). Select “Delegated Permissions”. Check ON: email, offline_access, openid and profile. Scroll to the bottom and find “User”, expand it and check ON “User.Read”. Click “Add permissions” at the bottom.
  14. Now click the “+ Add a permission” again. This time, select “Azure Service Management” (there are many similar looking “Azure” permissions on the list, select the right one). There is only one permission at this time, select it (or select “user_impersonation” if you find more permissions when you’re reading this!). Click “Add permissions” at the bottom.
  15. Right-click on the “App Services” menu item on the navigation (or find it under “All Services”) and select to open it in a new tab — you need to come back to the settings you were working with so far — we are not done there yet! Anyway, end up here.
  16. NOTE: If you already have an app service that you are configuring this for, you can use that. Otherwise, create a new app service. There is nothing special to be done there — and you don’t need to upload any code YET. Once you have selected the app service or created one, continue below.
  17. Select the App Service, open its Authentication/Authorization blade. Set the “App Service Authentication” option to “On”. Immediately a bunch of options will appear below it.
  18. We are only interested in the “Azure Active Directory” option in this walkthrough. So select that. A new blade will open.
  19. Select “Advanced”. A different set of options will appear under it.
  20. For “Client ID”, paste the value pasted under “App ID” from your Notepad window.
  21. Under “Issuer Url” paste the URL you pasted from Step 5 above (it will look like “https://login.microsoftonline.com/…&#8221;).
  22. Under “Client Secret”, paste in the value under “Secret” from your Notepad window.
  23. Now, this is very important. Under “Allowed Token Audiences”, first paste in “https://management.core.windows.net/&#8221;, tab out. Another text box will appear. Now paste in “https://management.azure.com/&#8221; (ensure the final “/” are there). This tells the system to get you the Bearer Tokens that will work with the Azure REST API 🙂 This is the secret magic sauce to the whole thing!
  24. Click OK.
  25. Back on the “Authentication / Authorization” blade, select one of “Allow Anonymous requests (no action)” or “Login with Azure Active Directory”. If you plan to show a “Sign in” link on your website — that is, you want the user to see something before they need to login, then use the “Allow Anonymous requests (no action)” option. If like with the Azure Portal, you want them to be signed in from the get-go, use the “Login with Azure Active Directory” option.
  26. Ensure “Token Store” (bottom of the page, under “Advanced settings”) is “On”.
  27. Click “Save” on top of the page to save everything.
  28. IMPORTANT CAVEAT: If at any point of time, you make changes to this set up, you will need to Restart your App Service before it will use the new values.
  29. Go into the “Overview” blade of your App Service. Wait for all the properties on the top panel to load, and copy the full value of the “URL” (“https://xyz.azurewebsites.net&#8221;). Note how this is “https” ?
  30. Now go back to the Azure Directory screen — if you left it open in another tab or window at the end of step 14, switch to that tab. Otherwise, navigate to it from the menu, or click here.
  31. Ensure you are within the Azure Directory Application (the one you configured from step 3 to 14).
  32. Click into the “Authentication” tab.
  33. Under “Redirect URIs”, ensure “Web” is selected for “TYPE”, under “REDIRECT URI”, paste in the URL to the App Service (Step 29). At the end of this URL, paste in “/.auth/login/aad/callback” [be careful to paste in everything between the quotes]. Note that there is a dot (“.”) in front of “auth”. Your final URL should look like: “https://contoso.azurewebsites.net/.auth/login/aad/callback“.
  34. Scrolling down, under Advanced settings, paste/enter the Logout URL to make it thus: “https://contoso.azurewebsites.net/.auth/logout“. Again, note the “.” in front of “auth”.
  35. Scrolling down, under “Implicit grant”, check ON both “Access tokens” and “ID tokens”.
  36. Click Save above.

Your Azure configuration is DONE.

From your App Service Code

Fire up Visual Studio, create a new web application. I am using a regular Web Forms application. You are free to do this in MVC or .NET Core or whatever. I am using Visual Studio 2019, and selected the “ASP.NET Web Application (.NET Framework)” option. If you are prompted to select the type of authentication — leave it as “No authentication”.

You do not need to install new NuGet packages! Azure App Service automatically fetches the right Bearer token for you (without any plumbing!). This is available to you in the Request Header “X-MS-TOKEN-AAD-ACCESS-TOKEN”. Fetch it from:

string token = Request.Headers[“X-MS-TOKEN-AAD-ACCESS-TOKEN”];

You can now pass this token to your AzureRM REST API calls. I do not use any of the Azure SDKs to talk to AzureRM, and write System.Net.HttpClient based GET/PUT/etc calls. My code to pull all the subscriptions for a logged in user now looks like this:

HttpRequestMessage request = new HttpRequestMessage(HttpMethod.Get, “https://management.azure.com/subscriptions?api-version=2019-03-01&#8221;);

request.Headers.Authorization = new AuthorizationHeaderValue(“Bearer”, Request.Headers[“X-MS-TOKEN-AAD-ACCESS-TOKEN”]);

HttpResponseMessage response = await client.SendAsync(request);

Simple, huh?

Telerik Platform

Resources for Webinar “Consume Azure Mobile Services in Telerik Platform”

On Aug 27 2015, we conducted one more webinar as part of Telerik India. This time we talked about how to “Consume Azure Mobile Services in Telerik Platform“. This blog post is a recap of the webinar. You will find the video recording of the webinar in this blog post.

About Telerik Platform:

Telerik Platform is a modular platform for web, hybrid and native development that integrates a rich set of UI tools with powerful cloud services. This end-to-end development and project management solution provides tools and services for every stage of your application lifecycle – from idea to deployment and on-device performance.

Telerik Platform

You can know more about Telerik platform here.

Video Recording:

Here is the video recording of the webinar:

 

There is no slide deck for this webinar. I did not use any slide deck and did most of the things hands on.

 

Till next time – Happy Coding.