Copy Data in Azure Databricks Table from one region to another

One of our customers needed to copy data that was locked in an Azure Databricks table in a specific region (let’s say the eastus region). The tables were NOT configured as Delta tables in the originating region, and only a subset of personnel had access to both regions.

However, the analysts were using another region (let’s say the westus region), because it was properly configured with the appropriate permissions. The requirement was to copy the Azure Databricks table from the eastus region to the westus region. After a little exploration, we couldn’t find a direct or simple way to copy data from one Databricks region to another.

One of our first thoughts was to use Azure Data Factory with the Azure Databricks Delta Lake connector. This would be the simplest option, because we would only need a Copy Data activity in the pipeline with two linked services. The source would be a Delta Lake linked service pointing to the eastus tables, and the sink would be another Delta Lake linked service pointing to the westus tables. This solution ran into two practical issues:

  1. The source table was not a Delta table. This prevented us from using the Delta Lake linked service as the source.
  2. When the sink of a copy activity is not Blob Storage or ADLS, the copy requires a staging Blob Storage account. We were able to link a staging storage account, but the connection could not be established at run time because of authentication errors. The pipeline error looked like the following:
Operation on target moveBlobToADB failed: ErrorCode=AzureDatabricksCommandError,Hit an error when running the command in Azure Databricks. Error details: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Unable to access container adf-staging in account xxx.blob.core.windows.net using anonymous credentials, and no credentials found for them in the configuration. Caused by: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Unable to access container adf-staging in account xxx.blob.core.windows.net using anonymous credentials, and no credentials found for them in the configuration. Caused by: hadoop_azure_shaded.com.microsoft.azure.storage.StorageException: Public access is not permitted on this storage account..

On digging deeper, we found that this is documented as a prerequisite: the Azure Databricks (ADB) cluster must have credentials for the staging storage account in its configuration. We wanted to use the access key method, but it requires the key to be added to the ADB cluster configuration, and we didn’t have access to modify the cluster configuration.

To work around this, we found the following alternatives:

  1. Use a Databricks notebook to read data from the non-Delta tables in Databricks and store it in a staging Blob Storage account.
  2. To upload the data into the destination table, we will again need to use a Databricks notebook as we are not able to modify the cluster configuration.

Here is the solution we came up with:

In this architecture, the ADB 1 notebook reads data from Databricks Table A.1 and stores it in the staging Blob Storage account in Parquet format. Only specific users are allowed to access the eastus data tables, so the notebook has to run under their account. The linked service configuration for the Azure Databricks notebook requires us to specify the workspace URL, cluster ID and personal access token manually. All of the data transfer in this step happens within the same region, so no bandwidth charges accrue.

Next, the ADB 2 notebook reads the Parquet file from Blob Storage and loads the data into the Databricks Delta table A.2.

Azure Data Factory orchestrates this sequence. We use the pipeline Run ID as the file name on the storage account, and pass it to the notebooks as a parameter. The pipeline is configured to run daily.
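Because both notebooks derive the same location from the Run ID, ADB 2 reads exactly what ADB 1 wrote for that run, and one day's run does not overwrite another's. The sketch below shows that convention in C# to match the other code in this document. The notebooks themselves would normally be PySpark or Scala. The storage account name, the container and the `.parquet` suffix are assumptions. In a real pipeline, the Run ID would come from ADF's `@pipeline().RunId` system variable and be passed to each notebook as a base parameter.

```csharp
using System;

// Hypothetical helper: builds the staging location shared by the ADB 1 (writer)
// and ADB 2 (reader) notebooks from the ADF pipeline Run ID.
public static class StagingPaths
{
    public static string ForRun(string account, string container, string runId)
    {
        if (string.IsNullOrWhiteSpace(runId))
            throw new ArgumentException("Run ID is required", nameof(runId));

        // wasbs:// is the Blob Storage URI scheme understood by Spark/Databricks:
        // wasbs://<container>@<account>.blob.core.windows.net/<path>
        return $"wasbs://{container}@{account}.blob.core.windows.net/{runId}.parquet";
    }
}
```

For example, `StagingPaths.ForRun("stagingacct", "adf-staging", runId)` produces the single path that both notebooks would use for that run.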

Running the pipeline daily would leave a lot of data in Azure Blob Storage, because no step cleans up the staging files. We used Azure Blob Storage lifecycle management to automatically delete all files that have not been modified for 15 days.
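The cleanup rule itself is configured on the storage account, either in the portal or as a JSON lifecycle policy whose delete action uses `daysAfterModificationGreaterThan`. The C# sketch below only illustrates the condition that such a rule evaluates for each staging blob. The class and method names are hypothetical, and nothing here calls Azure.

```csharp
using System;

// Hypothetical illustration of the lifecycle-management condition:
// a blob becomes eligible for deletion once it has gone more than
// `days` days without being modified.
public static class StagingCleanupRule
{
    public static bool IsEligibleForDeletion(
        DateTimeOffset lastModified, DateTimeOffset now, int days = 15)
    {
        return (now - lastModified) > TimeSpan.FromDays(days);
    }
}
```

The platform applies the policy in a periodic background run, so deletion is not instant. A file can outlive the 15-day mark by up to about a day.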

Azure AD Authentication in ASP.NET Core Web API

The next step in building an API is to protect it from anonymous users. Azure Active Directory (AD) is an identity platform that can be used to secure our APIs. Once authentication is enabled, callers must provide an OAuth 2.0 bearer token (a JWT) to access our API.

Let us begin to implement Azure AD Authentication in ASP.NET Core 5.0 Web API.

I will create an ASP.NET Core 5.0 project and show you, step by step, how to enable Azure AD authentication on it. We will do this with the Microsoft.Identity.Web package from NuGet, which builds on MSAL.

Prerequisites

Before you start following the steps in this article, you will need an Azure account and Visual Studio 2019 with a .NET 5.0 development environment set up.

Creating ASP.NET Core 5.0 web application

Open Visual Studio, click “Create a new project” on the right, select “ASP.NET Core Web API” as shown in the image below, and click Next.

In the “Configure your new project” step, enter the name and location of your project as shown in the image below, and click Next.

In the “Additional information” step, select .NET 5.0 as the target framework, set Authentication Type to None, check the “Configure for HTTPS” checkbox, and click Create.

Configuring ASP.NET Core 5.0 App for Azure AD Authentication

Open appsettings.json in your web API and add the following section to the root JSON object:

"AzureAd": {
    "Instance": "https://login.microsoftonline.com/",
    "Domain": "qualified.domain.name",
    "ClientId": "your-client-id",
    "TenantId": "your-tenant-id"
}

Replace qualified.domain.name with your Azure AD domain. Replace your-client-id and your-tenant-id with the values you copied while registering the app in Azure AD.

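Next, open the Package Manager Console and install the following two NuGet packages:

Install-Package Microsoft.AspNetCore.Authentication.JwtBearer
Install-Package Microsoft.Identity.Web

Then add the corresponding using directives at the top of Startup.cs:

using Microsoft.AspNetCore.Authentication.JwtBearer;
using Microsoft.Identity.Web;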

Next, open Startup.cs in your project and add the following code to the ConfigureServices method:

   services.AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
               .AddMicrosoftIdentityWebApi(Configuration.GetSection("AzureAd"));

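Next, in Startup.cs, go to the Configure method and add the app.UseAuthentication(); line before the app.UseAuthorization(); line. The order matters: the authentication middleware has to establish the caller’s identity before the authorization middleware can check it.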

Next, open any controller and add the [Authorize] attribute (from the Microsoft.AspNetCore.Authorization namespace):

    [Authorize]
    [Route("[controller]")]
    [ApiController]
    public class SupportController : ControllerBase
    {

    }

Save all files and run your project.

Once the project is running, any attempt to call an action in SupportController from the browser returns HTTP ERROR 401 (Unauthorized), because the browser sends no bearer token.
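To call the protected API successfully, a client has to send an access token in the Authorization header using the Bearer scheme. The sketch below uses only System.Net.Http. The class name, the route `https://localhost:5001/support` used in the usage note, and how the token is obtained (for example with MSAL) are all assumptions rather than part of this project.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical helper for a client of the SupportController API.
public static class SupportApiRequests
{
    // Builds a GET request and attaches "Authorization: Bearer <token>" only
    // when a token is supplied. Without it, the [Authorize] API answers 401.
    public static HttpRequestMessage Create(Uri uri, string accessToken = null)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, uri);
        if (!string.IsNullOrEmpty(accessToken))
        {
            request.Headers.Authorization =
                new AuthenticationHeaderValue("Bearer", accessToken);
        }
        return request;
    }
}
```

For example, `await httpClient.SendAsync(SupportApiRequests.Create(new Uri("https://localhost:5001/support")))` should come back with `HttpStatusCode.Unauthorized`. Passing a valid Azure AD token for your app registration as the second argument is what lets the request through.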

Conclusion

Our API is no longer available for anonymous access. It is now protected by Azure AD Authentication.

Code-less way to authenticate to Azure Resource Manager API from Azure App Services

This is a guest post by Sujay Sarma:

Typical examples that show you how to connect from a web application to the Azure Resource Manager API have you wading through configuring and meddling with OAuth and OWIN, not to mention getting you confused between ADAL, MSAL and the different types of Active Directory tenants offered by Azure. You do not need ANY of that, especially if your web application is going to live on as an Azure App Service.

Teeth gnashing? Mouth Salivating?

The short answer is to use “Azure App Service Authentication”. And it is nothing new. It has been around since at least November 2014 (wow! a little over four years since!). At least for me, though I have seen it plenty of times while configuring my App Services in Azure, I have scarcely looked at what it can do. Until now.

A project I was working on for a client required authenticating Azure subscribers to the portal. Initially, I went with the regular walkthroughs: I went into my Azure Active Directory blade and, under App Registrations, created a new app, secrets and so on. But I faced really strange issues. There were no issues for me (a developer never faces issues and has zero bugs on their dev box, yeah?), but my client contact could not log in. He was using a Hotmail login address. I had to add him as a Guest User in my Active Directory tenant! That was not going to be a viable plan for any further step of the project.

The problem, I determined, was that folks in my tenant could log in, but others could not. Strangely, another friend was able to log in: his account was on a custom domain hosted within another Azure Active Directory tenant (he was using his Organization ID, and apparently they were Azure subscribers as well).

So, I tried Azure AD B2C. This is a poorly documented system, and the current documentation differs greatly from the portal’s user experience. Not only that, the documents use a lot of confusing terminology, and you have to register “apps” in at least three places, not to mention the Web App I was trying to configure! Short story: It was a mess!!!

I told everybody I was giving up on the issue. We would find some “manual” way to get people to authenticate. That was when an unrelated Google search threw up the page on App Service Authentication. I sent the URL to my mobile to read it during dinner and turned off my computer for the day. Even after I had read the article in question, I only thought of writing a small POC to see what it could do. The next morning, I sat down to set it up. And boy, oh boy! was I in for a pleasant surprise!

To save people the trouble of having to go through the same trial and error I did, here is a concise walkthrough of how to do it. I must thank Chris Gillum, whose 2016 blog post clued me in to the right course correction to get everything working.

The Walkthrough

  1. Log in to the Azure Portal.
  2. Go into “Azure Active Directory”. If it is already on the menu on your left- or right-hand side, use that. Otherwise, click “All Services” and search for it there (you may be prompted to log in).
  3. Now click on the “App registrations (Preview)” item. The official documentation follows the flow of going into the other “App registrations” item. Do not do that: it will end up giving you “OAuth v1.0” tokens, and we need “OAuth v2.0” tokens. I found this out by trial and error.
  4. Open a Notepad window.
  5. Along the top of the applications view, find the button that says “Endpoints” and click that. Towards the bottom of the pane full of URLs, find the one that says “WS-Federation sign-on endpoint” (third from the bottom at this time). Copy that FULL address and paste it into your Notepad. Now, in Notepad, carefully delete the “/wsfed” from the end of that address — be careful not to delete anything before the “/”. To be safe, you can hit CTRL+H, in Find, type “/wsfed”, leave Replace as blank and hit “Replace All”.
  6. Now hit “+ New Registration”. Enter any name. Be aware that what you enter here will be shown in big bold letters when Azure later asks the user trying to login for consent (the “…. is asking you for permission to access…” UI). Select the option “Accounts in any organizational directory”. Leave the “Redirect URI” blank for now, we will come back to it later. Click on Register.
  7. Once the Azure Portal tells you that the application was deployed successfully, find it again in the same “App Registrations (Preview)” screen and click on it to enter it.
  8. From the overview page, find the “Application (client) ID”. It will be a GUID. Hover over the value to make the “copy” icon appear, then click the icon to copy the value.
  9. Switch to your Notepad window, type in “App ID”, hit ENTER and paste what you copied in step 8.
  10. OPTIONAL. Back on the Azure Portal, go into “Branding” and upload a logo. A picture of 48×48 pixels works best; anything else will cause the consent screen to appear in strange shapes and sizes. What you enter into the various URL fields there is not relevant, because they are only used to show information links at the bottom of the consent screen. You may leave them blank or enter valid URLs into them; they need not even be on your website!
  11. IMPORTANT. Go into “Certificates & Secrets”. Under the “Client secrets” heading, click “+ New client secret”. Enter a name (does not matter, it is for your convenience), select an expiry value (“Never” is all you need). Click Add. When the new password is generated, it will be shown there. Again, hover on the value under the “VALUE” heading to make the little icon appear and copy it (if you don’t copy it fully, you will be in a world of pain).
  12. Switch to your Notepad window, type in “Secret”, hit ENTER and paste what you copied in step 11.
  13. IMPORTANT. Go into “API Permissions”. Click on “+ Add a permission”. Select “Microsoft Graph” (at the time of writing this, it is a large banner like button right on top of the list that appears). Select “Delegated Permissions”. Check ON: email, offline_access, openid and profile. Scroll to the bottom and find “User”, expand it and check ON “User.Read”. Click “Add permissions” at the bottom.
  14. Now click the “+ Add a permission” again. This time, select “Azure Service Management” (there are many similar looking “Azure” permissions on the list, select the right one). There is only one permission at this time, select it (or select “user_impersonation” if you find more permissions when you’re reading this!). Click “Add permissions” at the bottom.
  15. Right-click on the “App Services” menu item in the navigation (or find it under “All Services”) and open it in a new tab. You need to come back to the settings you have been working with so far, because we are not done there yet!
  16. NOTE: If you already have an app service that you are configuring this for, you can use that. Otherwise, create a new app service. There is nothing special to be done there — and you don’t need to upload any code YET. Once you have selected the app service or created one, continue below.
  17. Select the App Service, open its Authentication/Authorization blade. Set the “App Service Authentication” option to “On”. Immediately a bunch of options will appear below it.
  18. We are only interested in the “Azure Active Directory” option in this walkthrough. So select that. A new blade will open.
  19. Select “Advanced”. A different set of options will appear under it.
  20. For “Client ID”, paste the value saved under “App ID” in your Notepad window.
  21. Under “Issuer Url”, paste the URL you saved in step 5 above (it will look like “https://login.microsoftonline.com/…”).
  22. Under “Client Secret”, paste in the value under “Secret” from your Notepad window.
  23. Now, this is very important. Under “Allowed Token Audiences”, first paste in “https://management.core.windows.net/” and tab out. Another text box will appear. Now paste in “https://management.azure.com/” (ensure the final “/” is there). This tells the system to get you Bearer Tokens that will work with the Azure REST API 🙂 This is the secret magic sauce to the whole thing!
  24. Click OK.
  25. Back on the “Authentication / Authorization” blade, select either “Allow Anonymous requests (no action)” or “Login with Azure Active Directory”. Use the “Allow Anonymous requests (no action)” option if you plan to show a “Sign in” link on your website, that is, if you want the user to see something before they need to log in. If, as with the Azure Portal, you want them to be signed in from the get-go, use the “Login with Azure Active Directory” option.
  26. Ensure “Token Store” (bottom of the page, under “Advanced settings”) is “On”.
  27. Click “Save” on top of the page to save everything.
  28. IMPORTANT CAVEAT: If at any point of time, you make changes to this set up, you will need to Restart your App Service before it will use the new values.
  29. Go into the “Overview” blade of your App Service. Wait for all the properties on the top panel to load, and copy the full value of the “URL” (“https://xyz.azurewebsites.net”). Note that it is “https”.
  30. Now go back to the Azure Active Directory screen. If you left it open in another tab or window at the end of step 14, switch to that tab; otherwise, navigate to it from the menu.
  31. Ensure you are within the Azure Active Directory application (the one you configured in steps 3 to 14).
  32. Click into the “Authentication” tab.
  33. Under “Redirect URIs”, ensure “Web” is selected for “TYPE”. Under “REDIRECT URI”, paste in the URL of the App Service (step 29). At the end of this URL, paste in “/.auth/login/aad/callback” [be careful to paste in everything between the quotes]. Note that there is a dot (“.”) in front of “auth”. Your final URL should look like: “https://contoso.azurewebsites.net/.auth/login/aad/callback”.
  34. Scrolling down, under Advanced settings, paste or enter the Logout URL so that it reads: “https://contoso.azurewebsites.net/.auth/logout”. Again, note the “.” in front of “auth”.
  35. Scrolling down, under “Implicit grant”, check ON both “Access tokens” and “ID tokens”.
  36. Click Save above.
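Two steps in this walkthrough are plain string edits: deriving the issuer URL in step 5 and building the callback and logout URLs in steps 33 and 34. The sketch below captures both as hypothetical helpers. The class and method names are illustrative only, and the tenant GUID and `contoso` host in the checks are placeholders.

```csharp
using System;

// Hypothetical helpers mirroring the manual string edits in the walkthrough.
public static class AppServiceAuthUrls
{
    // Step 5: the issuer URL is the WS-Federation endpoint minus its trailing "/wsfed".
    public static string IssuerFromWsFedEndpoint(string wsFedEndpoint)
    {
        const string suffix = "/wsfed";
        string trimmed = wsFedEndpoint.Trim();
        return trimmed.EndsWith(suffix, StringComparison.OrdinalIgnoreCase)
            ? trimmed.Substring(0, trimmed.Length - suffix.Length)
            : trimmed;
    }

    // Steps 33-34: App Service Authentication's AAD callback and logout paths,
    // appended to the App Service URL (note the "." in front of "auth").
    public static (string Callback, string Logout) RedirectUris(string appServiceUrl)
    {
        string baseUrl = appServiceUrl.Trim().TrimEnd('/');
        return (baseUrl + "/.auth/login/aad/callback", baseUrl + "/.auth/logout");
    }
}
```

Trimming a trailing "/" from the App Service URL before appending the paths avoids ending up with a double slash if the URL was copied with one.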

Your Azure configuration is DONE.

From your App Service Code

Fire up Visual Studio, create a new web application. I am using a regular Web Forms application. You are free to do this in MVC or .NET Core or whatever. I am using Visual Studio 2019, and selected the “ASP.NET Web Application (.NET Framework)” option. If you are prompted to select the type of authentication — leave it as “No authentication”.

You do not need to install new NuGet packages! Azure App Service automatically fetches the right Bearer token for you (without any plumbing!). This is available to you in the Request Header “X-MS-TOKEN-AAD-ACCESS-TOKEN”. Fetch it from:

string token = Request.Headers["X-MS-TOKEN-AAD-ACCESS-TOKEN"];
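If an ARM call is rejected, a useful first check is the token's `aud` (audience) claim. You would expect it to be an Azure Resource Manager audience, such as one of the two URLs from step 23. The sketch below assumes System.Text.Json, which is built into .NET Core 3.0 and later and available as a NuGet package on .NET Framework. `JwtPeek` is a hypothetical name. The helper only decodes the token for diagnostics and does NOT validate its signature.

```csharp
using System;
using System.Text;
using System.Text.Json;

// Diagnostic helper (hypothetical): reads the "aud" claim from a JWT.
// It only DECODES the payload; the signature is NOT validated, so do not
// use it for authorization decisions.
public static class JwtPeek
{
    public static string GetAudience(string jwt)
    {
        string[] parts = jwt.Split('.');
        if (parts.Length < 2)
            throw new FormatException("Not a JWT: expected header.payload.signature");

        // JWT segments are base64url-encoded: map '-'/'_' back to '+'/'/'
        // and restore the '=' padding that base64url omits.
        string payload = parts[1].Replace('-', '+').Replace('_', '/');
        switch (payload.Length % 4)
        {
            case 2: payload += "=="; break;
            case 3: payload += "="; break;
        }

        string json = Encoding.UTF8.GetString(Convert.FromBase64String(payload));
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            // Azure AD access tokens carry "aud" as a single string.
            return doc.RootElement.TryGetProperty("aud", out JsonElement aud)
                   && aud.ValueKind == JsonValueKind.String
                ? aud.GetString()
                : null;
        }
    }
}
```

For example, `JwtPeek.GetAudience(Request.Headers["X-MS-TOKEN-AAD-ACCESS-TOKEN"])` could be written to a log while debugging and removed afterwards.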

You can now pass this token to your AzureRM REST API calls. I do not use any of the Azure SDKs to talk to AzureRM; instead, I write System.Net.Http.HttpClient-based GET/PUT/etc. calls. My code to pull all the subscriptions for a logged-in user now looks like this:

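// Reuse a single HttpClient, declared as a field on the page class
private static readonly HttpClient client = new HttpClient();

// Inside an async method; requires using System.Net.Http and System.Net.Http.Headers
HttpRequestMessage request = new HttpRequestMessage(HttpMethod.Get, "https://management.azure.com/subscriptions?api-version=2019-03-01");

request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", Request.Headers["X-MS-TOKEN-AAD-ACCESS-TOKEN"]);

HttpResponseMessage response = await client.SendAsync(request);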
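The list-subscriptions call returns JSON with the subscriptions in a `value` array, and each entry carries fields such as `subscriptionId` and `displayName`. The sketch below reads those two fields with System.Text.Json. `SubscriptionInfo` and `SubscriptionParser` are hypothetical names, and paging through `nextLink` is ignored for brevity.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public sealed class SubscriptionInfo
{
    public string SubscriptionId { get; set; }
    public string DisplayName { get; set; }
}

public static class SubscriptionParser
{
    // Reads subscriptionId/displayName from each element of the "value" array
    // returned by GET https://management.azure.com/subscriptions.
    // Hypothetical helper; it ignores "nextLink" paging.
    public static List<SubscriptionInfo> Parse(string json)
    {
        var result = new List<SubscriptionInfo>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            if (!doc.RootElement.TryGetProperty("value", out JsonElement items)
                || items.ValueKind != JsonValueKind.Array)
                return result;

            foreach (JsonElement item in items.EnumerateArray())
            {
                string id = item.TryGetProperty("subscriptionId", out JsonElement idEl) ? idEl.GetString() : null;
                string name = item.TryGetProperty("displayName", out JsonElement nameEl) ? nameEl.GetString() : null;
                result.Add(new SubscriptionInfo { SubscriptionId = id, DisplayName = name });
            }
        }
        return result;
    }
}
```

Usage would be along the lines of `var subs = SubscriptionParser.Parse(await response.Content.ReadAsStringAsync());` after checking `response.IsSuccessStatusCode`.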

Simple, huh?