A customer wanted to know the total cost of running Azure Databricks. They could not work out what a DBU (Databricks Unit) was, even though it is the pricing unit for Azure Databricks.
I will attempt to clarify DBU in this post.
A DBU is really just an abstraction: it does not correspond to any fixed amount of compute or to a single compute metric. The number of DBUs you consume does, however, change with the following factors:
- The kind of machine you are running the cluster on
- The kind of workload (e.g. Jobs, All Purpose)
- The tier / capabilities of the workload (Standard / Premium)
- The kind of runtime (e.g. with or without Photon)
So DBUs measure the consumption of resources, and that consumption is billed on a per-second basis. The monetary value of the DBUs consumed is called $DBU: the number of DBUs multiplied by the $ rate charged per DBU. Pricing rate per DBU is available here.
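As a rough sketch of how per-second billing turns into a $DBU charge, the calculation can be written as below. The function name `dbu_charge` and its parameters are illustrative assumptions, not part of any Databricks or Azure API. The sample figures (0.75 DBU per machine per hour, USD 0.40 per DBU) are the ones used in the example later in this post.

```python
def dbu_charge(dbu_per_hour: float, rate_per_dbu: float,
               seconds: float, nodes: int = 1) -> float:
    """Return the $DBU charge for a cluster that ran for `seconds`.

    dbu_per_hour : DBUs one machine consumes per hour (depends on machine
                   type, workload, tier and runtime)
    rate_per_dbu : price in USD charged per DBU
    seconds      : how long the cluster ran; usage is billed per second
    nodes        : number of machines in the cluster
    """
    hours = seconds / 3600                      # convert billed seconds to hours
    dbus_consumed = dbu_per_hour * hours * nodes
    return dbus_consumed * rate_per_dbu


# Hypothetical run: 6 machines for 90 minutes
charge = dbu_charge(dbu_per_hour=0.75, rate_per_dbu=0.40,
                    seconds=90 * 60, nodes=6)
print(f"$DBU for the run: USD {charge:.2f}")
```

Note that this covers only the Databricks software charge, not the Azure hardware that the cluster runs on.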
DBUs are what you pay for the software that Databricks has made available for processing your Big Data workload. The hardware needed to run the cluster is charged separately, directly by Azure.
When you visit the Azure Pricing Calculator and look at the pricing for Azure Databricks, you will see two distinct sections: one for compute and another for Databricks DBUs, which specifies this rate.
The total price of running an Azure Databricks cluster is the sum of these two sections.
Let us now work out what the total cost is likely to be if we are running a cluster of 6 machines:
For the All Purpose Compute running in West US in Standard tier with D3 v2, the total cost for compute (in PAYG model) is likely to be USD 1,222/ month.
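From the calculator you can see that DBU consumption is 0.75 DBU per machine per hour, at a rate of USD 0.40 per DBU. Over the 730 hours the calculator uses for a month, that comes to 0.75 * 0.40 * 730 = $219 per machine. So, the DBU cost of running 6 D3 v2 machines in the cluster will be $219 * 6 = $1,314 per month.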
The total cost would hence be USD 1,222 + USD 1,314 = USD 2,536 per month.
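The arithmetic above can be reproduced with a short sketch. The function name `monthly_cluster_cost` is hypothetical, and the 730-hour month is an assumption about how the calculator arrives at a monthly figure. The inputs are the numbers quoted above.

```python
HOURS_PER_MONTH = 730  # assumption: month length used for the monthly estimate


def monthly_cluster_cost(compute_monthly: float, dbu_per_node_hour: float,
                         rate_per_dbu: float, nodes: int) -> dict:
    """Split the monthly cost of a cluster into compute, DBU and total (USD)."""
    # DBU cost per machine per month: DBUs per hour * price per DBU * hours
    dbu_per_node_month = dbu_per_node_hour * rate_per_dbu * HOURS_PER_MONTH
    dbu_monthly = dbu_per_node_month * nodes
    return {
        "compute": compute_monthly,             # hardware, charged by Azure
        "dbu": dbu_monthly,                     # software, charged as DBUs
        "total": compute_monthly + dbu_monthly,
    }


# 6 x D3 v2, All Purpose Compute, Standard tier, West US (figures from the post)
cost = monthly_cluster_cost(compute_monthly=1222.0, dbu_per_node_hour=0.75,
                            rate_per_dbu=0.40, nodes=6)
for part, usd in cost.items():
    print(f"{part:>8}: USD {usd:,.0f}")
```

Keeping the compute and DBU portions separate makes it easier to see which part of the bill a given optimization actually touches.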
From an optimization perspective, you can reduce your hardware compute cost by making reservations in Azure, e.g. for 1 year or 3 years. The DBUs in the Azure calculator don't have any upfront commitment discounts available. However, such discounts are available if you contact Databricks and request a discount on a fixed commitment.
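To see why a reservation only trims part of the bill, here is a minimal sketch. The function `total_with_reservation` is hypothetical, and the 30% discount is purely a placeholder, not an actual Azure reservation rate. Look up the real discount for your machine type and term before relying on any number.

```python
def total_with_reservation(compute_monthly: float, dbu_monthly: float,
                           reservation_discount: float) -> float:
    """Monthly total (USD) when only the compute portion is discounted.

    reservation_discount is a fraction between 0 and 1 (e.g. 0.30 for 30%).
    The DBU portion is left untouched, as described above.
    """
    if not 0 <= reservation_discount < 1:
        raise ValueError("reservation_discount must be in [0, 1)")
    discounted_compute = compute_monthly * (1 - reservation_discount)
    return discounted_compute + dbu_monthly


# Figures from the example above; the 0.30 discount is a made-up placeholder
payg_total = total_with_reservation(1222.0, 1314.0, 0.0)
reserved_total = total_with_reservation(1222.0, 1314.0, 0.30)
saving_pct = (payg_total - reserved_total) / payg_total * 100
print(f"PAYG: USD {payg_total:,.0f}, reserved: USD {reserved_total:,.0f}, "
      f"overall saving: {saving_pct:.1f}%")
```

Because the DBU portion stays the same, the overall saving is always smaller than the discount applied to compute.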
If you are interested in saving costs while running Databricks clusters, here are two blog posts that you may find interesting:
That clearly explains the cost to customer when using DB platform.
Very informative. I like the way the costing is illustrated in a very easy manner so everyone can understand.
One gets a good clarity regarding how to save costs when it comes to clusters
It is so precisely explained, I have a better understanding of what DBU is now than ever. Thank you Abhishek.
Easy to understand and direct to the point