Telerik Products now easily available in India via India Distributor

Diwali has just passed and the festivities are dialling down. Diwali has brought changes for Telerik in India and now I write this as an evangelist for Telerik/ Progress. In this blog post, I will list the changes and attempt to clarify if it has any impact on our community:

Telerik (a part of Progress) has changed its operations structurally in India. Indian Rupee billing was number one request from businesses who found USD billing to a US entity time taking and troublesome.

Telerik products will now be sold in India in Indian Rupees only via a distributor.While this does not affect you as a developer, your accounts team and business managers would love this change due to the following:

  1. A lot less hassle in making the payments done. Earlier, they had to get multiple forms (Form 15 CA & Form 15 CB) signed from a Chartered Accounts and make the payment via a wire transfer through the bank. This could take over 4-5 days to make a single payment and accrue additional charges for the wire transfer. Now, it is a simple NEFT or a UPI transfer.
  2. Lowering cost of projects to the customers. Earlier products like Telerik DevCraft couldn’t be used to offset indirect tax liability. With the new GST regime, your accounts team can offset the tax on their project lowering the cost of the implementation for their customers in India.

Isn’t this just great?

GTM final logo_512The new distributor is GTM Catalyst for the India market. What is even better is that I will be handling this new distributor organisation. This will bring the familiarity and continuity of business for you.

You may reach GTM Catalyst team for any questions or comments at: info@gtmcatalyst.com

To allay any questions in your mind, all the Telerik products continue to be available for sales and are fully supported by Telerik/ Progress. All products are moving full steam ahead including DevCraft, Kendo UI, Telerik Platform, Telerik Reporting, Sitefinity and Test Studio. Telerik (a Progress company) will continue to release new updates on a rigorous pace and continue to provide you with benefits of the latest technologies.

The latest webinar for R3 release for Telerik controls is now available here: https://www.youtube.com/watch?time_continue=10&v=sxv_7RnOwVI 

Lastly, we are eager to continue our India webinar series to share our learnings with you. We will be sending the webinar schedule to you shortly with an update here.

Remember to stay tuned here!

Indian GDPR (DPDP) affects every marketer in India

The Indian government has recently passed the Digital Personal Data Protection Bill (DPDP) in 2023. This is a significant step towards establishing a framework for managing citizens’ data in India.

Previously, data protection was governed by the Information Technology Act of 2000 (Section 21). With this law, customers have been granted specific rights over their data including correction, erasure, grievance redressal, and withdrawing consent.

For every marketer in India, it’s essential to understand and follow the provisions of the DPDP law to avoid severe penalties. Whether you do digital marketing, events marketing or are an organisation collecting data, this bill affects you. Even companies collecting data from their employees are within ambit of this law.

Penalties for Data Fiduciaries/ data collectors breaching customer data security can reach up to Rs 250 crore or USD 30 mn. Penalties are influenced by severity, repetition, and actions taken by the fiduciary.

In this blog post, we’ll highlight the key points of this law that are relevant for marketers:

To grasp the DPDP, it’s important to know the main entities involved:

  1. Data Fiduciaries: These are the parties primarily responsible for handling data. As a website owner, you are a Data Fiduciary.
  2. Data Principals: These are your customers or individuals whose data you are handling.
  3. Data Processors: These are entities that process data on behalf of Data Fiduciaries. Processing includes various operations like collecting, storing, transmitting, and more.

Whether your website is hosted in India or abroad, if it deals with data of Indian citizens, the DPDP law applies (Section 3(b)). As website owners, you can appoint a different Data Processor (Section 8(2)), but you are responsible for data handling and ensuring the processor complies by implementing appropriate measures. This means that you can use external service providers e.g. for emails, SMSs, Whatsapp, Social Media but are responsible for their adherence to these laws.

Obligations of Data Fiduciaries are as follows:

  • Processing for Lawful Purposes (Section 4): Data can only be used for lawful purposes for which the data principal has given consent. You need to notify individuals about the purpose of data collection at the time of data collection. If you acquired customers before this law was enacted, you must provide this notification as soon as possible. The burden of proof for consent lies with the Data Fiduciary.
  • Consent and Withdrawal (Section 6): Individuals can withdraw consent at any time, and you must stop processing their data within a reasonable time (not defined). This includes deleting data from both the processor and fiduciary.
  • Data Protection Officer (Section 8(9)): You must appoint this officer to address customer data queries.
  • Exceptions for Processing without Consent (Section 7): Certain exceptions exist where prior consent isn’t needed, such as government processing, medical emergencies, financial assessments, compliance with legal judgments, and natural disasters.
  • Breach Notification (Section 8(6)): If there’s a data breach, you must notify affected parties.
  • Data of Children (Section 9): Consent from parents is required for individuals under 18. Advertising targeting children is prohibited.

DPDP isn’t as strict as GDPR in terms of processing data within national boundaries, but this can be restricted by government notification. It clarifies that if other laws limit data residency, DPDP doesn’t relax those restrictions. The Government can override data protection laws for maintaining state security and sovereignty (Section 17(2)).

Foreign citizens’ data can be processed in India through valid contracts (Section 17).

DPDP provides an additional advantage to registering yourself as a startup. A lot of exceptions around data compliance apply to startups (Section 17(3)).

As a final clarification, if there are any other laws that are in conflict with this law, then the provisions of this law will prevail to the extent of this conflict.

Disclaimer: This is an overview, consult your legal representative for specific advice.

Securing Azure Databricks ML Model Serving Endpoint with Service Principals: Step by Step

A recent customer of mine had deployed a Machine Learning Model developed using Databricks. They had correctly used MLFlow to track the experiments and deployed the final model using Serverless ML Model Serving. This was deemed extremely useful as the serverless capability allowed the compute resources to scale to zero when not in use shaving off precious costs.

Once deployed, the inferencing/ scoring can be done via an HTTP POST request that requires a Bearer authentication and payload in a specific JSON format.

curl -X POST -u token:$DATABRICKS_PAT_TOKEN $MODEL_VERSION_URI \
  -H 'Content-Type: application/json' \
  -d '{"dataframe_records": [
    {
      "sepal_length": 5.1,
      "sepal_width": 3.5,
      "petal_length": 1.4,
      "petal_width": 0.2
    }
  ]}'

This Bearer token can be generated using the logged in user dropdown -> User Settings -> Generate New Token as shown below:

Databricks Personal Access Token

In Databricks parlance, this is the Personal Access Token (PAT) and is used to authenticate any API calls that are made to Databricks. Also, note that lifetime of the token (defaults to 90 days) and is easily configured.

The customer had the Data Scientist generate the PAT and use it for calling the endpoint.

However, this is an anti-pattern for security. Since the user is a Data Scientist on the Databricks workspace, the PAT would get access to the workspace and allow a nefarious actor to achieve higher privileges. If a user’s account is decommissioned, the user loses access to all Databricks objects. This user based PAT was flagged by the security team and they were requested to replace it with a service account as is a standard practice in any in-premise system.

Since this is an Azure deployment, the customer’s first action was to create a new user account in the Azure Active Directory, add the user to the Databricks workspace and generate the PAT for this “service account”. However, this is still a security hazard given the user has interactive access to Databricks workspace.

I suggested the use of Azure Service Principal that would secure the API access with the lowest privileges. The service principal acts as a client role and uses the OAuth 2.0 client credentials flow to authorize access to Azure Databricks resources. Also, the service principal is not tied to a specific user in the workspace.

We will be using two separate portals and Azure CLI to complete these steps:

  1. Azure Portal
  2. Databricks Workspace

In this post, I am detailing the steps required to do the same (official documentation here):

  1. Generate an Azure Service Principal: Now this step is completely done in the Azure Portal (not Databricks workspace). Go to Azure Active Directory -> App Registrations -> Enter some Display Name -> Select “Accounts in this organisation only” -> Register. After the service principal is generated, you will need its Application (client) ID and Tenant ID as shown below:
  2. Generate Client Secret and Grant access to Databricks API: We will need to create Client Secret and Register Databricks access.
    To generate client secret, remain on Service Principal’s blade -> Certificates & Secrets -> New Client secret -> Enter Description and Expiry -> Add. Copy and store the Client Secret’s Value (not ID) separately. Note: The client secret Value is only displayed once.

    Further, In the API permissions -> Add a permission -> APIs my organization users tab -> AzureDatabricks -> Permissions.
  3. Add SP to the workspace: Now, Databricks has the ability to sync all identities created to the workspace if “Identity Federation” is enabled. Unfortunately, for this customer this was not enabled. Next, they needed to assign permission for this SP.
    This step in done in your Databricks workspace. Go to Your Username -> Admin Settings -> Service Principals tab -> Add service principal -> Enter Application ID from the Step 1 and a name -> No other permissions required -> Add.
  4. Get Service Principal Contributor Access: Once the service principal has been generated, you will need to get Contributor access to the subscription. You will need help of your Azure Admin for the same.
  5. Authenticate as service principal: First login into Azure CLI as the Service Principal with the command:
az login \
--service-principal \
--tenant <Tenant-ID>
--username <Client-ID> \
--password <Client-secret> \
--output table

Use the values from Step 1 (tenant ID & Client ID) and Step 2 (Client Secret) here:

  1. Generate Azure AD tokens for service principals: To generate Azure AD token you will need to use the following command:
az account get-access-token \
--resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d \
--query "accessToken" \
--output tsv

The resource ID should be the same as shown above as it represents Azure Databricks.

This will create a Bearer token that you should preserve for the final step.

  1. Assign permissions to SP to access Model Serving: The Model Serving endpoint has its own access control that provides for following permissions: Can ViewCan Query, and Can Manage. Since we require to query the endpoint, we need to grant “Can Query” permission to the SP. This is done using the Databricks API endpoints as follows:
$ curl -n -v -X GET -H "Authorization: Bearer $ACCESS_TOKEN" https://$WORKSPACE_URL../api/2.0/serving-endpoints/$ENDPOINT_NAME 

// $ENDPOINT_ID = the uuid field from the returned object 

Replace @ACCESS_TOKEN with the bearer token generated in Step 6. The @WORKSPACE_URL should be replaced by the Databricks Workspace URL. Finally, the $ENDPOINT_NAME should be replaced by the name of the endpoint created.

This will give you the ID of the Endpoint as shown:

Next, you will need to issue the following command:

$ curl -n -v -X PATCH -H "Authorization: Bearer $ACCESS_TOKEN" -d '{"access_control_list": [{"user_name": "$SP_NAME", "permission_level": "CAN_QUERY"}]}' https://$WORKSPACE_URL../api/2.0/permissions/serving-endpoints/$ENDPOINT_ID

$ACCESS_TOKEN: From Step 6
$SP_NAME: Name of the service principal as created in Azure Portal
$WORKSPACE_URL: Workspace URL
@ENDPOINT_ID: Generated in the command above

  1. Generate Databricks PAT from Azure AAD Token: While you have the Azure AAD token now available and it can used for calling the Model Serving endpoint, the Azure AAD tokens are shortlived.
    To create a Databricks PAT, you will need to make a POST request call to the following endpoint (I am using Postman for the request): https://$WORKSPAE_URL/api/2.0/token/create
    The Azure AAD Bearer token should be passed in the Authorization header. In the body of the request, just pass in a JSON with “comment” attribute.
    The response of the POST request will contain an attribute – “token_value” that can be used as Databricks PAT.

The token_value, starting with “dapi” is the PAT that can be used for calling the Model Serving Endpoint that is in a secure configuration using the Service Principal.

Introduction to Machine Learning with Spark ML – III

In the last post, we saw how we can use Pipelines to streamline our machine learning workflow.

We will start off with a fantastic feature that will blow the lid off your mind. All the effort done till now, in post I and II, can be done in 2 lines of code! Ah yes, this magic is possible with use of a feature called AutoML. Not only will it perform preprocessing steps automatically, it will also select the best algorithm from a multitude of them including XGBoost, LightGBM, Prophet etc. The hyper parameter search comes for free 🙂 If all this was not enough it will share the entire auto-generated code for your use and modification. So, the two magic lines are:

from databricks import automl
summary = automl.classify(train, target_col="category", timeout_minutes=20)

The AutoML requires you to specify the following:

  1. Type of machine learning problem – Classification/ Regression/ Forecasting
  2. Specify the training set, target column
  3. Timeout after which the autoML will stop looking for better models.

While all this is very exciting, the real world use case of AutoML tends to be to create a baseline for model performance and give us a model to start with. In our case, the AutoML gave an RoC score of 91.8% (actually better than our manual work so far!!) for the best model.

Looking at the auto-generated Python notebook, here are the broad steps it took:

  1. Load Data
  2. Preprocessing
    • Impute values for missing numerical columns
    • Convert each categorical column into multiple binary columns through one-hot encoding
  3. Train – Validation – Test Split
  4. Train classification model
  5. Inference
  6. Determine Accuracy – Confusion matrix, ROC and Precision-Recall curves for validation data

This is broadly in line with what we did in our manual setup.

Apart from the coded approach we saw, you can create AutoML experiment using the UI from the Experiments -> Create AutoML Experiment button. If you can’t find the experiments tab, make sure that you have the Machine Learning persona selected in the Databricks workspace.

In an enterprise/ real world scenario, we will build many alternate models with different parameters and use the one with higest accuracy. Next, we deploy the model and use it for predictions with new data. Finally, the selected model will be updated over time as new data becomes available. Until now, we didn’t talk about how to handle these enterprise requirements or were doing it manually on best effort.

These requirements are covered under what we know as MLOps. Spark supports MLOps using an open source framework called MLFlow. MLFlow supports the following objects:

  1. Projects: Provides a mechanism for storing machine learning code in a reusable and reproducible format.
  2. Tracking: Allows for tracking your experiments and metrics. You can see the history of your model and its accuracy evolve over time. There is also a tracking UI available.
  3. Model Registry: Allows for storing various models that you develop in a registry with an UI to explore the same. It also provides for model lineage (which MLflow experiment and run produced the model), stage transitions (for example from staging to production)
  4. Model Serving: We can use serving to provide inference endpoints for either batch or inline processing. Mostly this will be made available as REST endopoints.

There is a very easy way to get started with MLFlow where we allow MLFlow to log automatically the metrics and models. This can be done using a single line:

import mlflow
mlflow.autolog()

This will log the parameters, metrics, models and the environment. The core concept to get with MLOps is the concept of runs. Each run is a unique combination of parameters and algorithm that you have executed. Many runs can be part of the same experiment.

To get started we can set name of the experiment with the command: mlflow_set_experiment(“name_of_the_experiment”)

To start tracking the experiments manually, we can setup the context as follows:

with mlflow.start_run(run_name="") as run:
   <pseudo code for running an experiment>
   mlflow.log_param("key","value")
   mlflow.log_metrics("key","value")
   mlflow.spark.log_model(model,"model name")

You can track parameters and metrics using log_param and log_metrics functions of mlflow object. The model can now be registered with the Model Registry using the function: mlflow.register_model(model_uri=””, name=””)

What is important here is the model_uri. The model_uri takes the form: runs:/<runid>/model. The runid identifies the specific run in the experiment and each model is stored at the model_uri location mentioned above.

You can now load the model and perform inference using the following code:

import mlflow

# Load model
loaded_model = mlflow.spark.load_model(model_uri)

# Perform inference via model.transform()
loaded_model.transform(data)

While we have seen how to track experiments explicitly, Databricks Workspaces also track the experiments automatically (from Databricks Runtime 10.3 ML and above). You can view the expriments and their runs in the UI via the Experiments sidebar of the Machine Learning persona.

You will need to click on the specific experiment (in our case Adult Dataset), that will show all the runs of the experiment. Click on the specific run to get more details about the run. The run will show the metrics recorded which is in our case was areaUnderRoC of 91.4%.

Under Artifacts, if you click on the model, you can see the URI of the model run. This URI can be used to register the model with the Model Registry and use it for predictions at any point in time.

MLFlow also supports indicating the state of the model for production. Different states supported by MLFlow are:

  1. None
  2. Staging
  3. Production
  4. Archived

Once your model is registered with the Model registry, you can change the state of the model to any other state with the function transition_model_version_stage() function.

From the model registry you are able to create model serving endpoint using the Serverless Real-Time Inference service that uses managed Databricks compute service to provide a REST endpoint.

Introduction to Machine Learning with Spark ML – II

In the earlier post, we went over some concepts regarding Machine Learning done with Spark ML. Here are primarily 2 types of objects relating to machine learning we saw:

  1. Transformers: Objects that took a DataFrame, changed something in it and returned a DataFrame. The method used here was “transform”.
  2. Estimator: Objects that are passed in a DataFrame and would apply an algorithm on it to return a transformer. E.g. GBTClassifier. We used the “fit” function to apply the algorithm on the Dataframe.

In our last example of predicting income level using Adult dataset, we had to change our input dataset to a format that is suitable for machine learning. There was a sequence of changes we had done e.g. converting categorical variables to numeric, One Hot Encoding & Assembling the columns in a single column. Everytime there is additional data available (which will be numerous times), we will need to do these steps again and again.

In this post, we will introduce a new Object that organises these steps in sequence that can be run as many times as needed and it is called the Pipeline. The Pipeline chains together various transformers and estimators in sequence. While we could do the machine learning without the Pipeline, it is a standard practice to put the sequence of steps in a Pipeline. Before we get there, let’s try to add an additional step in fixing our pipeline and that is to identify and remove Null data. This is indicated in our dataset as ‘?’.

To know how many null values exist let’s run this command:

from pyspark.sql.functions import isnull, when, count, col
adultDF.select([count(when(isnull(c), c)).alias(c) for c in adultDF.columns]).show()

The result shows that there are no null values. Inspecting the data, we see that null values have been replaced with “?”. We would need to remove these rows from our dataset. We can replace the ? with null values as follows:

adultDF = adultDF.replace('?', None)

Surprisingly this doesn’t change the ? values. It appeared that the ? is padded with some spaces. So we will use the when and trim function as follows:

from pyspark.sql.functions import isnull, when, count, col,trim
adultDF = adultDF.select([when(trim(col(c))=='?',None).otherwise(col(c)).alias(c) for c in adultDF.columns])

This replaces ? will null that we can now drop from our dataframe using dropna() function. The number of rows remaining are now 30,162.

Now let’s organise these steps in a Pipeline as follows:

from pyspark.ml import Pipeline

adultPipeline = Pipeline(stages = [wcindexer,eduindexer,maritalindexer,occupationindexer,relindexer,raceindexer,sexindexer,nativecountryindexer,categoryindexer,ohencoder,colvectors])

The stages list contains all the transformers we used to convert raw data into dataset ready for machine learning. This includes all the StringIndexers, OneHotEncoder and VectorAssembler. Next, the process of defining the GBTClassifier and BinaryClassificationEvaluator remains the same as in the earlier post. You can now include the GBTClassfier in the pipeline as well and run the fit() on this pipeline with train dataset as follows:

adultMLTrainingPipeline = Pipeline(stages = [adultPipeline,gbtclassifier])
gbmodel  = adultMLTrainingPipeline.fit(train)

However, we can perform another optimization at this point. The model currently trained is based of a random split of values from the dataset. Cross Validation can help generalise the model even better by determining best parameters from a list of parameters and do it by creating more than one train and test datasets (called as folds). The list of parameters are supplied as ParamGrid as follows:

from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

paramGrid = ParamGridBuilder()\
  .addGrid(gbtclassifier.maxDepth, [2, 5])\
  .addGrid(gbtclassifier.maxIter, [10, 100])\
  .build()

# Declare the CrossValidator, which performs the model tuning.
cv = CrossValidator(estimator=gbtclassifier, evaluator=eval, estimatorParamMaps=paramGrid)

The cross validator object takes the estimator, evaluator and the paramGrid objects. The pipeline will need to be modified to use this cross validator instead of the classifier object we used earlier as follows:

adultMLTrainingPipeline = Pipeline(stages = [adultPipeline,gbtclassifier])


adultMLTrainingPipeline = Pipeline(stages = [adultPipeline,cv])

With these settings, the experiment ran for 22 mins and the evalution result came out to be 91.37% area under RoC.

Introduction to Machine Learning with Spark ML – I

Machine Learning is most widely done using Python and scikit-learn toolkit. The biggest disadvantage of using this combination is the single machine limit that Python imposes on training the model. This limits the amount of data that can be used for training to the maximum memory on the computer.

Industrial/ enterprise datasets tend to be in terabytes and hence the need for a parallel processing framework that could handle enormous datasets was felt. This is where Spark comes in. Spark comes with a machine learning framework that can be executed in parallel during training using a framework called Spark ML. Spark ML is based on the same Dataframe API that is widely used within the Spark ecosystem. This requires minimal additional learning for preprocessing of raw data.

In this post, we will cover how to train a model using Spark ML.

In the next post, we will introduce the concept of Spark ML pipelines that allow us to process the data in a defined sequence.

The final post will cover MLOps capabilities that MLFlow framework provides for operationalising our machine learning models.

We are going to be Adult Dataset from UCI Machine Learning Repository. Go ahead and download the dataset from the “Data Folder” link on the page. The file you are interested to download is named “adult.data” and contains the actual data. Since the format of this dataset is CSV, I saved it on my local machine as adult.data.csv. The schema of this dataset is available in another file titled – adult.names. The schema of the dataset is as follows:

age: continuous.
workclass: categorical.
fnlwgt: continuous.
education: categorical.
education-num: continuous.
marital-status: categorical.
race: categorical.
sex: categorical.
capital-gain: continuous.
capital-loss: continuous.
hours-per-week: continuous.
native-country: categorical.
summary: categorical

The prediction task is to determine whether a person makes over 50K in a year which is contained in the summary field. This field contains value of <50K or >=50K and is our target variable. The machine learning task is that of binary classification.

Upload file in DBFS

The first step was to upload the dataset from where it is accessible. I chose DBFS for ease of use and uploaded the file at the following location: /dbfs/FileStore/Abhishek-kant/adult_dataset.csv

Once loaded in DBFS, we need to access the same as a Dataframe. We will apply a schema while reading the data since the data doesn’t come with header values as indicated below:

adultSchema = "age int,workclass string,fnlwgt float,education string,educationnum float,maritalstatus string,occupation string,relationship string,race string,sex string,capitalgain double,capitalloss double,hoursperweek double,nativecountry string,category string"

adultDF = spark.read.csv("/FileStore/Abhishek-kant/adult_dataset.csv", inferSchema = True, header = False, schema = adultSchema)

A sample of the data is shown below:

We need to move towards making this dataset machine learning ready. Spark ML only works with numeric data. We have many text values in the dataframe that will need to be converted to numeric values.

One of the key changes is to convert categorical variables expressed as string into labels expressed as string. This can be done using StringIndexer object (available in pyspark.ml.feature namespace) as illustrated below:
eduindexer = StringIndexer(inputCol=”education”, outputCol =”edu”)

The inputCol indicates the column to be transformed and outputCol is the name of the column that will get added to the dataframe after converting to the categorical label. The result of the StringIndexer is shown to the right e.g. Private is converted to 0 while State-gov is converted to 4.

This conversion will need to be done for every column:

#Convert Categorical variables to numeric
from pyspark.ml.feature import StringIndexer

wcindexer = StringIndexer(inputCol="workclass", outputCol ="wc")
eduindexer = StringIndexer(inputCol="education", outputCol ="edu")
maritalindexer = StringIndexer(inputCol="maritalstatus", outputCol ="marital")
occupationindexer = StringIndexer(inputCol="occupation", outputCol ="occ")
relindexer = StringIndexer(inputCol="relationship", outputCol ="relation")
raceindexer = StringIndexer(inputCol="race", outputCol ="racecolor")
sexindexer = StringIndexer(inputCol="sex", outputCol ="gender")
nativecountryindexer = StringIndexer(inputCol="nativecountry", outputCol ="country")
categoryindexer = StringIndexer(inputCol="category", outputCol ="catlabel")

This creates what is called a “dense” matrix where a single column contains all the values. Further, we will need to convert this to “sparse” matrix where we have multiple columns for each value for a category and for each column we have a 0 or 1. This conversion can be done using the OneHotEncoder object (available in pyspark.ml.feature namespace) as shown below:
ohencoder = OneHotEncoder(inputCols=[“wc”], outputCols=[“v_wc”])

The inputCols is a list of columns that need to be “sparsed” and outputCols is the new column name. The confusion sometimes is around fitting sparse matrix in a single column. OneHotEncoder uses a schema based approach to fit this in a single column as shown to the left.

Note that we will not sparse the target variable i.e. “summary”.

The final step for preparing our data for machine learning is to “vectorise” it. Unlike most machine learning frameworks that take a matrix for training, Spark ML requires all feature columns to be passed in as a single vector of columns. This is achieved using VectorAssembler object (available in pyspark.ml.feature namespace) as shown below:

colvectors = VectorAssembler(inputCols=["age","v_wc","fnlwgt","educationnum","capitalgain","capitalloss","v_edu","v_marital","v_occ","v_relation","v_racecolor","v_gender","v_country","hoursperweek"],
outputCol="features")

As you can see above, we are adding all columns in a vector called as “features”. With this our dataframe is ready for machine learning task.

We will proceed to split the dataframe in training and test data set using randomSplit function of dataframe as shown:

(train, test) = adultMLDF.randomSplit([0.7,0.3])

This will split our dataframe into train and test dataframe in 70:30 ratio.
The classifier used will be Gradient Boosting classifier available as GBTClassifier object and initialised as follows:

from pyspark.ml.classification import GBTClassifier

classifier = GBTClassifier(labelCol="catlabel", featuresCol="features")

The target variable and features vector column is passed as attributes to the object. Once the classifier object is initialised we can use it to train our model using the “fit” method and passing the training dataset as an attribute:

gbmodel = classifier.fit(train)

Once the training is done, you can get predictions on the test dataset using the “transform” method of the model with test dataset passed in as attribute:

adultTestDF = gbmodel.transform(test)

The result of this function is addition of three columns to the dataset as shown below:

A very important task in machine learning is to determine the efficacy of the model. To evaluate how the model performed, we can use the BinaryClassificationEvaluator object as follows:

from pyspark.ml.evaluation import BinaryClassificationEvaluator

eval = BinaryClassificationEvaluator(labelCol = "catlabel", rawPredictionCol="rawPrediction")

eval.evaluate(adultTestDF)

In the initialisation of the BinaryClassificationEvaluator, the labelCol attribute specifies the actual value and rawPredictionCol represents the predicted value stored in the column – rawPrediction. The evaluate function will give the accuracy of the prediction in the test dataset represented as AreaUnderROC metric for classification tasks.

You would definitely want to save the trained model for use later by simply saving the model as follows:

gbmodel.save(path)

You can later retrieve this model using “load” function of the specific classifier:

from pyspark.ml.classification import GBTClassificationModel
classifierModel = GBTClassificationModel.load(path)

You can now use this classification model for inferencing as required.

Azure Databricks – What is it costing me?

A customer wanted to know the total cost of running Azure Databricks for them. They couldn’t understand what DBU (Databricks Units) was, given that is the pricing unit for Azure Databricks.

I will attempt to clarify DBU in this post.

DBU is just really an abstraction not related to any amount of compute or compute metrics. At the same time, it changes with the following factors:

  1. When there is a change in the kind of machine you are running the cluster on
  2. When there is change in the kind of workload (e.g. Jobs, All Purpose)
  3. The tier / capabilities of the workload (Standard / Premium)
  4. The kind of runtime (e.g. with or without Photon)

So, the DBUs really measure the consumption of resources. It is billed on a per second basis. The monetary value of these DBUs is called $DBU and is determined by a $ rate charged per DBU. Pricing rate per DBU is available here.

You are paying DBUs for the software that Databricks has made available for processing your Big Data workload. The hardware component needed to run the cluster is charged directly by Azure.

When you visit the Azure Pricing Calculator and look at the pricing for Azure Databricks, you would see that there are two distinct sections – one for compute and another for Databricks DBUs that specifies this rate.

The total price of running an Azure Databricks cluster is a combination of the above two sections.

Let us now understand if we are running a cluster of 6 machines what is the total cost likely to be:

For the All Purpose Compute running in West US in Standard tier with D3 v2, the total cost for compute (in PAYG model) is likely to be USD 1,222/ month.

From the calculator you can see that DBU consumption is 0.75 DBU with a rate of USD 0.4/ hr. So, the total cost of running 6 D3 v2 machines in the cluster will be $ 219 * 6 = $ 1,314.

The total cost hence would be USD 2,536.

From an optimization perspective, you can reduce your hardware compute by making reservations e.g. 1 yr or 3 yrs in Azure. DBUs from the Azure calculator doesn’t have any upfront commitment discounts available. However, they are available if you contact Databricks and request for discount on a fixed commitment.

If you are interested in saving costs while running Databricks clusters, here are two blog posts that you may find interesting:
https://medium.com/similarweb-engineering/how-we-cut-our-databricks-costs-by-50-7c60d6b6c069
https://www.databricks.com/blog/2022/10/18/best-practices-cost-management-databricks.html

Constructing a PySpark DataFrame Dynamically

Spark provides a lot of connectors to load data from various formats. Whether it is a CSV or JSON or Parquet you can use the magic of “spark.read”.

However, there are times when you would like to create a DataFrame dynamically using code. The one use case that I was presented with was to create a dataframe out of a very twisted incoming JSON from an API. So, I decided to parse the JSON manually and create a dataframe.

The approach we are going to use is to create a list of structured Row types and we are using PySpark for the task. The steps are as follows:

  1. Define the custom row class
personRow = Row("name","age")

2. Create an empty list to populate later

community = []

3. Create row objects with the specific data in them. In my case, this data is coming from the response that we get from calling the API.

qr = personRow(name, age)

4. Append the row objects to the list. In our program, we are using a loop to append multiple Row objects to the list.

community.append(qr)

5. Define the schema using StructType

person_schema = StructType([ \
                              StructField("name", StringType(), True), \
                              StructField("age", IntegerType(), True), \

                    ])

6. Create the dataframe using createDataFrame. The two parameters required is the data and schema to applied.

communityDF = spark.createDataFrame(community, person_schema)

Using the steps above, we are able to create a dataframe for use in Spark applications dynamically.

Azure AD Login for your SPA using React Hooks

In this tutorial we are implementing Microsoft Azure Ad Login in React single-page app and retrieve user information using @azure/msal-browser.

The MSAL library for JavaScript enables client-side JavaScript applications to authenticate users using Azure AD work and school accounts (AAD), Microsoft personal accounts (MSA) and social identity providers like Facebook, Google, LinkedIn, Microsoft accounts, etc. through Azure AD B2C service. It also enables your app to get tokens to access Microsoft Cloud services such as Microsoft Graph.

Azure application registration

You can register you web app using  Microsoft tutorial.

In the below App.js file code we are using Context and Reducer for managing the login state. useReducer allows functional components in React access to reducer functions from your state management

First we initialize the initial values for reducer,
Reducers are functions that take the current state and an action as arguments, and return a new state result. In other words, (state, action) => newState

You can see the code form 17 line where we created reducer with using switch case “LOGIN” and “LOGOUT”. and from Line 58 we use the context that defined in line 6 and passing the reducer values to it.

And In a line 66 : we are checking the state.isAuthenticated or not , if it’s not the page redirect to the home page and if it’s Authenticated it will be redirected to the admin dashboard home page.

App.Js

import React, { useReducer } from "react";
import { createBrowserHistory } from "history";
import { Router, Route, Switch, Redirect } from "react-router-dom";
import AdminLayout from "layouts/Admin.js";
import HomeLayout from "layouts/Home.js";
export const AuthContext = React.createContext();
const hist = createBrowserHistory();


const initialState = {
    isAuthenticated: false,
    user: null,
    token: null,
    
};

export const reducer = (state, action) => {
    switch (action.type) {
        case "LOGIN":
            localStorage.setItem("user", action.payload.user);
            localStorage.setItem("token", action.payload.token);
            return {
                ...state,
                isAuthenticated: true,
                user: action.payload.user,
                token: action.payload.token
            };
        case "LOGOUT":
            localStorage.clear();
            return {
                ...state,
                isAuthenticated: false,
                user: null
            };
        default:
            return state;
    }
};

function App() {
    const [state, dispatch] = useReducer(reducer, initialState);

    React.useEffect(() => {
        const user = localStorage.getItem('user') || null;
        const token = localStorage.getItem('token') || null;

        if (user && token) {
            dispatch({
                type: 'LOGIN',
                payload: {
                    user,
                    token
                }
            })
        }
    }, [])
    return (
        <AuthContext.Provider
            value={{
                state,
                dispatch
            }}
        >
            <Router history={hist}>
                <Switch>
                    <div className="App">{!state.isAuthenticated ?
                        <>
                            <Route path="/home" render={(props) => <HomeLayout {...props} />} />
                            <Redirect to="/home" /> </> :
                        <><Route path="/admin" render={(props) => <AdminLayout {...props} />} />
                            <Redirect to="/admin/dashboard" /></>}</div>
                </Switch>
            </Router>
        </AuthContext.Provider>
    );
}

export default App;

Now we are coming the the Login Component with Azure Ad using with @azure/msal-browser

In a below code we are importing AuthContext from App.js and PublicClientApplication from @azure/msal-browser.

and In a 7th line we created constant variable “dispatch” with providing the reference of AuthContext then we created the msalInstandce variable using the PublicClientApplication and providing the clientId, authority and redirectUri in auth block.

In a line 22 we created the startlogin function then we created the loginRequest variable and providing the scope for login.

and In a line 31 we use the msalInstance.loginPopup(loginRequest) for getting pop up window for login and in a “loginRespone” variable save the login response. then we are fetching the username and token form loginRespone. dispatch to the AuthContext (see line 41).

And In a line 56 we created a button it’s call the startLogin function on click event.

Login.cs

import React from "react";

import { AuthContext } from "App";
import {  PublicClientApplication } from '@azure/msal-browser';

export const Login = () => {
  const { dispatch } = React.useContext(AuthContext);

  const msalInstance = new PublicClientApplication({

    auth: {
      clientId: "65395645-90dd-4436-84a4-47d7c0e961fc",
      authority: "https://login.microsoftonline.com/common",
      redirectUri: "https://localhost:3000",
    },
    cache: {
      cacheLocation: "sessionStorage",
      storeAuthStateInCookie: true,
           }
  });

  const startLogin = async (e) => {
    e.preventDefault();
   
    var loginRequest = {
      scopes: ["65395645-90dd-4436-84a4-47d7c0e961fc/.default"] // optional Array<string>
    };
   
    try {
      
      const loginResponse = msalInstance.loginPopup(loginRequest);
      
      const username = (await loginResponse).account.username;
      const token = (await loginResponse).accessToken;

      const resJson = {
        token: token,
        user: username
      };

      dispatch({
        type: "LOGIN",
        payload: resJson
      })
      console.log("Login successfully");
    } catch (err) {

      console.log("Authentication Failed.....");
      console.log(err);
    }
  }
 
  return (
    <div>

      <button className="btn btn-sm btn-outline-success my-2 my-sm-0" data-toggle="button" onClick={(e) => { startLogin(e) }}>Login</button>

    </div>

  );
};

export default Login;

Above you see we used the reducer and context in App.js file and crated a Login Component.

Conclusion

Now we can import the Login component anywhere in a project where we want to login button that uses the Azure AD for Authentication.

Copy Data in Azure Databricks Table from one region to another

One of our customers had a requirement of copying data that was locked in an Azure Databricks Table in a specific region (let’s say this is eastus region). The tables were NOT configured as Delta tables in the originating region and a subset of personnel had access to both the regions.

However, the analysts were using another region (let’s say this is westus region) as it was properly configured with appropriate permissions. The requirement was to copy the Azure Databricks Table from eastus region to westus region. After a little exploration, we couldn’t find a direct/ simple solution to copy data from one Databricks region to another.

One of the first thoughts that we had was to use Azure Data Factory with the Databricks Delta connector. This would be the simplest as we would simply need a Copy Data activity in the pipeline with two linked services. The source would be a Delta Lake linked service to eastus tables and the sink would be another Delta Lake linked service to westus table. This solution faced two practical issues:

  1. The source table was not a Delta table. This prevented the use of Delta Lake linked service as source.
  2. When sink for copy activity is a not a blob or ADLS, it requires us to use a staging storage blob. While we were able to link a staging storage blob, the connection could not be established due to authentication errors during execution. The pipeline error looked like the following:
Operation on target moveBlobToADB failed: ErrorCode=AzureDatabricksCommandError,Hit an error when running the command in Azure Databricks. Error details: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Unable to access container adf-staging in account xxx.blob.core.windows.net using anonymous credentials, and no credentials found for them in the configuration. Caused by: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Unable to access container adf-staging in account xxx.blob.core.windows.net using anonymous credentials, and no credentials found for them in the configuration. Caused by: hadoop_azure_shaded.com.microsoft.azure.storage.StorageException: Public access is not permitted on this storage account..

On digging deeper, this requirement is documented as a prerequisite. We wanted to use the Access Key method but it requires the keys to be added into Azure Databricks cluster configuration. We didn’t have access to modify the ADB cluster configuration.

To work with this, we found the following alternatives:

  1. Use Databricks notebook to read data from non-Delta Tables in Databricks. The data can be stored in a staging Blob Storage.
  2. To upload the data into the destination table, we will again need to use a Databricks notebook as we are not able to modify the cluster configuration.

Here is the solution we came up with:

In this architecture, the ADB 1 notebook is reading data from Databricks Table A.1 and storing it in the staging blob storage in the parquet format. Only specific users are allowed to access eastus data tables, so the notebook has to be run in their account. The linked service configuration of the Azure Databricks notebook requires us to manually specify: workspace URL, cluster ID and personal access token. All the data transfer is in the same region so no bandwidth charges accrue.

Next the Databricks ADB 2 notebook is accesses the parquet file in the blob storage and loads the data in the Databricks Delta Table A.2.

The above sequence is managed by the Azure Data Factory and we are using Run ID as filenames (declared as parameters) on the storage account. This pipeline is configured to be run daily.

The daily run of the pipeline would lead to a lot of data in the Azure Storage blob as we don’t have any step that cleans up the staging files. We have used the Azure Storage Blob lifecycle management to delete all files not modified for 15 days to be deleted automatically.

Azure AD Authentication in ASP.NET Core Web API

The next step to building an API is to protect it from anonymous users. Azure Active Directory (AD) serves as an identity platform that can be used to secure our APIs from anonymous users. After the authentication is enabled, users will need to provide a OAuth 2.0/ JST token to gain access to our API.

Let us begin to implement Azure AD Authentication in ASP.NET Core 5.0 Web API.

I will be creating ASP.NET Core 5.0 project and show you step by step how to enable authentication on it using Azure AD Authentication. We will be doing it using the MSAL package from nuget.

Prerequisites

Before you start to follow steps given in this article, you will need an Azure Account, and Visual Studio 2019 with .NET 5.0 development environment step.

Creating ASP.NET Core 5.0 web application

Open visual studio and click on Create a new project in the right and select “Asp.net core web app” as shown in below image and click next.

In the configure your new project section enter name and location of your project as shown in below image and click next

In the additional information step, select .NET 5.0 in the target framework, Authentication Type to none and check Configure HTTPS checkbox and click on create.

Configuring ASP.NET Core 5.0 App for Azure AD Authentication

Open appsettings.json of your web api and add following lines of code.

"AzureAd": {
    "Instance": "https://login.microsoftonline.com/",
    "Domain": "gtmcatalyst.com",  "qualified.domain.name",
    "ClientId": "your-client-id",
    "TenantId": "your-tenant-id"
  }

Replace your-client-id and your-tenant-id with the actual values that you copied while doing app registration in azure ad

Next, add package manager console and add following two package references to your web application.

using Microsoft.AspNetCore.Authentication.JwtBearer;
using Microsoft.Identity.Web;

Next, open startup.cs in your project and paste following code in the ConfigureServices method

   services.AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
               .AddMicrosoftIdentityWebApi(Configuration.GetSection("AzureAd"));

Next, in the Startup.cs, go to Configure method and add app.UseAuthentication(); line before app.UseAuthorization(); line.

Next, open any Controller and add [Authorize] attribute:

    [Authorize]
    [Route("[controller]")]
    [ApiController]
    public class SupportController : ControllerBase
    {

    }

Save all files and run your project.

You will notice that once you run the project, and try to access any method in support controller from the browser you will get return the HTTP ERROR 401 ( Unauthorized client error).

Conclusion

Our API is no longer available for anonymous access. It is now protected by Azure AD Authentication.

Testing SOAP APIs with Telerik Test Studio

The legacy SOAP based web services/ APIs are still in use at a lot of organisations. While Telerik Test Studio has direct support for REST based APIs, testing SOAP web services requires a few additional steps.

In the short video below, we share how we can test SOAP based APIs in Test Studio:

The three things needed include:

  1. A Test Oracle to encapsulate web services call: Built easily with Visual Studio
  2. Coded Steps in Test Studio
  3. Data Driven tests for testing various inputs

The transcript of the video is as follows:

Can Test studio test APIs? And i think most of you know the answer that it does support testing REST APIs. Recently one of the customers asked me if Telerik Test Studio can be used for automation of SOAP-based web services and the answer to that is it can be done in telerik test studio using coded tests.

In this video we will see how we can test SOAP based web services using Test Studio. The soap web service that i am using for testing is a publicly available service that converts digits to numbers. So in this web service i’m going to use the number and what it allows me to do is enter a number so let’s enter 789 and by invoking this i get the the words or the digits converted into words as seven eighty nine. So this is the web service that i’m going to test back in the test studio.

I have authored a test that will feed in various digits and verify if the web service returns the correct words for those digits. For the same i have put in some local data with digits in one column and in words in the second column. So this is going to be a data driven test. Let’s test this with our script. I’m just starting the test with the headless chrome as execution engine because web services don’t have a UI. The test has started and you can see that i’ve got the debugger going and already the test has finished.

Now there were four iterations of this test, this being a data driven test and if you want to see the logs for each of the iterations that’s visible here as well. In iteration number one you will notice the overall result is pass. In fact here it shows we’re using chrome headless version and our test was successful in iteration number two where we had put in two three eight nine the result is fail and the failure information is also mentioned here. Well we were expecting the web service to return two hundred and eighty nine but the web service actually returned two thousand three hundred and eight nine not eighty nine. So this actually indicates a failure on the assertion. Well the test continued because we have marked this step as continue on failure. Iteration number three has result as pass wherein we passed in a single digit 4 and we also got back the same value which is four. Iteration or the last iteration here is checking for digits 10 and inwords it should also be ten and once again the overall result is pass. So here you can see its result is pass but your overall result is a fail because one of the iterations here did actually fail.

So in the next few minutes we will see how we had created this SOAP test in telerik test studio. So to work the magic of a soap api test and it’s actually quite painless in telerik test studio. You need three things:

1. the first is a test oracle number

2. two is the coded step in test studio and

3. number three is data driving the verification

Okay let’s start with the test oracle. The test oracle is an external assembly that you are going to use within test studio inside your script but this assembly is going to be created by someone else. So what i had gotten done was to call the SOAP API using a visual studio function of adding web service reference and created that project as an assembly. Now the next step would be to add that assembly in test studio so if you go to your project and go to settings you will have a section that says script. Now within this section you will notice some assemblies that are automatically referenced here. Now if you notice this reference, this is something that has been added as a custom reference and this is where my test oracle has been created. So this assembly uses a simple function to invoke the SOAP API and provides a simple programmatic interface to invoke this soap service. So if you want to add more test oracles or external assemblies you can click on add reference and that’s what allows you to refer any .NET based assembly. So that takes care of the first step which is of getting a test article in place.

The second step is to add a coded step in my test script so you can add a coded step in test studio by using the step builder. Here where i’ve selected the common and within it there is a coded step and i can click on add step button. And this will add a coded step so that’s already done here. As you can see the coded step i am using is titled sample soap underscore coded step. In fact the code for this is available in the cs file located here. I just opened up the cs file the first thing that i need to do is to add a reference or to soap call namespace now in my function which is called as sample code underscore coded step. I am creating an instance of the object and within this i am passing in the specific digits. And finally i’m asserting if the digit that i have passed is the same as the one it should be in words.

Now the syntax here is slightly different. The syntax looks different because we are actually doing the third step in this demo which is data driving the verification. To data drive the verification as well as input values you’ve got to use the data and the specific column that you have created. So that’s what i’m using, so data and digits. And since the function towords requires me to give in an integer value i’m passing that in to the towords function of the test oracle. So conversion service and the function towords are actually coming from the test oracle. Finally i’m adding an assertion to see if the value that has been returned which is the nword value here and then the value that is there in my data driven column which is called inwords is also mentioned here. So both of them need to match up for this test to succeed. To quickly check where we had put in this data driven values we can go back to the test script and down below next to the test steps you will find the local data tab. And that’s where we can manually add these values.

This demonstration has showed you that Test Studio can be easily used to data drive SOAP based APIs automation

Enjoy watching the video and share your questions in the comments section below.

Handling a peculiar DATETIME issue in Telerik Reporting

We got a request from our customer wanting to use a CSV file as Data Source in Telerik Report Designer to generate the report. They were unable to parse DateTime column to DateTime data type in Telerik Report Designer because the CSV returns the column with double quotes for example (“4/14/2021 12:42:25PM”).

Values can be easily cast to DateTime in Telerik Reporting during the datasource definition. The issue here was the quotes that came in the datasource.

In this blog post, we will explore how we can work with this scenario using Expressions provided by Telerik Report Designer..

First, we need to Add Data Source so click on Data in Menu then select the CSV Data Source and select and then click on Next button as below :

check the checkbox if CSV file has headers then click on NEXT button.

Now we the screen appear for Map columns to type and you can see the StartTime column below with double quotes:

Now we try to change the type of StartTime column string to DateTime. Now need to provide Date format in our case like yyyy-mm-dd hh:ss but in our case it will not work and the column will show blank.

Cause of blank again we need the change DateTime to String and Click on NEXT Button and Finish.

Now to the Data Source is connected with the report. the customer want the report with the parameter of data. So we need to convert StartTime(string) to StartTime(DateTime) using expression.

First we need to remove double quotes from StartTime using Replace function.

//Syntax of Replace function
=Replace(text, old substirng, new substring)

In a below code we take a text from Fields.StartTime and provide old substring that to be remove is double quotes in single quotes. and new substring is blank. 

=Replace(Fields.StartTime,'"',"")

We got a StartTime in string without double quotes. so now we need to convert String to DateTime using CDate(value) Funciton. In a CDate function we provide the above Replace function that returning the StartTime without double qoutes.

= CDate(Replace(Fields.StartTime,'"',""))

Now we got the StartTime with the type of DateTime. and the above Expression we can use any where we want like filter the data and making parameter range with StartTime.

How to use Word Template with Telerik Document Processing Library using ASP.NET Core

We get frequent requests from customers wanting to build a document using a template with header and footer and just including text in the Word document using ASP.NET Core.

We can achieve this using the Telerik WordsProcessing library. In this blog post, we will explore how.

You can do this by importing the document template with the DocxFormatProvider (if the template is a DOCX document or with any other of the supported by the WordsProcessing format providers) into a RadFlowDocument, edit its content using the RadFlowDocumentEditor and export it with the appropriate format provider. 

First, you need to to install Telerik.Documents.Flow NuGet package see below picture where I am using Trial version of Telerik.

Add Template.docx file in root directory with header and footer.

In Program.cs file

Step 1 : Add below mentioned namespaces :

using System.Diagnostics;
using Telerik.Windows.Documents.Flow.FormatProviders.Docx;
using Telerik.Windows.Documents.Flow.Model;
using Telerik.Windows.Documents.Flow.Model.Editing;

Step 2 : Create RadFlowDocument object.

RadFlowDocument document;

Step 3 : And add the DocxFormatProvider for importing the DOCX Template in RadFlowDocument object “document”.

DocxFormatProvider provider = new DocxFormatProvider();

string templatePath = "Template.docx";

  using (Stream input = File.OpenRead(templatePath))
   {
       document = provider.Import(input);
   }

Step 4 : Now add the RadFlowDocumentEditor for editing the RadFlowDocument with providing it’s object. and with the RadFlowDocumentEditor object “editor” we are inserting the text.

RadFlowDocumentEditor editor = new RadFlowDocumentEditor(document);
editor.InsertText("Telerik WordsProcessing Library");

Step 5 : Now Define the Output of Document path. and export the document using above mentioned RadFlowDocumentEditor object “provider”.



 string outputDocumentPath = "Output.docx";
     
 using (Stream output = File.OpenWrite(outputDocumentPath))
     {
          provider.Export(document, output);
     }

Step 6 : And I am opening the file after exporting using below line of code you can also remove that if you don’t want to open file.

 Process.Start(new ProcessStartInfo() { FileName = outputDocumentPath, UseShellExecute = true });

The whole program look likes below :

Now you can get the Word Document File with name of Output.docx.

And for more information about the RadFlowDocumentEditor you can find in the RadFlowDocumentEditor help article.

If you are new to this library, I would suggest starting with the Getting Started help article and checking the WordsProcessing`s Cross-Platform Support topic.

Hierarchical structure using Telerik TreeView in ASP.NET MVC

One of our customers required hierarchical UI to be implemented. The data was residing in an API and required remote data binding. This task is very easy to implement with Telerik TreeView from Telerik UI for ASP.NET MVC suite.

A TreeView component represents hierarchical data in a tree structure. It allows users to perform single or multiple selection of items, drag and drop of elements within the TreeView.

The Telerik UI for ASP.NET MVC TreeView component comes with built-in checkbox support, keyboard navigation, RTL support, accessibility and provides templates for complete customization of each node. You can bind the TreeView to various data sources and take advantage of its load on demand feature, and request data only when a node is expanded.

Let’s see how we can use Telerik TreeView control to implement heirarchial structures:

  1. Sample Database and Table Using below script

For hierarchical structure we need to do one to many relation in tables, but in a below example, we are doing relation one-to-many from the Products table to itself (in a single table). In the table, EmployeeId is primary key and ReportsTo is foreign key. See the highlighted lines below:

CREATE DATABASE [Sample]

GO
USE [Sample]
GO
/****** Object:  Table [dbo].[Employees]    Script Date: 2/16/2021 1:34:48 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Employees](
	[EmployeeID] [int] IDENTITY(1,1) NOT NULL,
	[LastName] [nvarchar](20) NOT NULL,
	[FirstName] [nvarchar](10) NOT NULL,
	[Title] [nvarchar](30) NULL,
	[TitleOfCourtesy] [nvarchar](25) NULL,
	[BirthDate] [datetime] NULL,
	[HireDate] [datetime] NULL,
	[Address] [nvarchar](60) NULL,
	[City] [nvarchar](15) NULL,
	[Region] [nvarchar](15) NULL,
	[PostalCode] [nvarchar](10) NULL,
	[Country] [nvarchar](15) NULL,
	[HomePhone] [nvarchar](24) NULL,
	[Extension] [nvarchar](4) NULL,
	[ReportsTo] [int] NULL,
 CONSTRAINT [PK_Employees] PRIMARY KEY CLUSTERED 
(
	[EmployeeID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[Employees] ON 

INSERT [dbo].[Employees] ([EmployeeID], [LastName], [FirstName], [Title], [TitleOfCourtesy], [BirthDate], [HireDate], [Address], [City], [Region], [PostalCode], [Country], [HomePhone], [Extension], [ReportsTo]) VALUES (1, N'Davolio', N'Nancy', N'Sales Representative', N'Ms.', CAST(N'1948-12-08T00:00:00.000' AS DateTime), CAST(N'1992-05-01T00:00:00.000' AS DateTime), N'507 - 20th Ave. E.
Apt. 2A', N'Seattle', N'WA', N'98122', N'USA', N'(206) 555-9857', N'5467', 2)

INSERT [dbo].[Employees] ([EmployeeID], [LastName], [FirstName], [Title], [TitleOfCourtesy], [BirthDate], [HireDate], [Address], [City], [Region], [PostalCode], [Country], [HomePhone], [Extension], [ReportsTo]) VALUES (2, N'Fuller', N'Andrew', N'Vice President, Sales', N'Dr.', CAST(N'1952-02-19T00:00:00.000' AS DateTime), CAST(N'1992-08-14T00:00:00.000' AS DateTime), N'908 W. Capital Way', N'Tacoma', N'WA', N'98401', N'USA', N'(206) 555-9482', N'3457', NULL)

INSERT [dbo].[Employees] ([EmployeeID], [LastName], [FirstName], [Title], [TitleOfCourtesy], [BirthDate], [HireDate], [Address], [City], [Region], [PostalCode], [Country], [HomePhone], [Extension], [ReportsTo]) VALUES (3, N'Leverling', N'Janet', N'Sales Representative', N'Ms.', CAST(N'1963-08-30T00:00:00.000' AS DateTime), CAST(N'1992-04-01T00:00:00.000' AS DateTime), N'722 Moss Bay Blvd.', N'Kirkland', N'WA', N'98033', N'USA', N'(206) 555-3412', N'3355', 2)

INSERT [dbo].[Employees] ([EmployeeID], [LastName], [FirstName], [Title], [TitleOfCourtesy], [BirthDate], [HireDate], [Address], [City], [Region], [PostalCode], [Country], [HomePhone], [Extension], [ReportsTo]) VALUES (4, N'Peacock', N'Margaret', N'Sales Representative', N'Mrs.', CAST(N'1937-09-19T00:00:00.000' AS DateTime), CAST(N'1993-05-03T00:00:00.000' AS DateTime), N'4110 Old Redmond Rd.', N'Redmond', N'WA', N'98052', N'USA', N'(206) 555-8122', N'5176', 2)

INSERT [dbo].[Employees] ([EmployeeID], [LastName], [FirstName], [Title], [TitleOfCourtesy], [BirthDate], [HireDate], [Address], [City], [Region], [PostalCode], [Country], [HomePhone], [Extension], [ReportsTo]) VALUES (5, N'Buchanan', N'Steven', N'Sales Manager', N'Mr.', CAST(N'1955-03-04T00:00:00.000' AS DateTime), CAST(N'1993-10-17T00:00:00.000' AS DateTime), N'14 Garrett Hill', N'London', NULL, N'SW1 8JR', N'UK', N'(71) 555-4848', N'3453', 2)

INSERT [dbo].[Employees] ([EmployeeID], [LastName], [FirstName], [Title], [TitleOfCourtesy], [BirthDate], [HireDate], [Address], [City], [Region], [PostalCode], [Country], [HomePhone], [Extension], [ReportsTo]) VALUES (6, N'Suyama', N'Michael', N'Sales Representative', N'Mr.', CAST(N'1963-07-02T00:00:00.000' AS DateTime), CAST(N'1993-10-17T00:00:00.000' AS DateTime), N'Coventry House
Miner Rd.', N'London', NULL, N'EC2 7JR', N'UK', N'(71) 555-7773', N'428', 5)

INSERT [dbo].[Employees] ([EmployeeID], [LastName], [FirstName], [Title], [TitleOfCourtesy], [BirthDate], [HireDate], [Address], [City], [Region], [PostalCode], [Country], [HomePhone], [Extension], [ReportsTo]) VALUES (7, N'King', N'Robert', N'Sales Representative', N'Mr.', CAST(N'1960-05-29T00:00:00.000' AS DateTime), CAST(N'1994-01-02T00:00:00.000' AS DateTime), N'Edgeham Hollow
Winchester Way', N'London', NULL, N'RG1 9SP', N'UK', N'(71) 555-5598', N'465', 5)

INSERT [dbo].[Employees] ([EmployeeID], [LastName], [FirstName], [Title], [TitleOfCourtesy], [BirthDate], [HireDate], [Address], [City], [Region], [PostalCode], [Country], [HomePhone], [Extension], [ReportsTo]) VALUES (8, N'Callahan', N'Laura', N'Inside Sales Coordinator', N'Ms.', CAST(N'1958-01-09T00:00:00.000' AS DateTime), CAST(N'1994-03-05T00:00:00.000' AS DateTime), N'4726 - 11th Ave. N.E.', N'Seattle', N'WA', N'98105', N'USA', N'(206) 555-1189', N'2344', 2)

INSERT [dbo].[Employees] ([EmployeeID], [LastName], [FirstName], [Title], [TitleOfCourtesy], [BirthDate], [HireDate], [Address], [City], [Region], [PostalCode], [Country], [HomePhone], [Extension], [ReportsTo]) VALUES (9, N'Dodsworth', N'Anne', N'Sales Representative', N'Ms.', CAST(N'1966-01-27T00:00:00.000' AS DateTime), CAST(N'1994-11-15T00:00:00.000' AS DateTime), N'7 Houndstooth Rd.', N'London', NULL, N'WG2 7LT', N'UK', N'(71) 555-4444', N'452', 5)

SET IDENTITY_INSERT [dbo].[Employees] OFF
GO

ALTER TABLE [dbo].[Employees]  WITH NOCHECK ADD  CONSTRAINT [FK_Employees_Employees] FOREIGN KEY([ReportsTo])
REFERENCES [dbo].[Employees] ([EmployeeID])
GO

ALTER TABLE [dbo].[Employees] CHECK CONSTRAINT [FK_Employees_Employees]
GO

ALTER TABLE [dbo].[Employees]  WITH NOCHECK ADD  CONSTRAINT [CK_Birthdate] CHECK  (([BirthDate] < getdate()))
GO

ALTER TABLE [dbo].[Employees] CHECK CONSTRAINT [CK_Birthdate]
GO

The Employee table should look like the below:

Next create Entity Module.

After Creation of Entity, Employee class look like below

public partial class Employee
    {
        public Employee()
        {
            this.Employees1 = new HashSet<Employee>();
        }
    
        public int EmployeeID { get; set; }
        public string LastName { get; set; }
        public string FirstName { get; set; }
        public string Title { get; set; }
        public string TitleOfCourtesy { get; set; }
        public Nullable<System.DateTime> BirthDate { get; set; }
        public Nullable<System.DateTime> HireDate { get; set; }
        public string Address { get; set; }
        public string City { get; set; }
        public string Region { get; set; }
        public string PostalCode { get; set; }
        public string Country { get; set; }
        public string HomePhone { get; set; }
        public string Extension { get; set; }
        public Nullable<int> ReportsTo { get; set; }
        public string PhotoPath { get; set; }
    
        public virtual ICollection<Employee> Employees1 { get; set; }
        public virtual Employee Employee1 { get; set; }
    }

2. Now in the controller TreeviewController.cs

Note also, that the hasChildren uses a navigation property generated in the EF model (Employees1). That would be present if you have created a relation one-to-many from the Products table to itself.

public JsonResult Remote_Data_Binding_Get_Employees(int? id)
{			
  using (TelerikEntities entities = new TelerikEntities())
  {
     var data = from e in entities.Employees
     where (id.HasValue ? e.ReportsTo == id : e. ReportsTo == null)
      select new
       {
         id = e.EmployeeID,
         Name = e.FirstName,
         hasChildren = e.Employees1.Any()

       };
      return Json(data.ToList(), JsonRequestBehavior.AllowGet);
   }
}

In the Remote_Data_Binding_Get_Employees action of the TreeviewController.cs you will notice the yellow colored variables:

Note that as the Name field is used in the controller, the same should also be used in the TreeView in index.cstml file and make sure return of data in list.

3. In View index.cshtml

<div class="demo-section k-content">
        @(Html.Kendo().TreeView()
        .Name("treeview")
        .DataTextField("Name")
        .DataSource(dataSource => dataSource
            .Read(read => read
                .Action("Remote_Data_Binding_Get_Employees", "TreeView")
                )
        )
    )
</div>

In a above code we are doing remote data binding in .Action function we are passing “Method name” and “Controller name”.

Here is the hierarchical output we have acheived:-

Make reusable web controls with Angular and Telerik Kendo UI

Angular requires the use of the entire framework for it to work making it a takeover for the entire application being built. Web Components provide a specification by which we make these Angular components available for use with plain simple HTML. It is a web standard for defining new HTML elements in a framework-agnostic way.

Specifically, Angular elements are Angular components packaged as custom elements (also called Web Components).

One of the questions that our customers ask us is when they use Kendo UI is Angular Elements supported? The answer is a resounding yes and we detail a simple step by step to showcase this capability by using Kendo UI charts control:

1. Install Angular CLI and create a new project

npm i -g @angular/cli
ng new angular-custom-elements

2. Activate your Trial or commercial License

Kendo UI for Angular is a professionally developed library distributed under a commercial license. Starting from December 2020, using any of the UI components from the Kendo UI for Angular library requires either a commercial license key or an active trial license key.

After login in your telerik account Download your Telerik license key and Save the kendo-ui-license.txt license key file in the project folder.

Install or Update a License Key

  • Copy the license key file (kendo-ui-license.txt) to the root folder of your project. Alternatively, copy the contents of the file to the KENDO_UI_LICENSE environment variable.
  • Install @progress/kendo-licensing as a project dependency by running npm install --save @progress/kendo-licensing or yarn add @progress/kendo-licensing.
  • Run npx kendo-ui-license activate or yarn run kendo-ui-license activate in the console.

Adding the Kendo UI Components

Kendo UI for Angular is distributed as multiple NPM packages scoped to @progress. For example, the name of the Grid package is @progress/kendo-angular-grid. As of the Angular 6 release, Angular CLI introduces the ng add command which provides for a faster and more user-friendly package installation. For more information, refer to the article on using Kendo UI for Angular with Angular CLI.

3. Let’s start and add the Charts package:

Angular CLI supports the addition of packages through the ng add command which executes in one step the set of otherwise individually needed commands.

ng add @progress/kendo-angular-charts

The command installs all necessary packages, sets up the default theme, and imports the component module. The full set of applied changes can be seen by running git diff at any time.

Manual Setup

All components that you reference during the installation will be present in the final bundle of your application. To avoid ending up with components you do not actually need, either:

  • Import all Charts components at once by using the ChartsModule, or
  • Import a specific Charts component by adding it as an individual NgModule.

Download and install the package.

npm install --save @progress/kendo-angular-charts @progress/kendo-angular-common @progress/kendo-angular-intl @progress/kendo-angular-l10n @progress/kendo-angular-popup @progress/kendo-drawing hammerjs @progress/kendo-licensing

Once installed, import Hammer.js and the NgModule of the components you need.

To get all package components, import the ChartsModule in your [application root]({{ site.data.url.angular[‘ngmodules’] }}#angular-modularity) or feature module in app.module.ts.

3. Add elements package

Custom elements are a Web Platform feature currently supported by Chrome, Edge (Chromium-based), Firefox, Opera, and Safari, and available in other browsers through polyfills

ng add @angular/elements

4. Create a component

ng g component chart --inline-style --inline-template -v None

5. Add properties to the component

5. Update NgModule

6. Building the Angular Project for Production

ng build –prod –output-hashing=none

Now we need to create a build script(angular-elements-build.js) to produce only one JS file from the multiple files generated by the Angular CLI.

You need to install fs-extra and concat from npm using:

npm install fs-extra concat

7. In your root application, create a build script file and add below code, angular-element-build.js

const fs = require('fs-extra');
const concat  = require('concat');
(async function build() {
const files = [
'./dist/chart-custom-element/runtime.js',
'./dist/chart-custom-element/polyfills.js',
'./dist/chart-custom-element/main.js',
]
try{
await fs.ensureDir('angular-elements')
await fs.copy('./dist/chart-custom-element/styles.css','angular-elements/styles.css')
await concat(files,'angular-elements/chart-angular-element.js')
}catch(err){
console.log(err);
}
})()

7. run the script using below command.

node angular-element-build.js

The above command will create an angular-elements folder and chat-angular-element.js and copy styles.css file inside in angular-elements folder

8. Use the Angular Element in simple HTML

Add index.html file in angular-elements folder with below code:

<!DOCTYPE html>
<html lang="en">
<head>
    <base href="/">
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <link rel="icon" type="image/x-icon" href="favicon.ico">
    <link rel="stylesheet" href="styles.css" />
    <title>Testing our custom chart element</title>
</head>
<body>
    <div class="container">
        New component
        <app-chart></app-chart>
        <script type="text/javascript" src="chart-angular-element.js"></script>
        <script>
            let arr = [2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011];
            let arrseries = [{
                name: 'India',
                data: [3.907, 7.943, 7.848, 9.284, 9.263, 9.801, 3.890, 8.238, 9.552, 6.855]
            }, {
                name: 'Russian Federation',
                data: [4.743, 7.295, 7.175, 6.376, 8.153, 8.535, 5.247, -7.832, 4.3, 4.3]
            }, {
                name: 'Germany',
                data: [0.010, -0.375, 1.161, 0.684, 3.7, 3.269, 1.083, -5.127, 3.690, 2.995]
            }, {
                name: 'World',
                data: [1.988, 2.733, 3.994, 3.464, 4.001, 3.939, 1.333, -2.245, 4.339, 2.727]
            }]
            let title = "New Title"
            let querySelect = document.querySelector('app-chart');
            querySelect.categories = arr;
            querySelect.series = arrseries;
            querySelect.title = title;
        </script>
    </div>
</body>
</html>

Install live-server using below command:

npm install -g live-server

Navigate to your angular-element folder and run below command:

cd angular-element
npx live-server

Browser window will open with the URL http://localhost:8080/

Now you can see the angular component is working outside of the angular application.