ML.NET: Machine Learning for .NET
ML.NET is more than just a machine learning library that offers a specific set of features. It is developing into a high-level API and comprehensive framework that not only uses its own ML features but also simplifies working with lower-level ML infrastructure libraries and runtimes, such as TensorFlow and ONNX.
Microsoft publicly unveiled ML.NET at the Build conference in May 2018. It is a free, cross-platform, open-source machine learning framework created to bring the power of ML to .NET applications for a variety of scenarios, including sentiment analysis, price prediction, recommendation, and image classification. A year later, at Build 2019, Microsoft released ML.NET 1.0, which added tools and capabilities that make training custom ML models even simpler for .NET developers.
Basics of Machine Learning
Machine learning lets computers make predictions without being explicitly programmed. It handles problems that are challenging (or impossible) to solve with rules-based programming (e.g., if statements and for loops). For example, if you were asked to develop an application that determines whether or not a picture contains a dog, you might not know where to begin. Similarly, if you were asked to write a function that predicts the price of a shirt from the description of the garment, you might start by looking for keywords like “long sleeves” and “business casual,” but you might not know how to scale that function to a few hundred products.
Machine learning can automate a wide range of “human” tasks, such as classifying images of dogs and other objects or predicting values like the price of a car or house. This lets you analyze large amounts of data more quickly and accurately and makes data-driven decisions easier.
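To make the contrast concrete, the rules-based shirt-pricing function described above might look like the following minimal sketch. The function name, base price, and price adjustments are invented purely for illustration; only the two keywords come from the text.

```csharp
using System;

// Hypothetical hand-written rules: the base price and adjustments are made up.
decimal EstimateShirtPrice(string description)
{
    decimal price = 20m; // assumed base price
    string text = description.ToLowerInvariant();

    if (text.Contains("long sleeves")) price += 5m;
    if (text.Contains("business casual")) price += 10m;
    // Every new product line needs more hand-tuned rules like these,
    // which is the part that stops scaling past a few hundred products.

    return price;
}

Console.WriteLine(EstimateShirtPrice("Business casual shirt with long sleeves")); // prints 35
Console.WriteLine(EstimateShirtPrice("Graphic tee"));                             // prints 20
```

A machine learning model replaces the hand-written if statements with parameters learned from example descriptions and prices.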
An overview of ML.NET
You can use the ML.NET framework to build your own custom ML models. This custom approach contrasts with “pre-built AI,” in which you consume pre-made general AI services from the cloud (like many of the offerings from Azure Cognitive Services). Pre-built AI can be quite effective in many situations, but it might not always meet your specific business needs because of the nature of the machine learning problem or the deployment context (cloud vs. on-premises). With ML.NET you build and train your own ML models, which makes it very flexible and adaptable to your particular data and business domain. And because the framework ships as libraries and NuGet packages that you use in your .NET apps, you can run ML.NET anywhere.
Who is ML.NET for?
With ML.NET, developers can use their existing .NET skills to quickly integrate machine learning into practically any .NET application. This means that if C# (or F#, or VB) is your programming language of choice, you no longer have to learn a different language, such as Python or R, to create your own ML models and add custom machine learning to your .NET projects. The framework’s tooling and features let you design, train, and deploy high-quality custom machine learning models directly on your computer, without requiring any prior machine learning experience.
Is ML.NET free?
Because ML.NET is a free and open-source framework (comparable in autonomy and context to other .NET frameworks like Entity Framework, ASP.NET, or even .NET Core), you can use it wherever you wish and in any .NET application, as shown in Figure 1. This includes web apps and services (ASP.NET MVC, Razor Pages, Blazor, Web API), desktop applications (WPF, WinForms), and more. It also means that you can build and consume ML.NET models both on-premises and on any cloud, including Microsoft Azure. Because ML.NET is cross-platform, you can run it on Windows, Linux, and macOS. You can even run ML.NET on the traditional .NET Framework on Windows to integrate AI/ML into your existing .NET apps. The same applies to offline scenarios: you can train and consume ML.NET models in desktop applications (WPF and WinForms) or any other offline .NET app (excluding ARM processors, which are currently not supported).
ML.NET also offers NimbusML Python bindings. If your company has teams of data scientists who are more skilled in Python, they can use NimbusML to build ML.NET models in Python, which you can then consume relatively quickly in production end-user .NET applications while running the models as native .NET.
When utilizing NimbusML to build/train ML.NET models that can run directly in .NET apps, data scientists and Python developers familiar with scikit-learn estimators and transforms as well as other well-known libraries in Python, such as NumPy and Pandas, will feel at ease. Visit https://aka.ms/code-nimbusml to learn more about NimbusML.
ML.NET Components
You can utilize the .NET API that ML.NET offers for two different types of actions:
- ML model training: building the model, typically in your “back storage.”
- ML model consumption: using the model to generate predictions in real-world production end-user apps
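The split between the two kinds of action can be sketched as follows: the training side fits a model and saves it to a file, and the consumption side loads that file and makes a prediction. This is a minimal sketch that assumes the Microsoft.ML NuGet package is referenced. The Ride and FarePrediction classes, the synthetic data, and the fare-model.zip file name are all hypothetical.

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext(seed: 0);

// --- Training side ---
// Synthetic data (assumption): fare = 2.5 + 2 * distance.
var rides = Enumerable.Range(1, 50)
    .Select(i => new Ride { TripDistance = i, FareAmount = 2.5f + 2f * i })
    .ToArray();
IDataView trainData = mlContext.Data.LoadFromEnumerable(rides);

var pipeline = mlContext.Transforms.Concatenate("Features", nameof(Ride.TripDistance))
    .Append(mlContext.Transforms.NormalizeMinMax("Features"))
    .Append(mlContext.Regression.Trainers.Sdca(
        labelColumnName: nameof(Ride.FareAmount), featureColumnName: "Features"));

ITransformer model = pipeline.Fit(trainData);

// Persist the trained model together with the input schema.
mlContext.Model.Save(model, trainData.Schema, "fare-model.zip");

// --- Consumption side (normally a different app or process) ---
ITransformer loaded = mlContext.Model.Load("fare-model.zip", out DataViewSchema inputSchema);
var engine = mlContext.Model.CreatePredictionEngine<Ride, FarePrediction>(loaded);
FarePrediction prediction = engine.Predict(new Ride { TripDistance = 10 });
Console.WriteLine($"Predicted fare: {prediction.FareAmount}");

// Hypothetical data classes for this sketch.
public class Ride
{
    public float TripDistance { get; set; }
    public float FareAmount { get; set; }
}

public class FarePrediction
{
    // Regression trainers write their output to the "Score" column.
    [ColumnName("Score")]
    public float FareAmount { get; set; }
}
```

Saving the model as a file is what lets the consuming application ship without the training data or the training code.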
Data components:
- IDataView: An IDataView is a flexible, efficient way of representing tabular data (e.g., rows and columns) in .NET. It serves as a placeholder into which datasets are loaded for later processing, and it is the component that holds the data during data transformations and model training. It is designed to handle high-dimensional data and large datasets efficiently: in addition to loading data from a file or enumerable into an IDataView, you can also stream data from the original data source during training, which lets it handle enormous datasets up to terabytes in size. IDataView objects can contain text, Booleans, vectors, integers, and other data.
- Data Loaders: Datasets can be loaded into an IDataView from almost any data source. You can use File Loaders for common ML sources, including text, binary, and image files, or the Database Loader to load and train on data directly from any relational database supported by System.Data, such as SQL Server, Oracle, PostgreSQL, and MySQL.
- Data Transforms: Since mathematics is the foundation of machine learning, all data must be transformed into numbers or numeric vectors. To transform your data into a format that the ML algorithms can use, ML.NET offers a range of data transforms, including text featurizers and one-hot encoders.
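The three data components above can be sketched together: an in-memory collection is loaded into an IDataView, a one-hot encoder turns a text category into numbers, and the numeric columns are concatenated into a single feature vector. This is a minimal sketch assuming the Microsoft.ML NuGet package. The TaxiRide and FeaturizedRide classes and the sample values are hypothetical.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext(seed: 0);

// Data loader: wrap an in-memory enumerable in an IDataView.
var rides = new[]
{
    new TaxiRide { PaymentType = "CRD", PassengerCount = 1, TripDistance = 4.0f },
    new TaxiRide { PaymentType = "CSH", PassengerCount = 2, TripDistance = 1.5f },
    new TaxiRide { PaymentType = "CRD", PassengerCount = 3, TripDistance = 7.2f },
};
IDataView data = mlContext.Data.LoadFromEnumerable(rides);

// Data transforms: one-hot encode the text column, then build one numeric vector.
var pipeline = mlContext.Transforms.Categorical
    .OneHotEncoding("PaymentTypeEncoded", "PaymentType")
    .Append(mlContext.Transforms.Concatenate(
        "Features", "PaymentTypeEncoded", "PassengerCount", "TripDistance"));

IDataView transformed = pipeline.Fit(data).Transform(data);

// Read the resulting feature vectors back out of the IDataView.
foreach (var row in mlContext.Data.CreateEnumerable<FeaturizedRide>(transformed, reuseRowObject: false))
{
    Console.WriteLine(string.Join(", ", row.Features));
}

// Hypothetical row classes for this sketch.
public class TaxiRide
{
    public string PaymentType { get; set; } = string.Empty;
    public float PassengerCount { get; set; }
    public float TripDistance { get; set; }
}

public class FeaturizedRide
{
    public float[] Features { get; set; } = Array.Empty<float>();
}
```

Each printed row is the numeric vector a trainer would receive: the one-hot slots for the payment type followed by the two numeric columns.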
Model training components:
- Classical ML: ML.NET supports a variety of classical ML scenarios and tasks, including classification, regression, time series, and more. ML.NET offers more than 40 trainers (algorithms targeting a specific task), so you can choose and fine-tune the particular algorithm that achieves higher accuracy and better solves your ML problem.
- Computer vision: Beginning with ML.NET 1.4-Preview, ML.NET also provides image-based training tasks (image classification/recognition) with your own custom images. These tasks use TensorFlow as the training engine. Microsoft is working on support for object detection training as well.
Model consumption and evaluation components:
- Model consumption: ML.NET offers a variety of ways to make predictions after you’ve trained your custom ML model, including using the model itself to make predictions in bulk, the Prediction Engine to make individual predictions, or the Prediction Engine Pool to make predictions in scalable and multi-threaded applications.
- Model evaluation: Your model should be evaluated before it is used in production to make sure it produces predictions of the desired quality. ML.NET offers several evaluators, each specific to an ML task, so that you can determine the model’s accuracy as well as many other common machine learning metrics.
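The consumption and evaluation options above can be sketched as follows: the model scores a whole test set in bulk, a regression evaluator computes quality metrics from those predictions, and a Prediction Engine scores one item. This is a minimal sketch assuming the Microsoft.ML NuGet package. The Ride and FarePrediction classes and the synthetic data are hypothetical. The Prediction Engine Pool is left out because it depends on dependency injection in a host application.

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext(seed: 0);

// Synthetic data (assumption): fare = 2.5 + 2 * distance.
var rides = Enumerable.Range(0, 100)
    .Select(i => new Ride { TripDistance = i % 25 + 1, FareAmount = 2.5f + 2f * (i % 25 + 1) })
    .ToArray();
IDataView data = mlContext.Data.LoadFromEnumerable(rides);
var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);

var pipeline = mlContext.Transforms.Concatenate("Features", nameof(Ride.TripDistance))
    .Append(mlContext.Transforms.NormalizeMinMax("Features"))
    .Append(mlContext.Regression.Trainers.Sdca(
        labelColumnName: nameof(Ride.FareAmount), featureColumnName: "Features"));
var model = pipeline.Fit(split.TrainSet);

// Bulk predictions: the model itself transforms an entire IDataView.
IDataView predictions = model.Transform(split.TestSet);

// Evaluation: the regression evaluator compares the Score column to the label.
RegressionMetrics metrics = mlContext.Regression.Evaluate(
    predictions, labelColumnName: nameof(Ride.FareAmount));
Console.WriteLine($"RSquared: {metrics.RSquared:F3}, RMSE: {metrics.RootMeanSquaredError:F3}");

// Individual prediction: PredictionEngine is convenient but not thread-safe.
var engine = mlContext.Model.CreatePredictionEngine<Ride, FarePrediction>(model);
FarePrediction single = engine.Predict(new Ride { TripDistance = 10 });
Console.WriteLine($"Predicted fare: {single.FareAmount}");

// Hypothetical data classes for this sketch.
public class Ride
{
    public float TripDistance { get; set; }
    public float FareAmount { get; set; }
}

public class FarePrediction
{
    [ColumnName("Score")]
    public float FareAmount { get; set; }
}
```

Evaluating on a held-out split rather than on the training rows gives a more honest estimate of how the model behaves on unseen data.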
Extensions and Tools:
- ONNX model integration: ONNX is a standardized and versatile ML model format. Any pre-trained ONNX model can be run and scored using ML.NET.
- TensorFlow model integration: TensorFlow is one of the most popular deep learning libraries. In addition to the image classification training scenario described earlier, this API allows you to run and score any pre-trained TensorFlow model.
- Tools: To make model training even simpler, you can use ML.NET’s tools (Model Builder in Visual Studio or the cross-platform CLI). To find the best model for your data and scenario, these tools internally use the ML.NET AutoML API to experiment with numerous combinations of algorithms and configurations.
Hello ML.NET
Now that you’ve seen an overview of ML.NET and the framework’s components, let’s look at some ML.NET code.
It only takes a few minutes to create your own custom machine learning model with ML.NET. The code in Example 1 shows a simple ML.NET application that trains, evaluates, and consumes a regression model to predict the fare of a specific taxi ride.
    using System;
    using Microsoft.ML;

    // ModelInput and ModelOutput are the data classes for this dataset (not shown here).

    // 1. Initialize ML.NET environment
    MLContext mlContext = new MLContext();

    // 2. Load training data
    IDataView trainData = mlContext.Data.LoadFromTextFile<ModelInput>("taxi-fare-train.csv", separatorChar: ',');

    // 3. Add data transformations
    var dataProcessPipeline = mlContext.Transforms.Categorical.OneHotEncoding(
            outputColumnName: "PaymentTypeEncoded", "PaymentType")
        .Append(mlContext.Transforms.Concatenate(outputColumnName: "Features",
            "PaymentTypeEncoded", "PassengerCount", "TripTime", "TripDistance"));

    // 4. Add algorithm
    var trainer = mlContext.Regression.Trainers.Sdca(labelColumnName: "FareAmount", featureColumnName: "Features");
    var trainingPipeline = dataProcessPipeline.Append(trainer);

    // 5. Train model
    var model = trainingPipeline.Fit(trainData);

    // 6. Evaluate model on test data (same comma separator as the training file)
    IDataView testData = mlContext.Data.LoadFromTextFile<ModelInput>("taxi-fare-test.csv", separatorChar: ',');
    IDataView predictions = model.Transform(testData);
    var metrics = mlContext.Regression.Evaluate(predictions, "FareAmount");

    // 7. Predict on sample data and print results
    var input = new ModelInput
    {
        PassengerCount = 1,
        TripTime = 1150,
        TripDistance = 4,
        PaymentType = "CRD"
    };
    var result = mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(model).Predict(input);
    Console.WriteLine($"Predicted fare: {result.FareAmount}\nModel Quality (RSquared): {metrics.RSquared}");
Tools for Automated Machine Learning in ML.NET
Although writing the code to train ML.NET models is simple, choosing the right data transformations and algorithms for your data and ML scenario can be difficult, especially if you don’t have a background in data science. However, with the preview release of Automated Machine Learning and tooling for ML.NET, Microsoft has automated the model selection process so that you can get started with machine learning in .NET right away, without needing prior machine learning skills.
Automated Machine Learning in ML.NET, also known as AutoML, runs locally on your development computer and automatically builds and trains models using a combination of the best algorithm and settings. You state the machine learning goal and provide the dataset, and AutoML selects and produces the highest quality model by experimenting with a variety of algorithms and their associated settings.
AutoML currently supports regression (for example, price prediction), binary classification (for example, sentiment analysis), and multi-class classification (for example, issue classification). Support for more scenarios is being developed.
Although you can use AutoML directly through the ML.NET AutoML API, ML.NET also provides tooling on top of AutoML to further simplify machine learning in .NET. The following sections use these tools to show how simple it is to build your own ML.NET model.
ML.NET Model Builder
ML.NET Model Builder is a simple UI tool in Visual Studio that you can use to create ML.NET models along with the model training and consumption code. Internally, the tool uses AutoML to select the data transformations, algorithms, and algorithm settings that produce the most accurate model for your data.
To get a trained machine learning model, you give Model Builder three things:
- The scenario for machine learning
- The data set
- How long you want to train
Let’s try out an example of sentiment analysis. You’ll develop a console application that can determine whether a comment is toxic or non-toxic.
First, download and install the ML.NET Model Builder Visual Studio extension. You can get it online or from Visual Studio’s Extensions Manager. Model Builder works with both Visual Studio 2017 and 2019.
You also need to download the Wikipedia detox dataset (wikipedia-detox-250-line-data.tsv), which is available as a TSV file. Each row represents a unique comment submitted by a user to Wikipedia (SentimentText), and each comment is labeled as toxic (1/True) or non-toxic (0/False).
Data preview of Wikipedia detox dataset
Create a new .NET Core Console Application in Visual Studio after installing the extension and downloading the dataset.
Right-click on your project in the Solution Explorer and choose Add > Machine Learning.
A new tool window with ML.NET Model Builder appears. From here, as shown in the image below, you can select from a range of scenarios. Microsoft is currently working to bring new ML scenarios, such as image classification and recommendation, to Model Builder.
Select the sentiment analysis scenario to learn how to build a model that can predict which of two categories an item belongs to (in this case, toxic or non-toxic). This ML task is known as binary classification.
You can import data into Model Builder from a file (.txt, .csv, or .tsv) or from a database (e.g., SQL Server). In the Data screen, choose File as your input data source, then upload the wikipedia-detox-250-line-data.tsv dataset. Select Sentiment as your Label from the Column to Predict (Label) drop-down, and keep SentimentText checked as the Feature.
Move on to the Train step, where you specify the Time to train (i.e., how long the tool will spend evaluating different models using AutoML). In general, longer training periods allow AutoML to explore more models with multiple algorithms and settings, so larger datasets will need more time to train.
Because the wikipedia-detox-250-line-data.tsv dataset is less than 1MB, change the Time to train to just 20 seconds and select Start training. This starts the AutoML process of iterating through different algorithms and settings to find the best model.
You can watch the training progress in the Model Builder UI, including the time remaining, the best model and model accuracy found so far, and the last model explored.
You’ll see output and evaluation metrics once you get to the Evaluate step. The AveragedPerceptronBinary trainer, a linear classification algorithm that excels in text classification scenarios, was chosen by AutoML as the best model after 20 seconds of exploration of five different models. Its accuracy was 70.83%.
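The exploration that Model Builder runs in this walkthrough can also be approximated directly against the ML.NET AutoML API mentioned earlier. This is a minimal sketch assuming the Microsoft.ML and Microsoft.ML.AutoML NuGet packages. It also assumes that the dataset file sits in the working directory with a header row, the Sentiment label in the first column, and SentimentText in the second. The Comment class is hypothetical, and the trainer and accuracy you get may differ from the ones reported above.

```csharp
using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

var mlContext = new MLContext(seed: 0);
const string dataPath = "wikipedia-detox-250-line-data.tsv"; // assumed location

if (File.Exists(dataPath))
{
    // Tab is the default separator, which matches a TSV file.
    IDataView data = mlContext.Data.LoadFromTextFile<Comment>(dataPath, hasHeader: true);

    // Let AutoML explore binary classification models for 20 seconds.
    ExperimentResult<BinaryClassificationMetrics> result = mlContext.Auto()
        .CreateBinaryClassificationExperiment(maxExperimentTimeInSeconds: 20)
        .Execute(data, labelColumnName: nameof(Comment.Sentiment));

    Console.WriteLine($"Best trainer: {result.BestRun.TrainerName}");
    Console.WriteLine($"Accuracy: {result.BestRun.ValidationMetrics.Accuracy:P2}");
}
else
{
    Console.WriteLine($"Dataset not found: {dataPath}");
}

// Hypothetical row class; the column order is an assumption about the file layout.
public class Comment
{
    [LoadColumn(0)]
    public bool Sentiment { get; set; }

    [LoadColumn(1)]
    public string SentimentText { get; set; } = string.Empty;
}
```

The best run also exposes the trained model (result.BestRun.Model), which can be used for predictions like any other ML.NET model.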
ML.NET CLI
ML.NET also offers cross-platform tooling, so even if you don’t work on Windows or with Visual Studio, you can still use AutoML to quickly develop machine learning models. You can install the ML.NET CLI (command-line interface), a dotnet Global Tool, and run it from any command prompt (Windows, macOS, or Linux) to create high-quality ML.NET models based on the training datasets you supply. Similar to Model Builder, the ML.NET CLI generates sample C# code to run the model, along with the C# code that was used to build and train it, so you can examine the algorithm and settings that AutoML selected.
Roadmap for ML.NET
Although ML.NET already offers a very wide range of machine learning tasks and algorithms, there are still many other interesting areas that Microsoft wants to expand and improve, such as:
- Deep Learning-based computer vision training scenarios (TensorFlow):
  - Completing the Image Classification training API, adding GPU support, and releasing this functionality as GA
  - Developing and releasing the Object Detection training API
- CLI and Model Builder updates:
  - Adding scenarios for recommendation, image classification/recognition, object detection, and other ML tasks
- Integration and support of ML.NET in Azure ML and Azure AutoML, enabling a thorough model lifecycle, model versioning and registry, surface model analysis, and explainability
- Text analysis using TensorFlow and DNN
- Other data loaders, such as No-SQL databases
- ARM support or full ONNX export support, for inference/scoring in Xamarin mobile apps for iOS and Android and in IoT workloads
- UWP app support when targeting x64/x86, and Unity support