The following HPCC Systems machine learning bundles are currently supported.
- Core Bundles - ML_Core
Provides the core data definitions and attributes for ML. It is a prerequisite for all of the other production bundles. See the HPCC Systems ML_Core repository on GitHub.
PBblas (Parallel Block Linear Algebra Subsystem)
Provides distributed, scalable matrix operations. Several of the other bundles depend on it, and it can also be used directly wherever matrix operations are needed. See the HPCC Systems PBblas repository on GitHub. See also the blog post Introduction to using PBblas on HPCC Systems.
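PBblas itself is used from ECL, and its interface is not shown here. The Python below is a language-neutral sketch of the idea behind block linear algebra: the matrices are partitioned into square tiles and multiplied tile by tile, so each tile product is a unit of work a distributed system could assign to a node. The function name `block_matmul` and the block size are illustrative assumptions, not part of PBblas.

```python
def block_matmul(a, b, block):
    """Multiply a (n x m) by b (m x p), both given as lists of rows,
    by iterating over block x block tiles."""
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, block):
        for j0 in range(0, p, block):
            for k0 in range(0, m, block):
                # Add the product of tile A[i0, k0] and tile B[k0, j0] into
                # tile C[i0, j0]. Tile products for different (i0, j0) are
                # independent; those sharing (i0, j0) are summed together.
                for i in range(i0, min(i0 + block, n)):
                    for k in range(k0, min(k0 + block, m)):
                        aik = a[i][k]
                        for j in range(j0, min(j0 + block, p)):
                            c[i][j] += aik * b[k][j]
    return c
```

For example, multiplying [[1, 2, 3], [4, 5, 6]] by [[7, 8], [9, 10], [11, 12]] with a block size of 2 gives [[58.0, 64.0], [139.0, 154.0]], the same result as ordinary matrix multiplication.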
LinearRegression provides Ordinary Least Squares (OLS) linear regression, for use as an ML algorithm or for other purposes such as data analysis. See the HPCC Systems Linear Regression repository on GitHub.
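The LinearRegression bundle is ECL code, and its interface is not shown here. As a minimal sketch of what OLS computes, the Python below fits an intercept and slope for a single feature using the closed-form least-squares solution (slope = covariance / variance). The function name `ols_fit` is a hypothetical illustration, not part of the bundle.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope
```

For example, `ols_fit([1, 2, 3, 4], [3, 5, 7, 9])` returns `(1.0, 2.0)`, recovering the exact line y = 1 + 2x.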
LogisticRegression provides classification using logistic regression methods, both binomial (two classes) and multinomial (multiple classes). Despite its name, logistic regression is a classification method, not a regression method. See the HPCC Systems Logistic Regression repository on GitHub.
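Why is logistic regression a classifier? It regresses a linear score, but that score is turned into class probabilities and the output is a class label. The Python sketch below shows only the prediction step: the logistic function for the binomial case and softmax for the multinomial case, followed by choosing the most probable class. The weights are made-up inputs, and model fitting is not shown. The function names are hypothetical and are not the LogisticRegression bundle's API.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_binomial(weights, bias, x):
    """Return (class label 0 or 1, probability of class 1)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    p = sigmoid(z)
    return (1 if p >= 0.5 else 0), p

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_multinomial(weight_rows, biases, x):
    """Return (index of the most probable class, list of class probabilities)."""
    scores = [b + sum(w * xi for w, xi in zip(row, x))
              for row, b in zip(weight_rows, biases)]
    probs = softmax(scores)
    return probs.index(max(probs)), probs
```

For example, `predict_binomial([2.0], -1.0, [1.0])` yields class 1 with a probability of about 0.731. `predict_multinomial([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [0.0, 0.0, 0.0], [2.0, 1.0])` yields class 0.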
GLM (Generalized Linear Model) provides regression and classification algorithms for situations in which your data does not match the assumptions of LinearRegression or LogisticRegression. It handles a variety of data distribution assumptions. See the HPCC Systems GLM repository on GitHub.
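Models in this family (generalized linear models) combine a linear predictor with a link function and an error distribution other than the normal. As one illustrative case, the Python below fits a Poisson regression with a log link to count data. The choice of Poisson is an assumption made for illustration. The fit uses Newton-Raphson, which for this canonical link is equivalent to iteratively reweighted least squares (IRLS). This is a conceptual sketch, not the GLM bundle's interface, and the function name `fit_poisson_glm` is hypothetical.

```python
import math

def fit_poisson_glm(xs, ys, max_iter=100, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x by maximum likelihood with Poisson errors.

    Newton-Raphson on the log-likelihood; with the canonical log link this
    matches IRLS. Returns (b0, b1).
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need paired, non-empty data")
    mean_y = sum(ys) / len(ys)
    if mean_y <= 0:
        raise ValueError("need at least one positive count")
    b0, b1 = math.log(mean_y), 0.0  # start from the intercept-only fit
    for _ in range(max_iter):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Score: gradient of the Poisson log-likelihood.
        g0 = sum(y - mu for y, mu in zip(ys, mus))
        g1 = sum((y - mu) * x for x, y, mu in zip(xs, ys, mus))
        # Information matrix (negative Hessian) [[s0, s1], [s1, s2]].
        s0 = sum(mus)
        s1 = sum(mu * x for x, mu in zip(xs, mus))
        s2 = sum(mu * x * x for x, mu in zip(xs, mus))
        det = s0 * s2 - s1 * s1
        if det == 0:
            raise ValueError("singular information matrix (are all x equal?)")
        # Newton step: solve the 2x2 system for the update.
        d0 = (s2 * g0 - s1 * g1) / det
        d1 = (s0 * g1 - s1 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1
```

Consider counts [1, 2, 4, 8] observed at x = [0, 1, 2, 3], which double at each step. For this data the fit returns b0 close to 0 and b1 close to log 2 (about 0.693).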
SupportVectorMachines provides Support Vector Machine (SVM) classification and regression, using the popular LibSVM library under the hood. See the HPCC Systems Support Vector Machines repository on GitHub.
LearningTrees provides Random Forest based classification and regression. See the HPCC Systems Learning Trees repository on GitHub. See also the blog post Learning Trees - A guide to Decision Tree based machine learning on HPCC Systems.
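A random forest trains many decision trees, each on a bootstrap sample of the data, and then combines their predictions. The Python sketch below shows only those two surrounding steps. It does not grow trees (each tree is represented only by its prediction), and it omits the random feature subsets a random forest also uses at each split. The function names are hypothetical and are not the LearningTrees API.

```python
import random
from collections import Counter
from statistics import mean

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement; each tree trains on one such sample."""
    return [rng.choice(rows) for _ in rows]

def forest_classify(tree_votes):
    """Classification: predict the label most trees voted for.
    Ties go to the label that was voted first."""
    return Counter(tree_votes).most_common(1)[0][0]

def forest_regress(tree_outputs):
    """Regression: predict the mean of the trees' numeric outputs."""
    return mean(tree_outputs)
```

For example, `forest_classify(["spam", "ham", "spam"])` returns `"spam"`, and `forest_regress([1.0, 2.0, 3.0])` returns `2.0`.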
ecl-ml is the original HPCC Systems Machine Learning bundle. It contains a wide range of machine learning algorithms at various levels of productization (i.e., documentation, reliability, and performance). Although it is no longer formally supported, it is still occasionally updated and often contains the newest experimental algorithms. Use it when you need algorithms that are not yet available in the production bundles. See the HPCC Systems ecl-ml repository on GitHub.
Please note: because of restrictions on bundle naming, ecl-ml does not install properly as a bundle. Instead, download it into your development area to use it. See the supporting documentation for ecl-ml.
- Supervised Learning Bundles - Linear Regression, Logistic Regression, GLM, SupportVectorMachines, Learning Trees, Generalized Neural Networks (GNN)
- Unsupervised Learning Bundles - K-Means, DBScan
- Natural Language Processing Bundles - TextVectors
More details about these bundles are available in the Machine Learning download area of our website.