AutoML with SymetryML

AutoML with SymetryML is performed in two steps:

  • When creating a new SymetryML project, specify that AutoML is enabled for that project.

    • SymetryML will automatically enhance the data when it is ingested

    • More info in the Create Project section.

  • Use the Auto Select functionality when building a new model

The following sections describe both steps in more detail, along with the configuration available for each step.

Enabling AutoML for Your SymetryML Project

AutoML automates various stages of the typical machine learning pipeline. SymetryML’s AutoML functionality covers the following:

  • Data pre-processing

    • Column Type Detection

    • Change Data Type of a column

    • One Hot Encoding

    • Target Encoding

    • Ignore Outliers

  • Feature engineering

    • Create new feature interactions based on existing features. The FAST algorithm is used to decide which feature interactions to add.

    • Attribute Filtering

      • Compute Feature importance

      • Remove attributes with multicollinearity

      • Ignore low variance features

      • Singular Value Decomposition

  • Model Building with Auto-Select

    • Automatically select the type of model to build based on the AutoML task. Regression and binary classification are supported.

    • Automatically select the best model from various permutations of the features. This is covered in the Auto Select section.

    • Automatically optimize model hyperparameters. This is covered in the Auto Select section.
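To make two of the pre-processing steps listed above more concrete, here is a minimal sketch of one-hot encoding and target encoding in plain Python. It only illustrates the general techniques. It is not SymetryML's implementation, and the function names `one_hot_encode` and `target_encode` are hypothetical.

```python
from collections import defaultdict


def one_hot_encode(values):
    """Map each categorical value to a 0/1 indicator vector."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def target_encode(values, targets):
    """Replace each category with the mean target observed for it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    return [means[v] for v in values]


colors = ["red", "blue", "red", "green"]
prices = [10.0, 20.0, 30.0, 40.0]

categories, encoded = one_hot_encode(colors)
print(categories)                     # ['blue', 'green', 'red']
print(encoded[0])                     # [0, 0, 1]
print(target_encode(colors, prices))  # [20.0, 20.0, 20.0, 40.0]
```

One-hot encoding adds one column per category, while target encoding keeps a single numeric column, which tends to matter for attributes with many distinct values.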

AutoML Control Parameters

| Family | Name | Description |
| --- | --- | --- |
| general | automl_target_name | string. Name of the target column for the AutoML task. |
| general | automl_warmup_size | integer, default: 100,000. Used when the creation of new features from existing feature interactions is enabled. Scoring the possible feature combinations requires some data, and this parameter controls how many tuples/rows are used for that calculation. Until that number of rows has been processed, the project appears empty. When you invoke the Get Project Info endpoint, the ProjectInfo params map contains a key named automl_setup_done that indicates whether the warmup period is done. |
| pre-processing | automl_attributes_to_use | List of attribute names to use. |
| Attribute Filtering | automl_ignore_outliers | true or false, default: false |
| Attribute Filtering | automl_remove_collinearity | true or false, default: false |
| Attribute Filtering | automl_ignore_low_variance | true or false, default: false |
| Attribute Filtering | automl_threshold_variance | double, default: 0.0 |
| Attribute Filtering | automl_use_svd | true or false, default: false |
| Attribute Filtering | automl_vif_filter | true or false, default: false |
| Attribute Filtering | automl_vif_filter_threshold | double, default: 5.0. Threshold to use when automl_vif_filter is true. |
| Feature Engineering | automl_use_feature_importance | true or false, default: false |
| Feature Engineering | automl_add_feature_interaction | true or false, default: false |
| Feature Engineering | automl_feature_interaction_threshold | double, default: 1. Percentage of interaction pairs to keep from the FAST algorithm. |
| Feature Engineering | automl_fast_num_bins | double, default: 16. Number of bins to use for the FAST algorithm. |
| Feature Engineering | automl_fast_num_threads | integer, default: 14. Number of threads to use for the FAST algorithm. |
| Feature Engineering | automl_power | string, comma-separated string that describes which feature powers to use. Example: automl_power=2:3:5:yj adds powers 2, 3 and 5 as well as the Yeo-Johnson transform of the features. |
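The automl_warmup_size description says the project appears empty until the warmup rows are processed, and that the ProjectInfo params map exposes an automl_setup_done key. The sketch below shows one way a client might wait for that flag. It assumes the Get Project Info response has already been parsed into a dict with a `params` entry, and that the flag may arrive either as a boolean or as a "true"/"false" string. Both are assumptions, and `fetch_project_info` is a hypothetical callable you would supply yourself.

```python
import time


def is_automl_setup_done(project_info):
    """Return True when the (assumed) params map reports warmup as finished."""
    params = project_info.get("params") or {}
    value = params.get("automl_setup_done", False)
    if isinstance(value, str):
        # Assumption: string values use "true"/"false", as in the request bodies.
        return value.strip().lower() == "true"
    return bool(value)


def wait_for_warmup(fetch_project_info, timeout_s=60.0, interval_s=2.0):
    """Poll a caller-supplied function until warmup is done or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_automl_setup_done(fetch_project_info()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Passing the fetch function in as an argument keeps the polling logic independent of how the Get Project Info endpoint is actually called and authenticated.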

AutoML REST API

To activate AutoML for a SymetryML project, specify additional parameters when creating the project. This is done by calling the Create Project REST API and adding the AutoML parameters to the MLContext in the request body.

AutoML Example

This example sets the following parameters:

  • Enable AutoML for the project: "automl_project_is_automl":"true"

  • Add feature interactions: "automl_add_feature_interaction":"true"

  • Set the warmup period: "automl_warmup_size":"200"

  • Set the target for the AutoML project: "automl_target_name":"MEDV"

Request:
POST url="http://charm:8080/symetry/rest/c1/projects?pid=automl-t6-reg-false&type=cpu&persist=false"

Body:
{"automl_warmup_size":"200","automl_add_feature_interaction":"true","automl_project_is_automl":"true","automl_target_name":"MEDV"}
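The same request can be assembled from Python. The following is a minimal sketch using only the standard library. The helper name `build_create_project_request` is hypothetical, the `Content-Type` header is an assumption, and any authentication your SymetryML server requires is not shown. The URL and body values are taken from the example above.

```python
import json
import urllib.parse
import urllib.request


def build_create_project_request(projects_url, project_id, automl_params,
                                 project_type="cpu", persist=False):
    """Build (but do not send) a Create Project request with AutoML parameters."""
    query = urllib.parse.urlencode({
        "pid": project_id,
        "type": project_type,
        "persist": "true" if persist else "false",
    })
    body = json.dumps(automl_params).encode("utf-8")
    return urllib.request.Request(
        f"{projects_url}?{query}",
        data=body,
        method="POST",
        # Assumption: the server accepts a JSON content type for this body.
        headers={"Content-Type": "application/json"},
    )


request = build_create_project_request(
    "http://charm:8080/symetry/rest/c1/projects",
    "automl-t6-reg-false",
    {
        "automl_warmup_size": "200",
        "automl_add_feature_interaction": "true",
        "automl_project_is_automl": "true",
        "automl_target_name": "MEDV",
    },
)
print(request.get_method(), request.full_url)
# To send it (authentication not shown): urllib.request.urlopen(request)
```

All parameter values are passed as strings, matching the request body shown in the example.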
