AutoML with SymetryML
AutoML with SymetryML is performed in two steps:
When creating a new SymetryML project, specify that AutoML should be enabled for that project.
SymetryML will then automatically enhance the data as it is ingested.
More info in the section.
Use the functionality when building a new model
The following sections describe both steps in more detail, as well as the configuration available for each step.
AutoML automates various stages of the typical machine learning pipeline. SymetryML's AutoML functionality lets you perform:
Data pre-processing
Column Type Detection
Change Data Type of a column
One Hot Encoding
Target Encoding
Ignore Outliers
Feature engineering
Create new feature interactions based on existing features. The FAST algorithm is used to decide which feature interactions to add.
Attribute Filtering
Compute Feature importance
Remove attributes with multicollinearity
Ignore low variance features
Singular Value Decomposition
Model Building with Auto-Select
Automatically select the type of model to build based on the AutoML task. Regression and binary classification are supported.
Automatically select the best model from various permutations of the features. This is covered in the section.
Automatically optimize model hyperparameters. This is covered in the section.
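To make the attribute-filtering steps above more concrete, the following is a minimal NumPy sketch of two of them: ignoring low-variance features and flagging multicollinear attributes with the variance inflation factor (VIF). It is an illustration only and not SymetryML's implementation. The function names, the `<=` comparison against the variance threshold, and the rule of flagging columns whose VIF exceeds the threshold are all assumptions; the thresholds 0.0 and 5.0 mirror the documented defaults of automl_threshold_variance and automl_vif_filter_threshold.

```python
import numpy as np


def low_variance_columns(X, threshold=0.0):
    """Indices of columns whose variance is <= threshold (assumed comparison)."""
    variances = X.var(axis=0)
    return [j for j, v in enumerate(variances) if v <= threshold]


def variance_inflation_factors(X):
    """VIF per column, computed as the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))


# Synthetic data: c is almost a linear combination of a and b, const never varies.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + b + rng.normal(scale=0.05, size=500)
const = np.full(500, 3.0)
X = np.column_stack([a, b, c, const])

# Step 1: drop features whose variance does not exceed the threshold.
dropped_low_var = low_variance_columns(X, threshold=0.0)
X_kept = np.delete(X, dropped_low_var, axis=1)

# Step 2: flag the remaining features whose VIF is above the threshold.
vif = variance_inflation_factors(X_kept)
high_vif = [j for j, v in enumerate(vif) if v > 5.0]

print("low-variance columns:", dropped_low_var)
print("VIF per remaining column:", np.round(vif, 1))
print("columns above the VIF threshold:", high_vif)
```

The constant column is removed first because a zero-variance column has no defined correlation with the others, which would make the VIF computation fail.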
| Category | Parameter | Description |
| --- | --- | --- |
| General | automl_target_name | string, column target name for the AutoML task |
| General | automl_warmup_size | integer, default:100,000, number of rows to process during the warmup period |
| Pre-processing | automl_attributes_to_use | list of attribute names to use |
| Attribute Filtering | automl_ignore_outliers | true\|false, default:false |
| Attribute Filtering | automl_remove_collinearity | true\|false, default:false |
| Attribute Filtering | automl_ignore_low_variance | true\|false, default:false |
| Attribute Filtering | automl_threshold_variance | double, default:0.0 |
| Attribute Filtering | automl_use_svd | true\|false, default:false |
| Attribute Filtering | automl_vif_filter | true\|false, default:false |
| Attribute Filtering | automl_vif_filter_threshold | double, default:5.0, threshold to use when automl_vif_filter is true |
| Feature Engineering | automl_use_feature_importance | true\|false, default:false |
| Feature Engineering | automl_add_feature_interaction | true\|false, default:false |
| Feature Engineering | automl_feature_interaction_threshold | double, default:1, percentage of interaction pairs to keep from the FAST algorithm |
| Feature Engineering | automl_fast_num_bins | double, default:16, number of bins to use for the FAST algorithm |
| Feature Engineering | automl_fast_num_threads | integer, default:14, number of threads to use for the FAST algorithm |
| Feature Engineering | automl_power | string, delimited string that describes which feature powers to use. Example: automl_power=2:3:5:yj adds powers 2, 3 and 5 as well as the Yeo-Johnson transform of the features |
This example uses:
Specify to use AutoML for the project: "automl_project_is_automl":"true"
Specify to add feature interaction: "automl_add_feature_interaction":"true"
Specify a warmup period: "automl_warmup_size":"200"
Specify the target for the AutoML project: "automl_target_name":"MEDV"
automl_warmup_size: integer, default 100,000. This number is used when enabling the creation of new features based on interactions between existing features. Scoring the possible feature combinations requires some data, and this parameter controls how many rows (tuples) are used for that calculation. Until that number of rows has been processed, the project will appear empty. When invoking the endpoint, the params map contains a key named automl_setup_done that indicates whether the warmup period is done.
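A client can use the automl_setup_done key to decide whether the project is ready. The sketch below assumes the response has already been parsed into a Python dictionary with a top-level "params" map; that layout, the helper name `warmup_done`, and the handling of both string and boolean values are assumptions made for illustration, since the exact response format is not shown here.

```python
def warmup_done(project_info: dict) -> bool:
    """Return True when the (assumed) params map reports automl_setup_done as true."""
    params = project_info.get("params", {})
    value = params.get("automl_setup_done", False)
    if isinstance(value, bool):
        return value
    # Parameter values may arrive as strings such as "true" or "false".
    return str(value).strip().lower() == "true"


# Hypothetical parsed responses, before and after the warmup period.
still_warming_up = {"params": {"automl_setup_done": "false"}}
ready = {"params": {"automl_setup_done": "true"}}

print(warmup_done(still_warming_up))
print(warmup_done(ready))
```

A missing key is treated as "not done", which is the conservative choice while the project still appears empty.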
To activate AutoML for a SymetryML project, you need to specify additional parameters when creating the project. This is done through the REST API by adding the AutoML parameters to the request body.
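As a rough sketch, the four settings listed in the example above could be assembled into a request body as follows. The parameter names and string values come from this page; the helper name `build_automl_params` and the placement of the parameters under a top-level "params" key are assumptions, and the endpoint URL and any other required fields of the project-creation request are deliberately left out.

```python
import json


def build_automl_params(target, warmup_size=200, add_feature_interaction=True):
    """Build the AutoML parameters as a map of string values (hypothetical helper)."""
    return {
        "automl_project_is_automl": "true",
        "automl_add_feature_interaction": str(add_feature_interaction).lower(),
        "automl_warmup_size": str(warmup_size),
        "automl_target_name": target,
    }


# Assumed layout: the AutoML parameters sit in a "params" map of the request body.
request_body = {"params": build_automl_params("MEDV")}

print(json.dumps(request_body, indent=2))
```

Values are kept as strings to match the quoted form used in the parameter list, for example "automl_warmup_size":"200".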