Auto Select with SymetryML
Auto Select Introduction
The auto-select build on top of the Select Model functionality and allows to automate finding the best model for a given task (binary classification or regression model). This functionality will try various combinations of input attributes on various types of models and select the best one automatically. Auto Select perform the following:
Using a heuristic - see Select Heuristic - it will try various combination of the input attributes to build a model
When using a regression Task, there is also another parameter that controls the grid search size. Please see the Select Grid table for details
It uses a provided out-of-sample datasets to compute a score for the model. Example of scoring are AUC or RMSE.
It repeats step (1) and (2) to build many models - possibly thousands - and then select the best model.
Depending on how many attributes your project has it can take anywhere a few seconds to several hours. Be sure to understand the Select Heuristic section as well as the Select Grid.
Auto Select Rest API
Allows to invoke the auto select functionality by specifying an external data source id as the out of sample data to use for model assessment.
URL
POST /symetry/rest/{cid}/projects/{pid}/autoSelect/{dsid} [body=MLContext]
Query Parameters
Parameter
Required / Optional
Description
modelid
Required
ID to assign to the new model.
task
Required
Task to perform binary_classification
or regression
or multiclass_classifier
.
MLContext Build Parameters
Parameter
Required / Optional
Type
Description
rnd_seed
Optional
Integer
Set the seed of the randomizer
selector_type
Optional
String
Default is selector_type_fw_bw. Please see Selector Heuristic and Selector Types sections for details.
autoselect_grid_type
Optional
String
Default is autoselect_grid_type_tiny. Please see Select Grid for details.
autoselect_thread_pool_size
Optional
Integer
Default is 8.
sml_model_assessment_type
Optional
String
Defaults are
auc
for binary classifierrmse
for regressionmf1_w
for multi-class classifier, that is multi class F1 weighted scoring.
Possible value are:
classifier tasks
auc
recall
precision
mcc
Matthews correlation coefficient
regression tasks
rmse
r2
rmslogp1e
Root mean square of the sum of the log of the error
multi-class classifier tasks:
mf1_w
mkappa
, multi-class Kappa
HTTP Responses
|HTTP Status Code |HTTP Status Message |Description |202 | OK | Job accepted. |400 | BAD REQUEST | Unknown SymetryML project. {"statusCode":"BAD_REQUEST","statusString":" + Cannot Find SYMETRYML id[r2] for Customer id [c1]","values":{}}
Sample Request Response Classifier
Request:
POST url="/symetry/rest/c1/projects/p1/autoSelect/Iris_rtlm.csv?task=binary_classifier&modelid=autoSelectModel1
Body:
{"targets":[],"inputAttributes":[],"inputAttributeNames":["sepal_length","sepal_width","petal_length","petal_width","sepal_lengt_b1","sepal_lengt_b2","sepal_width_b1","sepal_width_b2","petal_length_b1","petal_length_b2","petal_width_b1","petal_width_b2"],"targetAttributeNames":["Iris_virginica"],"extraParameters":{"rnd_seed":"1"}}
Sample Request Response Regression
This example uses:
regression task:
?task=regression
specify the heuristic:
"selector_type":"selector_type_fw_bw"
specify the grid search:
"autoselect_grid_type":"autoselect_grid_type_small"
Request:
POST url="/symetry/rest/c1/projects/p1/autoSelect/Iris_rtlm.csv?task=binary_classifier&modelid=autoSelectModel1
Body:
{"targets":["13"],"inputAttributes":["44","88","45","89","46","47","48","49","90","91","50","51","52","53","10","54","11","55","12","56","57","14","58","15","59","16","17","18","19","0","1","2","3","4","5","6","7","8","9","60","61","62","63","20","64","21","65","22","66","23","67","24","68","25","69","26","27","28","29","70","71","72","73","30","74","31","75","32","76","33","77","34","78","35","79","36","37","38","39","80","81","82","83","40","84","41","85","42","86","43","87"],"inputAttributeNames":[],"targetAttributeNames":[],"extraParameters":{"rnd_seed":"1","selector_type":"selector_type_fw_bw","autoselect_grid_type":"autoselect_grid_type_small"}}
Auto Select DataFrame Rest API
Allows to invoke the auto select model functionality by using a DataFrame passed in the request body.
URL
POST /symetry/rest/{cid}/projects/{pid}/autoSelect [body=Map{"dataframe"=DataFrame, "mlcontext"=MLContext}]
Query Parameters
Parameter
Required / Optional
Description
modelid
Required
ID to assign to the new model.
task
Required
Task to perform binary_classification
or regression
or multiclass_classifier
MLContext Build Parameters
Same as Auto Select Rest API. Please consult the MLContext Build Parameters section for details.
HTTP Responses
HTTP Status Code
HTTP Status Message
Description
202
OK
Job accepted.
400
BAD REQUEST
Unknown SymetryML project. {"statusCode":"BAD_REQUEST","statusString":" + Cannot Find SYMETRYML id[r2] for Customer id [c1]","values":{}}
Sample Request Response Classifier
Request:
POST url="/symetry/rest/c1/projects/p1/autoSelect?task=binary_classifier&modelid=autoSelectModel1
Body:
{
"mlcontext" : {
"targets":[],
"inputAttributes":[],
"inputAttributeNames":["sepal_length","sepal_width","petal_length","petal_width","sepal_lengt_b1","sepal_lengt_b2","sepal_width_b1","sepal_width_b2","petal_length_b1","petal_length_b2","petal_width_b1","petal_width_b2","Iris_setosa","Iris_versicolor","Iris_virginica"],
"targetAttributeNames":["Iris_virginica"],
"extraParameters":{"rnd_seed":"1"}}
"dataframe" : {
"attributeNames":["sepal_length","sepal_width","petal_length","petal_width","sepal_lengt_b1","sepal_lengt_b2","sepal_width_b1","sepal_width_b2","petal_length_b1","petal_length_b2","petal_width_b1","petal_width_b2","Iris_setosa","Iris_versicolor","Iris_virginica"],
"attributeTypes":["C","C","C","C","B","B","B","B","B","B","B","B","B","B","B"]
"data":[
["4.3","3","1.1","0.1","1","0","0","1","1","0","1","0","1","0","0"],
["4.8","3","1.4","0.1","1","0","0","1","1","0","1","0","1","0","0"],
["4.9","3.1","1.5","0.1","1","0","0","1","1","0","1","0","1","0","0"]
(...)
["7.2","3.6","6.1","2.5","0","1","0","1","0","1","0","1","0","0","1"]
],
}
}
Sample Request Response Regression
This example uses:
regression task:
?task=regression
specify the heuristic:
"selector_type":"selector_type_fw_bw"
specify the grid search:
"autoselect_grid_type":"autoselect_grid_type_small"
TBD
Last updated