Modeling API
Last updated
This API function builds a model from the attributes and targets that you specify. The attributes to use are passed in the request body as a JSON data structure. For more information see the section .
SymetryML supports 5 anomaly detection algorithms:
MFAnomaly: an anomaly model based on a mixture of isolation forest and SymetryML online random forest algorithm.
OSSPCAModel: Online implementation of Out of sample PCA.
HBAModel: Online model based on histograms of the input features.
ECODModel: Empirical-Cumulative-distribution-based Outlier Detection.
EVTModel: Extreme Value Theory based.
These algorithms work in real time on streaming data and are unsupervised learning algorithms, in the sense that one does not need to provide positive examples of anomalies. The models use intrinsic values of the input features to compute an anomaly score. To create such a model, use the appropriate id - from - when creating it through the REST API.
The following section describes the KMeans clustering algorithm configuration that needs to be specified when creating a new project.
cluster_rnd_seed
Optional
Set the seed of the randomizer
cluster_features
Mandatory
This is a string containing the features to use for the clustering algorithm. The features are colon-separated.
cluster_max_iterations
Optional
Default is 1000. This controls the number of iterations of the k-means algorithm.
cluster_num_centroids
Optional
Default is 100. This parameter controls the number of centroids that are kept in real time on the data. Typically this number is much higher than the intended number of clusters, i.e. the target k parameter.
cluster_warmup_period
Optional
Default is 101. This parameter controls how many tuples of data must be seen before the initial real-time clusters are constructed, using the cluster_num_centroids parameter as k. Once the initial clusters are created, they are updated in real time with each new tuple/row of data. Note that cluster_warmup_period must be greater than cluster_num_centroids, e.g. cluster_num_centroids=100 and cluster_warmup_period=101. If this condition is not met, the software raises an error.
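The constraints described above can be checked on the client before a project is created. The following is a minimal sketch, not part of the SymetryML API: the helper name build_kmeans_project_params is hypothetical, and whether the service expects these values as numbers or as strings is an assumption that should be checked against the project creation endpoint.

```python
def build_kmeans_project_params(features, num_centroids=100, warmup_period=101,
                                max_iterations=1000, rnd_seed=None):
    """Return a dict of the cluster_* parameters described above (hypothetical helper)."""
    if not features:
        raise ValueError("cluster_features is mandatory")
    # The documentation requires cluster_warmup_period > cluster_num_centroids.
    if warmup_period <= num_centroids:
        raise ValueError(
            "cluster_warmup_period must be greater than cluster_num_centroids")
    params = {
        # Features are passed as one colon-separated string.
        "cluster_features": ":".join(features),
        "cluster_max_iterations": max_iterations,
        "cluster_num_centroids": num_centroids,
        "cluster_warmup_period": warmup_period,
    }
    # The seed is optional, so only include it when given.
    if rnd_seed is not None:
        params["cluster_rnd_seed"] = rnd_seed
    return params


params = build_kmeans_project_params(["sepal_length", "sepal_width"])
print(params["cluster_features"])
```

Validating locally avoids a round trip that would otherwise end in the error mentioned for cluster_warmup_period.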
aicc
Boolean. Set to true to enable AIC. Default is false.
min_lambda_power
Minimum value of Lambda. Specified as 10 ^ min_lambda_power. Default is -10.
max_lambda_power
Maximum value of Lambda. Specified as 10 ^ max_lambda_power. Default is 3
num_lambda
Number of possible Lambdas between min_lambda_power and max_lambda_power. Default is 100
min_eta
Minimum eta. Default is 0.
max_eta
Maximum eta. Default is 1.
num_eta
Number of possible Etas between min_eta and max_eta. Default is 11.
Note: when aicc is enabled, it is not necessary to specify the lambda and eta parameters separately, as they will be chosen by the AIC optimizer.
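To make the search space concrete, the candidate grids implied by the parameters above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, and the spacing (lambda evenly spaced in its power of ten, eta evenly spaced, both endpoints included) is an assumption, since the documentation only gives the bounds and the counts.

```python
def lambda_grid(min_lambda_power=-10, max_lambda_power=3, num_lambda=100):
    """Candidate lambdas 10**p, with p evenly spaced between the two powers (assumed)."""
    if num_lambda == 1:
        return [10.0 ** min_lambda_power]
    step = (max_lambda_power - min_lambda_power) / (num_lambda - 1)
    return [10.0 ** (min_lambda_power + i * step) for i in range(num_lambda)]


def eta_grid(min_eta=0.0, max_eta=1.0, num_eta=11):
    """Candidate etas, evenly spaced between min_eta and max_eta (assumed)."""
    if num_eta == 1:
        return [min_eta]
    step = (max_eta - min_eta) / (num_eta - 1)
    return [min_eta + i * step for i in range(num_eta)]


lambdas = lambda_grid()
etas = eta_grid()
# With the documented defaults this is a 100 x 11 grid of (lambda, eta) pairs.
print(len(lambdas) * len(etas))
```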
bayes
Bayesian Model
covest
Covariance Estimator
ecod
ECOD Model
elasticnet
Elastic Net
evt
EVT model
hmm
Hidden Markov Models
km
Kaplan Meier
kmeans
KMeans Clustering
lasso
Lasso Regression
lr
Single Pass Logistic Regression
lda
lsvm
Linear SVM
lsvr
Linear Support Vector Regression
mc
Markov Chains
mlda
Multi-class LDA, Linear Discriminant
mlr
Multi Linear Regression
mqda
Quadratic Discriminant Analysis
pcr
Principal Component Regression
plsq
Partial Least Square Regression
powerreg
Power Regression
ridge
Ridge Regression
rf_classifier
Online Random Forest Classifier
rf_regressor
Online Random Forest Regression
rf_anomaly
Anomaly / outlier model based on Online Random forest and isolation forest
oospca
Online Anomaly / outlier model based on out of sample PCA.
hba
Online Anomaly / outlier model based on histograms of the input features.
rf_type
rf_classifier permitted for now.
rf_target_name
target name
rf_num_classes
mandatory for classifier, specify number of classes, default:2
rf_features
list of columns to use for the Random Forest. If nothing is specified or '*' is used, then all continuous and binary attributes will be used, besides the target. Values are to be separated by a colon (:).
rf_always_split
controls whether a split should occur when a node contains only examples of one class. Should be true for anomaly detection, default:false
rf_num_trees
number of trees in the forest, default:10
rf_step
Gradient step. default:1
rf_complexity
Controls how complex the trees can grow, default:16, typical values 8 - 50
rf_max_split_time
Another way of controlling how complex trees can grow. If both rf_complexity and rf_max_split_time are specified, rf_max_split_time has priority.
rf_missing_value
value to use for missing value, default:0.0
rf_dirichlet
prior for classification prediction, default .5 for binary classification, 1/n for multiple class problems
rf_rnd_seed
Random seed, default:126.
km_time_column
Required
Identifies the time column of your dataset
km_event_column
Required
Identifies the event column of your dataset
km_group_column
Optional
Identifies the group column of your dataset.
lr_target_column
Required
The target attribute of the LR model
lr_features
Optional
Input attributes of the LR model. Use * to use all attributes which are not the target. Otherwise, use colon separated string of attributes.
Linear regression assumes that the target attribute is normally distributed. While in real world scenarios this is often not the case, we can still improve the performance of the regression model by transforming the target into something resembling a normal distribution. The method of transforming that is used within SymetryML is the Yeo-Johnson transformation.
Once a project has learned these transformations, a Power Regression model can be built, which selects the optimal transformed target.
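For reference, the standard Yeo-Johnson transformation of a single value can be sketched as below. This is the textbook definition, not SymetryML's implementation; the function name is hypothetical, and lam plays the role of the power parameter discussed in this section.

```python
import math


def yeo_johnson(y, lam):
    """Standard Yeo-Johnson transform of a single value y with power lam."""
    if y >= 0:
        if abs(lam) < 1e-12:
            # Limit of ((y + 1)**lam - 1) / lam as lam -> 0.
            return math.log1p(y)
        return ((y + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        # Limit of the negative branch as lam -> 2.
        return -math.log1p(-y)
    return -(((1.0 - y) ** (2.0 - lam)) - 1.0) / (2.0 - lam)


# lam = 1 leaves values unchanged; other powers reshape the distribution.
print(yeo_johnson(3.0, 1.0), yeo_johnson(3.0, 0.0))
```

Unlike the Box-Cox transformation, this definition accepts zero and negative target values, which is why it has two branches.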
To enable Power Regression, please specify the following parameters when creating your project:
sml_project_power_column
String
Name of column to transform
sml_project_power_min
Number
Minimum value of the power parameter
sml_project_power_max
Number
Maximum value of the power parameter
sml_project_power_steps
Number
Number of intervals between min and max of the power parameter
Example of MLContext used when creating a project that uses power transformation:
sml_project_power_column
String
Name of the target column
sml_project_power_min
Float
minimum power to use
sml_project_power_max
Float
maximum power to use
sml_project_power_steps
Integer
steps between minimum and maximum power
rtlm.option.sml.power.separator
Default is ^
rtlm.option.sml.power.prefix
Default is pt_
When SymetryML builds a new model, it might use matrix inversion. These operations are performed using an implementation of LAPACK (DGETRF and DGETRI) provided by a third-party library, running on CPU and/or GPU. Matrix inversion can lead to numerical instability problems. The following parameters affect how matrix inversion is computed.
sml_rcond_use
Boolean: ‘true’ or ‘false’
Controls whether SymetryML performs a numerical instability check (i.e., reciprocal condition number validation) during matrix inversion when building a model.
sml_rcond_tolerance
Number
By default, the accepted tolerance for the reciprocal condition number validation is 1e-14. Using this parameter, you can set it to lower values. However, be sure you understand the implications and assess your model carefully, as you might get a numerically unstable model when modifying this value.
matrix_use_pseudoinv
boolean
Use the matrix pseudo-inverse algorithm (with SVD) when computing the matrix inverse in models such as MLR, LDA and QDA.
sml_disable_error_logging
boolean
Default is true. This disables logging of errors when performing matrix solving operations. This functionality is used by many regression models.
Other parameters control various aspects of model building:
clip_negative_predictions
Boolean
For regression models. If set to true, negative predictions are clipped to zero.
sml_explore_ztest_known_mu
Number
This parameter is used to pass the known variance when performing the ztest against a known variable.
sml_det_norm_use
Boolean
alpha
Number
Alpha parameter when building a Ridge Regression model
matrix_use_pseudoinv
Boolean
Use the matrix pseudo-inverse algorithm (with SVD) when computing the matrix inverse in models such as MLR, LDA and QDA.
sml_model_svm_nu
Number
nu parameter for the LSVM model.
evt_anomaly_enabled
Boolean
laplace_factor
Number
use_log_likelihood
Boolean
Default is true. Used to avoid numerical underflow caused by the product of probabilities.
When building a KMeans clustering model, the following hyperparameters are used:
kmeans_k
Required
Number of clusters
kmeans_max_iterations
Optional
default:1000, Number of iterations.
kmeans_rnd_seed
Optional
default:current time, the seed for the randomizer.
Elastic Net is an extension of the classic MLR algorithm which adds regularization as a way to combat overfitting of your training data. It combines the regularization methods of both Ridge (L-2 norm regularization) and Lasso (L-1 norm regularization). While all three models feature the regularization parameter Lambda, Elastic Net contains one extra parameter, ETA, which controls how close the regularization should be to a pure Ridge regression model (ETA = 0) versus a pure Lasso approach (ETA = 1).
When building an Elastic Net, Lasso or Ridge regression model, the following hyperparameters are required:
lambda
Number
Lambda parameter of Elastic Net
eta
Number
Eta parameter of Elastic Net
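The role of the two hyperparameters can be illustrated with a common textbook form of the Elastic Net penalty. This is a sketch only: the function name is hypothetical, and the exact scaling used by SymetryML (for example a factor of 1/2 on the L-2 term) is not stated in this document, so the formula below is an assumption that merely reproduces the documented limits (ETA = 0 is pure Ridge, ETA = 1 is pure Lasso).

```python
def elastic_net_penalty(betas, lam, eta):
    """Assumed penalty: lam * (eta * ||beta||_1 + (1 - eta) * ||beta||_2^2)."""
    l1 = sum(abs(b) for b in betas)   # Lasso (L-1) part
    l2 = sum(b * b for b in betas)    # Ridge (L-2) part
    return lam * (eta * l1 + (1.0 - eta) * l2)


betas = [1.0, -2.0, 3.0]
print(elastic_net_penalty(betas, lam=0.5, eta=0.0))  # pure Ridge penalty
print(elastic_net_penalty(betas, lam=0.5, eta=1.0))  # pure Lasso penalty
```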
Optional parameters for Elastic Net are the following:
maxiter
Number
Maximum number of iterations
tolerance
Number
Convergence tolerance. Defaults to 1e-5. Increase if the model convergence is too slow.
standardize
Boolean
Transform each feature to the same scale (Z-score normalization)
centerY
Boolean
Center target around zero
debias
Boolean
Debias the estimator
The combined effect of centering the targets (centerY = true) and standardizing the features (standardize = true) is to get rid of the intercept.
ATTRIBUTE_NAME
Number
The value of the initial weight
ECOD stands for Empirical-Cumulative-distribution-based Outlier Detection (https://arxiv.org/pdf/2201.00382.pdf). It uses the CDF of each feature to estimate tail probabilities per dimension for each data point. An outlier score is calculated by aggregating probabilities across dimensions.
ecod_anomaly_threshold
The value of threshold to use to flag an anomaly. If not provided, will choose a value based on anomaly probability per dimension desired.
use_skewness
Using the skewness of each feature in the aggregating strategy.
desired_prob_feats
Anomaly probability per desired dimension.
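The ECOD idea described above (per-feature tail probabilities aggregated into one score) can be sketched in a few lines. This is a simplified batch illustration based on the description in this section, not SymetryML's online implementation: the function name is hypothetical, the skewness-based aggregation controlled by use_skewness is omitted, and the whole data set is held in memory.

```python
import math


def ecod_scores(rows):
    """Outlier score per row: larger means further in the tails (simplified ECOD)."""
    n = len(rows)
    d = len(rows[0])
    columns = [[r[j] for r in rows] for j in range(d)]
    scores = []
    for x in rows:
        left = 0.0
        right = 0.0
        for j in range(d):
            # Empirical left and right tail probabilities of x[j] in dimension j.
            p_left = sum(1 for v in columns[j] if v <= x[j]) / n
            p_right = sum(1 for v in columns[j] if v >= x[j]) / n
            # Aggregate across dimensions as a sum of negative log probabilities.
            left += -math.log(p_left)
            right += -math.log(p_right)
        scores.append(max(left, right))
    return scores


data = [[1.0, 2.1], [1.1, 1.9], [0.9, 2.0], [1.05, 2.05], [10.0, 20.0]]
scores = ecod_scores(data)
# The last row is extreme in both dimensions and receives the highest score.
print(scores.index(max(scores)))
```

A row would then be flagged when its score exceeds a threshold such as ecod_anomaly_threshold.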
A typical anomaly model outputs an anomaly score for a particular instance with those instances that are further from nominal receiving a higher score. The decision whether to flag something as an anomaly essentially becomes a process of choosing a threshold for this score. Scores above the threshold will be flagged as anomalies and anything below it will be deemed as nominal.
evt_warmup_size
Required
Size of the initial set of scores used to build the Pareto Distribution.
evt_alpha
Required
The probability of an observation to be greater than the threshold Zq is smaller than alpha.
evt_quantile
Required
This parameter is the original chosen quantile from which we assume we are in the tail of the score distribution. Any value above that can be used to fit the tail Generalized Pareto distribution.
evt_tdigest
Optional
Allows the specified evt_quantile to dynamically adjust with the addition of new data. If false, the quantile will be based on the original evt_warmup_size number of rows. Default is false.
evt_window_size
Optional
The size of the look back window used to update the Pareto Distribution.
evt_two_sides
Optional
If false, only values above evt_quantile will be treated as exceedances. Otherwise, upper exceedances (above evt_quantile) and lower exceedances (1 - evt_quantile) will be tracked separately.
evt_seed
Optional
Seed for initial distribution.
evt_max_exceedances
Optional
The maximum number of values used to fit the Pareto Distribution.
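How the parameters above combine into a threshold Zq can be sketched with the usual peaks-over-threshold quantile formula for a Generalized Pareto tail. This is an assumption about the general technique, not a statement of SymetryML's implementation: the function and argument names are hypothetical, t stands for the score at evt_quantile, gamma and sigma for the fitted Pareto shape and scale, n for the number of scores seen, and n_exceed for the number of exceedances above t.

```python
import math


def evt_threshold(t, gamma, sigma, alpha, n, n_exceed):
    """Threshold z such that P(score > z) is about alpha under a fitted GPD tail."""
    r = alpha * n / n_exceed
    if abs(gamma) < 1e-12:
        # Exponential tail: the limit of the general formula as gamma -> 0.
        return t - sigma * math.log(r)
    return t + (sigma / gamma) * (r ** (-gamma) - 1.0)


# Illustrative numbers only: 1000 scores, 20 of them above the initial threshold 10.
z_exp = evt_threshold(t=10.0, gamma=0.0, sigma=2.0, alpha=0.001, n=1000, n_exceed=20)
z_heavy = evt_threshold(t=10.0, gamma=0.1, sigma=2.0, alpha=0.001, n=1000, n_exceed=20)
print(z_exp, z_heavy)
```

A heavier tail (larger gamma) pushes the threshold further out for the same alpha.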
Principal component regression is a technique based on principal component analysis (PCA). To be more precise, projected features are used in conjunction with linear regression. Two flavors are available.
I - The traditional approach where the first q principal components are kept, and a linear regression is applied to this q-dimensional space.
II - An augmented feature space is created using the p original features and the p principal components. A model is created by applying Lasso regression. The assumption that the first q principal components are the most important is not necessarily true, and other principal components may also correlate with the target. This approach makes it possible to pick up such contributions.
pcr_type
qfirst - Use the approach with the first q components. full - Use the approach with the original features as well as the principal components.
lambda
If pcr_type is full, specify the value of lambda. Default is 1.
aicc
If pcr_type is full, set aicc to true or false to choose whether the best value of lambda is found automatically.
min_lambda_power
If pcr_type is full and aicc=true, smallest power of lambda to test. Default is -10.
max_lambda_power
If pcr_type is full and aicc=true, largest power of lambda to test. Default is NaN, which means the value will be decided automatically.
num_lambda
If pcr_type is full and aicc=true, how many values of lambda to test. Default is 100.
maxiter
If pcr_type is full, maximum number of iterations for Lasso. Default is 1000.
tolerance
If pcr_type is full, tolerance for Lasso. Default is 1e-05.
standardize
If pcr_type is full, standardize or not. Default is true.
centerY
If pcr_type is full, center the target or not. Default is false.
q
If using pcr_type=qfirst, the number of principal components to keep.
Partial least squares regression creates a new latent space of size q, where q is smaller than p, the number of features, and performs the regression in that space. In the approach available, only the coefficients projected back to the original p dimensions are obtained.
ddof
The degree of freedom to use for calculation. Default is 0.
oneq
Make the calculation for only one size of latent space or all size up to max. Default is false.
standardize
If pcr_type is full, standardize or not. Default is true.
q
The maximum size of the underlying latent space.
Covariance estimate deals with the problem of estimating the actual covariance matrix when only the empirical covariance matrix is available (coming from a sample of the multivariate distribution). The covariance estimators available use different regularization methods.
estimate_type
y
graph_lasso or l2_reg_thinkhonov or l2_reg_sig or identity_mixture or eigen_trunc. All models except graph_lasso can be used with the LDA model.
tuning_type
y
If estimate_type is graph_lasso or l2_reg_sig, the regularization can be provided by automatic tuning.
ebic: if estimate_type is graph_lasso, this is the possible automatic tuning.
rblw: if estimate_type is l2_reg_sig, this is one possible automatic tuning. Can be used with LDA.
oas: if estimate_type is l2_reg_sig, this is one possible automatic tuning. Can be used with LDA.
alpha
y
If estimate_type is l2_reg_sig or identity_mixture, the alpha parameter if tuning is not set. Default is 0.1.
gamma
If tuning is ebic, value of the gamma parameter. Default is 0.5.
tolerance_lasso
The tolerance for the intermediate Lasso problem when estimate_type is graph_lasso. By default it is 0.1 x tolerance.
ddof
The degree of freedom to use for calculation. Default is 0.
lambda
y
For estimate_type equal to graph_lasso or l2_reg_thinkhonov, value of lambda. Default is 1.
q
y
For estimate_type=eigen_trunc, how many eigenvectors to keep.
maxiter
If estimate_type=graph_lasso, maximum number of iterations for Lasso. Default is 1000.
tolerance
If estimate_type=graph_lasso, tolerance for the global problem. Default is 1e-05.
min_lambda_power
If estimate_type=graph_lasso, smallest power of lambda to test. Default is -10.
max_lambda_power
If estimate_type=graph_lasso, largest power of lambda to test. Default is NaN, which means it will be decided automatically.
num_lambda
If estimate_type=graph_lasso, how many values of lambda to test. Default is 100.
initialize_gl
How to initialize the graphical lasso iteration. Choices are:
lasso_path
Start from a value of lambda large enough that all the coefficients are zero, and reduce it until reaching the wanted lambda.
inv
Initialize using the inverse of the covariance.
last_sol
The first iteration uses inv; later iterations use the previous iteration's solution as the starting point.
lin_reg
Initialize the coefficients with the value of the linear regression solution.
A matrix of initial coefficients
modelid
Required
ID to assign to the new model.
algo
Required
Algorithm to use to build the model. See Model Algorithm Id.
svdreduce
Optional
Use SVD feature selection
202
ACCEPTED
Success. Includes an HTTP Location header specifying the location of the job ID that was created to handle the request. {"statusCode":"ACCEPTED","statusString":"Job Created","values":{}}
500
INTERNAL SERVER ERROR
If the server refuses to accept the new job, it notifies the client with the error "Job execution was refused by server."
None
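A client typically needs the job location from the 202 response in order to follow the build. The following sketch only relies on what the table above states (a 202 status, a Location header, and the JSON body shown); the function name is hypothetical, the Location value in the usage line is made up for illustration, and no particular HTTP client is assumed.

```python
import json


def job_location(status_code, headers, body):
    """Return the job location from a model-build response, or raise on failure."""
    if status_code != 202:
        raise RuntimeError(f"model build was not accepted: HTTP {status_code}")
    payload = json.loads(body)
    if payload.get("statusCode") != "ACCEPTED":
        raise RuntimeError(f"unexpected statusCode: {payload.get('statusCode')}")
    # The job that handles the request is identified by the Location header.
    return headers["Location"]


body = '{"statusCode":"ACCEPTED","statusString":"Job Created","values":{}}'
# "/hypothetical/jobs/42" is a placeholder, not a real SymetryML path.
location = job_location(202, {"Location": "/hypothetical/jobs/42"}, body)
print(location)
```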
Heuristic
Required
Used when optimizing the model. Currently, only "default" is supported.
Modelid
Required
ID to assign to that model.
Algo
Required
Algorithm used to build the model (currently lda is supported).
Delta
Optional
Minimum delta to decide whether one model is better than another.
numIterations
Optional
Specify the number of iterations used to refine the best model.
202
ACCEPTED
Success. Includes an HTTP Location header specifying the location of the job ID that was created to handle the request. {"statusCode":"ACCEPTED","statusString":"Job Created","values":{}}
500
INTERNAL SERVER ERROR
If the server refuses to accept the new job, it notifies the client with the error "Optimize Job execution was refused by server."
None
Markov Chain Models and Hidden Markov Models are built differently. First, the project must be marked as a “sequence” project. After some data is learned, you can build Markov Chains, Hidden Markov Models, or both. These types of models have input attributes only; they do not have a target.
For Markov Chains, the input attributes specify the sequence attributes to use. For a Hidden Markov Model, you also specify the hidden state, which must be categorical. The observed state can be continuous or categorical. If it is a continuous attribute, a normal distribution is assumed, using the mean and standard deviation of the data.
This REST endpoint helps you find the optimal k for your KMeans clustering model. It builds many models and returns the silhouette and WSSSE metrics, so that both the silhouette and elbow methods can be used to decide the optimal number of clusters k.
kmeans_k_min
Required
Integer
The minimum k to test
kmeans_k_max
Required
Integer
The maximum k to test
kmeans_max_iterations
Optional
Integer
The maximum number of iterations for the KMeans clustering algorithm.
kmeans_rnd_seed
Optional
Optional
The random seed to use
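Once the metrics for each tested k are available, the silhouette method simply keeps the k with the highest score. The sketch below assumes the per-k silhouette values have already been collected into a dictionary; the function name and the sample numbers are hypothetical, and how the values are keyed in the actual job result is not shown in this section.

```python
def best_k_by_silhouette(silhouette_by_k):
    """Return the k with the highest silhouette score (smallest k wins ties)."""
    return max(sorted(silhouette_by_k), key=lambda k: silhouette_by_k[k])


# Made-up scores for k between kmeans_k_min=2 and kmeans_k_max=5.
sample = {2: 0.41, 3: 0.63, 4: 0.58, 5: 0.47}
print(best_k_by_silhouette(sample))
```

The WSSSE values are better read on a plot for the elbow method, since the elbow is a judgment about where the curve flattens rather than a single maximum.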
202
OK
Job accepted.
400
BAD REQUEST
Unknown SymetryML project. {"statusCode":"BAD_REQUEST","statusString":" + Cannot Find SYMETRYML id[r2] for Customer id [c1]","values":{}}
200
OK
Success.
Contains information about model.
The following table enumerates the additional specific keys/value pairs that are returned when asking for model information based on the type of model.
LDA & MLDA
model distance z0 z Overall contribution for each input parameters
MLR, Ridge, Elastic Net & Lasso
model Betas
LR
multiclass - false for a binary classifier, true for a multiclass classifier
QDA
Model
LSVM
svmNu svmGamma svmOmegas
LSVR
svmNu svmGamma svmOmegas
Bayes
ECOD
EVTModel
Kmeans
Markov Chains
Transition Matrix formatted as follows: {tm1$STATE:$:to$STATE – where STATE corresponds to the Markov process state.
Hidden Markov Model
Transition Matrix formatted as follows: {HIDDEN_tm1$STATE:$:HIDDEN_tm0$STATE – where HIDDEN/STATE corresponds to the Markov process state-group and state respectively. Emission Matrix formatted as follows: [HIDDEN_t0$STATE:$:OBSERVED$STATE – where HIDDEN/STATE corresponds to the Markov process state-group and state respectively, while OBSERVED/STATE corresponds to the observed state-group and state.
Random Forest
Kaplan Meier
Principal Component Regression
Covariance Estimator
Partial Least Square Regression
evt_anomaly_enabled
Set to true to enable EVT for this model
evt_alpha
The probability of an observation to be greater than the threshold Zq is smaller than alpha.
evt_quantile
This parameter is the original chosen quantile from which we assume we are in the tail of the score distribution. Any value above that can be used to fit the tail Generalized Pareto distribution.
evt_tdigest
Allows the specified evt_quantile to dynamically adjust with the addition of new data. If false, the quantile will be based on the original evt_warmup_size number of rows.
evt_window_size
The size of the look back window used to update the Pareto Distribution.
evt_seed
Seed for initial distribution.
evt_two_sides
If false, only values above evt_quantile will be treated as exceedances. Otherwise, upper exceedances (above evt_quantile) and lower exceedances (1 - evt_quantile) will be tracked separately.
evt_max_exceedances
The maximum number of values used to fit the Pareto Distribution.
attribute
Attributes names
coefficient
Attribute coefficient
std.error
standard error of the coefficient
scores
t-value of the coefficient
p.values
p value of the parameter
ci.low
Confidence interval: low value
ci.high
Confidence interval: high value
mean_<TARGET>
The mean of the target
attribute
Attribute names
TARGET_NAME
Attribute Coefficient for each target
mean_<TARGET>
The mean of the target
kmeans_k
K value for this KMeans model
kmeans_silhouette
Silhouette algorithm score
kmeans_wssse
Within set sum of squared error, useful for elbow heuristic.
kmeans_silhouette_trace
All the values used to compute silhouette score.
num_classes
Number of classes in the model.
num_trees
Number of trees in the forest
total_nodes
Total number of Nodes
always_split
Whether this forest splits nodes that contain only examples of the same class
increment
Memory increment used when incrementing the underlying array storing the trees
step
Gradient step.
mean_<TARGET>
The mean of the target
quantiles_hazard
25%,50%,75% Quantile Ranges for Hazard Function
quantiles_survival
25%,50%,75% Quantile Ranges for Survival Function
hazard_{GROUP_ID}
DataFrame representing the Hazard Function for group GROUP_ID
survival_{GROUP_ID}
DataFrame representing the Survival Function for group GROUP_ID
size
Number of features
ecod_anomaly_threshold
Value of the threshold
skewness
Values of the skewness for each feature
mean
Values of the mean for each feature
lambda
Value used to build model.
maxiter
Value used to build model.
tolerance
Value used to build model if applicable.
max_lambda_power
Value used to build model if applicable.
min_lambda_power
Value used to build model if applicable.
num_lambda
Value used to build model if applicable.
aicc_vec
If using pcr_type=full and aicc=true, modelInfo will contain the vector of aicc values for each value in lambda_vec.
lambda_vec
If using pcr_type=full and aicc=true, modelInfo will contain the vector of lambda values tested.
q
Value of q used if pcr_type=qfirst
model
These are the coefficients of the linear model, aka mb.
model_bias
This is the bias term of the linear model aka mb0.
standardize
Value used to build model if applicable.
centerY
Value used to build model if applicable.
muy
Mean of the target of the model.
mux
Mean of the features of the model.
sig
Vector of standard deviation of the features used for this model.
Wtilde
If standardize=true, the PCA projection matrix is contained in a matrix with this name.
W
If standardize=false, the PCA projection matrix is contained in a matrix with this name.
estimate_type
Value used to build model.
tuning_type
Value used to build model if applicable.
cxx_est
The estimate of the covariance matrix
ddof
Value used to build model.
theta
Graphical Lasso only. This is the inverse of the covariance matrix that is obtained.
B
Graphical Lasso only. This is a matrix containing the coefficients of the underlying Lasso models solved for graphical lasso.
lambda
Value used to build model if applicable.
maxiter
Value used to build model if applicable.
tolerance
Value used to build model if applicable.
initialize_gl
How the first iteration of graphical lasso was initialized.
gamma
Value used to build model if applicable.
min_lambda_power
Value used to build model if applicable.
max_lambda_power
Value used to build model if applicable.
num_lambda
Value used to build model if applicable.
ebic_vec
If tuning is ebic, vector with the values of ebic at each value of lambda.
lambda_vec
The array of lambda values at which models were built, if applicable.
alpha
Value used to build model if applicable.
q
Value used to build model if applicable.
model
These are the coefficients of the linear model, aka mb
model_bias
This is the bias term of the linear model aka mb0
standardize
Value used to build model if applicable.
muy
Mean of the target of the model
mux
Mean of the features of the model
sig
Standard deviation of the features of the model
ddof
Value used to build model.
oneq
Set to true if you want a model for only one value of q, false if you want models for Q = 1..q
LDA and MLR models allow retrieval of their internal representation as a Java method or SQL function. This API function call can be used with these models to obtain code that performs predictions. The result is a string that represents the definition of a Java method or SQL function.
language
Required
Targeted programming language for code. Choices are: - java = Java method. - sql = SQL function.
200
OK
Success.
String
/* Java Function */
public double ldaRed_score(
    double petal_length,
    double petal_width,
    double sepal_width
{
    (-8.78 * petal_length) +
    (-15.84 * petal_width) +
    (2.626 * sepal_width)
}
Delete a model from a project
200
OK
Success
Akaike information criterion (AIC) is a model selection heuristic which allows the user to select the optimal Eta and Lambda hyperparameters for their Elastic Net models using in-sample data. The AIC implementation in SymetryML is corrected for small sample sizes. This adjustment is commonly referred to as AICc. More detailed description of the AIC and small sample correction can be found . In the context of SymetryML, the use of AIC can be selected by specifying the following extraParameters of the MLContext.
The following models are supported. For power regression model to be available a SymetryML project must have some attributes set when it is created. Please consult the section.
LDA, Linear Discriminant. Please also consult the section about as it is possible to use this type of covariance matrix to build LDA models.
In order to use the Online Random Forest, additional parameters must be specified when creating a SymetryML project. Please consult the endpoint for details. The following 2 tables describe the hyperparameters available with the SML Random Forest algorithm:
Kaplan Meier (KM) requires three additional parameters to be specified when creating a SymetryML project. Please consult the endpoint for details.
After enabling your project with KM parameters, building a KM model is as simple as calling the REST Endpoint. At this stage you can optionally set the alpha parameter, which controls the width of your confidence intervals. This is done by setting alpha in the extraParameters field of the .
Single Pass Logistic Regression (LR) requires parameters to be specified when creating a SymetryML project. Please consult the endpoint for details.
SymetryML projects with power transformation enabled will automatically create new attributes representing various power transformations of the target so that they can be modeled against the existing features. These additional target features will have a prefix and separator to easily pinpoint them (e.g. pt_sepal_width^-2.0, pt_sepal_width^-1.8, pt_sepal_width^-1.6, …, pt_sepal_width^1.8, pt_sepal_width^2.0). It is possible to configure the prefix and separator, which are pt_ and ^ in the previous example. Please refer to the table for details.
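The naming scheme in the example above can be reproduced with a short sketch. The helper name is hypothetical, and two things are assumptions: that sml_project_power_steps counts equal intervals between the minimum and maximum (so the example corresponds to min=-2.0, max=2.0, steps=20), and that powers are printed with Python's default float formatting, which happens to match the names shown.

```python
def power_attribute_names(column, power_min, power_max, steps,
                          prefix="pt_", separator="^"):
    """Names of the transformed-target attributes for an evenly spaced power grid."""
    width = (power_max - power_min) / steps
    # Round to suppress floating point noise such as 1.8000000000000003.
    powers = [round(power_min + i * width, 10) for i in range(steps + 1)]
    return [f"{prefix}{column}{separator}{p}" for p in powers]


names = power_attribute_names("sepal_width", -2.0, 2.0, 20)
print(names[:3], names[-2:])
```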
The power transformation prefix and separator can be configured in the /opt/symetry/symetry-rest.txt configuration file. Please consult the for details on configuring that file.
This avoids numerical instability when computing the determinant of a large matrix. Determinants are needed when building a QDA model with . This parameter controls how the determinant is computed: it uses a sum of natural logarithms instead of a multiplication over the diagonal of the decomposed inverted matrix.
Enables this model to use the extreme value theory algorithm to mark a prediction as an anomaly or not. This functionality is only available for regression and anomaly models. When enabled, an additional prediction result key is returned: evt_is_anomaly. Please consult for background on the extreme value theory algorithm used.
Laplace smoothing factor. Default is 1. See .
Lastly, Elastic Net model initial weights can be specified in the following format:
If the distribution of your input data does not change, the threshold method might be sufficient for your needs. However, in the case of dynamically changing distributions, more robust approaches should be used. One such approach is to leverage Extreme Value Theory (EVT) to compute a distribution of extreme values, which can later be queried to determine whether or not a particular instance is an anomaly. Additional information on the use of EVT for anomaly detection can be found
This API function allows for the building of an optimized model (aka reduced models), where the SymetryML service builds a model using the best attributes for the target specified in the MLContext of the request body. Like the model-building request, the response contains a Location header for the job that was created to service this request. Information about these jobs can be requested and are described in the chapter.
This API function retrieves specific information about a model. A JSON data structure is returned. The response contains the type of model, attributes, and targets used by the model, along with specific information based on the type of model (info field).
the extra field will also contain the
the extra field will also contain the