SymetryML Projects REST API
SymetryML projects differ from classical machine learning and other data science projects in how data is ingested and how machine learning models are created. Normally, one creates a machine learning model by specifying some file(s) or data structure containing the data description (number of rows, input attributes, and targets for the models). This step can be highly memory intensive if the data to be processed is large.
SymetryML is different. There is no limit on the number of rows that can be processed, and the number of rows does not impact the memory required to build a model. Updating a project with some data and building a machine learning model based on a project are two separate operations:
When updating a project - that is learning or forgetting some data - internally some representation of the data is being updated. This update is usually quite fast. This internal representation can be queried in real time using the .
When building a model, the internal representation is used to extract the values needed by each of the different machine learning models supported by SymetryML. This makes model building very quick, usually under a few milliseconds. If another model is needed, perhaps with different inputs and/or target(s), one just needs to issue a request to build another model; no re-scanning of the data is needed. Models are built based on the current state of a project. If a project was updated with some data, a rebuild of the model might be needed so that the model uses the latest data available in the project.
New with SymetryML version 5.0 are “Federated” project types. The following table describes the difference between these project types and when to use them. For more details on Federated Project please consult this .
cpu or dynamic
- Project update (learn/forget) done on CPU.
- Project stored on CPU memory.
- Use this type of project if you have categorical attributes or fewer than 50-100 attributes.
gpu
- Project update (learn/forget) done on GPU.
- Project stored on GPU memory.
- Use this type of project if you have data with more than 100 - 200 attributes.
multi gpu
- Project update (learn/forget) done on multiple GPUs.
- Project stored in the memory of multiple GPUs.
- Use this type of project if you have dense data with more than 1000 attributes and you have multiple GPUs available. Please note that you cannot directly use categorical attributes with a GPU project type. Such data needs to be one-hot encoded before pushing it into a SymetryML GPU project.
sml.mgpubet.persistence.dir
sml.mgpubet.persistence.suffix
sml.mgpubet.persistence.file-ext
Sequence
Partitioned
Use this type of project if you need to use an LDA model with a multi-class classifier or if you need to build a QDA model. When creating such projects one needs to specify the attribute that will be used to partition the project. This attribute needs to be of type String and usually should be the target for building your LDA or QDA models.
Federated Project
Fusion Project
New with SymetryML 5.2 is a powerful Online Random Forest model. In order to use it, some configuration is needed when creating your project:
Specify the target / dependent column.
Specify the type of random forest (currently only the classifier is available; a regressor random forest will be available soon).
Kaplan-Meier (KM) is a survival model that can estimate a survival function from lifetime data. Unlike most SymetryML models, KM cannot be built after a project has learned a dataset. Instead, the user must specify the following parameters when creating a project.
Specify the time column.
Specify the event column.
Specify the group column.
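To illustrate why a Kaplan-Meier model needs a time, an event, and a group column, here is a minimal batch sketch of the standard product-limit estimator. It is not SymetryML's online implementation, and the column names used in the usage note are hypothetical.

```python
from collections import defaultdict


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event was observed.

    events[i] is truthy if the event was observed at times[i],
    falsy if the observation was censored at that time.
    """
    deaths = defaultdict(int)  # observed events per time
    exits = defaultdict(int)   # events + censored observations per time
    for t, e in zip(times, events):
        exits[t] += 1
        if e:
            deaths[t] += 1

    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: S(t) = S(t-) * (1 - d / n)
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def kaplan_meier_by_group(rows, time_col, event_col, group_col):
    """Estimate one survival curve per distinct value of the group column."""
    grouped = defaultdict(lambda: ([], []))
    for row in rows:
        ts, es = grouped[row[group_col]]
        ts.append(row[time_col])
        es.append(row[event_col])
    return {g: kaplan_meier(ts, es) for g, (ts, es) in grouped.items()}
```

For example, `kaplan_meier_by_group(rows, "duration", "churned", "plan")` would return one curve per value of a hypothetical `plan` column.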
With the release of SymetryML version 4.1 came the ability to use Markov Chains and Hidden Markov Models. To build such models, you create a sequence SymetryML project and specify its order. The order defines the ngrams that will be learned, and each level of your ngrams will be a given value in a DataFrame. For Markov chains, specify a categorical attribute for the input of your model. For the Hidden Markov Model, the observed state can be categorical or continuous, but the hidden state must be categorical.
If a DataFrame to be processed has more than 1 column, it will be assumed that each line represents a sequence to be learned.
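As an illustration of what the order of a sequence project controls, the sketch below extracts and counts ngrams from sequences of tokens. It assumes the order is the ngram length. It shows the general technique only and does not reflect SymetryML's internal representation.

```python
from collections import Counter


def ngrams(sequence, order):
    """Return all contiguous ngrams of length `order` found in `sequence`."""
    return [tuple(sequence[i:i + order])
            for i in range(len(sequence) - order + 1)]


def count_ngrams(sequences, order):
    """Count ngrams over several sequences (one sequence per line of data)."""
    counts = Counter()
    for seq in sequences:
        counts.update(ngrams(seq, order))
    return counts
```

A sequence shorter than the order contributes no ngrams, because the range in `ngrams` is empty.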
It's possible to filter data as it is consumed by a SymetryML project. To filter out some rows / tuples, pass a string containing a comma-separated list of predicates. All predicates must be true for a row / tuple to be processed; otherwise it is ignored.
Example of such string:
The following table lists the available binary boolean operators that can be used in any individual predicate.
| Operator | Meaning | Notes |
| --- | --- | --- |
| == | Equals | Works on both Strings and Numbers |
| >= | Greater or Equals | Only works with Numbers |
| <= | Smaller or Equals | Only works with Numbers |
| != | Not Equal | Works on both Strings and Numbers |
| > | Greater | Only works with Numbers |
| < | Smaller | Only works with Numbers |
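The filtering rule described above (every predicate must hold for a row to be processed) can be sketched client-side as follows. The predicate syntax `attribute<operator>value` and the attribute names in the usage note are assumptions made for illustration. The exact string format accepted by the server is not shown here.

```python
import re

# Two-character operators are listed first so ">=" is not read as ">".
_PREDICATE = re.compile(r"^\s*(\w+)\s*(==|>=|<=|!=|>|<)\s*(.+?)\s*$")


def _as_number(value):
    """Return value as a float, or None if it is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def parse_predicates(filter_string):
    """Split a comma-separated filter into (attribute, operator, literal)."""
    if not filter_string.strip():
        return []  # empty filter: every row passes
    predicates = []
    for part in filter_string.split(","):
        match = _PREDICATE.match(part)
        if match is None:
            raise ValueError(f"cannot parse predicate: {part!r}")
        predicates.append(match.groups())
    return predicates


def row_passes(row, predicates):
    """Return True only if all predicates are true for this row."""
    for attribute, op, literal in predicates:
        value = row[attribute]
        left, right = _as_number(value), _as_number(literal)
        numeric = left is not None and right is not None
        if op in ("==", "!="):
            equal = left == right if numeric else str(value) == literal
            if equal != (op == "=="):
                return False
        else:
            if not numeric:
                raise ValueError(f"{op} only works with numbers")
            ok = {">=": left >= right, "<=": left <= right,
                  ">": left > right, "<": left < right}[op]
            if not ok:
                return False
    return True
```

For instance, `row_passes(row, parse_predicates("age>=21, country==US"))` keeps only rows where both conditions hold.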
This API function call creates a SymetryML project.
sml_project_autosave
boolean
sml_project_learn_merge
boolean
1. A new temporary project is created.
2. The new temporary project is updated with the data.
3. If no problem is encountered, the temporary project is merged with the main project.
This allows an all-or-nothing commit approach to processing data.
sml_project_predicate_row_filter
String
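The all-or-nothing behavior of `sml_project_learn_merge` can be pictured with the toy sketch below. `ToyProject` is a hypothetical stand-in that only tracks a row count and per-attribute sums. It is not the actual SymetryML project representation.

```python
class ToyProject:
    """Hypothetical stand-in: keeps a row count and per-attribute sums."""

    def __init__(self):
        self.count = 0
        self.sums = {}

    def learn(self, rows):
        """Update the running statistics; raises on non-numeric values."""
        for row in rows:
            for name, value in row.items():
                self.sums[name] = self.sums.get(name, 0.0) + float(value)
            self.count += 1

    def merge(self, other):
        """Add the statistics of `other` into this project."""
        self.count += other.count
        for name, total in other.sums.items():
            self.sums[name] = self.sums.get(name, 0.0) + total


def learn_all_or_nothing(project, rows):
    """Learn rows in a temporary project and merge only if all rows succeed."""
    temporary = ToyProject()
    try:
        temporary.learn(rows)
    except (TypeError, ValueError):
        return False  # main project is left untouched
    project.merge(temporary)
    return True
```

Because the bad batch only ever touches the temporary project, a failure midway through the data leaves the main project exactly as it was.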
The following table lists the mandatory parameters for sequence, partitioned and multi-GPU projects:
Sequence
sml_project_order
specify the sequence order (ngram)
Partitioned
partitionSplit
specify which attribute / column to use to split the partition
Multi GPU
mgpu_num
Number of GPUs to use
pid
Required
Name of the new SymetryML project.
Type
Optional
persist
Optional
Whether or not to persist the project. Valid values are:
* true
* false
enableHistogram
Optional
Enable histogram for continuous and binary attributes in this project.
cpu
CPU based project, this is the default value
gpu
GPU based project
sequence
partition
mgpuf
Multi-GPU project using 32-bit floats
mgpud
Multi-GPU project using 64-bit floats (doubles)
201
CREATED
Success. {"statusCode":"CREATED","statusString":"SYMETRYML Created with id:r1","values":{}}
409
CONFLICT
SymetryML project already exists. {"statusCode":"CONFLICT","statusString":"Customer [c1] already have SYMETRYML with id[r1], ","values":{}}
None
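The parameter tables above can be turned into a small client-side check before a create request is sent. This is a sketch under assumptions: the split between query parameters and the key/value body, the lowercase `type` parameter name, and the `"true"`/`"false"` encoding of `enableHistogram` are guesses. The function name is hypothetical and no endpoint URL is implied.

```python
# Mandatory body key per project type, taken from the table of
# mandatory parameters (sequence, partitioned and multi-GPU projects).
MANDATORY_BODY_KEYS = {
    "sequence": "sml_project_order",
    "partition": "partitionSplit",
    "mgpuf": "mgpu_num",
    "mgpud": "mgpu_num",
}
VALID_TYPES = {"cpu", "gpu", "sequence", "partition", "mgpuf", "mgpud"}


def build_create_project_request(pid, project_type="cpu", persist=None,
                                 enable_histogram=None, body=None):
    """Return (query_params, body) for a create-project request."""
    if project_type not in VALID_TYPES:
        raise ValueError(f"unknown project type: {project_type!r}")
    body = dict(body or {})
    required = MANDATORY_BODY_KEYS.get(project_type)
    if required is not None and required not in body:
        raise ValueError(f"{project_type} projects require {required!r}")

    # Assumption: these parameters travel as query parameters.
    query = {"pid": pid, "type": project_type}
    if persist is not None:
        query["persist"] = "true" if persist else "false"
    if enable_histogram is not None:
        query["enableHistogram"] = "true" if enable_histogram else "false"
    return query, body
```

Validating locally gives a clear error before the server is contacted, for example when `partitionSplit` is missing for a partitioned project.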
200
OK
Success.
400
BAD REQUEST
SymetryML project does not exist. {"statusCode":"BAD_REQUEST","statusString":"Cannot Find SYMETRYML id[r5] for Customer id [c1]","values":{}}
This REST endpoint freezes a SymetryML project, that is, it blocks the project from being updated. Pushing data to a frozen project will raise an exception and the data will not be processed.
200
OK
Success.
400
BAD REQUEST
SymetryML project does not exist. {"statusCode":"BAD_REQUEST","statusString":"Cannot Find SYMETRYML id[r5] for Customer id [c1]","values":{}}
200
OK
Success.
400
BAD REQUEST
SymetryML project does not exist. {"statusCode":"BAD_REQUEST","statusString":"Cannot Find SYMETRYML id[r5] for Customer id [c1]","values":{}}
This API function call renames a SymetryML project.
rename
Required
New name for the SymetryML project.
201
CREATED
Success. {"statusCode":"CREATED","statusString":"SYMETRYML Created with id:r1","values":{}}
409
CONFLICT
A SymetryML project by the specified new name already exists. {"statusCode":"CONFLICT","statusString":"Customer [c1] already have SYMETRYML with id[r1], ","values":{}}
None
This API function call assigns an encoder to a SymetryML project.
encodername
Required
Encoder name to use for this SymetryML project.
200
OK
Success.
None
This API function call deletes a SymetryML project from your repository.
200
OK
Success. {"statusCode":"OK","statusString":"deleted Project with id[r1] from Customer[c1] store","values":{}}
400
BAD REQUEST
User does not have the SymetryML project. {"statusCode":"BAD_REQUEST","statusString":"Customer[c1] does not have Project with id[r1], ","values":{}}
None
200
OK
Success.
{"statusCode":"OK","statusString":"OK","values":{"stringList":{"values":["r1","r11","r2","r3"]}}}
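A response of this shape can be unpacked with the standard `json` module. The helper name `project_ids` is hypothetical. The field names come from the sample response above.

```python
import json


def project_ids(response_text):
    """Extract the list of project names from a list-projects response."""
    payload = json.loads(response_text)
    if payload.get("statusCode") != "OK":
        raise RuntimeError(payload.get("statusString", "unknown error"))
    return payload["values"]["stringList"]["values"]
```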
In order to save memory on the server, it’s possible to unload a project from memory. Note that if you do that with a project that is not persisted then the project cannot be restored.
200
OK
Success.
It’s possible to merge two SymetryML projects together using the following REST endpoint.
otherProject
Required
Name of the other SymetryML project to use for the merging. At the end of this operation, otherProject will be merged into pid. That is, otherProject will stay the same while pid will be updated with the content of otherProject.
200
OK
Success.
modelsList
List of models
modelTypeList
List of model types
dsList
List of data source names
encoderName
Encoder name if the SymetryML project uses an encoder
attributeNames
List of attribute names
attributeIndexes
List of attribute indexes
attributeTypes
List of attribute types. For example:
- [C] = continuous
- [B] = binary
- [S] = string
- [L] = list
- [X] = ignore
pid
Name of this object
hash
Internal use only
isDirty
Internal use only. This field is deprecated.
creationDate
Date this project was created.
lastModificationDate
Last time data was added / learned on this project.
modelAssessment
Internal use. List of assessments from SymetryML Web application
modelPredictions
Internal use. List of predictions from SymetryML Web application
categorySeparator
Internal use.
loaded
Specifies if a project is loaded in memory or not
persisted
Specifies if a project is persisted or not
streams
List of streams that belong to that project
partitionColumn
For Partitioned Projects, this is the column to use to partition the project
autoSave
Whether or not this project is autosaved.
type
The type of this project
histogramEnabled
Whether or not this project has histogram enabled.
fusionCellInfoList
params
A map containing various key / values. See Table Below
fusion_fetch_error
Error string from the last time a fusion project fetched data from its cells.
sml_project_is_freeze
Indicates whether this project is frozen, that is, whether it is blocked from processing new incoming data.
automl_project_is_automl
Specifies that this project uses AutoML
automl_setup_done
When using AutoML, this boolean key indicates that the auto ml 'warm up' is complete.
sml_project_power_column
sml_project_power_min
sml_project_power_max
sml_project_power_steps
fed_pulsing
Whether or not a Federated Project is pulsing
fed_has_error
Whether or not a Federated Project has an error
fed_is_admin
Whether or not a Federated Project is the admin of the federation of which it is a member.
200
OK
Success.
400
BAD REQUEST
SymetryML project does not exist. {"statusCode":"BAD_REQUEST","statusString":"Cannot Find SYMETRYML id[r5] for Customer id [c1]","values":{}}
See the example below.
This API function is asynchronous. If it succeeds, it returns a 202 response, along with a Location header that specifies the job URL. For more information about asynchronous REST calls, see the section "Asynchronous Learning."
async
Optional
202
ACCEPTED
Success. Includes an HTTP Location header specifying the location of the job ID that was created to handle the request. For example: {"statusCode":"ACCEPTED","statusString":"Job Created","values":{}}
After a Restore job completes, it returns the following information:
200
OK
202
ACCEPTED
None
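Since an asynchronous call answers 202 with a Location header and the job URL keeps answering 202 until the job is finished, a client typically polls that URL. The sketch below assumes a caller-supplied `fetch_status` callable (hypothetical) that performs the HTTP GET and returns the status code. It does not assume any particular HTTP library or URL layout.

```python
import time


def wait_for_job(job_url, fetch_status, poll_seconds=1.0, max_polls=60,
                 sleep=time.sleep):
    """Poll job_url until it answers 200 OK.

    Returns True when the job finished, False if max_polls was reached
    while the job was still answering 202 ACCEPTED.
    """
    for _ in range(max_polls):
        status = fetch_status(job_url)
        if status == 200:
            return True
        if status != 202:
            raise RuntimeError(f"unexpected status {status} from {job_url}")
        sleep(poll_seconds)
    return False
```

Passing `fetch_status` and `sleep` as arguments keeps the polling logic independent of the HTTP client and easy to test.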
Export / Import functionality enables the user to export projects and then re-import them. Projects are encoded into a Base 64 string so that they can easily be transmitted using various channels of communication.
Export a model into a Base 64 string.
200
OK
Success.
String
A Base64 representation of the project.
Import a model into a project.
persist
Optional
Whether or not to persist that project
The body is a Base 64 string from a project that was exported first with the [Project Export] functionality.
200
OK
Success.
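Because an exported project travels as a Base 64 string, it can be worth checking that the string survived transmission before sending it as an import body. The sketch below assumes the standard Base64 alphabet, which is not confirmed here. The helper name is hypothetical.

```python
import base64
import binascii


def is_valid_base64(text):
    """Return True if text decodes cleanly as standard Base64."""
    try:
        base64.b64decode(text, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False
```

A string that was truncated or had characters altered in transit will usually fail this check, although a valid encoding does not by itself prove that the payload is a SymetryML project.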
SymetryML projects support streaming data. The streaming capabilities are encompassed in 2 major functionalities:
The following sub-sections will describe the learn and forget streaming API.
Usually, there is no need to run learn and forget tasks asynchronously unless your data has a large number of attributes that, depending on your hardware, can take more or less time to be executed. As a rule of thumb, data with fewer than 100 attributes can be learned and forgotten safely using synchronous execution. However, your experiences might be different. Therefore, if your data has a large number of attributes, we recommend that you perform testing to determine whether the asynchronous or synchronous method will work better for you.
Note
Addition of new attributes is handled by the incremental update automatically, as described in the next section.
async
Optional
Performs the operation asynchronously. We recommend using true for this parameter if your data is wider than 100 attributes. Default: false
200
OK
Success. {"statusCode":"OK","statusString":"OK","values":{}}
202
Accepted
If async == true, the server accepted the request.
400
BAD REQUEST
SymetryML project cannot be found. {"statusCode":"BAD_REQUEST","statusString":"Cannot Find SYMETRYML id[r2] for Customer id [c1]","values":{}}
None
async
Optional
Performs the operation asynchronously. We recommend using true for this parameter if your data is wider than 100 attributes. Default: false
200
OK
Success. {"statusCode":"OK","statusString":"OK","values":{}}
400
BAD REQUEST
SymetryML project cannot be found. {"statusCode":"BAD_REQUEST","statusString":"Cannot Find SYMETRYML id[r2] for Customer id [c1]","values":{}}
None
- Multi GPU projects are stored in the local file system. Where they are saved depends on the following SymetryML configuration parameters (consult the for details):
SymetryML needs to perform some additional logic to build these types of models. Use the sequence project type so that you can build such models on your data. See the section for more details.
A Federated project allows multiple SymetryML projects to form a federation in which the individual projects work together as if they were one single project. That is, each individual project can build the same models or compute the same exploration metrics, and all of this is accomplished without sharing any data. For more information please consult the .
A Fusion project can handle data coming from a stream with very high throughput. Please consult the section on for details.
Various hyperparameters. Please see the for details about the additional configuration available.
Please see the for details about the additional configuration available.
New with SymetryML version 5.4 is an online version of KMeans clustering. The clustering algorithm is controlled by a few parameters that need to be specified when creating a project. Please consult the for details on the additional configuration to enable KMeans clustering on your project.
New with SymetryML 4.2 is the ability to create Partitioned projects. Partitioned projects allow you to build multi-class LDA models as well as QDA models. When creating such a project, one must specify which attribute will be the target for these models. This mandatory parameter is called the partition column. Please refer to this for a sample REST request / response.
For the sequence project, SymetryML interprets the data structure differently:
If a DataFrame to be processed has only 1 attribute, it will be assumed that each line represents a token or level of a sequence.
For more information on ngrams, order, and level, see .
New with version 5.0 of SymetryML is the capability to allow multiple SymetryML projects - possibly on different sites, geographical locations, or business units - to synchronize with each other in order to leverage each other's data without sharing the actual data. This functionality is called Federated Learning. A federated project acts exactly the same as any other project in terms of how you build models or explore the data. The only difference is how you create it. For more details about the lifecycle of a federated project please consult the .
The body of the request consists of a key/value map. It is optional and depends on the type of project. Please refer to the following table, which describes the mandatory parameters for the different types of projects. Also refer to for details on the request body JSON data structure.
Default is true. This parameter controls whether or not a project is automatically saved after new data is pushed into it. If false is used, it is important to invoke the endpoint to persist the project in order not to lose information across server restarts.
Default is false. This parameter is used whenever a project learns new data, see for details. This configuration commits new data to a project only if all of it was processed without an error. Whenever new data is pushed to a SymetryML project the following happens:
Default is empty. This parameter filters data as it is consumed by the project. Please consult the for more details.
The type of the project. Please see table for details.
Sequence project, see
Partitioned project, see
This REST endpoint saves a project into the underlying persistence database (e.g. Redis). This is needed for projects that do not use the autosave functionality - see for details.
This REST endpoint does the reverse of. If the project was not frozen it does nothing.
This API function call retrieves a list of SymetryML IDs that were created by a given user. The response contains a string list containing the names of all the projects that belong to the user who made the request. The format of the response is described in section .
This REST API function call retrieves information about a single SymetryML project. The following information about a particular SymetryML project is returned as part of the JSON response.
For Fusion projects, this contains the information about the fusion cells. Please see .
Needed for power transformation model. Consult for details.
Needed for power transformation model. Consult for details.
Needed for power transformation model. Consult for details.
Needed for power transformation model. Consult for details.
This REST API function call returns information about all SymetryML projects that were created previously by a given user. To conserve memory usage, minimal information about each SymetryML project loads into server memory. If more information is required, it is loaded "lazily and transparently" by the server. For more information about the response format, see the .
If set to true then the ProjectInfo request will be done asynchronously and the result will be fetched using the
Response entity is a JSON data structure. For each SymetryML project ID restored, the entity contains its ID as well as the ID of models that belong to it.
Job is not finished. Includes a entity. For more information, see .
Using the or by providing a as the request body.
Using the to create a stream data source that automatically pushes streaming data into your SymetryML project. Use this API to manage connections to streaming server solutions such as Apache Kafka.
Both the learn and forget methods can be invoked synchronously or asynchronously. The default method is synchronous. To run the task asynchronously, add async=true as a query parameter to your REST invocation. When invoked, the response is 202 ACCEPTED and you can use the Job Status API to determine when a learning job is finished. For more information, see .
This API function call learns new data. The body of the request should contain a DataFrame JSON data structure. For information about this JSON data structure, see the . For information about the async parameter, see .
This API function call allows users to forget data. The body of the request should contain a DataFrame JSON data structure. For information about this JSON data structure, see . For information about the async parameter, see .
See the sample in the previous .