
Data Source API

Using a Data Source

In SymetryML, a data source is an abstraction of a CSV file that resides somewhere and that can be used by:

  • SymetryML projects to learn new data

  • Models to make predictions and assessments

  • Encoders to update their internal encoding tables

SymetryML supports various types of data sources:

  • Secure File Transfer Protocol (SFTP)

  • HTTP/HTTPS URL

  • Amazon Simple Storage Service (S3)

  • Microsoft Azure Blob Storage

  • Google Cloud Storage

  • Oracle OCI Object Storage

  • Amazon RedShift

  • Spark processing: Amazon S3, Google Cloud Storage, Oracle OCI Object Storage, and Microsoft Azure Blob Storage data sources can be processed in parallel by leveraging a Spark cluster.

  • SymetryML data source plugins

  • JDBC

  • Local data source, which allows browsing the local file system of the Jetty web server with the same privileges as the user running the Jetty web server.

Required DSInfo Fields

To use a data source, create a DSInfo JSON data structure that contains the fields in the table below.

type: Type of data source.
- Secure FTP (SFTP) data source = sftp
- HTTP/HTTPS data source = http
- Amazon S3 = s3
- Oracle OCI Object Storage with S3 Compatibility = s3oci
- Google Cloud Storage = gcs
- Amazon Redshift = redshift
- Data source plugins / JDBC = jdbc
- Local file = localfile
- Amazon Elastic MapReduce = emr
- Microsoft Azure Blob Storage = abs
- Spark data source = see the Spark Map Reduce Data Source Type section below for the matrix of all possible data source names involving Spark processing

name: Name of the data source.

info: Hash map containing additional information based on the type of data source. See the following sections for details about this field.

Additional Information Stored in Data Source

The info field of a data source contains specific information based on the type of data source. The following tables describe these fields for each type of data source.

HTTP/HTTPS Data Source

path: An http:// or https:// URL.

Secure FTP (SFTP)

path: Path to the file on the server.
sftpuser: User name used to connect to the SFTP server.
sftppasswd: User password used to connect to the SFTP server.
sftphost: Host to which you want to connect.

Amazon S3

path: Path to the file on the server, excluding the Amazon S3 bucket.
s3accessKey: Amazon S3 access key to use to connect to S3.
s3secretKey: Amazon S3 secret key to use to connect to S3.
s3bucket: Amazon S3 bucket to use.
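
For example, the DSInfo for an Amazon S3 data source might look like the following (name, path, bucket, and credentials are placeholders):

{
  "type": "s3",
  "name": "irisds",
  "info": {
    "path": "data/iris.csv",
    "s3bucket": "my-bucket",
    "s3accessKey": "AKIA...",
    "s3secretKey": "..."
  }
}

Remember that a DSInfo must be encrypted before being sent; see the Data Source Encryption section below.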

Oracle OCI Object Storage with S3 Compatibility

See https://docs.oracle.com/en-us/iaas/Content/Object/Tasks/s3compatibleapi.htm for details on how to configure your Oracle OCI account so that the Amazon S3 Compatibility API can be used.

path: Path to the file on the server, excluding the Oracle OCI Object Storage bucket.
s3accessKey: Oracle OCI Object Storage access key.
s3secretKey: Oracle OCI Object Storage secret key.
s3bucket: Oracle OCI Object Storage bucket to use.
ocinamespace: Oracle OCI Object Storage namespace.
ociregion: Oracle OCI Object Storage region.

Google Cloud Storage

path: Path to the file.
gcsaccessKey: GCS access key.
gcssecretKey: GCS secret key.
gcsbucket: GCS bucket.
gcsproject: GCS project.
gcsmarker: Optional marker parameter indicating where in the GCS bucket to begin listing. The list includes only keys that occur lexicographically after the marker.
gcsdelimiter: GCS file/folder delimiter. / is used by default.

Microsoft Azure Blob Storage

path: Path to the file.
azure.credentials.connection.string: Connection string that specifies credentials to authorize access to Azure Blob Storage. Use one of connection string, account key, or SAS token.
azure.account.name: Name of the Azure account to use.
azure.credentials.sharedkey.account.key: Account key that specifies credentials to authorize access to Azure Blob Storage. Use one of connection string, account key, or SAS token.
azure.credentials.sharedkey.sas.token: SAS token (account or service) that specifies credentials to authorize access to Azure Blob Storage. Use one of connection string, account key, or SAS token.
azure.blob.container.name: Name of the Azure Blob Storage container that contains the blob.
azure.blob.inputstream.chunk.size.max.bytes: Maximum size in bytes of each chunk of data when reading the blob contents chunk by chunk. Default: 4194304
azure.blob.path.delimiter: String that separates elements of the path to the blob file. Default: /
azure.blob.list.marker: Marker that specifies the beginning of the next page of Azure Blob Storage items to fetch from Azure. This property is used internally.

Amazon Redshift

path: Name of the table to use.
rsuser: Redshift database user.
rspasswd: Redshift user password.
rsurl: Redshift connection URL.

Spark Map Reduce

sparkmaster: Address of the Spark cluster master.
spark.job.process.jvm.heap.size.min: Minimum JVM heap size used for the Spark driver process launched by the Jetty REST server. Default: 1024m
spark.job.process.jvm.heap.size.max: Maximum JVM heap size used for the Spark driver process launched by the Jetty REST server. Default: 2048m
spark.automl.sample.random.seed: If AutoML is used, sets the seed of the randomizer used to select a random sample of tuples from the data source to bootstrap the AutoML environment.

Any other Spark parameter, such as spark.executor.memory or spark.executor.cores, can also be used. To pass such parameters, prefix them with 'sml.sparkenv.', as in the following examples:
- sml.sparkenv.spark.executor.cores
- sml.sparkenv.spark.cores.max

Be sure to consult the Spark Installation Guide for additional Spark information, in particular the Additional SymetryML Configuration for Spark Support section.

Spark Map Reduce Data Source Type

The following matrix shows which data sources are supported for each Spark version, and the data source type name to use for each combination ("N" = not supported):

Data Source | Spark 2.4.5 Hadoop 2.7 | Spark 2.4.6 Hadoop 2.7 | Spark 3.0.1 Hadoop 2.7 | Spark 3.0.2 Hadoop 3.2
Oracle OCI S3 | N | N | N | sparkocis3_mr_3_0_2
Amazon S3 | sparks3_mr_2_4_5 | sparks3_mr_2_4_6 | sparks3_mr_3_0_1 | sparks3_mr_3_0_2
Google Cloud Storage | sparkgcs_mr_2_4_5 | sparkgcs_mr_2_4_6 | sparkgcs_mr_3_0_1 | sparkgcs_mr_3_0_2
Microsoft Azure Blob | sparkabs_mr_2_4_5 | sparkabs_mr_2_4_6 | sparkabs_mr_3_0_1 | sparkabs_mr_3_0_2
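
For example, to process an S3 data source with a Spark 3.0.2 cluster, the DSInfo might look like the following (all values are placeholders):

{
  "type": "sparks3_mr_3_0_2",
  "name": "bigds",
  "info": {
    "path": "data/",
    "s3bucket": "my-bucket",
    "s3accessKey": "AKIA...",
    "s3secretKey": "...",
    "sparkmaster": "spark://spark-master:7077",
    "sml.sparkenv.spark.executor.memory": "4g",
    "sml.sparkenv.spark.cores.max": "8"
  }
}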

JDBC

driver: JDBC driver to use.
host: Database host.
port: Database port.
database: Database name.
user: Database user.
password: Database user password.

Amazon EMR

chunksize (optional): SymetryML processes the data chunk by chunk; this parameter specifies the chunk size. Default: 5000
emr.client.aws.region (optional): AWS region of the EMR cluster. Default: us-east-1
emr.cluster.ec2.key.name (required): EC2 key pair name for the cluster.
emr.cluster.ec2.subnet.id (optional): EC2 subnet ID for the cluster. Default: null
emr.cluster.instance.count (required): Number of EC2 instances in the EMR cluster.
emr.cluster.instance.master.type (required): Instance type of the master EC2 instance.
emr.cluster.instance.slave.type (required): Instance type of the slave EC2 instances.
emr.cluster.log.storage.enable (optional): Boolean enabling storage of the EMR logs. Default: false
emr.cluster.log.storage.uri (optional): URI of the EMR logs. Default: null
emr.job.flow.role (optional): EMR role for EC2 that is used by EC2 instances within the cluster. Default: AWS EMR_EC2_DefaultRole
emr.s3.job.bucket.name (required): S3 bucket that stores the files needed for the Spark cluster job; it can include the directory that stores the EMR logs.
emr.service.role (optional): Amazon EMR role, which defines the allowable actions for Amazon EMR. Default: AWS EMR_DefaultRole
path (required): Path of the data source to process; can be a folder. This is the data path without the bucket part.
s3accessKey (required): AWS access key.
s3bucket (required): AWS S3 bucket where the data resides.
s3marker (optional): Marker parameter indicating where in the S3 bucket to begin listing. The list includes only keys that occur lexicographically after the marker.
s3secretKey (required): AWS secret key.
sml.sparkenv.* (optional): Specifies any Apache Spark environment configuration, such as spark.cores.max (use sml.sparkenv.spark.cores.max) or spark.executor.memory (use sml.sparkenv.spark.executor.memory).
sparksymproject (required): Name of the project.

Additional CSV Options

You can specify additional parameters that describe the format of the CSV files. Add the following keys to a data source to change how SymetryML parses your data:

csv_entry_separator: Specifies which character to use as the field delimiter for a given tuple.
csv_quote_character: Specifies the quote character.
csv_strict_quotes: Setting this option to true discards characters outside the quotes. If there are no quotes between delimiters, an empty string is generated.
csv_header_missing: Specifies that this data source does not have a header. SymetryML then generates a header automatically.
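
For example, to parse a semicolon-delimited file that has no header row, the data source's info map might include the following entries (values are illustrative):

{
  "csv_entry_separator": ";",
  "csv_quote_character": "\"",
  "csv_header_missing": "true"
}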

Additional Information on Spark S3 Data Source

SymetryML can leverage a Spark cluster to significantly speed up processing of large amounts of data. Currently, your data must reside on Amazon S3. Depending on the size of your data, the job may take more or less time to start, because the Spark cluster must compute the partitions of your data before starting the job. Consequently, if your data is very large, this may take a few minutes.

Best practices for Spark S3 Data Source:

  • Performance may vary depending on Amazon resource utilization when you run your job.

  • Be sure all executor nodes in your cluster reside in the same Amazon EC2 placement group.

About Data Source Plugins (DSPlugins)

The SymetryML data source API allows you to create a new type of data source in the form of a Java library (JAR) that can be added to the server. Instead of transforming data into CSV files, for example, you can write a DS plugin that reads the data natively.

Data Source Encryption

Data sources might contain sensitive information that should never be passed in the clear. To avoid having to use HTTPS for these services, the SymetryML REST API forces you to pass such information in encrypted form. This can be done easily, as each SymetryML secret key is also a 128-bit Advanced Encryption Standard (AES) secret key.

  1. Create your DSInfo data structure and enter the appropriate information so that the SymetryML server can access it.

  2. Extract the JSON string from that data structure.

  3. Encrypt the JSON string representation using your SymetryML secret key:

    • Initialization vector in Base 64: LzM5QUtXZXWHm7HJ4wAePg==

    • Block cipher algorithm: AES/CBC/PKCS5Padding

  4. Send the encrypted string as part of the body to any REST service that requests a DSInfo as the body.

  5. The server decrypts the string using the client secret key and reconstructs the DSInfo.

See Appendix B Sample Code for code examples of how to perform this encryption in Java, JavaScript, or Python.
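
The following is a minimal sketch of this encryption in Python, assuming the PyCryptodome package; the key and DSInfo values are placeholders. Refer to Appendix B Sample Code for the official Java, JavaScript, and Python samples.

import base64
import json
from Crypto.Cipher import AES
from Crypto.Util.Padding import pad

# Placeholder: your SymetryML secret key, which is also a 128-bit AES key.
secret_key = b"0123456789abcdef"
# Initialization vector given above.
iv = base64.b64decode("LzM5QUtXZXWHm7HJ4wAePg==")

# Steps 1-2: create the DSInfo data structure and extract its JSON string.
dsinfo = {
    "type": "sftp",
    "name": "ds1",
    "info": {
        "path": "/data/iris.csv",
        "sftphost": "sftp.example.com",
        "sftpuser": "alice",
        "sftppasswd": "secret",
    },
}
plaintext = json.dumps(dsinfo).encode("utf-8")

# Step 3: AES/CBC/PKCS5Padding (PKCS5 and PKCS7 coincide for 16-byte blocks),
# then Base64-encode the ciphertext.
cipher = AES.new(secret_key, AES.MODE_CBC, iv)
body = base64.b64encode(cipher.encrypt(pad(plaintext, AES.block_size)))

# Step 4: send 'body' as the request body of services that expect a DSInfo.
print(body.decode("ascii"))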

Data Source Create

This API function creates a new data source.

URL

POST /symetry/rest/{cid}/dss/create [Body=DSInfo (encrypted)]

HTTP Responses

202 CREATED: Success.
409 CONFLICT: A data source with the specified name already exists.

HTTP Response Entity

None.

Sample Request/Response

Note that the BODY in the following example is encrypted. Refer to the Data Source Encryption section for encryption details.

Request:
POST url="http://charm:8080/symetry/rest/c1/dss/create"

Body:
emUmJ7LXKaICPww/dKRoMR/Go6+B3ATTn+GwgE1/vcK9pN/mLoqGyKiMtGiTYMct4Gth1ElniKLXtmXfHfs7Rfn+QhJHZ+s00w2PBdbvYZIoF3My04H5XCboY21Fh4SkBhsxo+DhYuardN7R+uGphN/DSbiHRLIXe51HijSpfuq0fJuQYW0ccN4FM/B74LJccuDnbb+IouS9u/9rybKu/wsVbDVRTK/zZpWXyn4qtds=

Response:
{"statusCode":"CREATED","statusString":"DS created with name [ds1] for customer [c1]","values":{}}

Data Source Update

This API function updates an existing data source.

URL

PUT /symetry/rest/{cid}/dss/{dsname} [Body=DSInfo (encrypted)]

HTTP Responses

200 OK: Success.
404 NOT FOUND: A data source with the specified name does not exist.

HTTP Response Entity

None.

Sample Request/Response

Note that the BODY in the following example is encrypted. Refer to the Data Source Encryption section for encryption details.

Request:
PUT url="http://charm:8080/symetry/rest/c1/dss/aDataSourceName"

Body:
emUmJ7LXKaICPww/dKRoMR/Go6+B3ATTn+GwgE1/vcK9pN/mLoqGyKiMtGiTYMct4Gth1ElniKLXtmXfHfs7Rfn+QhJHZ+s00w2PBdbvYZIoF3My04H5XCboY21Fh4SkBhsxo+DhYuardN7R+uGphN/DSbiHRLIXe51HijSpfuq0fJuQYW0ccN4FM/B74LJccuDnbb+IouS9u/9rybKu/wsVbDVRTK/zZpWXyn4qtds=

Response:
{"statusCode":"OK","statusString":"DS updated with name [ds1] for customer [c1]","values":{}}

List Customer Data Sources

This API function returns all the data sources that belong to a user.

URL

GET /symetry/rest/{cid}/dss/

HTTP Responses

200 OK: Success.

HTTP Response Entity

StringList. Example:

{"statusCode":"OK","statusString":"OK","values":{"stringList":{"values":["Iris_SymetryML.csv-predict.csv:s3","h1:http","BigData11g_Test.csv:http","Iris_SymetryML.csv:s3","Smaato_Bids_20130812_CTR.csv:s3"]}}}

Sample Request/Response

Request:
GET url="http://charm:8080/symetry/rest/c1/dss"

Response:
{"statusCode":"OK","statusString":"OK","values":{"stringList":{"values":["Iris_rtlm-out-neil.csv:s3","Iris_rtlm.csv:s3","BigData11g_num.csv:s3","Smaato_Bids_20130812_CTR.csv:s3"]}}}

Delete Data Source

This API function deletes a data source.

URL

DELETE /symetry/rest/{cid}/dss/{dsname}

HTTP Responses

200 OK: Success.
409 CONFLICT: Data source cannot be deleted. A SymetryML project might be using the data source. The response contains an error string with further details.

HTTP Response Entity

None

Data Source Information

This API function returns information about a data source. Because this information can be sensitive, the server encrypts it to ensure that the information is returned safely, even when using HTTP. Use your SymetryML secret key to decrypt the information. For details about how to decrypt the response, see the Data Source Encryption section.

URL

GET /symetry/rest/{cid}/dss/{dsname}

HTTP Responses

200 OK: Success.

HTTP Response Entity

DSInfo, encrypted using the customer secret key.

Sample Request/Response

Request:
GET url="http://charm:8080/symetry/rest/c1/dss/BigData11g_num.csv"

Response:
{"statusCode":"OK","statusString":"OK","values":{"dsinfo":"emUmJ7LXKaICPww/dKRoMRNJEMepL37Eq9CgZfhZPWj93mo3A+C8ucfIOGaPwwn2dip/JEuLFjUT/fjHjy18XKFnzFz5Ujp0WmS0uA4ssvAJwNPL6BvnsY6+a/lKa+c/q9/5tz5lr13N13I7OGAhuYhXYV+xb8oFZqsn+bH5spBXRb5u+oyEMXNKLCaNt3pzc/xCyW47KCwIi9V5iSA+fcJAWfetm9ZsIHNbI6utkxKrqrU5OfLmgriGAP++yQtutlGR7r/bKV1bRc8UDsgsXQg1HgoxHcKXCgsAFDFzqsJmZ/5/uQDc0ytc5Fk85GUx"}}

Data Source Browsing

This API function lists the contents of a remote data source directory.

URL

POST /symetry/rest/{cid}/projects/dsbrowse/ [Body=DSListingRequest (encrypted)]

HTTP Responses

200 OK: Success.

HTTP Response Entity

DSListingResponse, which contains listing information about the requested directory or folder.

Sample Request/Response

Request:
POST url="http://charm:8080/symetry/rest/c1/projects/dsbrowse"

Body:
drhSjndw6G15pgevCsDqaSfjX9x3hMo+dNqd/MV943Dsd2rl2guhvq2qUhjEORcfKAEjHaoRZMKmSbQB6bcca2YT6HmUyRxuOG0wiKgGy0MOEq7+iIncbX4orpGr4rhro1Frw909Uy8qcWskaInQHJT4EGRPcwxvwInFlea39hsMkycFK4pKlTpanOUYgcv7

Response:
{"statusCode":"OK","statusString":"OK","values":{"dsdirectoryListing":{"ok":true,"dirs":["data2/","folder-a/","folder-datasets/","folder-demo/","folder-dev/","folder-dev-pub-http/","folder-dev-pub-https/","folder-docs/","folder-perf-all-reports/","folder-source/"]}}}

Fetching Sample Data 1

This API function fetches a data source sample by specifying the data source information as part of the request body. The response returns up to 128 lines. This REST call needs a request body that contains an encrypted DSInfo data structure (see the Data Source Encryption section).

URL

POST /symetry/rest/{cid}/dss/sample/preview [body=DSInfo encrypted]

HTTP Responses

200 OK: Success.

HTTP Response Entity

DataFrame that contains a sample of the data source, up to 128 lines.

Sample Request/Response

Request:
POST url="http://charm:8080/symetry/rest/c1/dss/sample/preview"

Body:
WTqUHBoXHbl+cMMjdc0zgjBP8e44G1os15V+I4GZgDOr1dX9uOfvY5uK9ZgC9yral9XC1ohD1W+UvkPlKR4dQT00EgCdS2UPgZz2NwwooHOM+KY1Ysf5qZlkFKiOkxwoWH/mr3mvvgdTUpZS8zrDJk3gwsavFT5fe0J2lTR33F1OH7FwxP4qs5nzRbVz546l

Response:
{"statusCode":"OK","statusString":"OK","values":{"dataframe":{"attributeNames":["adexchange","imp_width","imp_height","imp_btype","preference","pub","domain","site_base_url","category","device_ip","device_country","device_dma","device_state","device_city","zip","device_carrier","language","device_os","device_make","device_model","device_osv","lat","lon","restriction_bcat","restriction_badv","position","gender","user_keyword","user_yob","user_age","view_count","campaign","creative","creative_type","winner","bid_price","win_price","clicked","rtlm_ctr_score","rtdm","req_hour","req_day","category_count","bcat_count","badv_count"],"data":[["Smaato","320","50","","APP","New IT Solutions Ltd","","","Technology \u0026 Computing","157.55.32.83","US","","WASHINGTON","REDMOND","98052","","","Unknown","","","","47.67398834","-122.1215134","IAB7-28|IAB19-30|IAB22-1|IAB19-3|IAB17-18|IAB26|IAB25|IAB24|IAB9-9|IAB7-

(...)

,"attributeTypes":["S","C","C","L","S","L","S","S","S","S","S","S","S","S","S","S","S","S","S","S","S","S","S","L","S","S","S","S","S","S","B","S","S","S","B","C","C","B","S","B","C","C","B","C","B"]}}}

Fetching Sample Data 2

URL

GET /symetry/rest/{cid}/dss/{dsname}/sample

HTTP Responses

200 OK: Success.

HTTP Response Entity

DataFrame

Sample Request/Response

Request:
GET url="http://charm:8080/symetry/rest/c1/dss/Smaato_Bids_20130812_CTR.csv/sample"

Response:
{"statusCode":"OK","statusString":"OK","values":{"dataframe":{"attributeNames":["adexchange","imp_width","imp_height","imp_btype","preference","pub","domain","site_base_url","category","device_ip","device_country","device_dma","device_state","device_city","zip","device_carrier","language","device_os","device_make","device_model","device_osv","lat","lon","restriction_bcat","restriction_badv","position","gender","user_keyword","user_yob","user_age","view_count","campaign","creative","creative_type","winner","bid_price","win_price","clicked","rtlm_ctr_score","rtdm","req_hour","req_day","category_count","bcat_count","badv_count"],"data":[["Smaato","320","50","","APP","New IT Solutions Ltd","","","Technology \u0026 Computing","157.55.32.83","US","","WASHINGTON","REDMOND","98052","","","Unknown","","","","47.67398834","-122.1215134","IAB7-28|IAB19-30|IAB22-1|IAB19-3|IAB17-18|IAB26|IAB25|IAB24|IAB9-9|IAB7-

(...)

,"attributeTypes":["S","C","C","L","S","L","S","S","S","S","S","S","S","S","S","S","S","S","S","S","S","S","S","L","S","S","S","S","S","S","B","S","S","S","B","C","C","B","S","B","C","C","B","C","B"]}}}

SymetryML Project Data Source API

Add Data Source to a SymetryML project

This API function lets you add a data source to a project.

URL

GET /symetry/rest/{cid}/projects/{pid}/addds/{dsname}

HTTP Responses

200 OK: Success.

HTTP Response Entity

None

Remove Data Source from a SymetryML project

This API function lets you remove a data source from a project.

URL

GET /symetry/rest/{cid}/projects/{pid}/detachds/{dsname}

HTTP Responses

200 OK: Success.

HTTP Response Entity

None

Learning Data from a Data Source

This API function lets you learn from a previously created data source.

URL

GET /symetry/rest/{cid}/projects/{pid}/dss/{dsname}/learn

HTTP Responses

200 OK: Success. Includes an HTTP Location header specifying the location of the job ID that was created to handle the request. Example: {"statusCode":"ACCEPTED","statusString":"Job Created","values":{}}

HTTP Response Entity

None

Sample Request/Response

Request 1 create ds:
POST url=http://charm:8080/symetry/rest/c1/dss/create

Body 1:
{ENCRYPTED}

Request 2 learn ds:
POST url=http://charm:8080/symetry/rest/c1/projects/r1/dss/irisds/learn

BODY 2
{"errorHandling":1,"attributeNames":["sepal_length","sepal_width","petal_length","petal_width","sepal_lengt_b1","sepal_lengt_b2","sepal_width_b1","sepal_width_b2","petal_length_b1","petal_length_b2","petal_width_b1","petal_width_b2","Iris_setosa","Iris_versicolor","Iris_virginica"],"data":[],"attributeTypes":["C","C","C","C","B","B","B","B","B","B","B","B","B","B","B"]}

Response 2:
{"statusCode":"ACCEPTED","statusString":"Job Created","values":{}}

Response 2 Header:
Location: http://charm:8080/symetry/rest/c1/jobs/4

Job 3 Request:
GET url="http://charm:8080/symetry/rest/c1/jobs/4"

Job 3 Response:
{"statusCode":"OK","statusString":"Job is finished","values":{"smlInfo":{"pid":"r1","isDirty":true,"modelsList":[],"modelTypeList":[],"attributeNames":["sepal_length","sepal_width","petal_length","petal_width","sepal_lengt_b1","sepal_lengt_b2","sepal_width_b1","sepal_width_b2","petal_length_b1","petal_length_b2","petal_width_b1","petal_width_b2","Iris_setosa","Iris_versicolor","Iris_virginica"],"attributeIndexes":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14],"attributeTypes":["C","C","C","C","B","B","B","B","B","B","B","B","B","B","B"],"modelAssessments":{},"modelPredictions":{},"hash":-1,"categorySeparator":"$","type":"cpu","creationDate":1488220762857,"lastModificationDate":1488226030143,"loaded":true,"persisted":true}}}

Forgetting Data from a Data Source

This API function lets you forget data from a previously created data source.

URL

GET /symetry/rest/{cid}/projects/{pid}/dss/{dsname}/forget

HTTP Responses

200 OK: Success. Includes an HTTP Location header specifying the location of the job ID that was created to handle the request. For example: {"statusCode":"ACCEPTED","statusString":"Job Created","values":{}}

HTTP Response Entity

None

Sample Request/Response

See the previous endpoint (Learning Data from a Data Source) and replace learn with forget in the URL.

Prediction Based on a Data Source

After a model is built, you can use this API function to make predictions using a data source. This action can be performed on very large files if they reside on Amazon S3. A prediction file is created that contains the original rows, along with additional prediction information based on the type of model used.

URL

Request:
POST /symetry/rest/{cid}/projects/{pid}/dss/predict/{modelid}?indsname={indsname}&outdsname={outdsname}&impute=false
Request Body:
{
  "attributeNames":[{input attributes names}],
  "attributeTypes":[{input attribute types}]
}

Query Parameters

indsname (required): Data source to use as the input file for prediction.
outdsname (required): Data source to use as the output file for prediction.
impute (optional): Boolean parameter specifying whether to impute missing values.

HTTP Responses

202 ACCEPTED: Success. Includes an HTTP Location header specifying the location of the job ID that was created to handle the request.
500 INTERNAL SERVER ERROR: If the server refuses to accept the new job, it notifies the client with the error "Job execution was refused by server."

HTTP Response Entity

None

Sample Request/Response

Request:
POST url="http://charm:8080/symetry/rest/c1/projects/irisTest/dss/predict/testLDA?indsname=dsin&outdsname=dsout&impute=false"
Request Body:
{
  "attributeNames":["sepal_length","sepal_width","petal_length","petal_width"],
  "attributeTypes":["C","C","C","C"]
}

Response Header:
Location: http://charm:8080/symetry/rest/c1/jobs/2

Response:
{"statusCode":"ACCEPTED","statusString":"Job Created","values":{}}

Job Request:
GET url="http://charm:8080/symetry/rest/c1/jobs/2"

Job Response:
{"statusCode":"OK","statusString":"Job is finished","values":{"dataframe":{"attributeNames":["resZ_Result","res_Result","normZ_Result","sepal_length","sepal_width","petal_length","petal_width","sepal_lengt_b1","sepal_lengt_b2","sepal_width_b1","sepal_width_b2","petal_length_b1","petal_length_b2","petal_width_b1","petal_width_b2","Iris_setosa","Iris_versicolor","Iris_virginica"],"data":[["-0.4559922893809558","1","0.2929126883636288","4.3","3","1.1","0.1","1","0","0","1","1","0","1","0","1","0","0"],["-0.5225734165408404","1","0.2965158290613048","

(…)

["-2.6526076600477295","1","0.4232667751179995","6.9","3.1","5.4","2.1","0","1","0","1","0","1","0","1","0","0","1"]]}}}

Encoder Data Source API

Updating an Encoder with a Data Source

This API function updates an Encoder with data from a data source.

URL

POST /symetry/rest/{cid}/encoders/{encodername}/learnds [Body=DSInfo (encrypted)]

HTTP Responses

202 ACCEPTED: Success. Includes an HTTP Location header specifying the location of the job ID that was created to handle the request.
500 INTERNAL SERVER ERROR: If the server refuses to accept the new job, it notifies the client with the error "Job execution was refused by server."

HTTP Response Entity

None

Sample Request/Response

Request:
POST url="http://charm:8080/symetry/rest/c1/encoders/enctest/learnds"

Request Body:
emUmJ7LXKaICPww/dKRoMZo4y++JZksKFU3xx80mZ0E83ED6eCXhZ7PR1jzizIQg/uH19jVkSUWnDxul9+NIqZe7a9Yz5a8gW1kpRHJ7SyOYoo7oV90atrkMcZ73Jj0FRn53P81t8Q+7fCTwMYRfD52hjCbwtvdwFCbPbhPo1c9CAk9QLhtAFLDWVqblBeBXTPR8/0zfKJWtwj30Yr0gwqeTI3+BMOMEvH28WFWXo+wBdgkMBpXJsQH/zbPvVdCp9P+BJWp/E1Ju2hUzPO5c2k3/Dmqv3xWNmsQzEJWNMFTJnBG33hTUyTr/+j87NsM2e1luWvf1KaNzaatjS1ZZ9AQCp7gv48vMVrzEHs4ePiOFq5t0UElAu1kzFerhtNxEFmC9A90Gjt3FSLrEGgx5emZ0uogJa6m4nufbQnKUrUZL1sILmLfReZOqnPKp59HqjiiczpkPQ2vPfhmFpaM2RYaEHbqSpMGCnBO9Axij9ExPjR4X9aqHOzKw2yBba4Em

Response Header:
Location: http://charm:8080/symetry/rest/c1/jobs/4

Response:
{"statusCode":"ACCEPTED","statusString":"Job Created","values":{}}

Job Request:
GET url="http://charm:8080/symetry/rest/c1/jobs/4"

Job Response:
{"statusCode":"OK","statusString":"Job is finished","values":{}}

Listing a Data Source Used by an Encoder

This API function lists the data source(s) that were used to update an Encoder.

URL

GET /symetry/rest/{cid}/encoders/{encodername}/dss

HTTP Responses

202 ACCEPTED: Success. Includes an HTTP Location header specifying the location of the job ID that was created to handle the request.

HTTP Response Entity

StringList

Sample Request/Response

Request:
GET url="http://charm:8080/symetry/rest/c1/encoders/enctest/dss"

Response:
{"statusCode":"OK","statusString":"OK","values":{"stringList":{"values":["s12"]}}}

Data Source Job Status

When invoking a JobStatus for a job that was initiated for learning, forgetting, or making predictions based on a data source, the response might contain an entity. The following sections describe these cases.

Learning a Data Source

HTTP Responses

200 OK: No entity.
202 ACCEPTED: Job is not finished. Includes a JobInfo entity. Using the current and maximum fields, you can compute the percentage of the job accomplished so far: maximum refers to the size of the file to process, and current contains the approximate number of bytes processed by the job.

Forgetting a Data Source

HTTP Responses

200 OK: No entity.
202 ACCEPTED: Job is not finished. Includes a JobInfo entity; for more information, see the SymetryML Job Information section. Using the current and maximum fields, you can compute the percentage of the job accomplished so far: maximum refers to the size of the file to process, and current contains the approximate number of bytes processed by the job.

Prediction Based on Data Source

HTTP Responses

200 OK: DataFrame that contains a sample of the predictions (up to 128 lines). Because Amazon files can be very large, it is not possible to return the prediction result file in its entirety within a REST call. Use your favorite tool to fetch the prediction results from the data source (S3 or SFTP). The prediction result file contains all the original file columns, plus an additional prediction column for each row. Any additional columns depend on the type of model used to make the predictions.
202 ACCEPTED: Job is not finished. Includes a JobInfo entity. Using the current and maximum fields, it is possible to compute the percentage of the job accomplished so far: maximum refers to the size of the file to process, and current contains the approximate number of bytes processed by the job.
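
A hypothetical polling loop in Python is sketched below (authentication headers are again omitted, and the response key under which the JobInfo entity is returned is an assumption; check an actual response):

import time
import requests

# Job URL returned in the HTTP Location header of the original request.
job_url = "http://charm:8080/symetry/rest/c1/jobs/4"

while True:
    reply = requests.get(job_url).json()  # add the required auth headers here
    if reply["statusCode"] != "ACCEPTED":
        break  # the job is finished; 'reply' holds the final status and entity
    info = reply["values"]["jobInfo"]  # assumed key for the JobInfo entity
    print("~%.1f%% of bytes processed" % (100.0 * info["current"] / info["maximum"]))
    time.sleep(5)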

