About Federated Learning
This section gives the background needed to understand the business object behind the Federated Learning functionality. The next section covers the REST API itself in detail.
SymetryML projects can easily be merged. Suppose you have 2 projects: (a) a project p1 that processed dataset d1 and (b) a project p2 that processed dataset d2. You can merge p2 into p1, and the resulting p1 project will be the same as if p1 had processed both d1 and d2. A SymetryML Federated project leverages this capability. A federation consists of n Symetry projects that each process their own private data and share their results at a given interval. This can be seen in the following picture:
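The merge property can be illustrated with a toy summary that keeps only additive statistics. This is a minimal sketch, not the PSR itself: the `RunningStats` class and its fields are hypothetical, chosen only to show why merging two summaries gives the same result as processing both datasets in one project.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Mergeable summary of one numeric feature (illustrative, not the PSR)."""
    count: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def update(self, values):
        # Accumulate additive statistics one row at a time.
        for v in values:
            self.count += 1
            self.total += v
            self.total_sq += v * v
        return self

    def merge(self, other):
        # Merging is plain addition of the accumulated sums.
        self.count += other.count
        self.total += other.total
        self.total_sq += other.total_sq
        return self

    def mean(self):
        return self.total / self.count


d1 = [1.0, 2.0, 3.0]
d2 = [4.0, 5.0]

p1 = RunningStats().update(d1)   # project p1 processed d1
p2 = RunningStats().update(d2)   # project p2 processed d2
p1.merge(p2)                     # merge p2 into p1

# A single project that processed d1 and d2 directly.
combined = RunningStats().update(d1 + d2)
```

Because every field is a sum, the merged `p1` and `combined` hold identical values, and neither project needs to see the other's rows.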
To fully understand the federated learning REST API, one needs to understand a few concepts and terms.
peers or node
A node is a member of a federation. It’s basically a Federated Symetry Project.
federated project
A Federated Symetry Project contains 2 Symetry projects: one local project and one federated project. The federated project is rebuilt from time to time according to the Federation Schedule defined by the federation admin.
local project
A Federated Symetry Project contains 2 projects: one local project and one federated project. The local project is responsible for processing data that is local to this project.
Federation
A federation is a set of nodes that communicate and share Symetry project information.
Federation Info
Information that describes a federation.
Federation Admin
The user who creates a federation automatically becomes the federation admin.
Federation Contract
Federation secret key
An AES secret key that is used to encrypt communication between peers/nodes of a federation.
Federation Schedule
Peers in a federation send updates to other peers according to a schedule. This schedule is defined by the federation admin when a federation is created. Example schedules:
- m30 : synchronize every 30 mins
- h3 : synchronize every 3 hours
- d7 : synchronize every 7 days
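The schedule strings above follow a unit-letter-plus-number pattern. The following is a minimal sketch of how such a string could be turned into an interval; `parse_schedule` is a hypothetical helper, and it assumes that only the three units shown in the examples (m, h, d) exist.

```python
import re
from datetime import timedelta

# Assumption: only the units shown in the examples are supported.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_schedule(spec: str) -> timedelta:
    """Convert a schedule string such as 'm30' into a timedelta."""
    match = re.fullmatch(r"([mhd])(\d+)", spec.strip())
    if match is None:
        raise ValueError(f"unrecognized schedule: {spec!r}")
    unit, amount = match.group(1), int(match.group(2))
    if amount <= 0:
        raise ValueError(f"schedule interval must be positive: {spec!r}")
    return timedelta(**{_UNITS[unit]: amount})


every_30_minutes = parse_schedule("m30")
every_3_hours = parse_schedule("h3")
every_7_days = parse_schedule("d7")
```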
scheduled synchronization message
A periodic message sent by a peer to other peers in a federation. The period is defined by the federation schedule.
In the AWS implementation, under the hood, the federation service uses many AWS services:
- Each federation node has an AWS SQS queue to receive messages.
- A federation has an AWS SNS topic that allows fanout messages to be sent to multiple SQS queues.
- Nodes in the federation publish messages to the SNS topic to communicate with other nodes.
- SNS messages are lightweight and contain pointers to Amazon S3 files that are used to temporarily store message content.
- AWS STS credentials are used to allow other users to access a user’s file on S3.
The following figure illustrates this:
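The fan-out pattern described above can also be sketched in memory. This is an illustration only and makes no AWS calls: `ObjectStore` and `Topic` are hypothetical stand-ins for S3 and SNS, plain deques stand in for the SQS queues, the STS credential step is omitted, and the rule that a node skips its own message is an assumption.

```python
from collections import deque


class ObjectStore:
    """Stands in for temporary S3 storage (hypothetical, in memory)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, payload):
        self._objects[key] = payload
        return key

    def get(self, key):
        return self._objects[key]


class Topic:
    """Stands in for the federation's SNS topic."""

    def __init__(self):
        self._queues = []

    def subscribe(self, queue):
        self._queues.append(queue)

    def publish(self, message):
        # Fan out: every subscribed queue receives its own copy.
        for queue in self._queues:
            queue.append(dict(message))


store = ObjectStore()
topic = Topic()
queues = {name: deque() for name in ("node_a", "node_b", "node_c")}
for queue in queues.values():
    topic.subscribe(queue)

# node_a stores the heavy payload, then publishes only a pointer to it.
key = store.put("sync/node_a/0001", b"...serialized update...")
topic.publish({"sender": "node_a", "pointer": key})

received = {}
for name, queue in queues.items():
    while queue:
        message = queue.popleft()
        if message["sender"] == name:
            continue  # assumption: a node ignores its own message
        received[name] = store.get(message["pointer"])
```

Keeping the topic message down to a pointer is what lets the messages stay lightweight while the content itself sits in temporary storage.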
NATS-based federations use the NATS 'connective technology' to create a federation. For more details on NATS, please consult www.nats.io. Under the hood, SymetryML uses NATS to send messages as well as synchronization messages between all the peers in the federation.
The user who creates a federation will become the administrator of it.
In order to join a federation one must:
As explained previously, SymetryML’s speed in machine learning and real-time capabilities rely on its proprietary statistical representation - the PSR. Once constructed from a new dataset, the PSR allows for two main functionalities:
supervised and unsupervised machine learning and
various exploration APIs.
It turns out that some of these exploration APIs can be used to enforce quality on the data that peers participating in a federation contribute. Of course, data is never shared directly; only knowledge of this data is shared via the PSR. But this knowledge is sufficient to enforce rules like "enforce that at least 40% of the rows with positive cancer are female" or "enforce that at least 500 examples of fraud are part of this dataset".
This enforcement is done via what we call a 'PSR Contract'. A PSR Contract is a list of rules to be enforced on a PSR for it to be validated. These rules are effectively Boolean predicates that evaluate to true or false, and for a PSR Contract to be validated, all its rules need to evaluate to true.
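The "all rules must be true" semantics can be sketched as follows. This is not the PSR Contract syntax: the rules are written as plain Python predicates over a hypothetical dictionary of summary statistics, and the names `validate_contract`, `fraud_count` and `pct_female_given_cancer` are made up for illustration.

```python
def validate_contract(rules, stats):
    """Return (is_valid, failed_rule_names); valid only if every rule is true."""
    failed = [name for name, predicate in rules if not predicate(stats)]
    return (not failed, failed)


# Hypothetical summary statistics derived from a peer's data.
stats = {"fraud_count": 620, "pct_female_given_cancer": 0.37}

rules = [
    ("at least 500 fraud examples",
     lambda s: s["fraud_count"] >= 500),
    ("at least 40% female among positive cancer rows",
     lambda s: s["pct_female_given_cancer"] >= 0.40),
]

is_valid, failed = validate_contract(rules, stats)
```

Here the first rule holds and the second does not, so the contract as a whole is not validated.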
Federation PSR Contracts are defined with the following Backus-Naur notation as well as the following table, which describes the individual functions that can be used in a PSR Contract.
F1 / F2 mean 'Feature 1 type' and 'Feature 2 type'
C means continuous type
B means binary type
| Function | F1 | F2 | Description |
| --- | --- | --- | --- |
| COUNT | C\|B | | How many times a feature was seen |
| MEAN | C\|B | | The mean value of a feature |
| STDDEV | C\|B | | The standard deviation of a feature |
| VARIANCE | C\|B | | The variance of a feature |
| STDDEV_UNBIASED | C\|B | | The unbiased standard deviation of a feature |
| VARIANCE_UNBIASED | C\|B | | The unbiased variance of a feature |
| COVAR | C\|B | C\|B | The covariance of 2 features |
| LINCORR | C\|B | C\|B | The linear correlation of 2 features |
| COND_STDDEV | C\|B | B | Stddev of Feature 1 when Feature 2 is '1' or true |
| COND_VARIANCE | C\|B | B | Variance of Feature 1 when Feature 2 is '1' or true |
| COND_STDDEV_UNBIASED | C\|B | B | Unbiased stddev of Feature 1 when Feature 2 is '1' or true |
| COND_VARIANCE_UNBIASED | C\|B | B | Unbiased variance of Feature 1 when Feature 2 is '1' or true |
| COMPL_COND_STDDEV | C\|B | B | Stddev of Feature 1 when Feature 2 is '0' or false |
| COMPL_COND_VARIANCE | C\|B | B | Variance of Feature 1 when Feature 2 is '0' or false |
| COMPL_COND_STDDEV_UNBIASED | C\|B | B | Unbiased stddev of Feature 1 when Feature 2 is '0' or false |
| COMPL_COND_VARIANCE_UNBIASED | C\|B | B | Unbiased variance of Feature 1 when Feature 2 is '0' or false |
| PCT_OF_TRUE | B | B | Percentage of occurrences where Feature 1 is '1' or true and Feature 2 is '1' or true |
| PCT_OF_FALSE | B | B | Percentage of occurrences where Feature 1 is '1' or true and Feature 2 is '0' or false |
| NUM_OCCURENCE_WHEN_TRUE | B | B | Number of occurrences where Feature 1 is '1' or true and Feature 2 is '1' or true |
| NUM_OCCURENCE_WHEN_FALSE | B | B | Number of occurrences where Feature 1 is '1' or true and Feature 2 is '0' or false |
| MEAN_WHEN_TRUE | C | B | Mean of Feature 1 when Feature 2 is '1' or true |
| MEAN_WHEN_FALSE | C | B | Mean of Feature 1 when Feature 2 is '0' or false |
Here is a small example with the Iris data set. For a PSR Contract to be valid all the rows must evaluate to TRUE.
fed_psr_contract_snd_fail_action
- fed_psr_contract_snd_fail_action_block (default)
- fed_psr_contract_snd_fail_action_allow
fed_psr_contract_rcv_fail_action
- fed_psr_contract_rcv_fail_action_block (default)
- fed_psr_contract_rcv_fail_action_allow
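The two parameters above each accept a block or an allow value, with block as the default. As a rough sketch of how a client might fill in defaults and reject unknown values before sending its property map, consider the following. The `resolve_fail_actions` helper is hypothetical, and it assumes that the listed strings are passed verbatim as the values of the two keys.

```python
SND_KEY = "fed_psr_contract_snd_fail_action"
RCV_KEY = "fed_psr_contract_rcv_fail_action"

# Assumption: the values are the literal strings listed in the documentation.
ALLOWED = {
    SND_KEY: {f"{SND_KEY}_block", f"{SND_KEY}_allow"},
    RCV_KEY: {f"{RCV_KEY}_block", f"{RCV_KEY}_allow"},
}
DEFAULTS = {SND_KEY: f"{SND_KEY}_block", RCV_KEY: f"{RCV_KEY}_block"}


def resolve_fail_actions(properties):
    """Return both fail actions, using the documented defaults when a key is absent."""
    resolved = dict(DEFAULTS)
    for key, allowed in ALLOWED.items():
        if key in properties:
            if properties[key] not in allowed:
                raise ValueError(f"invalid value for {key}: {properties[key]!r}")
            resolved[key] = properties[key]
    return resolved


# A peer that tolerates failures on PSRs it receives but keeps the send default.
resolved = resolve_fail_actions({RCV_KEY: f"{RCV_KEY}_allow"})
```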
Another functionality enabled by SymetryML PSR technology is 'peer exploration'. It allows SymetryML's whole suite of exploration APIs to be used against the PSR of a peer. This can be used to perform various univariate and bivariate comparisons between different peer PSRs without ever seeing the raw data of that peer.
The PSR makes it possible to share certain summary features of the data without ever sharing the data itself. However, for the PSR not to be invertible (that is, not to allow the reconstruction of the original data from the PSR), it needs to have processed a minimum number of rows. This minimum threshold depends on the number of attributes and equals the following:
Minimum number of rows = Number of Attributes + 5
If this minimum is not met on a given peer at the time of syncing, the peer will not share its current PSR with the other nodes in the federation. The same logic applies to incremental synchronization: the delta of each sync (the amount of new data in a PSR since the last synchronization) must follow this rule for the synchronization to be allowed.
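The rule above reduces to a one-line check. The sketch below uses hypothetical function names; `row_count` is the total number of rows for a full synchronization, or the number of new rows since the last synchronization for an incremental one.

```python
def min_rows_required(num_attributes: int) -> int:
    """Minimum number of rows = Number of Attributes + 5."""
    return num_attributes + 5


def sync_allowed(num_attributes: int, row_count: int) -> bool:
    """True if a PSR (or a sync delta) built from row_count rows may be shared."""
    return row_count >= min_rows_required(num_attributes)


threshold = min_rows_required(10)       # 10 attributes
full_sync_ok = sync_allowed(10, 1200)   # PSR built from 1200 rows
delta_sync_ok = sync_allowed(10, 14)    # only 14 new rows since the last sync
```

With 10 attributes the threshold is 15 rows, so the full synchronization is allowed while the 14-row delta is not.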
This can be a limitation for federations where each peer does not have a lot of data. To circumvent this limitation, it's possible to use secure multi-party computation when peers share their PSR. The protocol will only complete if the resulting PSR is not invertible.
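To give an intuition for how several small peers can combine results without revealing their individual contributions, here is a sketch of generic additive secret sharing. This is a textbook technique and not necessarily the protocol SymetryML uses. It handles a single row count rather than a full PSR, and applying the minimum-row rule to the combined count is an assumption made for illustration.

```python
import secrets

MODULUS = 2**61 - 1  # arithmetic is done modulo a large prime


def make_shares(value, n_parties):
    """Split value into n random-looking shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    """Sum the values while each party only ever sees shares and partial sums."""
    n = len(private_values)
    # all_shares[i][j] is the share that peer i sends to peer j.
    all_shares = [make_shares(v, n) for v in private_values]
    # Each peer j adds the shares it received and publishes only that partial sum.
    partials = [sum(all_shares[i][j] for i in range(n)) % MODULUS
                for j in range(n)]
    return sum(partials) % MODULUS


num_attributes = 10
row_counts = [6, 4, 7]  # each peer alone is below the 15-row minimum

total_rows = secure_sum(row_counts)
# Assumption: the minimum-row rule is checked against the combined result.
protocol_completes = total_rows >= num_attributes + 5
```

No single peer here could share on its own, yet the combined count of 17 rows clears the threshold.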
Besides creating and joining a federation via REST endpoints, other operations are available. The following table lists all available REST endpoints for federated learning. The following functionality of normal SymetryML projects is available in Federated Project
Feature hashing is not available
This REST endpoint is used to create a new federation. The user performing this operation becomes the owner of the federation.
This is a map of properties for a federation.
Returns the federation information encrypted with a password. This is needed in order to share federation information with other peers that the federation admin wants to invite to join the federation. The response will contain a token that can only be used once.
This REST endpoint allows a peer to join an existing federation.
This endpoint instructs your federated project to start pulsing, that is, the project will periodically poll for messages from other nodes in the federation and send its scheduled synchronization message.
Stops synchronizing with the federation.
Returns the error log for this project. Since many messages between nodes happen asynchronously, this allows the user to see if there was an error while communicating with the other peers in the federation.
This returns a log of when this federated project was updated.
For AWS based federations, this will return information about AWS SNS topic, SNS subscriptions as well as SQS queues. This is for troubleshooting purposes.
A set of boolean rules used to enforce quality of individual peer's PSR. Please see the for details.
A SymetryML Federation can use either Amazon services or NATS in the backend to transmit the various messages that support its functionality.
Peers can authenticate to the NATS network using either a user/password combination or a token. Please consult for more details.
Make sure that your clock is correctly synchronized using an NTP service or something similar. If the clock of a computer in a federation is not correctly synchronized, it will have problems receiving messages from other nodes, because the service will ignore many messages due to the discrepancy between the time a message was sent and the internal clock of the computer receiving it. Those errors can be seen using the REST endpoint.
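The effect of clock drift can be sketched with a simple freshness check. The actual tolerance used by the federation service is not stated here, so `MAX_SKEW` and `is_accepted` below are hypothetical; the point is only that a receiver with a drifting clock ends up rejecting messages that were in fact sent recently.

```python
from datetime import datetime, timedelta, timezone

MAX_SKEW = timedelta(minutes=5)  # hypothetical tolerance, not a documented value


def is_accepted(sent_at, receiver_now, max_skew=MAX_SKEW):
    """Accept a message only if its send time is close to the receiver's clock."""
    return abs(receiver_now - sent_at) <= max_skew


sent_at = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# Receiver with a well-synchronized clock, a few seconds after the send.
good_clock = sent_at + timedelta(seconds=3)
# Receiver whose clock has drifted 20 minutes ahead.
drifted_clock = sent_at + timedelta(minutes=20)

accepted_with_good_clock = is_accepted(sent_at, good_clock)
accepted_with_drifted_clock = is_accepted(sent_at, drifted_clock)
```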
Receive one-time encrypted federation info along with the password to decrypt the message. This can be done over email, Skype or any other means that allows transferring some base64 encrypted text. The federation administrator can get this encrypted federation info using the rest endpoint
Invoke the REST endpoint to join the federation () with the encrypted message and the password received from the federation admin. This message must also be encrypted using the user secret key.
Upon successful result from step 3, one can now start syncing with other nodes in the federation. This is done by invoking the rest endpoint.
Another example using multiple predicates on each line which is permitted per the :
A PSR Contract can be evaluated by each peer at two times: first, when sharing its own PSR with other peers in a federation, and second, when receiving another peer's PSR. It's possible to control what SymetryML does when a validation failure occurs at each of these times. Each peer specifies this when creating or joining a federation by setting the following parameters inside the Federation Key Map Value; please consult the sections mentioned in for details:
If a particular peer wishes to block such functionality please consult to learn how to disable / enable this functionality.
Federated Learning with SMPC can be enabled by simply adding the key-value pair fed_use_smpc = true inside the Federation Key Map Value when an administrator creates a federation. Please consult the sections mentioned in for details.
By default, Random Forest models are disabled. They can be enabled by changing the SymetryML server configuration. For details, please see the rtlm.option.sml.fed.strict.mode key in section.
If your project has more than 2000 attributes, you should be careful about how frequently you sync your projects. Please consult the section for more information.