HIPAA Compliance and Federated Learning
By default, SymetryML projects are HIPAA compliant, which means that additional restrictions are enforced on them.
Please note that the data is assumed to be Safe Harbor compliant. In the absence of an expert determination that no health information about an individual patient can be identified in your data, the Safe Harbor method must be used to make sure that no patient's personal health information can be leaked. If your organization needs more detail on the Safe Harbor method, please consult the relevant HIPAA de-identification guidance.
Please consult the SMPC documentation for further details on enabling Secure Multi-Party Computation (SMPC) for your federation.
When adding a data source to a HIPAA-compliant project, you must specify which column(s) uniquely identify a patient. This means that at least one column of the data source must have an attribute type equal to 'P', which stands for `TYPE_PATIENT_ID`.
When a federation node syncs with other nodes in the federation, it needs at least 25 unique patients in the PSR for the sync to be sent.
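The minimum-patient rule can be illustrated with a minimal sketch. It is not SymetryML's implementation: the function name `can_send_sync` and the idea of counting raw patient identifiers are assumptions made for illustration, and only the threshold of 25 comes from the text above.

```python
MIN_UNIQUE_PATIENTS = 25  # threshold stated in the text


def can_send_sync(patient_ids, minimum=MIN_UNIQUE_PATIENTS):
    """Return True when the records cover at least `minimum` distinct patients.

    `patient_ids` is a hypothetical iterable holding the value of the
    patient-id column (attribute type 'P') for every record in the update.
    """
    unique = {pid for pid in patient_ids if pid is not None}
    return len(unique) >= minimum


# 100 records that belong to 30 distinct patients
ids = [f"patient-{i % 30}" for i in range(100)]
print(can_send_sync(ids))       # 30 distinct patients
print(can_send_sync(ids[:20]))  # only 20 distinct patients
```

Repeated records for the same patient do not count towards the threshold, which is why the check is made on the set of identifiers rather than on the number of rows.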
Additional statistical tests are performed on each feature between syncs to ensure that no information is leaked. If any of these tests does not pass, the node will not send its sync. These tests protect against information that a malicious user could compute by comparing the current synchronization PSR with the previous synchronization PSR for a given peer.
This section describes the univariate statistical tests that are performed on each attribute of your dataset when HIPAA compliance is enabled in a federated learning project.
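Suppose $x$ is one of the attributes. We have a starting sample 1 (with $N_1$ records) and a desired update, sample 2 (with $N_2$ records).

The following $t$-test is used:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}}$$

where the sample variance is defined as

$$s_i^2 = \frac{1}{N_i - 1} \sum_{j=1}^{N_i} \left(x_{i,j} - \bar{x}_i\right)^2$$

and the sample mean is defined by

$$\bar{x}_i = \frac{1}{N_i} \sum_{j=1}^{N_i} x_{i,j}$$

The test will insist that $|t| < t_{\nu}$, where $t_{\nu}$ is the 95% critical value of a $t$-distributed variable whose degrees of freedom $\nu$ are estimated by

$$\nu \approx \frac{\left(\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}\right)^2}{\dfrac{\left(s_1^2/N_1\right)^2}{N_1 - 1} + \dfrac{\left(s_2^2/N_2\right)^2}{N_2 - 1}}$$

Where one of the standard deviations vanishes, for example when $s_2 = 0$, a one-sample $t$-test is performed instead, and $t$ is evaluated against a $t$-distribution with $N_1 - 1$ degrees of freedom:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_1 / \sqrt{N_1}}$$

The $t$-test ensures that we accept the null hypothesis that the existing population and the PSR update were drawn from populations with the same mean at 95% confidence. Where both variances vanish, the update should only be accepted when $\bar{x}_1 = \bar{x}_2$.

The PSRs of samples 1 and 2 are all that is needed to extract the necessary values of $N_i$, $\bar{x}_i$, and $s_i$ for a sync.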
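The mean test can be sketched from summary statistics alone. The sketch assumes the test is Welch's unequal-variance $t$-test, which matches the description of estimated degrees of freedom. The function names, the `(count, mean, variance)` tuples, and the `critical_value` callable are hypothetical and not part of SymetryML. In practice the critical value could come from a statistics library, for example `scipy.stats.t.ppf(0.975, df)` for a two-sided test at 95%.

```python
import math


def welch_t(n1, mean1, var1, n2, mean2, var2):
    """Return (t, df) computed from counts, means and sample variances.

    Returns (None, None) when both variances vanish; the caller should
    then accept the update only if the two means are equal.
    """
    if var1 == 0 and var2 == 0:
        return None, None
    if var2 == 0:
        # One-sample test of sample 1 against the constant value mean2.
        return (mean1 - mean2) / math.sqrt(var1 / n1), n1 - 1
    if var1 == 0:
        # Symmetric case: sample 2 against the constant value mean1.
        return (mean1 - mean2) / math.sqrt(var2 / n2), n2 - 1
    a, b = var1 / n1, var2 / n2
    t = (mean1 - mean2) / math.sqrt(a + b)
    # Welch-Satterthwaite estimate of the degrees of freedom.
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


def accept_update(stats1, stats2, critical_value):
    """stats = (count, mean, variance); critical_value maps df to a threshold."""
    t, df = welch_t(*stats1, *stats2)
    if t is None:
        return stats1[1] == stats2[1]
    return abs(t) < critical_value(df)


existing = (1000, 50.0, 100.0)  # N, mean, sample variance
update = (40, 51.0, 90.0)
t, df = welch_t(*existing, *update)
print(round(t, 4), round(df, 2))
# 2.02 is an approximate two-sided 95% table value near 42 degrees of freedom
print(accept_update(existing, update, critical_value=lambda df: 2.02))
```

Passing the critical value in as a callable keeps the sketch free of third-party dependencies while leaving the choice of $t$-distribution quantile function to the caller.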
We also carry out a Kolmogorov-Smirnov test to compare the distribution of the update to the previous distribution. This is a non-parametric test to compare two distributions. The idea here is that we would like any update “to represent the overall distribution and not special features of the individuals in the update.”
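Suppose again that $x$ is one of the attributes, but now assume sorted ascending values $x_1 \le x_2 \le \dots \le x_n$. An update adds $m$ new records, also sorted in ascending order, $y_1 \le y_2 \le \dots \le y_m$. Typically $m \ll n$. We accept the update if “it looks like a random sample from the existing distribution”, but reject it otherwise, as it might then convey information specific to the individuals in the sample. The statistics used to achieve this are covered in the subsections below.

We define the empirical distribution functions:

$$F_n(x) = \frac{1}{n}\,\#\{i : x_i \le x\}, \qquad G_m(x) = \frac{1}{m}\,\#\{j : y_j \le x\}$$

We then define:

$$D_{n,m} = \sup_x \left| F_n(x) - G_m(x) \right|$$

and we require that

$$D_{n,m} < c(\alpha)\sqrt{\frac{n + m}{n\,m}}$$

with $c(\alpha) \approx 1.36$ for a 95% confidence level ($\alpha = 0.05$), as per the formula

$$c(\alpha) = \sqrt{-\tfrac{1}{2}\ln\frac{\alpha}{2}}$$

The KS statistic of the PSR update against the existing population must then be smaller than $c(\alpha)\sqrt{(n + m)/(n\,m)}$.

Given that $m \ll n$ (the update to a PSR will typically contain far fewer records than the existing data), we cannot expect the KS statistic to be zero or arbitrarily close to zero, because a step function with $m$ steps cannot generally approximate a step function with $n$ steps perfectly.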
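The Kolmogorov-Smirnov check can be sketched as follows. The sketch works on raw values for readability, whereas SymetryML derives the test from the distribution information carried with the PSR. The function names are hypothetical, and the threshold uses the standard large-sample two-sample KS critical value, which is assumed here to be the formula the text refers to.

```python
import math
from bisect import bisect_right


def ks_statistic(existing, update):
    """Largest gap between the two empirical distribution functions."""
    xs, ys = sorted(existing), sorted(update)
    n, m = len(xs), len(ys)
    d = 0.0
    # Both functions are right-continuous step functions, so the largest
    # gap is attained at one of the observed values.
    for v in xs + ys:
        f = bisect_right(xs, v) / n  # share of existing values <= v
        g = bisect_right(ys, v) / m  # share of update values <= v
        d = max(d, abs(f - g))
    return d


def ks_threshold(n, m, alpha=0.05):
    """Critical value c(alpha) * sqrt((n + m) / (n * m))."""
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n + m) / (n * m))


def ks_accepts(existing, update, alpha=0.05):
    return ks_statistic(existing, update) < ks_threshold(
        len(existing), len(update), alpha
    )


existing = list(range(1000))
spread_update = list(range(0, 1000, 25))  # 40 values spread over the range
tail_update = list(range(960, 1000))      # 40 values from the upper tail only

print(round(ks_threshold(len(existing), 40), 3))
print(ks_accepts(existing, spread_update))  # resembles the existing data
print(ks_accepts(existing, tail_update))    # concentrated in one region
```

The second update is rejected because its values all sit in a narrow part of the range, which is the kind of update that could reveal something specific to the individuals it contains.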
The integral of the difference can in principle become arbitrarily small:
This is only defined when both variables are non-constant, that is, when both have non-zero variance.
This test will pass if the integral satisfies one of the following conditions depending on the value of :
The distribution information for the attributes, provided along with the PSR for both the existing data and the update, enables SymetryML to perform the calculations for the Kolmogorov-Smirnov and integral tests.
Before the SMPC protocol is run to exchange all the peers' PSRs, a verification is made to ensure that the resulting PSR would contain at least 25 patients. No additional statistical tests are run.
If your project does not need to be HIPAA compliant, please specify `fed_enforce_hipaa_compliance=false` as part of the Federation Key Map Value passed in the message body when using the REST API to create a federation. Please consult the relevant REST API sections for details.