SymetryML 6.1
HIPAA Compliance and Federated Learning


Last updated 2 years ago

By default, SymetryML projects are HIPAA compliant, which means that additional restrictions are enforced on them.

Safe Harbor

Please note that the data is assumed to be Safe Harbor compliant. In the absence of an expert determination that no health information about an individual patient can be identified in your data, the Safe Harbor method must be used to ensure that no patient's personal health information can be leaked. Please consult the HHS de-identification guidance (https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html) if your organization needs more detail on the Safe Harbor method.

The following is true if SMPC is not Enabled

Please consult the section on SMPC for further details on enabling Secure Multi-Party Computation (SMPC) for your federation.

  • When adding a data source to a HIPAA compliant project, you must specify which column(s) uniquely identify a patient. This means that at least one column of the data source must have an attribute type equal to 'P', which stands for TYPE_PATIENT_ID.

  • When a federation node is syncing with the other nodes in the federation, its PSR must contain at least 25 unique patients for the sync to be sent.

  • Additional statistical tests are performed on each feature between consecutive syncs to ensure that no information is leaked. If any of these tests fails, the node does not send its sync. These tests protect against information that a malicious user could compute by comparing the current synchronization PSR with the previous synchronization PSR of a given peer.
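The two patient-related checks in the list above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not SymetryML's implementation: the function names, the row-of-dictionaries data layout, and the placeholder type code "X" are hypothetical; only the 'P' (TYPE_PATIENT_ID) attribute type and the 25-patient threshold come from this page. Keying a patient on the combination of all 'P' columns is also an assumption, chosen because a patient may be identified by more than one column.

```python
from typing import List, Mapping, Sequence

TYPE_PATIENT_ID = "P"      # attribute type code for patient IDs (from this page)
MIN_UNIQUE_PATIENTS = 25   # minimum unique patients per sync (from this page)


def patient_id_columns(attribute_types: Mapping[str, str]) -> List[str]:
    """Return the columns whose attribute type is 'P' (TYPE_PATIENT_ID)."""
    cols = [c for c, t in attribute_types.items() if t == TYPE_PATIENT_ID]
    if not cols:
        raise ValueError("a HIPAA compliant data source needs a 'P' column")
    return cols


def can_send_sync(rows: Sequence[Mapping[str, object]],
                  attribute_types: Mapping[str, str]) -> bool:
    """True if the rows cover at least MIN_UNIQUE_PATIENTS distinct patients.

    A patient is keyed by the tuple of values in all 'P' columns (hypothetical
    choice that also covers composite identifiers).
    """
    id_cols = patient_id_columns(attribute_types)
    patients = {tuple(row[c] for c in id_cols) for row in rows}
    return len(patients) >= MIN_UNIQUE_PATIENTS


# Usage: 30 rows but only 20 distinct patients, so the sync is withheld.
types = {"patient": "P", "age": "X"}  # "X" is a placeholder type code
rows = [{"patient": f"p{i % 20}", "age": 40 + i} for i in range(30)]
print(can_send_sync(rows, types))     # False
```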

Additional Statistical Tests

This section describes the univariate statistical tests that are performed on each attribute of your dataset when HIPAA compliance is enabled in a federated learning project.

T-Test

Suppose $x$ is one of the attributes. We have a starting sample 1 and a desired update, sample 2.

The following $t$-test is used:

$$t = \frac{\mu_1 - \mu_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}},$$

where $S_j$ ($j = 1$ or $2$) is the sample standard deviation, whose square, the sample variance, is defined as

$$S_j^2 = \frac{1}{n_j}\sum_{i=1}^{n_j}(x_i - \mu_j)^2$$

and the sample mean $\mu_j$ is defined by

$$\mu_j = \frac{1}{n_j}\sum_{i=1}^{n_j} x_i.$$

The test will insist that $P[T > |t|] < \epsilon$, where $T$ is a $t$-distributed variable whose degrees of freedom are estimated by

$$df = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{\left(\frac{S_1^2}{n_1}\right)^2}{n_1 - 1} + \frac{\left(\frac{S_2^2}{n_2}\right)^2}{n_2 - 1}}.$$

When one of the standard deviations vanishes, for example when $S_2 = 0$, a one-sample $t$-test is performed instead and evaluated against a $t$-distribution with $n_1$ degrees of freedom:

$$t = \frac{\mu_1 - \mu_2}{S_1 / \sqrt{n_1}}.$$

The $t$-test ensures that an update is accepted only when we accept the null hypothesis that the existing population and the PSR update were drawn from populations with the same mean, at 95% confidence. When both variances vanish, the update should only be accepted when $\mu_1 = \mu_2$.
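The case analysis above can be sketched in Python as follows. This is a minimal, dependency-free illustration, not SymetryML's code: the function names are hypothetical, and treating $S_1 = 0$ as the mirror image of the documented $S_2 = 0$ case is an assumption. The sketch stops at $(t, df)$; turning that into the probability compared against $\epsilon$ needs a $t$-distribution CDF, for example scipy.stats.t.sf, which is left out to keep the block self-contained.

```python
import math
from typing import Optional, Tuple


def welch_t_and_df(mu1: float, s1: float, n1: int,
                   mu2: float, s2: float, n2: int) -> Tuple[float, float]:
    """Two-sample t statistic and Welch-Satterthwaite degrees of freedom.

    s1 and s2 are standard deviations; n1 and n2 must be at least 2.
    """
    v1 = s1 * s1 / n1  # S_1^2 / n_1
    v2 = s2 * s2 / n2  # S_2^2 / n_2
    t = (mu1 - mu2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 * v1 / (n1 - 1) + v2 * v2 / (n2 - 1))
    return t, df


def select_t_statistic(mu1: float, s1: float, n1: int,
                       mu2: float, s2: float, n2: int
                       ) -> Optional[Tuple[float, float]]:
    """Return (t, df) following the cases described on this page.

    Returns None when both standard deviations vanish; the caller should then
    accept the update only if mu1 == mu2.
    """
    if s1 == 0 and s2 == 0:
        return None
    if s2 == 0:
        # One-sample test against mu2, with n1 degrees of freedom as documented.
        return (mu1 - mu2) / (s1 / math.sqrt(n1)), float(n1)
    if s1 == 0:
        # Assumed mirror image of the documented S_2 = 0 case.
        return (mu1 - mu2) / (s2 / math.sqrt(n2)), float(n2)
    return welch_t_and_df(mu1, s1, n1, mu2, s2, n2)


# Usage: existing sample (n = 100) versus a 20-record update.
t, df = select_t_statistic(10.0, 2.0, 100, 10.5, 3.0, 20)
print(round(t, 4), round(df, 2))
```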

The PSRs of samples 1 and 2 are all that is needed to extract the values of $S_j$, $\mu_j$, and $n_j$ for a sync.
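To illustrate why summary statistics suffice, the Python sketch below recovers $n_j$, $\mu_j$, and $S_j$ (with the $1/n_j$ variance convention used above) from a count, a sum, and a sum of squares. It assumes, hypothetically, that a PSR exposes these three per-attribute sums and that they are additive, so the update's sums are the current snapshot minus the previous one; the class name and layout are illustrative, not the actual PSR format.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AttributeSums:
    """Hypothetical per-attribute summary: count, sum and sum of squares."""
    n: int
    sum_x: float
    sum_x2: float

    @classmethod
    def from_values(cls, xs: Iterable[float]) -> "AttributeSums":
        xs = list(xs)
        return cls(len(xs), float(sum(xs)), float(sum(x * x for x in xs)))

    def __sub__(self, other: "AttributeSums") -> "AttributeSums":
        # Records added since the `other` snapshot (assumes additive sums).
        return AttributeSums(self.n - other.n, self.sum_x - other.sum_x,
                             self.sum_x2 - other.sum_x2)

    def mean(self) -> float:
        """mu = (1/n) * sum(x)."""
        return self.sum_x / self.n

    def std(self) -> float:
        """S with S^2 = (1/n) * sum((x - mu)^2) = sum(x^2)/n - mu^2."""
        var = self.sum_x2 / self.n - self.mean() ** 2
        return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding errors


# Usage: previous sync snapshot, then two new records (6 and 8) arrive.
previous = AttributeSums.from_values([2, 4, 4, 4, 5, 5, 7, 9])
current = AttributeSums.from_values([2, 4, 4, 4, 5, 5, 7, 9, 6, 8])
update = current - previous
print(previous.n, previous.mean(), previous.std())  # 8 5.0 2.0
print(update.n, update.mean(), update.std())        # 2 7.0 1.0
```

Computing the variance as $\sum x^2 / n - \mu^2$ can lose precision when the mean is large relative to the spread, which is why the sketch clamps small negative results to zero.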

Kolmogorov-Smirnov Test

We also carry out a Kolmogorov-Smirnov test to compare the distribution of the update to the previous distribution. This is a non-parametric test to compare two distributions. The idea here is that we would like any update “to represent the overall distribution and not special features of the individuals in the update.”

Suppose again that $x$ is one of the attributes, but now assume its values are sorted in ascending order, $x_1, \ldots, x_n$. An update adds $m$ new records, also sorted in ascending order, $x'_1, \ldots, x'_m$. Typically $m \ll n$. We accept the update if “it looks like a random sample from the existing distribution”, but reject it otherwise, as it might then convey information specific to the individuals in the sample. The statistics used to make this decision are described below.

We define:

$$\Phi(t) = \max\Bigl\{\frac{i}{n} \Bigm| 1 \leq i \leq n,\ x_i \leq t\Bigr\} = \frac{1}{n}\sum_{i=1}^{n} 1_{(-\infty, t]}(x_i)$$

$$\Phi'(t) = \max\Bigl\{\frac{i}{m} \Bigm| 1 \leq i \leq m,\ x'_i \leq t\Bigr\} = \frac{1}{m}\sum_{i=1}^{m} 1_{(-\infty, t]}(x'_i).$$

We then define:

$$D_{n,m} = \sup_t \bigl|\Phi(t) - \Phi'(t)\bigr|$$

and we require that

$$D_{n,m} < K(\alpha, n, m) = c(\alpha)\sqrt{\frac{n + m}{nm}}$$

with $c(\alpha) \approx 1.358$ for a 95% confidence level ($\alpha = 0.05$), as given by the formula

$$c(\alpha) = \sqrt{-\ln\left(\frac{\alpha}{2}\right) \cdot \frac{1}{2}}$$

The KS statistic of the PSR update against the existing population must therefore be smaller than $K(0.05, n, m)$.
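The Kolmogorov-Smirnov check above can be sketched in Python as follows. This is an illustrative sketch, not SymetryML's implementation: the function names are hypothetical, and it works on raw attribute values, whereas SymetryML computes the test from the distribution information shipped with the PSR. It relies on the fact that both empirical CDFs are right-continuous step functions that jump only at sample values, so the supremum in $D_{n,m}$ is attained at one of those values.

```python
import bisect
import math
from typing import Sequence


def ecdf(sorted_xs: Sequence[float], t: float) -> float:
    """Empirical CDF: fraction of values <= t (Phi or Phi' above)."""
    return bisect.bisect_right(sorted_xs, t) / len(sorted_xs)


def ks_statistic(xs: Sequence[float], xs_new: Sequence[float]) -> float:
    """D = sup_t |Phi(t) - Phi'(t)|, evaluated at every sample value."""
    a, b = sorted(xs), sorted(xs_new)
    return max(abs(ecdf(a, t) - ecdf(b, t)) for t in set(a) | set(b))


def c_alpha(alpha: float) -> float:
    """c(alpha) = sqrt(-ln(alpha / 2) * 1/2); about 1.358 for alpha = 0.05."""
    return math.sqrt(-math.log(alpha / 2) * 0.5)


def ks_threshold(alpha: float, n: int, m: int) -> float:
    """K(alpha, n, m) = c(alpha) * sqrt((n + m) / (n * m))."""
    return c_alpha(alpha) * math.sqrt((n + m) / (n * m))


def ks_test_passes(xs: Sequence[float], xs_new: Sequence[float],
                   alpha: float = 0.05) -> bool:
    return ks_statistic(xs, xs_new) < ks_threshold(alpha, len(xs), len(xs_new))


# Usage: 100 existing values and a 4-record update spread across them.
existing = list(range(100))     # x_1..x_n, n = 100
update = [10, 35, 60, 85]       # x'_1..x'_m, m = 4
print(ks_statistic(existing, update), ks_threshold(0.05, 100, 4))
print(ks_test_passes(existing, update))
```

Note how loose $K$ is for a tiny update ($m = 4$ gives a threshold near 0.69); this is the motivation for the additional Integral Test.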

Integral Test

Given that $m \ll n$ (an update to a PSR typically contains far fewer records than the existing data), we cannot expect the KS statistic to be zero or arbitrarily close to zero, because a step function with $m < n$ steps cannot, in general, perfectly approximate a step function with $n$ steps.

The integral of the difference can in principle become arbitrarily small:

$$I = \frac{1}{\sqrt{\left(x_n - x_1\right)\left(x'_m - x'_1\right)}}\int_{-\infty}^{\infty}\left(\Phi(t) - \Phi'(t)\right)dt$$

This is only defined when both variables are non-constant ($x_n > x_1$ and $x'_m > x'_1$).

This test passes if the integral $I$ satisfies the condition that corresponds to the value of $m$:

$$
\begin{array}{rcl}
\text{if } m \le 10 & \Longrightarrow & I < 0.2 \\
\text{if } 11 \le m \le 25 & \Longrightarrow & I < 0.1 \\
\text{if } m > 25 & \Longrightarrow & I < 0.05
\end{array}
$$
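The Integral Test can be sketched in Python as follows, again on raw values rather than on PSR distribution information, with hypothetical function names. Because $\Phi - \Phi'$ is zero to the left of the smallest sample value and to the right of the largest, and constant between consecutive sample values, the integral reduces to a finite sum of rectangle areas. The comparison follows the page literally and uses the signed value of $I$.

```python
import bisect
import math
from typing import Sequence


def integral_statistic(xs: Sequence[float], xs_new: Sequence[float]) -> float:
    """I = integral of (Phi - Phi') dt / sqrt((x_n - x_1)(x'_m - x'_1))."""
    a, b = sorted(xs), sorted(xs_new)
    if a[-1] <= a[0] or b[-1] <= b[0]:
        raise ValueError("I is only defined for non-constant samples")
    points = sorted(set(a) | set(b))
    area = 0.0
    for left, right in zip(points, points[1:]):
        # On [left, right) both ECDFs keep the value they take at `left`.
        phi = bisect.bisect_right(a, left) / len(a)
        phi_new = bisect.bisect_right(b, left) / len(b)
        area += (phi - phi_new) * (right - left)
    return area / math.sqrt((a[-1] - a[0]) * (b[-1] - b[0]))


def integral_threshold(m: int) -> float:
    """Threshold on I for an update of m records, per the table above."""
    if m <= 10:
        return 0.2
    if m <= 25:
        return 0.1
    return 0.05


def integral_test_passes(xs: Sequence[float], xs_new: Sequence[float]) -> bool:
    return integral_statistic(xs, xs_new) < integral_threshold(len(xs_new))


# Usage: the same 100 existing values and a 4-record update.
existing = list(range(100))
update = [10, 35, 60, 85]
print(integral_statistic(existing, update), integral_test_passes(existing, update))
```

As a cross-check, for bounded samples the integral of $\Phi - \Phi'$ equals the mean of the update minus the mean of the existing sample, so in the usage example $I = (47.5 - 49.5)/\sqrt{99 \cdot 75} \approx -0.023$.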

The distribution information for the attributes, provided along with the PSR for $x$ and $x'$, enables SymetryML to perform the Kolmogorov-Smirnov and Integral test calculations.

When SMPC is Enabled

Before the SMPC protocol is run to exchange all the peers' PSRs, SymetryML verifies that the resulting PSR would contain at least 25 patients. No additional statistical tests are run.

When HIPAA Compliance is not Needed

If your project does not need to be HIPAA compliant, specify fed_enforce_hipaa_compliance=false as part of the Federation Key Map Value passed in the message body when using the REST API to create a federation. Please consult either the Create AWS Backed Federation or the Create NATS Backed Federation section for details.
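As a rough illustration of the setting above, the Python snippet below builds only the relevant key-map entry and serializes it to JSON. It is a hypothetical sketch: representing the Federation Key Map as a flat dictionary of string values, and passing the value as the string "false", are assumptions. The endpoint, the remaining keys, and where the map sits in the request body are described in the federation creation sections and are not reproduced here.

```python
import json

# Hypothetical sketch: only the key name fed_enforce_hipaa_compliance and the
# value false come from this page. Encoding the Federation Key Map as a flat
# dict of strings is an assumption; consult the Create AWS/NATS Backed
# Federation sections for the real request layout and the other keys.
federation_key_map = {
    "fed_enforce_hipaa_compliance": "false",
}

body = json.dumps(federation_key_map)
print(body)
```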

https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html
Create AWS Backed Federation
section on SMPC
Create NATS Backed Federation