Privacy Preserving Technologies

Trusted Execution Environments (TEEs)
Fully Homomorphic Encryption (FHE)
Multi-Party Computation (MPC)
Differential privacy
Federated learning
Zero-Knowledge Proofs

How the above technologies address privacy, their pros and cons, and their current state of development.

Trusted Execution Environment	Fully Homomorphic Encryption	Multi party computation	Differential privacy	Federated learning	Zero-knowledge proofs
It is like a magic box for computers: it keeps important information, like passwords or private data, safe and secret while working with it, so no one can peek inside, not even hackers!	It’s like a magic envelope for computers. It lets them do math or other processing on secret data without ever opening it or knowing what it says. When you get the result back, only you can open it and see the answer!

In short, it allows computations to be performed directly on encrypted data without decrypting it. | It’s a joint computation over secrets or private data by multiple parties, who never share their data but arrive at an answer together. It lets different people work together on a problem using their secret data without sharing those secrets! | Lets people share information safely by adding a little noise or randomness, so no one can figure out personal details, but everyone can still learn useful patterns from the data! | Helps extract the most useful insights from a group’s data without sharing the raw information. In a machine learning context, it lets different devices learn together without sharing their private data, so everyone’s information stays safe!

FL falls into two major categories, depending on the type of client: cross-device and cross-silo. Cross-device FL trains a shared global model while keeping all the training data local on many devices with limited and unstable network connections, such as mobile phones or IoT devices.

On the other hand, cross-silo FL trains a global model on datasets distributed across different organizations and geo-distributed data centers. These datasets cannot be moved out of their organizations and data center regions because of data protection regulations, operational challenges (such as data duplication and synchronization), or high costs. | It’s like proving you know something without revealing what it is, keeping your secret safe! | | It offers a level of protection against software attacks and assists in controlling access rights.

To enhance security, two trusted applications running in the TEE also do not have access to each other’s data as they are separated through software and cryptographic functions.

TEEs protect data both at rest and while it is in use: memory belonging to the TEE is typically encrypted and isolated, so it is inaccessible to external processes, including the operating system or malicious actors.

We can use it to verify identity or proof of humanity without exposing raw biometric data, photos, or personal identifiers. For example, facial scans or voice prints can be processed inside the TEE without ever leaving the secure environment.

More examples: it can also be used for deepfake detection, Sybil resistance, AI model integrity, etc. | How it solves privacy?

Protects data during processing, ensuring that no plaintext data is exposed.
Allows external servers to perform operations on encrypted data without learning its content.
Sensitive data can be processed while remaining encrypted, preserving privacy even when computations are outsourced to untrusted parties.
Suitable for scenarios where data is processed remotely, such as cloud computing. | How it solves privacy?
Individual inputs remain confidential, and only the final computed result is revealed.
Traditional methods require a trusted intermediary to perform computations on private data. MPC removes the need for such intermediaries.
After computations are completed in the process of MPC, participants combine the computed shares to reconstruct the final result. This result is obtained without exposing any individual's private input. | How it solves privacy?
The amount of noise added to a dataset is controlled by a privacy loss parameter, represented by the Greek letter "ɛ". A smaller value of "ɛ" indicates better protection, while a higher value indicates worse protection.
In this article (https://georgian.io/what-is-differential-privacy/), driving information for every driver in California is analysed without being able to identify any individual driver.
One way to achieve differential privacy is the Laplace mechanism, which adds a controlled amount of random noise, drawn from a Laplace distribution, to the results computed from the data.
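
As a rough illustration of the Laplace mechanism, the sketch below answers a counting query with noise whose scale is sensitivity / ɛ. The function names (`laplace_noise`, `private_count`) and the sample ages are hypothetical, and only the Python standard library is assumed.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centred at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(values, predicate, epsilon, rng=None):
    """Noisy count of the values matching `predicate`."""
    rng = rng or random.Random()
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0  # adding or removing one person changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)

# Hypothetical data: how many people are 40 or older?
ages = [23, 35, 41, 29, 52, 61, 38]
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5)
```

A smaller ɛ gives a larger noise scale, so the released count is more private but less accurate.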

Another mechanism is randomized response. Imagine a simple yes-or-no survey. Before respondents submit their answers, they flip a coin. If it’s heads, they submit their answers without alteration, but if it’s tails, they flip again. On the second coin toss, heads tells them to answer yes and tails means they answer no, regardless of their original answer.
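
The coin-flip survey can be sketched in a few lines. This is a minimal simulation, assuming fair coins; `randomized_answer` and `estimate_yes_rate` are hypothetical names. With fair coins a respondent reports yes with probability 0.75 if their true answer is yes and 0.25 if it is no, which corresponds to ɛ = ln 3.

```python
import random

def randomized_answer(truth, rng):
    """Apply the two-coin procedure to one respondent's true yes/no answer."""
    if rng.random() < 0.5:      # first flip is heads: answer truthfully
        return truth
    return rng.random() < 0.5   # second flip: heads means yes, tails means no

def estimate_yes_rate(reported):
    """Recover the population yes-rate from the noisy reports.

    P(report yes) = 0.5 * p + 0.25, so p = 2 * (observed - 0.25).
    """
    observed = sum(reported) / len(reported)
    return 2 * (observed - 0.25)

rng = random.Random()
true_answers = [i < 300 for i in range(1000)]   # hypothetical: 30% true "yes"
reports = [randomized_answer(t, rng) for t in true_answers]
estimate = estimate_yes_rate(reports)           # close to 0.3 on average
```

No single report reveals a respondent's true answer, yet the aggregate rate can still be estimated.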

Source:

https://digitalprivacy.ieee.org/publications/topics/differential-privacy-and-applications

| How it solves privacy?

FL keeps data on local devices instead of aggregating it on a central server. Only model updates, not raw data, are shared with the server.
Exchanging model updates (e.g., gradients or weights) instead of raw data reduces the exposure of sensitive information.
FL often incorporates DP by adding noise to model updates, masking individual contributions and reducing re-identification risks.
FL helps sensitive industries like healthcare and finance leverage AI while staying within regulations like GDPR or HIPAA.
FL models can often be fine-tuned locally, which can help with personalization and trust at the device level.
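
The points above can be sketched as a toy federated-averaging round. This is an illustration under stated assumptions, not a production recipe: the model is a single weight `w` for y = w * x, the helper names (`local_step`, `federated_round`) and the client data are hypothetical, and the noise level is not calibrated to any particular ɛ.

```python
import random

def local_step(w, data, lr=0.1):
    """One gradient-descent step for y = w * x on a client's private data."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def federated_round(global_w, client_datasets, clip=1.0, noise_std=0.0, rng=None):
    """Average clipped, optionally noised client updates into the global weight."""
    rng = rng or random.Random()
    updates = []
    for data in client_datasets:
        delta = local_step(global_w, data) - global_w  # only this update leaves the client
        delta = max(-clip, min(clip, delta))           # bound each client's influence
        delta += rng.gauss(0.0, noise_std)             # DP-style noise on the update
        updates.append(delta)
    return global_w + sum(updates) / len(updates)

# Hypothetical clients whose local data all follow y = 3x.
clients = [[(1, 3), (2, 6)], [(1, 3), (3, 9)], [(2, 6)]]
w = 0.0
for _ in range(50):
    w = federated_round(w, clients, noise_std=0.01)
# w ends up close to 3 although the server never saw any (x, y) pair.
```

Clipping before adding noise is what lets the noise mask any single client's contribution.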

Example: smart home assistants | How it solves privacy?

In scenarios where multiple parties need to collaborate on data analysis without exposing their private data, ZKPs facilitate this process. For example, in healthcare, different organisations can verify a healthcare professional's eligibility to access certain patient records without revealing any specific patient information.

https://www.rtinsights.com/appreciating-zero-knowledge-proofs-navigating-the-world-of-digital-privacy/

ZKPs allow selective disclosure, where only specific pieces of information are shared while keeping the rest private. This is useful in identity verification.
ZKPs allow a verifier to confirm that computations were performed correctly without needing to know the inputs.
Advanced forms of ZKPs (e.g., lattice-based proofs) are being explored for quantum-resistant cryptography. | | Pros:
Keeps sensitive data and computations secure from external access, including malware and the operating system.
Enables compliance with data protection regulations.
Offers faster processing than other privacy-preserving techniques such as MPC or homomorphic encryption.

Cons:

Hardware dependency
The hardware manufacturer must be trusted to implement it securely; if vulnerabilities are discovered in their chips, the protections can fail.
Can be a challenge to deploy at scale.
Many implementations are closed systems, not open source. | Pros:
Provides end-to-end encryption during computation.
Eliminates the need to trust external processors with plaintext data.
Enables secure data analytics and machine learning without compromising privacy.
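
To make "computing on encrypted data" concrete, here is a toy sketch of the Paillier cryptosystem. Note the swap: Paillier is only additively homomorphic, not fully homomorphic, and it is used here because it fits in a few lines. The tiny primes and the function names are assumptions for illustration and offer no real security.

```python
import math
import secrets

def keygen(p=61, q=53):
    """Toy key generation; real deployments use primes of 1024+ bits."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu, n)

def encrypt(n, m):
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(private_key, c):
    lam, mu, n = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts (mod n).
    return (c1 * c2) % (n * n)

n, private_key = keygen()
total = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
result = decrypt(private_key, total)  # 42, computed without decrypting the inputs
```

An untrusted server holding only `n` and the ciphertexts could run `add_encrypted`; only the key holder can read the sum.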

Cons:

Implementation complexity: developing and deploying FHE systems is difficult.
Requires large key sizes and more memory.
Practical adoption is still limited because alternative privacy-preserving technologies offer better performance.

Source: https://baffle.io/blog/advantages-and-disadvantages-of-homomorphic-encryption-2023/ | Pros:

MPC eliminates single points of failure by distributing private key management across multiple parties, reducing the risk of theft or loss.
Allows parties to perform computations on their data without revealing it to each other or a third party.
Facilitates compliance with data protection regulations by ensuring that personal data remains encrypted during processing.
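
The share-and-reconstruct idea behind many MPC protocols can be sketched with additive secret sharing. This is a minimal single-process simulation, assuming honest parties and a hypothetical salary-sum scenario; the names `share`, `reconstruct` and `secure_sum` are illustrative.

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo this value

def share(secret, n_parties):
    """Split `secret` into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares):
    return sum(shares) % MODULUS

def secure_sum(private_inputs):
    n = len(private_inputs)
    # Each party splits its input and sends one share to every party.
    distributed = [share(x, n) for x in private_inputs]
    # Party j locally adds the j-th share it received from everyone.
    partial_sums = [sum(row[j] for row in distributed) % MODULUS for j in range(n)]
    # Only the partial sums are combined, never the raw inputs.
    return reconstruct(partial_sums)

total = secure_sum([52000, 61000, 48000])  # 161000
```

Any subset of fewer than all shares of an input is uniformly random, so an individual input stays hidden while the sum is revealed.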

Cons:

Generating random numbers for security can slow down run time.
Secret sharing requires communication and connectivity between all participants, which can lead to higher costs.
MPC wallets are more difficult to set up and maintain than single-signature wallets.
Transactions through an MPC wallet are slower than through a single-signature wallet.

Source:

https://solulab.com/what-is-a-multi-party-computation-mpc-wallet

https://inpher.io/technology/what-is-secure-multiparty-computation/ | Pros:

Differential privacy algorithms are built to be chained. The theoretical foundations of differential privacy include a good explanation of how multiple differential privacy algorithms can be layered on top of each other. If one offers some protection measured by alpha and the other protection measured by beta, then together they offer alpha plus beta. In the best cases, the algorithms can be joined like Lego bricks.
Differential privacy offers deniability. People can relax when sharing their data because the approach gives them deniability. The algorithms, like the randomized response, give them a cover story. Perhaps that information was just a random lie concocted by the algorithm.
The differential privacy algorithms don’t just add noise. They illustrate and codify the tradeoffs between accuracy and privacy. They give us a knob to adjust the fuzzing so it meets our needs. The algorithms let us set a privacy budget and then spend it as necessary through the various stages of data processing. If you remember calculus, the process is trying to emulate differentiation and calculate the slope of the privacy loss.
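
The chaining and budgeting ideas can be sketched with a small budget tracker based on basic sequential composition, where the ɛ values of successive releases add up. The class name `PrivacyBudget` and the numbers are hypothetical.

```python
class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon):
        """Reserve `epsilon` for one release; return the remaining budget."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        return self.total - self.spent

budget = PrivacyBudget(total_epsilon=1.0)
budget.spend(0.3)   # first query
budget.spend(0.5)   # second query; combined cost is 0.3 + 0.5 = 0.8
# A third query costing 0.3 would raise RuntimeError: only about 0.2 is left.
```

Refusing a release once the budget is spent is what turns ɛ into an adjustable knob over the whole pipeline.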

Cons:

Noise can have unknown effects. Machine learning algorithms can seem like magic and, just like real magicians, they often refuse to reveal the secret of their tricks and just why their model filled with magic numbers is making the decision. The mystery is compounded when the algorithms are fed fuzzed data because it’s often impossible to know just how the changes in the data affected the outcome. Some simple algorithms like finding the mean are easy to control and understand, but not the ones in magical black boxes.
Deniability may not be enough. Just because some of the data might be random or wrong doesn’t make it easier to answer some questions truthfully—and differential privacy algorithms require some answers to be accurate. It’s not clear how people feel about truthful information leaking out, even if it’s not immediately clear who is the owner. Emotional responses may not be logical, but humans are not always logical. Their feelings about privacy are not easy to translate into algorithms.
Differential privacy doesn’t offer absolute guarantees, only statistical ones: the difference between the real data and the fuzzed data is bounded by a threshold governed by epsilon. So some real information will leak out, and the noisy version can often be close to the truth, but at least there are mathematical bounds on how much information leaks.

Source:

https://www.csoonline.com/article/570203/differential-privacy-pros-and-cons-of-enterprise-use-cases.html

https://digitalprivacy.ieee.org/publications/topics/differential-privacy-and-applications | Pros:

FL is not only a training process; it also defines the whole infrastructure needed to run that process on client devices and to aggregate model updates for the best accuracy.
In classical model training, we have to send client data (very often really big datasets) to a server. In federated learning we send only a small set of model numbers.
More secure apps: user data is not transferred to a server.
Better model accuracy because the model has access to more varied data.
Another big benefit of FL is a model that is continually updated with never-before-seen data. Suppose we buy an FL-based AI product that another company also uses. Our dataset will update the global model (which we receive back in an update), and so will the other company’s. It is comparable to people sharing their knowledge with one another: sharing a model without sharing data benefits everyone.

Cons:

FL is worth using only if end-user devices hold varied data and that data should not be transferred off the device.
FL platforms for developers are still maturing (open-source frameworks such as TensorFlow Federated and Flower exist), so a full FL deployment for a project can still mean building much of the infrastructure ourselves.
AI model verification can become hard because training happens without direct access to the data. | Pros:
Protects privacy by proving knowledge of information without revealing it.
Can increase the throughput and scalability of blockchain systems by allowing transactions to be verified without processing all the underlying data.
Can be used in diverse applications like voting systems, identity verification, etc.
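
As an illustration of proving knowledge without revealing it, here is a toy version of the Schnorr identification protocol, in which a prover shows it knows the secret `x` behind a public key `y = G^x mod P`. The group parameters are deliberately tiny and insecure, and the function names are assumptions for the example.

```python
import secrets

# Toy group: G has prime order Q modulo P. Illustration only, not secure.
P, Q, G = 23, 11, 2

def keygen():
    x = secrets.randbelow(Q - 1) + 1  # secret key in [1, Q - 1]
    return x, pow(G, x, P)            # (secret, public key)

def commit():
    r = secrets.randbelow(Q)          # prover's one-time random nonce
    return r, pow(G, r, P)            # keep r private, send t = G^r

def challenge():
    return secrets.randbelow(Q)       # verifier's random challenge

def respond(x, r, c):
    return (r + c * x) % Q            # the random r masks x in the response

def verify(y, t, c, s):
    # Holds for an honest prover because G^(r + c*x) = G^r * (G^x)^c.
    return pow(G, s, P) == (t * pow(y, c, P)) % P

x, y = keygen()
r, t = commit()
c = challenge()
s = respond(x, r, c)
accepted = verify(y, t, c, s)         # True, and x was never sent
```

The verifier sees only `t`, `c` and `s`; practical systems use far larger groups and often non-interactive variants.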

Cons:

Computational complexity is a problem: generating proofs requires significant computational resources, which increases processing time and energy consumption.
Offers security and privacy, but some schemes may be vulnerable to threats such as quantum computing.

Source: https://blockchain.smartosc.com/pros-and-cons-of-zero-knowledge-proof/ | | Current development state:

Intel Software Guard Extensions (SGX):

Introduced to create isolated enclaves for secure computation. Intel has since deprecated SGX in its 11th and 12th generation Core processors, focusing instead on server-grade solutions.

Source: https://en.wikipedia.org/wiki/Software_Guard_Extensions

AMD Secure Encrypted Virtualization (SEV):

Enables memory encryption for virtual machines, ensuring data confidentiality. Vulnerabilities such as the recent BadRAM attack have also been identified.

Source: https://arstechnica.com/information-technology/2024/12/new-badram-attack-neuters-security-assurances-in-amd-epyc-processors/

Enarx:

A project aiming to provide a platform-agnostic deployment of applications into TEEs without requiring code modification.

Source: https://next.redhat.com/2019/12/02/current-trusted-execution-environment-landscape/ | Current development state:

For developers, a variety of open-source libraries are available for public use and contributions, including OpenFHE, TFHE and HEAAN, as well as compilers that help developers convert source code written in C++ so that they can use FHE in their own applications.
The National Institute of Standards and Technology (NIST) is supporting developments toward the successful adoption and recognition of FHE. NIST’s current call for proposals for threshold cryptographic schemes, which aim at a secure distribution of trust, includes FHE. NIST expects to publish selected schemes as recommendations, potentially leading to future standardization efforts in a second step. Additionally, a high-level NIST report will outline specific aspects of FHE standardization.
While homomorphic encryption and other PETs show great promise, challenges persist in their widespread adoption. The absence of widely recognized standards and regulatory certainty can hinder the interoperability and compatibility of PETs, making it difficult for organizations to integrate them into existing systems and making companies hesitant to adopt them because of potential compliance issues. New hardware solutions for homomorphic encryption, currently under development, may eventually make the technology widely accessible. In particular, the U.S. Defense Advanced Research Projects Agency (DARPA) is sponsoring the development of hardware chips specifically designed to accelerate FHE algorithms under its "Data Protection in Virtual Environments" program.

Sources: https://dl.acm.org/doi/abs/10.1145/3560810.3565290

https://iapp.org/news/a/the-latest-in-homomorphic-encryption-a-game-changer-shaping-up

https://www.iso.org/committee/45306.html

| Current development state:

Companies like Sepior, Zengo, Keyless, Unbound Tech, and Finema are developing MPC products for different use cases.
Duality Technologies released a platform in 2022 that allows businesses to share and analyse sensitive data, and Google Cloud also released Confidential Space in 2022 to help with joint data analysis and ML model training using MPC.
The global MPC market was valued at $794.1 million in 2023 and is projected to grow at a CAGR of 11.8% from 2024 to 2030.
In 2020, companies specializing in MPC established the MPC Alliance to promote MPC technology.

Source:

https://www.grandviewresearch.com/industry-analysis/secure-multiparty-computation-market-report

https://chain.link/education-hub/secure-multiparty-computation-mcp

https://www.marketsandmarkets.com/Market-Reports/secure-multiparty-computation-market-67797344.html

| Current development state:

Google recently shared a collection of differential privacy algorithms in C++, Go and Java.

Microsoft has open-sourced a Rust-based library with Python bindings called SmartNoise to support machine learning and other forms of statistical analysis. This work is part of OpenDP, a larger drive to create an integrated collection of tools under an open-source umbrella with broad governance.

TensorFlow, one of the most popular machine learning tools, offers algorithms that guard privacy for some data sets.

Link: https://blog.tensorflow.org/2019/03/introducing-tensorflow-privacy-learning.html

Some high-profile projects are also using differential privacy technology. The answers to the 2020 US Census, for instance, must remain private for 72 years according to law and tradition. However, many people want to use the Census data for planning, budgeting, and making decisions like where to put a new chain restaurant. So, the Census Bureau distributes statistical summaries. For the 2020 Census, to protect the privacy of people in small blocks, it injects noise using its Disclosure Avoidance System.

More info on this: https://www.csoonline.com/article/570203/differential-privacy-pros-and-cons-of-enterprise-use-cases.html | Current development state:

The federated learning market was valued at USD 127.75 million in 2023 and is projected to reach USD 341.92 million by 2032, growing at a CAGR of 11.60% from 2024 to 2032.

Source: https://www.snsinsider.com/reports/federated-learning-market-3597

Approximately 67% of organizations are currently exploring or implementing federated learning strategies to enhance their data privacy and security measures. In healthcare, around 80% of organizations aim to utilize federated learning for secure patient data analysis.

Source: https://www.snsinsider.com/reports/federated-learning-market-3597

Companies like Google, Apple, and Meta are leading the way in deploying production systems that leverage federated learning techniques.

Source: https://arxiv.org/html/2410.08892v1

| Current development state:

The ZKP market is projected to reach $10 billion by 2030. Growth is driven by the rising demand for privacy-preserving technologies.

https://www.protocol.ai/protocol-labs-the-future-of-zk-proofs.pdf

ZKPs are being integrated into machine learning applications, allowing for verification of model outputs without revealing the underlying data or model parameters.

https://www.protocol.ai/protocol-labs-the-future-of-zk-proofs.pdf

ZKPs are also finding applications in various fields beyond finance, including secure voting systems, identity verification, and confidential data sharing in healthcare.

https://arxiv.org/abs/2408.00243

There is a growing emphasis on hardware acceleration techniques to make ZKP computations more efficient. The development of specialised hardware for proof generation aims to make ZKPs practical for real-time applications, particularly in environments like IoT.

Some Good Reads

https://en.wikipedia.org/wiki/Trusted_execution_environment

https://dualitytech.com/glossary/trusted-execution-environment/

https://digitalprivacy.ieee.org/publications/topics/differential-privacy-and-applications

Questions:

Does internet access exist in TEE? If not, how does it work?
What is the best service provider for TEE and MPC, e.g., Nillion? Research about them and identify the best option.
What is zkTLS as a technology? What innovations are ongoing, and what are the challenges? Read about Opacity and identify the available service providers.
Has Reclaim launched any changes to their AI schema? If the frontend code changes, how quickly does interoperability work for them? Can we use Reclaim to solve the interoperability problem?
What is streaming zkTLS?
Interoperability in zkTLS—are there other alternatives? Check and study across available options. Study Reclaim more.
Should we use Reclaim or Opacity?