Because of the challenges of generating text while maintaining both DP and computational efficiency, prior work focused on generating a small number of data points (fewer than 10) to be used for in-context learning. We show that it’s possible to generate two to three orders of magnitude more data while preserving quality and privacy by addressing two obstacles: the privacy budget and computational efficiency.
The privacy budget constrains how much output the model can release while maintaining a meaningful DP guarantee. DP operates by introducing randomness to mask the contribution of any single data point, enabling plausible deniability. We increase the amount of output that can be released by leveraging the randomness already inherent in next-token sampling to supply the noise that the DP guarantee requires.
This approach connects next-token sampling in language models with a DP technique called the exponential mechanism. The exponential mechanism approximately selects the best option from a set of candidates, each accompanied by a score computed from sensitive data. It does so by sampling an option with probability proportional to the exponential of its score; this randomness is what yields the DP guarantee. The operation is identical to softmax sampling in language models when the candidate set is the model’s token vocabulary. Based on this connection, we design a DP token sampling algorithm that closely follows the standard generation process of large language models.
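To make the correspondence concrete, here is a minimal sketch of exponential-mechanism token sampling (not the exact production algorithm): `scores` stands in for per-token utilities computed from sensitive data, such as aggregated next-token logits, and the scaling by the privacy parameter and sensitivity follows the standard exponential-mechanism form.

```python
import numpy as np

def exponential_mechanism_sample(scores, epsilon, sensitivity, rng=None):
    """Sample one option (e.g., a token id) with probability proportional to
    exp(epsilon * score / (2 * sensitivity)) -- the exponential mechanism.

    With scores taken as next-token utilities over the vocabulary, this is
    exactly softmax (temperature) sampling with T = 2 * sensitivity / epsilon.
    """
    rng = rng or np.random.default_rng()
    scaled = (epsilon / (2.0 * sensitivity)) * np.asarray(scores, dtype=np.float64)
    # Subtract the max for numerical stability before exponentiating.
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)
```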
For computational efficiency, we propose a new privacy analysis that lets us reuse the same contexts at every generation step and avoid recomputation. Our analysis holds for a fixed batch of examples, whereas the DP guarantee of prior work required drawing a fresh batch of sensitive examples for each token. A fresh batch means the input prompt changes for every sampled token, which is incompatible with standard inference efficiency techniques such as KV caching.
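A rough sketch of the resulting generation loop is below, reusing the sampling helper above. The `model.next_token_logits` interface, the per-example cache handling, and the plain averaging step are illustrative placeholders rather than the exact aggregation used in our analysis; the point is that the sensitive contexts stay fixed, so their KV caches can be built once and extended across all steps.

```python
import numpy as np

def dp_generate(model, sensitive_contexts, prompt_tokens, num_tokens,
                eps_per_token, sensitivity):
    """Generate synthetic tokens from ONE fixed batch of sensitive contexts.

    Because the contexts never change between steps, each example's KV cache
    can be reused, instead of re-encoding a fresh batch for every token.
    """
    generated = list(prompt_tokens)
    caches = [None] * len(sensitive_contexts)  # per-example KV caches (hypothetical API)
    for _ in range(num_tokens):
        per_example_logits = []
        for i, ctx in enumerate(sensitive_contexts):
            # Hypothetical interface: returns next-token logits and an updated cache.
            logits, caches[i] = model.next_token_logits(ctx + generated, cache=caches[i])
            per_example_logits.append(logits)
        # Aggregate so any single example has bounded influence on the scores
        # (a plain mean here; a real implementation would clip/normalize appropriately).
        scores = np.mean(per_example_logits, axis=0)
        generated.append(exponential_mechanism_sample(scores, eps_per_token, sensitivity))
    return generated
```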
Finally, we introduce a public drafter: a model that bases its next-token predictions solely on the already generated synthetic text rather than on sensitive data. Via the sparse vector technique, we pay a privacy cost only when the drafter’s proposal disagrees with the prediction made from sensitive data; otherwise, we accept the drafter’s suggestion without expending any privacy budget. We find this particularly effective for structured data, where many formatting-related tokens can be predicted by the drafter without looking at sensitive data.
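The accept/reject loop might look roughly like the following sparse-vector-style sketch. `drafter_next_token`, `private_next_token`, and `disagreement_score` are illustrative placeholders (the first sees only the synthetic text so far; the other two consult sensitive data), and the Laplace noise scales follow the usual sparse vector recipe rather than our exact calibration.

```python
import numpy as np

def generate_with_drafter(drafter_next_token, private_next_token, disagreement_score,
                          num_tokens, threshold, eps_svt, max_disagreements):
    """Sparse-vector-style loop around a public drafter.

    The drafter proposes a token using only the synthetic text generated so far.
    A sensitive-data-dependent disagreement score is compared against a noisy
    threshold; only above-threshold steps (actual disagreements) consume privacy
    budget via a DP resample, while agreements are accepted for free.
    """
    rng = np.random.default_rng()
    generated = []
    noisy_threshold = threshold + rng.laplace(scale=2.0 / eps_svt)
    paid = 0
    for _ in range(num_tokens):
        draft = drafter_next_token(generated)          # public: no privacy cost
        score = disagreement_score(generated, draft)   # computed from sensitive data
        if score + rng.laplace(scale=4.0 * max_disagreements / eps_svt) >= noisy_threshold:
            generated.append(private_next_token(generated))  # pay: DP sample from sensitive data
            paid += 1
            noisy_threshold = threshold + rng.laplace(scale=2.0 / eps_svt)
            if paid >= max_disagreements:
                break                                  # budget for disagreements exhausted
        else:
            generated.append(draft)                    # accept the drafter's token for free
    return generated
```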