Because of the challenges of generating text while maintaining both DP and computational efficiency, prior work focused on generating a small number of data points (fewer than 10) to be used for in-context learning. We show that it’s possible to generate two to three orders of magnitude more data while preserving quality and privacy by addressing two obstacles: the privacy budget and computational efficiency.
The privacy budget constrains how much output the model can release while maintaining a meaningful DP guarantee. DP operates by introducing randomness to mask the contribution of any single data point, enabling plausible deniability. We increase the amount of output that can be released by leveraging the randomness already inherent in next-token sampling to supply the noise that the DP guarantee requires.
This approach connects next-token sampling in language models with a DP technique called the exponential mechanism. The exponential mechanism approximately selects the best option from a set of candidates, each accompanied by a score computed from sensitive data. It does so by sampling an option with probability proportional to the exponential of its score; this randomness is what yields the DP guarantee. The operation is identical to softmax sampling in language models when the candidate set is the model’s token vocabulary. Based on this connection, we design a DP token sampling algorithm that closely follows the standard generation process of large language models.
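To make the correspondence concrete, here is a minimal sketch of exponential-mechanism token sampling (not the exact production algorithm): `scores` stands in for per-token utilities computed from sensitive data, such as aggregated next-token logits, and the scaling by the privacy parameter and sensitivity follows the standard exponential-mechanism form.

```python
import numpy as np

def exponential_mechanism_sample(scores, epsilon, sensitivity, rng=None):
    """Sample one option (e.g., a token id) with probability proportional to
    exp(epsilon * score / (2 * sensitivity)) -- the exponential mechanism.

    With scores taken as next-token utilities over the vocabulary, this is
    exactly softmax (temperature) sampling with T = 2 * sensitivity / epsilon.
    """
    rng = rng or np.random.default_rng()
    scaled = (epsilon / (2.0 * sensitivity)) * np.asarray(scores, dtype=np.float64)
    # Subtract the max for numerical stability before exponentiating.
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)
```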
For computational efficiency, we propose a new privacy analysis that lets us reuse the same contexts at every generation step and avoid recomputation. Our analysis holds for a fixed batch of examples, whereas the DP guarantee of prior work required drawing a fresh batch of sensitive examples for each token. A fresh batch means the input prompt changes for every sampled token, which is incompatible with standard inference efficiency techniques such as KV caching.
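A rough sketch of the resulting generation loop is below, reusing the sampling helper above. The `model.next_token_logits` interface, the per-example cache handling, and the plain averaging step are illustrative placeholders rather than the exact aggregation used in our analysis; the point is that the sensitive contexts stay fixed, so their KV caches can be built once and extended across all steps.

```python
import numpy as np

def dp_generate(model, sensitive_contexts, prompt_tokens, num_tokens,
                eps_per_token, sensitivity):
    """Generate synthetic tokens from ONE fixed batch of sensitive contexts.

    Because the contexts never change between steps, each example's KV cache
    can be reused, instead of re-encoding a fresh batch for every token.
    """
    generated = list(prompt_tokens)
    caches = [None] * len(sensitive_contexts)  # per-example KV caches (hypothetical API)
    for _ in range(num_tokens):
        per_example_logits = []
        for i, ctx in enumerate(sensitive_contexts):
            # Hypothetical interface: returns next-token logits and an updated cache.
            logits, caches[i] = model.next_token_logits(ctx + generated, cache=caches[i])
            per_example_logits.append(logits)
        # Aggregate so any single example has bounded influence on the scores
        # (a plain mean here; a real implementation would clip/normalize appropriately).
        scores = np.mean(per_example_logits, axis=0)
        generated.append(exponential_mechanism_sample(scores, eps_per_token, sensitivity))
    return generated
```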
Finally, we introduce a public drafter: a model that bases its next-token predictions solely on the already generated synthetic text rather than on sensitive data. Via the sparse vector technique, we pay a privacy cost only when the drafter’s proposal disagrees with the prediction made from sensitive data; otherwise, we accept the drafter’s suggestion without expending any privacy budget. We find this particularly effective for structured data, where many formatting-related tokens can be predicted by the drafter without looking at sensitive data.
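The accept/reject loop might look roughly like the following sparse-vector-style sketch. `drafter_next_token`, `private_next_token`, and `disagreement_score` are illustrative placeholders (the first sees only the synthetic text so far; the other two consult sensitive data), and the Laplace noise scales follow the usual sparse vector recipe rather than our exact calibration.

```python
import numpy as np

def generate_with_drafter(drafter_next_token, private_next_token, disagreement_score,
                          num_tokens, threshold, eps_svt, max_disagreements):
    """Sparse-vector-style loop around a public drafter.

    The drafter proposes a token using only the synthetic text generated so far.
    A sensitive-data-dependent disagreement score is compared against a noisy
    threshold; only above-threshold steps (actual disagreements) consume privacy
    budget via a DP resample, while agreements are accepted for free.
    """
    rng = np.random.default_rng()
    generated = []
    noisy_threshold = threshold + rng.laplace(scale=2.0 / eps_svt)
    paid = 0
    for _ in range(num_tokens):
        draft = drafter_next_token(generated)          # public: no privacy cost
        score = disagreement_score(generated, draft)   # computed from sensitive data
        if score + rng.laplace(scale=4.0 * max_disagreements / eps_svt) >= noisy_threshold:
            generated.append(private_next_token(generated))  # pay: DP sample from sensitive data
            paid += 1
            noisy_threshold = threshold + rng.laplace(scale=2.0 / eps_svt)
            if paid >= max_disagreements:
                break                                  # budget for disagreements exhausted
        else:
            generated.append(draft)                    # accept the drafter's token for free
    return generated
```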