Get started with data privacy vaults using Databunker

If you’re using a traditional database to store personally identifiable information (PII), you may be putting yourself and your customers at unnecessary risk. Even if you’ve enabled disk-level encryption for your database, that only protects your data at rest. Your application still needs to decrypt the data to use it. By targeting your application layer, for example through SQL injection, attackers can bypass the disk-level encryption entirely.
And that’s just one database. In the distributed architectures of today, data is often duplicated across other services and vendors. A notification service might request the customer’s contact information, then send it to a message bus, which streams it into a data warehouse. Before you know it, PII is scattered across your entire system. This type of data sprawl across services, logs, and analytics systems is one of the reasons why companies like Netflix are investing in data privacy vaults.
In this post, you’ll learn what data privacy vaults are and how they address these challenges. You’ll use Databunker, an open-source data privacy vault, to see how these concepts work in practice.
Before you start
To follow along on your own computer, you’ll need Docker to run the Databunker container and curl to call its REST API.
While this tutorial uses the open source version, Databunker is also available as a paid version, called Databunker Pro.
Databunker Pro includes additional features for enterprise users, such as multi-tenancy and more advanced tokenization techniques.
The Pro version also uses a different version of the REST API (v2) from the open source version (v1). You may need to adapt the instructions if you’re using the Pro version.
What is a data privacy vault?
A data privacy vault is a secure, isolated database designed specifically to store, manage, and control access to sensitive data. Instead of storing personal data directly in your application database, you store it in a vault that is purpose-built for handling PII, with strict access controls and compliance features.
Your application database still stores the operational data, with the vault serving as the single authoritative source for all sensitive personal information.
This separation has several benefits:
Centralized security controls: Rather than implementing encryption, access logging, and compliance features across every service that touches PII, you can consolidate these controls in one hardened system.
Reduced blast radius: If one of your application databases is compromised, the attacker only finds references rather than PII in cleartext. They would need to compromise the vault’s access controls to decrypt the actual data.
Built-in compliance features: Privacy regulations like GDPR require specific capabilities, such as audit logs showing who accessed what data when, and mechanisms that allow users to access or delete their data. Vaults provide these features as core functionality rather than something you need to build into each service.
Understanding tokenization
Data privacy vaults rely on tokenization to keep information secure. When you store customer data in the vault, you get back a token—a random identifier like a3d98b68-38b8-c600-128c-40ae713d73d7.
This token is what you store in your application databases and pass between services. It contains no sensitive information. If someone were to gain access to your application database, they’d only see these effectively meaningless references.
When you actually need the customer’s email, your application exchanges the token for the real data by making an authenticated request to the vault. The vault logs this access, verifies permissions, and returns the decrypted information.
Here’s how this works in practice:
A customer signs up on your website. Your application immediately sends their name, email, and phone number to the vault. The vault encrypts this data and returns a token. Your application stores the token alongside non-sensitive information, like the account type.
Later, when sending an email notification, your notification service retrieves the token from your database and uses it to request the email address from the vault. The vault logs that your notification service accessed this user’s email at this specific time. After sending the email, your service discards the email address, keeping only the token.
This pattern ensures sensitive data is only decrypted when absolutely necessary and for the minimum time required. The rest of the time, your systems work with meaningless tokens.
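The sign-up and notification flow above can be sketched with a toy in-memory vault. This is only an illustration of the pattern, not how Databunker works internally: the `ToyVault` class and its method names are hypothetical, it does not encrypt anything, and it skips the permission check a real vault would perform.

```python
import uuid
from datetime import datetime, timezone


class ToyVault:
    """Hypothetical in-memory stand-in for a privacy vault (no encryption)."""

    def __init__(self):
        self._records = {}   # token -> personal data
        self.audit_log = []  # (timestamp, caller, token, requested fields)

    def tokenize(self, record):
        """Store a record and return a random token that reveals nothing about it."""
        token = str(uuid.uuid4())
        self._records[token] = dict(record)
        return token

    def detokenize(self, token, caller, fields):
        """Return only the requested fields and record who asked for them."""
        record = self._records[token]  # raises KeyError for unknown tokens
        self.audit_log.append((datetime.now(timezone.utc), caller, token, tuple(fields)))
        return {name: record[name] for name in fields}


vault = ToyVault()

# Sign-up: PII goes to the vault, the application keeps only the token.
token = vault.tokenize(
    {"name": "John Doe", "email": "john@example.com", "phone": "+1-555-0123"}
)
account = {"token": token, "account_type": "free"}

# Notification: exchange the token for the single field that is needed.
contact = vault.detokenize(account["token"], caller="notification-service", fields=["email"])
print(contact["email"])
```

The `account` dictionary plays the role of the application database row: it holds the token and business data, while the email address exists outside the vault only for as long as `contact` is in scope.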
Install and run Databunker locally
To see how tokenization works in practice, we’ll use Databunker—an open source data privacy vault.
The fastest way to get started with Databunker is by using the official Docker image, which includes sample data and a demo authentication token.
Run the Databunker container with the demo argument:
docker run -p 3000:3000 -d --rm --name databunker securitybunker/databunker demo
Verify the container is running:
docker ps
You should see the container running on port 3000.
Test the API status endpoint:
curl http://localhost:3000/v1/status
You should receive a response saying OK if all went well.
Demo mode uses the DEMO token for authentication. This setup is only intended for testing; never use demo mode in production.
Create your first user record
Let’s store a user record in Databunker and see tokenization in action.
Create a user record using the Databunker REST API:
curl http://localhost:3000/v1/user \
  --header "X-Bunker-Token: DEMO" \
  --header "Content-Type: application/json" \
  --data '{ "name": "John Doe", "email": "john@example.com", "phone": "+1-555-0123" }'
Databunker returns a response containing the user token:
{ "status": "ok", "token": "a3d98b68-38b8-c600-128c-40ae713d73d7" }
Save this token. You’ll need it later to retrieve the user data.
The sensitive data is now encrypted inside the vault. The token is what you’d store in your application database. Since the token reveals nothing on its own, you can safely pass it between services or include it in analytics without exposing customer data.
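From application code, the same request might look like the following sketch, which uses only Python's standard library. The endpoint, headers, and response fields are taken from the curl example above; the function names are hypothetical, and `create_user` assumes the demo container is running on port 3000.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # demo container from this tutorial
API_TOKEN = "DEMO"                  # demo token; never use in production


def build_create_user_request(profile):
    """Build (but do not send) the POST /v1/user request for a profile dict."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/user",
        data=json.dumps(profile).encode("utf-8"),
        headers={"X-Bunker-Token": API_TOKEN, "Content-Type": "application/json"},
        method="POST",
    )


def create_user(profile):
    """Send the request and return the user token from the JSON response."""
    with urllib.request.urlopen(build_create_user_request(profile), timeout=5) as response:
        body = json.load(response)
    if body.get("status") != "ok":
        raise RuntimeError(f"Databunker returned: {body}")
    return body["token"]


# Building the request does not touch the network, so it can be inspected safely.
request = build_create_user_request(
    {"name": "John Doe", "email": "john@example.com", "phone": "+1-555-0123"}
)
print(request.get_method(), request.full_url)
```

Calling `create_user(...)` with the same dictionary sends the request and returns the token, which is the only value your application needs to keep.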
In your application database, you might store it like this:
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    token UUID,
    created_at TIMESTAMP,
    account_type VARCHAR(50)
);
Notice how the database table contains no personal information, just the token and your business data.
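To try this table without setting up a database server, here is a minimal sketch using Python's built-in `sqlite3` module. The column types are adapted to SQLite, and the `NOT NULL UNIQUE` constraint on the token and the `premium` account type are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        token TEXT NOT NULL UNIQUE,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP,
        account_type TEXT
    )
    """
)

# Token as returned by the vault when the user record was created.
vault_token = "a3d98b68-38b8-c600-128c-40ae713d73d7"
conn.execute(
    "INSERT INTO users (token, account_type) VALUES (?, ?)",
    (vault_token, "premium"),
)
conn.commit()

# Services read the token from here and ask the vault for PII only when needed.
row = conn.execute(
    "SELECT id, token, account_type FROM users WHERE account_type = ?",
    ("premium",),
).fetchone()
print(row)
```

Business queries such as filtering by account type work as usual; nothing in the row identifies the customer without a call to the vault.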
Retrieve user data
When you need the actual user data, you exchange the token for the decrypted information.
Databunker supports several lookup methods, such as by token, email, or phone.
Retrieve by token:
curl http://localhost:3000/v1/user/token/a3d98b68-38b8-c600-128c-40ae713d73d7 \
  --header "X-Bunker-Token: DEMO"
Retrieve by email (useful for login flows):
curl http://localhost:3000/v1/user/email/john@example.com \
  --header "X-Bunker-Token: DEMO"
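The two lookups can also be issued from Python. The sketch below covers only the token and email paths shown in the curl commands; the helper names are hypothetical, and percent-encoding the lookup value is a precaution on my part rather than a documented Databunker requirement.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:3000"  # demo container from this tutorial
API_TOKEN = "DEMO"                  # demo token; never use in production


def build_lookup_request(method, value):
    """Build a GET /v1/user/{method}/{value} request for 'token' or 'email'."""
    if method not in ("token", "email"):
        raise ValueError(f"unsupported lookup method: {method}")
    # Percent-encode unusual characters but keep '@' readable in the path.
    path_value = urllib.parse.quote(value, safe="@")
    return urllib.request.Request(
        f"{BASE_URL}/v1/user/{method}/{path_value}",
        headers={"X-Bunker-Token": API_TOKEN},
    )


def get_user(method, value):
    """Send the lookup and return the decrypted 'data' object from the response."""
    with urllib.request.urlopen(build_lookup_request(method, value), timeout=5) as response:
        return json.load(response)["data"]


# Building the requests does not touch the network.
by_token = build_lookup_request("token", "a3d98b68-38b8-c600-128c-40ae713d73d7")
by_email = build_lookup_request("email", "john@example.com")
print(by_token.full_url)
print(by_email.full_url)
```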
Both requests return the decrypted user data:
{
  "status": "ok",
  "token": "a3d98b68-38b8-c600-128c-40ae713d73d7",
  "data": {
    "email": "john@example.com",
    "name": "John Doe",
    "phone": "+1-555-0123"
  }
}
Databunker keeps hashed indexes for email and phone, letting you perform fast lookups without storing values in cleartext. The vault hashes the search term and matches it against stored hashes, never exposing the actual values.
How vaults help with compliance
Data privacy vaults provide features that directly support regulatory requirements:
Audit logs work automatically. Every operation that accesses or modifies customer data gets logged. When an auditor asks who accessed a customer’s data in March, you have a complete record. Databunker tracks what data was accessed, by which service, and when.
Erasure requests become straightforward. When a user requests data deletion under GDPR’s “right to be forgotten”, the vault acts as a centralized deletion point. Delete the data from the vault, and the corresponding tokens effectively become useless.
Time-limited access prevents unnecessary exposure. Vaults can issue temporary tokens that expire after a set duration or number of uses. A support agent helping a customer might receive a token that works for only 30 minutes. After that window, the token stops working automatically.
Data subject requests get easier. When customers request a copy of their data, the vault generates a complete report from its centralized store. You’re not assembling exports from multiple databases, hoping you didn’t miss anything.
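Two of these ideas, time-limited access and erasure, can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not Databunker's API: the `ExpiringVault` class, its methods, and the five-use limit are hypothetical, and a fake clock stands in for real time so the example is deterministic.

```python
import secrets
import time


class ExpiringVault:
    """Hypothetical in-memory model of temporary access grants and erasure."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._records = {}  # user token -> personal data
        self._grants = {}   # temporary token -> (user token, expiry time, uses left)

    def store(self, user_token, record):
        self._records[user_token] = dict(record)

    def grant(self, user_token, ttl_seconds, max_uses):
        """Issue a temporary token limited by both time and number of uses."""
        temp_token = secrets.token_urlsafe(16)
        self._grants[temp_token] = (user_token, self._clock() + ttl_seconds, max_uses)
        return temp_token

    def read(self, temp_token):
        """Return the record, or None if the grant is invalid or the data was erased."""
        grant = self._grants.get(temp_token)
        if grant is None:
            return None
        user_token, expires_at, uses_left = grant
        if self._clock() >= expires_at or uses_left <= 0:
            del self._grants[temp_token]
            return None
        self._grants[temp_token] = (user_token, expires_at, uses_left - 1)
        record = self._records.get(user_token)
        return dict(record) if record is not None else None

    def erase(self, user_token):
        """Erasure request: delete the record so existing tokens resolve to nothing."""
        self._records.pop(user_token, None)


now = [0.0]  # fake clock, in seconds
vault = ExpiringVault(clock=lambda: now[0])
user_token = "a3d98b68-38b8-c600-128c-40ae713d73d7"
vault.store(user_token, {"email": "john@example.com"})

# A support agent gets 30 minutes of access.
support_token = vault.grant(user_token, ttl_seconds=30 * 60, max_uses=5)
within_window = vault.read(support_token)  # the record
now[0] += 31 * 60
after_window = vault.read(support_token)   # None: the window has passed

# After an erasure request, even a valid grant returns nothing.
fresh_token = vault.grant(user_token, ttl_seconds=60, max_uses=5)
vault.erase(user_token)
after_erasure = vault.read(fresh_token)    # None: the data is gone
```

Because deletion happens in one place, the tokens left behind in application databases, logs, and analytics systems no longer point at anything.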
Clean up
When you’re done, stop the demo container. Because it was started with the --rm flag, Docker removes it automatically once it stops:
docker stop databunker
Learn more
This tutorial introduced you to data privacy vaults and tokenization through hands-on experience with Databunker. For more information on how to use Databunker, check out the Databunker website.
To learn more about data privacy vaults, check out the following resources by Skyflow:
- What Is a Data Privacy Vault by Skyflow—comprehensive overview of vault architecture and use cases
- Things I Learned Building Data Privacy Vaults at Netflix—real-world implementation lessons from Netflix’s engineering team



Comments
@marcusolsson nice! I wonder if centralized control makes it single point of failure/attack vector? compared to e2ee solutions
Personally, I’d much rather see an e2ee solution wherever possible 💯
It does become a single point of attack, but would you rather defend 10 systems with duplicated PII than one (honest question)?
You could also deploy multiple vaults for different departments, depending on how the data is used.