Get started with data privacy vaults using Databunker

If you’re using a traditional database to store personally identifiable information (PII), you may be putting yourself and your customers at unnecessary risk. Even if you’ve enabled disk-level encryption for your database, that only protects your data at rest. Your application still needs to decrypt the data to use it. By targeting your application layer, for example through SQL injection, attackers can bypass the disk-level encryption entirely.

And that’s just one database. In the distributed architectures of today, data is often duplicated across other services and vendors. A notification service might request the customer’s contact information, then send it to a message bus, which streams it into a data warehouse. Before you know it, PII is scattered across your entire system. This type of data sprawl across services, logs, and analytics systems is one of the reasons why companies like Netflix are investing in data privacy vaults [1].

In this post, you’ll learn what data privacy vaults are and how they address these challenges. You’ll use Databunker, an open-source data privacy vault, to see how these concepts work in practice.

Before you start

To follow along on your own computer, you’ll need Docker to run the Databunker container and curl (or another HTTP client) to call its REST API.

Open source vs. Databunker Pro

While this tutorial uses the open source version, Databunker is also available as a paid version, called Databunker Pro.

Databunker Pro includes additional features for enterprise users, such as multi-tenancy and more advanced tokenization techniques.

The Pro version also uses a different version of the REST API (v2) from the open source version (v1). You may need to adapt the instructions if you’re using the Pro version.

What is a data privacy vault?

A data privacy vault is a secure, isolated database designed specifically to store, manage, and control access to sensitive data. Instead of storing personal data directly in your application database, a vault is purpose-built for handling PII, with strict access controls and compliance features.

Your application database still stores the operational data, with the vault serving as the single authoritative source for all sensitive personal information.

This separation has several benefits:

Centralized security controls: Rather than implementing encryption, access logging, and compliance features across every service that touches PII, you can consolidate these controls in one hardened system.

Reduced blast radius: If one of your application databases is compromised, the attacker only finds references rather than PII in cleartext. They would need to compromise the vault’s access controls to decrypt the actual data.

Built-in compliance features: Privacy regulations like GDPR require specific capabilities, such as audit logs showing who accessed what data when, and mechanisms that allow users to access or delete their data. Vaults provide these features as core functionality rather than something you need to build into each service.

Understanding tokenization

Data privacy vaults rely on tokenization to keep information secure. When you store customer data in the vault, you get back a token—a random identifier like a3d98b68-38b8-c600-128c-40ae713d73d7.

This token is what you store in your application databases and pass between services. It contains no sensitive information. If someone were to gain access to your application database, they’d only see these effectively meaningless references.

When you actually need the customer’s email, your application exchanges the token for the real data by making an authenticated request to the vault. The vault verifies permissions, logs the access, and returns the decrypted information.

Here’s how this works in practice:

A customer signs up on your website. Your application immediately sends their name, email, and phone number to the vault. The vault encrypts this data and returns a token. Your application stores the token alongside non-sensitive information, like the account type.

Later, when sending an email notification, your notification service retrieves the token from your database and uses it to request the email address from the vault. The vault logs that your notification service accessed this user’s email at this specific time. After sending the email, your service discards the email address, keeping only the token.

This pattern ensures sensitive data is only decrypted when absolutely necessary and for the minimum time required. The rest of the time, your systems work with meaningless tokens.
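The signup and notification flow above can be modeled in a few lines of Python. This is a conceptual sketch only: `ToyVault` and its methods are hypothetical names, the records are kept unencrypted in memory, and none of it reflects how Databunker is implemented internally.

```python
import uuid
from datetime import datetime, timezone


class ToyVault:
    """In-memory stand-in for a privacy vault (illustration only, no encryption)."""

    def __init__(self):
        self._records = {}    # token -> PII dict; a real vault encrypts this
        self.audit_log = []   # (timestamp, caller, token, field) tuples

    def tokenize(self, record):
        """Store a record and hand back a random token that reveals nothing."""
        token = str(uuid.uuid4())
        self._records[token] = dict(record)
        return token

    def detokenize(self, token, field, caller):
        """Exchange a token for one field, logging who asked and when."""
        self.audit_log.append((datetime.now(timezone.utc), caller, token, field))
        return self._records[token][field]


vault = ToyVault()

# Signup: PII goes to the vault, the application keeps only the token.
token = vault.tokenize({"name": "John Doe", "email": "john@example.com"})
app_row = {"token": token, "account_type": "free"}

# Notification: exchange the token for the email, use it, then discard it.
email = vault.detokenize(app_row["token"], "email", caller="notification-service")
```

The application row never contains the email address, and every exchange leaves an entry in `audit_log`.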

Install and run Databunker locally

To see how tokenization works in practice, you’ll use Databunker, the open-source data privacy vault introduced earlier.

The fastest way to get started with Databunker is by using the official Docker image, which includes sample data and a demo authentication token.

  1. Run the Databunker container with the demo argument:

    docker run -p 3000:3000 -d --rm --name databunker securitybunker/databunker demo

  2. Verify the container is running:

    docker ps

    You should see the container running on port 3000.

  3. Test the API status endpoint:

    curl http://localhost:3000/v1/status

    You should receive a response saying OK if all went well.

Demo mode limitations
The demo container uses SQLite for storage and includes a hardcoded DEMO token for authentication. This setup is only intended for testing; never use demo mode in production.

Create your first user record

Let’s store a user record in Databunker and see tokenization in action.

  1. Create a user record using the Databunker REST API:

    curl http://localhost:3000/v1/user \
      --header "X-Bunker-Token: DEMO" \
      --header "Content-Type: application/json" \
      --data '{
        "name": "John Doe",
        "email": "john@example.com",
        "phone": "+1-555-0123"
      }'

    Databunker returns a response containing the user token:

    {
      "status": "ok",
      "token": "a3d98b68-38b8-c600-128c-40ae713d73d7"
    }

  2. Save this token. You’ll need it later to retrieve the user data.

The sensitive data is now encrypted inside the vault. The token is what you’d store in your application database. Since the token reveals nothing on its own, you can safely pass it between services or include it in analytics without exposing customer data.

In your application database, you might store it like this:

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    token UUID NOT NULL UNIQUE,   -- reference to the user's record in the vault
    created_at TIMESTAMP,
    account_type VARCHAR(50)      -- non-sensitive business data
);

Notice how the database table contains no personal information—just the token and your business data.
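As a rough sketch of how an application might populate such a table, the Python below stores only the token returned by the vault. Several things here are assumptions for illustration: the `sign_up` helper, the `create_vault_user` callable (a placeholder for whatever code sends the create-user request), and the use of an in-memory SQLite database with SQLite column types in place of the schema above.

```python
import sqlite3


def sign_up(db, create_vault_user, profile, account_type):
    """Send PII to the vault and store only the returned token locally.

    `create_vault_user` is a hypothetical callable that wraps the vault's
    create-user request and returns the token as a string.
    """
    token = create_vault_user(profile)
    db.execute(
        "INSERT INTO users (token, account_type) VALUES (?, ?)",
        (token, account_type),
    )
    db.commit()
    return token


db = sqlite3.connect(":memory:")
# SQLite flavor of the users table, used here so the sketch runs anywhere.
db.execute(
    """CREATE TABLE users (
           id INTEGER PRIMARY KEY,
           token TEXT NOT NULL UNIQUE,
           created_at TEXT DEFAULT CURRENT_TIMESTAMP,
           account_type TEXT
       )"""
)


def fake_vault(profile):
    """Stand-in for the real vault call so the sketch runs without a server."""
    return "a3d98b68-38b8-c600-128c-40ae713d73d7"


sign_up(db, fake_vault, {"name": "John Doe", "email": "john@example.com"}, "free")
rows = db.execute("SELECT token, account_type FROM users").fetchall()
```

Passing the vault call in as a parameter keeps the signup logic testable without a running vault.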

Retrieve user data

When you need the actual user data, you exchange the token for the decrypted information.

Databunker supports several lookup methods, such as by token, email, or phone.

  • Retrieve by token:

    curl http://localhost:3000/v1/user/token/a3d98b68-38b8-c600-128c-40ae713d73d7 \
      --header "X-Bunker-Token: DEMO"

  • Retrieve by email (useful for login flows):

    curl http://localhost:3000/v1/user/email/john@example.com \
      --header "X-Bunker-Token: DEMO"

Both requests return the decrypted user data:

{
  "status": "ok",
  "token": "a3d98b68-38b8-c600-128c-40ae713d73d7",
  "data": {
    "email": "john@example.com",
    "name": "John Doe",
    "phone": "+1-555-0123"
  }
}
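The same lookups could be made from application code. The sketch below uses only Python's standard library and the endpoint paths and header shown in the curl commands above. The helper names `build_lookup_request` and `fetch_user` are hypothetical, and percent-encoding the lookup value is a precaution on my part rather than something the Databunker documentation is known to require.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:3000"  # demo container from this tutorial


def build_lookup_request(mode, value, api_token="DEMO"):
    """Build a GET request for /v1/user/<mode>/<value> (mode: token, email, phone)."""
    # Percent-encode the value (assumption: the server decodes the path as usual).
    path = f"/v1/user/{mode}/{urllib.parse.quote(value, safe='@')}"
    return urllib.request.Request(
        BASE_URL + path,
        headers={"X-Bunker-Token": api_token},
        method="GET",
    )


def fetch_user(mode, value):
    """Send the request and return the decoded JSON body as a dict."""
    with urllib.request.urlopen(build_lookup_request(mode, value)) as response:
        return json.load(response)


request = build_lookup_request("email", "john@example.com")
print(request.full_url)
```

With the demo container running, `fetch_user("email", "john@example.com")` should return a dictionary shaped like the JSON response above.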

Hash-based search
Notice that you can find users by their email even though the data is encrypted. Databunker creates hash-based search indices for common fields like email and phone, letting you perform fast lookups without storing values in cleartext. The vault hashes the search term and matches it against stored hashes, never exposing the actual values.
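To illustrate the idea of a hash-based search index, here is a minimal Python sketch. It is not Databunker's actual scheme, which this article does not describe in detail: the use of HMAC-SHA256, the `INDEX_KEY` secret, the lowercase normalization, and the function names are all assumptions chosen for the example.

```python
import hashlib
import hmac
import uuid

INDEX_KEY = b"demo-only-secret"  # hypothetical key; a real vault would protect this


def index_hash(value):
    """Keyed hash of a normalized search term, used instead of the cleartext value."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(INDEX_KEY, normalized, hashlib.sha256).hexdigest()


email_index = {}  # keyed hash of email -> user token


def add_user(email):
    """Register a user and index the hash of their email, not the email itself."""
    token = str(uuid.uuid4())
    email_index[index_hash(email)] = token
    return token


def find_by_email(email):
    """Hash the search term and look it up in the index."""
    return email_index.get(index_hash(email))


token = add_user("john@example.com")
```

Because the lookup hashes the search term the same way as the stored value, `find_by_email("john@example.com")` returns the token even though the index holds no readable email address.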

How vaults help with compliance

Data privacy vaults provide features that directly support regulatory requirements:

Audit logs work automatically. Every operation that accesses or modifies customer data gets logged. When an auditor asks who accessed a customer’s data in March, you have a complete record. Databunker tracks what data was accessed, by which service, and when.

Erasure requests become straightforward. When a user requests data deletion under GDPR’s “right to be forgotten”, the vault acts as a centralized deletion point. Delete the data from the vault, and the corresponding tokens effectively become useless.

Time-limited access prevents unnecessary exposure. Vaults can issue temporary tokens that expire after a set duration or number of uses. A support agent helping a customer might receive a token that works for only 30 minutes. After that window, the token stops working automatically.

Data subject requests get easier. When customers request a copy of their data, the vault generates a complete report from its centralized store. You’re not assembling exports from multiple databases, hoping you didn’t miss anything.
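Two of these features, erasure and time-limited access, can be sketched with a small in-memory model. The class `ExpiringVault` and its methods are hypothetical and are not Databunker's API. The sketch only illustrates why deleting a record in one place makes every outstanding token useless, and how a grant can stop working after a fixed window.

```python
import time
import uuid


class ExpiringVault:
    """In-memory model of erasure and expiring access (illustration only)."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._records = {}    # user token -> PII dict
        self._temporary = {}  # temporary token -> (user token, expiry timestamp)

    def store(self, record):
        token = str(uuid.uuid4())
        self._records[token] = dict(record)
        return token

    def erase(self, token):
        """Delete the PII; copies of the token elsewhere now point at nothing."""
        self._records.pop(token, None)

    def grant_temporary(self, token, ttl_seconds):
        """Issue a short-lived token, for example for a support agent."""
        temporary = str(uuid.uuid4())
        self._temporary[temporary] = (token, self._clock() + ttl_seconds)
        return temporary

    def read(self, temporary):
        """Return the record, or None if the grant expired or the data was erased."""
        token, expires_at = self._temporary.get(temporary, (None, 0.0))
        if token is None or self._clock() >= expires_at:
            return None
        return self._records.get(token)


now = [1_000.0]  # fake clock so the example does not have to sleep
vault = ExpiringVault(clock=lambda: now[0])
user = vault.store({"email": "john@example.com"})
grant = vault.grant_temporary(user, ttl_seconds=30 * 60)

before_expiry = vault.read(grant)  # record is readable within the window
now[0] += 31 * 60                  # 31 minutes later
after_expiry = vault.read(grant)   # grant has expired

fresh = vault.grant_temporary(user, ttl_seconds=60)
vault.erase(user)
after_erasure = vault.read(fresh)  # nothing left to return
```

Injecting the clock makes the expiry behavior easy to check without waiting 30 minutes.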

Databunker Admin UI
Databunker even includes a built-in web interface for managing compliance workflows. Browse to localhost:3000, select Root Token, and log in with the demo token. You’ll see options for approving data subject requests, viewing audit logs, and generating compliance reports.

Clean up

When you’re done, stop the demo container. Because it was started with the --rm flag, Docker removes it automatically once it stops:

docker stop databunker

Learn more

This tutorial introduced you to data privacy vaults and tokenization through hands-on experience with Databunker. For more information on how to use Databunker, check out the Databunker website.

To learn more about data privacy vaults, check out the resources published by Skyflow.

Comments

Leave a comment by replying to the post on Bluesky or Mastodon.

Artem R 🇺🇦 (@asci@indieweb.social)

@marcusolsson nice! I wonder if centralized control makes it single point of failure/attack vector? compared to e2ee solutions

Marcus Olsson (@marcusolsson)

Personally, I’d much rather see an e2ee solution wherever possible 💯

It does become a single point of attack, but would you rather defend 10 systems with duplicated PII than one (honest question)?

You could also deploy multiple vaults for different departments, depending on how the data is used.