12 and 1 Ideas on How to Enhance Backend Data Security


Previously in the series

Previously, we’ve talked about classic design patterns in backend data security, then about key management goals and techniques.

It is important to understand that database security evolved with system administration techniques and programming demands, with cryptography and access controls being complementary features, rather than cornerstones.

In classic designs, there are two important drawbacks:

1. Trust tokens:

  • classic designs rely on storing trust tokens somewhere inside the infrastructure.
  • trust tokens barely reflect real-world relationships.
  • trust tokens are a large attack surface, as each one opens access to many records at once.

2. Trusting infrastructure:

  • all these designs assume that the infrastructure exists, works properly and has not been completely compromised.
  • some of the classic designs rely on the idea that there is a rough «perimeter» between the inside and the outside world.

This, as it turns out, is not the case in 2016.

Goals of modern backend security system

Apart from security practices themselves, today’s application architecture and typical engineering patterns have changed significantly. Also, the level of detail a modern developer is willing to get into is unlike that of 10 years ago: developers expect most things to be neatly solved by existing software and frameworks.

When thinking of database/backend security, we generally want:

  • access control with strong compartmentation: authentication and granular CRUD authorization per user/table, similar to the grant rights that already exist in databases without encryption.
  • leakage prevention at rest / in use / in motion.
  • authenticity and integrity of all data.

When thinking about modern practices, we might add:

The risk model should take an ‘everything will be broken’ threat model as its baseline:

  • everything is in the cloud and the cloud itself should have very limited trust,
  • the database, middleware, API providers and front-end talk over the open internet,
  • they don’t have a centralised source of trust,
  • they don’t share verifiable physical factors,
  • they can be compromised without awareness of other talking parties.

Security instrumentation should easily blend into data representation:

  • ORM-friendly
  • Prepared statements
  • Easy management and entity mapping.

Security functions should expose as little cryptographic detail as possible, to isolate errors and minimise adoption friction.

Also, we want to sacrifice as few database-specific benefits as possible:

  • Backups, compaction,
  • Indexing protected data and searching over it,
  • Using protected data in SQL statements,
  • Controlling protection with flexible granularity, from cell to table.

Solutions to some challenges

Unfortunately, no all-encompassing solution for the aforementioned problems exists. However, for each of the problems and goals of backend security design, there are numerous components and techniques we might use.

We can divide new protection solutions into a few classes:

  • Encryption: searching, indexing, encrypted query databases,
  • Infrastructure security,
  • Access control.

Encryption: Searching / Indexing

Searching is a subset of controlling read access cryptographically: how to allow processes with certain features / keys to read the data without exposing it to potential attackers, while preserving the ability to execute various queries on top of it.

SSE, Searchable Symmetric Encryption

SSE is a promising approach: symmetrically encrypt the text, then challenge the database with specially crafted queries. It works with sequential scanning and indexing, but is rather limited and a bit theoretical. Still, there is an implementation to try out and build on:

https://people.eecs.berkeley.edu/~dawnsong/papers/se.pdf (Paper)

https://github.com/atulmahind/song-wagner-perrig (Implementation)

https://eprint.iacr.org/2006/210.pdf (Overview paper)

PEKS, Public Key Encryption with Keyword Search

The Public Key Encryption with Keyword Search scheme relies on the data owner generating a number of trust tokens, which are used within a ‘verification’ process that allows the server to check whether a chosen keyword is present in the encrypted data. Although the scheme is currently slow and theoretical, its potential security is very interesting.

Public Key Encryption with keyword Search

Public Key Encryption with Keyword Search Revisited

https://github.com/atulmahind/PEKS (implementation)

Homomorphic encryption

Homomorphic encryption is a method of performing calculations on encrypted information without decrypting it first. There are fully and partially homomorphic encryption schemes, which provide different sets of operations on protected data. Apart from searching, there are many use-cases (like using the data to perform certain calculations), in which homomorphic encryption is extremely useful.
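To make the idea of a partially homomorphic scheme concrete, here is a minimal sketch using textbook RSA, which happens to be homomorphic with respect to multiplication. The tiny primes, the lack of padding and the helper names `encrypt` and `decrypt` are assumptions chosen for illustration; this is a toy and not a usable encryption scheme.

```python
# Toy textbook RSA with tiny primes: insecure, for illustration only.
p, q = 61, 53
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < n with the public key."""
    return pow(m, e, n)


def decrypt(c: int) -> int:
    """Decrypt a ciphertext with the private key."""
    return pow(c, d, n)


a, b = 7, 12

# The party holding only ciphertexts multiplies them without decrypting.
c_product = (encrypt(a) * encrypt(b)) % n

# Only the key owner can decrypt; the result equals a * b as long as a * b < n.
product = decrypt(c_product)
```

Multiplying the two ciphertexts yields a ciphertext of the product, so the computing party never sees `a` or `b`. Fully homomorphic schemes extend this to both addition and multiplication, which is where the performance cost comes from.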

Although fully homomorphic encryption looks like a part of the future, there are no practically usable systems to date:

Lattice-based encryption has also attracted attention from theoreticians who talk about its «flexibility for realising powerful tools like fully homomorphic encryption». The latest speed reports for fully homomorphic encryption are—let me use precise technical terminology here since I’m a big fan of careful benchmarking—ludicrously slow, but without ideal lattices, they would be utterly ludicrously slow.

(Daniel Bernstein, https://blog.cr.yp.to/20140213-ideal.html)

Minimal exposure search index

There are more practical approaches, though. You can manually define a list of tokens you’d like to search over, encrypt or hash them, and search accordingly. You can decouple search IDs and tokens from the actual data before encrypting/hashing them, thus making sure that a known-ciphertext attack won’t be useful.
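The approach above can be sketched roughly as follows. This is a minimal illustration, assuming a keyed hash (HMAC-SHA256) for the tokens and an in-memory SQLite database; the key, the table layout and the helper names `search_token`, `store` and `find` are all hypothetical, and the record payloads stand in for ciphertext produced elsewhere.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical index key; in practice it would live outside the database.
INDEX_KEY = b"demo-index-key-not-for-production"


def search_token(term: str) -> str:
    """Return a keyed hash of a normalised search term."""
    normalised = term.strip().lower().encode("utf-8")
    return hmac.new(INDEX_KEY, normalised, hashlib.sha256).hexdigest()


conn = sqlite3.connect(":memory:")
# Protected records and the search index are kept in separate tables.
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload BLOB)")
conn.execute("CREATE TABLE search_index (token TEXT, record_id INTEGER)")
conn.execute("CREATE INDEX idx_token ON search_index (token)")


def store(record_id: int, ciphertext: bytes, keywords: list) -> None:
    """Store an already-encrypted payload plus tokens for chosen keywords."""
    conn.execute("INSERT INTO records VALUES (?, ?)", (record_id, ciphertext))
    conn.executemany(
        "INSERT INTO search_index VALUES (?, ?)",
        [(search_token(k), record_id) for k in keywords],
    )


def find(term: str) -> list:
    """Look up record IDs by token; the plaintext term never reaches SQL."""
    rows = conn.execute(
        "SELECT record_id FROM search_index WHERE token = ? ORDER BY record_id",
        (search_token(term),),
    )
    return [row[0] for row in rows]


store(1, b"<ciphertext 1>", ["alice", "berlin"])
store(2, b"<ciphertext 2>", ["bob", "Berlin"])
```

Here `find("berlin")` matches both records, while the database only ever stores and compares keyed hashes. Only exact matches on the predefined tokens are supported, and equal tokens remain visibly equal, which is the price of this kind of index.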

Encrypted query databases

CryptDB

CryptDB is a system that aims to provide practical and provable confidentiality, in the face of attacks on the database server, for applications backed by SQL databases. A research project led by MIT, CryptDB carefully balances various encryption techniques against risks and requires the requesting party to craft a special encrypted query to execute over protected data. Although it looks promising and has been adopted by many parties, it already has known vulnerabilities (https://cs.brown.edu/~seny/pubs/edb.pdf) and weaknesses, which have led to ‘how to use CryptDB securely’ guidelines. Although the dispute is yet to be settled, in most cases we can consider CryptDB practically applicable to backend data security problems.

Site: http://css.csail.mit.edu/cryptdb

Github: https://github.com/CryptDB/cryptdb

Encrypted BigQuery

Inspired by this research, Google has proposed Encrypted BigQuery, an experimental BigQuery client that provides a subset of BigQuery operations in an encrypted fashion:

Client: https://github.com/google/encrypted-bigquery-client

Tutorial: https://github.com/google/encrypted-bigquery-client/blob/master/tutorial.md

Cipherbase

Microsoft has proposed its own security system for encrypted queries, Cipherbase, which is the basis for the Always Encrypted database engine.

Transaction Processing on Confidential Data using Cipherbase

Engineering Security and Performance with Cipherbase

Infrastructure security

Trust compartmentation

What do you do if you can’t control how much trust a large database and/or application cluster deserves? You offload critical procedures to a small service running in a well-controlled environment (and, perhaps, powered by hardware separate from the constantly loaded database cluster).

HSM

There’s an easy, classic way to offload trust: move it to a dedicated piece of hardware that performs all cryptographic operations and manages keys. There are cases where such a solution is efficient, yet typical HSM performance is not sufficient when processing a lot of data.

HSMs are available for all mainstream commercial databases and, with some effort, can be integrated into modern open-source ones.

Integrated security instrumentation

A lot of older database protection techniques rely on the database running in a safe and secure environment, i.e. on trusting the system you run your code on. This is the place for traditional security instrumentation: Host IDS (like Samhain), Mandatory Access Control (like SELinux) and others.

Access control

Most existing database encryption techniques enforce only read control, reducing the risk of data exposure by requiring a key to access encrypted data. Some of them verify the authenticity of protected records, thus providing tamper protection, but we are not aware of any schemes with write control (apart from the ones we’ve developed ourselves, but more on this later). Apart from read control, the rest is enforced by typical ACL/grant techniques, which rely on trusting that the database’s behaviour has not been compromised by an attacker.

Within the previously discussed threat model, we want to put as little trust into the backend as possible. This means enforcing access control via non-database techniques, like encryption, and making sure that, except for legitimate consumers, data ‘in process’ is decrypted as little as possible (if decrypted at all).
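As an illustration of the authenticity check on protected records mentioned above, here is a minimal sketch that tags each encrypted record with an HMAC bound to its table and row ID, so that a record modified or moved to another row by a compromised database fails verification. The key, the tag layout and the function names `record_tag` and `verify_record` are assumptions made for this example and do not describe any particular product.

```python
import hashlib
import hmac

# Hypothetical integrity key, held by the trusted consumer, not the database.
INTEGRITY_KEY = b"demo-integrity-key-not-for-production"


def record_tag(table: str, row_id: int, ciphertext: bytes) -> bytes:
    """Compute a tag over the ciphertext and the place where it is stored."""
    # The length prefix keeps the context unambiguous when fields are joined.
    context = f"{len(table)}:{table}:{row_id}:".encode("utf-8")
    return hmac.new(INTEGRITY_KEY, context + ciphertext, hashlib.sha256).digest()


def verify_record(table: str, row_id: int, ciphertext: bytes, tag: bytes) -> bool:
    """Check the tag in constant time before trusting or decrypting a record."""
    expected = record_tag(table, row_id, ciphertext)
    return hmac.compare_digest(expected, tag)


# Writer side: the tag is stored next to the ciphertext.
stored_ciphertext = b"<ciphertext>"
stored_tag = record_tag("users", 1, stored_ciphertext)

# Reader side: verify before use.
ok = verify_record("users", 1, stored_ciphertext, stored_tag)
moved = verify_record("users", 2, stored_ciphertext, stored_tag)
tampered = verify_record("users", 1, b"<other ciphertext>", stored_tag)
```

This only detects tampering after the fact. It does not stop the database from accepting the write, which is why cryptographic write control remains a separate problem.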

Cossack Labs research

At Cossack Labs, we strive to see the problems described in this article in a very different light.

First of all, we believe that the old UNIX proverb of “do 1 thing really well, get inputs and outputs in standardised fashion” doesn’t work well for security in 2017, because in this case:

  • developers are still held responsible for making security decisions, including key layout and encryption granularity.
  • the typical solution involves stitching several security tools together in one backend infrastructure, which means more chances to break things on integration.
  • such work requires high-level vision, which is rarely present.

We strive to address these problems differently: by providing specialized tools for specific use-cases, which abstract all cryptographic decisions into more user-friendly concepts.

Acra: crypto compartmentation via transparent database encryption

Acra is our take on compartmenting trust via transparent architecture: making sure that the attack surface is very small and contained within a well-controlled environment. It is a daemon running on a separate virtual machine that receives all database queries, executes them, then decrypts the data and supplies it back to the application via a protected channel. Acra’s encryption scheme is built so that the application can write data with a small number of cryptographic tokens, which are not sufficient to decrypt anything.

Hermes: granular access

Hermes is our research on a much more ambitious problem: enforcing all CRUD grant rights via cryptography and providing the infrastructure to build complete end-to-end apps that rely on cryptography to implement all of their security mechanisms. This is ongoing research, with new implementations and ecosystems being built right now. We’ll present a Hermes proof-of-concept with practical sample code and a scientific paper early next year.

Ending notes

There are many techniques for protecting data stored within a database / application backend. Intuitively, it feels that by combining a few tools here and there we might achieve some decent level of security. Practically, we need to understand the threat model and how to limit the attack surface and protect it really well. It is a part of application/infrastructure design, not a ‘feature’, nor a ‘service’.

This post was previously published on our blog, where you can find more interesting articles.
