OWASP: Sensitive Data Exposure

This is the sixth in a series of posts on the OWASP Top Ten.

This is a risk that your project architect should really be managing and fixing. It boils down to ensuring that data, whether it's recorded on disk, transmitted over a piece of wire or sitting in memory, is only there if it needs to be and has suitable security controls in place to protect it from being stolen.

This usually means the correct use of cryptography, but can be as simple as making sure you're using some cryptography!

An architect must think about the flow of sensitive data across their design, so they must first decide what "sensitive data" means. The following questions may help to clarify that definition in terms of the data flow across the project.

  • Sensitive to who?
    • The business?
    • A third party business?
    • The customer?
  • Sensitive how?
    • Commercially sensitive such as a company secret?
    • Perhaps it's credit card details, or personally identifiable information (PII)?
    • Is there a mandated industry standard or requirement to classify it?

By understanding how the leakage of this data would affect your business, the architect can then design controls and policies into the solution that ensure the data is managed correctly. For web applications this usually boils down to the following:

  • Data at Rest
    • Is data stored in clear text?
    • Is data backed up in clear text?
    • Do you need to store that data at all? (Data you don't have can't be stolen!)
  • Data in Motion
    • Is data transmitted in clear text? Across the internet?
    • Are deprecated or weak cryptographic algorithms used?
    • Are bespoke cryptographic algorithms used?
    • Are weak cryptographic keys generated?
    • Is there a proper key management process in place?
    • Are there any security directives/headers missing in communications with the browser?
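
The last question in that list is easy to automate. Here is a minimal sketch of a response-header check using only the Python standard library. The function name, the two "required" headers and the two cookie flags are assumptions chosen for illustration, not a complete policy; your own baseline may well be longer.

```python
# Assumed baseline for illustration; a real project would define its own policy.
REQUIRED_HEADERS = ("strict-transport-security", "x-content-type-options")
REQUIRED_COOKIE_FLAGS = ("secure", "httponly")


def audit_response_headers(headers):
    """Return findings for one HTTP response given as (name, value) pairs."""
    findings = []

    # Header names are case-insensitive, so compare them in lower case.
    seen = {name.lower() for name, _ in headers}
    for required in REQUIRED_HEADERS:
        if required not in seen:
            findings.append(f"missing header: {required}")

    for name, value in headers:
        if name.lower() != "set-cookie":
            continue
        cookie_name = value.split("=", 1)[0].strip()
        # Everything after the first ';' is a cookie attribute such as Secure.
        attributes = {
            part.strip().split("=", 1)[0].lower()
            for part in value.split(";")[1:]
        }
        for flag in REQUIRED_COOKIE_FLAGS:
            if flag not in attributes:
                findings.append(f"cookie {cookie_name!r} lacks {flag} flag")

    return findings
```

Feed it the headers from a login response: an empty list means this (deliberately small) baseline is met, while anything else is worth a conversation with whoever owns the web tier.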

Finally, controls must be positioned and applied such that they are relevant to the risk, because so often they are not!

Let's talk about data at rest

After a cyber-incident earlier this year in which database contents were extracted, several of our customers wanted assurances that the solutions we've delivered and continue to support protect their data against similar attacks.

One such customer wanted assurances that PII stored in a MySQL database was protected against a "similar attack".

In the successful attack in question, the threat vector was hackers at home exploiting weaknesses in the solution design and implementation to extract PII over the internet. Our customer insisted that this required the data to be encrypted at rest in the database, i.e. that database encryption was the way to mitigate the risk.

Database encryption, such as Oracle Transparent Data Encryption (TDE), Microsoft TDE and MySQL Enterprise Encryption, presents a transparent interface to the application, while using a key hierarchy to encrypt data at the disk level.

Transparent Data Encryption (TDE) HLD

You may have guessed, but TDE technologies are designed to mitigate the insider stealing a hard drive or slurping data out of backup tapes. TDE does not protect against a hacker extracting data from the database while it's online, because the database transparently decrypts the data for any query it accepts.

In this example, the system was responsible for writing data provided by a user to a MySQL database, but no part of that same system needed to read the data back; that was done by a backend system later in the process. This suggested the use of public key cryptography: data is encrypted with a public key before it is stored in the database, so that only the backend system (which owns the corresponding private key) can decrypt and read it. Risk mitigated! TDE is fine for insider attack vectors, but it doesn't address the specific risks the customer was worried about.
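
To make the write-only idea concrete, here is a rough sketch in Python. It is not the code from the project described above: it assumes the third-party "cryptography" package, RSA-2048 with OAEP padding, and two hypothetical helper names (encrypt_for_backend and decrypt_on_backend). The key pair is generated inline purely for the demo; in a real deployment the private key would only ever exist on the backend.

```python
# Requires the third-party "cryptography" package (pip install cryptography).
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# OAEP with SHA-256: a standard padding scheme, not a home-grown one.
OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def encrypt_for_backend(public_key_pem, plaintext):
    """Web tier: encrypt a short field using only the backend's public key."""
    public_key = serialization.load_pem_public_key(public_key_pem)
    return public_key.encrypt(plaintext, OAEP)


def decrypt_on_backend(private_key, ciphertext):
    """Backend: the only place that holds the private key and can read the data."""
    return private_key.decrypt(ciphertext, OAEP)


# Demo only: in practice the backend generates and keeps this key.
backend_private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# The web tier is configured with the public half and nothing else.
public_key_pem = backend_private_key.public_key().public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)

# This ciphertext is what gets written to the database column.
stored = encrypt_for_backend(public_key_pem, b"jane.doe@example.com")
```

An attacker who compromises the web tier or dumps the table gets ciphertext and a public key, neither of which lets them read the data. One caveat: RSA-2048 with this padding can only encrypt up to 190 bytes directly, so larger records would need a hybrid scheme in which a random symmetric key encrypts the record and the public key encrypts that symmetric key.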

Have you been doing it wrong all this time?

So as you can see, using a cryptographic algorithm to protect your data is not a silver bullet. It's crucial that your controls meet your risks appropriately. Here are some of the more basic, and easily fixed, errors I've seen in some poorly designed and implemented solutions:

  • If you've designed your own clever cryptographic algorithm to protect your data, there is a 99.99% chance you are wrong, and in fact your data is probably not suitably protected. Enlist the help of a security architect to select from the plethora of available standardised algorithms and mechanisms to meet the requirements of your control.
  • Do you need to store that sensitive data at all? Certainly, understanding the overall risk of data being leaked and the use cases will help you make this decision.
  • Where you're transmitting sensitive data, is it sent in clear text? Even when data only crosses your internal network, there are risks that call for a secured protocol such as TLS. Transmitting or receiving sensitive data externally, perhaps over the public internet, in an unencrypted or unprotected form should be especially suspect in your data model.
  • What algorithms are you using to protect your data? Are they standardised and approved for use by the industry? Even a standard algorithm can be weakened by certain switches and settings, enough to make it a target for persistent or motivated hackers.
  • Where and how are your sensitive keys stored and managed? You might need to consider the use of Hardware Security Module (HSM) technologies to ensure that cryptographic keys remain secure against even the most persistent attack. Cryptographic keys should be replaced (or "rolled") on a schedule that reflects their strength, how often they are used and the data they protect.
  • Are you recording user passwords in plaintext? Stop! Store a one-way salted hash of each password instead, using a standard password-hashing function such as PBKDF2, bcrypt or scrypt.
  • Are all login and authenticated pages served over TLS? If not, it is trivial for an attacker to monitor traffic to and from the website, steal the session cookie (which is sensitive data itself, right?) and hijack the user's session.
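
For the password point, here is a minimal sketch of salted, one-way password storage using only the Python standard library. PBKDF2-HMAC-SHA256 is one reasonable choice among several; the iteration count, the "iterations$salt$hash" storage format and both function names are assumptions for illustration rather than a recommendation for your system.

```python
import hashlib
import hmac
import secrets

# Assumed work factor; tune it to your hardware and to current guidance.
ITERATIONS = 600_000


def hash_password(password, *, iterations=ITERATIONS):
    """Return an 'iterations$salt$hash' string; the password itself is never stored."""
    salt = secrets.token_bytes(16)  # a fresh random salt for every password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return f"{iterations}${salt.hex()}${digest.hex()}"


def verify_password(password, stored):
    """Recompute the hash with the stored salt and compare in constant time."""
    iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

Because the salt is random, two users with the same password end up with different records, and because the iteration count is stored alongside the hash you can raise it later without invalidating existing records.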

Next time: Missing Function Level Access Control

Join the conversation on Twitter: @OWASP #OWASPtop10 #sensitivedata