Friday, October 23, 2009

Use Data Masking to Secure Sensitive Data in Non-Production Environments

Brett D. Arion

Last week's article covered the topic of protecting data in databases from the inside out. That is, watching every action involving data as it happens, and promptly halting improper actions.

Data masking is the process of de-identifying (masking) specific elements within data stores by applying one-way algorithms to the data. The process ensures that sensitive data is replaced with realistic but not real data; for example, scrambling the digits in a Social Security number while preserving the data format. The one-way nature of the algorithm means there is no need to maintain keys to restore the data as you would with encryption or tokenization.
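To make the idea of format-preserving, one-way masking concrete, here is a minimal Python sketch. It is an illustration only, not how any particular product works: the function name mask_digits is hypothetical, and it replaces each digit with a random one (rather than shuffling the existing digits) so that no key or mapping exists from which the original could be restored.

```python
import secrets
import string


def mask_digits(value: str) -> str:
    """Replace every ASCII digit with a random digit.

    Separators such as '-' stay in place, so the masked value keeps
    the original format. No key or lookup table is stored, so the
    original value cannot be recovered from the result.
    """
    return "".join(
        secrets.choice(string.digits) if ch in string.digits else ch
        for ch in value
    )


# The output still looks like NNN-NN-NNNN, but the digits are random.
masked = mask_digits("123-45-6789")
print(masked)
```

Because each call draws fresh random digits, the same input masks to a different value every time. That is fine for a standalone column, but not for values that must match across tables.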

Data masking is typically done while provisioning non-production environments so that copies of data created to support test and development processes do not expose sensitive information. If you don't think this is important, consider what happened to Wal-Mart a few years ago. Wired.com reports that Wal-Mart was the victim of a serious security breach in 2005 and 2006 in which hackers targeted the development team in charge of the chain's point-of-sale system and siphoned source code and other sensitive data to a computer in Eastern Europe. Many of the computers the hackers targeted belonged to company programmers. Wal-Mart at the time produced some of its own software, and one team of programmers was tasked with coding the company's point-of-sale system for processing credit and debit card transactions. This was the team the intruders targeted and successfully hacked.

Wal-Mart's situation may not be unique. According to Gartner, more than 80% of companies are using sensitive production data for non-production activities such as in-house development, outsourced or off-shored development, testing, quality assurance and pilot programs.

The need for data masking is largely being driven by regulatory compliance requirements that mandate the protection of sensitive information and personally identifiable information (PII). For instance, the Data Protection Directive adopted by the European Union in 1995 strictly regulates the processing of personal data within the EU. Multinational corporations operating in Europe must observe this directive or face large fines if they are found in violation. U.S. regulations such as the Gramm-Leach-Bliley Act (GLBA) and the Health Insurance Portability and Accountability Act (HIPAA) also call for protection of sensitive financial and personal data.

Worldwide, the Payment Card Industry Data Security Standard (PCI DSS) requires strict security for cardholder data. In order to achieve full PCI compliance, organizations must protect data in every system that uses credit card data. That means companies must address their use of cardholder data for quality assurance, testing, application development and outsourced systems -- and not just for production systems. In the Wal-Mart case discussed above, the retailer failed to meet the PCI standard for data security by not securing data in the development environment.

Many large organizations are concerned about their risk posture in the development environment, especially as development is outsourced or sent offshore. A lack of processes and technology to protect data in non-production environments can leave the company open to data theft or exposure and regulatory non-compliance. Data masking is an effective way to reduce enterprise risk. Development and test environments are rarely as secure as production, and there's no reason developers should have access to sensitive data. And while encryption is a viable security measure for production data, encryption is too costly and has too much overhead to be used in non-production environments.

Many database vendors offer a data masking tool as part of their solution suites. These tools, however, tend to work only on databases from a specific vendor. An alternative solution is to use a vendor-neutral masking tool. Dataguise is one of the leading vendors in the nascent market of data masking.

The Dataguise solution has two complementary modules. dgdiscover is a discovery tool that searches your environment (including endpoints) to find sensitive data in structured and unstructured repositories. So, even if someone has copied data to a spreadsheet on a PC, dgdiscover can find it. This can be a valuable time-saving tool as data tends to be copied to more places, especially as virtual environments grow and new application instances can be deployed on demand. dgdiscover can also be used to support audits and create awareness of data repositories.
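The general idea of scanning free text for sensitive values can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of how dgdiscover works: the patterns, the find_sensitive function and the use of a Luhn checksum to filter likely card numbers are all choices made for this example.

```python
import re

# Assumed patterns for this sketch: US SSNs written as NNN-NN-NNNN and
# 13 to 19 digit card numbers with optional space or hyphen separators.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_ok(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs for likely SSNs and card numbers."""
    hits = [("ssn", m.group()) for m in SSN_RE.finditer(text)]
    hits += [
        ("card", m.group())
        for m in CARD_RE.finditer(text)
        if luhn_ok(m.group())  # drop digit runs that fail the checksum
    ]
    return hits


sample = "Customer 078-05-1120 paid with 4111 1111 1111 1111 on Friday."
for kind, value in find_sensitive(sample):
    print(kind, value)
```

A real discovery tool would also need to read file formats such as spreadsheets and database tables, and pattern matching alone will produce both false positives and misses.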

The second Dataguise module is dgmasker, a tool that automatically masks sensitive data using a one-way process that can't be reverse engineered. The tool works in heterogeneous environments and eliminates the common practice of having DBAs create their own masking techniques and algorithms. It preserves relational integrity between tables and remote databases, and generates data that complies with your business rules for application compatibility. In short, you have all the benefits of using your actual production data without using the real data. Instead, dgmasker obfuscates the real data so that it cannot be recovered by anyone -- insider or outsider -- who gains access to the masked data.
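One common way to keep relational integrity while masking is to make the masking deterministic within a single run, so that the same input always maps to the same masked value in every table. The Python sketch below shows that idea with a keyed hash (HMAC-SHA256). It is an assumption-laden illustration, not dgmasker's actual algorithm: mask_value, run_key and the sample tables are hypothetical names.

```python
import hashlib
import hmac
import secrets


def mask_value(value: str, run_key: bytes) -> str:
    """Map the digits in `value` to pseudorandom digits, keeping its format.

    The same value and run_key always give the same result, so a masked
    key still joins across tables. Assumes fewer than 78 digits per value.
    """
    digest = hmac.new(run_key, value.encode("utf-8"), hashlib.sha256).digest()
    # Turn the 256-bit digest into a pool of decimal digits.
    pool = iter(str(int.from_bytes(digest, "big")).zfill(78)[::-1])
    return "".join(next(pool) if ch in "0123456789" else ch for ch in value)


# A fresh random key per masking run; discarding it afterwards means the
# mapping cannot be recomputed later.
run_key = secrets.token_bytes(32)

customers = [{"ssn": "123-45-6789", "name": "Pat"}]
orders = [{"customer_ssn": "123-45-6789", "total": 42}]

masked_customers = [{**r, "ssn": mask_value(r["ssn"], run_key)} for r in customers]
masked_orders = [
    {**r, "customer_ssn": mask_value(r["customer_ssn"], run_key)} for r in orders
]

# The masked foreign key still matches the masked primary key.
print(masked_customers[0]["ssn"] == masked_orders[0]["customer_ssn"])
```

This sketch does not guarantee uniqueness: two different inputs could, rarely, mask to the same value, so a production tool masking primary keys would need collision handling on top of this.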


Data masking is an effective tool in an overall data security program. You can employ data masking in parallel with other data security controls such as access controls, encryption, monitoring and review/auditing. Each of these technologies plays an important role in securing data in production environments; however, for non-production environments, data masking is becoming a best practice for securing sensitive data.
