Sooner or later you’ll have to face the fact that your data store might as well have a huge target painted on it. Scammers are inventing new ways to harness big data to commit credit and identity fraud, while data miners are collecting digital footprints to influence everything from the success or failure of a product to political outcomes. It seems that everyone wants a piece of the pie.
There are many direct risks to the bottom line as well. Can you afford to have a few terabytes of data taken hostage by ransomware? What happens to your image if there’s a breach? Information collected for purposes of predictive analysis can easily become a huge liability, with the potential for misuse by those who aren’t afraid to take big risks to accomplish their agenda.
Big data = big consequences
Big data security involves the use of reasonable precaution during the stages of data collection, storage, and output analytics. For large enterprises, conventional security scans may not be feasible due to scale. For this reason, big data presents unique challenges, which require the entire organization to anticipate vulnerabilities rather than depend on InfoSec to scan for them and apply fixes later.
Most cloud providers operate under what one provider calls a Shared Responsibility Model. Under this framework, AWS, for instance, is responsible for protecting data stores from intruders within its network. That leaves its customers responsible for controlling access to those data stores.
Watch out for intermediary adversaries
Intermediaries are one big threat. Man-in-the-middle attacks can intercept data before it’s even stored. Transport Layer Security (TLS) is critical from the earliest stages of data capture to prevent eavesdropping or even on-the-fly data modification.
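As a minimal sketch of what enforcing TLS at the capture stage can look like on the client side, the snippet below builds a context with Python's standard `ssl` module. The helper name `make_client_tls_context` is ours for illustration, and the TLS 1.2 floor is an assumed policy choice rather than something prescribed here.

```python
import ssl


def make_client_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the server (hypothetical helper)."""
    # create_default_context() turns on certificate verification and
    # hostname checking against the system's trusted CA store.
    context = ssl.create_default_context()
    # Assumed policy: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_tls_context()
```

A context like this can be passed to `urllib.request.urlopen(url, context=context)` or used with `context.wrap_socket(sock, server_hostname=host)`, so that a connection fails outright instead of silently falling back to an unverified channel.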
Unfortunately, traditional role-based access controls are usually insufficient to safeguard the many layers of data that are available. Attribute-based access control (ABAC) should be implemented in order to provide access based on a wide range of criteria. Unlike role-based access, ABAC is context-aware. This means that we can refine access much further than simply giving broad access to department managers, for instance. ABAC tools such as Apache Ranger can process a number of If…Then… scenarios to ensure that a particular manager is currently working on a project requiring access to a certain subset of data.
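To make the If…Then… idea concrete, here is a small illustrative sketch of an attribute-based check in Python. It is not Apache Ranger's API; the `AccessRequest` fields and the `is_allowed` rule are hypothetical and only show how context (department and current project) can narrow access beyond the role alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """Attributes of the user and the dataset (all field names are hypothetical)."""
    role: str
    department: str
    active_projects: frozenset
    dataset_department: str
    dataset_project: str


def is_allowed(request: AccessRequest) -> bool:
    """Grant access only when every attribute condition holds."""
    # IF the user is not a manager THEN deny.
    if request.role != "manager":
        return False
    # IF the dataset belongs to another department THEN deny.
    if request.department != request.dataset_department:
        return False
    # IF the manager is not currently on the dataset's project THEN deny.
    return request.dataset_project in request.active_projects


on_project = AccessRequest("manager", "sales", frozenset({"q3-forecast"}), "sales", "q3-forecast")
off_project = AccessRequest("manager", "sales", frozenset({"q3-forecast"}), "sales", "churn-model")
```

With a purely role-based rule, both requests above would pass because both users are managers. Here only `on_project` is allowed, since the second manager is not assigned to the project that owns the data.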
Locking down your APIs
The level of access provided to developers can also represent a huge security risk.
The database administrator can control access by ensuring that data transfers occur only through stored procedures, so that developers never hold the elevated privileges that a disgruntled developer could use to reach off-limits information. Stored procedures can also apply dynamic data masking to ensure that only the necessary data is displayed for a particular analysis report.
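The masking step can be sketched outside the database as well. The Python below is an illustration of the idea, not a stored procedure: `mask_value` and `mask_row` are hypothetical helpers, and the choice to leave the last four characters visible is an assumption.

```python
def mask_value(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Hide all but the last `visible` characters of a value."""
    if visible <= 0 or len(value) <= visible:
        # Too short to reveal anything safely: mask the whole value.
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]


def mask_row(row: dict, allowed_columns: set) -> dict:
    """Return a copy of the row with every non-allowed column masked."""
    return {
        column: (value if column in allowed_columns else mask_value(str(value)))
        for column, value in row.items()
    }


report_row = mask_row({"region": "EMEA", "account": "ACC-998877"}, {"region"})
```

In this sketch a report that only needs `region` would see the account identifier as `******8877`, which is enough to tell rows apart without exposing the full value.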
APIs themselves can be designed to operate at a more granular level of access, even though the processing footprint may increase. Policies written in XACML (eXtensible Access Control Markup Language) can limit API access to particular subsets of data based on a wide range of criteria, such as regional restrictions.
As our data is transferred into the archive, compliance issues come into focus as leaked credit card numbers or theft of identity information can devastate a company’s image and incur big fines. Although user account information may be necessary in the live database, all credit card numbers and other personally identifiable information should be stripped before predictive analysis data enters the archive.
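A rough sketch of that stripping step in Python follows. The field names in `PII_FIELDS` are hypothetical, and the card-number pattern is a simple heuristic (13 to 19 digits, optionally separated by spaces or hyphens) that may also catch other long digit runs, so treat it as a starting point rather than a complete compliance control.

```python
import re

# Hypothetical field names; a real schema will differ.
PII_FIELDS = {"name", "email", "card_number", "ssn"}

# Heuristic: 13-19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def scrub_record(record: dict) -> dict:
    """Drop known PII fields and redact card-like numbers in free text."""
    cleaned = {}
    for key, value in record.items():
        if key in PII_FIELDS:
            continue  # never copy PII fields into the archive
        if isinstance(value, str):
            value = CARD_PATTERN.sub("[REDACTED]", value)
        cleaned[key] = value
    return cleaned


live_record = {
    "user_id": 42,
    "email": "a@example.com",
    "card_number": "4111111111111111",
    "note": "paid with 4111 1111 1111 1111 today",
    "basket_total": 19.99,
}
archived = scrub_record(live_record)
```

Running the scrub as part of the transfer job, rather than afterward, means the archive never holds the sensitive values in the first place.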
When in doubt, leave it out
Many organizations have decided that it’s unwise to expose big data connections to desktop workstations at all. As information flows from the data store to a client machine for processing, any number of vulnerabilities could allow that data to be intercepted, as the Heartbleed bug in OpenSSL famously demonstrated. By implementing Virtual Desktop Infrastructure (VDI), a client machine can exist as a virtual desktop within the data center itself, greatly reducing the transfer of data across the Internet for report generation and other analytics.
Whether your big data is used for marketing, scientific research, or for predicting criminal activity or medical conditions, its protection will always be of paramount importance. With a focus on designing in data protection from the outset, we can avoid the costly media embarrassment that comes from a big data breach and continue to benefit from the many breakthroughs that data scientists are just beginning to uncover.