An Equifax Lesson Learned: DLP

What can we do when traditional DLP doesn't protect private or sensitive information that we actually expect to leave our enterprise?

For years now, I have been wrestling with the challenge of data loss prevention (DLP). My entrepreneurial ventures, their customers, and even their acquirers were enterprises that share the need to prevent unintentional loss of private or sensitive data (the mission of DLP), yet their core business requires them to sell or exchange private or sensitive data, a process completely counter to DLP. While not the core issue, the Equifax breach highlights the DLP challenge. The penetration occurred on a consumer-facing portal designed for individuals to access their own private information. The vulnerability exposed Equifax to a massive data exfiltration through an "exit" from which they would normally expect sensitive information to leave. This is DLP's biggest shortfall: current tools do not help enterprises that need to "break the rules" and push protected data externally, precisely the process DLP was designed to counter. A system that prevents both accidental and intentional release of sensitive data can't effectively guard the gateway where the largest volume of sensitive data needs to enter and leave the enterprise. Public and private data providers like Equifax, Experian, TransUnion, Innovis, and LexisNexis (among many others in healthcare, banking, and the public sector) are critical to helping businesses succeed, but they also need to more effectively mitigate improper exfiltration of data through the very gateways they use to serve their customers and business partners.

A decade ago, DLP was emerging and getting a lot of attention. Market analysts set out to build new "quadrants" and "waves" showing a high rate of change in capabilities. Despite acquisitions by large cybersecurity software vendors, DLP was, and remains, a hodgepodge of tools, each uniquely developed for a specific kind of vulnerability, with little interoperability. The tools are difficult to train (the process of teaching them what should not leave the system or enterprise) and prone to false positive detections. They are also expensive to deploy in more complex, hybrid environments with cloud-native services and legacy server instances hosting data repositories and exchange systems. The tools can be configured to alert the individual or the security team, block the undesired action, or even encrypt data that was found unencrypted... all automatically.

McAfee acquired Onigma and Reconnex to integrate endpoint and network DLP solutions, and eventually tried to tie it all together with policy orchestration. Similarly, Symantec purchased Vontu for its core DLP platform, then built or acquired DLP solutions at enterprise gateways (e.g., Symantec bought Blue Coat). These proxy solutions open secure sessions in and out of the enterprise (usually for business systems and employee Internet access) to identify improper content buried in secure connections that were previously invisible to the enterprise. If you put web proxies on a traditional data provider's customer interface, they will find sensitive content constantly (SSNs, addresses, DoBs, health information, financial data, etc.). Even Microsoft has enabled a kludgey, human-error-prone email DLP in the Exchange platform and its cloud file storage tools. Large security vendors had servers (virtual appliances) that could be deployed in the Cloud, but they were little more than Cloud-based instantiations of their terrestrial cousins. They still only looked at mail servers, file servers, and Web servers you deployed to the Cloud. They had no interoperability with Cloud-native services like S3 buckets, AWS Aurora and RDS databases, and Cloud-proprietary API gateways. One might as well go back to cold iron at this point.
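To make the false-positive problem concrete, here is a minimal sketch, in Python, of the kind of pattern-based content inspection a DLP proxy performs. The regex patterns and function names are illustrative assumptions of my own, not any vendor's actual rules; real products layer fingerprinting and document matching on top of patterns like these to cut false positives.

```python
import re

# Hypothetical patterns a DLP proxy might use to flag sensitive fields
# in decrypted traffic (illustrative only, not a vendor ruleset).
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "dob": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
}

def scan_payload(payload: str) -> list:
    """Return the categories of sensitive data found in a payload."""
    return [name for name, pat in PATTERNS.items() if pat.search(payload)]

# On a data provider's customer interface, nearly every legitimate
# response would trigger rules like these -- the core problem above.
hits = scan_payload("Applicant SSN 123-45-6789, DoB 01/02/1980")
```

Run against a legitimate credit-report response, a scanner like this fires on every transaction, which is exactly why the alarms become noise at a gateway that is supposed to emit sensitive data.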

While I am beginning to see Cloud-native DLP tools on the rise, we are still missing a critical service: greater awareness of improper data leakage at normal points of data exchange (customer, partner, or consumer interfaces). Properly designed and implemented, such a service would have helped prevent or reduce the magnitude of the Equifax breach, as well as other notable breaches at Anthem, Sony, and the SEC.

One example of a market leader taking a unique approach to DLP is Ionic Security. In their model, all data is encrypted; systems that need to decrypt it make a call to a centralized server, which checks their specific permission to see the data (based upon identity, time, and/or location) and then delivers the decryption key without end-user intervention or access to the key. So a credit report would be generated and encrypted for decryption only by the intended recipient or application (like a loan-processing app). In addition to improved data encryption tools like these, I propose a different approach with the following capabilities, specifically for enterprises that must send private information to customers, partners, and end-users:
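A minimal sketch of that key-brokering model, assuming entirely hypothetical names and policy fields (this is not Ionic's actual API): the key server releases the decryption key only after checking the requestor's identity, time window, and location against a policy set by the data's originator.

```python
from datetime import datetime, time

# Hypothetical policy store: data id -> key plus release conditions.
KEY_POLICIES = {
    "credit-report-123": {
        "key": b"...32-byte-key...",
        "allowed_identities": {"loan-processing-app"},
        "allowed_hours": (time(8, 0), time(18, 0)),
        "allowed_locations": {"US"},
    },
}

def request_key(data_id, identity, location, now=None):
    """Return the decryption key only if the caller's context
    satisfies the originator's policy; otherwise return None."""
    policy = KEY_POLICIES.get(data_id)
    if policy is None:
        return None
    moment = (now or datetime.utcnow()).time()
    start, end = policy["allowed_hours"]
    if (identity in policy["allowed_identities"]
            and location in policy["allowed_locations"]
            and start <= moment <= end):
        return policy["key"]
    return None  # deny: the caller never sees the key
```

The design point is that the end-user never handles the key; the check happens per request, so revoking access is a policy change, not a re-encryption.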

Data loss prevention at gateways where we intend to share sensitive data requires analysis that sees changes in data flow behavior.

  • Implement proxy servers (Symantec Blue Coat, McAfee Web Inspector) inbound and outbound at the customer (i.e., external) data gateway, exposing inbound requests for data before they hit the internal request endpoint so the source of the request can be ascertained. Well-secured organizations do this, but very few put the proxy on the customer or partner interfaces where we know we are sharing sensitive data; instead, they focus on the gateways for business systems (email, web access, VPNs, Cloud data storage, etc.).
  • Model the outbound traffic flow and its variations, both at the internal data generation point and at the customer gateway, using a proxy server approach similar to the above. Use this to ascertain:
    • the common behavior patterns with reference to request source and destination endpoint,
    • the specific data requested, compared to the data requested by that requestor in the past,
    • request sources and endpoints known to have been fraudulent or compromised (it looks like a DDoS, but it is a Distributed, Apparently Normal, Gathering of Non-Approved Biographical Information Threat: DANG-NABIT), and
    • changes in the request content and frequency detected at the internal data generation point and in the data flowing out of the gateways to the customers.
  • Partner with ISPs to identify not only the Internet gateway, but also the IP address behind their gateway, to find compromised hosts behind natural obfuscation points (NAT traversal); this capability exists in ISPs today.
  • Implement data encryption similar to Ionic's, wherein the data is protected and accessible only to a requestor that passes a real-time check of authorization established by the originator.
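The traffic-modeling capability above can be sketched as a simple per-requestor baseline: track each requestor's daily record volume and flag days that deviate sharply from that requestor's own history. The class name, z-score threshold, and minimum history length are assumptions for illustration, not tuned values.

```python
from collections import defaultdict
from statistics import mean, stdev

class FlowBaseline:
    """Illustrative per-requestor volume baseline (not a product)."""

    def __init__(self, z_threshold=3.0, min_history=5):
        self.history = defaultdict(list)  # requestor -> daily record counts
        self.z_threshold = z_threshold
        self.min_history = min_history

    def observe(self, requestor, records_today):
        """Record today's volume; return True if it looks anomalous
        relative to this requestor's own past behavior."""
        past = self.history[requestor]
        anomalous = False
        if len(past) >= self.min_history:
            mu, sigma = mean(past), stdev(past)
            # Flag only unusually LARGE outflows, the exfiltration case.
            if sigma > 0 and (records_today - mu) / sigma > self.z_threshold:
                anomalous = True
        past.append(records_today)
        return anomalous
```

A partner that normally pulls about a hundred records a day and suddenly pulls five thousand trips the check, while ordinary day-to-day variation does not; that is the "change in data flow behavior" the pull-quote above is asking for.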

These strategies come from my dinosaur-era telecom experience with Simple Network Management Protocol (SNMP) detection of performance issues, re-routing, and known abusers of the network, an art form with which fewer and fewer infrastructure engineers have experience. Data providers, like hospitals and government agencies, probably see tens of thousands to millions of attempted penetrations daily, so the security alarms are ringing off the hook, and frankly our ops teams become numb to the build-up of items in the queue. Analytical tools examining these attacks catch attempted or successful penetrations from unknown or threat hosts, thereby identifying vulnerabilities (like a well-configured Splunk implementation). However, they are rarely configured to alarm not only on penetration but also on identification of an unexpected data exfiltration. The approach described above will catch improper data exfiltration by comparing historical request-response rates with current ones, identifying response destinations that do not make sense for a given request, or observing duplicate flows that reveal a vulnerability where an authentic transaction is simply copied and forwarded to an alternate endpoint.
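The duplicate-flow check is the simplest of these to illustrate. A rough sketch, assuming hypothetical field names: if a response carrying the same transaction fingerprint leaves the gateway toward two different destinations, the second copy is suspect.

```python
def find_duplicate_flows(flows):
    """flows: iterable of (transaction_id, destination_ip) pairs seen
    at the outbound gateway. Return the flows whose transaction was
    already delivered to a different destination."""
    seen = {}       # transaction_id -> first destination observed
    suspects = []
    for txn_id, dest in flows:
        if txn_id in seen and seen[txn_id] != dest:
            # Same transaction, second destination: possible copy-and-
            # forward exfiltration of an authentic response.
            suspects.append((txn_id, dest))
        else:
            seen.setdefault(txn_id, dest)
    return suspects
```

In practice the fingerprint would be a hash of the response content rather than a literal transaction ID, and the comparison would run over a sliding window, but the core signal is the same: one legitimate answer should not exit twice.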

Finally, we need not over-engineer the regulatory and policy domain. Congress and state Attorneys General propose punitive actions and draft regulations that demand time-consuming audits, overwhelming the ability to remediate threats ("they" already know the vulnerabilities are there, but resources get dislodged to complete redundant audits so the business can continue to thrive). As a person who has been in the middle of this, I can say that adding another audit doesn't make things better; it just means filling out more paperwork and sharing more logs... keeping strained ops teams from actually doing the remediations.

I would love to know your thoughts and any other approaches you have seen work.