“File shared with you…”
If you receive a message like that, there’s about a 1 in 3 chance that the file was shared with you through Google Drive. Google Drive has the largest market share of any file-sharing software worldwide, among both individuals and organizations. Google Workspace, of which Google Drive is an important component, was already used by over 8 million organizations, comprising 3 billion users, by the end of 2022.
The amount of data created and shared on Google Drive every day is mind-boggling. But even more mind-boggling is the Google Drive security question in the minds of IT and information security teams: how do we protect it all?
Is Google Drive secure?
Google Drive security risks fall into two main categories: data infiltration and data exfiltration.
Data infiltration is harm caused by data you don’t want entering your Google Drive. This includes files shared to your Drive that turn out to be malware, or bad actors that hack into your Drive and change the data in your assets.
Data exfiltration is harm caused by data leaving the domain of your Google Drive. This includes data that is lost or removed by threat actors. Data exfiltration also includes the exposure of sensitive data to unauthorized parties, even if the original data remains in your systems unchanged. Data exposure is an operational concern, as it can give away trade secrets or plans to competitors. It is also a regulatory concern: exposing protected data can violate compliance requirements and result in severe penalties.
Google takes security precautions to protect users against both data infiltration and data exfiltration. Responsibility for keeping your information in Google Drive secure, however, does not rest solely with Google.
Who is responsible for Google Drive security?
Like any cloud service, Google Drive security is a matter of shared responsibility. This responsibility is divided between:
- the provider (in this case, Google)
- the customer admins (usually the organization’s IT or information security team)
- the end user
Google’s responsibilities
Primary among Google’s provider responsibilities is data protection through Google Drive encryption. Google encrypts all data uploaded or added to Google Drive files, both in transit and at rest.
Through default configurations, Google also does its part to provide a basic foundation of protection against data exfiltration through accidental exposure. The default Google Drive configuration for access permissions is that a file is private unless shared.
For Google Drive used through Workspace, Google works to prevent data infiltration by automatically evaluating any files shared with a user from outside their organization. If Google detects malware or a phishing attempt, it blocks user access to the file.
End user responsibilities
At the most basic level, the end user is responsible for keeping their Google Workspace account safe. Choosing strong passwords, not storing passwords in accessible places, and ideally using extra security measures like multi-factor authentication (MFA) are all the user’s responsibility.
When it comes to Google Drive and Google Drive folder security specifically, one of the most important responsibilities of the end user is being aware of how and with whom they are sharing their files. If a user sets a file with sensitive information to “anyone with the link can view”, that information is now publicly exposed and can bring about negative operational and regulatory consequences.
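To make the “anyone with the link” risk concrete, here is a minimal sketch of how an audit script might flag such files. It assumes file metadata has already been exported into plain dictionaries whose `type` and `role` fields are loosely modeled on Drive permission records; the file names and the helper functions are hypothetical, and no real API is called.

```python
def is_publicly_exposed(permissions):
    """Return True if any permission grants access to anyone with the link."""
    return any(p.get("type") == "anyone" for p in permissions)


def find_exposed_files(files):
    """Return the names of files that carry at least one 'anyone' permission."""
    return [f["name"] for f in files if is_publicly_exposed(f.get("permissions", []))]


# Hypothetical exported metadata, for illustration only.
files = [
    {"name": "Q3 forecast", "permissions": [{"type": "user", "role": "owner"}]},
    {
        "name": "Payroll export",
        "permissions": [
            {"type": "user", "role": "owner"},
            {"type": "anyone", "role": "reader"},  # "anyone with the link can view"
        ],
    },
]

exposed = find_exposed_files(files)
print(exposed)
```

A check like this only reports exposure after the fact; it does not replace careful sharing habits on the part of the end user.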
Customer admin responsibilities
Since it’s not unusual for end users to make mistakes in their sharing settings, the responsibilities of Google Workspace administrators (usually part of the organization’s IT or information security department) include information rights management: configuring global Google Drive settings limiting re-sharing, downloading, printing, copying, or changing permissions to prevent accidental or intentional data exposure.
Another area in which customer admins can strengthen Google Drive security is endpoint management: the ability to control aspects of end user devices (whether their own or the company’s) that impact corporate data and accounts. These aspects include device encryption, screen lock, password enforcement, remote signout, and remote wiping of corporate accounts should devices be lost or stolen. In the age of remote work and Bring Your Own Device, however, it is challenging to implement effective endpoint management without hampering productivity.
Customer admins have the responsibility to keep on top of what’s happening in their Google Workspace environment by using tools like audit logs, security reports about user behavior that may indicate a security risk, a security center with information about how files have been shared, and alerts when suspicious activity occurs. Google provides these tools, but it is the customer admin’s responsibility to implement them and take advantage of them.
One advanced data security tool Google provides is Google Drive DLP (Data Loss Prevention). In Google Drive DLP, Google scans the Drive files for sensitive information, as defined by the admin. The admin can define rules for sensitive information based on Google’s predefined detectors (e.g. a detector for Social Security Numbers), regular expressions or word lists. They can then set instructions for the action Google Drive should take (e.g. blocking shares; disabling print, download or copy functions) for files that trigger that rule. The admin can also receive alerts when a rule is triggered.
On the surface, Google Drive DLP seems like a wonderful solution to sensitive data exfiltration concerns. But, while it can be helpful, it has some significant limitations that require addressing before you can consider your Drive files truly protected. Let’s examine those limitations and how they affect the level of your Google Drive security.
Limitations on Google Drive DLP and implications for your data security
The effectiveness of Google Drive DLP is limited by:
- the size and type of file content that can be scanned by Google
- how long it takes to scan
- how accurate (or not) its identifications are
- the non-human interactions and sharing happening in Google Drive
Let’s take a more detailed look at each of these issues.
Limitations on size and type of file content
Google Drive DLP scans files in Docs, Sheets and Slides. It also scans files uploaded as Forms responses. It does not, however, check comments in these kinds of files, so if you have sensitive information in a Google Doc comment, it will not trigger the DLP rule and no precautions will be taken for that document.
Size can also pose a problem. Only the first 1 MB of each file is scanned, and the classification is based on that content alone. If you have a large file, like a long or heavy slide presentation, and sensitive information only appears after the first 1 MB of the content, the file will not be classified as sensitive. Files larger than 50 MB (and sometimes even larger than 10 MB) are not converted for scanning at all, so even the first 1 MB does not get checked.
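The effect of the 1 MB cutoff can be illustrated with a toy scanner. This is a sketch of the limitation described above, not Google’s implementation: the `scan_first_megabyte` function, the simplified SSN-like pattern and the sample content are all assumptions made for illustration.

```python
import re

SCAN_LIMIT_BYTES = 1024 * 1024  # the 1 MB scan window described in the text

# Simplified, hypothetical detector: three digits, two digits, four digits.
SSN_PATTERN = re.compile(rb"\b\d{3}-\d{2}-\d{4}\b")


def scan_first_megabyte(content: bytes) -> bool:
    """Flag the content only if a match appears within the first 1 MB."""
    return SSN_PATTERN.search(content[:SCAN_LIMIT_BYTES]) is not None


# 12 bytes repeated 100,000 times is 1,200,000 bytes, which exceeds 1 MB.
padding = b"lorem ipsum " * 100_000

early = b"SSN 123-45-6789 " + padding  # sensitive value near the start
late = padding + b" SSN 123-45-6789"   # same value, beyond the scan window

early_flagged = scan_first_megabyte(early)
late_flagged = scan_first_megabyte(late)
print(early_flagged, late_flagged)
```

Both files contain the same sensitive value, but only the one where it appears inside the scan window is classified as sensitive.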
In addition, Google Drive DLP does not scan audio or video files.
Time limitations and historical data
Every time you add or modify content in Google Drive, a DLP scan is triggered to check if that content matches any of the rules for sensitive content.
However, every time you add or modify a rule, Google Drive DLP will scan all the files in your Drive to see if any contain sensitive information according to the new definition. That takes significant time; it can be a few hours, a day or even longer. Should a user share a file with sensitive information as defined by that rule, its protection is dependent on the result of a race: did DLP get to that file yet?
A classic case is that of an admin who adds a rule for sensitive information, which triggers a scan that takes 18 hours. About six hours into the scan, an end user shares a file that matches the sensitive information criteria with an external party. But DLP has not scanned that file yet, so nothing triggers a protective action (like blocking the share, warning the end user, or disabling copy). By the time DLP reaches the file and concludes “Oops! Sensitive info there - need to disable sharing and copying!”, the file could easily have been copied and passed along.
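The race in this scenario comes down to simple timeline arithmetic. The sketch below uses the six-hour share time from the example; the moment at which the scan reaches the file (hour 14) and the start date are assumed values chosen purely for illustration.

```python
from datetime import datetime, timedelta


def exposure_window(shared_at, file_scanned_at):
    """Time a share stays unprotected; zero if the scan reached the file first."""
    return max(file_scanned_at - shared_at, timedelta(0))


scan_start = datetime(2024, 1, 1, 9, 0)              # assumed start of the rule scan
shared_at = scan_start + timedelta(hours=6)          # user shares six hours in
file_scanned_at = scan_start + timedelta(hours=14)   # assumed: scan reaches the file at hour 14

window = exposure_window(shared_at, file_scanned_at)
print(window)
```

Under these assumptions the file sits shared and unprotected for eight hours, which is more than enough time for an external party to copy it.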
An additional problem when it comes to the effectiveness of new or modified rules on data that already exists in Drive is that DLP will only scan the latest revision of previously uploaded files. So if you have sensitive data hidden in an earlier version, DLP will not alert you to it; if it mistakenly gets shared, DLP will not initiate any remedial action.
False positives and alert fatigue
DLP tools that use regular expressions to detect sensitive information, as Google Drive DLP does, tend to have a relatively high error rate. Regular expression matching often produces a high percentage of false positives, as well as a meaningful share of false negatives.
Additionally, even when DLP correctly identifies bona fide sensitive information, the resulting policy actions or alerts are often still out of place. Most organizations do not have a one-size-fits-all data protection policy where “sensitive data cannot be shared.” Of course it can and should be shared at times; it just depends on the context!
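Both error modes are easy to reproduce with a simple pattern. The sketch below uses a hypothetical, deliberately naive SSN-style regular expression (it is not one of Google’s predefined detectors) and three made-up sample strings labeled by hand.

```python
import re

# Hypothetical, simplified pattern: 3 digits, 2 digits, 4 digits, hyphen-separated.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Sample text mapped to whether it is actually sensitive (hand-labeled).
samples = {
    "Employee SSN: 123-45-6789": True,
    "Part number 987-65-4321 is back-ordered": False,  # looks like an SSN, is not
    "SSN 123 45 6789 (typed with spaces)": True,       # is an SSN, does not match
}


def classify(samples):
    """Return (false_positives, false_negatives) for the pattern above."""
    false_positives, false_negatives = [], []
    for text, is_sensitive in samples.items():
        flagged = SSN_PATTERN.search(text) is not None
        if flagged and not is_sensitive:
            false_positives.append(text)
        elif not flagged and is_sensitive:
            false_negatives.append(text)
    return false_positives, false_negatives


false_positives, false_negatives = classify(samples)
print(false_positives)
print(false_negatives)
```

The part number triggers an alert that someone has to investigate, while the SSN typed with spaces slips through unnoticed.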
Access to sensitive financial data, for example, will usually be restricted to a small subset of internal users. But what if the Board of Directors hires an external contractor to review financials? What if a due diligence process for an acquisition is beginning? Simply shutting off access to sensitive data can create problems instead of solving them.
If DLP is the only signal an organization relies on to identify data sharing risk, it will always be subject to false positives and false negatives. False positives create overhead and extra work for the information security team, who must investigate each of these cases and possible exceptions, plus lost productivity within the organization. False negatives - or the alert fatigue that comes from too many false positives - can cause the information security team to miss an actual breach.
SaaS to SaaS issues/third-party OAuth
Humans aren’t the only ones sharing information from Google Drive. SaaS applications, integrations and add-ons do so all the time, and by doing so affect your data security posture.
Google Drive apps and add-ons create connections between Google Drive and the other SaaS applications your organization uses, such as Box, Salesforce, DocuSign, Zoho - and many, many more. That’s great for productivity, but it also gives you less control over where your information is going. And when your information leaves the confines of Google Drive, it is no longer protected by DLP.
OAuth tokens, which external applications use to gain authorized access to Google Drive (or other Google Account-based services), aren’t foolproof. Attackers can use phishing to trick users into granting access to a malicious app, or otherwise exploit tokens to gain access to your organization’s Google Workspace.
Effective Google Drive security needs context for the content
Instead of black-and-white content-based policies, imagine if DLP were backed up by a fuller understanding of the data sharing context: who (or what) is sharing the data, and with whom? What picture do HRIS and other company systems draw regarding whether this particular share is a risk?
For example, is the user sharing the data about to leave the company? In that case, even if this user normally shares sensitive data with external parties as part of their job description, at this time you might want to automatically block the share until the information security team can investigate.
Alternatively, is the recipient of the sensitive information part of the financial department of the corporation considering a merger with you? In that case, it would be prudent not to delay their access to the information, which might cause them to doubt your openness or integrity.
This context evaluation can take into account dozens of factors, including the organization to which the new collaborator belongs, the group or organizational units to which the internal user belongs, the current risk posture of that user's endpoint and that user's status in an HRIS platform. Unlike waiting for a DLP scan to finish, evaluation of the context surrounding the file share can be done in seconds. If significant risk is indicated, the potential exposure can be immediately undone.
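As a rough illustration of what such a context evaluation might look like, here is a minimal sketch that combines a content signal with a handful of context signals. The `ShareContext` fields, the trusted domain list, the `decide` function and the decision labels are all hypothetical; a real evaluation would draw on many more factors and on live data from HRIS and endpoint management systems.

```python
from dataclasses import dataclass


@dataclass
class ShareContext:
    content_is_sensitive: bool  # e.g. the result of a DLP classification
    recipient_domain: str       # organization of the new collaborator
    sharer_is_leaving: bool     # e.g. a status pulled from an HRIS platform
    endpoint_is_healthy: bool   # current risk posture of the sharer's device


# Hypothetical allowlist, e.g. a counterparty in an active due diligence process.
TRUSTED_PARTNER_DOMAINS = {"acquirer.example"}


def decide(ctx: ShareContext) -> str:
    """Return 'allow', 'alert' or 'block_pending_review' for a single share."""
    if not ctx.content_is_sensitive:
        return "allow"
    if ctx.sharer_is_leaving or not ctx.endpoint_is_healthy:
        return "block_pending_review"
    if ctx.recipient_domain in TRUSTED_PARTNER_DOMAINS:
        return "allow"
    return "alert"


departing = decide(ShareContext(True, "acquirer.example", True, True))
merger = decide(ShareContext(True, "acquirer.example", False, True))
unknown = decide(ShareContext(True, "unknown.example", False, True))
print(departing, merger, unknown)
```

The same sensitive file produces three different outcomes depending on who is sharing it and with whom, which is exactly the distinction a content-only rule cannot make.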
Google Drive and security: a work in progress
From basic actions like keeping account passwords strong and safe, to complex evaluations of sharing content and context, Google Drive security is always evolving. As Google Drive and other SaaS applications become the dominant way organizations create, manage and collaborate on information, securing that information will become more challenging - but more important - than ever.