This guide is about the storage of research data and documentation during research. Information on how to archive data is available here.

Faculty Storage Recommendations for Research Data

The VU offers a variety of data storage options, which can be confusing. FGB recommends three primary options for the storage of research data: YODA, Research Drive, and SciStor. Teams/SharePoint has also been approved for the storage of research data, but because Teams suffers from various functionality issues, FGB recommends using it only if none of the three primary options is feasible. If you are unsure which option to use, you can always discuss your options with the FGB Research Data Stewards.


YODA: Recommended Uses/Benefits
➢ Storage of large volumes of data that don’t need to be frequently accessed for processing/analysis
➢ Creation of structured metadata to describe your research and the associated datasets (helps with making data FAIR)
➢ Offers read-only storage of (a copy of) your raw data
➢ Allows data to be shared with external (non-VU) collaborators
➢ Provides archiving for data after a research article is published
➢ Other benefits and uses of YODA are described in further detail here and in the VU Storage Finder
YODA: Limitations
➢ Not efficient for the storage of large volumes of data that need to be regularly accessed for processing/analysis
➢ Difficult and slow to access data directly on the YODA disk; data will likely need to be copied locally prior to data processing/analysis
➢ Lacks a desktop sync client for easy management of local copies of data
➢ Does not allow for access management of subfolders; everyone in your YODA group folder has access to all subfolders therein
➢ The limitations of YODA are described in further detail here

Research Drive: Recommended Uses/Benefits
➢ Storage of large volumes of data that need to be regularly accessed for processing/analysis
➢ Similar uses to SURFdrive, but ensures that data storage is linked to a project rather than an individual
➢ Has a desktop sync client for easy management of locally copied data
➢ Allows for access management at the folder and subfolder level, e.g. ensures students/collaborators can only access certain folders
➢ Allows for easy collaboration and sharing with external collaborators
➢ Additional information on the uses of Research Drive is found in the VU Storage Finder
Research Drive: Limitations
➢ Requires encryption for higher risk data
➢ Not possible to interact directly with the Research Drive disk; requires syncing of data locally before processing/analysis
➢ Does not offer structured metadata documentation to help make data FAIR
➢ Does not provide locking or vault options to prevent raw data from being modified or to serve as an archive

SciStor: Recommended Uses/Benefits
➢ Storage of very large volumes of data that need to be regularly accessed for processing/analysis
➢ Data can be accessed directly from SciStor without copying locally prior to processing/analysis
➢ Best option for high-performance computing
➢ Allows for access management at the folder and subfolder level, e.g. ensures students can only access certain folders
➢ Additional information on the uses of SciStor is found in the VU Storage Finder
SciStor: Limitations
➢ Cannot be used for collaboration with external (non-VU) users
➢ If using SciStor from home, connectivity ends up similar to YODA; advantages are only present when connecting on campus
➢ Does not offer free storage up to 500 GB like YODA and Research Drive do
➢ May not be appropriate for storage of higher risk data; must be discussed with the IT for Research Engineers
➢ Does not offer structured metadata documentation to help make data FAIR
➢ Access rights are managed entirely by the IT for Research Engineers and changes can only be made upon request

Teams/SharePoint (only as a fallback): Recommended Uses/Benefits
➢ Only use if YODA, Research Drive or SciStor are not feasible options
➢ Replacement for the previously used G-drive
➢ Similar to OneDrive, but ensures that data storage is linked to a project rather than an individual
➢ Allows for access management at the folder and subfolder level
➢ Allows for easy collaboration and sharing with external collaborators
➢ Can be used to record and generate transcripts for interviews conducted via Teams (depending on the “sensitivity” of the data)
➢ Additional information on the uses of Teams/SharePoint is found in the VU Storage Finder
Teams/SharePoint: Limitations
➢ Requires encryption for higher risk data
➢ Can be too easy to share data, meaning data may be leaked to the wrong parties
➢ Difficult to maintain an overview of who has access to which folders or Teams channels
➢ Not possible to manipulate non-Microsoft files directly within the Teams/SharePoint environment; to edit, these need to be opened locally in an appropriate app on your device
➢ Does not offer structured metadata documentation to help make data FAIR
➢ Does not provide locking or vault options to prevent raw data from being modified or to serve as an archive
➢ Back-up notifications are not always accurate, i.e. Teams may tell you your data are fully backed up before the process is complete; always triple check that data are fully backed up before deletion

Storage of “Sensitive” Data (columns in the order YODA, Research Drive, SciStor, Teams/SharePoint):
  ➢ Red Data ~ * ~ * ~ * ~ *
  ➢ Orange Data ** ~ ** **
  ➢ Yellow Data
  ➢ Green Data
  ➢ Blue Data *** *** *** ***
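The last Teams/SharePoint limitation above advises triple-checking that data are fully backed up before deleting anything. One general way to do this is to compare a checksum of every original file with a checksum of its copy. The sketch below is a minimal Python example of that idea, assuming both the originals and the backed-up copies can be reached as local folders (for instance after downloading the copy again); the function names are hypothetical and this is not a feature of Teams/SharePoint itself.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original_dir: Path, backup_dir: Path) -> list[str]:
    """Compare every file under original_dir with its counterpart under backup_dir.

    Returns the relative paths that are missing from the backup or whose
    contents differ. An empty list means every original file was copied
    byte-for-byte. Extra files that exist only in the backup are ignored.
    """
    if not original_dir.is_dir():
        # Guard against a typo silently producing an empty (i.e. "all good") result.
        raise NotADirectoryError(original_dir)
    problems = []
    for original in sorted(original_dir.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(original_dir)
        copy = backup_dir / relative
        if not copy.is_file() or sha256_of(copy) != sha256_of(original):
            problems.append(relative.as_posix())
    return problems
```

For example, `copies_match(Path("original_data"), Path("downloaded_copy"))` (hypothetical folder names) returns an empty list only when every original file has an identical copy; anything it lists should be backed up again before the originals are deleted.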


VU Storage Finder

You can also use the VU Storage Finder for guidance, but note that it also includes some storage options that FGB does not recommend for the storage of research data; the reasons are explained below. The VU Storage Finder is still a good resource for information on the technical uses of the various storage options, the costs associated with each, and how to request access to a storage option once you have made a choice. To use the VU Storage Finder effectively, first determine your privacy risk categorization (and, if applicable, your confidentiality risk categorization). These privacy/confidentiality risks map onto the “data classification” levels listed in the VU Storage Finder as follows:

  • Red data = Very high risk
  • Orange data = High risk
  • Yellow data & Green data* = Medium risk
  • Blue data = Low risk
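The mapping above is a simple lookup. The short Python sketch below encodes it and also applies a rule given later in this guide (under Additional Tips for Data Storage): when several data assets are stored in one location, that location must protect the highest-risk asset. The names `STORAGE_FINDER_LEVEL`, `COLOUR_ORDER` and `storage_finder_level` are hypothetical, and the sketch ignores whatever qualification the asterisk after “Green data” indicates. It only looks up a level; it does not replace your own risk assessment or a discussion with the FGB Research Data Stewards.

```python
# FGB privacy-risk colours mapped to VU Storage Finder "data classification" levels,
# as listed in this guide. Yellow and Green both map to Medium risk.
STORAGE_FINDER_LEVEL = {
    "Red": "Very high risk",
    "Orange": "High risk",
    "Yellow": "Medium risk",
    "Green": "Medium risk",
    "Blue": "Low risk",
}

# Colours ordered from most to least sensitive; a lower index means stricter protection.
COLOUR_ORDER = ["Red", "Orange", "Yellow", "Green", "Blue"]


def storage_finder_level(colours: list[str]) -> str:
    """Return the Storage Finder level for data assets that share one storage location.

    The location must protect the most sensitive asset, so the strictest
    colour decides the level. An unknown colour raises ValueError.
    """
    if not colours:
        raise ValueError("at least one data asset colour is required")
    strictest = min(colours, key=COLOUR_ORDER.index)
    return STORAGE_FINDER_LEVEL[strictest]


print(storage_finder_level(["Blue", "Yellow", "Orange"]))  # High risk
```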


Secure Use of Preferred Storage Solutions

Storage of Very High-Risk Data

If your data are “Red” (a.k.a. “very high risk” according to the privacy risks page), you may require a custom storage solution to be built by IT. However, IT may also determine that the data are not so high risk as to require a custom solution. In such a case, IT Security would consider your “Red” data to only be “high risk”.

You can have your “Red” data assessed by IT Security by contacting the RDM Support Desk.

NB: Even if IT Security deems your “Red” data “high risk” according to their classification system, you should still apply additional data protection measures to the “Red” data. This is because “Red” data are considerably more sensitive than “Orange” data (e.g. video interviews of children speaking about abuse (“Red”) vs. video recordings of children playing at a playground (“Orange”)). This is discussed further below.


Security Guidance per Storage Solution

Research Drive

Research Drive is similar to SURFdrive in much of its functionality. However, Research Drive is akin to the former G-drive or to Teams, while SURFdrive is akin to the former H-drive or to OneDrive: Research Drive ensures that storage is linked not to a single individual but to the entire research team, which is better for research data management. Research Drive also supports multi-factor authentication for all users with access to your Research Drive files. The arguments against SURFdrive, and the situations where its use may be appropriate, are discussed below.


Security Considerations

Research Drive does not, unfortunately, have standard approval for the storage of “Red” or “Orange” data. If you want to store “Red” data on Research Drive, contact the RDM Support Desk and they will connect you with IT Security, as discussed above. If you are storing “Orange” data on Research Drive, or “Red” data that IT Security deems to be only “high risk” according to their classifications, then you will need to apply extra protection of the data by encrypting the data at the file level. If file-level data encryption is not feasible for you, contact the RDM Support Desk. They will connect you to IT Security for further assistance.

Regardless of the risk-level of your data, you (and everyone who has access to the Research Drive workspace) are required to activate multi-factor authentication (MFA).


Secure Syncing of Data

If you use Research Drive, you will need to determine your method of accessing the data and whether or not to sync the data to your local hard drive. Most users use the desktop client to access the data. This syncs a copy of the data stored in Research Drive onto your local hard drive. There are additional measures you must apply to keep the locally synced data as secure as possible:

  • If the data are “Red” or “Orange”, you must use your work computer. Do not use your personal device. If your personal device is absolutely your only option for accessing the data, you must:
    • Ensure you are following all of the basic security measures required by the faculty
    • Obtain a hardware-encrypted external hard drive and protect it with a strong password. Any data you need to work with locally must sync to this external hard drive. Once the data on this hard drive are no longer needed, the data must be permanently removed from the external hard drive or the external hard drive must be destroyed.
      • NB: This hard drive must be dedicated to this single research project and must be kept safe and secure at all times. It should not be used for any other purpose, and it should only be physically transported from one location to another when absolutely necessary. The reason for using an encrypted external hard drive with your personal computer is to partition the “Red” or “Orange” data from all the other data on your personal hard drive and to make the data easier to delete later on. The hard drive is not intended for day-to-day physical transport of the data.
      • If this option is not feasible, contact the RDM Support Desk for further guidance.
  • In all cases, always turn off automatic syncing, and instead manage which files and folders sync to your local device.
    • This is especially important if you are syncing files to a personal (private) computer. In that case, you should only sync the data files that absolutely need to be temporarily stored on your personal computer. Once again, if the data are “Red” or “Orange” and you absolutely need to use a personal device, ensure that the files always sync to the dedicated, encrypted external hard drive, as discussed in the previous point.
  • In all cases, always activate Full Disk Encryption on your computer.
  • In all cases, take extra security measures if your computer is a laptop.
  • In all cases, delete the synced files from your local hard drive when you no longer need to access them regularly.


Access Management

Information on how to provide access to data in Research Drive is found in the Digital Data Transfer Guide. It is important to remember that when someone is given access to a folder in Research Drive (or SURFdrive) they have access to all of that folder’s subfolders. Keep this in mind when structuring your folders so that you don’t accidentally give someone access to data or documentation that they should not see.

YODA

Secure use of YODA is described in detail in the FGB YODA Manual.

SciStor

Unfortunately, SciStor is not currently approved for the storage of “Red” or “Orange” data. Start by contacting the IT for Research Engineers who manage SciStor via . They will assess what extra security measures can be applied and whether your data can be stored on SciStor. If you have “Red” data, they may also ask you to check with IT Security (whom you can contact via the RDM Support Desk) to determine whether the “Red” data are low enough in risk to be stored on SciStor, as discussed above. If the IT for Research Engineers cannot determine security measures to protect your data, but you need the computing and storage power that SciStor offers, then IT will need to develop a custom storage solution for you; this may be done by IT for Research itself. If the IT for Research Engineers are unable to assist with this, contact the RDM Support Desk. They can connect you with other IT staff who can develop a new custom storage option.

Teams/SharePoint

If you have determined (ideally in discussion with an FGB Research Data Steward) that Teams/SharePoint is your only option for research data storage, then ensure that you are using Teams and not OneDrive. They are not the same: data stored in Teams is accessible to an entire research team and will remain accessible even if one team member leaves. OneDrive is linked entirely to you as an individual and when your contract ends, all data in your OneDrive will be deleted. OneDrive and the arguments against using it for research are explained further below.

All of the security guidance described for Research Drive also applies to Teams/SharePoint. The primary difference is that Teams does not use a sync client like Research Drive does. If you are working with any non-Microsoft files, start by selecting the option to open them with an applicable app (e.g. RStudio if you are opening an R script). If that is not feasible, the data may need to be downloaded and saved locally. In both cases, all of the protections on your computer that apply when using Research Drive also apply to Teams. In short:

  • If you get approval from IT Security to store “Red” data on Teams, it must be encrypted. If you are storing “Orange” data on Teams it must also be encrypted.
  • If you need to work with “Red” or “Orange” data locally, you must use your work computer, and in all cases when copying data to your computer:
    • Always have Full Disk Encryption active on your computer.
    • Take extra security measures if your computer is a laptop.
    • Delete the locally copied files from your local hard drive when you no longer need to access them regularly.

Lastly, if someone has access to a folder, they have access to all of the subfolders in that folder. Make sure this is appropriate for anyone given access to a folder in Teams, and also ensure that they have appropriate access rights (e.g. read-only vs. full editing rights). And remember: Teams makes sharing access very easy, which also makes it very easy to accidentally share data with the wrong person. Always check that the access rights are set up correctly before sharing. You should also set an expiration date on any files or folders shared with anyone outside of your Teams “Team”. You can find more information on how to set expiration dates here.


Storage of Anonymous/Anonymized Data

If your data are “Blue” or anonymous, you can use whichever storage option you wish from the list in the VU Storage Finder; however, YODA, Research Drive and SciStor remain the preferred and recommended options, particularly for research data.

Be aware: very little research data collected at FGB are anonymous and, more often than not, the data cannot be anonymized fully. This is discussed in the Privacy Risks Guide. Even if your data are anonymous there may be confidentiality concerns that require you to choose a more secure storage option. If you believe that your data are “Blue”, you should check with the FGB Privacy Champion to ensure that your assessment is correct.

In general, “Blue” data will more often consist of your research documentation, metadata, code scripts, etc. The non-preferred storage options mentioned in the VU Storage Finder may be used for the storage of these materials; these storage options are only “non-preferred” with regard to research data. If your research data themselves are confirmed to be “Blue”, OSF is your best option for storage if you cannot use any of the preferred FGB options.


Non-preferred Data Storage Options

There are several other options for storage listed in VU Storage Finder. FGB discourages the use of these other options for the storage of research data.

SURFdrive

SURFdrive is primarily discouraged because the data stored there is linked to you as an individual instead of to your research project like with Research Drive. If you store research data on SURFdrive and then you become unavailable, your colleagues will eventually lose access to the data. In such a case, your colleagues will also be limited in how they can manage access to the data. The use of SURFdrive for the storage of research data is therefore generally discouraged. It can be used to share “Yellow”, “Green” or “Blue” data with students, for example if they are doing an internship under your supervision, but in most cases Research Drive is still preferable to SURFdrive for this purpose. This is discussed further in the Security For Students Guide. You may also use SURFdrive to share data digitally with a research collaborator, but again, Research Drive is preferred over SURFdrive for this purpose. In all cases, SURFdrive should not be used for “Red” or “Orange” data.

Secure Syncing of Data

If you use SURFdrive, you will need to determine your method of accessing the data and whether or not to sync the data to your local device. Most users use the desktop client to access the data. This syncs a copy of the data stored in SURFdrive onto your local hard drive. To keep the locally synced data as secure as possible:

  • Turn off automatic syncing, and instead manage which files and folders sync to your local device.
    • This is especially important if you are syncing files to a personal (private) computer. In that case, you should only sync the data files that absolutely need to be temporarily stored on your personal computer.
  • Activate Full Disk Encryption on your computer.
  • Take extra security measures if your computer is a laptop.
  • Delete the synced files from your local hard drive when you no longer need to access them regularly.


Access Management

Information on how to provide access to data in SURFdrive is found in the Digital Data Transfer Guide. It is important to remember that when someone is given access to a folder in SURFdrive they have access to all of that folder’s subfolders. Keep this in mind when structuring your folders so that you don’t accidentally give someone access to data or documentation that they should not see.

OneDrive

OneDrive is generally discouraged for the storage of research data and related materials because like SURFdrive, it is tied entirely to you as an individual. If something happens to you or you stop working at the VU, all access to the materials stored on your OneDrive will be lost, even if you shared some folders with colleagues. In all cases, OneDrive should not be used for “Red” or “Orange” data.

OSF

The primary use for OSF is the storage of research documentation and the pre-registration of your research protocols. You can also connect to data stored in Research Drive via your OSF storage space. This way you maintain the organization and findability of your data through OSF without actually storing your research data on OSF. You may store “Blue” data on OSF, but OSF is not recommended for any other risk category. This is primarily because for all other risk categories you would need to set your OSF project to private to prevent the data from being publicly disclosed, and a private project offers only 5 GB of storage space. If you use OSF to store “Blue” data, it is recommended to choose Frankfurt, Germany as the storage location when setting up your OSF workspace.

Google Drive VU

The VU Google Drive account is only recommended for collaboration purposes, such as shared work documents or presentations. It should not be used for the storage of research data unless you are certain that the data are “Blue”, and even then, there are better alternatives that allow you more control over your data.

Encrypted Portable Storage

Encrypted portable storage is always a temporary solution. Its primary use is for the physical transport of data from a data collection site.



Additional Tips for Data Storage

  • You will often have several data assets, each with its own privacy risk categorization. If you are storing all of these files in one storage location, choose the storage option that protects the data with the highest privacy risk. If a higher-risk data asset can be used to re-identify the research subjects in a lower-risk de-identified dataset (e.g. a key file (higher-risk) that can identify subjects in your pseudonymized questionnaires (lower-risk)), you should:
    • Store the higher risk data in a separate storage location from the lower risk data, or
    • If the higher risk data must be stored in the same location as the lower risk data, encrypt the higher risk dataset. Also store the higher risk data in a folder that is separate from the other lower-risk data and ensure that this folder can only be accessed by those who absolutely need to view the higher-risk data.
  • Make use of data de-identification methods to lower the privacy risks of your processed data. Your raw data will probably still need to be stored at a higher level of security, but if you lower the privacy risks of your processed data by de-identifying it, you will have more storage options for the processed data.
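The key-file advice above can be illustrated with a short sketch of pseudonymization. This minimal Python example assumes your questionnaire data are available as a list of rows (dictionaries) in which one column holds a direct identifier such as a name; the function names, the `P` prefix for codes and the file paths are hypothetical. It replaces each identifier with a random code and writes the key file to a separate path, which you would point at a separate storage location or at an encrypted, access-restricted folder. It does not encrypt anything itself, and the result is pseudonymized, not anonymous: the key file can re-identify participants, and the remaining columns may also allow re-identification.

```python
import csv
import secrets
from pathlib import Path


def pseudonymize(rows: list[dict], id_field: str) -> tuple[list[dict], dict[str, str]]:
    """Replace the identifier in `id_field` with a random code.

    Returns the pseudonymized rows and the key (identifier -> code).
    A participant who appears in several rows gets the same code each
    time. The input rows are not modified.
    """
    key: dict[str, str] = {}
    used_codes: set[str] = set()
    pseudonymized = []
    for row in rows:
        identifier = row[id_field]
        if identifier not in key:
            # Random codes reveal nothing about the participant (unlike
            # initials or dates of birth); retry on the rare collision.
            code = "P" + secrets.token_hex(4)
            while code in used_codes:
                code = "P" + secrets.token_hex(4)
            used_codes.add(code)
            key[identifier] = code
        new_row = dict(row)
        new_row[id_field] = key[identifier]
        pseudonymized.append(new_row)
    return pseudonymized, key


def write_separately(rows: list[dict], key: dict[str, str], id_field: str,
                     data_path: Path, key_path: Path) -> None:
    """Write the pseudonymized data and the key file to two different paths.

    `key_path` should point to a separate storage location, or to an
    encrypted, access-restricted folder, never the same folder as `data_path`.
    """
    if not rows:
        raise ValueError("no rows to write")
    with data_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    with key_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([id_field, "code"])
        writer.writerows(sorted(key.items()))
```

Generating random codes, rather than deriving them from the identifier (e.g. by hashing a name), means the codes cannot be recomputed by someone who guesses a participant's name; only the key file links codes back to people, which is why it needs the stronger protection.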