This guide covers the storage of research data and documentation while your research is ongoing. Information on how to archive data is available here.
There are a variety of data storage options available at the VU, which can be confusing. FGB recommends three primary options for the storage of research data: YODA, Research Drive, and SciStor. Teams/SharePoint has also been approved for the storage of research data, but FGB recommends using it only if none of the three primary options is feasible, because Teams suffers from various functionality issues. If you are unsure which option to use, you can always discuss your options with the FGB Research Data Stewards.
| | YODA | Research Drive | SciStor | (Teams/SharePoint) |
|---|---|---|---|---|
| Recommended Uses/Benefits | ➢ Storage of large volumes of data that don’t need to be frequently accessed for processing/analysis ➢ Creation of structured metadata to describe your research and the associated datasets (helps with making data FAIR) ➢ Offers read-only storage of (a copy of) your raw data ➢ Allows data to be shared with external (non-VU) collaborators ➢ Provides archiving for data after a research article is published ➢ Other benefits and uses of YODA are described in further detail here and in the VU Storage Finder | ➢ Storage of large volumes of data that need to be regularly accessed for processing/analysis ➢ Similar uses to SURFdrive, but ensures that data storage is linked to a project rather than an individual ➢ Has a desktop sync client for easy management of locally copied data ➢ Allows for access management at the folder and subfolder level, e.g. ensures students/collaborators can only access certain folders ➢ Allows for easy collaboration and sharing with external collaborators ➢ Additional information on the uses of Research Drive is found in the VU Storage Finder | ➢ Storage of very large volumes of data that need to be regularly accessed for processing/analysis ➢ Data can be accessed directly from SciStor without copying locally prior to processing/analysis ➢ Best option for high-performance computing ➢ Allows for access management at the folder and subfolder level, e.g. ensures students can only access certain folders ➢ Additional information on the uses of SciStor is found in the VU Storage Finder | ➢ Only use if YODA, Research Drive or SciStor are not feasible options ➢ Replacement for the previously used G-drive ➢ Similar to OneDrive, but ensures that data storage is linked to a project rather than an individual ➢ Allows for access management at the folder and subfolder level ➢ Allows for easy collaboration and sharing with external collaborators ➢ Can be used to record and generate transcripts for interviews conducted via Teams (depending on the “sensitivity” of the data) ➢ Additional information on the uses of Teams/SharePoint is found in the VU Storage Finder |
| Limitations | ➢ Not efficient for the storage of large volumes of data that need to be regularly accessed for processing/analysis ➢ Difficult and slow to access data directly on the YODA disk; data will likely need to be copied locally prior to processing/analysis ➢ Lacks a desktop sync client for easy management of local copies of data ➢ Does not allow for access management of subfolders; everyone in your YODA group folder has access to all subfolders therein ➢ The limitations of YODA are described in further detail here | ➢ Requires encryption for higher risk data ➢ Not possible to interact directly with the Research Drive disk; requires syncing of data locally before processing/analysis ➢ Does not offer structured metadata documentation to help make data FAIR ➢ Does not provide locking or vault options to prevent raw data from being modified or to serve as an archive | ➢ Cannot be used for collaboration with external (non-VU) users ➢ If using SciStor from home, connectivity ends up similar to YODA; advantages are only present when connecting on campus ➢ Does not offer free storage up to 500 GB like YODA and Research Drive do ➢ May not be appropriate for storage of higher risk data; must be discussed with the IT for Research Engineers ➢ Does not offer structured metadata documentation to help make data FAIR ➢ Access rights are managed entirely by the IT for Research Engineers and changes can only be made upon request | ➢ Requires encryption for higher risk data ➢ Can be too easy to share data, meaning data may be leaked to the wrong parties ➢ Difficult to maintain an overview of who has access to which folders or Teams channels ➢ Not possible to edit non-Microsoft files directly within the Teams/SharePoint environment; these need to be opened locally in an appropriate app on your device ➢ Does not offer structured metadata documentation to help make data FAIR ➢ Does not provide locking or vault options to prevent raw data from being modified or to serve as an archive ➢ Back-up notifications are not always accurate, e.g. Teams may tell you your data are fully backed up before the process is complete; always triple-check that data are fully backed up before deletion |
| Storage of “Sensitive” Data: | | | | |
| ➢ Red Data | ~ * | ~ * | ~ * | ~ * |
| ➢ Orange Data | ✓ | ✓ ** | ~ ** | ✓ ** |
| ➢ Yellow Data | ✓ | ✓ | ✓ | ✓ |
| ➢ Green Data | ✓ | ✓ | ✓ | ✓ |
| ➢ Blue Data | ✓ *** | ✓ *** | ✓ *** | ✓ *** |
You can also use the VU Storage Finder for guidance, but note that it also includes some storage options that FGB does not recommend for the storage of research data; the reasons for this are explained below. The VU Storage Finder is still a good resource for information on the technical uses of the various storage options, the costs associated with each, and how to request access to a storage option once you’ve made a choice. To use the VU Storage Finder effectively, first determine your privacy risk categorization (and, if applicable, your confidentiality risk categorization). These privacy/confidentiality risks can be mapped onto the “data classification” levels listed in the VU Storage Finder as follows:
If your data are “Red” (a.k.a. “very high risk” according to the privacy risks page), you may require a custom storage solution to be built by IT. However, IT may also determine that your data are not high-risk enough to require a custom solution, in which case IT Security would classify your “Red” data as only “high risk”.
You can have your “Red” data assessed by IT Security by contacting the RDM Support Desk.
NB: Even if IT Security deems your “Red” data “high risk” according to their classification system, you should still use additional data protection measures to protect the “Red” data. This is because “Red” data are considerably more sensitive than “Orange” data (for example, video interviews in which children speak about abuse are “Red”, whereas video recordings observing children playing at a playground are “Orange”). This is discussed further below.
Research Drive is similar to SURFdrive in much of its functionality; however, Research Drive is akin to the former G-drive or to Teams, while SURFdrive is akin to the former H-drive or to OneDrive. Research Drive ensures that storage is linked not to a single individual but to the entire research team, which is better for research data management. Research Drive also allows for multi-factor authentication of all users with access to your Research Drive files. The arguments against SURFdrive, and where its use may be appropriate, are discussed below.
Research Drive does not, unfortunately, have standard approval for the storage of “Red” or “Orange” data. If you want to store “Red” data on Research Drive, contact the RDM Support Desk and they will connect you with IT Security, as discussed above. If you are storing “Orange” data on Research Drive, or “Red” data that IT Security deems to be only “high risk” according to their classifications, you will need to add extra protection by encrypting the data at the file level. If file-level encryption is not feasible for you, contact the RDM Support Desk, who will connect you with IT Security for further assistance.
Regardless of the risk-level of your data, you (and everyone who has access to the Research Drive workspace) are required to activate multi-factor authentication (MFA).
If you use Research Drive, you will need to determine how you will access the data and whether or not to sync the data to your local hard drive. Most users access the data via the desktop client, which syncs a copy of the data stored in Research Drive onto your local hard drive. There are additional measures you must apply to keep the locally synced data as secure as possible:
Information on how to provide access to data in Research Drive is found in the Digital Data Transfer Guide. It is important to remember that when someone is given access to a folder in Research Drive (or SURFdrive) they have access to all of that folder’s subfolders. Keep this in mind when structuring your folders so that you don’t accidentally give someone access to data or documentation that they should not see.
Secure use of YODA is described in detail in the FGB YODA Manual.
Unfortunately, SciStor is not currently approved for the storage of “Red” or “Orange” data. Start by contacting the IT for Research Engineers, who manage SciStor, via itvo.ucit@vu.nl. They will assess what extra security measures can be applied and whether your data can be stored on SciStor. If you have “Red” data, they may also ask you to check with IT Security (whom you can contact via the RDM Support Desk) to determine whether the risk of your “Red” data is low enough for them to be stored on SciStor, as discussed above. If the IT for Research Engineers cannot identify suitable security measures to protect your data, but you need the computing and storage power that SciStor offers, IT will need to develop a custom storage solution for you, possibly through IT for Research itself. If the IT for Research Engineers are unable to assist with this, contact the RDM Support Desk, who can connect you with other IT staff able to develop a new custom storage option.
If your data are “Blue” or anonymous, you can use whichever storage option you wish from those listed in the VU Storage Finder. However, YODA, Research Drive and SciStor remain the preferred and recommended options, particularly for research data.
Be aware: very little research data collected at FGB are anonymous and, more often than not, the data cannot be anonymized fully. This is discussed in the Privacy Risks Guide. Even if your data are anonymous there may be confidentiality concerns that require you to choose a more secure storage option. If you believe that your data are “Blue”, you should check with the FGB Privacy Champion to ensure that your assessment is correct.
In general, “Blue” data usually consist of your research documentation, metadata, code scripts, etc. The non-preferred storage options mentioned in the VU Storage Finder may be used for these materials; those options are only “non-preferred” with regard to research data. If your research data themselves are confirmed to be “Blue” and you cannot use any of the preferred FGB options, OSF is your best option for storage.
There are several other options for storage listed in VU Storage Finder. FGB discourages the use of these other options for the storage of research data.
SURFdrive is primarily discouraged because the data stored there is linked to you as an individual instead of to your research project like with Research Drive. If you store research data on SURFdrive and then you become unavailable, your colleagues will eventually lose access to the data. In such a case, your colleagues will also be limited in how they can manage access to the data. The use of SURFdrive for the storage of research data is therefore generally discouraged. It can be used to share “Yellow”, “Green” or “Blue” data with students, for example if they are doing an internship under your supervision, but in most cases Research Drive is still preferable to SURFdrive for this purpose. This is discussed further in the Security For Students Guide. You may also use SURFdrive to share data digitally with a research collaborator, but again, Research Drive is preferred over SURFdrive for this purpose. In all cases, SURFdrive should not be used for “Red” or “Orange” data.
If you use SURFdrive, you will need to determine how you will access the data and whether or not to sync the data to your local device. Most users access the data via the desktop client, which syncs a copy of the data stored in SURFdrive onto your local hard drive. To keep the locally synced data as secure as possible:
Information on how to provide access to data in SURFdrive is found in the Digital Data Transfer Guide. It is important to remember that when someone is given access to a folder in SURFdrive they have access to all of that folder’s subfolders. Keep this in mind when structuring your folders so that you don’t accidentally give someone access to data or documentation that they should not see.
OneDrive is generally discouraged for the storage of research data and related materials because like SURFdrive, it is tied entirely to you as an individual. If something happens to you or you stop working at the VU, all access to the materials stored on your OneDrive will be lost, even if you shared some folders with colleagues. In all cases, OneDrive should not be used for “Red” or “Orange” data.
The primary use for OSF is the storage of research documentation and the pre-registration of your research protocols. You can also connect to data stored in Research Drive via your OSF storage space. This way you maintain the organization and findability of your data through OSF without actually storing your research data on OSF. You may store “Blue” data on OSF, but OSF is not recommended for any other risk category. This is primarily because data in any other risk category would require you to set your OSF project to private to prevent the data from being publicly disclosed, and private projects are limited to 5 GB of storage space. If you use OSF to store “Blue” data, it is recommended to choose Frankfurt, Germany as the storage location when setting up your OSF workspace.
The VU Google Drive account is only recommended for collaboration purposes, such as shared work documents or presentations. It should not be used for the storage of research data unless you are certain that the data are “Blue”, and even then, there are better alternatives that allow you more control over your data.
Encrypted portable storage is always a temporary solution. Its primary use is for the physical transport of data from a data collection site.