The Dcf-Pwg repository on GitHub is where Dcf-Pwg research data outputs are preserved and shared.
Almost any piece of information can be defined as “data”. To be relevant to the creation of the Dcf-Pwg framework, however, data should be a set of information that has been acquired or collected following at least a basic scientific method, and that has value to the XR community.
- Datasets presented for inclusion in the Dcf-Pwg repository must be the output of research by a Dcf-Pwg contributor.
- The names of the primary contributor(s) and, if preferred, of the technical collaborator(s) who created the dataset must be provided.
Dataset as original work and source(s) of data
- The contributor(s) submitting the dataset for deposit must be familiar with it.
- The dataset must meet at least one of the following conditions: it is the output of original data generation; it is the output of significant, value-added elaboration of pre-existing sources, meaning that the elaboration uses variables that make the study unique; the contributor(s) provide the scientific protocol used to collect or create the data; or it is a collection of secondary data that has value to the XR community.
- The source(s) of the data must be indicated. If the dataset is the output of original data collection and elaboration, details of how the data were collected and elaborated must be provided. If the dataset is derived from pre-existing sources, those sources must be clearly indicated (data creator, source organization, publisher).
Permission to share dataset as open data
- Dcf-Pwg members submitting a dataset for inclusion in the Dcf-Pwg repository must state whether or not the data can be shared as open data, taking data protection and database copyright into consideration. Contributors should also indicate whether the release of a dataset presented for deposit must be restricted until a specific date or time for technical or logistical reasons.
- Data protection: Persons, families and households must not be identifiable in any dataset. Data contributors are responsible for obtaining and documenting the informed consent of subjects. Dataset creators are responsible for anonymising data observations.
- Database copyright: Data outputs elaborated from pre-existing copyrighted sources may require permission before they can be deposited and shared. A dataset containing substantial portions of data sourced from pre-existing databases governed by a contractual license cannot be published.
- When preparing a submission, it is advisable to create two files: a Data file for the dataset, and a Docs file for documentation and, where applicable, codebooks. Data should be submitted in its original file format. Subsets should be included in the Data file rather than submitted as separate data entries; new iterations of the dataset can subsequently be submitted as new data entries. Documentation should include a concise overview of the project and its methodology. Contributors presenting datasets for inclusion in the Dcf-Pwg repository should pay particular attention to data quality control, dataset structure and data protection. Clear and consistent metadata for folders, files, variables and versioning facilitates future data retrieval, reuse and replicability.
- Names: Submitted files must follow a naming standard so that data and docs are clearly connected. Please include the following elements in the name of each file you submit, separated by _ (underscore):
- Date prefix: yyyy-mm-dd
- Name: a meaningful name for the dataset. Please replace spaces with - or _ for compatibility.
- Subset: if you need to submit subsets, include a label identifying the subset in each file name.
- Status: DRAFT, VER1, VER2, etc.
- Formats: Because “data” and “dataset” are defined broadly, folders and formats may change as the collection grows. Please adhere to the following folder specifications:
- code: source code, i.e. the set of instructions forming an algorithm, a web application, a computer application, a mobile application or a native VR application;
- dataset: a collection of data. A dataset corresponds to the contents of a database table or a statistical data matrix, where each column represents a particular variable and each row corresponds to a given member of the dataset;
- docs: any document describing the submitted data;
- examples: stories, case studies, and good and bad practices;
- images: any vector or bitmap file that is useful for data mining or that constitutes data in itself;
- xtra: any file type not covered by the folders above.
- Executable file formats (see the full list at https://fileinfo.com/filetypes/executable) are generally not accepted, for security reasons. To deposit a file in an executable format, please propose it, with proper documentation, before uploading it.
- For assistance, please write to Marco Magnano: email@example.com
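The file-naming rules in the Names item can be illustrated with a small Python helper. This is a minimal sketch, not an official Dcf-Pwg tool: the function names `build_file_name` and `slugify`, the optional subset argument, the `DRAFT`/`VER<n>` status pattern, and the choice of - (rather than _) to replace spaces are all assumptions drawn from the rules above.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical helper, not part of any official Dcf-Pwg tooling.
# Accepted status values assumed from the guideline: DRAFT, VER1, VER2, ...
STATUS_PATTERN = re.compile(r"^(DRAFT|VER[1-9][0-9]*)$")


def slugify(text: str) -> str:
    """Replace runs of whitespace with '-' (assumed choice, since '_' separates elements)."""
    return re.sub(r"\s+", "-", text.strip())


def build_file_name(day: date, name: str, status: str,
                    subset: Optional[str] = None, extension: str = "") -> str:
    """Join date prefix, name, optional subset and status with '_'."""
    if not STATUS_PATTERN.match(status):
        raise ValueError(f"invalid status: {status!r}")
    parts = [day.isoformat(), slugify(name)]  # isoformat() yields yyyy-mm-dd
    if subset:
        parts.append(slugify(subset))
    parts.append(status)
    file_name = "_".join(parts)
    return f"{file_name}.{extension}" if extension else file_name


if __name__ == "__main__":
    # Hypothetical example values
    print(build_file_name(date(2021, 3, 15), "user presence survey", "VER1",
                          subset="wave 2", extension="csv"))
    # prints: 2021-03-15_user-presence-survey_wave-2_VER1.csv
```

Replacing spaces with - rather than _ keeps the _ separators unambiguous, so a file name can later be split back into its date, name, subset and status elements.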
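The folder scheme in the Formats item, together with the caution about executable formats, could be supported by a small pre-upload check like the Python sketch below. The function `suggest_folder` and both lookup tables are hypothetical: the guidelines define folders by content rather than by file extension, and the executable set here is a short illustrative subset, not the full fileinfo.com list referenced in the guidelines.

```python
from pathlib import Path

# Illustrative extension-to-folder mapping (assumption): real classification
# depends on what the file contains, not only on its extension.
FOLDER_BY_EXTENSION = {
    ".py": "code", ".js": "code", ".cs": "code",
    ".csv": "dataset", ".tsv": "dataset", ".json": "dataset",
    ".pdf": "docs", ".md": "docs", ".txt": "docs",
    ".png": "images", ".jpg": "images", ".svg": "images",
}

# Small illustrative subset of executable extensions (assumption).
EXECUTABLE_EXTENSIONS = {".exe", ".bat", ".cmd", ".msi", ".apk", ".app", ".com"}


def suggest_folder(file_name: str) -> str:
    """Suggest a folder for a file, or raise if its extension looks executable."""
    suffix = Path(file_name).suffix.lower()
    if suffix in EXECUTABLE_EXTENSIONS:
        raise ValueError(
            f"{file_name}: executable format, propose it with documentation first")
    # Anything not in the mapping falls back to the catch-all folder.
    return FOLDER_BY_EXTENSION.get(suffix, "xtra")
```

Because the examples folder is defined by content (stories, cases, practices), this sketch never suggests it; contributors would move such files there manually.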