B
BioPortal Ontology Repository (BioPortal)
A web portal providing easy discovery and access to a large collection of standardized biomedical vocabularies and ontologies through searching, browsing, and visualization of the content.
C
Canopy Data Dictionary
A document (typically a CSV file) that defines characteristics (such as the name, description, coded values, and actual values) of the core data elements used in the study. One data dictionary must be submitted for each high-level data file (i.e. transformed or original data file).
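As an illustration of how such a dictionary might be used, the following minimal sketch parses a small dictionary from CSV text and reports data-file columns that it does not document. The column names (name, description, coded_values), the "code=label" notation separated by "|", and the variable names are assumptions made for this example, not the Canopy's required layout.

```python
import csv
import io

# Hypothetical data dictionary content; the column names and the
# "code=label|code=label" notation are assumptions for illustration only.
DICTIONARY_CSV = """name,description,coded_values
nih_age,Age of participant in years,
nih_sex,Sex of participant,0=Male|1=Female|2=Intersex
"""

def load_dictionary(text):
    """Return {variable name: {"description": str, "codes": {code: label}}}."""
    entries = {}
    for row in csv.DictReader(io.StringIO(text)):
        codes = {}
        if row["coded_values"]:
            for pair in row["coded_values"].split("|"):
                code, label = pair.split("=", 1)
                codes[code] = label
        entries[row["name"]] = {"description": row["description"], "codes": codes}
    return entries

def undocumented_columns(data_header, dictionary):
    """List data-file columns that have no entry in the dictionary."""
    return [col for col in data_header if col not in dictionary]

dictionary = load_dictionary(DICTIONARY_CSV)
# "site_visit" is a made-up column that the dictionary does not define.
missing = undocumented_columns(["nih_age", "nih_sex", "site_visit"], dictionary)
```

A check like this can be run once per data file, since each high-level data file has its own dictionary.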
Canopy Global Codebook
The Canopy Global Codebook is a data dictionary for all required Common Data Elements (CDEs). It contains precise mappings that organize (C)DCC-specific Data Elements into 12 unique, required CDE categories.
Canopy Variable Category
In the Canopy, data are harmonized (re-labeled and re-coded) so that they can be interpreted uniformly across multiple studies. There are three types of variables, categorized by their level of harmonization across the various programs: Core Variables, Program Common Variables, and Non-harmonized Variables.
Core Variables: Variables intended to be shared across all studies involving human subjects, regardless of the specific program. They are defined in the Global Codebook and have identifiers beginning with “nih_”. Core variables are also referred to as Common Data Elements (CDEs).
Non-harmonized Variables: Variables that are not a Core Variable or Program Common Variable. These are typically unique to an individual study.
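Because Core Variables are identified by the “nih_” prefix, they can be separated from other variables by name alone. The sketch below assumes nothing beyond that prefix; the variable names are invented for the example, and the sketch does not try to tell Program Common Variables apart from Non-harmonized Variables, since no naming rule for them is given here.

```python
CORE_PREFIX = "nih_"  # prefix stated above for Core Variable identifiers

def is_core_variable(name):
    """True when the identifier follows the Core Variable naming rule."""
    return name.startswith(CORE_PREFIX)

def split_by_category(names):
    """Return (core, other), preserving the input order within each list."""
    core = [n for n in names if is_core_variable(n)]
    other = [n for n in names if not is_core_variable(n)]
    return core, other

# Hypothetical column names from a study data file.
core, other = split_by_category(["nih_age", "nih_race", "clinic_code"])
```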
Center for Expanded Data Annotation and Retrieval (CEDAR)
A web-based computational metadata management platform focused on helping users create high-quality biomedical metadata through templates that facilitate development, evaluation, use, and refinement of experiment-based biomedical metadata.
Common Data Elements
A type of health data specification commonly used in clinical and research settings to capture and bind together complex phenomena, such as depression, through standardized, consistent, well-defined questions (variables) paired with a description of allowable responses (values or value type). These are used in a standardized, machine-readable manner across studies or trials to prevent avoidable variability.
Curator
Individual(s) from the Support Team who assist with data submissions to the Canopy, including answering user questions, providing guidance on data de-identification, and reviewing and approving study submissions. To reach a curator, contact the Canopy Administrator at example@example.com.
D
Data Repository
A system that serves as a centralized place for storing and managing data.
Datasets
Individual-level data files generated from research studies. Studies typically have multiple datasets, which usually consist of phenotypic data collected on study participants, such as demographics, survey responses, and laboratory results.
Data Use Agreement
Formal agreement executed between and the user, defining the terms and conditions under which study data obtained from the Canopy can be used for secondary research.
Documents
Documentation (e.g. README files, study protocols) associated with research studies stored in the Canopy and made publicly available for secondary use.
Digital Object Identifier (DOI)
A DOI is a unique and permanent reference tied to the metadata about a digital object. DOIs are assigned to studies to increase findability, accessibility, and reusability. When citing use of resources in publications and other research outputs, users should include the resource's unique identifier (DOI).
F
FAIR
A best practice in data management referring to the findability, accessibility, interoperability, and reusability of data assets. These principles emphasize discovery and reuse of data objects with minimal or no human intervention (i.e. automated and machine-actionable), but are targeted at human entities as well.
H
Harmonization
Harmonization is the process of integrating and standardizing data from various sources or formats to make them compatible and consistent. This involves resolving differences in data structures, terminology, and units of measurement, ensuring that the data can be effectively analyzed, compared, and combined for meaningful insights and decision-making.
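A small worked example may make this concrete. The sketch below resolves a terminology difference (two studies coding sex differently) and a unit difference (pounds versus kilograms). The study names, field names, and code lists are hypothetical and are not taken from the Canopy Global Codebook.

```python
LB_TO_KG = 0.45359237  # one pound expressed in kilograms

# Hypothetical per-study code lists mapped onto one shared vocabulary.
SEX_CODES = {
    "study_a": {"M": "male", "F": "female"},
    "study_b": {"1": "male", "2": "female"},
}

def harmonize(record, study):
    """Build a new record with a shared sex label and weight in kilograms."""
    weight = record["weight"]
    if record["weight_unit"] == "lb":
        weight = weight * LB_TO_KG
    return {
        "sex": SEX_CODES[study][record["sex"]],
        "weight_kg": round(weight, 1),
    }

# Two source records that describe the same values in different conventions.
a = harmonize({"sex": "F", "weight": 150, "weight_unit": "lb"}, "study_a")
b = harmonize({"sex": "2", "weight": 68.0, "weight_unit": "kg"}, "study_b")
```

After harmonization the two records use the same labels and units, so they can be compared or pooled directly.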
Health Insurance Portability and Accountability Act (HIPAA)
Health Insurance Portability and Accountability Act of 1996, Public Law 104-191, 45 CFR Parts 160, 162, and 164. Part 160.103 of the HIPAA law regulates individually identifiable health information that is related to a person’s past, present, or future health or treatment and is transmitted or maintained in any form or medium by a covered entity.
I
Investigator
Individual conducting a study or project; clinicians, researchers, and partners/agencies participating in the Initiative funded through a (C)DCC.
M
Metadata File
A file describing the metadata contained in the study.
N
Common Data Elements (CDEs)
A collection resulting from the ongoing effort, in response to COVID-19 data, developed by the NIH’s Scientific Data Council, CDE Task Force, and CDE Governance Committee, for indicating endorsement of CDEs that meet meaningful criteria, are available through a common discovery platform (such as the CDE Repository), and avoid duplicating functions of resources that already exist.
Required Common Data Elements (CDEs)
Collections of elements, or “classifications” (as opposed to single field definitions), required in the Initiative, including Race, Ethnicity, Sex, Age, Education, Domicile, Employment, Insurance Status, Disability Status, Medical History, Symptoms, and Health Status, presented to study participants as survey questions with a discrete set of allowable responses. While a CDE is typically defined as a single field definition, Required CDEs are collections of field definitions addressing a single topic.
O
Original Data File
A data file based on an organization’s individual codebook that has not been harmonized. Original data files are also referred to as ‘raw’ data files.
P
Protected Health Information (PHI)
Protected Health Information, also known as "Individually identifiable health information," is information, including demographic data, that relates to:
- the individual’s past, present or future physical or mental health or condition,
- the provision of health care to the individual,
- the past, present, or future payment for the provision of health care to the individual
Published Manuscript
Report of findings of a Study, Protocol, or Research Project that is made public through inclusion in a recognized print or online outlet.
S
Secondary Research
A new study using existing data for exploring new hypotheses, analyses, or investigation of a research topic.
Site Map
A graphical representation of the Canopy organization and workflow. The Site Map is accessible from the ‘Home’ page and is also available on the ‘Resource Center’ page.
Study
A research protocol (including interventional and observational research) or set of experiments designed to investigate a research question and/or evaluate biomedical or health-related outcomes. Studies available through the Canopy span a wide range of biomedical and clinical research areas, contributed by participating research centers and institutions.
Study Registration
Process by which investigators can submit their studies to the Canopy to be made available to users for secondary research.
Submitter
Individual(s) responsible for submitting studies, data files, and other documentation to the Canopy.
T
Transformed Data File
A data file that has been harmonized according to the 12 unique, required core CDEs.