Raw Data
What is it? Why is it important?
Raw Data (RD) is the original data obtained from a primary source (e.g. laboratory reports, participant questionnaires, medical examinations).
RD has not undergone any processing, whether manual or automated.
It is critical to clearly define any RD used in a study. Aspects to include are:
- Description of the type of RD (e.g. weight = continuous data, gender = coded data, questionnaire = score data)
- The source (Source Data) from where RD was retrieved (e.g. medical records, laboratory reports, paper CRF)
Sometimes RD must be processed to convert it into a format that can be analysed and visualised. This usually entails some type of cleaning (harmonisation) or transformation. Any such data processing procedures are described and documented in the metadata of the study.
Examples
- The calculation of creatinine clearance requires entering specific RD (e.g. age, weight, serum creatinine) into the corresponding formula. Consequently, the creatinine clearance value is not considered RD but processed data.
- The calculation of BMI is based on participant height and weight. Consequently, height and weight are the RD needed to calculate BMI, while BMI is the derived data.
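The BMI example can be sketched in a few lines of Python. This is only an illustration: the function name, the field names and the participant values are hypothetical, and the sketch assumes weight is recorded in kg and height in cm. The point it shows is that the derived value is stored next to the RD, not in place of it.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Derive BMI (kg/m^2) from raw weight and height."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


# Raw data as recorded at the source (hypothetical participant).
raw = {"weight_kg": 70.0, "height_cm": 170.0}

# Derived data is kept in a separate record so the raw values stay untouched.
derived = {"bmi": round(bmi(raw["weight_kg"], raw["height_cm"]), 1)}
print(derived)
```

Keeping `raw` and `derived` apart makes it possible to recompute the BMI at any time if the formula or the rounding rule is revised.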
What do I need to do?
As a SP-INV, familiarise yourself with how to specify and document study data:
- List variables needed for study evaluation
- Define RD and its source (e.g. participant weight is recorded on the paper CRF during the medical examination at screening)
- In the event of derived or processed data, indicate the formula used (e.g. computation of a pain score based on a participant questionnaire)
- Define data format and prioritise standardised or international formats
- Define quality checks needed to locate computation errors
- Document implemented processing procedures in the metadata of the study
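One possible way to capture the points above is a small variable specification, written here as a Python sketch. The class `VariableSpec`, its fields, the example variables and the stated range are all hypothetical and only illustrate the idea; your study metadata may use a different structure or tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VariableSpec:
    name: str
    kind: str                       # "raw" or "derived"
    data_type: str                  # e.g. continuous, coded, score
    unit_or_format: str             # standardised unit or format
    source: Optional[str] = None    # where RD is retrieved from
    formula: Optional[str] = None   # how derived data is computed
    check: Optional[str] = None     # planned quality check

    def problems(self) -> List[str]:
        """List what is still missing from this specification."""
        issues = []
        if self.kind == "raw" and not self.source:
            issues.append("raw variable needs a source")
        if self.kind == "derived" and not self.formula:
            issues.append("derived variable needs a formula")
        return issues


# Hypothetical entries for illustration only.
specs = [
    VariableSpec("weight", "raw", "continuous", "kg",
                 source="paper CRF, medical examination at screening",
                 check="range 30-300 kg"),
    VariableSpec("pain_score", "derived", "score", "integer",
                 formula="sum of questionnaire items"),
]

for spec in specs:
    print(spec.name, spec.problems() or "complete")
```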
Examples of data formatting
- Date formats: 31st January 1999, 31/01/1999, 31.1.99, 99.01.31, 31011999, or today
- Computation: metrics (1,70 m or 170 cm), time (1,15 min or 75 sec)
- Coding: female=F male=M, or female=1 male=2
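The following Python sketch shows how such mixed entries might be harmonised into one format each. It is a minimal illustration under stated assumptions: the target formats (ISO 8601 dates, height in cm, sex coded as F/M), the list of accepted date formats and all function names are hypothetical choices, not a prescribed standard. Entries that cannot be interpreted, such as "today", are rejected so that they can be queried at the source.

```python
import re
from datetime import datetime

# Accepted input formats (an assumption); order matters for ambiguous strings
# such as "01.02.03", which is why a single agreed format is preferable.
DATE_FORMATS = ["%d/%m/%Y", "%d.%m.%y", "%y.%m.%d", "%d%m%Y", "%d %B %Y"]


def to_iso_date(text: str) -> str:
    """Return the date as YYYY-MM-DD, or raise ValueError if unparseable."""
    # Drop ordinal suffixes such as "31st" before parsing.
    cleaned = re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", text.strip())
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date entry: {text!r}")


def height_to_cm(text: str) -> float:
    """Convert entries like '1,70 m' or '170 cm' to centimetres."""
    value, unit = text.strip().split()
    number = float(value.replace(",", "."))  # accept a decimal comma
    if unit == "m":
        return round(number * 100, 1)
    if unit == "cm":
        return round(number, 1)
    raise ValueError(f"unrecognised unit: {unit!r}")


# Both coding schemes from the list above, mapped to one target coding.
SEX_CODES = {"f": "F", "m": "M", "female": "F", "male": "M", "1": "F", "2": "M"}


def sex_code(entry) -> str:
    """Map 'F'/'M', 'female'/'male' or 1/2 to 'F' or 'M'."""
    key = str(entry).strip().lower()
    if key not in SEX_CODES:
        raise ValueError(f"unrecognised sex code: {entry!r}")
    return SEX_CODES[key]


print(to_iso_date("31.1.99"), height_to_cm("1,70 m"), sex_code(1))
```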
By comparing entries in the electronic database (eCRF) with the original RD, a monitor can confirm during monitoring that the respective data entered in the database is correct.
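The comparison itself is simple in principle. As a rough sketch (the function name, field names and values are hypothetical, and real monitoring follows the study's monitoring plan), each database entry is checked against the corresponding RD and every mismatch is listed for follow-up:

```python
def find_discrepancies(raw_record: dict, ecrf_record: dict) -> list:
    """Return (field, raw value, eCRF value) for every mismatching field."""
    fields = sorted(set(raw_record) | set(ecrf_record))
    return [
        (field, raw_record.get(field), ecrf_record.get(field))
        for field in fields
        if raw_record.get(field) != ecrf_record.get(field)
    ]


# Hypothetical participant: the weight was mistyped during data entry.
raw_record = {"weight_kg": 70.0, "height_cm": 170.0, "sex": "F"}
ecrf_record = {"weight_kg": 77.0, "height_cm": 170.0, "sex": "F"}

for field, raw_value, ecrf_value in find_discrepancies(raw_record, ecrf_record):
    print(f"{field}: source says {raw_value}, database says {ecrf_value}")
```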
Errors to consider when processing RD include:
Formatting errors:
Different study sites or departments might report blood MCHC differently (e.g. 32 g/dl, 320 g/l, 4.81 mmol/l or 13%). Depending on how the MCHC is documented in the study database, some conversion might be required.
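A conversion step of this kind could look like the Python sketch below. It assumes, purely for illustration, that the study database stores MCHC in g/l; the function and table names are hypothetical. Only the g/dl to g/l conversion (factor 10) is implemented. Values reported in mmol/l or % are deliberately rejected, because their conversion depends on factors that must be defined and documented in the study metadata first.

```python
# Conversion factors to the assumed target unit g/l.
TO_G_PER_L = {"g/l": 1.0, "g/dl": 10.0}


def mchc_to_g_per_l(value: float, unit: str) -> float:
    """Convert an MCHC value to g/l, rejecting units without a documented factor."""
    key = unit.strip().lower()
    if key not in TO_G_PER_L:
        raise ValueError(f"no documented conversion for unit {unit!r}")
    return value * TO_G_PER_L[key]


print(mchc_to_g_per_l(32, "g/dl"))   # same quantity as 320 g/l
print(mchc_to_g_per_l(320, "g/l"))
```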
Computation errors:
These arise from human, machine, or instrument errors. Consequently, data should be checked and flagged as “suspect” if entries are unreasonable. Such entries may require reconfirmation. Computation errors are identified through data quality check rules, through data monitoring, or during data cleaning at the end of the study.
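A data quality check rule can be as simple as a plausibility range. The sketch below is a hypothetical example: the limits, field names and function name are assumptions chosen for illustration and would need to be defined per study. Entries outside the range, or missing altogether, are flagged as suspect for reconfirmation and are not changed automatically.

```python
# Hypothetical plausibility limits per variable: (lowest, highest) accepted value.
LIMITS = {
    "weight_kg": (30.0, 300.0),
    "height_cm": (100.0, 250.0),
}


def flag_suspect(record: dict) -> list:
    """Return the names of fields that are missing or outside their limits."""
    suspects = []
    for field, (low, high) in LIMITS.items():
        value = record.get(field)
        if value is None or not (low <= value <= high):
            suspects.append(field)
    return suspects


# A weight of 700 kg is most likely a typing error for 70.0 kg.
print(flag_suspect({"weight_kg": 700.0, "height_cm": 170.0}))
```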
Where can I get help?
Your local CTU can support you with experienced staff regarding this topic:
Basel, Departement Klinische Forschung, CTU, dkf.unibas.ch
Lugano, Clinical Trials Unit, CTU-EOC, www.ctueoc.ch
Bern, Clinical Trials Unit, CTU, www.ctu.unibe.ch
Geneva, Clinical Research Center, CRC, crc.hug.ch
Lausanne, Clinical Research Center, CRC, www.chuv.ch
St. Gallen, Clinical Trials Unit, CTU, www.kssg.ch
Zürich, Clinical Trials Center, CTC, www.usz.ch