4 Sequencing Data Best Practices
Where to put data, how does it link to metadata? How to manage old data?
SampleDB and its Data Manager Utility is currently still under construction π§. For now, please store raw sequencing data in the Mines following the following guidelines:
4.1 SeekDeep filename guidelines
experimentIdentifier-micronics-plate-well-rep-commentInfo
- experimentID (of some kind)
- micronics barcode (use
NC#
for negative controls) - plate name
- well position on plate (use
A1
notA01
) - replicate identifier (use
rep1
,rep2
, etc.) - (optional) comments, e.g.Β experimental conditions or other info
Rules:
- Filename must consist of 5 βpartsβ separated by dashes β
-
β - An optional 6th field is reserved for comments and other information
- Do not introduce new dashes β
-
β
From micronics, we can recover date, study_code, identifier, etc. from SampleDB
Example sample filenames:
PBC003-8041439280-Plate2-B7-rep1
S3A003-8041439281-SMT4-A5-rep2
JDH002-8041439282-plate6-C12-rep1
Example sample filenames with optional comment field:
PBC003-8041439280-Plate2-B7-rep1-commentInfo
S3A003-8041439281-SMT4-A5-rep2-commentInfo
JDH002-8041439282-plate6-C12-rep1-commentInfo
Example control filenames:
PBC003-NC1-Plate7-E9-rep1
S3A003-NC3-plate9-F11-rep2
JDH002-NC10-PBCSMT4-G9-rep2
4.2 Raw sequencing data π§
Please store raw sequencing data in the mines in the following format:
sequencing_data/YYYYMMDD/some_filename.gz
4.2.1 Folder names π§
Date: The YYMMDD
date should be the sequencing run start date
Biohub
YYMMDD_Sequencer_RunNumber_FlowCellSerialNumber
Example folder name:
221006_VH00444_149_AAATC37HV
In-house: Make the folder name manually and be diligent about removing or renaming erroneously demuxed data. Here we emulate the folder naming structure used by Biohub.
YYMMDD_Sequencer_RunNumber_FlowCellSerialNumber
Example folder name:
221006_MiniSeqRR_1