Published Workflows | r-mauleon | SNP lift-over

Galaxy Workflow 'SNP lift-over'

Annotation: Lift over SNP coordinates from one genome to another via flanking sequences

Step | Annotation
Step 1: Input dataset
select at runtime
The coordinate file in 3-column format (contig[tab]start[tab]end), giving the genome location in the source genome. To lift SNPs over, specify a flank of 60 to 100 bp on the left and right side of the SNP position.
Step 2: Compute
c2-59
Output dataset 'output' from step 1
YES
Step 3: Compute
c3+59
Output dataset 'output' from step 1
YES
Step 4: Cut
c1,c4,c3
Tab
Output dataset 'out_file1' from step 2
Step 5: Cut
c1,c2,c4
Tab
Output dataset 'out_file1' from step 3
Step 6: Concatenate datasets
Output dataset 'out_file1' from step 4
Datasets
Dataset 1
Output dataset 'out_file1' from step 5
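
Steps 2 to 6 can be read as building two windows per input row: a left window (contig, start-59, end) and a right window (contig, start, end+59), which are then concatenated. The following is a minimal Python sketch of that arithmetic, not the Galaxy tools themselves; the function name `flank_windows` and the `pad` parameter are hypothetical names introduced for illustration.

```python
def flank_windows(rows, pad=59):
    """Return left and right flank intervals for (contig, start, end) rows.

    Mirrors steps 2-6 of the workflow as listed: all left windows
    (contig, start - pad, end) come first, followed by all right windows
    (contig, start, end + pad). `pad` is a hypothetical parameter name.
    """
    left = [(contig, start - pad, end) for contig, start, end in rows]
    right = [(contig, start, end + pad) for contig, start, end in rows]
    return left + right  # step 6 concatenates the two datasets


# Hypothetical input: one SNP at position 1000 on contig "chr01".
snps = [("chr01", 1000, 1000)]
for contig, start, end in flank_windows(snps):
    print(f"{contig}\t{start}\t{end}")
```

For a single-base SNP (start equal to end), each window spans 60 positions and includes the SNP position itself.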
Step 7: Batch-get-subseq
Output dataset 'out_file1' from step 6
3 column tab delimited: chrName start stop
This gets the (multi)FASTA file sequence(s) from the source genome. For SNP lift-over, it gets the left and right flanking sequences.
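
As a rough illustration of what a sub-sequence extraction step like this does, the sketch below reads a (multi)FASTA genome into memory and slices out each interval. It is not the Batch-get-subseq tool's code. The helper names `read_fasta` and `get_subseqs`, the output header format, and the assumption that coordinates are 1-based and inclusive are all hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a {name: sequence} dictionary."""
    parts, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep the first word of the header
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def get_subseqs(genome, intervals):
    """Yield (header, subsequence) for each (contig, start, end) interval.

    Assumption: start and end are 1-based and inclusive.
    """
    for contig, start, end in intervals:
        yield f"{contig}:{start}-{end}", genome[contig][start - 1:end]


# Hypothetical toy genome with one contig split over two lines.
genome = read_fasta([">chr01", "ACGTACGTAC", "GGTTAACC"])
for header, seq in get_subseqs(genome, [("chr01", 3, 6)]):
    print(f">{header}")
    print(seq)
```

For real genomes an indexed FASTA reader would avoid loading every contig into memory; the dictionary here only keeps the example self-contained.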
Step 8: Find-seq
Output dataset 'output' from step 7
11
5000
0
Default blat tabular format,no sequence
The (multi)FASTA file from source genome is aligned to the target genome. For SNP lift-over, these are the left and right flanking sequences.
Step 9: blat-alignment-filter
Output dataset 'output' from step 8
0.8
1
0
Removes alignments that contain gaps or more than 1 mismatch.
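
A filter of this kind can be sketched in Python over the standard PSL columns. The listing does not label the three parameter values (0.8, 1, 0), so the sketch assumes they mean minimum aligned fraction of the query, maximum mismatches, and maximum gaps; that reading, the coverage formula, and the name `keep_alignment` are assumptions, not the tool's documented behaviour.

```python
def keep_alignment(fields, min_cov=0.8, max_mismatch=1, max_gaps=0):
    """Decide whether one PSL row (already split into fields) is kept.

    Assumed meaning of the thresholds (not confirmed by the workflow page):
    min_cov      minimum fraction of the query covered by aligned bases
    max_mismatch maximum number of mismatching bases
    max_gaps     maximum number of gaps (inserts) in query plus target
    """
    matches = int(fields[0])       # PSL column 1: matches
    mismatches = int(fields[1])    # PSL column 2: misMatches
    rep_matches = int(fields[2])   # PSL column 3: repMatches
    gaps = int(fields[4]) + int(fields[6])  # columns 5 and 7: qNumInsert, tNumInsert
    q_size = int(fields[10])       # PSL column 11: qSize
    coverage = (matches + mismatches + rep_matches) / q_size
    return coverage >= min_cov and mismatches <= max_mismatch and gaps <= max_gaps


# Hypothetical PSL rows for a 60 bp flank (whitespace-split for brevity).
good = "59 1 0 0 0 0 0 0 + chr01:941-1000 60 0 60 chrA 5000 200 260 1 60, 0, 200,".split()
gapped = "58 0 0 0 1 2 0 0 + chr01:941-1000 60 0 60 chrA 5000 200 258 2 30,28, 0,32, 200,230,".split()
print(keep_alignment(good), keep_alignment(gapped))
```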
Step 10: Cut
c14,c16,c17
Tab
Output dataset 'output' from step 9
PSL post-processing step that extracts columns 14, 16 and 17, the alignment coordinates in the target genome (target name, target start, target end).
Step 11: Remove beginning
5
Output dataset 'out_file1' from step 10
Removes the first 5 lines, the header of the alignment output. The final file gives the coordinates of the SNP flanking sequences in the target genome.
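
Steps 10 and 11 together amount to dropping the 5 header lines and keeping PSL columns 14, 16 and 17. A minimal Python sketch of that post-processing follows; the function name `target_coordinates` and the example rows are hypothetical.

```python
def target_coordinates(psl_lines, header_lines=5):
    """Return (target name, target start, target end) for each PSL row.

    Skips the first `header_lines` lines, then keeps columns 14, 16 and 17
    (indices 13, 15 and 16 when counting from zero).
    """
    coords = []
    for line in psl_lines[header_lines:]:
        fields = line.rstrip("\n").split("\t")
        coords.append((fields[13], int(fields[15]), int(fields[16])))
    return coords


# Hypothetical PSL content: 5 header lines followed by one alignment row.
header = ["psLayout version 3", "", "match\tmis-\trep.", "\tmatch\tmatch", "-" * 40]
row = "\t".join(
    ["59", "1", "0", "0", "0", "0", "0", "0", "+", "chr01:941-1000",
     "60", "0", "60", "chrA", "5000", "200", "260", "1", "60,", "0,", "200,"]
)
for name, start, end in target_coordinates(header + [row]):
    print(f"{name}\t{start}\t{end}")
```

In the PSL format the start coordinate is zero-based and the end coordinate is exclusive, which is worth keeping in mind when comparing the output with the input coordinate file.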