likely instances of transcriptional readthrough, even if this may miss gene fusions occurring between adjacent genes, for example, as a result of tandem duplications or inversions [14]. Candidate fusion events between paralogous genes were excluded as likely mapping errors. Selecting gene-gene pairs supported by two or more short read pairs (Figure 1a) provided an initial list of 303 to 349 fusion candidates per cell line and 152 in normal breast. Of the initial 83 candidates tested, only seven (8.5%) were validated by RT-PCR, indicating that most of them represented false positives. We reasoned that if the process that gave rise to false positives involved PCR amplification or misalignment of short reads, we would expect the artifactual reads spanning an exon-exon junction to all align to the same position, whereas for a genuine fusion gene, we would expect a tiling pattern of short read alignment start positions across the fusion junction (Figure 1b). Examining this pattern among the initial list of fusion candidates indicated that all seven validated fusion genes displayed a tiling pattern. In contrast, the fusions we had been unable to validate frequently had a high number of identically mapping short reads (plus or minus a single base pair) aligning to the junction. These short reads also almost exclusively aligned to one of the exons. The paired ends of identical short reads did not map within one to two bases of each other, suggesting that misalignment, not PCR artifacts, is the likely reason for this phenomenon (data not shown). Utilizing the above-described criteria, we identified a total of 28 fusion gene candidates in the four breast cancer cell lines, whereas none were predicted in the normal breast sample.

Figure 1 Fusion gene identification by paired-end RNA-sequencing. (a) Identification of fusion gene candidates through selection of paired-end reads, the ends of which align to two different and non-adjacent genes. (b) Identification of the exact fusion junction by aligning non-mapped short reads against a computer-generated database of all possible exon-exon junctions between the two partner genes. Separation of true fusions (left) from false positives (right) by examining the pattern of short read alignments across exon-exon junctions. Genuine fusion junctions are characterized by a stacked/ladder-like pattern of short reads across the fusion point. False positives lack this pattern; instead, all junction-matching short reads align to the exact same position or are shifted by one to two base pairs. Furthermore, this alignment is mostly to one of the exons.

Fusion gene validation

Using the improved bioinformatic pipeline described above, we were able to significantly reduce the number of false positive observations. We validated 27 of …

Edgren et al. Genome Biology 2011, 12:R6 http://genomebiology.com/2011/12/1/R

Table 1 Identified and validated fusion gene candidates

Sample  5′ gene  5′ chromosome  3′ gene          3′ chromosome  Number of paired-end reads  Number of junction reads  In frame  Amplified
…       …        17             STAC2            17             57                          72                        …         …
…       …        17             SNF8             17             43                          68                        …         …
…       …        20             IKZF3            17             41                          26                        …         …
…       …        20             CEP250           20             35                          14                        …         …
…       …        20             MYO9B            19             9                           12                        …         …
…       …        17             MYO19            17             8                           7                         …         …
…       …        20             KIAA0406         20             8                           1                         …         …
…       …        17             DOK5             20             4                           6                         …         …
…       …        13             MCF2L            13             5                           3                         …         …
…       …        3              CMTM7            3              6                           2                         …         …
…       …        20             PI3              20             4                           2                         …         …
…       …        8              GSDMB            17             28                          447                       …         …
…       …        20             ENSG00000236127  20             10                          20                        …         …
…       …        17             PKIA             8              13                          …                         …         …
…       …        5              PCDH1            5              12                          …                         …         …
…       …        14             SETD3            14             6                           …                         …         …
…       …        3              LRRFIP2          3              14                          …                         …         …
…       …        8              ZNF704           8              3                           …                         …         …
…       …        17             EIF3H            8              38                          …                         …         …
…       …        20             ITCH             20             3                           …                         …         …
…       …        20             PREX1            20             5                           …                         …         …
…       …        19             NFIX             19             22                          …                         …         …
…       …        12             SEPT10           2              2                           …                         …         …
…       …        9              NUP214           9              4                           …                         …         …
…       …        20             BCAS3            17             133                         …                         …         …
…       …        20             SULF2            20             17                          …                         …         …
…       …        17             TMEM49           17             2                           …                         …         …
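
The tiling criterion used to separate genuine fusion junctions from false positives (Figure 1b) can be illustrated with a minimal sketch. It assumes that the alignment start positions of the junction-spanning reads have already been extracted as integers. The function names, the two-base tolerance default and the minimum number of distinct start clusters are illustrative assumptions, not parameters reported in the text, which states only that false-positive reads align to the same position or are shifted by one to two base pairs.

```python
from typing import Iterable


def count_start_clusters(start_positions: Iterable[int],
                         shift_tolerance: int = 2) -> int:
    """Count groups of read start positions across a candidate junction.

    Starts lying within `shift_tolerance` bases of the first start in a
    group are treated as the same stack of identically mapping reads.
    The tolerance of 2 is an assumption based on the "one to two base
    pairs" wording of the Figure 1 caption.
    """
    clusters = 0
    anchor = None  # first start position of the current group
    for pos in sorted(set(start_positions)):
        if anchor is None or pos - anchor > shift_tolerance:
            clusters += 1
            anchor = pos
    return clusters


def has_tiling_pattern(start_positions: Iterable[int],
                       shift_tolerance: int = 2,
                       min_clusters: int = 2) -> bool:
    """Return True if junction reads start at several distinct positions.

    `min_clusters` is a hypothetical threshold chosen for illustration;
    a single stack of reads (one cluster) is treated as a likely
    misalignment artifact, and an empty read list is never a tiling
    pattern.
    """
    return count_start_clusters(start_positions, shift_tolerance) >= min_clusters


# Reads stacked on (almost) the same start: flagged as a likely false positive.
stacked = [100, 100, 101, 102, 100]
# Reads laddered across the junction: consistent with a genuine fusion.
tiled = [80, 86, 91, 97, 103]

print(has_tiling_pattern(stacked))
print(has_tiling_pattern(tiled))
```

This sketch covers only the start-position criterion. The additional observation that false-positive reads align almost exclusively to one of the two exons would need a separate check on how far each read extends on either side of the junction.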