filmpolt.blogg.se

Sng tool to sort active tables








Concerning related work, we distinguish alignment-based methods that work on already aligned reads (BAM files), versus alignment-free methods that directly work on short subsequences (k-mers) of the raw reads (FASTQ files). Alignment-based methods scan existing alignments in BAM files and test whether each read maps better to the graft genome or to the host genome. Differences result from different parameter settings used for the alignment tool (often bwa or bowtie2) and from the way “better alignment” is defined by each of these tools. Alignment-free methods use a large lookup table to associate species information with each k-mer. In Table 1, we list properties of existing tools and of xengsort, our implementation of the method we describe in this article.

Of course, different sources may exhibit different error distributions and require distinct optimized parameter sets for classification. Nevertheless, our evaluation shows that we obtain good results on all of exomes, genomes and transcriptomes with the same parameter set.
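The alignment-free idea described above, a lookup table that associates species information with each k-mer, can be illustrated with a small sketch. This is not xengsort's code: the function names (`kmers`, `build_table`, `classify`), the plain Python dictionary, and the simple decision rule are assumptions made for illustration only. Real tools use much longer k-mers, typically handle reverse complements, and apply a more refined decision function.

```python
from collections import Counter

# Labels stored in the lookup table (hypothetical names for this sketch).
HOST, GRAFT, BOTH = "host", "graft", "both"


def kmers(seq, k):
    """Yield every k-mer of seq that consists only of A, C, G, T."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield kmer


def build_table(host_ref, graft_ref, k):
    """Map each reference k-mer to the species it occurs in."""
    table = {}
    for kmer in kmers(host_ref, k):
        table[kmer] = HOST
    for kmer in kmers(graft_ref, k):
        # A k-mer already seen in the host now belongs to both species.
        table[kmer] = BOTH if table.get(kmer) in (HOST, BOTH) else GRAFT
    return table


def classify(read, table, k):
    """Toy decision rule: count species-specific k-mers in the read."""
    counts = Counter(table[km] for km in kmers(read, k) if km in table)
    host, graft = counts[HOST], counts[GRAFT]
    if host > 0 and graft == 0:
        return HOST
    if graft > 0 and host == 0:
        return GRAFT
    if host == 0 and graft == 0:
        # No species-specific evidence: shared k-mers only, or nothing known.
        return BOTH if counts[BOTH] > 0 else "neither"
    return "ambiguous"  # evidence for both species


# Tiny made-up "references"; real references are whole genomes.
table = build_table("ACGTACGTTTGA", "ACGTGGGCCCAT", k=4)
for read in ["GTTTGA", "GGGCCC", "GTTTGGGC"]:
    print(read, classify(read, table, k=4))
```

The read is never aligned; only its k-mers are looked up, which is why this kind of method can work directly on FASTQ input.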


To learn about tumor heterogeneity and tumor progression under realistic in vivo conditions, but without putting human life at risk, one can implant human tumor tissue into a mouse and study its evolution. This is called a (patient-derived) xenograft (PDX). Over time, several samples of the (graft/human) tumor and surrounding (host/mouse) tissue are taken and subjected to exome or whole genome sequencing in order to monitor the changing genomic features of the tumor. This information can be used to predict the response to different chemotherapy alternatives and to monitor treatment success or failure. A key step in such analyses is xenograft sorting, i.e., separating the human tumor reads from the mouse reads. A recent study showed that if such a step is omitted, several mouse reads would be aligned to certain regions of the human genome (HAMA: human-aligned mouse allele) and induce false positive variant calls for the tumor; this especially concerns certain oncogenes. Several tools have been developed for xenograft sorting, motivated by different goals and using different approaches; a summary appears below. Here we improve upon the existing approaches in several ways: by using carefully engineered k-mer hash tables, our approach is both faster and needs less memory than existing tools. By designing a new decision function, we also obtain fewer unclassified reads and in some cases even higher classification accuracy. Since we use a comprehensive reference of the genome and transcriptome, we are in principle able to process genome, exome, and transcriptome samples of xenografts.

xengsort is written in numba-compiled Python and comes with sample Snakemake workflows for hash table construction and dataset processing.


Our software xengsort is available under the MIT license at.


With an increasing number of patient-derived xenograft (PDX) models being created and subsequently sequenced to study tumor heterogeneity and to guide therapy decisions, there is a similarly increasing need for methods to separate reads originating from the graft (human) tumor and reads originating from the host species’ (mouse) surrounding tissue. Two kinds of methods are in use: On the one hand, alignment-based tools require that reads are mapped and aligned (by an external mapper/aligner) to the host and graft genomes separately first; the tool itself then processes the resulting alignments and quality metrics (typically BAM files) to assign each read or read pair. On the other hand, alignment-free tools work directly on the raw read data (typically FASTQ files). Recent studies compare different approaches and tools, with varying results.

We show that alignment-free methods for xenograft sorting are superior concerning CPU time usage and equivalent in accuracy. We improve upon the state of the art in xenograft sorting by presenting a fast, lightweight approach based on three-way bucketed quotiented Cuckoo hashing. Our hash table requires memory comparable to an FM index typically used for read alignment and less than other alignment-free approaches. It allows extremely fast lookups and uses less CPU time than other alignment-free methods and alignment-based methods at similar accuracy. Several engineering steps (e.g., shortcuts for unsuccessful lookups, software prefetching) improve the performance even further.
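To make the term "three-way bucketed Cuckoo hashing" mentioned above more concrete, here is a minimal plain-Python sketch. It is not xengsort's implementation, which is numba-compiled and additionally uses quotienting (storing only part of each key), both of which are omitted here. The class name `BucketedCuckooTable`, the helper `encode_kmer`, the integer mixer, the bucket size, and the random-walk eviction strategy are all assumptions chosen for illustration.

```python
import random

MASK64 = (1 << 64) - 1


def encode_kmer(kmer):
    """Pack a DNA k-mer into an integer, 2 bits per base (A=0, C=1, G=2, T=3)."""
    code = 0
    for base in kmer:
        code = (code << 2) | "ACGT".index(base)
    return code


def _mix(x):
    """SplitMix64-style integer mixer, used here as a stand-in hash function."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


class BucketedCuckooTable:
    """Each key has three candidate buckets; each bucket holds several slots."""

    def __init__(self, n_buckets, bucket_size=4, max_walk=500, seed=0):
        self.n_buckets = n_buckets
        self.bucket_size = bucket_size
        self.max_walk = max_walk
        self.buckets = [[] for _ in range(n_buckets)]
        self.rng = random.Random(seed)
        # Three salts give three different hash functions ("three-way").
        self.salts = [self.rng.getrandbits(64) for _ in range(3)]

    def _bucket_ids(self, key):
        return [_mix(key ^ salt) % self.n_buckets for salt in self.salts]

    def lookup(self, key, default=None):
        # A lookup inspects at most three buckets, regardless of table size.
        for b in self._bucket_ids(key):
            for stored_key, value in self.buckets[b]:
                if stored_key == key:
                    return value
        return default

    def insert(self, key, value):
        # Overwrite the value if the key is already present.
        for b in self._bucket_ids(key):
            bucket = self.buckets[b]
            for i, (stored_key, _) in enumerate(bucket):
                if stored_key == key:
                    bucket[i] = (key, value)
                    return
        item = (key, value)
        for _ in range(self.max_walk):
            ids = self._bucket_ids(item[0])
            for b in ids:
                if len(self.buckets[b]) < self.bucket_size:
                    self.buckets[b].append(item)
                    return
            # All three buckets are full: evict a random resident and
            # continue by re-inserting the evicted item (random walk).
            b = self.rng.choice(ids)
            slot = self.rng.randrange(self.bucket_size)
            item, self.buckets[b][slot] = self.buckets[b][slot], item
        # In this sketch the item displaced last is dropped on failure.
        raise RuntimeError("random walk too long; table needs to be rebuilt")


table = BucketedCuckooTable(n_buckets=64)
table.insert(encode_kmer("ACGT"), "graft")
print(table.lookup(encode_kmer("ACGT")), table.lookup(encode_kmer("TTTT")))
```

Because every key can live in only a few fixed places, an unsuccessful lookup is as bounded as a successful one, which is the property that makes this family of hash tables attractive for k-mer lookups.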







