Castellanos-Rodríguez et al., 2023 - Google Patents
SeQual-Stream: approaching stream processing to quality control of NGS datasetsCastellanos-Rodríguez et al., 2023
View HTML- Document ID
- 8863185273629983934
- Author
- Castellanos-Rodríguez
- Expósito R
- Touriño J
- Publication year
- Publication venue
- BMC bioinformatics
External Links
Snippet
Background Quality control of DNA sequences is an important data preprocessing step in many genomic analyses. However, all existing parallel tools for this purpose are based on a batch processing model, needing to have the complete genetic dataset before processing …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30312—Storage and indexing structures; Management thereof
- G06F17/30321—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30386—Retrieval requests
- G06F17/30424—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30289—Database design, administration or maintenance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30575—Replication, distribution or synchronisation of data between databases or within a distributed database; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30067—File systems; File servers
- G06F17/30129—Details of further file system functionalities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
- G06F19/10—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
- G06F19/22—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for sequence comparison involving nucleotides or amino acids, e.g. homology search, motif or SNP [Single-Nucleotide Polymorphism] discovery or sequence alignment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
- G06F19/10—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
- G06F19/28—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for programming tools or database systems, e.g. ontologies, heterogeneous data integration, data warehousing or computing architectures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/60—Software deployment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Camacho et al. | ElasticBLAST: accelerating sequence search via cloud computing | |
| Chikhi et al. | Compacting de Bruijn graphs from sequencing data quickly and in low memory | |
| Dobin et al. | Mapping RNA‐seq reads with STAR | |
| Shen et al. | SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation | |
| Chen et al. | SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data | |
| Decap et al. | Halvade: scalable sequence analysis with MapReduce | |
| Abuín et al. | SparkBWA: speeding up the alignment of high-throughput DNA sequencing data | |
| Schatz | CloudBurst: highly sensitive read mapping with MapReduce | |
| US10366053B1 (en) | Consistent randomized record-level splitting of machine learning data | |
| US20140214334A1 (en) | Efficient genomic read alignment in an in-memory database | |
| Ferraro Petrillo et al. | Analyzing big datasets of genomic sequences: fast and scalable collection of k-mer statistics | |
| Gurtowski et al. | Genotyping in the cloud with crossbow | |
| Medina et al. | Highly sensitive and ultrafast read mapping for RNA-seq analysis | |
| Lee et al. | Scalable metagenomics alignment research tool (SMART): a scalable, rapid, and complete search heuristic for the classification of metagenomic sequences from complex sequence populations | |
| Kathiresan et al. | Accelerating next generation sequencing data analysis with system level optimizations | |
| Lawlor et al. | Field of genes: using Apache Kafka as a bioinformatic data repository | |
| Piñeiro et al. | BigSeqKit: a parallel Big Data toolkit to process FASTA and FASTQ files at scale | |
| Boger et al. | Functional protein mining with conformal guarantees | |
| Zhang et al. | Applying twister to scientific applications | |
| Expósito et al. | MarDRe: efficient MapReduce-based removal of duplicate DNA reads in the cloud | |
| Expósito et al. | HSRA: Hadoop-based spliced read aligner for RNA sequencing data | |
| Expósito et al. | SeQual: big data tool to perform quality control and data preprocessing of large NGS datasets | |
| Mushtaq et al. | SparkGA2: production-quality memory-efficient Apache Spark based genome analysis framework | |
| Castellanos-Rodríguez et al. | SeQual-Stream: approaching stream processing to quality control of NGS datasets | |
| AlJame et al. | DNA short read alignment on apache spark |