GDSArray - Representing GDS files as array-like objects
GDS files are widely used to represent genotyping or sequence data. The GDSArray package implements the `GDSArray` class, which represents nodes in GDS files as matrix-like objects. These objects are easy to manipulate in _R_ (e.g., subsetting, mathematical transformation). The data remain on disk until needed, so very large files can be processed.
Last updated 24 days ago
infrastructure, datarepresentation, sequencing, genotypingarray
6.65 score 5 stars 2 packages 8 scripts 498 downloads

ReUseData - Reusable and reproducible Data Management
ReUseData is an _R/Bioconductor_ software tool that provides a systematic and versatile approach to standardized and reproducible data management. ReUseData helps turn shell scripts and other ad hoc data-preprocessing scripts into workflow-based data recipes. Evaluating a data recipe generates curated data files in their generic formats (e.g., VCF, BED). Both recipes and data are cached using database infrastructure for easy data management and reuse. Prebuilt data recipes are available through the ReUseData portal (https://rcwl.org/dataRecipes/), with full annotation and user instructions. Pregenerated data are available in the ReUseData cloud bucket and can be downloaded directly with `getCloudData()`.
Last updated 24 days ago
software, infrastructure, dataimport, preprocessing, immunooncology
5.68 score 4 stars 7 scripts 169 downloads

SeqSQC - A bioconductor package for sample quality check with next generation sequencing data
SeqSQC is designed to identify problematic samples in NGS data, including samples with gender mismatch, contamination, or cryptic relatedness, and population outliers.
Last updated 24 days ago
experiment data, homo_sapiens_data, sequencing data, project1000genomes, genome
5.38 score 2 scripts 398 downloads

VariantExperiment - A RangedSummarizedExperiment Container for VCF/GDS Data with GDS Backend
VariantExperiment is a Bioconductor package that saves data in VCF/GDS format into a RangedSummarizedExperiment object. The high-throughput genetic/genomic data are saved in GDSArray objects. The annotation data for features and samples are saved in DelayedDataFrame format, with a one-dimensional GDSArray in each column. Because both the assay data and the annotation data are represented on disk, they can be read and processed on disk, which significantly reduces memory usage. The RangedSummarizedExperiment interface enables easy, familiar manipulation of high-throughput genetic/genomic data using the common SummarizedExperiment metaphor in R and Bioconductor.
Last updated 24 days ago
infrastructure, datarepresentation, sequencing, annotation, genomeannotation, genotypingarray
5.00 score 1 stars 2 scripts 149 downloads

DelayedDataFrame - Delayed operation on DataFrame using standard DataFrame metaphor
Based on the standard DataFrame metaphor, this package implements delayed operations on DelayedDataFrame through a `lazyIndex` slot, which stores the mapping indexes for each column. Methods such as show, validity checking, `[`/`[[` subsetting, and rbind/cbind are implemented for DelayedDataFrame so that they operate on the lazyIndex. The listData slot stays untouched until a realization call is invoked, e.g., the DataFrame constructor or as.list().
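The lazy-index mechanism described above can be modeled in a few lines. The sketch below uses Python purely for illustration. It is a hypothetical, simplified model, not DelayedDataFrame's R API, and the class name `LazyFrame` and its methods are invented here. The model works as follows:

- Raw column data plays the role of listData and is never modified.
- Each column carries an index vector that plays the role of lazyIndex.
- Subsetting only composes indexes.
- Values are read only when `realize()` is called.

```python
from typing import Dict, List, Optional, Sequence


class LazyFrame:
    """Hypothetical toy model of a lazy-index data frame (not the R package API).

    _data  ~ listData:  raw columns, shared and never copied or modified.
    _index ~ lazyIndex: per-column row mapping into the raw data.
    """

    def __init__(self, data: Dict[str, Sequence],
                 index: Optional[Dict[str, List[int]]] = None):
        self._data = data
        nrow = len(next(iter(data.values()))) if data else 0
        # Default mapping: identity index for every column.
        self._index = index if index is not None else {
            k: list(range(nrow)) for k in data
        }

    def nrow(self) -> int:
        return len(next(iter(self._index.values()))) if self._index else 0

    def subset_rows(self, rows: List[int]) -> "LazyFrame":
        # Compose the new selection with each column's existing index;
        # no column values are read here.
        new_index = {k: [idx[r] for r in rows] for k, idx in self._index.items()}
        return LazyFrame(self._data, new_index)

    def select_columns(self, cols: List[str]) -> "LazyFrame":
        # Keep references to the same raw columns; only the index is carried over.
        return LazyFrame({k: self._data[k] for k in cols},
                         {k: self._index[k] for k in cols})

    def realize(self) -> Dict[str, list]:
        # Realization: only now are the selected values pulled from the raw data.
        return {k: [self._data[k][i] for i in idx] for k, idx in self._index.items()}


df = LazyFrame({"id": ["a", "b", "c", "d"], "score": [10, 20, 30, 40]})
sub = df.subset_rows([3, 1]).subset_rows([1])  # composed index is [1]
print(sub.realize())  # {'id': ['b'], 'score': [20]}
```

Because subsetting only rewrites index vectors, chained subsets stay cheap regardless of column size, and the cost of reading values is paid once, at realization.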
Last updated 24 days ago
infrastructure, datarepresentation
4.95 score 1 stars 1 packages 3 scripts 174 downloads

SQLDataFrame - Representation of SQL tables in DataFrame metaphor
Implements bindings for SQL tables that are compatible with Bioconductor S4 data structures, namely the DataFrame and DelayedArray. This allows SQL-derived data to be easily used inside other Bioconductor objects (e.g., SummarizedExperiments) while keeping everything on disk.
Last updated 24 days ago
datarepresentation, infrastructure, software
4.51 score 2 stars 5 scripts 130 downloads

VCFArray - Representing on-disk / remote VCF files as array-like objects
VCFArray extends DelayedArray to represent VCF data entries as array-like objects, with an on-disk or remote VCF file as the backend. Data entries from VCF files can be converted into VCFArray instances with different dimensions. These entries include INFO fields, FORMAT fields, and the fixed columns (REF, ALT, QUAL, FILTER).
Last updated 24 days ago
infrastructure, datarepresentation, sequencing, variantannotation
4.00 score 1 stars 3 scripts 195 downloads