---
title: "Quick start to presto"
author: "Ilya Korsunsky"
output:
  BiocStyle::html_document:
    code_folding: show
    number_sections: yes
    toc: yes
    fig_width: 7
    fig_height: 2.5
vignette: >
  %\VignetteIndexEntry{Quick start to presto}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Introduction

presto makes it fast and easy to run the Wilcoxon rank sum test and auROC analysis on large datasets. The tutorial below shows how to install presto, walks through the three main ways to use presto with your data, and finally explores more advanced use cases.

# Installation

To install the current stable release from CRAN:

```{r, eval=FALSE}
install.packages('presto')
```

To install the development version of presto from GitHub:

```{r, eval=FALSE}
library(devtools)
install_github('immunogenomics/presto')
```

```{r message=FALSE, warning=FALSE, include=FALSE}
options(digits=2)
library(presto)
```

# Input Types

The main function in this vignette is `wilcoxauc`. presto currently supports three interfaces to `wilcoxauc`: a matrix, a Seurat object, or a SingleCellExperiment object. The output of `wilcoxauc` is described in the next section.

## Matrix input

The most general use of presto is with a matrix of features and observations (`exprs`) paired with a vector of group labels (`y`).

```{r}
data(exprs)
head(exprs)[, 1:10]

data(y)
head(y)

head(wilcoxauc(exprs, y))
```

## Seurat

We support interfacing with Seurat Version 3 objects. In the most basic use case, specify the Seurat object and the metadata variable that defines the group labels.

```{r}
data(object_seurat)
# ensure object structure matches with the installed Seurat version
object_seurat <- Seurat::UpdateSeuratObject(object_seurat)
head(wilcoxauc(object_seurat, 'cell_type'))
```

Seurat objects can store multiple assays. The assay used by `wilcoxauc` can be specified with the `seurat_assay` argument.

```{r}
head(wilcoxauc(object_seurat, 'cell_type', seurat_assay = 'RNA'))
```

Within an assay, Seurat stores multiple feature expression matrices, such as raw counts, library normalized data, and scaled data. Select one with the `assay` argument.

```{r}
head(wilcoxauc(object_seurat, 'cell_type', assay = 'counts'))
head(wilcoxauc(object_seurat, 'cell_type', assay = 'data'))
head(wilcoxauc(object_seurat, 'cell_type', assay = 'scale.data'))
```

## SingleCellExperiment

presto supports the Bioconductor data structure SingleCellExperiment. Again, the simplest use case takes a SingleCellExperiment object and the metadata field with group labels.

```{r}
data(object_sce)
head(wilcoxauc(object_sce, 'cell_type'))
```

A SingleCellExperiment object can have several data slots, such as counts and logcounts. Select one with the `assay` argument.

```{r}
head(wilcoxauc(object_sce, 'cell_type', assay = 'counts'))
head(wilcoxauc(object_sce, 'cell_type', assay = 'logcounts'))
```

# Description of outputs

## Results table

All inputs for `wilcoxauc` give the same table of results.

column | description
--------- | -----------
feature | name of feature.
group | name of group label.
avgExpr | mean value of feature in group.
logFC | log fold change between observations in group vs out.
statistic | Wilcoxon rank sum U statistic.
auc | area under the receiver operating characteristic (ROC) curve.
pval | nominal p value, from two-tailed Gaussian approximation of U statistic.
padj | Benjamini-Hochberg adjusted p value.
pct_in | percent of observations in the group with non-zero feature value.
pct_out | percent of observations out of the group with non-zero feature value.

```{r}
head(wilcoxauc(exprs, y))
```

## Top markers

We often find it helpful to summarize the most distinguishing features in each group.

```{r}
res <- wilcoxauc(exprs, y)
top_markers(res, n = 10)
```

We can also filter markers by additional criteria. For instance, we can require that each top feature be non-zero in at least 70% of observations within the group.
Note that not all groups have 10 markers that meet these criteria.

```{r}
res <- wilcoxauc(exprs, y)
top_markers(res, n = 10, auc_min = 0.5, pct_in_min = 70)
```

# Options

## Dense vs sparse

presto is optimized for both dense and sparse matrix inputs. When possible, use sparse inputs. In our toy dataset, almost 39% of elements are zeros. Thus, it makes sense to cast it as a sparse dgCMatrix and run `wilcoxauc` on that.

```{r}
sum(exprs == 0) / prod(dim(exprs))
exprs_sparse <- as(exprs, 'dgCMatrix')
head(wilcoxauc(exprs_sparse, y))
```

## groups_use

Sometimes, you don't want to test every group in the dataset against all other groups. For instance, to compare only observations in group 'A' to those in group 'B', use the `groups_use` argument.

```{r}
res_AB <- wilcoxauc(exprs, y, groups_use = c('A', 'B'))
head(res_AB)
top_markers(res_AB)
```
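
To make the two-group comparison concrete, the base R sketch below mimics a comparison of 'A' against 'B' for a single feature. It assumes that restricting the test to two groups amounts to dropping all other observations before ranking, which this vignette does not spell out. The toy values and the names `x`, `labels`, `u_a`, and `auc_a` are made up for illustration, and the code does not call presto.

```r
# Hypothetical toy data: one feature measured on 12 observations in 3 groups.
x <- c(5.1, 4.8, 6.0, 5.5,   # group A
       3.9, 4.2, 5.0, 4.1,   # group B
       4.5, 4.9, 5.2, 4.0)   # group C
labels <- rep(c('A', 'B', 'C'), each = 4)

# Assumption: a two-group comparison keeps only observations from those groups.
keep <- labels %in% c('A', 'B')
x_ab <- x[keep]
labels_ab <- labels[keep]

# Rank sum U statistic for group A versus group B.
in_a <- labels_ab == 'A'
n_a <- sum(in_a)
n_b <- sum(!in_a)
u_a <- sum(rank(x_ab)[in_a]) - n_a * (n_a + 1) / 2

# The AUC is the U statistic rescaled to the [0, 1] interval.
auc_a <- u_a / (n_a * n_b)

# Base R's wilcox.test reports the same statistic as W.
w_a <- unname(wilcox.test(x_ab[in_a], x_ab[!in_a])$statistic)

print(c(U = u_a, W = w_a, auc = auc_a))
```

In this sketch, `u_a` plays the role of the `statistic` column and `auc_a` the role of the `auc` column for group 'A'. By symmetry, the AUC for 'B' against 'A' is `1 - auc_a`.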