---
title: "Quick start to presto"
author: "Ilya Korsunsky"
output:
  BiocStyle::html_document:
    code_folding: show
    number_sections: yes
    toc: yes  
    fig_width: 7
    fig_height: 2.5
vignette: >
    %\VignetteIndexEntry{Quick start to presto}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8} 
---


# Introduction

presto makes it fast and easy to run Wilcoxon rank sum test and auROC analysis 
on large datasets. The tutorial below shows how to install presto, walks 
through the 3 major ways you can use presto with your data, and finally
explores more advanced use cases. 

# Installation 

To install the current stable release from CRAN:

```{r, eval=FALSE}
install.packages('presto')

```

For the cutting edge version of presto: 

```{r, eval=FALSE}
library(devtools)
install_github('immunogenomics/presto')

```

```{r message=FALSE, warning=FALSE, include=FALSE}
options(digits=2)
library(presto)
```


# Input Types

The main function in this vignette is `wilcoxauc`. presto currently supports 3
interfaces to `wilcoxauc`, with a matrix, Seurat object, or 
SingleCellExperiment object. The output of `wilcoxauc` is described in the
next section.


## Matrix input

The most general use of presto is with a matrix of features and observations 
(exprs) paired with a vector of group labels (y). 

```{r}
data(exprs)
head(exprs)[, 1:10]
data(y)
head(y)
head(wilcoxauc(exprs, y))

```


## Seurat 

We support interfacing with Seurat Version 3 objects. In the most basic use
case, specify the Seurat object and the meta data variable that defines the 
group labels. 

```{r}
data(object_seurat)
# ensure object structure matches with the installed Seurat version
object_seurat <- Seurat::UpdateSeuratObject(object_seurat)
head(wilcoxauc(object_seurat, 'cell_type'))

```

Seurat objects can store multiple assays. The assay used by `wilcoxauc` can be
specified with the `seurat_assay` argument.

```{r}
head(wilcoxauc(object_seurat, 'cell_type', seurat_assay = 'RNA'))

```


Seurat supports multiple feature expression matrices, such as raw counts, 
library normalized data, and scaled data. These can be accessed with `assay`.

```{r}
head(wilcoxauc(object_seurat, 'cell_type', assay = 'counts'))
head(wilcoxauc(object_seurat, 'cell_type', assay = 'data'))
head(wilcoxauc(object_seurat, 'cell_type', assay = 'scale.data'))

```


## SingleCellExperiment

presto supports the Bioconductor data structure SingleCellExperiment. Again, 
the most simple use case takes a SingleCellExperiment object and the metadata
field with group labels. 


```{r}
data(object_sce)
head(wilcoxauc(object_sce, 'cell_type'))

```

SingleCellExperiment can have several data slots, such as counts and logcounts.
These can be accessed with `assay`. 

```{r}
head(wilcoxauc(object_sce, 'cell_type', assay = 'counts'))
head(wilcoxauc(object_sce, 'cell_type', assay = 'logcounts'))

```


# Description of outputs

## Results table

All inputs for `wilcoxauc` give the same table of results. 

parameter | description 
--------- | ----------- 
feature | name of feature.
group | name of group label. 
avgExpr | mean value of feature in group. 
logFC | log fold change between observations in group vs out.
statistic | Wilcoxon rank sum U statistic. 
auc | area under the receiver operator curve. 
pval | nominal p value, from two-tailed Gaussian approximation of U statistic.
padj | Benjamini-Hochberg adjusted p value. 
pct_in | Percent of observations in the group with non-zero feature value. 
pct_out | Percent of observations out of the group with non-zero feature value. 

```{r}
head(wilcoxauc(exprs, y))

```


## Top markers

We often find it helpful to summarize what the most distinguishing features
are in each group. 

```{r}
res <- wilcoxauc(exprs, y)
top_markers(res, n = 10)

```

We can also filter for some criteria. For instance, the top features must be 
in at least 70% of all observations within the group. Note that not all groups
have 10 markers that meet these criteria. 


```{r}
res <- wilcoxauc(exprs, y)
top_markers(res, n = 10, auc_min = .5, pct_in_min = 70)

```


# Options

## Dense vs sparse

presto is optimized for dense and sparse matrix inputs. When possible, use 
sparse inputs. In our toy dataset, almost 39% of elements are zeros. Thus, it 
makes sense to cast it as a sparse dgCMatrix and run wilcoxauc on that. 

```{r}
sum(exprs == 0) / prod(dim(exprs))
exprs_sparse <- as(exprs, 'dgCMatrix')
head(wilcoxauc(exprs_sparse, y))

```


## groups_use

Sometimes, you don't want to test all groups in the dataset against all other 
groups. For instance, I want to compare only observations in group 'A' to
those in group 'B'. This is achieved with the groups_use argument. 

```{r}
res_AB <- wilcoxauc(exprs, y, groups_use = c('A', 'B'))
head(res_AB)
top_markers(res_AB)

```