mypackage.setup_anndata
mypackage.setup_anndata(adata, batch_key=None, labels_key=None, layer=None, protein_expression_obsm_key=None, protein_names_uns_key=None, categorical_covariate_keys=None, continuous_covariate_keys=None, copy=False)
Sets up an AnnData object for models.
A mapping will be created between the data fields used by models and their respective locations in adata. This method will also compute the log mean and log variance per batch for the library size prior.
Existing data in adata are not modified; only new fields are added to adata.
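The per-batch library size statistics mentioned above can be pictured with a small NumPy sketch. This is not the package's implementation: the helper name `log_library_stats_per_batch` is hypothetical, and it assumes that the library size of a cell is the sum of its counts, that the statistics are taken over the log of that sum, and that the variance is the population variance.

```python
import numpy as np


def log_library_stats_per_batch(counts, batch_codes):
    """Return {batch: (mean, variance)} of the log total counts per cell.

    Hypothetical helper for illustration only. Assumes every cell has a
    nonzero total count, so that the logarithm is finite.
    """
    counts = np.asarray(counts, dtype=float)
    batch_codes = np.asarray(batch_codes)
    # Library size of a cell: sum of its counts over all features.
    log_library = np.log(counts.sum(axis=1))
    stats = {}
    for batch in np.unique(batch_codes):
        values = log_library[batch_codes == batch]
        # np.var defaults to the population variance (ddof=0); that choice
        # is an assumption of this sketch.
        stats[int(batch)] = (float(values.mean()), float(values.var()))
    return stats


# Toy matrix: 4 cells (rows) x 2 features (columns), split into two batches.
counts = np.array([[1, 1], [2, 2], [5, 5], [10, 10]])
batch_codes = np.array([0, 0, 1, 1])
stats = log_library_stats_per_batch(counts, batch_codes)
```

Cells with zero total counts would produce an infinite log library size here, which is one reason to filter cells before setup.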
- Parameters
- adata : AnnData
AnnData object containing raw counts. Rows represent cells, columns represent features.
- batch_key : Optional[str] (default: None)
Key in adata.obs for batch information. Categories will automatically be converted into integer categories and saved to adata.obs['_scvi_batch']. If None, assigns the same batch to all the data.
- labels_key : Optional[str] (default: None)
Key in adata.obs for label information. Categories will automatically be converted into integer categories and saved to adata.obs['_scvi_labels']. If None, assigns the same label to all the data.
- layer : Optional[str] (default: None)
If not None, uses this as the key in adata.layers for raw count data.
- protein_expression_obsm_key : Optional[str] (default: None)
Key in adata.obsm for protein expression data. Required for TOTALVI.
- protein_names_uns_key : Optional[str] (default: None)
Key in adata.uns for protein names. If None, will use the column names of adata.obsm[protein_expression_obsm_key] if it is a DataFrame, else will assign sequential names to proteins. Relevant only to TOTALVI, where it is optional.
- categorical_covariate_keys : Optional[List[str]] (default: None)
Keys in adata.obs that correspond to categorical data. Used in some models.
- continuous_covariate_keys : Optional[List[str]] (default: None)
Keys in adata.obs that correspond to continuous data. Used in some models.
- copy : bool (default: False)
If True, a copy of adata is returned.
- Return type
- Returns
If copy is True, returns a copy of adata as an AnnData object. Adds the following fields to adata:
- .uns['_scvi']
scvi setup dictionary
- .obs['_local_l_mean']
per batch library size mean
- .obs['_local_l_var']
per batch library size variance
- .obs['_scvi_labels']
labels encoded as integers
- .obs['_scvi_batch']
batch encoded as integers
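As a rough illustration of what "encoded as integers" means for the batch and label fields, the sketch below maps arbitrary category labels to integer codes with NumPy. The helper `encode_categories` is hypothetical. Two choices here are assumptions of this sketch, not documented behavior: assigning codes in sorted label order, and using code 0 for every cell when no key is given.

```python
import numpy as np


def encode_categories(values):
    """Map arbitrary category labels to integer codes.

    Hypothetical helper for illustration only. Codes follow the sorted
    order of the unique labels, which is an assumption of this sketch.
    """
    categories, codes = np.unique(np.asarray(values), return_inverse=True)
    return codes, categories


# Batch labels as they might appear in a column of adata.obs.
batch_labels = ["donor_b", "donor_a", "donor_b", "donor_c"]
codes, categories = encode_categories(batch_labels)

# With no key, every cell shares one category; code 0 is assumed here.
single_batch = np.zeros(len(batch_labels), dtype=int)
```

Indexing `categories` with `codes` recovers the original labels, so the encoding loses no information.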
Examples
Example setting up a scanpy dataset with random gene data and neither batch nor label information
>>> import scanpy as sc
>>> import scvi
>>> import numpy as np
>>> adata = scvi.data.synthetic_iid(run_setup_anndata=False)
>>> adata
AnnData object with n_obs × n_vars = 400 × 100
    obs: 'batch', 'labels'
    uns: 'protein_names'
    obsm: 'protein_expression'
Filter cells and run preprocessing before setup_anndata
>>> sc.pp.filter_cells(adata, min_counts = 0)
Since neither batch_key nor labels_key was passed, setup_anndata() will assume all cells have the same batch and label
>>> scvi.data.setup_anndata(adata)
INFO No batch_key inputted, assuming all cells are same batch
INFO No label_key inputted, assuming all cells have same label
INFO Using data from adata.X
INFO Computing library size prior per batch
INFO Registered keys:['X', 'batch_indices', 'local_l_mean', 'local_l_var', 'labels']
INFO Successfully registered anndata object containing 400 cells, 100 vars, 1 batches, 1 labels, and 0 proteins. Also registered 0 extra categorical covariates and 0 extra continuous covariates.
Example setting up a scanpy dataset with random gene data, batch, and protein expression
>>> adata = scvi.data.synthetic_iid(run_setup_anndata=False)
>>> scvi.data.setup_anndata(adata, batch_key='batch', protein_expression_obsm_key='protein_expression')
INFO Using batches from adata.obs["batch"]
INFO No label_key inputted, assuming all cells have same label
INFO Using data from adata.X
INFO Computing library size prior per batch
INFO Using protein expression from adata.obsm['protein_expression']
INFO Generating sequential protein names
INFO Registered keys:['X', 'batch_indices', 'local_l_mean', 'local_l_var', 'labels', 'protein_expression']
INFO Successfully registered anndata object containing 400 cells, 100 vars, 2 batches, 1 labels, and 100 proteins. Also registered 0 extra categorical covariates and 0 extra continuous covariates.
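The "Generating sequential protein names" message corresponds to the fallback described for protein_names_uns_key: explicit names first, then DataFrame column names, then sequential names. The sketch below mirrors that order. It is only an illustration: `resolve_protein_names` is a hypothetical helper, and the `protein_<i>` naming is a placeholder, since the format of the sequential names is not stated here.

```python
import numpy as np
import pandas as pd


def resolve_protein_names(protein_expression, names=None):
    """Pick protein names using the fallback order from the parameter docs.

    Hypothetical helper for illustration only. The "protein_<i>" format
    for sequential names is a placeholder chosen for this sketch.
    """
    if names is not None:
        # Explicit names (e.g. taken from adata.uns) win.
        return list(names)
    if isinstance(protein_expression, pd.DataFrame):
        # A DataFrame carries its own column names.
        return list(protein_expression.columns)
    # Otherwise fall back to one sequential name per column.
    n_proteins = np.asarray(protein_expression).shape[1]
    return [f"protein_{i}" for i in range(n_proteins)]


# Two cells x two proteins, once as a DataFrame and once as a plain array.
as_frame = pd.DataFrame([[3, 0], [1, 4]], columns=["CD3", "CD19"])
as_array = np.array([[3, 0], [1, 4]])

names_from_frame = resolve_protein_names(as_frame)
names_from_array = resolve_protein_names(as_array)
names_explicit = resolve_protein_names(as_array, names=["CD4", "CD8"])
```

Storing protein expression as a DataFrame keeps the protein names attached to the data, which avoids relying on the sequential fallback.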