gans package#
Submodules#
gans.causal_gan module#
- class gans.causal_gan.CausalGAN(genes_no: int, batch_size: int, latent_dim: int, noise_per_gene: int, depth_per_gene: int, width_per_gene: int, cc_latent_dim: int, cc_layers: List[int], cc_pretrained_checkpoint: str, crit_layers: List[int], causal_graph: Dict[int, Set[int]], labeler_layers: List[int], device: str | None = 'cpu', library_size: int | None = 20000)[source]#
Bases:
GAN
- __init__(genes_no: int, batch_size: int, latent_dim: int, noise_per_gene: int, depth_per_gene: int, width_per_gene: int, cc_latent_dim: int, cc_layers: List[int], cc_pretrained_checkpoint: str, crit_layers: List[int], causal_graph: Dict[int, Set[int]], labeler_layers: List[int], device: str | None = 'cpu', library_size: int | None = 20000) None [source]#
Causal single-cell RNA-seq GAN (TODO: find a unique name).
- Parameters:
genes_no (int) – Number of genes in the dataset.
batch_size (int) – Training batch size.
latent_dim (int) – Dimension of the latent space from which the noise vector used by the causal controller is sampled.
noise_per_gene (int) – Dimension of the latent space from which the noise vectors used by the target generators are sampled.
depth_per_gene (int) – Depth of the target generator networks.
width_per_gene (int) – The width scale used for the target generator networks.
cc_latent_dim (int) – Dimension of the latent space from which the noise vector to the causal controller is sampled.
cc_layers (List[int]) – List of integers corresponding to the number of neurons of each causal controller layer.
cc_pretrained_checkpoint (str) – Path to the pretrained causal controller.
crit_layers (List[int]) – List of integers corresponding to the number of neurons of each critic layer.
causal_graph (Dict[int, Set[int]]) –
The causal graph is a dictionary representing the TRN to impose. It has the following format: {target gene index: {TF1 index, TF2 index, …}}. This causal graph has to be acyclic and bipartite. A TF cannot be regulated by another TF.
Invalid: {1: {2, 3, {4, 6}}, …} - a regulator (TF) is regulated by another regulator (TF)
Invalid: {1: {2, 3, 4}, 2: {4, 3, 5}, …} - a regulator (TF) is also regulated
Invalid: {4: {2, 3}, 2: {4, 3}} - contains a cycle
Valid causal graph example: {1: {2, 3, 4}, 6: {5, 4, 2}, …}
labeler_layers (List[int]) – List of integers corresponding to the width of each labeler layer.
device (Optional[str], optional) – Device to use: ‘cpu’ or ‘cuda’. Only ‘cuda’ is supported for training the GAN, but ‘cpu’ can be used for inference; by default “cuda” if torch.cuda.is_available() else “cpu”.
library_size (Optional[int], optional) – Total number of counts per generated cell, by default 20000.
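The bipartite constraint on causal_graph described above can be checked before constructing the model. The following is a minimal sketch, assuming a hypothetical helper named validate_causal_graph that is not part of the gans package. It relies on the observation that if no gene is both a target (a key) and a regulator (a member of any value set), the graph is bipartite and therefore cannot contain a cycle.

```python
from typing import Dict, Set


def validate_causal_graph(causal_graph: Dict[int, Set[int]]) -> None:
    """Raise ValueError unless the graph maps target genes to TF-only sets.

    Hypothetical helper for illustration; not provided by the gans package.
    """
    targets = set(causal_graph)
    for target, regulators in causal_graph.items():
        for tf in regulators:
            if not isinstance(tf, int):
                raise ValueError(
                    f"regulators of gene {target} must be ints, got {tf!r}"
                )
        # A gene used as a key (a target) must never appear as a regulator.
        overlap = targets & set(regulators)
        if overlap:
            raise ValueError(
                f"genes {sorted(overlap)} are both targets and regulators"
            )


# Valid example from the parameter description: passes silently.
validate_causal_graph({1: {2, 3, 4}, 6: {5, 4, 2}})

# Invalid example: gene 2 is a regulator of gene 1 and is also regulated.
try:
    validate_causal_graph({1: {2, 3, 4}, 2: {4, 3, 5}})
except ValueError as err:
    print(err)
```

The same check rejects the cyclic example {4: {2, 3}, 2: {4, 3}}, because genes 2 and 4 appear on both sides.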
- _save(path: str | bytes | PathLike) None [source]#
Saves the model.
- Parameters:
path (Union[str, bytes, os.PathLike]) – Directory to save the model.
- _load(path: str | bytes | PathLike, mode: str | None = 'inference') None [source]#
Loads a saved causal GAN model (.pth file).
- Parameters:
path (Union[str, bytes, os.PathLike]) – Path to the saved model.
mode (Optional[str], optional) – Specify if the loaded model is used for ‘inference’ or ‘training’, by default “inference”.
- Raises:
ValueError – If a mode other than ‘inference’ or ‘training’ is specified.
- _train_labelers(real_cells: Tensor) None [source]#
Trains the labeler (on real and fake) and anti-labeler (on fake only).
- Parameters:
real_cells (torch.Tensor) – Tensor containing a batch of real cells.
- _train_generator() Tensor [source]#
Trains the causal generator for one iteration.
- Returns:
Tensor containing only 1 item, the generator loss.
- Return type:
torch.Tensor
- train(train_files: str, valid_files: str, critic_iter: int, max_steps: int, c_lambda: float, beta1: float, beta2: float, gen_alpha_0: float, gen_alpha_final: float, crit_alpha_0: float, crit_alpha_final: float, labeler_alpha: float, antilabeler_alpha: float, labeler_training_interval: int, checkpoint: str | bytes | PathLike | None = None, output_dir: str | None = 'output', summary_freq: int | None = 5000, plt_freq: int | None = 10000, save_feq: int | None = 10000) None [source]#
Method for training the causal GAN.
- Parameters:
train_files (str) – Path to training set files (TFrecords supported for now).
valid_files (str) – Path to validation set files (TFrecords supported for now).
critic_iter (int) – Number of training iterations of the critic for each iteration on the generator.
max_steps (int) – Maximum number of steps to train the GAN.
c_lambda (float) – Regularization hyper-parameter for gradient penalty.
beta1 (float) – Coefficient used for computing the running average of the gradient in the optimizer.
beta2 (float) – Coefficient used for computing the running average of the squared gradient in the optimizer.
gen_alpha_0 (float) – Generator’s initial learning rate value.
gen_alpha_final (float) – Generator’s final learning rate value.
crit_alpha_0 (float) – Critic’s initial learning rate value.
crit_alpha_final (float) – Critic’s final learning rate value.
labeler_alpha (float) – Labeler’s learning rate value.
antilabeler_alpha (float) – Anti-labeler’s learning rate value.
labeler_training_interval (int) – The number of steps after which the labeler and anti-labeler are trained. If 20, the labeler and anti-labeler will be trained every 20 steps.
checkpoint (Optional[Union[str, bytes, os.PathLike, None]], optional) – Path to a trained model; if specified, the checkpoint is used to resume training, by default None.
output_dir (Optional[str], optional) – Directory to which plots, tfevents, and checkpoints will be saved, by default “output”.
summary_freq (Optional[int], optional) – Period between summary logs to TensorBoard, by default 5000.
plt_freq (Optional[int], optional) – Period between t-SNE plots, by default 10000.
save_feq (Optional[int], optional) – Period between saves of the model, by default 10000.
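How critic_iter, labeler_training_interval, and the three period parameters interact can be illustrated with a small scheduling sketch. It assumes a 1-based step counter and that each periodic action fires when the step is a multiple of its period; neither detail is confirmed by this page, and actions_at_step is a hypothetical name used only for illustration.

```python
from typing import List


def actions_at_step(
    step: int,
    critic_iter: int,
    labeler_training_interval: int,
    summary_freq: int = 5000,
    plt_freq: int = 10000,
    save_feq: int = 10000,
) -> List[str]:
    """List what a training loop might do at `step` (assumed 1-based)."""
    # critic_iter critic updates for each generator update.
    actions = ["critic"] * critic_iter + ["generator"]
    # Labeler and anti-labeler are trained every labeler_training_interval steps.
    if step % labeler_training_interval == 0:
        actions.append("labelers")
    if step % summary_freq == 0:
        actions.append("summary")
    if step % plt_freq == 0:
        actions.append("tsne_plot")
    if step % save_feq == 0:
        actions.append("save")
    return actions


print(actions_at_step(20, critic_iter=5, labeler_training_interval=20))
print(actions_at_step(10000, critic_iter=5, labeler_training_interval=20))
```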
gans.conditional_gan module#
- class gans.conditional_gan.ConditionalGAN(genes_no: int, batch_size: int, latent_dim: int, gen_layers: List[int], crit_layers: List[int], device: str | None = 'cpu', library_size: int | None = 20000)[source]#
Bases:
GAN, ABC
- static _sample_pseudo_labels(batch_size: int, cluster_ratios: Tensor) Tensor [source]#
Randomly samples cluster labels following a multinomial distribution.
- Parameters:
batch_size (int) – The number of samples to generate (normally equal to training batch size).
cluster_ratios (torch.Tensor) – Tensor containing the parameters of the multinomial distribution (ex: torch.Tensor([0.5, 0.3, 0.2]) for 3 clusters with occurrence probabilities of 0.5, 0.3, and 0.2 for clusters 0, 1, and 2, respectively).
- Returns:
Tensor containing a batch of sampled cluster labels.
- Return type:
torch.Tensor
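The sampling behaviour can be sketched with the standard library alone. This is an illustration under assumptions, not the package's implementation: it returns a plain list rather than a torch.Tensor, and the seed argument is added here only to make the example reproducible. In PyTorch, torch.multinomial with replacement=True would play the same role.

```python
import random
from typing import List, Sequence


def sample_pseudo_labels(
    batch_size: int, cluster_ratios: Sequence[float], seed: int = 0
) -> List[int]:
    """Draw `batch_size` cluster indices with probabilities `cluster_ratios`."""
    rng = random.Random(seed)
    clusters = list(range(len(cluster_ratios)))
    # Sampling with replacement, weighted by the cluster ratios.
    return rng.choices(clusters, weights=cluster_ratios, k=batch_size)


labels = sample_pseudo_labels(1000, [0.5, 0.3, 0.2])
# Empirical frequencies should be close to the requested ratios.
print([labels.count(c) / len(labels) for c in range(3)])
```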
- _generate_tsne_plot(valid_loader: DataLoader, output_dir: str | bytes | PathLike) None [source]#
Generate t-SNE plot during training.
- Parameters:
valid_loader (DataLoader) – Validation set DataLoader.
output_dir (Union[str, bytes, os.PathLike]) – Directory to save the t-SNE plots.
gans.conditional_gan_cat module#
- class gans.conditional_gan_cat.ConditionalCatGAN(genes_no: int, batch_size: int, latent_dim: int, gen_layers: List[int], crit_layers: List[int], num_classes: int, label_ratios: Tensor, device: str | None = 'cpu', library_size: int | None = 20000)[source]#
Bases:
ConditionalGAN
- __init__(genes_no: int, batch_size: int, latent_dim: int, gen_layers: List[int], crit_layers: List[int], num_classes: int, label_ratios: Tensor, device: str | None = 'cpu', library_size: int | None = 20000) None [source]#
Conditional single-cell RNA-seq GAN using the conditioning method by concatenation.
- Parameters:
genes_no (int) – Number of genes in the dataset.
batch_size (int) – Training batch size.
latent_dim (int) – Dimension of the latent space from which the noise vector is sampled.
gen_layers (List[int]) – List of integers corresponding to the number of neurons of each generator layer.
crit_layers (List[int]) – List of integers corresponding to the number of neurons of each critic layer.
num_classes (int) – Number of classes in the dataset.
label_ratios (torch.Tensor) – Tensor containing the ratio of each class in the dataset.
device (Optional[str], optional) – Device to use: ‘cpu’ or ‘cuda’. Only ‘cuda’ is supported for training the GAN, but ‘cpu’ can be used for inference; by default “cuda” if torch.cuda.is_available() else “cpu”.
library_size (Optional[int], optional) – Total number of counts per generated cell, by default 20000.
- _get_gradient(real: Tensor, fake: Tensor, epsilon: Tensor, labels: Tensor | None = None, *args, **kwargs) Tensor [source]#
Compute the gradient of the critic’s scores with respect to interpolations of real and fake cells.
- Parameters:
real (torch.Tensor) – A batch of real cells.
fake (torch.Tensor) – A batch of fake cells.
epsilon (torch.Tensor) – A vector of the uniformly random proportions of real/fake per interpolated cells.
labels (torch.Tensor) – A batch of real class labels.
*args – Variable length argument list.
**kwargs – Arbitrary keyword arguments.
- Returns:
Gradient of the critic’s score with respect to interpolated data.
- Return type:
torch.Tensor
- _cat_one_hot_labels(cells: Tensor, labels: Tensor) Tensor [source]#
Concatenates one-hot encoded labels to a tensor.
- Parameters:
cells (torch.Tensor) – Tensor to which to concatenate one-hot encoded class labels.
labels (torch.Tensor) – Class labels to concatenate.
- Returns:
Tensor with one-hot encoded labels concatenated at the tail.
- Return type:
torch.Tensor
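Conditioning by concatenation appends a one-hot label vector to each row of the input. A minimal pure-Python sketch follows, assuming nested lists in place of tensors and an explicit num_classes argument. The documented method takes only cells and labels, so it presumably reads the class count from the model instance.

```python
from typing import List


def cat_one_hot_labels(
    cells: List[List[float]], labels: List[int], num_classes: int
) -> List[List[float]]:
    """Append a one-hot encoding of each label to the tail of its row."""
    out = []
    for row, label in zip(cells, labels):
        one_hot = [0.0] * num_classes
        one_hot[label] = 1.0
        out.append(list(row) + one_hot)
    return out


cells = [[1.0, 2.0], [3.0, 4.0]]
print(cat_one_hot_labels(cells, labels=[0, 2], num_classes=3))
```

Each output row has len(row) + num_classes entries, which is why the first layer of a concatenation-conditioned network must be wider than genes_no or latent_dim alone.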
- generate_cells(cells_no: int, checkpoint: str | bytes | PathLike | None = None, class_: int | None = None) Tuple[ndarray, ndarray] [source]#
Generate cells from the Conditional GAN model.
- Parameters:
cells_no (int) – Number of cells to generate.
checkpoint (Optional[Union[str, bytes, os.PathLike, None]], optional) – Path to the saved trained model, by default None.
class_ (Optional[int], optional) – Class of the cells to generate, by default None. If None, cells with the same ratio per class will be generated.
- Returns:
Gene expression matrix of generated cells and their corresponding class labels.
- Return type:
Tuple[np.ndarray, np.ndarray]
- _train_critic(real_cells, real_labels, c_lambda) Tuple[Tensor, Tensor] [source]#
Trains the critic for one iteration.
- Parameters:
real_cells (torch.Tensor) – Tensor containing a batch of real cells.
real_labels (torch.Tensor) – Tensor containing a batch of real labels (corresponding to real_cells).
c_lambda (float) – Regularization hyper-parameter for gradient penalty.
- Returns:
The computed critic loss and gradient penalty.
- Return type:
Tuple[torch.Tensor, torch.Tensor]
- _train_generator() Tensor [source]#
Trains the generator for one iteration.
- Returns:
Tensor containing only 1 item, the generator loss.
- Return type:
torch.Tensor
gans.conditional_gan_proj module#
- class gans.conditional_gan_proj.ConditionalProjGAN(genes_no: int, batch_size: int, latent_dim: int, gen_layers: List[int], crit_layers: List[int], num_classes: int, label_ratios: Tensor, device: str | None = 'cpu', library_size: int | None = 20000)[source]#
Bases:
ConditionalGAN
- __init__(genes_no: int, batch_size: int, latent_dim: int, gen_layers: List[int], crit_layers: List[int], num_classes: int, label_ratios: Tensor, device: str | None = 'cpu', library_size: int | None = 20000) None [source]#
Conditional single-cell RNA-seq GAN using the projection conditioning method.
- Parameters:
genes_no (int) – Number of genes in the dataset.
batch_size (int) – Training batch size.
latent_dim (int) – Dimension of the latent space from which the noise vector is sampled.
gen_layers (List[int]) – List of integers corresponding to the number of neurons of each generator layer.
crit_layers (List[int]) – List of integers corresponding to the number of neurons of each critic layer.
num_classes (int) – Number of classes in the dataset.
label_ratios (torch.Tensor) – Tensor containing the ratio of each class in the dataset.
device (Optional[str], optional) – Device to use: ‘cpu’ or ‘cuda’. Only ‘cuda’ is supported for training the GAN, but ‘cpu’ can be used for inference; by default “cuda” if torch.cuda.is_available() else “cpu”.
library_size (Optional[int], optional) – Total number of counts per generated cell, by default 20000.
- _get_gradient(real: Tensor, fake: Tensor, epsilon: Tensor, labels: Tensor | None = None, *args, **kwargs) Tensor [source]#
Compute the gradient of the critic’s scores with respect to interpolations of real and fake cells.
- Parameters:
real (torch.Tensor) – A batch of real cells.
fake (torch.Tensor) – A batch of fake cells.
epsilon (torch.Tensor) – A vector of the uniformly random proportions of real/fake per interpolated cells.
labels (torch.Tensor) – A batch of real class labels.
*args – Variable length argument list.
**kwargs – Arbitrary keyword arguments.
- Returns:
Gradient of the critic’s score with respect to interpolated data.
- Return type:
torch.Tensor
- generate_cells(cells_no: int, checkpoint: str | bytes | PathLike | None = None, class_: int | None = None) Tuple[ndarray, ndarray] [source]#
Generate cells from the Conditional GAN model.
- Parameters:
cells_no (int) – Number of cells to generate.
checkpoint (Optional[Union[str, bytes, os.PathLike, None]], optional) – Path to the saved trained model, by default None.
class_ (Optional[int], optional) – Class of the cells to generate, by default None. If None, cells with the same ratio per class will be generated.
- Returns:
Gene expression matrix of generated cells and their corresponding class labels.
- Return type:
Tuple[np.ndarray, np.ndarray]
- _train_critic(real_cells, real_labels, c_lambda) Tuple[Tensor, Tensor] [source]#
Trains the critic for one iteration.
- Parameters:
real_cells (torch.Tensor) – Tensor containing a batch of real cells.
real_labels (torch.Tensor) – Tensor containing a batch of real labels (corresponding to real_cells).
c_lambda (float) – Regularization hyper-parameter for gradient penalty.
- Returns:
The computed critic loss and gradient penalty.
- Return type:
Tuple[torch.Tensor, torch.Tensor]
- _train_generator() Tensor [source]#
Trains the generator for one iteration.
- Returns:
Tensor containing only 1 item, the generator loss.
- Return type:
torch.Tensor
gans.gan module#
- class gans.gan.GAN(genes_no: int, batch_size: int, latent_dim: int, gen_layers: List[int], crit_layers: List[int], device: str | None = 'cpu', library_size: int | None = 20000)[source]#
Bases:
object
- __init__(genes_no: int, batch_size: int, latent_dim: int, gen_layers: List[int], crit_layers: List[int], device: str | None = 'cpu', library_size: int | None = 20000) None [source]#
Non-conditional single-cell RNA-seq GAN.
- Parameters:
genes_no (int) – Number of genes in the dataset.
batch_size (int) – Training batch size.
latent_dim (int) – Dimension of the latent space from which the noise vector is sampled.
gen_layers (List[int]) – List of integers corresponding to the number of neurons of each generator layer.
crit_layers (List[int]) – List of integers corresponding to the number of neurons of each critic layer.
device (Optional[str], optional) – Device to use: ‘cpu’ or ‘cuda’. Only ‘cuda’ is supported for training the GAN, but ‘cpu’ can be used for inference; by default “cuda” if torch.cuda.is_available() else “cpu”.
library_size (Optional[int], optional) – Total number of counts per generated cell, by default 20000.
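The meaning of library_size, a fixed total number of counts per generated cell, can be illustrated with a simple rescaling. How the generator enforces this total is not described here. The sketch below shows one common approach, proportional scaling, under that assumption; scale_to_library_size is a hypothetical name.

```python
from typing import List, Sequence


def scale_to_library_size(
    counts: Sequence[float], library_size: int = 20000
) -> List[float]:
    """Rescale one cell's gene counts so that they sum to `library_size`."""
    total = sum(counts)
    if total == 0:
        # Nothing to rescale; avoid dividing by zero.
        return list(counts)
    return [c * library_size / total for c in counts]


cell = scale_to_library_size([1.0, 3.0, 6.0])
print(cell, sum(cell))
```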
- static _generate_noise(batch_size: int, latent_dim: int, device: str) Tensor [source]#
Creates noise vectors of shape (batch_size, latent_dim).
- Parameters:
batch_size (int) – The number of samples to generate (normally equal to training batch size).
latent_dim (int) – Dimension of the latent space to sample from.
device (str) – The device type.
- Returns:
A tensor filled with random numbers from the standard normal distribution.
- Return type:
torch.Tensor
- static _set_exponential_lr(optimizer: Optimizer, alpha_0: float, alpha_final: float, max_steps: int) ExponentialLR [source]#
Sets up exponentially decaying learning rate scheduler to be used with the optimizer.
- Parameters:
optimizer (torch.optim.Optimizer) – Optimizer for which to create an exponential learning rate scheduler.
alpha_0 (float) – Initial learning rate.
alpha_final (float) – Final learning rate.
max_steps (int) – Total number of training steps. When current_step=max_steps, alpha_final will be set as the learning rate.
- Returns:
Exponential learning rate scheduler. Call the step() function on this scheduler in the training loop.
- Return type:
ExponentialLR
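For the learning rate to reach alpha_final exactly at max_steps, the per-step decay factor must satisfy alpha_0 * gamma ** max_steps == alpha_final. The sketch below computes such a factor. It is one way to satisfy the description above, not necessarily how the package derives it, and the function names are hypothetical. torch.optim.lr_scheduler.ExponentialLR multiplies the learning rate by its gamma argument on each step() call, so a factor chosen this way fits that scheduler.

```python
def exponential_gamma(alpha_0: float, alpha_final: float, max_steps: int) -> float:
    """Per-step decay factor so that the rate equals alpha_final at max_steps."""
    return (alpha_final / alpha_0) ** (1.0 / max_steps)


def lr_at(step: int, alpha_0: float, gamma: float) -> float:
    """Learning rate after `step` multiplications by `gamma`."""
    return alpha_0 * gamma ** step


gamma = exponential_gamma(alpha_0=1e-4, alpha_final=1e-5, max_steps=1000)
print(gamma, lr_at(0, 1e-4, gamma), lr_at(1000, 1e-4, gamma))
```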
- static _critic_loss(crit_fake_pred: Tensor, crit_real_pred: Tensor, gp: Tensor, c_lambda: float) Tensor [source]#
Compute the critic’s loss given its scores on real and fake cells, the gradient penalty, and the gradient penalty regularization hyper-parameter.
- Parameters:
crit_fake_pred (torch.Tensor) – Critic’s score on fake cells.
crit_real_pred (torch.Tensor) – Critic’s score on real cells.
gp (torch.Tensor) – Unweighted gradient penalty.
c_lambda (float) – Regularization hyper-parameter to be used with the gradient penalty in the WGAN loss.
- Returns:
Critic’s loss for the current batch.
- Return type:
torch.Tensor
- static _generator_loss(crit_fake_pred: Tensor) Tensor [source]#
Compute the generator loss from the critic’s score of the generated cells.
- Parameters:
crit_fake_pred (torch.Tensor) – The critic’s score on fake generated cells.
- Returns:
Generator’s loss value for the current batch.
- Return type:
torch.Tensor
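The two loss functions above take the same inputs as the standard WGAN-GP objective. The sketch below assumes that standard formulation and uses plain Python floats instead of tensors; the exact expressions used by the package are not stated on this page.

```python
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def critic_loss(
    crit_fake_pred: Sequence[float],
    crit_real_pred: Sequence[float],
    gp: float,
    c_lambda: float,
) -> float:
    """WGAN-GP critic loss: mean fake score - mean real score + lambda * GP."""
    return _mean(crit_fake_pred) - _mean(crit_real_pred) + c_lambda * gp


def generator_loss(crit_fake_pred: Sequence[float]) -> float:
    """WGAN generator loss: negative mean critic score on generated cells."""
    return -_mean(crit_fake_pred)


print(critic_loss([0.0, 2.0], [3.0, 5.0], gp=0.5, c_lambda=10.0))
print(generator_loss([0.0, 2.0]))
```

Minimizing the critic loss pushes scores on real cells up and scores on fake cells down, while the generator loss rewards fake cells that the critic scores highly.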
- _get_gradient(real: Tensor, fake: Tensor, epsilon: Tensor, *args, **kwargs) Tensor [source]#
Compute the gradient of the critic’s scores with respect to interpolations of real and fake cells.
- Parameters:
real (torch.Tensor) – A batch of real cells.
fake (torch.Tensor) – A batch of fake cells.
epsilon (torch.Tensor) – A vector of the uniformly random proportions of real/fake per interpolated cells.
- Returns:
Gradient of the critic’s score with respect to interpolated data.
- Return type:
torch.Tensor
- static _gradient_penalty(gradient: Tensor) Tensor [source]#
Compute the gradient penalty given a gradient.
- Parameters:
gradient (torch.Tensor) – The gradient of the critic’s score with respect to the interpolated data.
- Returns:
Gradient penalty of the given gradient.
- Return type:
torch.Tensor
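For reference, the interpolation and penalty computed by _get_gradient and _gradient_penalty can be written as follows, assuming the standard WGAN-GP formulation. Here x is a real cell, x̃ a fake cell, D the critic, and ε the per-cell mixing proportion. Which of the two inputs is weighted by ε is an assumption.

```latex
\begin{aligned}
\hat{x} &= \epsilon\, x + (1 - \epsilon)\, \tilde{x}, \qquad \epsilon \sim \mathcal{U}(0, 1) \\
g &= \nabla_{\hat{x}} D(\hat{x}) \\
\mathrm{GP} &= \mathbb{E}_{\hat{x}}\left[\left(\lVert g \rVert_2 - 1\right)^2\right]
\end{aligned}
```

The penalty is zero when the gradient norm is 1 at every interpolated cell. It enters the critic loss weighted by c_lambda.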
- generate_cells(cells_no: int, checkpoint: str | bytes | PathLike | None = None) ndarray [source]#
Generate cells from the GAN model.
- Parameters:
cells_no (int) – Number of cells to generate.
checkpoint (Optional[Union[str, bytes, os.PathLike, None]], optional) – Path to the saved trained model, by default None.
- Returns:
Gene expression matrix of generated cells.
- Return type:
np.ndarray
- _save(path: str | bytes | PathLike) None [source]#
Saves the model.
- Parameters:
path (Union[str, bytes, os.PathLike]) – Directory to save the model.
- _load(path: str | bytes | PathLike, mode: str | None = 'inference') None [source]#
Loads a saved model (.pth file).
- Parameters:
path (Union[str, bytes, os.PathLike]) – Path to the saved model.
mode (Optional[str], optional) – Specify if the loaded model is used for ‘inference’ or ‘training’, by default “inference”.
- Raises:
ValueError – If a mode other than ‘inference’ or ‘training’ is specified.
- _get_loaders(train_file: str | bytes | PathLike, validation_file: str | bytes | PathLike) Tuple[DataLoader, DataLoader] [source]#
Gets training and validation DataLoaders for training.
- Parameters:
train_file (Union[str, bytes, os.PathLike]) – Path to training files.
validation_file (Union[str, bytes, os.PathLike]) – Path to validation files.
- Returns:
Training and validation DataLoaders.
- Return type:
Tuple[DataLoader, DataLoader]
- _add_tensorboard_graph(output_dir: str | bytes | PathLike, gen_data: Tensor | Tuple[Tensor], crit_data: Tensor | Tuple[Tensor]) None [source]#
Adds the model graph to TensorBoard.
- Parameters:
output_dir (Union[str, bytes, os.PathLike]) – Directory to save the tfevents.
gen_data (Union[torch.Tensor, Tuple[torch.Tensor]]) – Input to the generator.
crit_data (Union[torch.Tensor, Tuple[torch.Tensor]]) – Input to the critic.
- _update_tensorboard(gen_loss: float, crit_loss: float, gp: Tensor, gen_lr: float, crit_lr: float, output_dir: str | bytes | PathLike) None [source]#
Updates the TensorBoard summary logs.
- Parameters:
gen_loss (float) – Generator loss.
crit_loss (float) – Critic loss.
gp (torch.Tensor) – Gradient penalty.
gen_lr (float) – Generator’s optimizer learning rate.
crit_lr (float) – Critic’s optimizer learning rate.
output_dir (Union[str, bytes, os.PathLike]) – Directory to save the tfevents.
- _generate_tsne_plot(valid_loader: DataLoader, output_dir: str | bytes | PathLike) None [source]#
Generates t-SNE plots during training.
- Parameters:
valid_loader (DataLoader) – Validation set DataLoader.
output_dir (Union[str, bytes, os.PathLike]) – Directory to save the t-SNE plots.
- _train_critic(real_cells: Tensor, real_labels: Tensor, c_lambda: float) Tuple[Tensor, Tensor] [source]#
Trains the critic for one iteration.
- Parameters:
real_cells (torch.Tensor) – Tensor containing a batch of real cells.
real_labels (torch.Tensor) – Tensor containing a batch of real labels (corresponding to real_cells).
c_lambda (float) – Regularization hyper-parameter for gradient penalty.
- Returns:
The computed critic loss and gradient penalty.
- Return type:
Tuple[torch.Tensor, torch.Tensor]
- _train_generator() Tensor [source]#
Trains the generator for one iteration.
- Returns:
Tensor containing only 1 item, the generator loss.
- Return type:
torch.Tensor
- train(train_files: str, valid_files: str, critic_iter: int, max_steps: int, c_lambda: float, beta1: float, beta2: float, gen_alpha_0: float, gen_alpha_final: float, crit_alpha_0: float, crit_alpha_final: float, checkpoint: str | bytes | PathLike | None = None, output_dir: str | None = 'output', summary_freq: int | None = 5000, plt_freq: int | None = 10000, save_feq: int | None = 10000) None [source]#
Method for training the GAN.
- Parameters:
train_files (str) – Path to training set files (TFrecords supported for now).
valid_files (str) – Path to validation set files (TFrecords supported for now).
critic_iter (int) – Number of training iterations of the critic for each iteration on the generator.
max_steps (int) – Maximum number of steps to train the GAN.
c_lambda (float) – Regularization hyper-parameter for gradient penalty.
beta1 (float) – Coefficient used for computing the running average of the gradient in the optimizer.
beta2 (float) – Coefficient used for computing the running average of the squared gradient in the optimizer.
gen_alpha_0 (float) – Generator’s initial learning rate value.
gen_alpha_final (float) – Generator’s final learning rate value.
crit_alpha_0 (float) – Critic’s initial learning rate value.
crit_alpha_final (float) – Critic’s final learning rate value.
checkpoint (Optional[Union[str, bytes, os.PathLike, None]], optional) – Path to a trained model; if specified, the checkpoint is used to resume training, by default None.
output_dir (Optional[str], optional) – Directory to which plots, tfevents, and checkpoints will be saved, by default “output”.
summary_freq (Optional[int], optional) – Period between summary logs to TensorBoard, by default 5000.
plt_freq (Optional[int], optional) – Period between t-SNE plots, by default 10000.
save_feq (Optional[int], optional) – Period between saves of the model, by default 10000.