module unsupervised
function compute_tsne_embedding
compute_tsne_embedding(
dataset: DataFrame,
cols: list,
N_rows: int = 20000,
n_components=2,
perplexity=30
) → tuple
Compute a TSNE embedding. Only a random subset of rows is embedded.
Args:
dataset
: Input data
cols
: A list of column names to produce the embedding for
N_rows
: A number of rows to randomly sample for the embedding. Only these rows are embedded.
n_components
: The number of dimensions to embed the data into.
perplexity
: The perplexity of the TSNE embedding.
Returns: A tuple with components:
- A numpy array with the embedding data, only for the random subset of rows
- The rows that were used for the embedding
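The sample-then-embed pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's implementation: `embed_on_random_subset`, `embed_fn` and `seed` are hypothetical names, and the dimensionality reduction is passed in as a callable so the sketch depends only on NumPy. In practice `embed_fn` could be something like scikit-learn's `TSNE(n_components=2, perplexity=30).fit_transform`.

```python
import numpy as np


def embed_on_random_subset(data, n_rows, embed_fn, seed=0):
    """Embed a random subset of rows; return (embedding, row_indices).

    Hypothetical helper: `embed_fn` maps an (n, d) array to an (n, k) array.
    """
    rng = np.random.default_rng(seed)
    n_total = data.shape[0]
    # Never ask for more rows than exist; sample without replacement.
    n_sample = min(n_rows, n_total)
    rows = np.sort(rng.choice(n_total, size=n_sample, replace=False))
    embedding = embed_fn(data[rows])
    return embedding, rows


# Usage with a stand-in "embedding" that simply keeps the first two columns.
features = np.random.default_rng(1).normal(size=(1000, 6))
embedding, rows = embed_on_random_subset(features, 200, lambda x: x[:, :2])
```

Returning the sampled row indices alongside the embedding lets the caller write the coordinates back to the matching rows of the original table.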
function compute_morlet
compute_morlet(
data: ndarray,
dt: float = 0.03333333333333333,
n_freq: int = 5,
w: float = 3
) → ndarray
Compute the Morlet wavelet transform of a time series.
Args:
data
: A 2D array containing the time series data, with dimensions (n_pts x n_channels)
dt
: The time step of the time series
n_freq
: The number of frequencies to compute
w
: The width of the Morlet wavelet
Returns: A 2D numpy array with the Morlet wavelet transform. The first dimension is the frequency, the second is the time.
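To make the transform concrete, here is a minimal NumPy sketch of a Morlet wavelet transform for a single channel. It is an illustration under stated assumptions rather than this function's actual code: the name `morlet_transform_1d` is hypothetical, the frequencies are passed explicitly (the documented function only takes `n_freq`), and the wavelet is truncated at roughly four scales on either side.

```python
import numpy as np


def morlet_transform_1d(signal, freqs, dt, w=3.0):
    """Return complex Morlet coefficients with shape (n_freq, n_pts).

    Assumes the signal is longer than the longest truncated wavelet.
    """
    signal = np.asarray(signal, dtype=float)
    out = np.empty((len(freqs), signal.size), dtype=complex)
    for i, f in enumerate(freqs):
        # Scale (in samples) at which the wavelet oscillates at f Hz.
        s = w / (2.0 * np.pi * f * dt)
        half = int(np.ceil(4.0 * s))  # truncate at about 4 scales
        x = np.arange(-half, half + 1) / s
        wavelet = np.pi ** -0.25 * np.exp(1j * w * x - 0.5 * x**2) / np.sqrt(s)
        out[i] = np.convolve(signal, wavelet, mode="same")
    return out


# Usage: a 3 Hz sine sampled at 30 Hz, analysed at three frequencies.
dt = 1.0 / 30.0
t = np.arange(600) * dt
signal = np.sin(2.0 * np.pi * 3.0 * t)
coeffs = morlet_transform_1d(signal, [1.0, 3.0, 6.0], dt)
amplitude = np.abs(coeffs).mean(axis=1)  # largest at the 3 Hz row
```

The magnitude of the coefficients (`np.abs(coeffs)`) is what would typically be fed to a downstream embedding, since the phase varies rapidly in time.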
function compute_density
compute_density(
dataset: DataFrame,
embedding_extent: tuple,
bandwidth: float = 0.5,
n_pts: int = 300,
N_sample_rows: int = 50000,
rows: list = None
) → ndarray
Compute kernel density estimate of embedding.
Args:
dataset
: pd.DataFrame with embedding data loaded in it. (Must have already populated columns named 'embedding_0', 'embedding_1')
embedding_extent
: the bounds in which to apply the density estimate. Has the form (xmin, xmax, ymin, ymax)
bandwidth
: the Gaussian kernel bandwidth. Will depend on the scale of the embedding. Can be changed to affect the number of clusters pulled out
n_pts
: number of points over which to evaluate the KDE
N_sample_rows
: number of rows to randomly sample to generate estimate
rows
: If provided, use these rows instead of a random sample
Returns: Numpy array with KDE over the specified square region in the embedding space, with dimensions (n_pts x n_pts)
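The grid evaluation described above can be sketched with NumPy alone. This is a minimal illustration, not the function's implementation: `gaussian_kde_grid` is a hypothetical name, it takes a plain (N, 2) array instead of a DataFrame, and the convention that rows index y and columns index x is an assumption.

```python
import numpy as np


def gaussian_kde_grid(points, extent, bandwidth=0.5, n_pts=300):
    """Evaluate a 2D Gaussian KDE on an (n_pts x n_pts) grid.

    points: (N, 2) array of embedding coordinates.
    extent: (xmin, xmax, ymin, ymax).
    """
    xmin, xmax, ymin, ymax = extent
    xs = np.linspace(xmin, xmax, n_pts)
    ys = np.linspace(ymin, ymax, n_pts)
    # The isotropic Gaussian kernel factorises into an x part and a y part.
    kx = np.exp(-0.5 * ((xs[:, None] - points[None, :, 0]) / bandwidth) ** 2)
    ky = np.exp(-0.5 * ((ys[:, None] - points[None, :, 1]) / bandwidth) ** 2)
    norm = len(points) * 2.0 * np.pi * bandwidth**2
    # Rows index y, columns index x (an assumed convention).
    return ky @ kx.T / norm


# Usage: synthetic points around the origin, evaluated on a 61 x 61 grid.
rng = np.random.default_rng(0)
points = rng.normal(scale=0.3, size=(500, 2))
density = gaussian_kde_grid(points, (-3, 3, -3, 3), bandwidth=0.5, n_pts=61)
```

A larger `bandwidth` smooths neighbouring peaks together, which is why it affects how many clusters a later watershed step separates.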
function compute_watershed
compute_watershed(
dens_matrix: ndarray,
positive_only: bool = False,
cutoff: float = 0
) → tuple
Compute watershed clustering of a density matrix.
Args:
dens_matrix
: A square 2D numpy array, output from compute_density, containing the kernel density estimate of the embedding.
positive_only
: Whether to apply a threshold, 'cutoff'. If applied, 'cutoff' is subtracted from dens_matrix, and any value below zero is set to zero. Useful for only focusing on high density clusters.
cutoff
: The cutoff value to apply if positive_only = True
Returns: A numpy array with the same dimensions as dens_matrix. Each value in the array is the cluster ID for that coordinate.
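The thresholding step and the idea of assigning each grid cell to a density peak can be sketched as below. Note the swap: instead of a true watershed transform, this sketch uses a simpler steepest-ascent labelling, in which each cell climbs to its highest neighbour until it reaches a local maximum. Both function names are hypothetical, label 0 for zero-density cells is an assumption, and flat plateaus may be split into several labels.

```python
import numpy as np


def threshold_density(dens, cutoff):
    """Subtract cutoff and set negatives to zero (the positive_only step)."""
    return np.clip(dens - cutoff, 0.0, None)


def label_by_steepest_ascent(dens):
    """Label each positive cell by the local maximum it climbs to.

    A simplified stand-in for a watershed transform: zero-density cells
    get label 0, every other cell gets the ID (1, 2, ...) of its peak.
    """
    n_rows, n_cols = dens.shape
    labels = np.zeros(dens.shape, dtype=int)
    peak_ids = {}
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if dens[r0, c0] <= 0:
                continue
            r, c = r0, c0
            while True:
                # Inspect the 3x3 neighbourhood, clipped at the borders.
                r_lo, r_hi = max(r - 1, 0), min(r + 2, n_rows)
                c_lo, c_hi = max(c - 1, 0), min(c + 2, n_cols)
                window = dens[r_lo:r_hi, c_lo:c_hi]
                dr, dc = np.unravel_index(np.argmax(window), window.shape)
                nr, nc = r_lo + dr, c_lo + dc
                if dens[nr, nc] <= dens[r, c]:
                    break  # no strictly higher neighbour: (r, c) is a peak
                r, c = nr, nc
            labels[r0, c0] = peak_ids.setdefault((r, c), len(peak_ids) + 1)
    return labels


# Usage: two synthetic density bumps on a 61 x 61 grid.
xs = np.linspace(-3, 3, 61)
X, Y = np.meshgrid(xs, xs)
dens = (np.exp(-((X + 1.5) ** 2 + Y**2) / 0.5)
        + 0.8 * np.exp(-((X - 1.5) ** 2 + Y**2) / 0.5))
labels = label_by_steepest_ascent(threshold_density(dens, cutoff=0.05))
```

Raising `cutoff` shrinks the labelled regions toward the peaks and leaves more of the grid unlabelled.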
function cluster_behaviors
cluster_behaviors(
dataset: DataFrame,
feature_cols: list,
N_rows: int = 200000,
use_morlet: bool = False,
use_umap: bool = True,
n_pts: int = 300,
bandwidth: float = 0.5,
**kwargs
) → tuple
Cluster behaviors based on dimensionality reduction, kernel density estimation, and watershed clustering.
Note that this will modify the dataset dataframe in place.
The following columns are added to dataset:
- 'embedding_index_[0/1]': the indices of each row's embedding coordinate in the returned density matrix
- 'unsup_behavior_label': the watershed transform label for that row, based on its embedding coordinates. Rows whose embedding coordinate has no watershed cluster, or which fall outside the domain, have value -1.
Args:
dataset
: the pd.DataFrame with the features of interest
feature_cols
: list of column names to perform the clustering on
N_rows
: number of rows to perform the embedding on. If 'None', then all rows are used.
use_morlet
: Apply Morlet wavelet transform to the feature cols before computing the embedding
use_umap
: If True will use UMAP dimensionality reduction, if False will use TSNE
n_pts
: dimension of grid the kernel density estimate is evaluated on.
bandwidth
: Gaussian kernel bandwidth for kernel estimate
**kwargs
: All other keyword parameters are sent to dimensionality reduction call (either TSNE or UMAP)
Returns:
A tuple with components:
- dens_matrix
: the (n_pts x n_pts) numpy array with the density estimate of the 2D embedding
- labels
: numpy array with the same dimensions as dens_matrix, whose values are the watershed cluster IDs
- embedding_extent
: the coordinates in embedding space that dens_matrix is approximating the density over
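How per-row labels could be derived from the returned `labels` grid and `embedding_extent` is sketched below. This is a hedged illustration only: `lookup_labels` is a hypothetical helper, the nearest-grid-point rounding and the rows-are-y orientation are assumptions, and label 0 is assumed to mean "no watershed cluster".

```python
import numpy as np


def lookup_labels(embedding, labels, extent):
    """Map (N, 2) embedding coordinates to grid cells and read off labels.

    Returns (grid_index, row_labels); row_labels is -1 outside the extent
    or where the grid holds no cluster (assumed here to be label 0).
    """
    xmin, xmax, ymin, ymax = extent
    n_y, n_x = labels.shape
    # Scale each coordinate to a fractional grid position, then round.
    ix = np.rint((embedding[:, 0] - xmin) / (xmax - xmin) * (n_x - 1)).astype(int)
    iy = np.rint((embedding[:, 1] - ymin) / (ymax - ymin) * (n_y - 1)).astype(int)
    inside = (ix >= 0) & (ix < n_x) & (iy >= 0) & (iy < n_y)
    row_labels = np.full(len(embedding), -1, dtype=int)
    row_labels[inside] = labels[iy[inside], ix[inside]]
    row_labels[row_labels == 0] = -1  # no watershed cluster at that cell
    return np.column_stack([ix, iy]), row_labels


# Usage with a toy 5 x 5 label grid: cluster 1 on the left, 2 on the right.
toy_labels = np.zeros((5, 5), dtype=int)
toy_labels[:, :2] = 1
toy_labels[:, 3:] = 2
toy_embedding = np.array([[0.0, 0.0], [4.0, 2.0], [2.0, 2.0], [9.0, 0.0]])
grid_index, row_labels = lookup_labels(
    toy_embedding, toy_labels, (0.0, 4.0, 0.0, 4.0)
)
```

In this toy case the third point lands on an unlabelled cell and the fourth lies outside the extent, so both receive -1.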
This file was automatically generated via lazydocs.