Provides a computational framework for Bayesian estimation of antigen-driven selection in immunoglobulin (Ig) sequences, offering an intuitive means of analyzing selection by quantifying the degree of selective pressure. Also provides tools to profile mutations in Ig sequences, build models of somatic hypermutation (SHM) in Ig sequences, and make model-dependent distance comparisons of Ig repertoires.
SHazaM is part of the Immcantation
analysis framework for Adaptive Immune Receptor Repertoire sequencing
(AIRR-seq) and provides tools for advanced analysis of somatic hypermutation
(SHM) in immunoglobulin (Ig) sequences. SHazaM focuses on the following
findThreshold() that allows users to choose a mixture of two univariate density distribution functions among four available combinations:
findThreshold() from the best average sensitivity and specificity, the curve intersection, or user-defined sensitivity or specificity.
collapseClones(), adding various deterministic and stochastic methods to obtain effective clonal sequences, support for including ambiguous IUPAC characters in output, as well as extensive documentation. Removed
calcClonalConsensus() from exported functions.
calcObservedMutations() for sequences with non-triplet overhang at the tail.
OBSERVED) and expected mutations (previously
EXPECTED) returned by
calcBaseline() no longer calls
collapseClones() automatically if a
CLONE column is present. As indicated by the documentation for
calcBaseline(), users are advised to obtain effective clonal sequences (for example, calling
collapseClones()) before running
collapseClones() that prevented it from running when
nproc is greater than 1.
collapseClones() that resulted in erroneous
summarizeBaseline(). The returned p-value can now be either positive or negative. Its magnitude (without the sign) should be interpreted as usual. Its sign indicates the direction of the selection detected. A positive p-value indicates positive selection, whereas a negative p-value indicates negative selection.
editBaseline() to exported functions, and a corresponding section in the vignette.
createSubstitutionMatrix(), enabling parameter tuning for
minNumSeqMutationsTune() to tune for parameters
createMutabilityMatrix() respectively. Also added function
plotTune(), which helps visualize parameter tuning using the two new functions mentioned above.
HH_S1F), human kappa and lambda light chain, silent, 1-mer, functional substitution model (
HKL_S1F), and mouse kappa light chain, replacement and silent, 1-mer, non-functional substitution model (
makeDegenerate5merMut, which make degenerate 5-mer substitution and mutability models, respectively, based on the 1-mer models. Also added
makeAverage1merMut, which make 1-mer substitution and mutability models, respectively, by averaging over the 5-mer models.
calcObservedMutations(), which if true returns the positions of point mutations and their corresponding mutation types, as opposed to counts of mutations (hence "raw").
slideWindowDb(), which implement a sliding window approach to filtering a single sequence, or sequences in a data.frame, that contain(s) at least a given number of mutations within a given number of consecutive nucleotides.
slideWindowTune(), which allows for parameter tuning for using
slideWindowTunePlot(), which visualizes parameter tuning by
normalize="length" for 5-mer models was resulting in distances normalized by junction length squared instead of raw junction length.
symmetry="min" was calculating the minimum of the total distance between two sequences instead of the minimum distance at each mutated position.
findThreshold function to infer clonal distance threshold from nearest neighbor distances returned by
length option for the
len so it matches Change-O.
M1NDistance distance models, which have been renamed to
distToNearest. These deprecated models should be used for compatibility with DefineClones in Change-O v0.3.3. These models have been replaced by
mk_rs1nf, which are supported by Change-O v0.3.4.
calcTargetingDistance()to enable calculation of a symmetric distance matrix given a 1-mer substitution matrix normalized by row, such as
findThreshold. The previous smoothed density method is available via the
method="density" argument and the new GMM method is available via
plotDensityThreshold to plot the threshold detection results from
IMGT_V_BY_REGIONS so that neither now includes CDR3.
createMutabilityMatrix(), enabling parameter tuning for
InfluenzaDb data object, in favor of the updated
ExampleDb provided in alakazam 0.2.4.
distToNearest(), which allows restriction of distances to only distances across samples (i.e., excludes within-sample distances).
distToNearest(), which will return all distances to neighboring nodes in a minimum spanning tree.
shmulateTree() to simulate mutations on sequences and lineage trees, respectively, using a 5-mer targeting model.
groupBaseline() multiple times resulted in incorrect normalization.
testBaseline() function to test the significance of differences between two selection distributions.
dplyr::tbl_df object instead of a
distToNearest() did not return the nearest neighbor with a non-zero distance.
MUTATIONS_POLARITY providing alternate approaches to defining replacement and silent annotations to mutations when calling
regionDefinition=NULL consistent for all mutation profiling functions. Now the entire sequence is used as the region and calculations are made accordingly.
calcDBObservedMutations() returns R and S mutations also when
regionDefinition=NULL. Older versions reported the sum of R and S mutations. The function will add the columns
symmetry parameter to distToNearest to change how asymmetric distances (A->B != B->A) are combined to get the distance between A and B.
minNumMutations parameter to createSubstitutionMatrix. This is the minimum number of observed 5-mers required for the substitution model. The substitution rate of 5-mers with fewer observed mutations will be inferred from other 5-mers.
minNumSeqMutations parameter to createMutabilityMatrix. This is the minimum number of mutations required in sequences containing the 5-mers of interest. The mutability of 5-mers with fewer observed mutations in the sequences will be inferred.
returnModel parameter to createSubstitutionMatrix. This gives the user the option to return a 1-mer or 5-mer model.
returnSource parameter to createMutabilityMatrix. If TRUE, the code will return a data frame indicating whether each 5-mer mutability is observed or inferred.
Initial public release.
Influenza.tab file did not load on Mac OS X.
HS1FDistance, based on the Yaari et al., 2013 data.
hs1f as the default distance model for
calcDBClonalConsensus() so that the function now works correctly when called with the argument
calcDBObservedMutations(), which enables return of mutation frequencies rather than the default of mutation counts.
M3NModel and all options for using said model.
createMutabilityMatrix() where IMGT gaps were not being handled.
U5NModel, which is a uniform 5-mer model.
Prerelease for review.