Computes optimal changepoint models using the
Poisson likelihood for non-negative count data,
subject to the PeakSeg constraint:
the first change must be up, second change down, third change up, etc.
For more info about the models and algorithms,
read "A log-linear time algorithm for constrained changepoint detection"
Regularized isotonic regression solver, square loss and penalized (FPOP) parameterization.
cast to double in PDPA as well.
cast to double before taking log, to avoid compile error on solaris (in PeakSegFPOP).
instead of printf, use Rprintf from R.h, to avoid CRAN check NOTE.
Register native routines.
removed interval regression code, added dependency on penaltyLearning.
removed problems.R, which is now in the PeakSegPipeline pkg.
PeakSegPDPAInf which always uses Inf as upper limit of intervals, instead of max(data). This results in more intervals and is thus slower than PeakSegPDPA by a constant factor.
Import data.table >= 1.9.8 (from CRAN not GitHub).
PREV_NOT_SET in min-less since set_prev_seg_end is always called after.
clarify docs, example shows how to recover solution in terms of (M,C) variables.
mass.spec test case and bugfix.
problem.target now considers infeasible models (before, we assigned infeasible models infinite cost).
problem.predict now allows predicting infeasible models (before, we took the closest model inside the target interval). Now the target interval is not used at all during prediction.
since model training now includes fitting a Gaussian model to the log(bases) values of all peaks which are perfectly predicted in the training data, we use that model during prediction to filter all peaks outside of the 95% confidence interval.
no longer printf when more than NEWTON_STEPS in root finding C++ code.
New functions oracleModelComplexity, PPN.cores, mclapply.or.stop, problem.predict.allSamples
simplify arguments of problem.* functions.
more robust checking of penalty_segments.bed in problem.PeakSegFPOP
problem.coverage uses coverage.bigwig if it is present
Many fixes for min-less/more/env bugs which were encountered while running the PeakSegFPOP command line program on large data sets (test cases in test-cosegData.R).
problems.R contains problem.* functions which run the PeakSegFPOP command line program and parse its output (compute the target interval, features, predicted peaks).
Bugfix for min-less: only add a constant piece at the end if we need to. New simulated data set in test-simulation.R which was crashing before the bugfix.
Bugfix in C++ Minimize function which sometimes returned infinite cost when the minimum is exactly on the interval border.
New test-simulation.R which previously was failing the run-time min-more check. Fixed by not adding an interval at the end of min-more, if we end on a degenerate linear function.
New H3K4me3_PGP_immune_chunk24 data set, one of the samples for which the PeakSegPDPA solution with 13 segments was not as likely as the PeakSegDP solution. I figured out that this was being caused by a bug in the "not equal on the sides, zero crossing points" code. When we compare the cost of the first interval (with min_log_mean at -Inf), we still use the cost at the middle, but now we compute the middle in the original space (so the smaller interval limit is at least 0, never -Inf).
Limited memory FPOP version at https://github.com/tdhock/PeakSegFPOP
To avoid root finding problems in large data sets, we now store the mean cost rather than total cost in the C++ code. The R functions still report the total cost.
PeakSegFPOPall C++ function which can start or end either up or down.
PeakSegFPOPLog C++ function forces starts and ends down. This is used by the R function PeakSegFPOP.
Bugfix in min_more computation for intersections between constant cost and degenerate linear cost.
Bugfix in decoding: we no longer call Minimize for segments other than the last one. Now we store the previous segment mean (which was already computed by the min-less/more operator). No more bool equality_constraint_active, instead test prev_seg_mean == INFINITY.
FPOP version of PeakSeg constrained, Poisson loss problem.
For numerical stability, the PeakSegPDPA intervals are in now the log space. That means if the data min is 0, then there will be an interval with -Inf for its lower limit. Smaller root finding is in the f(m)=Lineare^m+Logm+Constant space, and larger root finding is in the g(x)=Linearx+Loglog(x)+Constant space.
The Lambert W root finding method was numerically unstable, for example the roots of the cost function -628*x+log(x)-776.140660 could not be computed correctly since exp(-776) is exactly zero using doubles. Instead I implemented a bound-constrained Newton root finding method which seems to be more stable.
Some bugs fixed in the min env and min less computations.
First working C++ solver for Segment Neighborhood problem with Poisson Loss and PeakSeg constraint (PeakSegPDPA function).