Tools for Storing, Restoring and Searching for R Objects

Data exploration and modelling is a process in which many data artifacts are produced: subsets, data aggregates, plots, statistical models, different versions of data sets and different versions of results. The more projects we work on, the more artifacts are produced and the harder it is to manage them. Archivist helps to store and manage artifacts created in R. It allows you to store selected artifacts as binary files together with their metadata and relations, and to share artifacts with others, either through a shared folder or GitHub. It allows you to look for already created artifacts by their class, name, date of creation or other properties, and makes it easy to restore them. It also allows you to check whether a new artifact is an exact copy of one that was produced some time ago, which might be useful for testing or caching.
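
The store, search and restore cycle described above can be sketched as follows. This is a minimal illustration rather than the package's canonical workflow: the temporary directory, the variable names and the example model are assumptions made for the sketch, and archiveSessionInfo = FALSE is passed only so that the sketch does not depend on the suggested devtools package.

```r
library(archivist)

# Hypothetical throwaway repository in a fresh temporary location
repo_dir <- tempfile("archivist_demo_")
createLocalRepo(repoDir = repo_dir)

# Store an artifact; the returned value is its MD5 hash
model <- lm(Sepal.Length ~ Petal.Length, data = iris)
model_hash <- as.character(
  saveToLocalRepo(model, repoDir = repo_dir, archiveSessionInfo = FALSE)
)

# Look the artifact up by one of its Tags (here: its class)
lm_hashes <- searchInLocalRepo("class:lm", repoDir = repo_dir)

# Restore the artifact as a value instead of loading it into the workspace
restored <- loadFromLocalRepo(model_hash, repoDir = repo_dir, value = TRUE)

# Check that the restored model matches the one that was archived
same_coefs <- isTRUE(all.equal(coef(model), coef(restored)))

# Remove the example repository together with its root directory
deleteLocalRepo(repoDir = repo_dir, deleteRoot = TRUE)
```

Searching by a Tag such as class:lm returns MD5 hashes, which are then the keys used to restore artifacts.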


News

  • Bugs fixed:
    1. Fixed bug: all examples failed when createLocalRepo was invoked with the default=FALSE parameter. [#291]
    2. Changed aread and asearch examples due to the new version of ggplot2 - 2.2.0 [#296]
    3. Changed asave examples due to the new version of ggplot2 - 2.2.0 [#300]
  • New features:
    1. aread(), asearch(), searchInLocalRepo() and loadFromLocalRepo() now handle URL addresses as well. This may be useful to access artifacts generated by the shiny app.
    2. %a% now archives the proper name of the first object, so ahistory prints the proper name of the archived artifact instead of env[[nm]]. [#269]
    3. addHooksToPrint():
      1. Can now give links in LaTeX format, as it has a new format argument. [#270]
      2. Archives artifacts with their original names instead of md5hashes. [#287]
  • New functions:
    1. The atrace() function is added. It calls the trace() function to store a selected object in the repository after each call to the specified FUN (for example 'lm').
  • New features:
    1. restoreLibs() can now restore libraries in a custom directory. [#251]
    2. createMDGallery has a new maxTags parameter, so that the gallery's summaries in the README.md files now have a limited chunk length. [#249]
    3. We've added section 'Notes' to each man page with the link to https://github.com/pbiecek/archivist/issues
  • New functions:
    1. The restoreLibs() function is added. It recovers previous versions of R packages. This is needed due to rapid changes in the structure of ggplot2 objects. Now one can restore the version of the ggplot2 package consistent with an archived object.
  • New functions:
    1. All Github functions are now Remote functions. This is because we support more than just GitHub repos (currently github + git/hg bitbucket)
    2. RemoteRepoCheck is used to verify if parameters for remote repo are correct.
    3. asession returns session info for given artifact (similar to aread).
    4. aformat returns vector of formats in which the artifact is saved (similar to aread).
  • New features:
    1. saveToRepo by default saves session info.
    2. repoDirGit has changed name to subdir and the default value is now '/'.
    3. All remote functions support github and bitbucket (see the repoType parameter).
    4. alink is now working with github and bitbucket repositories.
    5. asearch returns named list of artifacts. MD5hashes are used as names.
    6. silent=TRUE is now the default in saveToRepo, which produces fewer warnings.
    7. saveToRepo now has two copies: saveToLocalRepo, consistent with other names, and the short one, asave.
  • Deprecations and new naming convention:
    1. archive, cloneGitHubRepo, createGitHubRepo, (deleteGithubRepo) deleteGitHubRepo, (pushRepo) pushGitHubRepo, (pullRepo) pullGitHubRepo have been moved to separate archivist.github package to maintain Local/Remote consistency. [#198].
    2. For the above reason deleteRepo was deprecated. Use deleteLocalRepo.
    3. For the above reason createEmptyLocalRepo and createEmptyRepo were deprecated. Use createLocalRepo.
    4. For the above reason rmFromRepo was deprecated. Use rmFromLocalRepo.
    5. multiSearchInLocalRepo and its remote version were deprecated. Now multiple patterns are available in searchInLocalRepo/searchInRemoteRepo.
  • New functions:

    1. alink function: returns a link to download an artifact stored in a GitHub Repository. Ideal in combination with archive.
    2. pushRepo function, which adds files, commits them and pushes from the Local Repository to the synchronized GitHub one. [#146].
    3. pullRepo pulls (git pull) changes from the remote GitHub Repository to the corresponding Local one. [#146].
    4. New functions deleteLocalRepo (previously deleteRepo) and deleteGithubRepo. [#156].
    5. createGithubMDGallery, which gives the markdown summary for each artifact in the repository. Ideal for a README.md file. Example [#144]
  • Bugs fixed:

    1. The asearch function now enables a user to read artifacts from the default GitHub repository. In the previous version this was possible only for the default local repository.
    2. It is now possible to unset the global Repository with aoptions('repo/repoDir', NULL, unset = TRUE) [#176].
  • New features:

    1. Alterations in the text of: ?ahistory, ?cache, ?asearch, ?archive, ?cloneGithubRepo, githubFunctions, ?shinySearchInLocalRepo, ?alink documentation pages.
    2. Additional examples to better understand usage of archivist package functions:
      1. In asearch, a completely new example section divided into 3 subsections: default local repository, default GitHub repository and GitHub repository.
    3. Added new tags in the following methods: extractTags.lm, extractTags.htest, extractTags.lda, extractTags.qda, extractTags.survfit, extractTags.glmnet, extractTags.partition.
    4. htest object's data is now saved to the repository as a list.
    5. It is possible to archive devtools::session_info() with an artifact during the execution of saveToRepo() and archive() [#184].
    6. A new tag, format:, is now added to every artifact/miniature. Artifacts can be saved in different (and more than one) formats (rda/json/csv), which makes them easier to access from other languages.
  • New and renamed parameters:

    1. user.name and user.password parameters of archive and createEmptyGithubRepo were changed into user and password, respectively.
    2. createEmptyGithubRepo now can use repoDir to specify in which directory the synchronized Local Repository should be created [#142].
    3. archive no longer cats hook to the artifact during the execution. Hook cat can be set with new alink parameter that uses alink() function, where parameters can be passed with ....
    4. deleteRepo now has a new unset parameter that allows unsetting the global aoptions('repoDir', NULL, unset = TRUE) when the deleted repoDir was a globally specified Repository [#157].
    5. Changed parameter name in cloneGithubRepo from local_path to repoDir to maintain consistency within package documentation and name convention.
    6. createEmptyGithubRepo, createEmptyRepo(type = 'github') and cloneGithubRepo now react to the new default parameter, which sets newly created/cloned repositories (the GitHub one and the Local one synchronized with it) as default [#171, #142].
    7. Changed the name of the chain parameter to value in the saveToRepo function [#101].
    8. Changed the name of aformat parameter to format in ahistory() to maintain consistency with alink() function.
    9. Fix in alink. Now the repoDirGit is supported.
  • Archivist Integration With GitHub API: new functions:
    1. It is possible to create a new GitHub repository with an empty archivist-like Repository with the createEmptyGithubRepo function. We also added createEmptyLocalRepo to maintain consistency with other sister functions. createEmptyRepo is now a wrapper around the createEmptyLocalRepo and createEmptyGithubRepo functions.
    2. One can now clone a GitHub-archivist repo with the new cloneGithubRepo function.
    3. One can automatically archive artifacts to Local and synchronized GitHub archivist-like Repositories with the new archive function. Example: https://github.com/MarcinKosinski/archive-test4/commits/master
    4. Added a manual page to enable easier usage of this integration: ?archivist-github-integration (or shorter ?agithub).
  • New functions:
    1. splitTagsLocal and splitTagsGithub, which enable splitting the tag column in the database into two separate columns: tagKey and tagValue.
  • Bugs fixed:
    1. The checkDirectory function is now immune to directories that don't exist. This makes the showLocalRepo function work properly when it is passed a directory that does not exist.
    2. Changed the dbDisconnect( conn ) call to on.exit(dbDisconnect( conn )) in the executeSingleQuery function, so that the connection no longer stays open when an error occurs inside the function.
    3. The %a% operator does react on default = TRUE in the createEmptyRepo function.
    4. The deleteRoot = TRUE argument of the deleteRepo function works properly and enables removing the root directory of the Repository.
    5. Some changes in rmFromRepo's body:
      1. The function will give a warning when a user uses a wrong md5hash (one that does not exist in the Repository). In case of a wrong md5hash abbreviation a user will receive an error message.
      2. Artifacts' data is now removed from the tag table in the backpack.db file when many = TRUE. It was not removed before.
      3. Artifacts' data files are now removed from the gallery folder. They were not removed before.
      4. invisible(NULL) is the result of the function evaluation.
    6. Some changes in copy*Repo's body:
      1. invisible(NULL) is the result of the function evaluation.
      2. The repoFrom parameter in copyLocalRepo is set to NULL as default.
    7. copyFromLocalRepo and copyFromGithubRepo copy only distinct records for the tag and artifact tables in the backpack.db file, which can be seen with show*Repo, and copy all mentioned artifacts for the local version.
    8. downloadDB in the createEmptyRepo function gives a user-friendly error.
    9. In zipGithubRepo the unzipped file has the same name as the zip file. Earlier it had the name of the temporary file, which was difficult to notice.
    10. In setGithubRepo it is now possible to use the repoDirGit parameter. Before, there was a wrong stopifnot condition.
    11. paste0() was replaced by file.path() in appropriate places of function bodies in the following R scripts: archive.R, copyToRepo.R, createEmptyRepo.R, deleteRepo.R, extractMiniature.R, loadFromRepo.R, rmFromRepo.R, saveToRepo.R, zipRepo.R.
    12. Two crucial parts of checkDirectory's function body were removed due to the changes in point 11. checkDirectory2 was completely removed as it is unnecessary now.
    13. Small change in test_base_functionalities.R due to the changes in points 11 and 12.
    14. aoptions for user and repo will work properly with showGithubRepo and summaryGithubRepo when set. This might not have been noticed in version 1.7; it might have been a bug that occurred in the development between versions 1.7 and 1.8.
  • New features:
    1. The print.ahistory function can now print outputs of the artifact's history as knitr::kable would.
    2. Examples for searchInGithubRepo now work for the user='pbiecek' and repo='archivist' parameters, as we added a new backpack.db file. The previous one was almost empty (for 7 months).
    3. Additional examples to better understand usage of archivist package functions:
      1. in the loadFromRepo function - loading artifacts from the repository which is built into the archivist package and saving them in the example repository.
      2. in the createEmptyRepo function - creating a default local Repository in a non-existing directory.
      3. in the rmFromRepo function - removing artifacts with the many = TRUE argument.
      4. in the deleteRepo function - using the deleteRoot = TRUE argument.
      5. in the copy*Repo function - using the graphGallery local repository in the copyLocalRepo function.
      6. in the get*Tags function - an additional example using the getTagsLocal function.
      7. in the aoptions function - two new examples concerning usage of the silent and repoDir parameters in this function.
    4. Alterations in the text of: ?Tags, ?Repository, ?md5hash, archivist-package, ?saveToRepo, loadFromRepo, summaryRepo, showRepo, ?searchInRepo, ?createEmptyRepo, ?rmFromRepo, ?deleteRepo, copyToRepo, zipRepo, setRepo, getTags, addTagsRepo, magrittr, archivistOptions, ?aread documentation pages.
    5. Added missing functions that are now used in the archivist package to the ?Repository documentation page.
    6. tempdir() was replaced by tempfile() in the examples sections of: ?addTagsRepo, ?cache, copyToRepo, createEmptyRepo, ?deleteRepo, loadFromRepo, ?rmFromRepo, ?saveToRepo, setRepo, showRepo, summaryRepo, ?Tags, zipRepo documentation pages. tempdir is an existing directory in which R works, so calling deleteRepo( exampleRepoDir, deleteRoot=TRUE) removed important R files.
    7. New tests for the following functions: zip*Repo.
    8. In order to obtain cohesion with Tags in all functions, the following convention has been established:
      1. If we use Tags in the text of a function's documentation or in examples' comments, then Tags are considered a proper name and begin with a capital letter.
      2. If we use tags in a function's body, as parameters, or as R object attributes, then they begin with a small letter.
    9. Added checks that parameters have appropriate lengths in the bodies of the following functions: ?addTagsRepo, asearch, ?cloneGithubRepo, copy*Repo, createEmptyLocalRepo, getTags*, loadFrom*Repo, ?rmFromRepo, ?saveToRepo, searchIn*Repo, set*Repo, ?shinySearchInLocalRepo, showRepo, summary*Repo, zip*Repo
  • The order of parameters in asearch has changed!

  • Added graphGallery for self-contained examples

  • aread allows for single MD5 hash (which will be read from the default repo)

  • asearch allows for only patterns (will be searched in local repo)

  • ahistory now has an 'artifact' argument instead of 'obj'

  • Added tests.

  • Removed unnecessary dependencies - now archivist is free of dependencies.

  • The shiny package is in Suggests, so you should load that package before running the shinySearchInLocalRepo function.

  • Moved saveSetToRepo with a new function loadSetFromRepo to the github.com/pbiecek/archivist2 repository.

  • Fix in aread(): subdirectories are now handled properly
  • aoptions() handles default values for archivist parameters
  • createEmptyRepo() takes 'default' argument. If set to TRUE, then the new empty repo becomes the default one.
  • Added CITATION
  • Added new demo, as for JSS article replication script
  • Added setLocalRepo and setGithubRepo functions.
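
Several of the entries above concern the default repository: a repository created with default = TRUE, aread accepting a single MD5 hash, and asearch accepting only patterns. A minimal sketch of how these fit together is given below, using the current function names (createLocalRepo, asave) rather than the deprecated ones. The temporary directory, the variable names and the example data are assumptions for illustration, and archiveSessionInfo = FALSE is passed only to avoid relying on the suggested devtools package.

```r
library(archivist)

# Hypothetical throwaway repository, registered as the default one
repo_dir <- tempfile("archivist_default_")
createLocalRepo(repoDir = repo_dir, default = TRUE)

# With a default repository set, repoDir no longer has to be given
small_iris <- head(iris)
hash <- as.character(asave(small_iris, archiveSessionInfo = FALSE))

# A bare MD5 hash is read from the default local repository
restored <- aread(hash)

# Patterns alone are searched in the default local repository;
# the result is a list of artifacts named by their MD5 hashes
found <- asearch(patterns = "class:data.frame")

# Clean up: remove the example repository and its root directory
deleteLocalRepo(repoDir = repo_dir, deleteRoot = TRUE)
```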

install.packages("archivist")

2.1.2 by Przemyslaw Biecek, 9 months ago


https://pbiecek.github.io/archivist/


Report a bug at https://github.com/pbiecek/archivist/issues


Browse source code at https://github.com/cran/archivist


Authors: Przemyslaw Biecek [aut, cre], Marcin Kosinski [aut], Witold Chodor [ctb]



Task views: Reproducible Research


GPL-2 license


Imports RCurl, digest, httr, DBI, lubridate, RSQLite, magrittr

Suggests shiny, dplyr, testthat, ggplot2, devtools, knitr

Enhances archivist.github


Imported by reproducible.

Depended on by archivist.github.

Suggested by SpaDES.

