A simple data science challenge system using R Markdown and 'Dropbox' <https://www.dropbox.com/>. It requires no network configuration, does not depend on external platforms such as 'Kaggle' <https://www.kaggle.com/>, and can be easily installed on a personal computer.
The rchallenge R package provides a simple data science competition system using R Markdown and Dropbox.
Further documentation is available in the Reference manual.
Please report bugs and problems, or start discussions, on the Issues tracker. Any contribution to improve the package is welcome.
Install the R package from CRAN:
install.packages("rchallenge")
or install the latest development version from GitHub:
# install.packages("devtools")
devtools::install_github("adrtod/rchallenge")
A recent version of pandoc (>= 1.12.3) is also required. See the pandoc installation instructions for details on installing pandoc for your platform.
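The pandoc requirement can be checked from R before setting up a challenge. The sketch below is only an illustration: it assumes the rmarkdown package may or may not be installed, uses rmarkdown::pandoc_available() when it is, and otherwise falls back to a plain PATH lookup that confirms presence but not the version. The variable name pandoc_ok is hypothetical.

```r
# Check whether a suitable pandoc can be found.
# With rmarkdown installed, pandoc_available() also checks the minimum version.
if (requireNamespace("rmarkdown", quietly = TRUE)) {
  pandoc_ok <- rmarkdown::pandoc_available("1.12.3")
} else {
  # Fallback: only tests that a pandoc executable is on the PATH,
  # it does NOT verify the version.
  pandoc_ok <- unname(nzchar(Sys.which("pandoc")))
}

if (pandoc_ok) {
  message("pandoc found")
} else {
  message("pandoc not found (or too old); see the pandoc installation instructions")
}
```

Note that a pandoc bundled with an IDE may not be on the PATH, so the fallback branch can report a false negative.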
Install a new challenge in Dropbox/mychallenge:
setwd("~/Dropbox/mychallenge")
library(rchallenge)
?new_challenge
new_challenge()
or, for a French version:
new_challenge(template = "fr")
You will obtain a ready-to-use challenge in the folder Dropbox/mychallenge containing:
challenge.rmd: template R Markdown script for the webpage.
data: directory of the data containing data_train and data_test datasets.
submissions: directory of the submissions. It will contain one subdirectory per team where they can submit their submissions. The subdirectories are shared with Dropbox.
history: directory where the submissions history is stored.
The default challenge provided is a binary classification problem on the German Credit Card dataset.
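For a binary classification challenge like the default one, scoring comes down to comparing each team's predicted labels with the true test labels and keeping each team's best result. The following base R sketch illustrates the idea only; the labels, the team predictions and the error_rate helper are all hypothetical, and this is not the package's own implementation.

```r
# Hypothetical true labels of a tiny test set (not the package's data).
y_test <- c("Good", "Bad", "Good", "Good", "Bad")

# Hypothetical helper: share of misclassified observations.
error_rate <- function(y_pred, y_true) {
  mean(as.character(y_pred) != as.character(y_true))
}

# Hypothetical submissions: team_foo submitted twice, team_bar once.
scores <- data.frame(
  team = c("team_foo", "team_foo", "team_bar"),
  error = c(
    error_rate(c("Good", "Bad", "Bad", "Good", "Bad"), y_test),
    error_rate(c("Good", "Bad", "Good", "Good", "Bad"), y_test),
    error_rate(c("Good", "Good", "Good", "Good", "Good"), y_test)
  )
)

# Keep the best (lowest) error per team, then rank the teams.
best <- aggregate(error ~ team, data = scores, FUN = min)
leaderboard <- best[order(best$error), ]
print(leaderboard)
```

A lower error is better here, so the ranking sorts in increasing order; a metric where higher is better would need the opposite ordering.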
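You can easily customize the challenge in two ways:
When creating the challenge: by using the arguments of the new_challenge function.
After creating the challenge: by replacing the data files in the data subdirectory and the baseline predictions in submissions/baseline, and by customizing the template challenge.rmd as needed.
To complete the installation: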
1. Create and share subdirectories in submissions for each team:
?new_team
new_team("team_foo", "team_bar")
2. Render the HTML page:
?publish
publish()
Use the output_dir argument to change the output directory.
3. Make sure the output HTML file is served as a web page, e.g. using GitHub Pages.
4. Give the URL of your HTML file to the participants.
5. Refresh the webpage by repeating step 2 on a regular basis. See below for automating this step.
From now on, a fully autonomous challenge system is set up requiring no further administration. With each update, the program automatically performs the following tasks using the functions available in our package:
store_new_submissions: reads submitted files and saves new files in the history.
print_readerr: displays any read errors.
compute_metrics: calculates the scores for each submission in the history.
get_best: gets the highest score per team.
print_leaderboard: displays the leaderboard.
plot_history: plots a chart of score evolution per team.
plot_activity: plots a chart of activity per team.
On Unix-like systems, you can add the following line to your crontab using crontab -e (mind the quotes):
0 * * * * Rscript -e 'rchallenge::publish("~/Dropbox/mychallenge/challenge.rmd")'
This will render an HTML webpage every hour.
Use the output_dir argument to change the output directory.
If your challenge is hosted in a GitHub repository, you can automate the push:
0 * * * * cd ~/Dropbox/mychallenge && Rscript -e 'rchallenge::publish()' && git commit -m "update html" index.html && git push
You might have to add the path to Rscript and pandoc at the beginning of your crontab:
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
Depending on your system or pandoc version you might also have to explicitly add the encoding option to the command:
0 * * * * Rscript -e 'rchallenge::publish("~/Dropbox/mychallenge/challenge.rmd", encoding = "utf8")'
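A failed scheduled run can be easy to miss. One possible way to keep a trace is to wrap the publishing call in a small logging function. The sketch below is an assumption-laden illustration: safe_publish and the log file path are hypothetical and not part of rchallenge. The function takes the publishing call as an argument, catches any error, and appends a timestamped status line to a log file.

```r
# Hypothetical wrapper (not part of rchallenge): run a publishing function,
# catch errors, and append one timestamped status line to a log file.
safe_publish <- function(publish_fun, logfile) {
  status <- tryCatch(
    {
      publish_fun()
      "ok"
    },
    error = function(e) paste("error:", conditionMessage(e))
  )
  cat(format(Sys.time(), "%Y-%m-%d %H:%M:%S"), " ", status, "\n",
      file = logfile, append = TRUE, sep = "")
  invisible(status)
}

# Example call (commented out: it needs rchallenge and an existing challenge;
# the log file location is an arbitrary choice):
# safe_publish(
#   function() rchallenge::publish("~/Dropbox/mychallenge/challenge.rmd"),
#   path.expand("~/rchallenge_publish.log")
# )
```

Saved in a script, such a call could replace the inline Rscript -e expression in the crontab line, so that each run leaves one line in the log.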
On Windows, you can use the Task Scheduler to create a new task with a "Start a program" action and the following settings (mind the quotes):
Program/script: Rscript.exe
Add arguments: -e rchallenge::publish('~/Dropbox/mychallenge/challenge.rmd')
My own challenge (in French) given to Master's students at the University of Bordeaux.
A classification and variable selection problem (in French) given by Robin Genuer (Bordeaux).
Please contact me to add yours.
Copyright (C) 2014-2015 Adrien Todeschini.
Contributions from Robin Genuer.
Design inspired by Datascience.net, a French platform for data science challenges.
The rchallenge package is licensed under the GPLv2 (https://www.gnu.org/licenses/gpl-2.0.html).
ggvis
output_dir argument of publish function now defaults to "index.html". Useful for hosting the challenge on a GitHub repo with GitHub Pages.
glyphicon is defunct. Use icon instead of glyphicon.
print_readerr displays a table.
get_best returns a single data.frame instead of a list with one data.frame per metric. The ranking can be based on several metrics in a specific order to break ties.
update_rank_diff and print_leaderboard take a single data.frame as input.
output_dir argument of publish function now defaults to the input directory instead of "~/Dropbox/Public" because Dropbox rendering of HTML content is discontinued.
out_rmdfile argument to new_challenge
template argument to c("en", "fr")
new_team can create several teams
new_team function
publish