Public attention is an interesting field of study. The internet not only gives near-instant access to information on virtually any subject; the page access statistics gathered by website authors also make it possible to study what people pay attention to. For the omnipresent Wikipedia, those access statistics are made available via http://stats.grok.se, a server that provides the information as file dumps as well as through a web API. This package provides an easy-to-use, consistent, and traffic-minimizing approach to making those data accessible within R.
Version on GitHub: 1.1.10
The wikipediatrend package is designed to make Wikipedia page access statistics available in R in the most convenient way.
Consequently the package provides
A stable version of the package can be found on CRAN and installed via ...
... while the current development version can be retrieved by using install_github() from the devtools package ...
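The two installation routes described above might look like the following sketch. The CRAN call uses the package name given in this README; the GitHub repository path is an assumption (it is not stated in this text) and should be checked against the project page before use.

```r
# Stable release from CRAN.
install.packages("wikipediatrend")

# Development version from GitHub.
# NOTE: "petermeissner/wikipediatrend" is an assumed repository path.
# install.packages("devtools")   # only needed if devtools is not installed
devtools::install_github("petermeissner/wikipediatrend")

# Load the package afterwards.
library(wikipediatrend)
```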
After loading the package, several functions are available.
The workhorse of the package is the wp_trend() function:
wp <- wp_trend(page = c("Fever", "Fieber"), from = "2013-08-01", to = "2015-12-31", lang = c("en", "de"))
# (... messages shortened)
The function returns a data frame with seven variables (date, count, lang, page, rank, month, title) that parallel the data provided by the stats.grok.se server:
##         date count lang   page rank  month  title
## 1 2013-08-01   486   de Fieber 1391 201308 Fieber
## 2 2013-08-01  2768   en  Fever 5014 201308  Fever
## 3 2013-08-02   476   de Fieber 1391 201308 Fieber
## 4 2013-08-02  2529   en  Fever 5014 201308  Fever
## 5 2013-08-03   429   de Fieber 1391 201308 Fieber
## 6 2013-08-03  2113   en  Fever 5014 201308  Fever
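Because the result is an ordinary data frame, base R tools are enough to summarise it. The following is a minimal sketch, not part of the package: it rebuilds the six rows shown above by hand (so it runs without a network request) and compares the two language editions. The object names `mean_views` and `wide` are illustrative.

```r
# Toy data copied from the six rows printed above (not a live query).
wp <- data.frame(
  date  = as.Date(c("2013-08-01", "2013-08-01", "2013-08-02",
                    "2013-08-02", "2013-08-03", "2013-08-03")),
  count = c(486, 2768, 476, 2529, 429, 2113),
  lang  = c("de", "en", "de", "en", "de", "en"),
  page  = c("Fieber", "Fever", "Fieber", "Fever", "Fieber", "Fever"),
  stringsAsFactors = FALSE
)

# Mean daily page views per language edition.
mean_views <- aggregate(count ~ lang, data = wp, FUN = mean)
print(mean_views)

# One row per date, one count column per language, for side-by-side comparison.
wide <- reshape(wp[, c("date", "lang", "count")],
                idvar = "date", timevar = "lang", direction = "wide")
print(wide)
```

With real data from wp_trend(), the same two calls should work unchanged as long as the columns date, count, and lang are present.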
Furthermore, wikipediatrend provides a helper function, wp_linked_pages(), which queries Wikipedia to find out whether a particular article also exists in other languages:
##    page             lang   title
## 1  Schame           bar    Schame
## 2  Reposapeus       ca     Reposapeus
## 3  Footstool        en     Footstool
## 4  Reposapi%C3% ... es     Reposapiés
## 5  %D8%B2%DB%8C ... fa     زیرپایی
## 6  Pouf             fr     Pouf
## 7  Skabelo          io     Skabelo
## 8  Voetenb%C3%A ... nds-NL Voetenbänksi ...
## 9  Fotskammel       nn     Fotskammel
## 10 Podn%C3%B3%C ... pl     Podnóżek
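The page column in this output appears to hold the URL-encoded form of the title. As a rough illustration of that relationship, and not a description of how the package does it internally, base R can convert between the two forms. The helper `looks_url_encoded()` is hypothetical and only shows one simple way to tell the forms apart.

```r
# Titles with non-ASCII characters, written with \u escapes for portability.
titles <- c("Reposapi\u00e9s", "Podn\u00f3\u017cek")

# Percent-encode each title as UTF-8 bytes.
encoded <- vapply(enc2utf8(titles), utils::URLencode, character(1),
                  USE.NAMES = FALSE)
print(encoded)

# Decode again and mark the result as UTF-8.
decoded <- vapply(encoded, utils::URLdecode, character(1), USE.NAMES = FALSE)
Encoding(decoded) <- "UTF-8"

# Hypothetical helper: does a string contain percent-encoded bytes?
looks_url_encoded <- function(x) grepl("%[[:xdigit:]]{2}", x)
print(looks_url_encoded(c(titles, encoded)))
```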
For more detailed usage, have a look at the vignette accompanying the package.
politan.ch (2015-10-04): Welche Ständeratskandidaturen interessieren?. politan.ch. http://www.politan.ch/welche-standeratskandidaturen-interessieren/
politan.ch (2015-05-25): Wenn Klicks Stimmen wären. politan.ch. http://www.politan.ch/wenn-klicks-stimmen-waren/
Munzert, Simon (2015): Using Wikipedia Page View Statistics to Measure Issue Salience. WEBDATANET CONFERENCE 2015. http://conference.webdatanet.eu/uploads/submission/full_paper/35/munzert-wikipedia-webdatanet.pdf
Wilkerson, Bill (2015): Post-Republican debate on Wikipedia follow-up: before and after public interest in the candidates. http://www.wrwilkerson.com/ . http://www.wrwilkerson.com/blog/2015/8/15/post-republican-debate-on-wikipedia-follow-up-before-and-after-public-interest-in-the-candidates
Taha Yasseri and Jonathan Bright (2015): Predicting elections from online information flows: towards theoretically informed models. http://arxiv.org/abs/1505.01818
Mellon, Jonathan (2014): Internet Search Data and Issue Salience: The Properties of Google Trends as a Measure of Issue Salience. Journal of Elections, Public Opinion and Parties 24(1):45-72. http://www.tandfonline.com/doi/abs/10.1080/17457289.2013.846346
Yla Tausczik, Kate Faasse, James W. Pennebaker, Keith J. Petrie (2012): Public Anxiety and Information Seeking Following the H1N1 Outbreak: Blogs, Newspaper Articles, and Wikipedia Visits. Health Communication, Vol. 27, Iss. 2. http://www.tandfonline.com/doi/pdf/10.1080/10410236.2011.571759
Ripberger, Joseph T. (2011): Capturing curiosity: using Internet search trends to measure public attentiveness. Policy Studies Journal 39(2):239-259. http://onlinelibrary.wiley.com/doi/10.1111/j.1541-0072.2011.00406.x/full
(Did I miss your application? Make a pull request, open an issue, or drop me a line and I will add it here.)
Fernando Reis, Eryk Walczak, Simon Munzert, Kristin Lindemann
wp_date() generic and its methods and is detailed in the help files.
wp_trend() used to fail with an uninformative error if the page or lang input contained NA; it now fails with a more informative error: 'Error: all(!is.na(page)) is not TRUE'
the vignette used to fail because NA was passed as page/lang input to wp_trend(); the code has been changed to prevent this
modifying vignette to comply with CRAN policies: replaced non ascii character in R code by its \u-escape sequence ( \u00e4 )
modifying vignette to comply with CRAN policies: making code evaluation for code that uses non-mainstream repository hosted packages optional on machines that do not have those installed
modifying caching to comply with CRAN policies
changing default folder of cache file from temp (basename(tempdir())) to Rtemp ( tempdir() )
adding ghrr as additional repo to comply with CRAN policies
changing default folder of cache file from home (~) to temp (basename(tempdir()))
feature: caching has been overhauled
feature: wp_trend() now tries to guess if page was supplied as title with possible special characters or as (url-encoded) URL part and take care of further processing
bug-fix: special character support in the package was poor and prevented the use of articles in languages with non-ASCII characters (especially on Windows)
bug-fix / backward compatibility: with version 1.0.0 old parameters for wp_trend() were causing errors
bug-fix: wp_cache_reset() would stop with an error if called twice in a row - fixed
api-change: option userAgent deleted: the default is to send information on versions of R, wikipediatrend, curl as well as RCurl
api-change: option requestFrom deleted: the default is to not send the header
feature: wp_trend() now by default caches data retrievals in a temporary file
feature: wp_trend(file="save.csv") now allows to specify a file where retrievals are stored (this will always add to the already existing data)
feature: wp_trend() now allows specifying more than one page and/or language at a time; data will then be retrieved for every combination of page, language, and date
feature: the caching system is persistent: wp_cache_file() reports the file used for caching, wp_cache_reset() resets the cache, and wp_cache_load() returns its content as a data.frame()
feature: wp_trend() now (invisibly) returns only the data from the current request, while the new function wp_cache() retrieves data from cache files (by default, if no file name is specified, from .wp_trend_cache)
api-change: the data returned by wp_trend(), cached in the cache file, and retrieved by wp_cache() now consists of more variables: date, count, project, title, rank, month
feature: testthat tests now check base functionality of the package
bug-fix: missing page views for a month led to an error; fixed.
bug-fix: wp_trend() now checks date inputs better for logical inconsistencies
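Several entries above concern input validation: NA values in page or lang, inconsistent dates, and retrieval for every page-language combination. The following is a minimal sketch of what such checks could look like. The function `check_trend_inputs()` is hypothetical and is not the package's actual code; it only illustrates the kind of stopifnot() check that yields a message such as 'all(!is.na(page)) is not TRUE'.

```r
# Hypothetical helper, NOT part of wikipediatrend: validates inputs and
# returns every page-language combination that would be requested.
check_trend_inputs <- function(page, lang = "en",
                               from = Sys.Date() - 30, to = Sys.Date()) {
  # Fail early and informatively on missing values.
  stopifnot(all(!is.na(page)), all(!is.na(lang)))

  # Dates must be parseable and logically ordered.
  from <- as.Date(from)
  to   <- as.Date(to)
  stopifnot(!is.na(from), !is.na(to), from <= to)

  # One row per page-language combination.
  expand.grid(page = page, lang = lang, stringsAsFactors = FALSE)
}

combos <- check_trend_inputs(c("Fever", "Fieber"), c("en", "de"),
                             from = "2013-08-01", to = "2015-12-31")
print(combos)

# Both of these calls stop with an error instead of sending a request.
bad_page  <- try(check_trend_inputs(NA), silent = TRUE)
bad_dates <- try(check_trend_inputs("Fever", from = "2015-12-31",
                                    to = "2013-08-01"), silent = TRUE)
```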