Tools for Harnessing 'MLBAM' 'Gameday' Data and Visualizing 'pitchfx'

With 'pitchRx', one can easily obtain Major League Baseball Advanced Media's 'Gameday' data (as well as store it in a remote database). The 'Gameday' website hosts a wealth of data in XML format, but perhaps most interesting is 'pitchfx'. Among other things, 'pitchfx' data can be used to recreate a baseball's flight path from a pitcher's hand to home plate. With pitchRx, one can easily create animations and interactive 3D 'scatterplots' of the baseball's flight path. 'pitchfx' data is also commonly used to generate a static plot of baseball locations at the moment they cross home plate. These plots, sometimes called strike-zone plots, can also refer to a plot of event probabilities over the same region. 'pitchRx' provides an easy and robust way to generate strike-zone plots using the 'ggplot2' package.


News

CHANGES IN pitchRx VERSION 1.8.2

BUG FIXES

- Fix #34

CHANGES IN pitchRx VERSION 1.8.1

BUG FIXES

  • scrape() function went to debugger when connect argument was used
CHANGES IN pitchRx VERSION 1.8

BUG FIXES

  • Fixed #26 and #31.
CHANGES IN pitchRx VERSION 1.7

BUG FIXES

- The break_y, break_angle, and break_length fields are now considered REAL as opposed to TEXT.
- Double header games were not being included by default. Now they will. See https://github.com/cpsievert/pitchRx/issues/23

CHANGES IN pitchRx VERSION 1.6

NEW FEATURES

- scrape is now smart enough to grab new gameday identifiers that don't exist in data(gids). This is particularly useful during the postseason since we don't know the identifiers in advance.

CHANGES IN pitchRx VERSION 1.5

NEW FEATURES

  • Data from all available minor league games is now available. See ?scrape for an example.

NEW ADDITION

  • strikeFX will now recognize geom = "raster". Raster is a special case of "tile" where all tiles are the same size. When using the model argument, a geom_raster() is now used. This means 'binwidths' can no longer be altered.

BUG FIXES

  • When using scrape without a database connection and "inning_all.xml" in the suffix, only the first 100 games were being returned. This is now "fixed". See here for more info -- https://github.com/cpsievert/pitchRx/issues/19

  • miniscoreboard.xml files changed slightly in 2014 which caused some errors. These are now fixed.

  • update_db will now actually append to the database if connect argument is not specified.

    CHANGES IN pitchRx VERSION 1.4

MAJOR IMPROVEMENTS

  • I've decided the 'atbat' table should include a 'date' column (for reasons discussed here -- http://baseballwithr.wordpress.com/2014/04/13/modifying-and-querying-a-pitchfx-database-with-dplyr/). If you already have a database and update it using scrape(), a 'date' column will be added to the 'atbat' table (if 'date' doesn't already exist). Unfortunately, when 'date' is added, any 'atbat' INDEXs you previously created will have to be recreated manually.

  • The update_db() function was added to help automate sequential database updates.

MINOR IMPROVEMENTS

  • Added some simple tests to verify scrape() works as intended.

MINOR CHANGES

  • The rgl package was moved from IMPORTS to SUGGESTS since it was causing so many installation difficulties.

    CHANGES IN pitchRx VERSION 1.3

  • Added support for scraping 2014 data.

    CHANGES IN pitchRx VERSION 1.2

MAJOR FIXES

  • In the switch to version 1.0, the count column started to be computed incorrectly. This is now fixed.

    CHANGES IN pitchRx VERSION 1.1

MAJOR FIXES

  • Not all records in the atbat table were being returned due to an improper use of merge. This is now fixed.
  • Records are sometimes duplicated when extracting from source. Duplicated records will now be removed.
  • Some variables weren't being copied to the database correctly (when using the connect argument). For example, the "gameday_link" field in the "game" table was accidently being copied as "venue".

MINOR IMPROVEMENTS

  • A 'gameday_link' was added to all tables for better linking between tables (you can now use this instead of url for a game identifier).

NEW FEATURE

  • The export function allows for robust appending of data.frames to remote tables.

    CHANGES IN pitchRx VERSION 1.0

MAJOR CHANGES

  • scrapeFX() is now deprecated. Please use the scrape() function instead. The function is built upon the R package XML2R. This change will lead to many improvements:

MAJOR IMPROVEMENTS

(1) Easily write tables to a remote database using scrape()'s connect argument. This will lead to both better memory management and faster run-time.
(2) Query particular game(s) of interest using the game.ids option.
(3) Much, much easier to maintain and add support for new file types. scrape() currently supports four types of files: inning/inning_all.xml, inning/inning_hit.xml, miniscoreboard.xml, and players.xml.
(4) pitchRx will serve as a nice example of using the XML2R framework to build web scrapers. Another example is the bbscrapeR package -- https://github.com/cpsievert/bbscrapeR
(5) The memory hog data(urls) is no longer included. Instead, Gameday IDs - data(gids) - will help drive the construction of urls. Also, data(fields) are now used as the "master" list of fields to when writing tables to database.
(6) "Average" users should still find scrape() very easy to use.
(7) Better management of variable types.

MINOR CHANGES

  • The variable "top_inning" in the "atbat" table was changed to "inning_side" and will now contain values of "top" or "bottom" (instead of "Y" or "N").

  • The default value for the limitz opiton for strikeFX was changed from c(-2.5, 2.5, 0, 5) to c(-2, 2, 0.5, 4.5). This eliminates some un-interesting white-space in many cases.

  • geom="ggsubplot" is working again now that CRAN is happy with the ggsubplot package.

MINOR FIXES

  • More robust handling of model option in strikeFX. Also, added informative messages to let user know about conditioning.

    CHANGES IN pitchRx VERSION 0.7

MAJOR CHANGES

  • Markdown vignette was removed since R2SWF was removed from CRAN and there doesn't seem to be a great alternative currently. The vignette may re-appear in future versions, but you can still go to the demo page - http://cpsievert.github.io/pitchRx/demo/

  • The model option of strikeFX is now restricted to "gam" objects (see ?mgcv::gamObject). One can now pass either a fitted gam object or a function call to mgcv::gam or mgcv::bam. When passing a function call, strikeFX will automatically save the model object to the working directory. It's important to note that any variables used in a ggplot2 facet layer or in the density1/density2 options MUST BE INCLUDED as factors in the model.

  • geom="ggsubplot" is no longer a valid geometry option since the ggsubplot package was archived on CRAN.

MINOR FIXES

  • In the pitches dataset, the variables "px", "pz", "x0", "y0", "z0", "vx0", "vy0", "vz0", "ax", "ay", "az" are now numeric variables.

  • scrapeFX changes the "des" field in the "atbat" tables to "atbat_des" to avoid conflicts with the "des" field in the "pitch" table.

    CHANGES IN pitchRx VERSION 0.6

MAJOR CHANGES

  • Better namespace handling. The package no longer depends, but instead imports the following packages: MASS, rgl, mgcv, hexbin, lubridate, stringr, plyr, reshape.

NEW FEATURES

- avg.by option now available for interactiveFX

MINOR FIXES

- When passing a model to strikeFX, a white strike-zone is now drawn (if it isn't a differenced density)

- In animateFX, if the color variable is a factor, the original ordering of levels will be preserved.

    CHANGES IN pitchRx VERSION 0.5

MAJOR CHANGES

  • point.size is no longer supported in animateFX. Point size is now governed by distance from home plate (they will get larger as they progress toward home plate).

  • Default values for the start and end options of scrapeFX were removed. The user must specify dates.

  • Default value for adjust option in strikeFX was changed to FALSE to avoid confusing the user by adjusting vertically locations without specifying so.

NEW FEATURES

  • the model option allows one to create probabilistic strike-zone densities as opposed a "raw" strike-zone density. See the strikeFX example page for an example using mgcv::gam.

  • the avg.by option allows one to average over PITCHf/x parameters for a given discrete variable (note that any faceting variables will be included in the index). That is, the average PITCHf/x parameters are plotted for every unique combination of facet and avg.by variable(s).

DOCUMENTATION

- Documentation of non-exported functions was removed.

    CHANGES IN pitchRx VERSION 0.4

NEW FEATURES

- HTML vignette now available.

- strikeFX has a new geom="subplot2d" option.


    CHANGES IN pitchRx VERSION 0.3

NEW FEATURES

- The data(urls) object is now updated to include 2013 urls.
  • The function updateUrls() is now exported so that users can collect urls that are not already provided in data(urls).

    CHANGES IN pitchRx VERSION 0.2

NEW FEATURES

  • see here for a comprehensive tutorial: http://cpsievert.github.com/pitchRx/demo

  • 3D plots of pitch trajectories are now available through interactiveFX(). Note that spheres==TRUE employs basic rgl::spheres3d functionality; otherwise rgl::plot3d is used

    CHANGES IN pitchRx VERSION 0.1

NEW FEATURES

  • first version of pitchRx.

  • intro to data collection with pitchRx: http://cpsievert.wordpress.com/2013/01/10/easily-obtain-mlb-pitchfx-data-using-r/

  • using pitchRx via shiny: http://cpsievert.wordpress.com/2013/01/13/pitchrx-shiny-fun-flexible-mlb-pitchfx-visualization

MISC

  • in this NEWS file, #n means the issue number on GitHub, e.g. #1 is https://github.com/cpsievert/pitchRx/issues/1

Reference manual

It appears you don't have a PDF plugin for this browser. You can click here to download the reference manual.

install.packages("pitchRx")

1.8.2 by Carson Sievert, 2 years ago


http://cpsievert.github.com/pitchRx


Report a bug at http://github.com/cpsievert/pitchRx/issues


Browse source code at https://github.com/cran/pitchRx


Authors: Carson Sievert <cpsievert1@gmail.com>


Documentation:   PDF Manual  


Task views:


MIT + file LICENSE license


Imports XML2R, plyr, MASS, hexbin, mgcv

Depends on ggplot2

Suggests DBI, dplyr, RSQLite, parallel, knitr, animation, shiny, testthat, ggsubplot, rgl


See at CRAN