With 'pitchRx', one can easily obtain Major League Baseball Advanced Media's 'Gameday' data (as well as store it in a remote database). The 'Gameday' website hosts a wealth of data in XML format, but perhaps most interesting is 'pitchfx'. Among other things, 'pitchfx' data can be used to recreate a baseball's flight path from a pitcher's hand to home plate. With 'pitchRx', one can easily create animations and interactive 3D 'scatterplots' of the baseball's flight path. 'pitchfx' data is also commonly used to generate a static plot of baseball locations at the moment they cross home plate. Such plots are sometimes called strike-zone plots, a term that can also refer to a plot of event probabilities over the same region. 'pitchRx' provides an easy and robust way to generate strike-zone plots using the 'ggplot2' package.
CHANGES IN pitchRx VERSION 1.8.2
- Fix #34
CHANGES IN pitchRx VERSION 1.8.1
CHANGES IN pitchRx VERSION 1.8
CHANGES IN pitchRx VERSION 1.7
- The break_y, break_angle, and break_length fields are now considered REAL as opposed to TEXT.
- Double header games were not being included by default. Now they will. See https://github.com/cpsievert/pitchRx/issues/23
CHANGES IN pitchRx VERSION 1.6
- scrape is now smart enough to grab new gameday identifiers that don't exist in data(gids). This is particularly useful during the postseason since we don't know the identifiers in advance.
CHANGES IN pitchRx VERSION 1.5
When using scrape without a database connection and "inning_all.xml" in the suffix, only the first 100 games were being returned. This is now "fixed". See here for more info -- https://github.com/cpsievert/pitchRx/issues/19
miniscoreboard.xml files changed slightly in 2014, which caused some errors. These are now fixed.
update_db will now actually append to the database if connect argument is not specified.
CHANGES IN pitchRx VERSION 1.4
I've decided the 'atbat' table should include a 'date' column (for reasons discussed here -- http://baseballwithr.wordpress.com/2014/04/13/modifying-and-querying-a-pitchfx-database-with-dplyr/). If you already have a database and update it using scrape(), a 'date' column will be added to the 'atbat' table (if 'date' doesn't already exist). Unfortunately, when 'date' is added, any 'atbat' INDEXs you previously created will have to be recreated manually.
The update_db() function was added to help automate sequential database updates.
The rgl package was moved from IMPORTS to SUGGESTS since it was causing so many installation difficulties.
CHANGES IN pitchRx VERSION 1.3
Added support for scraping 2014 data.
CHANGES IN pitchRx VERSION 1.2
In the switch to version 1.0, the count column started to be computed incorrectly. This is now fixed.
CHANGES IN pitchRx VERSION 1.1
The export function allows for robust appending of data.frames to remote tables.
CHANGES IN pitchRx VERSION 1.0
(1) Easily write tables to a remote database using scrape()'s connect argument. This will lead to both better memory management and faster run-time.
(2) Query particular game(s) of interest using the game.ids option.
(3) Much, much easier to maintain and add support for new file types. scrape() currently supports four types of files: inning/inning_all.xml, inning/inning_hit.xml, miniscoreboard.xml, and players.xml.
(4) pitchRx will serve as a nice example of using the XML2R framework to build web scrapers. Another example is the bbscrapeR package -- https://github.com/cpsievert/bbscrapeR
(5) The memory hog data(urls) is no longer included. Instead, Gameday IDs - data(gids) - will help drive the construction of urls. Also, data(fields) is now used as the "master" list of fields when writing tables to a database.
(6) "Average" users should still find scrape() very easy to use.
(7) Better management of variable types.
The variable "top_inning" in the "atbat" table was changed to "inning_side" and will now contain values of "top" or "bottom" (instead of "Y" or "N").
The default value for the limitz option for strikeFX was changed from c(-2.5, 2.5, 0, 5) to c(-2, 2, 0.5, 4.5). This eliminates some uninteresting white space in many cases.
geom="ggsubplot" is working again now that CRAN is happy with the ggsubplot package.
More robust handling of the model option in strikeFX. Also, added informative messages to let the user know about conditioning.
CHANGES IN pitchRx VERSION 0.7
Markdown vignette was removed since R2SWF was removed from CRAN and there doesn't seem to be a great alternative currently. The vignette may re-appear in future versions, but you can still go to the demo page - http://cpsievert.github.io/pitchRx/demo/
The model option of strikeFX is now restricted to "gam" objects (see ?mgcv::gamObject). One can now pass either a fitted gam object or a function call to mgcv::gam or mgcv::bam. When passing a function call, strikeFX will automatically save the model object to the working directory. It's important to note that any variables used in a ggplot2 facet layer or in the density1/density2 options MUST BE INCLUDED as factors in the model.
geom="ggsubplot" is no longer a valid geometry option since the ggsubplot package was archived on CRAN.
In the pitches dataset, the variables "px", "pz", "x0", "y0", "z0", "vx0", "vy0", "vz0", "ax", "ay", "az" are now numeric variables.
scrapeFX changes the "des" field in the "atbat" tables to "atbat_des" to avoid conflicts with the "des" field in the "pitch" table.
CHANGES IN pitchRx VERSION 0.6
- avg.by option now available for interactiveFX
- When passing a model to strikeFX, a white strike-zone is now drawn (if it isn't a differenced density)
- In animateFX, if the color variable is a factor, the original ordering of levels will be preserved.
CHANGES IN pitchRx VERSION 0.5
point.size is no longer supported in animateFX. Point size is now governed by distance from home plate (they will get larger as they progress toward home plate).
Default values for the start and end options of scrapeFX were removed. The user must specify dates.
Default value for the adjust option in strikeFX was changed to FALSE to avoid confusing the user by adjusting vertical locations without specifying so.
The model option allows one to create probabilistic strike-zone densities as opposed to a "raw" strike-zone density. See the strikeFX example page for an example using mgcv::gam.
The avg.by option allows one to average over PITCHf/x parameters for a given discrete variable (note that any faceting variables will be included in the index). That is, the average PITCHf/x parameters are plotted for every unique combination of facet and avg.by variable(s).
- Documentation of non-exported functions was removed.
CHANGES IN pitchRx VERSION 0.4
- HTML vignette now available.
- strikeFX has a new geom="subplot2d" option.
CHANGES IN pitchRx VERSION 0.3
- The data(urls) object is now updated to include 2013 urls.
The function updateUrls() is now exported so that users can collect urls that are not already provided in data(urls).
CHANGES IN pitchRx VERSION 0.2
see here for a comprehensive tutorial: http://cpsievert.github.com/pitchRx/demo
3D plots of pitch trajectories are now available through interactiveFX(). Note that spheres==TRUE employs basic rgl::spheres3d functionality; otherwise rgl::plot3d is used.
CHANGES IN pitchRx VERSION 0.1
first version of pitchRx.
intro to data collection with pitchRx: http://cpsievert.wordpress.com/2013/01/10/easily-obtain-mlb-pitchfx-data-using-r/
using pitchRx via shiny: http://cpsievert.wordpress.com/2013/01/13/pitchrx-shiny-fun-flexible-mlb-pitchfx-visualization