Full texts for Jane Austen's 6 completed novels, ready for text analysis. These novels are "Sense and Sensibility", "Pride and Prejudice", "Mansfield Park", "Emma", "Northanger Abbey", and "Persuasion".
“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
(from Mr. Tilney in Northanger Abbey)
This package provides access to the full texts of Jane Austen's 6 completed, published novels. The UTF-8 plain text for each novel was sourced from Project Gutenberg, processed a bit, and is ready for text analysis. Each text is in a character vector with elements of about 70 characters. The package contains:
sensesensibility: Sense and Sensibility, published in 1811
prideprejudice: Pride and Prejudice, published in 1813
mansfieldpark: Mansfield Park, published in 1814
emma: Emma, published in 1815
northangerabbey: Northanger Abbey, published posthumously in 1818
persuasion: Persuasion, also published posthumously in 1818
There is also a function, austen_books(), that returns a tidy data frame of all 6 novels.
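As a rough sketch of how the data sets and the function might be used together: the example below assumes the package is installed under the name janeaustenr (the name is not stated in the text above) and that the data frame returned by austen_books() has a `book` column identifying the novel. Both are assumptions to check against the package documentation.

```r
# Assumption: the package is installed under the name "janeaustenr".
if (requireNamespace("janeaustenr", quietly = TRUE)) {
  # Each novel is a plain character vector, one element per line of text.
  print(head(janeaustenr::emma, 10))

  # All six novels combined into one data frame.
  books <- janeaustenr::austen_books()
  str(books)

  # Lines of text per novel (assumes a column named `book`).
  print(table(books$book))
}
```

The `requireNamespace()` guard only keeps the snippet from failing when the package is missing; in an interactive session a plain `library()` call would do the same job.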
Users should be aware that there are some differences in usage between the novels as made available by Project Gutenberg. For example, "anything" vs. "any thing", "Mr" vs. "Mr.", and using underscores vs. all caps to indicate italics/emphasis.
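One way to smooth over these differences before analysis is a small normalization step. The sketch below uses only base R; the function name `normalize_usage` and the sample sentences are invented for illustration (they are not quotations from the novels), and the rules cover only the "any thing", "Mr", and underscore cases mentioned above. All-caps emphasis is left alone because lowercasing it safely depends on context.

```r
# Hypothetical helper: harmonize a few spelling and markup variations.
normalize_usage <- function(x) {
  # "any thing" -> "anything" (keeps the original capitalization of the "a")
  x <- gsub("\\b([Aa])ny thing\\b", "\\1nything", x, perl = TRUE)
  # "Mr" -> "Mr." unless a period already follows; "Mrs" is not matched
  x <- gsub("\\bMr\\b(?!\\.)", "Mr.", x, perl = TRUE)
  # Drop the underscores used to mark emphasis, keeping the word itself
  x <- gsub("_([^_]+)_", "\\1", x, perl = TRUE)
  x
}

# Invented sample lines, not taken from the texts
sample_lines <- c(
  "I did not say any thing to Mr Smith.",
  "It was _very_ kind of Mr. Jones."
)
normalize_usage(sample_lines)
```

Whether to normalize at all depends on the analysis: for word counts the variants probably should be merged, while for a study of typography they are the data.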
To install the package, type the following:
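A minimal sketch, assuming the package is distributed on CRAN under the name janeaustenr (the name does not appear in the text above, so treat it as an assumption):

```r
# Assumption: the CRAN package name is "janeaustenr"
install.packages("janeaustenr")
```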
Or you can install the development version from GitHub:
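A sketch of the usual approach with devtools; the repository path juliasilge/janeaustenr is an assumption that this text does not confirm, so adjust it if the repository lives elsewhere:

```r
# Assumption: devtools is not yet installed
install.packages("devtools")

# Assumption: the development repository is juliasilge/janeaustenr
devtools::install_github("juliasilge/janeaustenr")
```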
For some ideas on getting started with analyzing these texts, see my blog post on sentiment analysis of Austen's novels. For help within R, try ?persuasion or similar for getting started with the data sets.
This project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
austen_books function to align with publication order
austen_books function to align with publication order (made an error in this)
dplyr to Suggests; change implementation of austen_books to use base functions thanks to Jeroen Ooms
NEWS.md file to track changes to the package.