Utilizes the 'genderize.io' Application Programming Interface to predict gender from first names extracted from a text vector. The accuracy of prediction can be controlled by two parameters: the count of a first name in the database and the probability of the prediction.
by Kamil Wais
R package for gender predictions based on first names.
The package home page: http://www.wais.kamil.rzeszow.pl/genderizer/
Information about the genderize.io project and documentation of the API: http://genderize.io
The genderizeR package uses the genderize.io API to predict gender from first names extracted from a text corpus (not only from clean vectors of given names). The accuracy of prediction can be controlled by two parameters: the count of a first name in the database and the probability of a gender given the first name. The package also has built-in functions that calculate specific errors (optionally with bootstrapping), train the algorithm on a training dataset (with gender labels), and prepare character vectors for gender checking.
Installing the package
----------------------
Remember to install the devtools package first!
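A minimal installation sketch, assuming the package is installed from the GitHub repository named in the contribution notes (kalimu/genderizeR). The repository path is taken from that link and is not confirmed here as the recommended installation source.

```r
# Install devtools first (only needed once)
install.packages("devtools")

# Assumption: the development version lives at github.com/kalimu/genderizeR
devtools::install_github("kalimu/genderizeR")
```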
library(genderizeR)
#>
#> Welcome to genderizeR package version: 2.0
#>
#> Homepage: http://www.wais.kamil.rzeszow.pl/genderizeR
#>
#> Changelog: news(package = 'genderizeR')
#> Help & Contact: help(genderizeR)
#>
#> If you find this package useful cite it please. Thank you!
#> See: citation('genderizeR')
#>
#> To suppress this message use:
#> suppressPackageStartupMessages(library(genderizeR))
# An example for a character vector of strings
x = c("Winston J. Durant, ASHP past president, dies at 84",
      "JAN BASZKIEWICZ (3 JANUARY 1930 - 27 JANUARY 2011) IN MEMORIAM",
      "Maria Sklodowska-Curie")

# Search for terms that could be first names
# If you have your API key you can authorize access to the API with apikey argument
# e.g. findGivenNames(x, progress = FALSE, apikey = 'your_api_key')
givenNames = findGivenNames(x, progress = FALSE)

# Use only terms that have more than x counts in the database
givenNames = givenNames[count > 100]
givenNames
#>       name gender probability count
#> 1:     jan   male        0.60  1692
#> 2:   maria female        0.99  8467
#> 3: winston   male        0.98   128

# Genderize the original character vector
genderize(x, genderDB = givenNames, progress = FALSE)
#>                                                              text
#> 1:             Winston J. Durant, ASHP past president, dies at 84
#> 2: JAN BASZKIEWICZ (3 JANUARY 1930 - 27 JANUARY 2011) IN MEMORIAM
#> 3:                                         Maria Sklodowska-Curie
#>    givenName gender genderIndicators
#> 1:   winston   male                1
#> 2:       jan   male                1
#> 3:     maria female                1
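The count filter in the example above can be combined with a probability threshold, which is the second of the two parameters that control accuracy. The following is a minimal sketch that does not call the API: it rebuilds the name table from the output shown above as a plain `data.table`, and the threshold names `minCount` and `minProbability` are hypothetical, chosen only for illustration.

```r
library(data.table)

# Name table copied from the findGivenNames() output shown above
givenNames = data.table(
  name        = c("jan", "maria", "winston"),
  gender      = c("male", "female", "male"),
  probability = c(0.60, 0.99, 0.98),
  count       = c(1692, 8467, 128)
)

# Hypothetical thresholds; tune them for your own data
minCount       = 100
minProbability = 0.90

# Keep only names that pass both the count and the probability threshold
reliableNames = givenNames[count > minCount & probability >= minProbability]
reliableNames$name
#> [1] "maria"   "winston"
```

With these thresholds "jan" is dropped because its probability (0.60) is below 0.90, even though its count is high. A filtered table like `reliableNames` could then be passed as `genderDB` to `genderize()` in the same way as `givenNames` above.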
For a more comprehensive tutorial, check the vignette in the package.
news(package = 'genderizeR')
help(package = 'genderizeR')
?textPrepare
?findGivenNames
?genderize
Fork the git repo https://github.com/kalimu/genderizeR and submit a pull request.
If you enjoy using the package, you could write a short testimonial and send it to me. I will be happy to post it on the package homepage.
For any kind of feedback you can use the contact form here: http://www.wais.kamil.rzeszow.pl/kontakt/
Please use the contact form: http://www.wais.kamil.rzeszow.pl/kontakt/
Thank You for the citation!