[NEW! Get your API Key directly from NamSor and use the online tool as a complement to RapidMiner]
This is a hands-on video tutorial on measuring the gender pay gap and other diversity indicators, using open data published by the City of Palo Alto, California, as an example.
The original file contains a lot of data but no diversity information, so we use the NamSor extension for RapidMiner to infer each employee's gender and likely origin from their name.
The NamSor API can infer gender with high precision. For example, it recognizes that Andrea Rossini is most likely an Italian man, whereas Andrea Parker is more likely a woman: the surname points to a likely origin, and that origin changes how the first name is read. Onomastics, or onomatology, is the study of the origin, history, and use of proper names.
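Once every employee has an inferred gender, the unadjusted pay gap is simple arithmetic: the difference between average male and average female pay, expressed as a share of average male pay. The Python sketch below illustrates that step only. It is not part of the NamSor extension or the RapidMiner process shown in the video. The `Employee` fields, the per-record confidence score, the 0.8 threshold, and the salaries are all made-up assumptions used for illustration. Records with a low-confidence inference, such as an ambiguous first name, are skipped so they do not skew the result.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class Employee:
    name: str
    salary: float
    gender: str        # hypothetical label: "male" or "female", inferred from the name
    confidence: float  # hypothetical confidence in [0, 1] attached to that inference


def pay_gap(employees, min_confidence=0.8, stat=mean):
    """Unadjusted gender pay gap: (male pay - female pay) / male pay.

    `stat` selects the average (mean or median). Records whose gender
    inference falls below `min_confidence` are left out of both groups.
    """
    kept = [e for e in employees if e.confidence >= min_confidence]
    male = [e.salary for e in kept if e.gender == "male"]
    female = [e.salary for e in kept if e.gender == "female"]
    if not male or not female:
        raise ValueError("need at least one confident male and one confident female record")
    male_pay = stat(male)
    return (male_pay - stat(female)) / male_pay


# Toy data, invented for illustration (not the Palo Alto figures).
staff = [
    Employee("Andrea Rossini", 90000, "male", 0.95),
    Employee("Andrea Parker", 80000, "female", 0.90),
    Employee("John Smith", 110000, "male", 0.99),
    Employee("Mary Jones", 100000, "female", 0.98),
    Employee("Sam Lee", 70000, "female", 0.55),  # ambiguous name, below the threshold
]

print(f"mean gap:   {pay_gap(staff):.1%}")               # mean gap:   10.0%
print(f"median gap: {pay_gap(staff, stat=median):.1%}")  # median gap: 10.0%
```

Reporting both the mean and the median gap is a common choice, because a few very high salaries can pull the mean well above what a typical employee earns.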
Using NamSor to determine gender and likely country of origin
For this tutorial, you will need:
- an API Key from NamSor or from Mashape
- a sample file, for example Palo Alto Employees Salaries 2013 [mirror PaloAlto_3024698421708621269.zip]
Data mining can be fun, and open data and better corporate transparency can make a difference. Please RT and join us in thanking the City of Palo Alto for its transparency.
NamSor™ Applied Onomastics is a European designer of name recognition software. NamSor is committed to promoting diversity and equal opportunity. NamSor launched the GendRE API, a free API for extracting gender from personal names. http://namsor.com
GenderGapGrader’s mission is to publish gender gap estimates at the finest-grained level possible, using whatever reference database we can find for a given industry: the Internet Movie Database (IMDB) for the film industry, “The Airman Database” for pilots, and more to come. http://gendergapgrader.com