What’s in a name? Machine learning / artificial intelligence was used to classify the personal names of entrepreneurs and beneficial owners of companies in eight European countries (France, Germany, Netherlands, Belgium, Sweden, Italy, Spain, Ireland) and derive unique insights into the Contribution and Challenges of Ethnic Minority Businesses in Europe.
In this OPEN report for MSDUK, Philippe Legrain and Martyn Fitzgerald of the Open Political Economy Network found that minority businesses in Europe contribute at least €570 billion to the economy and employ at least 2.7 million people. The full report can be downloaded here.
Why use personal names classification? A previous report from the European Commission clearly states the issue: the lack of data.
The problems of analysing migrant and ethnic minority entrepreneurship are aggravated by the fact that statistical information is scarce and not fully comparable between countries. There are some key figures for some countries (e.g. self-employment rates) and estimates but there are no comprehensive official statistics let alone statistics that could be compared internationally. In some cases data are not available because an ethnic differentiation is legally not allowed in the data collection or, more often, the phenomenon of ethnic entrepreneurship it is confused with questions of nationality (e.g. naturalised persons cannot be identified).
Supporting Entrepreneurial Diversity in Europe: Conclusions and Recommendations of the European Commission’s Network “Ethnic Minority Businesses”
Personal names classification can help fill this information gap about minority businesses.
How onomastics, the science of names, informs on ethnic diversity
NamSor software uses machine learning to classify personal names by gender, origin and ethnicity. Creating a single taxonomy to represent all types of diversity seems like an impossible task, as human societies are fractal in their diversity, so NamSor provides a number of taxonomies adapted to how particular countries and regions of the world look at diversity.
Informing on Diversity initiatives in the United States
For the United States, the NamSor US ‘race’/ethnicity model can classify personal names into 4 or 6 classes, corresponding to the taxonomy recommended by the US Census:
White (Non Latino) – A person having origins in any of the original peoples of Europe, the Middle East, or North Africa.
Black or African American (Non Latino) – A person having origins in any of the Black racial groups of Africa.
Asian – A person having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent including, for example, Cambodia, China, India, Japan, Korea, Malaysia, Pakistan, the Philippine Islands, Thailand, and Vietnam.
Hispanic Origin / Latino – Hispanic origin can be viewed as the heritage, nationality, lineage, or country of birth of the person or the person’s parents or ancestors before arriving in the United States. People who identify as Hispanic, Latino, or Spanish may be any race.
Native Hawaiian or Other Pacific Islander – A person having origins in any of the original peoples of Hawaii, Guam, Samoa, or other Pacific Islands.
American Indian or Alaska Native – A person having origins in any of the original peoples of North and South America (including Central America) and who maintains tribal affiliation or community attachment.
Some names are easier to classify and distinguish than others; for example, Chinese or Japanese names can hardly be confused with Hispanic names. Overall, NamSor provides very accurate insights into racial and ethnic diversity in the United States, as measured by a recent scientific paper, Imperfect Inferences: A Practical Assessment, by Aaron Rieke (Upturn), Vincent Southerland (New York University School of Law), Dan Svirsky (Uber), Mingwei Hsu (Upturn).
Informing on Diversity initiatives in Canada or Europe
Some other countries look at diversity from a migration standpoint: which country or region a particular name is likely to come from (e.g. China/Asia vs. Italy/Europe). NamSor provides specific taxonomies that correspond to this relativistic view of diversity and classify names at country level (Country / Origin / Diaspora). Name classification is imperfect by nature, but it provides enough statistically significant insights to be useful. Here is an example of a scientific paper, published in the journal NAMES: A JOURNAL OF ONOMASTICS, which offers interesting validation results: Using Onomastics to Inform Diversity Initiatives: Race, Gender, and Names in Academic Radiology in Canada by Sohrab Towfighi, Adrian Marcuzzi, Salman Masood, Faisal Khosa (University of British Columbia, Vancouver, Canada), Mohsin Yakub (California University of Science and Medicine, Colton, USA), Jessica B. Robbins (University of Wisconsin, Madison, USA).
For the Minority Businesses Matter: Europe report, the Diaspora taxonomy was used.
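The remark that name classification is imperfect for any single name yet still useful in aggregate can be illustrated with a small sketch. It assumes a classifier that returns a probability for each candidate origin; the records, labels, probabilities and the estimate_shares helper below are all hypothetical and are not NamSor's API or the report's actual method. Averaging the probabilities across names, rather than counting only each name's top guess, keeps the uncertainty of ambiguous names in the population-level estimate.

```python
from collections import defaultdict

# Hypothetical classifier output: (name, {origin label: probability}).
# Names, labels and probabilities are invented for illustration only.
classified = [
    ("Name A", {"Italy": 0.90, "Spain": 0.10}),
    ("Name B", {"China": 0.80, "Vietnam": 0.20}),
    ("Name C", {"Italy": 0.50, "Spain": 0.50}),  # ambiguous name
    ("Name D", {"China": 0.60, "Vietnam": 0.40}),
]


def estimate_shares(records):
    """Estimate each label's population share by averaging per-name probabilities."""
    totals = defaultdict(float)
    for _name, probs in records:
        for label, p in probs.items():
            totals[label] += p
    n = len(records)
    # With no records, totals is empty and an empty dict is returned.
    return {label: total / n for label, total in totals.items()}


shares = estimate_shares(classified)
for label, share in sorted(shares.items()):
    print(f"{label}: {share:.2f}")
```

In this toy data the ambiguous "Name C" contributes half a count to each of its two candidate origins instead of being forced into one, which is one simple way to keep individual misclassifications from dominating an aggregate figure.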
Informing on Gender Diversity initiatives
NamSor™ Applied Onomastics is a European vendor of sociolinguistics software (NamSor sorts names). NamSor's mission is to help understand international flows of money, ideas and people. We proudly support Gender Gap Grader.