NamSor algorithms help the Mylist platform leverage information from its current customer database, changing the direction, future and finances of the company!
NamSor software, which infers the ethnicity, origin and gender of an individual from their first name and surname, has delivered tremendous results for Mylist, a company based in Dubai. Mylist is a gift and reward e-commerce platform that lets users create online gift lists for weddings, birthdays, baby showers and other social events. Operating in Dubai, a global city known for its diverse cultural communities, the company is looking to benefit from the cultural wealth of one of its most valuable assets: its customer database!
Baptiste Quidet, the Dubai-based data scientist who conducted the analysis, said: “Working both for Mylist and for NamSor, my main responsibility is to connect NamSor’s software to Mylist’s customer database. After some work and data preparation, I linked the NamSor API to Mylist’s customer tables in Power BI. The results of this project have been truly useful to Mylist’s analyst team, as NamSor’s software helped confirm views on portfolio identity and sometimes disprove common misconceptions about specific business situations. For instance, the study showed, with a high level of confidence, that an individual from Saudi Arabia spends on average twice as much on the Mylist platform as an individual from the U.K. With a similar level of confidence, the analysis showed that gift list participants are more likely to be female.” This information is quite valuable when targeting a particular market.
Furthermore, the information gleaned by NamSor serves as a guide on the statistical learning side, giving its users an edge in predicting revenue. Mr. Quidet went on to say: “The results showed that by simply adding features for gender and origin, Mylist could improve its precision rate by 25%! Each origin then became a categorical feature in our predictive model, which improved precision in predicting online orders and total revenue.”
How does data science add value to the business model?
Mylist has been growing at a steady pace across various countries throughout the Middle East. We centered our analysis on one of its numerous databases: the business-to-customer (B2C) segment, which now has a strong foothold in Saudi Arabia and Egypt, as well as in its main business location, the UAE. To capitalize on the wealth that the database has to offer, it was fundamental to fully understand who the customers are and what they need. To better understand them, we decided to focus on their country of birth. We also determined how long each customer had been living in their current location, which could help us determine their specific needs and tailor our services to them.
We collected raw information from the backend platform and connected it to Power BI. We corrected mis-entries, converted values into category, integer and date-time formats, and aggregated information coming from different business locations. We then merged these tables using primary and foreign keys to obtain a clearer, simpler map of the operational connections between products, customers, orders, revenue and locations.
The result was interpreted directly in Power BI, where we created dashboards for BI analysts interested in the economics, customer segmentation and marketing insights offered by the algorithm. The primary advantage of such a processing pipeline is continued direct access to the business analytics tool: as new data is supplied to the database, the evolution of trends can be followed live. To make the process simple to understand, I’ve outlined the different steps in the following points:
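The cleaning and merging steps described above can be sketched in pandas. This is a minimal illustration, not Mylist's actual pipeline, which ran in Power BI. The table and column names (`customers`, `orders`, `customer_id`, `list_type`, and so on) are hypothetical. The sketch fixes inconsistent text entries, converts columns to integer, date-time, numeric and category types, and then joins orders to customers through the `customer_id` foreign key.

```python
import pandas as pd

# Hypothetical raw exports; column names are illustrative, not Mylist's real schema.
customers = pd.DataFrame({
    "customer_id": ["1", "2", "3"],
    "city": ["Dubai", "Riyadh", " dubai "],
    "signup_date": ["2018-01-05", "2018-02-11", "2018-03-20"],
})
orders = pd.DataFrame({
    "order_id": [10, 11, 12, 13],
    "customer_id": ["1", "1", "2", "3"],
    "amount": ["250", "120.5", "n/a", "80"],
    "list_type": ["wedding", "wedding", "birthday", "baby shower"],
})

def clean(customers, orders):
    customers = customers.copy()
    orders = orders.copy()
    # Correct mis-entries: trim stray whitespace and normalise capitalisation.
    customers["city"] = customers["city"].str.strip().str.title()
    # Convert formats: integer keys, date-times, numbers and categories.
    customers["customer_id"] = customers["customer_id"].astype(int)
    customers["signup_date"] = pd.to_datetime(customers["signup_date"])
    customers["city"] = customers["city"].astype("category")
    orders["customer_id"] = orders["customer_id"].astype(int)
    orders["amount"] = pd.to_numeric(orders["amount"], errors="coerce")  # "n/a" becomes NaN
    orders["list_type"] = orders["list_type"].astype("category")
    return customers, orders

customers, orders = clean(customers, orders)
# Foreign key orders.customer_id -> primary key customers.customer_id.
# validate="many_to_one" raises an error if customer_id is not unique in customers.
merged = orders.merge(customers, on="customer_id", how="left", validate="many_to_one")
print(merged[["order_id", "city", "amount"]])
```

The `validate` argument makes pandas check the primary-key assumption during the join. A duplicated customer row would otherwise silently double the revenue attached to that customer.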
- Create a pipeline to process the data: correct wrong entries and change the format of numbers, categories and dates to produce data usable both in Power BI and for statistical learning.
- Illustrate business metrics for revenue growth per Wishlist category (weddings, baby showers, birthdays, Christmas, etc.). Put the evolution of trends in perspective and point out high and low seasons in spending habits.
- Extract value by showing ‘best seller’ products and comparing the gap between wished-for and actual amounts.
- Map revenue per city and country, showing where people live and where they are originally from (using NamSor’s API).
- Create a network map of the customer database to show interconnections between Wishlist hosts and participants, and to understand their impact and potential for future growth.
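Two of the dashboard metrics listed above can be computed directly in pandas before they reach Power BI. The first is monthly revenue per Wishlist category, including the high and low months. The second is the gap between wished-for and collected amounts. This is a minimal sketch on made-up rows, and the column names `list_type`, `order_date`, `wished` and `collected` are hypothetical.

```python
import pandas as pd

# Hypothetical order lines: list category, order date, wished and actually collected amounts.
orders = pd.DataFrame({
    "list_type": ["wedding", "wedding", "birthday", "wedding", "birthday", "baby shower"],
    "order_date": pd.to_datetime(["2018-01-10", "2018-01-25", "2018-02-03",
                                  "2018-03-14", "2018-03-20", "2018-03-28"]),
    "wished": [500.0, 300.0, 200.0, 400.0, 150.0, 250.0],
    "collected": [450.0, 300.0, 120.0, 380.0, 150.0, 100.0],
})

# Revenue per Wishlist category per month: one row per month, one column per category.
monthly = (orders
           .groupby([orders["order_date"].dt.to_period("M"), "list_type"])["collected"]
           .sum()
           .unstack(fill_value=0.0))

# High and low seasons: the months with the largest and smallest total revenue.
total_per_month = monthly.sum(axis=1)
high, low = total_per_month.idxmax(), total_per_month.idxmin()
print(monthly)
print("high month:", high, "low month:", low)

# Gap between wished-for and collected amounts per category.
gap = orders.groupby("list_type")[["wished", "collected"]].sum()
gap["gap"] = gap["wished"] - gap["collected"]
gap["fill_rate"] = gap["collected"] / gap["wished"]
print(gap)
```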
As a second stage, we built regression models to predict consumer spending. Using Jupyter Notebook and Python libraries, we fed the tables created previously into a predictive pipeline, merging the pre-processing steps for numerical and categorical values into a single pipeline. After plotting charts, histograms and box plots, we used scikit-learn methods to pre-process the data and split it into training and validation sets. As white-box models, decision trees are very useful because they provide a direct, visual explanation of the logic used to estimate values. They can also reach a low error rate, although an unconstrained tree easily overfits, which is why its parameters have to be chosen carefully.
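The merged numerical and categorical pipeline can be sketched with scikit-learn's `ColumnTransformer`. Numeric columns are scaled, categorical columns are one-hot encoded, and the result feeds a decision tree evaluated on a held-out validation split. The data is synthetic, and the feature names (`n_participants`, `days_open`, `list_type`, `host_origin`) and `max_depth=5` are illustrative assumptions, not the settings used for Mylist.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 400
# Synthetic gift lists with hypothetical features.
df = pd.DataFrame({
    "n_participants": rng.integers(1, 60, n),
    "days_open": rng.integers(5, 120, n),
    "list_type": rng.choice(["wedding", "birthday", "baby shower"], n),
    "host_origin": rng.choice(["SA", "GB", "IN", "EG"], n),
})
# Synthetic target: revenue grows with participants; weddings and SA hosts spend more.
df["revenue"] = (40 * df["n_participants"]
                 + 300 * (df["list_type"] == "wedding")
                 + 200 * (df["host_origin"] == "SA")
                 + rng.normal(0, 50, n))

numeric = ["n_participants", "days_open"]
categorical = ["list_type", "host_origin"]
preprocess = ColumnTransformer([
    # Scaling does not change a tree's splits; it is kept so the same step can feed linear models.
    ("num", StandardScaler(), numeric),
    # Each origin becomes its own 0/1 column; unseen categories at predict time are ignored.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([
    ("prep", preprocess),
    ("tree", DecisionTreeRegressor(max_depth=5, random_state=0)),
])

X_train, X_val, y_train, y_val = train_test_split(
    df[numeric + categorical], df["revenue"], test_size=0.25, random_state=0)
model.fit(X_train, y_train)
val_mse = mean_squared_error(y_val, model.predict(X_val))
print(f"validation MSE: {val_mse:.1f}")
```

Bundling the encoder inside the pipeline means the one-hot columns are learned from the training split only. The same object can then score new rows without any manual re-encoding.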
Moreover, we ran a random search over the hyper-parameters, choosing their ranges carefully to avoid overfitting the training set. We could also justify the choice of variables by showing which features produced the largest reduction in impurity (measured with the Gini index, i.e. how well a feature segregates the target value).
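A random hyper-parameter search and the resulting feature ranking could look like the following sketch, which uses `RandomizedSearchCV` on synthetic data. The search ranges and the four hypothetical feature names are assumptions. Note that scikit-learn's regression trees measure impurity with squared error by default. The Gini index is the default criterion for classification trees, so `feature_importances_` here reports each feature's share of the total squared-error reduction.

```python
import numpy as np
from scipy.stats import randint
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
n = 500
# Hypothetical features; the last two are already one-hot (0/1) dummies.
X = np.column_stack([
    rng.integers(1, 60, n),    # n_participants
    rng.integers(5, 120, n),   # days_open
    rng.integers(0, 2, n),     # is_wedding
    rng.integers(0, 2, n),     # host_origin_SA
])
feature_names = ["n_participants", "days_open", "is_wedding", "host_origin_SA"]
y = 40 * X[:, 0] + 300 * X[:, 2] + 200 * X[:, 3] + rng.normal(0, 50, n)

# Ranges that limit tree size, which is the main lever against overfitting.
param_distributions = {
    "max_depth": randint(2, 12),
    "min_samples_leaf": randint(1, 30),
    "max_leaf_nodes": randint(5, 40),
}
search = RandomizedSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_distributions,
    n_iter=25,
    cv=5,
    scoring="neg_mean_squared_error",
    random_state=0,
)
search.fit(X, y)
best_tree = search.best_estimator_
print("best parameters:", search.best_params_)

# Impurity-based importances: each feature's share of the total impurity reduction.
importances = sorted(zip(feature_names, best_tree.feature_importances_),
                     key=lambda item: item[1], reverse=True)
for name, importance in importances:
    print(f"{name:16s} {importance:.3f}")
```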
To supplement the analysis, we added polynomial terms for all features and ran a regression analysis. We compared the statistical relevance of the models and variables using F-statistics and Student’s t-statistics, along with their p-values. We also compared the scores (mean squared error and adjusted coefficient of determination, adjusted R²) to decide which model, and how many variables, to keep in the final model.
We also plotted learning curves and added regularization terms to prevent overfitting. After tuning the regularization parameters, we concluded that Elastic-Net was the most efficient of the techniques we tried. Its L1 component lets the model keep a reduced number of variables by forcing some coefficients to exactly zero, while its L2 component shrinks the others toward zero.
Finally, we built a time series model to predict monthly revenue, fitting an ARMA model to capture the momentum and mean-reversion effects often observed in trading markets and to use them in explaining the target value. It also helped us visualize seasonality, i.e. the times of year when customers are more likely to participate in a gift list.
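The statistics named above follow from the linear model $Y = X\beta + \varepsilon$, where $X$ has $n$ rows and $p + 1$ columns including the intercept. The sketch below computes them with NumPy and SciPy so every formula is visible. It covers OLS coefficients, Student t-statistics with two-sided p-values, the overall F-statistic, and adjusted $R^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$. The data is synthetic, and `ols_summary` is a hypothetical helper; statsmodels' `OLS(...).fit().summary()` reports the same quantities.

```python
import numpy as np
from scipy import stats
from sklearn.preprocessing import PolynomialFeatures

def ols_summary(X, y):
    """Least-squares fit of y = X @ beta + eps; X must already contain an intercept column."""
    n, k = X.shape                                   # k = p + 1 columns including the intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid                              # residual sum of squares
    tss = ((y - y.mean()) ** 2).sum()                # total sum of squares
    sigma2 = rss / (n - k)                           # unbiased residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    t = beta / se                                    # Student t-statistic per coefficient
    p_t = 2 * stats.t.sf(np.abs(t), df=n - k)        # two-sided p-values
    p = k - 1
    r2 = 1 - rss / tss
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    f = ((tss - rss) / p) / (rss / (n - k))          # F-test of "all slopes are zero"
    p_f = stats.f.sf(f, p, n - k)
    return {"beta": beta, "t": t, "p_t": p_t, "adj_r2": adj_r2,
            "F": f, "p_F": p_f, "mse": rss / n}

rng = np.random.default_rng(2)
n = 300
X_raw = rng.uniform(0, 1, size=(n, 2))   # two hypothetical scaled features
y = 1.0 + 2.0 * X_raw[:, 0] + 3.0 * X_raw[:, 0] * X_raw[:, 1] + rng.normal(0, 0.1, n)

results = {}
for degree in (1, 2):
    # PolynomialFeatures adds the column of ones (include_bias=True by default).
    X = PolynomialFeatures(degree=degree).fit_transform(X_raw)
    results[degree] = s = ols_summary(X, y)
    print(f"degree {degree}: {X.shape[1] - 1} features, "
          f"adj R2 {s['adj_r2']:.4f}, MSE {s['mse']:.5f}, F {s['F']:.1f}")
```

For degree 2 the columns are $[1, x_0, x_1, x_0^2, x_0 x_1, x_1^2]$, so `results[2]["p_t"][4]` is the p-value of the interaction term.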
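Tuning Elastic-Net on polynomial terms can be sketched with `ElasticNetCV`. It cross-validates both the penalty strength (`alpha`) and the L1/L2 mix (`l1_ratio`). On synthetic data where only two raw features and one interaction matter, the L1 part drives many of the 27 polynomial coefficients to exactly zero. The candidate `l1_ratio` list and the data are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(3)
n, p = 300, 6
X = rng.normal(size=(n, p))
# Only x0, x1 and the x0*x1 interaction drive the synthetic target.
y = 3 * X[:, 0] - 2 * X[:, 1] + 1.5 * X[:, 0] * X[:, 1] + rng.normal(0, 0.5, n)

model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),   # 6 linear + 21 degree-2 terms = 27
    StandardScaler(),                                    # penalties assume comparable scales
    ElasticNetCV(l1_ratio=[0.1, 0.5, 0.7, 0.9, 0.95, 1.0], cv=5, max_iter=10000),
)
model.fit(X, y)
enet = model[-1]
coefs = enet.coef_
print("alpha:", enet.alpha_, "l1_ratio:", enet.l1_ratio_)
print(f"{np.sum(coefs == 0)} of {coefs.size} polynomial terms set to zero")
```

Standardizing before the penalty matters: without it, Elastic-Net would shrink features measured on small scales more than others, regardless of their relevance.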
As before, I’ve listed the steps for building the models below:
- Build a white-box model that describes customer behavior in a ‘10-node tree’, and put the relevant model variables in perspective.
- Design a regression model with regularization techniques to predict the orders processed on the platform and total revenue.
- Detect seasonality and trends in individual orders placed on the platform and in total revenue.
- Fit an ARMA time series model to forecast revenue.
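Detecting seasonality and trend, as listed above, can be done with a simple additive decomposition in pandas. A 12-month moving average estimates the trend. The month-of-year average of what remains, centered on zero, gives the seasonal index, with positive values marking high seasons. The series is synthetic, with hypothetical bumps in March, November and December. statsmodels' `seasonal_decompose` offers a fuller version of the same idea.

```python
import pandas as pd

months = pd.date_range("2015-01-01", periods=48, freq="MS")
# Hypothetical monthly revenue: linear growth plus seasonal bumps in a few months.
seasonal_bump = {3: 30.0, 11: 20.0, 12: 50.0}
revenue = pd.Series(
    [100 + 2 * i + seasonal_bump.get(m.month, 0.0) for i, m in enumerate(months)],
    index=months,
)

trend = revenue.rolling(window=12).mean()   # trailing 12-month average (first 11 values are NaN)
detrended = revenue - trend
# Average the detrended values per calendar month; NaNs from the first 11 months are skipped.
seasonal = detrended.groupby(detrended.index.month).mean()
seasonal = seasonal - seasonal.mean()       # centre the seasonal index on zero
print(seasonal.round(1))
print("high season month:", seasonal.idxmax())
```

A trailing window lags a growing trend by a constant amount. Because that lag is the same for every month, the final centering step removes it from the seasonal index.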
Overall, this project was very useful for us as data scientists: we gained experience in understanding the business model of online gift lists, which is very common in the e-commerce industry. Many more subjects require deeper analysis and would support decision-making for business and marketing strategies. We propose four main action points:
- Group Wishlists into revenue ranges (typically 3 to 7 buckets) based on the total amounts collected, then use a multinomial model to predict which bucket a host belongs to.
- Use an unsupervised model such as K-Means or the EM algorithm to perform customer segmentation.
- Track website visitors in real time and use A/B testing to compare website or page variants and determine which is most effective; the results can be used to improve website offers and target customer segments. Use recommender systems, which predict item ratings and preferences (item-based or user-based), to propose content matching a user’s interests by collecting preference or taste information from many users.
- Use deep learning techniques as the database grows, for more accurate segmentation.
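The first action point could be prototyped as follows. Total collected amounts are cut into quantile buckets (five here, within the suggested 3 to 7), and a multinomial logistic regression predicts a host's bucket. With the default `lbfgs` solver, scikit-learn's `LogisticRegression` fits a multinomial (softmax) model when there are more than two classes. The host features and the data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
n = 600
# Hypothetical host features known when a Wishlist is created.
hosts = pd.DataFrame({
    "n_participants": rng.integers(1, 60, n),
    "is_wedding": rng.integers(0, 2, n),
    "host_origin_SA": rng.integers(0, 2, n),
})
total_collected = (40 * hosts["n_participants"] + 300 * hosts["is_wedding"]
                   + 200 * hosts["host_origin_SA"] + rng.normal(0, 80, n))

# Five revenue buckets holding roughly equal numbers of Wishlists (quantile cut).
labels = ["very low", "low", "medium", "high", "very high"]
buckets = pd.qcut(total_collected, q=5, labels=labels).astype(str)

X_train, X_test, y_train, y_test = train_test_split(
    hosts, buckets, test_size=0.25, random_state=0, stratify=buckets)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"bucket accuracy: {accuracy:.2f} (chance is about 0.20)")
print(clf.predict_proba(X_test.iloc[:3]).round(2))   # one probability per bucket
```

Multinomial logistic regression ignores the ordering of the buckets. If that ordering matters, an ordinal model is a natural alternative to compare against.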
Tools used: Microsoft Power BI / Jupyter Notebook / scikit-learn, statsmodels, pandas, NumPy and Matplotlib libraries
How does NamSor add value?
Company data scientists like to use NamSor because its API is easy to call, especially since the launch of V2. NamSor covers all languages and alphabets and has worked with linguists, anthropologists and historians to identify the origin of diasporas across the globe. We used a very hands-on integration with Power BI, which added tables for gender and origin, each with a per-individual score indicating the confidence level. The geographical map in Power BI recognizes country codes, sub-regions and regions.
We can then illustrate metrics such as the average amount spent on the platform per region or country, and use them to investigate further data records.
Finally, regarding the predictive models, adding the NamSor features gave stronger statistical results for model precision.
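A per-origin spending metric like the one described above could be computed by joining the NamSor results to the spending table, as in this sketch. It drops low-confidence origin guesses, averages spend per origin, and runs a Welch t-test to check whether a gap such as Saudi Arabia versus the U.K. is statistically significant. The column names, the 0 to 10 score scale, the `MIN_SCORE` cut-off and the Welch test itself are all assumptions; the source does not say which test was used. The data is synthetic, generated so that SA hosts spend about twice as much.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(6)
n = 200
# Hypothetical NamSor results per host: one origin guess plus a confidence score.
namsor = pd.DataFrame({
    "customer_id": np.arange(n),
    "origin_country": rng.choice(["SA", "GB", "IN", "EG"], n),
    "origin_score": rng.uniform(0, 10, n),
})
# Synthetic spend: hosts tagged SA spend about twice as much on average.
multiplier = np.where(namsor["origin_country"] == "SA", 2.0, 1.0)
spend = pd.DataFrame({
    "customer_id": np.arange(n),
    "total_spent": rng.gamma(shape=2.0, scale=100.0, size=n) * multiplier,
})

df = spend.merge(namsor, on="customer_id", validate="one_to_one")
MIN_SCORE = 2.0   # hypothetical cut-off: ignore low-confidence origin guesses
confident = df[df["origin_score"] >= MIN_SCORE]

avg_by_origin = (confident.groupby("origin_country")["total_spent"]
                 .agg(["count", "mean"])
                 .sort_values("mean", ascending=False))
print(avg_by_origin.round(1))

sa = confident.loc[confident["origin_country"] == "SA", "total_spent"]
gb = confident.loc[confident["origin_country"] == "GB", "total_spent"]
# Welch's t-test: compares mean spend without assuming equal variances.
t_stat, p_value = stats.ttest_ind(sa, gb, equal_var=False)
ratio = sa.mean() / gb.mean()
print(f"SA/GB mean ratio: {ratio:.2f}, Welch t = {t_stat:.2f}, p = {p_value:.2g}")
```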
In the linear regression model discussed earlier, adding the host’s gender and country of origin improved the adjusted coefficient of determination (adjusted R²) by 60 basis points, from 87.8% to 88.4%. Informing the model of these two features also reduced the error rate (the mean squared error between estimated and target values) by 25%.
We can also see the improvement in the learning curve of the final model, a polynomial regression that predicts revenue per gift list (Elastic-Net was chosen for regularization).
NamSor’s origin and gender features clearly help model the revenue of the gift list business. By identifying consumption patterns across nationalities and geographic areas, we empower management to design better-targeted and better-informed marketing strategies.
Machine learning techniques can easily be applied, and companies with fast-growing business models such as Mylist can quickly benefit from the numerous open-source libraries available to feed live information into data-processing pipelines. This gives them the business intelligence they need, instantly and with limited resources. NamSor shows how simply the potential of a customer portfolio can be leveraged in any business sector, and, using its API, we also showed that it improves the performance of machine learning models. In a more globalized and connected world, diasporas are increasingly mobile, and a company that depends on knowing its customers will find it challenging to keep track of its base. It is therefore fundamental that companies apply these techniques to follow migrations as well as the evolving needs of their customers.
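The comparison above is an ablation: the same linear model is fitted with and without the NamSor features, and adjusted R² and mean squared error are compared. As a check on the arithmetic, 88.4% − 87.8% = 0.6 percentage points, which is 60 basis points (1 bp = 0.01 percentage point). The sketch below runs the ablation on synthetic data with hypothetical gender and origin columns, scoring on a held-out split. Its numbers are illustrative and will not reproduce the Mylist figures.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p features (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(7)
n = 1000
base = rng.normal(size=(n, 3))                 # hypothetical behavioural features
female = rng.integers(0, 2, n)                 # hypothetical NamSor gender flag
origin = rng.integers(0, 4, n)                 # hypothetical NamSor origin code
origin_dummies = np.eye(4)[origin][:, 1:]      # one-hot; first origin is the reference level
y = (base @ np.array([5.0, 3.0, 2.0]) + 1.5 * female
     + origin_dummies @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 1.5, n))

scores = {}
for name, X in [("without", base), ("with", np.column_stack([base, female, origin_dummies]))]:
    # Same random_state, so both models are scored on exactly the same rows.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    pred = LinearRegression().fit(X_tr, y_tr).predict(X_te)
    scores[name] = {
        "adj_r2": adjusted_r2(r2_score(y_te, pred), len(y_te), X.shape[1]),
        "mse": mean_squared_error(y_te, pred),
    }

delta_bp = (scores["with"]["adj_r2"] - scores["without"]["adj_r2"]) * 10_000   # 1 bp = 0.0001
mse_reduction = 1 - scores["with"]["mse"] / scores["without"]["mse"]
print(f"adjusted R2 gain: {delta_bp:.0f} bp, MSE reduction: {mse_reduction:.0%}")
```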
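A learning curve for a model of this kind (polynomial features followed by Elastic-Net) can be produced with scikit-learn's `learning_curve`. It refits the pipeline on growing training subsets and reports cross-validated training and validation error, which can then be plotted with Matplotlib. When the two curves converge as the training set grows, the model is no longer overfitting. The data, `degree=2`, `alpha=0.01` and `l1_ratio=0.5` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(8)
n = 600
X = rng.normal(size=(n, 4))
y = 3 * X[:, 0] + 2 * X[:, 0] * X[:, 1] - X[:, 2] ** 2 + rng.normal(0, 0.5, n)

# Hypothetical final model: degree-2 polynomial terms + Elastic-Net regularization.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    ElasticNet(alpha=0.01, l1_ratio=0.5, max_iter=10000),
)

sizes, train_scores, val_scores = learning_curve(
    model, X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),   # 10% to 100% of each training fold
    cv=5,
    scoring="neg_mean_squared_error",
    shuffle=True,
    random_state=0,
)
# Scores are negated MSE (higher is better), so flip the sign back.
train_mse = -train_scores.mean(axis=1)
val_mse = -val_scores.mean(axis=1)
for size, tr, va in zip(sizes, train_mse, val_mse):
    print(f"{size:4d} samples  train MSE {tr:.3f}  validation MSE {va:.3f}")
```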
(*) $Y = X\beta + \varepsilon$, where $X$ is an $n \times (p+1)$ design matrix (a column of ones for the intercept plus one column per feature), $p$ is the number of features and $n$ is the number of observations.
NamSor™ Applied Onomastics is a European vendor of sociolinguistics software (NamSor sorts names). NamSor’s mission is to help business owners and their management teams better understand international flows of money, ideas, and people in order to maximize profits and streamline efficiency.
Photo by Valeria andersson