[Applications of AI-DS] Part 1 - An AI algorithm for customer segmentation with Python

To help everyone better understand Artificial Intelligence (AI) and Data Science (DS), as well as the wide range of AI-DS applications in everyday life, the AI & Data Science program (Ngành TTNT & KHDL) would like to introduce the article series "𝐀𝐩𝐩𝐥𝐢𝐜𝐚𝐭𝐢𝐨𝐧 𝐨𝐟 𝐀𝐈 & 𝐃𝐚𝐭𝐚 𝐒𝐜𝐢𝐞𝐧𝐜𝐞".

The series opens with an application of AI-DS in economics: an AI algorithm for customer segmentation with Python.

This article was written by Abdoul, an intern at the IAD Institute (Viện IAD).

Abdoul explains in detail the code that is used, along with its purpose and effect.

Anyone can download the data and code to practice, or even run the Python code directly in the browser using Google Colab.

Dataset download link (source: Kaggle):
https://drive.google.com/file/d/19BOhwz52NUY3dg8XErVYglctpr5sjTy4/view

Code: https://colab.research.google.com/drive/1iQLDwKWSB-bV5Ggp0R9SDp33mKzL8dkN?usp=sharing

Let's get started!

𝐂𝐮𝐬𝐭𝐨𝐦𝐞𝐫 𝐬𝐞𝐠𝐦𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧

Customer segmentation is one of the most important applications of unsupervised learning. Using clustering techniques, companies can identify distinct segments of customers, which lets them target their potential user base more precisely.

It is the process of dividing customers into groups of individuals who are similar in ways that are relevant to marketing, such as gender, age, interests, and spending habits. It is therefore necessary to look for characteristics shared by a sufficient number of people. These characteristics are called segmentation criteria.

Companies that deploy customer segmentation work from the premise that every customer has different requirements, and that addressing them appropriately requires a specific marketing effort. These companies gain a real competitive advantage because they can adapt their offer and their communication to the groups identified. Successful customer segmentation makes it possible to optimize marketing efforts, respond better to customers, and therefore increase profitability.

𝐓𝐡𝐞 𝐝𝐢𝐟𝐟𝐞𝐫𝐞𝐧𝐭 𝐜𝐫𝐢𝐭𝐞𝐫𝐢𝐚 𝐨𝐟 𝐬𝐞𝐠𝐦𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧

The segmentation criteria can be of several types and differ from one company to another. The criteria used to segment the consumers in a market can be:
● Socio-economic criteria: socio-professional category, level of income, level of education
● Psychological criteria: centers of interest, lifestyle, personality, values, technological maturity
● Criteria related to purchasing behavior: frequency of purchase, purchasing habits, use of products
● Socio-demographic criteria: age, sex, size, place of residence, family situation
● Organizational criteria: company attributes, location, level of turnover, size, seniority

Each type of criterion must be precisely defined in order to segment consumers into representative and realistic groups. This analysis and targeting work is an essential step in the segmentation process.

In this machine learning project, we will use K-means clustering, a fundamental algorithm for clustering unlabeled datasets. The dataset was downloaded from Kaggle and contains 200 rows and 5 columns.
(https://drive.google.com/.../19BOhwz52NUY3dg8XErVYgl.../view)
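To make the idea of K-means concrete, here is a minimal NumPy sketch of the two steps the algorithm alternates between: assigning each point to its nearest centroid, then moving each centroid to the mean of its points. This is an illustration only, not the code from the notebook. The function names `assign` and `kmeans` and the toy points are made up for this example, and the initial centroids are passed in by hand so the result is reproducible, whereas library implementations usually pick them at random or with k-means++.

```python
import numpy as np

def assign(X, centroids):
    """Return, for each point, the index of its nearest centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, init_centroids, n_iter=100):
    """Plain K-means (Lloyd's algorithm). Returns (centroids, labels)."""
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = assign(X, centroids)
        # Update step: each centroid moves to the mean of its points.
        new_centroids = centroids.copy()
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return centroids, assign(X, centroids)

# Toy data (made up): two well-separated groups of 2-D points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# Start from one point of each group so the run is deterministic.
centroids, labels = kmeans(X, init_centroids=X[[0, 3]])
print(labels)
print(centroids)
```

In practice a library implementation such as scikit-learn's `KMeans` is the usual choice; the sketch only shows what happens under the hood.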

The first step of this data science project is data exploration. We will import the packages required for this task and then read our data. Finally, we will go through the input data to gain the necessary insights about it.
All the steps are described and explained in the code.

Code: https://colab.research.google.com/.../1iQLDwKWSB...
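As a rough idea of what the exploration step looks like, the sketch below loads a table with pandas and prints its size, column types, missing values, and summary statistics. It is not the notebook's code. The column names are an assumption about the Kaggle file and should be checked against the downloaded CSV, and the five rows are made up so that the example runs without the file.

```python
import io
import pandas as pd

# Assumed column names and made-up rows, standing in for the real CSV.
sample_csv = io.StringIO(
    "CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)\n"
    "1,Male,21,18,35\n"
    "2,Female,34,52,60\n"
    "3,Female,45,75,20\n"
    "4,Male,29,90,85\n"
    "5,Female,52,40,45\n"
)

# With the real data, pass the path of the downloaded file instead.
df = pd.read_csv(sample_csv)

print(df.shape)         # (rows, columns)
print(df.head())        # first rows
print(df.dtypes)        # type of each column
print(df.isna().sum())  # missing values per column
print(df.describe())    # summary statistics of the numeric columns
```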

From the bar plot and pie chart above, we observe that there are more female customers (56%) than male customers (44%).
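The percentages behind such a bar plot or pie chart can be computed with `value_counts`. In the sketch below, the `Gender` column is rebuilt from the figures quoted in this article (200 customers, 56% female and 44% male, so 112 and 88 rows); the column name and labels are assumptions, and with the real data you would use the column from the loaded table instead.

```python
import pandas as pd

# Column rebuilt from the article's figures: 56% and 44% of 200 customers.
gender = pd.Series(["Female"] * 112 + ["Male"] * 88, name="Gender")

# Share of each category, in percent.
share = (gender.value_counts(normalize=True) * 100).round(1)
print(share)
```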

In the histogram above, we can see that most of the customers are around 30-50 years old, and the most common ages are around 30-35; this can also be seen in the boxplot.

In the plot of the annual income distribution above, we can infer that few people earn more than 100k US dollars. Most people earn around 40k-60k US dollars, and the lowest income is around 20k US dollars.
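Reading a distribution like this amounts to counting how many customers fall into each income band. A small sketch with `pd.cut` is shown below; the income values and the band edges are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Made-up annual incomes in thousands of US dollars.
income = pd.Series([15, 20, 40, 45, 54, 60, 62, 78, 101, 137],
                   name="Annual Income (k$)")

# Hypothetical band edges; each band includes its upper edge.
bands = pd.cut(income, bins=[0, 20, 40, 60, 80, 100, 140])

# Number of customers per band, in band order.
counts = bands.value_counts().sort_index()
print(counts)
```

The same call works for the age column with age bands instead of income bands.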


The elbow of the curve above appears at 5, so we'll choose 5 as the number of clusters.
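An elbow curve is obtained by fitting K-means for several values of k and recording the within-cluster sum of squares, which scikit-learn exposes as `inertia_`. The sketch below assumes scikit-learn is installed and uses synthetic points drawn around five made-up centers, standing in for the income and spending-score columns; it is not the notebook's code.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for (annual income, spending score): 5 made-up groups.
rng = np.random.default_rng(42)
centers = np.array([[25, 20], [25, 80], [55, 50], [90, 15], [90, 85]])
X = np.vstack([c + rng.normal(scale=4.0, size=(40, 2)) for c in centers])

# Fit K-means for k = 1..10 and record the within-cluster sum of squares.
ks = list(range(1, 11))
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)

for k, wcss in zip(ks, inertias):
    print(k, round(wcss, 1))
```

Plotting `inertias` against `ks` (for example with matplotlib) gives the elbow curve; the elbow is the value of k after which the curve flattens.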


𝐈𝐧𝐭𝐞𝐫𝐩𝐫𝐞𝐭𝐚𝐭𝐢𝐨𝐧

We can clearly identify our 5 clusters. Based on them, the clients can be classified as follows. People with low incomes fall into two types: those who keep their spending low (orange) and those who spend a lot (red).

Those with high incomes can be split in a similar way: the yellow cluster contains people who keep their spending low, while the blue cluster contains people who spend a lot. There is also a medium category: people who have medium incomes and keep their spending at a medium level (black cluster).
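One way to back up this kind of reading with numbers is to average income and spending score per cluster and turn the averages into coarse levels. The sketch below uses made-up rows, assumed column names, a `cluster` column standing in for the K-means labels, and hypothetical cutoffs (40 and 70 for income, 35 and 65 for spending score); none of these values come from the article.

```python
import pandas as pd

# Made-up rows; "cluster" stands in for the labels returned by K-means.
df = pd.DataFrame({
    "Annual Income (k$)":     [18, 22, 20, 24, 55, 58, 85, 90, 88, 95],
    "Spending Score (1-100)": [15, 80, 10, 85, 50, 48, 12, 90, 18, 88],
    "cluster":                [0, 1, 0, 1, 2, 2, 3, 4, 3, 4],
})

def level(value, low, high):
    """Map a number to 'low', 'medium' or 'high' using two cutoffs."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "medium"

# Average income and spending score of each cluster.
profile = df.groupby("cluster").mean()

# Hypothetical cutoffs, chosen only for this illustration.
profile["income level"] = profile["Annual Income (k$)"].map(
    lambda v: level(v, 40, 70))
profile["spending level"] = profile["Spending Score (1-100)"].map(
    lambda v: level(v, 35, 65))
print(profile)
```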


𝐈𝐧𝐭𝐞𝐫𝐩𝐫𝐞𝐭𝐚𝐭𝐢𝐨𝐧

Here we can clearly identify a group of people (red cluster) who keep their spending at a low level; most of them seem to be above 30 years old. Those below 30 years old fall into two types: people who keep their spending at a medium level (blue and orange clusters) and people who spend a lot (black cluster).
As seen previously, this last group is made up of people at opposite levels of income, low and high. Those who spend at a medium level, in blue and orange, are the group we saw above in the black cluster: people who also have medium incomes.

Now we can start taking action based on this segmentation. The main strategies are:
● High Value: Improve Retention
● Mid Value: Improve Retention + Increase Frequency
● Low Value: Increase Frequency

𝐒𝐮𝐦𝐦𝐚𝐫𝐲

In this project, we implemented a customer segmentation model using a class of machine learning known as unsupervised learning, specifically the K-means clustering algorithm. We analyzed and visualized the data and then proceeded to implement our algorithm.

𝐑𝐞𝐟𝐞𝐫𝐞𝐧𝐜𝐞𝐬:
1. https://towardsdatascience.com/data-driven-growth-with...
2. https://data-flair.training/.../r-data-science-project.../
3. https://www.qualtrics.com/.../etu.../segmentation-marketing/

If you have any questions, don't hesitate to leave a comment on the fanpage of the Artificial Intelligence & Data Science program:

https://www.facebook.com/media/set/?vanity=AI.DS.UDA&set=a.134319738359726

𝑺𝒆𝒆 𝒚𝒐𝒖 𝒊𝒏 𝒕𝒉𝒆 𝒏𝒆𝒙𝒕 𝒑𝒓𝒐𝒋𝒆𝒄𝒕!