Online Customer Buying Behaviour Analysis using Sequence Embedding: A Case Study Approach
When sequence embedding can produce better results than conventional machine learning algorithms
I want to start this article from a different angle, to give you a bigger picture of how useful this approach is for solving a very common but critical problem in any consumer-facing industry.
Digital Footprint and User Journeys
The number of digital buyers worldwide is currently around 1.7 billion and may rise to 2.1 billion within the next three years, according to a statistical report and forecast (https://www.statista.com/statistics/251666/number-of-digital-buyers-worldwide/). Millions of users visit online shopping sites every day and leave their digital footprints behind. How can we understand those consumers’ buying behaviour or, more specifically, predict whether a consumer will buy a product while visiting the website?
Conventional Machine Learning Approach
A lot of research has already taken place in this space, and there are dozens of machine learning algorithms available to solve this typical classification problem. We can start by defining the problem as it would be solved using the established approach. From there we can move to a more unorthodox methodology for the same problem.
Every user has a different journey from every other user on a shopping website. Let’s assume we have a consumer dataset with millions of past records, where each record shows an individual user’s journey, as in the following sample (Table 1.1).
In the data sample, we can think of the web pages (1 to 4) as the product specification page, product cart page, product deal page, product billing page and so on. In the above table, we can see that the 1st user (id:1001) visited only web pages 1 and 3 (1: visited and 0: not visited) in session 1, whereas the 2nd user (id:1002) visited web pages 2 and 4. We also have another type of feature showing the total time spent on each web page, in minutes. Here we have taken only four web pages and two types of features for the sake of simplicity; in the real world there could be hundreds of such web pages and/or features. But the underlying concepts remain the same.
Now, if I ask you to predict whether a user will buy or not, given the above dataset for training your model, how would you proceed?
Among the many approaches you might think of, one is to feed these eight features (four visit flags and four time-spent values) directly into the prediction model and let it learn the pattern of user buying behaviour. Refer to Table 1.1 above.
Another method could be to compute the web page visit count along with the total time spent on those pages (if such data is available) and then use these as features in the prediction model. Refer to Table 1.2 below.
As I said, there can be multiple approaches to solving the same problem. But if we look at both solutions, we can see one common thing missing. Can you see it too?
Maybe you got it right. We are not taking the users’ journeys into consideration. Every user has a unique journey, and if we can capture it, the predictive model may give us much better results.
But how can we bring users’ journeys into our model? The answer is sequence embedding.
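To make the second approach concrete, here is a minimal sketch of how the eight per-page features could be collapsed into a visit count and a total time. The row layout, the field names and the time values are assumptions made up for illustration; only the visit flags for users 1001 and 1002 follow the description of Table 1.1.

```python
# Hypothetical rows in the shape of Table 1.1: four visit flags (1 = visited)
# and four time-spent values in minutes. The minute values are invented.
sessions = [
    {"user_id": 1001, "visited": [1, 0, 1, 0], "minutes": [2.5, 0.0, 4.0, 0.0]},
    {"user_id": 1002, "visited": [0, 1, 0, 1], "minutes": [0.0, 1.5, 0.0, 3.0]},
]


def aggregate_features(row):
    """Collapse the eight per-page features into two summary features."""
    return {
        "user_id": row["user_id"],
        "pages_visited": sum(row["visited"]),
        "total_minutes": sum(row["minutes"]),
    }


# One summary row per session, in the spirit of Table 1.2.
summary = [aggregate_features(row) for row in sessions]
for row in summary:
    print(row)
```

Either feature set can then be passed to whichever classifier you prefer.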
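A small sketch can show why the journey is lost. It assumes two hypothetical users who visit the same two pages in opposite order; the page names and the helper function are illustrative only. Once the visits are turned into 0/1 flags, the two journeys become indistinguishable.

```python
PAGES = ["webpage1", "webpage2", "webpage3", "webpage4"]


def visit_flags(journey):
    """Turn an ordered list of visited pages into Table 1.1 style 0/1 flags."""
    visited = set(journey)
    return [1 if page in visited else 0 for page in PAGES]


# Two hypothetical journeys over the same pages, in opposite order.
journey_a = ["webpage1", "webpage3"]
journey_b = ["webpage3", "webpage1"]

flags_a = visit_flags(journey_a)
flags_b = visit_flags(journey_b)

print(journey_a == journey_b)  # the journeys differ
print(flags_a == flags_b)      # the flag features do not
```

The same holds for visit counts and total time, since neither depends on the order of the visits.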
What is Sequence Embedding?
Let’s first understand word embedding; then it will be easier to get into sequence embedding. Word embedding is an NLP feature learning technique that maps words to vectors of real numbers. In simple terms, words with similar meanings end up close to each other, so they form clusters.
So any word can be represented as a vector in an n-dimensional vector space, and the distance between two such vectors represents the degree of similarity between the words (the smaller the distance, the more similar the words). In Picture 1.3 the words are represented in a two-dimensional vector space, and the short distance between ‘Microsoft’ and ‘IBM’ reflects that both are company names.
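The idea of distance as similarity can be sketched with a few toy two-dimensional vectors. The coordinates below are invented for illustration and do not come from any trained model, and the word ‘river’ is a hypothetical unrelated word added for contrast. Euclidean distance is used here; cosine similarity is another common choice.

```python
import math

# Toy 2-D word vectors; the numbers are made up, not learned.
toy_vectors = {
    "Microsoft": (0.9, 0.8),
    "IBM": (0.8, 0.9),
    "river": (-0.7, 0.1),
}


def distance(word_a, word_b):
    """Euclidean distance between the vectors of two words."""
    return math.dist(toy_vectors[word_a], toy_vectors[word_b])


close = distance("Microsoft", "IBM")
far = distance("Microsoft", "river")
print(close < far)  # the two company names sit closer together
```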
To dig deeper into word embeddings you may visit the following links.
Just like words, we can represent the web pages (Table 1.1) as n-dimensional vectors, as represented below. The value of n may vary with the context. You can either use a pre-trained model such as Word2Vec or train your own model to get the embeddings.
When multiple words are arranged in a sequence and we want to extract the meaningful context from that sequence, we need sequence embedding. In our example, the sequence is the series of web pages that defines an individual user’s journey. To make it more meaningful, we can take other attributes, such as time spent, into consideration along with these embeddings, as shown below.
In the above picture, we have shown the sequence embedding for one user journey taken from Table 1.1. We can see that the user visited the web pages in a sequence (webpage2 -> webpage1 -> webpage3) and spent some time on each of those pages, as shown above. We can multiply each page embedding by these attributes and average the results to get a weighted mean, which we call the sequence embedded vector. We can then use these vectors to understand the buying behaviour of the customers who visited our website, as shown below. We can even predict the chance of conversion in near real time for users currently visiting our website.
We can also run a cluster analysis comparing the vectors of users who converted (product booked: yes) against the vectors of those who did not (product booked: no).
In this blog, I have tried to offer another option for understanding customer buying behaviour, but it should not be considered the only way of solving similar problems. From my personal experience, I feel that we should always be aware of different ways of solving the same business problem. Not only does it give us the flexibility to move from one approach to another for a better result, but sometimes it also creates magic.
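The weighted mean described above can be sketched in a few lines. This is only an illustration under assumptions: the three-dimensional page embeddings and the minutes spent are invented, and in practice the embeddings would come from a trained model with a larger n.

```python
# Hypothetical 3-dimensional page embeddings (made-up numbers).
page_embeddings = {
    "webpage1": [0.2, 0.7, 0.1],
    "webpage2": [0.9, 0.1, 0.4],
    "webpage3": [0.3, 0.3, 0.8],
}


def sequence_embedding(journey, embeddings):
    """Time-weighted mean of page embeddings for one journey.

    journey is an ordered list of (page, minutes_spent) pairs.
    """
    total_minutes = sum(minutes for _, minutes in journey)
    if total_minutes == 0:
        raise ValueError("journey has no time spent on any page")
    dims = len(next(iter(embeddings.values())))
    vector = [0.0] * dims
    for page, minutes in journey:
        weight = minutes / total_minutes  # share of the session spent here
        for i, value in enumerate(embeddings[page]):
            vector[i] += weight * value
    return vector


# The journey from the text, with assumed minutes per page.
journey = [("webpage2", 1.0), ("webpage1", 3.0), ("webpage3", 4.0)]
user_vector = sequence_embedding(journey, page_embeddings)
print([round(v, 4) for v in user_vector])
```

Note that a weighted mean on its own does not depend on the order of the visits. In this sketch, any information about order would have to come from how the page embeddings were trained, for example on ordered journeys.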
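One simple way to compare the two groups, shown here as a sketch rather than a full cluster analysis, is to compute the centroid of each group and check which centroid a new user’s vector is closest to. The vectors and labels below are invented, and the nearest-centroid rule is a stand-in chosen for brevity.

```python
import math

# Hypothetical sequence embedded vectors with their "product booked" label.
labelled_vectors = [
    ([0.34, 0.43, 0.49], "yes"),
    ([0.30, 0.40, 0.55], "yes"),
    ([0.80, 0.15, 0.35], "no"),
    ([0.75, 0.20, 0.30], "no"),
]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


centroids = {
    label: centroid([vec for vec, lab in labelled_vectors if lab == label])
    for label in ("yes", "no")
}


def nearest_label(vector):
    """Label of the closest centroid by Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


# Vector of a hypothetical user currently browsing the site.
current_user = [0.33, 0.45, 0.50]
print(nearest_label(current_user))
```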
3. https://www.youtube.com/watch?v=Nbpz79v2y5Q – Conference talk from our team on Sequence Embedding for Prediction in Spark