Introduction

Recommendation systems are a big part of today's world. Customers may face a large number of available options and not know what to buy. They may be unaware of a product that fully serves their purpose, or of a movie, song, or joke they would eventually like but have not yet come across. Recommendation systems address these problems by making specific suggestions to each customer. They may recommend items based on item content (content-based recommendation), on a user's session activity (sequential or session-based recommendation), on items that similar users like (user-user collaborative filtering), on similarity to items the customer has liked previously (item-item collaborative filtering), or via a hybrid of two or more of these approaches.

The data comes from UC Berkeley's jokes dataset [1]. It contains ratings for 100 jokes from 73,421 users. Some jokes have many ratings (close to 73,000) while others have far fewer (just over 20,000). The total number of ratings alone does not give a clear picture, so we also look at the type of rating (positive or negative). Some jokes have a high number of positive ratings and some a high number of negative ratings (both around 55,000). A few jokes are very well liked, with around 80% positive ratings, while a few are strongly disliked, with only around 30% positive ratings.
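To make these numbers concrete, here is a minimal sketch of how such per-joke counts can be computed. The file name `jester_ratings.csv`, the dense user-by-joke layout, and the convention that 99 marks an unrated joke are my assumptions about the data export, not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Assumed layout: one row per user, one column per joke, ratings in [-10, 10],
# with 99 meaning "not rated" (the convention used in the Jester release).
ratings = pd.read_csv("jester_ratings.csv", header=None).to_numpy(dtype=float)
ratings[ratings == 99] = np.nan

n_ratings = (~np.isnan(ratings)).sum(axis=0)   # ratings received per joke
n_positive = (ratings > 0).sum(axis=0)         # positive ratings per joke (NaN > 0 is False)
pct_positive = n_positive / n_ratings          # share of positive ratings

print(f"most rated joke:  {n_ratings.max()} ratings")
print(f"least rated joke: {n_ratings.min()} ratings")
print(f"best liked joke:  {pct_positive.max():.0%} positive ratings")
```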

Item-item collaborative filtering uses item-item similarity derived from user-item interactions. The similarity is not based on the content of the item, but on item ratings, feedback, clicks, and so on. The basic assumption is that a user will like items that have been rated similarly to the items the user previously liked. This works very well in many scenarios: Amazon, one of the biggest companies in the world, has openly discussed using item-item collaborative filtering in its recommendation system, and Linden et al. [4] from Amazon.com describe how effective the algorithm has been for their business.

The basics of the algorithm are taken from the implementation of Sarwar et al. [2].

Input: user-item ratings matrix and a user ID

Output: recommendations for the user

1. Read the user-item ratings matrix and the user ID

2. Calculate the similarity between items (I have used cosine similarity)

3. Calculate the mean rating of each item

4. For the chosen user, find the items that have not been rated (there is no point in recommending already purchased/liked/read items)

5. For each of these unrated items, calculate the similarity-weighted sum over the items the user has previously rated: similarity * (user's rating - item's mean rating)

6. Divide this sum by the sum of the absolute similarity weights for the same items

7. Add the mean rating of the new item to obtain the recommendation score

8. The top-k jokes with the highest recommendation scores are then recommended by the system.
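The steps above translate almost line for line into NumPy. The following is a minimal sketch under my own assumptions (a dense user-by-item matrix with NaN for unrated entries, and absolute similarity values in the step-6 denominator, as in Sarwar et al. [2]); it is not the author's exact code.

```python
import numpy as np

def recommend(ratings: np.ndarray, user_id: int, k: int = 10) -> np.ndarray:
    """Return the indices of the top-k unrated items for one user.
    `ratings` is a users x items matrix with NaN for unrated entries."""
    rated = ~np.isnan(ratings)                       # step 1: who rated what
    filled = np.nan_to_num(ratings)                  # zeros where unrated

    # Step 2: cosine similarity between item rating vectors.
    norms = np.linalg.norm(filled, axis=0)
    sim = (filled.T @ filled) / np.outer(norms, norms)

    item_mean = np.nanmean(ratings, axis=0)          # step 3: mean rating per item

    user_rated = rated[user_id]                      # step 4: items this user has rated
    centered = ratings[user_id, user_rated] - item_mean[user_rated]

    # Steps 5-7: score(i) = mean_i + sum_j sim(i,j) * (r_uj - mean_j) / sum_j |sim(i,j)|
    weights = np.abs(sim[:, user_rated]).sum(axis=1)
    scores = item_mean + (sim[:, user_rated] @ centered) / weights

    scores[user_rated] = -np.inf                     # never re-recommend rated items
    return np.argsort(scores)[::-1][:k]              # step 8: top-k by score
```

A call such as `recommend(ratings, user_id=0)` would return ten joke indices for the first user.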

A few design choices are worth explaining.

1. Similarity: I used cosine similarity over other measures because it measures the angle between two vectors (jokes, in this case) embedded in a vector space. Suppose one joke has a lot of ratings and another has only a few (it is not popular yet), but their rating patterns are very similar. The jokes are similar: customers who liked joke 1 will probably like joke 2 and should have it recommended. With a magnitude-sensitive similarity measure, the two jokes would not score as similar because of the large difference in their numbers of ratings. It is better to compare the angle between the vectors rather than their lengths, which is why cosine similarity is the right choice here.
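A toy example (with made-up numbers) shows the difference. The two rating vectors below point in exactly the same direction, but one has ten times the magnitude, as a heavily rated joke would; Euclidean distance calls them far apart, while cosine similarity correctly calls their patterns identical.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical rating vectors: same pattern, very different magnitude.
joke_1 = np.array([10.0, 8.0, 6.0, 9.0])   # heavily rated joke
joke_2 = np.array([1.0, 0.8, 0.6, 0.9])    # same pattern, much less rating mass

print(np.linalg.norm(joke_1 - joke_2))  # ~15.1: Euclidean distance says "far apart"
print(cosine(joke_1, joke_2))           # 1.0: cosine says "same direction"
```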

2. Normalisation: Recommendation can be done without normalisation, but in my view it is very important. We must consider how good a joke is overall, along with its similarity to other jokes. A joke may be only moderately similar to the jokes a user has liked and yet be very highly rated overall, and that should count in its favour. This is why the mean rating of each item enters the recommendation score.

3. Neighbourhood selection: I used all the neighbours in the calculation. First, the dataset contains only 100 items (jokes), and on average a user has rated around 50 of them, so there are only about 50 jokes to consider; the computation is cheap. Second, I believe dissimilarity carries information too. If two items are dissimilar (a judgement based on ratings from 73,421 users, so it is usually reliable) and a user likes item 1, there is a high chance they will not like item 2. Restricting the calculation to a neighbourhood of the most similar items would throw this negative evidence away.

Before deploying the model, we must check how good it is. We cannot put the model live and simply watch how customers react; that could be disastrous. So how can we check whether the recommendations are correct? One way is to divide the data into training and test sets, but I opted for a different route: I predicted recommendation scores for the already-rated jokes, took the top 10 recommendations for each user, and checked whether the user had actually liked them. Across 73,421 users, the average was 7.32 out of 10; that is, on average a user had liked 7.32 of the 10 jokes at the top of their list. The performance could be better, but it is satisfactory.
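Here is a sketch of that check, assuming a hypothetical `predict_scores(ratings, user_id)` helper that returns the step-7 recommendation score for every joke (the scoring part of the recommend sketch above, without masking rated items); the helper name and signature are mine.

```python
import numpy as np

def avg_hits_at_10(ratings: np.ndarray, predict_scores) -> float:
    """For each user, take the 10 already-rated jokes with the highest
    predicted scores and count how many the user actually rated positively;
    return the average count over all users."""
    hits = []
    for u in range(ratings.shape[0]):
        rated = ~np.isnan(ratings[u])
        scores = np.where(rated, predict_scores(ratings, u), -np.inf)  # rated jokes only
        top10 = np.argsort(scores)[::-1][:10]
        hits.append(int((ratings[u, top10] > 0).sum()))                # "liked" = positive rating
    return float(np.mean(hits))  # reported in the post as 7.32 out of 10
```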

This approach has some limitations.

1. Cold-start items: If an item is new, its similarity to other items cannot be calculated, so it will never be recommended. One way to overcome this is to place the new item in a few users' recommendation lists at random and see how they rate it.

2. No use of item content: The content of an item also tells us a lot about it and can be used for prediction. Similarity can be calculated from item content as well, which would also solve the cold-start problem.

3. No knowledge of a user's recent activity: Timestamps are not used. A user's tastes may change over time, but item-item collaborative filtering does not account for that.

References

2. Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. "Item-Based Collaborative Filtering Recommendation Algorithms." WWW10, May 1-5, 2001, Hong Kong.

3. Coursera, course on "Nearest Neighbor Collaborative Filtering," University of Minnesota. Weeks 3 and 4.

4. Greg Linden, Brent Smith, and Jeremy York. "Amazon.com Recommendations: Item-to-Item Collaborative Filtering." IEEE Internet Computing, Industry Report, 2003.
