Collaborative Filtering
i. Collaborative filtering is a significantly different approach to recommendation: instead of using features of items to determine their similarity, we focus on the similarity of the user ratings for two items.
ii. In place of the item-profile vector for an item, we use its column in the utility matrix.
iii. Instead of contriving a profile vector for users, we represent them by their rows in the utility matrix.
iv. Users are similar if their vectors are close according to some distance measure such as Jaccard or cosine distance.
v. Recommendation for a user U is then made by looking at the users that are most similar to U in this sense, and recommending items that these users like.
vi. The process of identifying similar users and recommending what similar users like is called collaborative filtering.
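The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not a definitive implementation: the `ratings` data, the function names, the neighbour count `k`, and the "like" threshold of 4 are all assumptions made for the example, and blank entries of the utility matrix are treated as 0 when computing cosine similarity.

```python
import math

# Hypothetical utility matrix: user -> {item: rating on a 1-5 scale}.
# A missing key means the user has not rated that item.
ratings = {
    "U": {"i1": 5, "i2": 4},
    "V": {"i1": 5, "i2": 5, "i3": 4},
    "W": {"i1": 1, "i4": 5},
}

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse rating rows (blanks count as 0)."""
    dot = sum(a[i] * b[i] for i in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(user, ratings, k=1, like_threshold=4):
    """Return items liked by the k most similar users that `user` has not rated."""
    others = [u for u in ratings if u != user]
    # Rank the other users by similarity of their rows to the target user's row.
    others.sort(key=lambda u: cosine_similarity(ratings[user], ratings[u]),
                reverse=True)
    seen = ratings[user]
    picks = []
    for neighbor in others[:k]:
        for item, rating in ratings[neighbor].items():
            if rating >= like_threshold and item not in seen and item not in picks:
                picks.append(item)
    return picks

print(recommend("U", ratings))  # V is the nearest neighbour of U here
```

With this toy data, user V has the row closest to U, so the item V rated highly and U has not seen is the one recommended.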
A: Measuring Similarity
- The first question we must deal with is how to measure similarity of users or items from their rows or columns in the utility matrix.
- The utility matrix is as shown below
- The above data is too small to draw any reliable conclusions, but its small size will make clear some of the pitfalls in picking a distance measure.
- Observe specifically the users A and C. They rated two movies in common, but they appear to have almost diametrically opposite opinions of these movies.
- We would expect a good distance measure to place them rather far apart.
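To make the pitfall concrete, the sketch below compares Jaccard and cosine distance for two users who rated two movies in common but disagree about them. The ratings and movie names are hypothetical (they are not taken from the utility matrix referred to above), and cosine distance is taken here as 1 minus cosine similarity, which is one common convention.

```python
import math

# Hypothetical ratings for two users; m2 and m3 are rated by both,
# but with nearly opposite opinions.
user_a = {"m1": 4, "m2": 5, "m3": 1}
user_c = {"m2": 2, "m3": 4, "m4": 5}

def jaccard_distance(a, b):
    """1 - |intersection| / |union| of the sets of rated items (ratings ignored)."""
    items_a, items_b = set(a), set(b)
    union = items_a | items_b
    if not union:
        return 0.0
    return 1 - len(items_a & items_b) / len(union)

def cosine_distance(a, b):
    """1 - cosine similarity of the rating vectors, treating blanks as 0."""
    dot = sum(a[i] * b[i] for i in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1 - dot / (norm_a * norm_b)

print(jaccard_distance(user_a, user_c))           # 0.5
print(round(cosine_distance(user_a, user_c), 3))  # 0.678
```

Jaccard distance looks only at which movies were rated, so it would report the same value even if the two users agreed completely on the shared movies. Cosine distance does use the rating values, which is why it separates these two users more.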
B: The Duality of Similarity
- The utility matrix can be viewed as telling us about users or about items, or both.
- There are two ways in which the symmetry is broken in practice.
- We can use information about users to recommend items: given a user, we find some number of the most similar users and base our recommendation on the decisions made by these similar users, e.g., recommend the items that the greatest number of them have purchased or rated highly. For items, however, there is no such symmetry. Even if we find pairs of similar items, we need to take an additional step in order to recommend items to users. This point is explored further at the end of this subsection.
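- There is a difference in the typical behavior of users and items as it pertains to similarity. Intuitively, items tend to be classifiable in simple terms; for example, a piece of music tends to belong to a single genre. Users, by contrast, often like several genres, so two users may be similar in only part of their tastes.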
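The "additional step" needed on the item side can be illustrated as follows. This is a rough sketch under stated assumptions: the data, the function names, and the scoring rule (score an unseen item by its best cosine similarity to any item the user rated 4 or higher) are choices made for the example, not a method given in the text.

```python
import math

# Hypothetical utility matrix: user -> {item: rating}.
ratings = {
    "U": {"i1": 5},
    "V": {"i1": 5, "i2": 4},
    "W": {"i1": 4, "i2": 5, "i3": 1},
}

def item_columns(ratings):
    """Transpose user rows into item columns: item -> {user: rating}."""
    cols = {}
    for user, row in ratings.items():
        for item, rating in row.items():
            cols.setdefault(item, {})[user] = rating
    return cols

def cosine_similarity(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend_from_items(user, ratings, like_threshold=4):
    """Rank items the user has not rated by similarity to items the user liked."""
    cols = item_columns(ratings)
    liked = [i for i, r in ratings[user].items() if r >= like_threshold]
    candidates = [i for i in cols if i not in ratings[user]]
    # The extra step: item-item similarity alone recommends nothing, so each
    # candidate is tied back to the user through the items that user liked.
    scores = {
        c: max((cosine_similarity(cols[c], cols[l]) for l in liked), default=0.0)
        for c in candidates
    }
    return sorted(scores, key=scores.get, reverse=True)

print(recommend_from_items("U", ratings))
```

Finding similar users leads directly to a recommendation, whereas here the similar-item pairs only become useful once they are combined with the target user's own row.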
C: Clustering Users and Items
- It is hard to detect similarity among either items or users, because we have little information about user-item pairs in the sparse utility matrix.
- Even if two users both like a genre or genres, they may not have bought any items in common.
- One way of dealing with this pitfall is to cluster items and/or users.
- There may be little reason to try to cluster into a small number of clusters immediately.
- A hierarchical approach, where we leave many clusters unmerged, may suffice as a first step.
- For example, we might leave half as many clusters as there are items.
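A hierarchical first step of this kind might look like the sketch below, which merges the closest item clusters until half as many clusters as items remain. The item columns are hypothetical, and the choice of single-link merging with cosine distance is an assumption made for the example; the text does not prescribe a particular linkage or distance.

```python
import math
from itertools import combinations

# Hypothetical item columns of a utility matrix: item -> {user: rating}.
columns = {
    "i1": {"A": 5, "B": 4},
    "i2": {"A": 4, "B": 5},
    "i3": {"C": 5, "D": 4},
    "i4": {"C": 4, "D": 5},
}

def cosine_distance(a, b):
    """1 - cosine similarity of two sparse columns (blanks count as 0)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1 - dot / (norm_a * norm_b)

def cluster_items(columns, target):
    """Agglomerative clustering that stops once `target` clusters remain."""
    clusters = [[item] for item in columns]

    def link(c1, c2):
        # Single link (an assumption): distance between the closest members.
        return min(cosine_distance(columns[x], columns[y]) for x in c1 for y in c2)

    while len(clusters) > target:
        # Find the pair of clusters that are currently closest and merge them.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

# Leave half as many clusters as there are items.
clusters = cluster_items(columns, target=len(columns) // 2)
print(clusters)
```

In this toy data, items rated by the same users end up together, and the merging stops well before everything collapses into a single cluster.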