
Ask About C3 W2 Content based Filtering Data #51

Answered by TheFalcon1990
Minhnhat0408 asked this question in Q&A

After pulling the repo and running the content-based filtering code, I noticed something odd about the data files, especially content_item_train.csv and content_user_train.csv. Both files have 58,187 rows, which is consistent with the model's .fit input requirements (the inputs must have matching row counts), but many of those rows are duplicates, and the reason for the duplication is not obvious.

Let me break down the files as I’ve come to understand them.

Content Item and User Files Overview

  1. content_item_train.csv: This file stores the item (here, movie) feature vectors, also called item profiles. These vectors encode attributes such as genres, directors, or other one-hot encoded movie features. For example,…
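
One plausible reading of the duplication, sketched below on toy data: if each row in both files corresponds to a single (user, movie) rating, then a user's vector repeats once per movie they rated and a movie's vector repeats once per user who rated it. That would give both files the same row count while containing far fewer distinct vectors. This is an assumption about how the files were built, not something confirmed here, and every name and value in the sketch (`user_features`, `item_features`, `ratings`, `build_aligned_rows`) is hypothetical rather than taken from the course data.

```python
# Hypothetical toy data; none of these values come from the course CSV files.
user_features = {1: [4.0, 3.5], 2: [2.0, 5.0]}        # user_id -> user profile vector
item_features = {10: [1, 0, 0], 20: [0, 1, 0], 30: [0, 0, 1]}  # movie_id -> one-hot genres
ratings = [(1, 10, 4.0), (1, 20, 3.0), (2, 10, 5.0), (2, 30, 2.5)]  # (user, movie, rating)


def build_aligned_rows(ratings, user_features, item_features):
    """Emit one user row, one item row, and one target per rating (assumed layout)."""
    user_rows, item_rows, targets = [], [], []
    for user_id, movie_id, rating in ratings:
        user_rows.append(user_features[user_id])    # repeated for every movie the user rated
        item_rows.append(item_features[movie_id])   # repeated for every user who rated the movie
        targets.append(rating)
    return user_rows, item_rows, targets


user_rows, item_rows, targets = build_aligned_rows(ratings, user_features, item_features)

# Count distinct vectors to see how much of each aligned table is repetition.
n_unique_users = len({tuple(row) for row in user_rows})
n_unique_items = len({tuple(row) for row in item_rows})

print("rows:", len(user_rows), len(item_rows), len(targets))
print("distinct user vectors:", n_unique_users)
print("distinct item vectors:", n_unique_items)
```

In this toy case all three lists have four entries, one per rating, yet there are only two distinct user vectors and three distinct item vectors. Running the same kind of distinct-row count on the real CSV files would be a quick way to check whether this explanation fits.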

Replies: 2 comments

Answer selected by greyhatguy007