Ask About C3 W2 Content based Filtering Data #51
-
Hey guys, I'm new to the AI department. I'm trying to implement the content-based filtering code, which uses a neural network, in my app. I pulled the repo and ran the code successfully, but I find it hard to understand the data, all those
This chunk is a one-hot vector for movie ID 6874, but it is duplicated about 131 times in the
Hope you guys can answer it soon.
Replies: 2 comments
-
Why Duplications Exist in
-
That's actually true, because when I tried training on only the unique chunks, it gave a much higher loss value. But do you have the code that generates those files, or do you understand the pattern behind how they were generated? Some chunks are duplicated 131 times, some only 16 times, and the
After pulling the repo and running the content-based filtering code, I noticed something odd with the data files, especially content_item_train.csv and content_user_train.csv. Both files have 58,187 rows, which is consistent with the model’s .fit input requirements, but there’s a lot of duplication, and it’s tricky to understand the reasoning behind it.

Let me break down the files as I’ve come to understand them.
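To make the matching row counts and the repeats a bit more concrete, here is a minimal sketch on made-up data. It assumes (this is my reading, not something taken from the script that generated the files) that each training row corresponds to one (user, movie) rating, so a movie's feature vector shows up once per rating it received. The ratings, feature values, and helper names `build_aligned_rows` and `count_duplicates` are all hypothetical, and the ID is assumed to sit in the first column.

```python
from collections import Counter

# Hypothetical toy data: (user_id, movie_id) rating pairs, made up for illustration.
ratings = [(1, 6874), (2, 6874), (3, 6874), (1, 8798), (3, 8798), (2, 4993)]

# Hypothetical feature vectors keyed by ID; the first entry repeats the ID.
item_features = {
    6874: [6874, 1, 0, 0],
    8798: [8798, 0, 1, 0],
    4993: [4993, 0, 0, 1],
}
user_features = {
    1: [1, 4.5, 3.0],
    2: [2, 2.0, 5.0],
    3: [3, 3.5, 3.5],
}


def build_aligned_rows(ratings, user_features, item_features):
    """Return (user_rows, item_rows) with one row per rating, in the same order."""
    user_rows = [user_features[user_id] for user_id, _ in ratings]
    item_rows = [item_features[movie_id] for _, movie_id in ratings]
    return user_rows, item_rows


def count_duplicates(rows, id_column=0):
    """Count how many rows share each ID (assumes the ID is in `id_column`)."""
    return Counter(row[id_column] for row in rows)


user_rows, item_rows = build_aligned_rows(ratings, user_features, item_features)
dup_counts = count_duplicates(item_rows)

# Both tables end up with one row per rating, so their lengths match.
print(len(user_rows), len(item_rows))
# Each movie's vector repeats once per rating in the toy data.
print(dup_counts.most_common())
```

The same `count_duplicates` helper could be pointed at rows read from the real CSV with `csv.reader` (after converting the ID column to a number), again assuming the movie ID is the first column. Under the assumption above, a movie whose vector repeats 131 times would simply be one that appears in 131 training pairs.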
Content Item and User Files Overview
content_item_train.csv: This file is where the item (in this case, movie) feature vectors or item profiles are stored. These vectors encode different attributes like genres, directors, or any other one-hot encoded movie features. For example,…