
Didn't get the purpose of the method #3

Open

NikAleksFed opened this issue Jul 28, 2022 · 3 comments

Comments
@NikAleksFed

Hey guys, I need clarification about the situations this method is meant for.

Am I right that this method is best used when the teacher model is much more complex than the student model? In that case we could get comparable accuracy with far fewer parameters in the student model.

Or must the student and teacher share the same architecture? In that case I don't understand why we wouldn't just fine-tune the pre-trained teacher network directly.

@wondering516

mark

@ancientmooner
Member

ancientmooner commented Aug 7, 2022

We only experimented with student models that have the same number of parameters as the teacher models. The method should also work for scenarios with different teacher and student architectures.

You are right that the baseline directly uses the pre-trained teacher model for fine-tuning. We showed that using the student model works much better than this baseline.
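For readers trying to follow the setup being compared here, below is a minimal sketch of feature distillation with a frozen teacher and a trainable student of the same size. The module and function names (`TinyEncoder`, `distill_step`) and the plain L2 feature-matching loss are illustrative assumptions, not this repository's actual code; the paper's loss may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Stand-in backbone mapping an image to a feature map."""
    def __init__(self, dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1),
        )

    def forward(self, x):
        return self.conv(x)

teacher = TinyEncoder().eval()   # pre-trained model, kept frozen
for p in teacher.parameters():
    p.requires_grad_(False)

student = TinyEncoder()          # same parameter count as the teacher
opt = torch.optim.AdamW(student.parameters(), lr=1e-4)

def distill_step(images):
    with torch.no_grad():
        t_feat = teacher(images)     # target features from frozen teacher
    s_feat = student(images)
    # Placeholder L2 feature-matching loss; the actual method may use a
    # different objective (e.g. normalized/whitened features).
    loss = F.mse_loss(s_feat, t_feat)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# One toy step on random data.
print(distill_step(torch.randn(2, 3, 32, 32)))
```

The point of the comparison above: after distillation the student is fine-tuned on the downstream task, and this reportedly beats fine-tuning the teacher directly.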

@jihwanp

jihwanp commented Nov 8, 2022

@ancientmooner Hi, a small question regarding your answer: if the output feature map sizes of the teacher and the student are not the same, how can the feature maps be distilled in your method?
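This question is not answered in the thread; a common workaround in the feature-distillation literature (not confirmed for this repository's method) is to align the student's feature map to the teacher's with a learned projection and spatial resizing before computing the loss. A minimal sketch, with all shapes and the `align` module assumed for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

t_feat = torch.randn(2, 768, 7, 7)    # teacher features: C_t x H_t x W_t
s_feat = torch.randn(2, 384, 14, 14)  # student features: C_s x H_s x W_s

# 1x1 conv maps student channels to teacher channels; it would be
# trained jointly with the student and discarded after distillation.
align = nn.Conv2d(384, 768, kernel_size=1)

s_aligned = align(s_feat)
# Resize spatially to the teacher's resolution before the loss.
s_aligned = F.interpolate(s_aligned, size=t_feat.shape[-2:],
                          mode="bilinear", align_corners=False)

loss = F.mse_loss(s_aligned, t_feat)
```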
