Hi, I need some clarification about the situations this method is meant to be applied to.
Am I right that this method is best used when the teacher model is much more complex than the student model? In that case we could get comparable accuracy from a student model with fewer parameters.
Or must the student and teacher use the same architecture? If so, I don't understand why we wouldn't just fine-tune the pre-trained teacher network directly.
We only experimented with student models that have the same number of parameters as the teacher models. The method should also work for scenarios with different teacher and student architectures.
The baseline does exactly that: it directly fine-tunes the pre-trained teacher model. We showed that training the student model with our method works much better than this baseline.
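For readers unfamiliar with this setup, here is a minimal sketch of feature-map distillation with a frozen teacher and a trainable student of the same architecture. The toy backbone, the layer-normalized targets, and the smooth-L1 objective are illustrative assumptions, not the repository's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_backbone():
    # Stand-in backbone; in practice this would be the pre-trained teacher
    # network and an identically sized student.
    return nn.Sequential(
        nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.GELU(),
        nn.Conv2d(64, 128, 3, stride=2, padding=1),
    )

teacher = make_backbone()  # pre-trained weights would be loaded here
student = make_backbone()

teacher.eval()
for p in teacher.parameters():
    p.requires_grad = False

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)

def distill_step(images):
    with torch.no_grad():
        t_feat = teacher(images)                 # (B, C, H, W) teacher features
    s_feat = student(images)                     # (B, C, H, W) student features

    # Flatten to token form and normalize the teacher targets
    # (one common choice; the paper may normalize differently).
    t_tok = t_feat.flatten(2).transpose(1, 2)    # (B, H*W, C)
    s_tok = s_feat.flatten(2).transpose(1, 2)
    t_tok = F.layer_norm(t_tok, t_tok.shape[-1:])

    loss = F.smooth_l1_loss(s_tok, t_tok)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```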
@ancientmooner Hi, a small question regarding your answer: if the output feature map sizes of the teacher and student are not the same, how can the feature maps be distilled in your method?
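The authors would know the exact answer, but a common workaround when teacher and student feature maps differ in shape is a small adapter on the student side: a 1x1 convolution to match channels plus interpolation to match spatial size, applied before the distillation loss. The sketch below is an assumption about how such an adapter could look, not the method's actual handling of this case.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAdapter(nn.Module):
    """Hypothetical adapter: maps student features to the teacher's
    channel count and spatial resolution before the distillation loss."""
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, s_feat, teacher_hw):
        s_feat = self.proj(s_feat)                      # match channel count
        return F.interpolate(s_feat, size=teacher_hw,   # match H x W
                             mode="bilinear", align_corners=False)

# Example: student outputs (B, 96, 28, 28), teacher outputs (B, 128, 14, 14).
adapter = FeatureAdapter(student_channels=96, teacher_channels=128)
s_feat = torch.randn(2, 96, 28, 28)
t_feat = torch.randn(2, 128, 14, 14)
loss = F.smooth_l1_loss(adapter(s_feat, t_feat.shape[-2:]), t_feat)
```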