Consistent gemm tests #16
Looks fine, please answer those 2 questions
Branch updated from e7c50aa to d53768d.
@roversch please re-check that.
Just a small refactoring for libxsmm would be nice. Otherwise looks good!
```cpp
randomize(a);
randomize(b);
randomize(c);

// libxsmm behaves unpredictably with alpha != 1. || beta != 1.
Real const alpha = 1., beta = 1.;
```
There is quite a lot of duplication here. I would refactor it into its own setup function or similar.
Yes, there is a lot of duplicated code in the benchmarks, not only here. I have created issue #25 to address this problem later.
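The setup function suggested in the review (deferred to issue #25) might look roughly like the sketch below. It is not code from this PR: `Matrix`, `GemmProblem`, `make_gemm_problem` and the two-argument `randomize` are hypothetical names, the storage is assumed to be column-major, and `Real` is assumed to be `double`. The idea is simply that every benchmark gets its randomized A, B, C and its alpha/beta from one place, with alpha = beta = 1 to respect the libxsmm restriction.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Real = double;  // hypothetical; stands in for the benchmarks' Real type

// Hypothetical dense column-major matrix, used only for this sketch.
struct Matrix {
    std::size_t rows = 0, cols = 0;
    std::vector<Real> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c) {}
    Real& operator()(std::size_t i, std::size_t j) {
        assert(i < rows && j < cols);  // bounds check in debug builds
        return data[i + j * rows];
    }
};

// Hypothetical helper: fill a matrix with uniformly distributed values in [-1, 1).
// Unlike the single-argument randomize() in the benchmarks, it takes the generator explicitly.
void randomize(Matrix& m, std::mt19937& gen) {
    std::uniform_real_distribution<Real> dist(-1.0, 1.0);
    for (Real& x : m.data) x = dist(gen);
}

// Everything a gemm benchmark needs, created in one place instead of per benchmark.
struct GemmProblem {
    Matrix a, b, c;
    Real alpha, beta;
};

// Hypothetical setup function: A is m x k, B is k x n, C is m x n.
// alpha = beta = 1 mirrors the libxsmm restriction; a fixed seed keeps runs repeatable.
GemmProblem make_gemm_problem(std::size_t m, std::size_t n, std::size_t k,
                              unsigned seed = 42) {
    std::mt19937 gen(seed);
    GemmProblem p{Matrix(m, k), Matrix(k, n), Matrix(m, n), Real(1), Real(1)};
    randomize(p.a, gen);
    randomize(p.b, gen);
    randomize(p.c, gen);
    return p;
}
```

A benchmark would then start with something like `auto p = make_gemm_problem(m, n, k);` and pass `p.a`, `p.b`, `p.c`, `p.alpha`, `p.beta` to the implementation under test.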
Made `gemm` tests consistent:

- `D = alpha * op1(A) * op2(B) + beta * C` is used for implementations that support out-of-place `gemm`.
- `C = alpha * op1(A) * op2(B) + beta * C` is used for implementations that support only in-place `gemm`.
- `op1` and `op2` are chosen as "transpose" or "don't transpose" depending on what we think is fastest for each implementation.
- `alpha = 1.` and `beta = 1.` for `libxsmm`, because otherwise it either crashes or returns in 0 time.
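To make the two `gemm` forms and the role of `op1`/`op2` concrete, here is a minimal naive reference sketch. It is not the API of libxsmm or of any BLAS: `Op`, `op_elem`, `gemm_out_of_place` and `gemm_in_place` are hypothetical names, storage is assumed column-major with explicit leading dimensions, and A and B are assumed not to alias C. It shows that the in-place form is just the out-of-place form with D and C being the same buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Real = double;  // hypothetical; stands in for the benchmarks' Real type

// Whether an operand is used as stored or transposed (op1 / op2 in the list above).
enum class Op { NoTrans, Trans };

// Element (i, j) of op(X), where X is stored column-major with leading dimension ld.
inline Real op_elem(const std::vector<Real>& x, std::size_t ld, Op op,
                    std::size_t i, std::size_t j) {
    return op == Op::NoTrans ? x[i + j * ld] : x[j + i * ld];
}

// Out-of-place: D = alpha * op1(A) * op2(B) + beta * C.
// op1(A) is m x k, op2(B) is k x n, C and D are m x n (column-major, leading dimension m).
void gemm_out_of_place(Op op1, Op op2, std::size_t m, std::size_t n, std::size_t k,
                       Real alpha, const std::vector<Real>& a, std::size_t lda,
                       const std::vector<Real>& b, std::size_t ldb,
                       Real beta, const std::vector<Real>& c, std::vector<Real>& d) {
    assert(c.size() == m * n && d.size() == m * n);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            Real acc = 0;
            for (std::size_t p = 0; p < k; ++p)
                acc += op_elem(a, lda, op1, i, p) * op_elem(b, ldb, op2, p, j);
            // C(i, j) is read before D(i, j) is written, so D may alias C.
            d[i + j * m] = alpha * acc + beta * c[i + j * m];
        }
    }
}

// In-place: C = alpha * op1(A) * op2(B) + beta * C, i.e. the out-of-place form with D == C.
// Safe as long as A and B do not alias C, since each output element depends only on
// A, B and the same element of C.
void gemm_in_place(Op op1, Op op2, std::size_t m, std::size_t n, std::size_t k,
                   Real alpha, const std::vector<Real>& a, std::size_t lda,
                   const std::vector<Real>& b, std::size_t ldb,
                   Real beta, std::vector<Real>& c) {
    gemm_out_of_place(op1, op2, m, n, k, alpha, a, lda, b, ldb, beta, c, c);
}
```

A slow triple loop like this is not meant for benchmarking. It could, however, serve as a correctness oracle when comparing the outputs of the different implementations, whichever `op1`/`op2` combination each one uses.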