handle 64 filter SE networks (#624)
* minor performance fixes

~5-10% improvement in CPU-limited cases (tested with a 32x4 network on a GTX 970)

* handle 64 filter SE networks

 - need numFc1Out of 16

* Update params.cc

* fix diff

* fix whitespace
ankan-ban authored and mooskagh committed Feb 14, 2019
1 parent b3e8a0b commit 74a215f
Showing 1 changed file with 9 additions and 1 deletion.
10 changes: 9 additions & 1 deletion src/neural/cuda/fp16_kernels.cu
@@ -134,7 +134,15 @@ void Se_Fp16_NHWC(int N, int C, int numFc1Out, half* output, const half* skip,
                   const half* input, const half* w1, const half* b1,
                   const half* w2, const half* b2, const half* bPrev) {
   // TODO: Think of more elegant way to avoid this hardcoding :-/
-  if (numFc1Out == 32) {
+  if (numFc1Out == 16) {
+    if (C == 64) {
+      SE_Layer_NHWC<64, 16>
+          <<<N, C>>>(output, skip, input, w1, b1, w2, b2, bPrev);
+    } else {
+      // TODO: support other channel counts.
+      throw Exception("channel count unsupported by SE layer");
+    }
+  } else if (numFc1Out == 32) {
     if (C == 64) {
       SE_Layer_NHWC<64, 32>
           <<<N, C>>>(output, skip, input, w1, b1, w2, b2, bPrev);
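
The TODO in this hunk asks for a less hardcoded way to map the runtime values C and numFc1Out onto template instantiations. One possible approach, sketched below as plain host-side C++, is a small lookup table of function pointers. This is not the code in the commit. LaunchSeLayer, SeDispatchEntry, kSeDispatchTable and DispatchSeLayer are hypothetical names, the stub returns a string instead of launching SE_Layer_NHWC, and std::runtime_error stands in for the project's Exception type. Only the two (C, numFc1Out) pairs visible in this hunk are listed.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for launching the templated SE_Layer_NHWC kernel.
// It only reports which instantiation was selected; a real version would
// forward the kernel arguments and perform the <<<N, C>>> launch.
template <int C, int K>
std::string LaunchSeLayer() {
  return "SE_Layer_NHWC<" + std::to_string(C) + ", " + std::to_string(K) + ">";
}

// One table row per supported (channel count, numFc1Out) pair.
struct SeDispatchEntry {
  int channels;
  int num_fc1_out;
  std::string (*launch)();
};

// Only the pairs visible in this hunk are listed (an assumption of the
// sketch); supporting a new configuration means adding one row.
static const SeDispatchEntry kSeDispatchTable[] = {
    {64, 16, &LaunchSeLayer<64, 16>},
    {64, 32, &LaunchSeLayer<64, 32>},
};

// Finds the matching instantiation at run time, or throws with the same
// message the diff uses for unsupported configurations.
std::string DispatchSeLayer(int C, int numFc1Out) {
  for (const SeDispatchEntry& entry : kSeDispatchTable) {
    if (entry.channels == C && entry.num_fc1_out == numFc1Out) {
      return entry.launch();
    }
  }
  throw std::runtime_error("channel count unsupported by SE layer");
}
```

In this sketch, DispatchSeLayer(64, 16) selects the <64, 16> instantiation, and any pair missing from the table throws. The trade-off is a linear scan and one indirect call per dispatch, which is likely negligible next to a kernel launch, in exchange for replacing the nested if/else chain with one table row per configuration.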