-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Tracking Issue] Support Axis Union/Intersection #72
Comments
Given the current design, compare to directly using the co-iteration, I think we can firstly replace the binary search module with traverse to generate mid-buffer.In this way, the complexity of generate mid-buffer in a sparse axis can be reduced to O(n) from O(nlogn). This method only change the binary search part and can probably prevent the potential performance loss caused by co-iteration like TACO.@yzh119
generated "binary search block":
(2)for sparse axis iterate on another sparse axis(Intersection):
generated "binary search block":
|
@qelk123 thanks for your proposal. Binary search is absolutely not the best solution, a good property of binary search is the computation at each position is independent of each other, so that we safely parallelize everything outside the block, and we can even compute the |
Yes, you are right. If you want to achieve maximum parallelism, using binary search is reasonable. I also read your related paper,it seems the backend you considered is GPU, which provides sufficient parallelism capability. However when it comes to other backend like CPUs and FPGA, is binary search also compatiable with these backends? Maybe a more proper way is to bind the generation of this search block with specific target? |
I don't think it needs to be bounded with the target, we can just let the user decide what to use by annotating sparse iterations like what did for checking binary search validity. We might call it a SparseTIR is not only designed for GPU (in the paper our evaluation focuses on GPU, but CPU backend is totally supported by selecting LLVM target), CPU also has parallelism (though the number of cores might not be as large as GPUs). FPGA backend is another story, currently TVM support FPGA backend by generating HLS code but I suppose for sparsity we have better alternatives, e.g. The Sparse Abstract Machine which generate RTL to handle co-iterations, directly.
I guess so, it depends on the schedule actually and we may inevitably emit search functions in some cases. I am trying to write some formal semantics for that, please stay tuned. |
Thank you for your reply, as you said CPU may not has so manys cores as cuda cores in GPU. In this situation the benefit from parallelism in |
There are more search function choices than we have discussed (binary search and the algorithm you mentioned). The selection is not only hardware-dependent but also data-dependent (the sparse structure), an ideal solution is to design cost model to predict running time. So I think it's reasonable to make the lower decision from sparse iteration annotations, and we will have some algorithm to annotate the sparse iterations by comparing the cost generated from our cost models(like what we did in MetaSchedule for TensorIR) during compilation. |
OK I got it,it would be a nice choice.We can discuss this in detail later. |
The Problem
Currently, SparseTIR does not support lowering code to co-iteration structure, whenever we want to add/multiply two sparse tensors/vectors, we need to create another axis to indicate the union/intersection of axes.
Here is an example of SpMSpV.
SparseTIR would generate several binary blocks for indexing
A
andB
because we do not have co-iterations yet, and we needmid
arrays generated by binary search blocks to accessA
andB
under the for-loop structure.Once we support axis union/intersection and co-iteration structure generation, we can declare
J_and
as:and sparse iterations on union/intersect axes can yield co-iteration structures in sparse iteration lowering pass.
Milestone
While
construct, or create a new statement in TIR).T.intersection
/T.union
, and possibly more general ones (consider SpGEMM).The text was updated successfully, but these errors were encountered: