The recent resurgence of Deep Neural Networks (DNNs) was largely enabled [24] by the widespread availability of programmable, parallel computing devices.
And while several high-level programming abstractions for tiling have recently been proposed [23, 41], underlying compiler backends still lack support for tile-level operations and optimizations.
One problem that arises from the existence of tile-level operations in Triton-IR is the inexpressibility of divergent control flow within tiles.
We propose to solve this issue through the use of the Predicated SSA (PSSA) form [8] and ψ-functions [39].
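To make the idea of predication concrete, the following is a minimal sketch in plain Python, not Triton-IR itself. It models a tile as a list, evaluates both sides of a branch for every element under a per-element predicate, and merges the results with a ψ-like function. The helper name `psi` and the list-based tile representation are assumptions made for illustration; the sketch also assumes the common convention that a later guarded definition whose predicate holds overrides an earlier one.

```python
def psi(*guarded):
    """Merge guarded tile values.

    Each argument is a (predicate_tile, value_tile) pair of equal length.
    For every element, the result is the value from the last pair whose
    predicate is true at that position (hypothetical helper, not a Triton API).
    """
    n = len(guarded[0][0])
    out = [None] * n
    for pred, val in guarded:
        for i in range(n):
            if pred[i]:
                out[i] = val[i]  # later guarded definitions override earlier ones
    return out


# Tile-level equivalent of: y = -x if x < 0 else x
x = [-2, -1, 0, 1, 2]
p = [v < 0 for v in x]       # predicate tile
not_p = [not b for b in p]   # complementary predicate tile

neg = [-v for v in x]        # definition guarded by p
same = list(x)               # definition guarded by not_p

y = psi((p, neg), (not_p, same))
```

Both guarded definitions are computed for the whole tile, so no element-level branch is needed; the predicates alone decide which value each element keeps.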
Because Triton-IR programs are single-threaded and automatically parallelized, our compiler backend is able to order threads internally within each micro-tile so as to avoid uncoalesced memory accesses when possible.
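The effect of thread ordering on coalescing can be illustrated with a small Python model. This is a sketch under stated assumptions, not the backend's actual algorithm: the micro-tile shape, the leading dimension `ld`, the group size of 8 threads, and the 8-element memory segment are all hypothetical values chosen for the example. It counts how many memory segments a group of consecutive threads touches when threads are laid out along the contiguous dimension versus across it.

```python
def addresses(rows, cols, ld, order):
    """Element offsets touched by threads 0..rows*cols-1 of a micro-tile
    stored in a row-major array with leading dimension ld."""
    addrs = []
    for tid in range(rows * cols):
        if order == "along_rows":
            # consecutive threads walk along a row (contiguous in memory)
            r, c = divmod(tid, cols)
        else:
            # consecutive threads walk down a column (strided by ld)
            c, r = divmod(tid, rows)
        addrs.append(r * ld + c)
    return addrs


def segments(addrs, seg=8):
    """Number of distinct seg-element memory segments touched."""
    return len({a // seg for a in addrs})


# First group of 8 threads in a 4x8 micro-tile of an array with ld = 64.
coalesced = segments(addresses(4, 8, 64, "along_rows")[:8])
uncoalesced = segments(addresses(4, 8, 64, "along_cols")[:8])
```

Under these assumptions the first ordering touches a single segment while the second touches four, which is the kind of difference a backend that is free to choose the internal thread order can exploit.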