Forcing Flash Attention onto a TPU and Learning the Hard Way

by azhng