I'm interested in speech recognition and in integrating speech understanding capabilities into multimodal foundation models.
Some recent works are listed below.
Paper reading notes:
Feel free to steal this website's source code. Do not scrape the HTML from this page itself, as it includes analytics tags that you do not want on your own website; use the GitHub code instead. Also, consider using Leonid Keselman's Jekyll fork of this page.