Papers I’ve had the privilege of contributing to:
Title/Link | Author(s) | Year | Description |
---|---|---|---|
Portfolio construction as linearly constrained separable optimization | Moehle et al | 2022 | ADMM-based fast portfolio optimization. |
Finding AI-Generated Faces in the Wild | Aniano et al | 2023 | AI-generated face detection at scale. |
As time permits, I also like to (try to) keep up with papers about applied math and machine learning. Below you’ll find an archive of papers I’ve read that I think are worthwhile:
Title/Link | Author(s) | Year | Description |
---|---|---|---|
Scaling and evaluating sparse autoencoders | Gao et al, OpenAI | 2024 | This paper discusses a top-k sparse autoencoder approach to explainability in large language models (toy sketch after the table). |
Sparse Autoencoders Find Highly Interpretable Features in Language Models | Cunningham et al | 2023 | This paper discusses using sparse autoencoders to learn monosemantic, interpretable features in language models. |
The Platonic Representation Hypothesis | Huh et al | 2024 | This paper argues that large deep neural network models are converging to similar underlying representations of reality. |
DoRA: Weight-Decomposed Low-Rank Adaptation | Liu et al, NVIDIA | 2024 | This paper introduces a method for parameter-efficient fine-tuning that decomposes a pre-trained weight into magnitude and direction components (sketch below). |
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits | Ma et al, Microsoft | 2024 | This paper introduces a 1-bit LLM variant where parameters are ternary (sketch below). It matches the performance of similarly sized full-precision models with increased efficiency. |
Revisiting k-means: New Algorithms via Bayesian Nonparametrics | Kulis and Jordan | 2012 | This paper introduces a Bayesian nonparametric approach to clustering. This leads to an elegant algorithm that doesn’t require us to choose k up front (sketch below). |
Accelerating Large Language Model Decoding with Speculative Sampling | Chen et al, DeepMind | 2023 | This paper introduces speculative sampling: a small draft model proposes several tokens, and the large model verifies them in parallel, speeding up decoding without changing the output distribution (sketch below). |
Modularity and community structure in networks | M.E.J. Newman | 2006 | Elegant spectral method for community detection via the leading eigenvector of the modularity matrix (sketch below). |
Scalable Hierarchical Agglomerative Clustering | Monath et al, Google | 2021 | A scalable, level-based approach to hierarchical agglomerative clustering. |
Pearl: A Production-Ready Reinforcement Learning Agent | Zhu et al, Meta | 2023 | This paper introduces Pearl, a modular reinforcement learning agent designed for production environments. |
Discovering faster matrix multiplication algorithms with reinforcement learning | Fawzi et al, DeepMind | 2022 | This paper introduces AlphaTensor, an RL-based algorithm to find faster ways to multiply matrices. |
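A few toy sketches of ideas from the papers above, in roughly the order they appear. First, the top-k sparse autoencoder from Gao et al: instead of an L1 penalty, sparsity is enforced by keeping only the k largest latent activations. This is a minimal numpy sketch of the forward pass; the sizes and names here are made up, not from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden, k = 64, 512, 8                  # illustrative sizes

W_enc = rng.normal(0, 0.02, (d_model, d_hidden))   # encoder weights
W_dec = rng.normal(0, 0.02, (d_hidden, d_model))   # decoder weights
b_enc = np.zeros(d_hidden)

def topk_sae(x):
    """Encode, zero all but the k largest activations, decode."""
    acts = np.maximum(x @ W_enc + b_enc, 0.0)      # ReLU pre-activations
    top = np.argpartition(acts, -k)[-k:]           # indices of the k largest
    sparse = np.zeros_like(acts)
    sparse[top] = acts[top]                        # hard top-k sparsity
    return sparse @ W_dec                          # reconstruction of x

x = rng.normal(size=d_model)
print("reconstruction error:", np.linalg.norm(x - topk_sae(x)))
```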
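The DoRA reparameterization from Liu et al, as I read it: the pre-trained weight is split into a per-column magnitude vector and a direction matrix, LoRA updates the direction, and the magnitude is trained separately. A hedged numpy sketch; the shapes and column-norm convention are my reading of the paper, not its released code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 32, 64, 4                         # toy shapes

W0 = rng.normal(size=(d_out, d_in))                # frozen pre-trained weight
m = np.linalg.norm(W0, axis=0, keepdims=True)      # trainable magnitude (init: column norms)
A = rng.normal(0, 0.02, (r, d_in))                 # trainable LoRA factor
B = np.zeros((d_out, r))                           # zero-init so V starts at W0

V = W0 + B @ A                                     # direction component
W = m * (V / np.linalg.norm(V, axis=0, keepdims=True))  # merged weight

print(np.allclose(W, W0))                          # True at init, since B = 0
```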
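The ternary weights in Ma et al come from rounding each scaled weight to {-1, 0, +1} (hence log2(3) ≈ 1.58 bits). Below is my toy version of the absmean quantizer described in the paper; the function name and example are mine.

```python
import numpy as np

def ternarize(W, eps=1e-8):
    """Absmean quantization: scale by mean |w|, round, clip to {-1, 0, 1}."""
    gamma = np.abs(W).mean() + eps
    return np.clip(np.round(W / gamma), -1, 1), gamma

rng = np.random.default_rng(0)
W_t, gamma = ternarize(rng.normal(size=(4, 4)))
print(W_t)        # every entry is -1, 0, or 1
print(gamma)      # per-tensor scale, reapplied at matmul time
```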
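The k-means variant from Kulis and Jordan (often called DP-means): run Lloyd-style updates, but open a new cluster whenever a point sits farther than a penalty lambda from every existing centroid, so k emerges from the data. A toy numpy sketch with made-up data and lambda.

```python
import numpy as np

def dp_means(X, lam, n_iters=20):
    centroids = [X.mean(axis=0)]                       # start with one cluster
    for _ in range(n_iters):
        assign = []
        for x in X:
            d2 = [np.sum((x - c) ** 2) for c in centroids]
            if min(d2) > lam:                          # too far from everything:
                centroids.append(x.copy())             # open a new cluster here
                assign.append(len(centroids) - 1)
            else:
                assign.append(int(np.argmin(d2)))
        assign = np.asarray(assign)
        centroids = [X[assign == j].mean(axis=0) if np.any(assign == j)
                     else centroids[j] for j in range(len(centroids))]
    return np.array(centroids), assign

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in [(0, 0), (3, 3), (0, 4)]])
centroids, labels = dp_means(X, lam=2.0)
print("clusters found:", len(centroids))
```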
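The core loop of speculative sampling from Chen et al: the draft model proposes up to K tokens, the target accepts each with probability min(1, p/q), and a rejected token is resampled from the residual (p - q)+, which keeps the output distribution exactly the target's. A toy single-step sketch where both "models" are fixed, context-free distributions I made up.

```python
import numpy as np

rng = np.random.default_rng(0)
V, K = 8, 4                          # toy vocab size and draft length

q = rng.dirichlet(np.ones(V))        # draft model's next-token distribution
p = rng.dirichlet(np.ones(V))        # target model's next-token distribution

def speculative_step():
    out = []
    for _ in range(K):
        x = rng.choice(V, p=q)                       # draft proposes x
        if rng.random() < min(1.0, p[x] / q[x]):     # accept w.p. min(1, p/q)
            out.append(int(x))
        else:
            resid = np.maximum(p - q, 0.0)           # resample from (p - q)+
            out.append(int(rng.choice(V, p=resid / resid.sum())))
            break                                    # stop at first rejection
    else:
        out.append(int(rng.choice(V, p=p)))          # all accepted: bonus token
    return out

print(speculative_step())                            # 1 to K+1 tokens per call
```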
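Newman's leading-eigenvector method: form the modularity matrix B = A - k kᵀ / (2m) and bisect the graph by the sign of the top eigenvector of B. The two-triangle toy graph below is my own example, not from the paper.

```python
import numpy as np

# Two triangles (nodes 0-2 and 3-5) joined by a single edge (2, 3).
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)

k = A.sum(axis=1)                    # degrees
m = A.sum() / 2                      # number of edges
B = A - np.outer(k, k) / (2 * m)     # modularity matrix
eigvals, eigvecs = np.linalg.eigh(B) # eigh sorts eigenvalues ascending
community = eigvecs[:, -1] > 0       # split by sign of leading eigenvector
print(community)                     # recovers the two triangles
```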