Contents
- New article (added 7/24/23)
- In theory, LLMs can encode algorithms of high complexity (added 6/25/23)
- In practice, out-of-the-box LLMs have significant limitations in complex reasoning algorithms (added 6/24/23)
- LLMs can learn specific algorithms, but they need to be taught (added 6/24/23)
- Basic “concept math” appears to be a good representation of how LLMs and other networks understand the world (added 6/23/23)
- For transformers, making sense of concepts and reasoning on those concepts appear to be two different things (added 6/23/23)
- An LLM watching another LLM is a good design primitive (added 6/4/23)
- Transformers truly learn meaning from form; they aren’t just stochastic parrots (added 5/22/23)
- Training an LLM on code and language is surprisingly synergistic (added 4/30/23)
- On average, we are all stupid (added 4/30/23)
- LLMs can be tricked into giving wrong answers incredibly easily, and when forced to think harder, they become even more wrong (added 5/14/23)
- An LLM flawlessly trained on adding 16-digit numbers can’t even generalize to 17-digit numbers (added 5/25/23)
- LLMs are fundamentally non-deterministic, so don’t count on determinism (added 6/26/23)
- In fine-tuning, every letter counts (added 5/23/23)
- LLMs are giant superpositions of personalities (added 4/30/23)
- The order in which an LLM sees data during training doesn’t matter for memorization (added 5/6/23)
- The frequency with which an LLM sees training data seems to matter for its performance on that data (added 5/6/23)
- On certain tasks, the typical LLM scaling trend (bigger is better) is reversed, and bigger is worse (added 5/7/23)
- The higher the model layer, the more complex the job of its neurons (added 5/22/23)