marius.vision

Showing posts tagged with: releases

Opening the AI Museum

So benchmarking LLMs is kind of an unsolved problem. The metrics used to evaluate models are either too narrow or too gameable. Cross-entropy may be a useful statistic for tuning a training run, but in practice it doesn't tell me whether a model can understand my Amazon e-mails or write cool haikus. Practical approaches exist, like the LM Arena. This is going in the ...