What to Measure When Evaluating Model Refresh, Benchmarks, and Production Switching
https://farelaevol.raindrop.page/bookmarks-67856037
Between Jan 10 and Feb 28, 2024, I ran a focused evaluation across 40 publicly available and vendor-hosted model endpoints