3.7 Sonnet LiveBench results are in

It’s not much higher than Sonnet 10-22, which is interesting, since it was substantially better in my initial tests. The Thinking results will be interesting to see.
