Remember how all of Reddit and the rest of social media were convinced OpenAI was done for in January?
A new study finds that individuals randomly assigned to use AI performed as well as two-person teams and were also happier
OpenAI released GPT-4.5 and o1 Pro via their API, and it looks like a weird decision.
Vibe Coding is a Dangerous Fantasy
Why no mid-tier? I feel like OpenAI is missing huge potential here.
New study from METR suggests the length of tasks AI models can handle is doubling every 7 months, suggesting automating week- or month-long tasks is less than 5 years away
According to Bloomberg, OpenAI Operator can't even book a simple flight, and agents as a whole are really struggling to deliver any value...
DeepSeek's owner asked R&D staff to hand in passports so they can't travel abroad. How does this make any sense, considering DeepSeek open-sources everything?
Manus turns out to be just Claude Sonnet + 29 other tools, Reflection 70B vibes ngl
So the much-hyped Manus AI agent from China turns out to be just Claude Sonnet + 29 other tools
Jokic, who was questionable tonight with an ankle injury, becomes the first player in NBA history with 30 points, 20 rebounds, and 20 assists in a game
QwQ on LiveBench: better than Sonnet 3.7 (non-thinking)!
Severance - 2x08 "Sweet Vitriol" - Post-Episode Discussion
What's the point of local LLM for coding?
An o1-like image generator next? This could be game-changing if it works!
OpenAI's next image generator will likely use some kind of chain-of-thought or inference-time compute, probably based on GPT-4o. This could be very interesting.
GPT-4.5 creates a Louis C.K.-style standup routine. This material is new AFAIK and genuinely funny. I haven't seen any model generate anything remotely close to this
How is Sesame not what everyone is talking about today? This blows ChatGPT Voice out of the water. I am in awe!
GPT-4.5 is a base model. Just compare other thinking models to their non-thinking versions to see what's coming.
The big week has started with an absolute banger! Claude 3.7 Sonnet crushes every single competitor in real-world coding tasks by a large margin
Claude 3.7 results in the Aider Polyglot benchmark
3.7 Sonnet LiveBench results are in
Everyone is catching up.