Remember when Stack Overflow usage started dropping and everyone celebrated it as proof that AI coding assistants had “won”? Remember the triumphant blog posts about 10x productivity, the demos showing entire apps built in minutes, the breathless predictions that junior developers were obsolete? That was maybe eighteen months ago. Feels like a decade.
A thread hit the top of Hacker News this week: “Ask HN: What was your ‘oh shit’ moment with GenAI?”, posted by user andrehacker. It pulled 521 points and 911 comments, and it spent meaningful time on the front page. I review AI toolkits for a living. I test what works and what doesn’t. And reading through those comments felt like attending a collective therapy session for an industry waking up with a brutal hangover.
What People Actually Said
The thread is raw. One commenter described coming back from six months of parental leave in March 2026. When they left, nobody serious was using GenAI tools for more than casual rubber ducking — bouncing ideas, getting unstuck on syntax, that sort of thing. When they returned, the tools had metastasized through their org. Widespread misuse. Code that nobody fully understood. Decisions made by autocomplete.
Another commenter put it bluntly: “The people who’ve gone all in on genAI and can’t do anything without it are going to be increasingly boring and impossible to work with.”
And then there was this gem, which I’ll paraphrase because the original was saltier: the suggestion that people should feel embarrassed about over-reliance on these tools. Not embarrassed for experimenting. Embarrassed for substituting AI output for actual thought.
My Take as a Toolkit Reviewer
I test these tools every week. I’ve recommended plenty of them. I’ve written positive reviews of code assistants, writing aids, and workflow automation agents. So let me be clear about what I’m saying and what I’m not.
I’m not saying AI tools are useless. I’m saying that somewhere between “this helps me think faster” and “this thinks for me,” a lot of professionals crossed a line they didn’t notice. And 2026 is the year the bill came due.
The productivity issues people describe in that thread aren’t hypothetical. They’re specific. Codebases where GenAI-written functions contradict each other because nobody read them carefully. Documentation that sounds authoritative but describes behavior the software doesn’t actually have. Teams where the one person who can still reason from first principles has become the single point of failure because everyone else has forgotten how.
From a toolkit review perspective, this is the gap I keep seeing: the tools got rated on speed of output, not quality of outcome. We — myself included — benchmarked how fast you could generate a component, not whether that component held up under real conditions six weeks later.
What This Means For How We Evaluate Tools
I’m changing how I review things on this site. Starting now, every toolkit review at agntbox includes what I’m calling a “dependency risk” score. How much does this tool encourage you to stop thinking? How transparent is it about confidence levels? Does it make it easy to verify output, or does it optimize for the dopamine hit of instant generation?
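To make the idea concrete, here is a rough sketch of how a score like this could be computed from the three questions above. Everything specific in it is an assumption for illustration: the 0-5 rating scale, the field names, the equal weighting, and the `dependency_risk` function itself. Treat it as one possible shape for the rubric, not the scoring method itself.

```python
from dataclasses import dataclass

SCALE_MAX = 5  # assumed 0-5 rating scale (illustrative, not from the article)


@dataclass(frozen=True)
class ToolRatings:
    """Reviewer ratings for one tool, each from 0 to SCALE_MAX (hypothetical fields)."""
    discourages_thinking: int     # 5 = strongly nudges you to accept output unread
    confidence_transparency: int  # 5 = clearly surfaces its uncertainty
    ease_of_verification: int     # 5 = output is easy to check

    def __post_init__(self):
        # Reject ratings outside the assumed scale.
        for name, value in vars(self).items():
            if not 0 <= value <= SCALE_MAX:
                raise ValueError(
                    f"{name} must be between 0 and {SCALE_MAX}, got {value}"
                )


def dependency_risk(ratings: ToolRatings) -> float:
    """Return a score in [0, 1]; higher means more dependency risk.

    Transparency and verifiability reduce risk, so they are inverted
    before the three components are averaged with equal (assumed) weights.
    """
    risk_points = (
        ratings.discourages_thinking
        + (SCALE_MAX - ratings.confidence_transparency)
        + (SCALE_MAX - ratings.ease_of_verification)
    )
    return risk_points / (3 * SCALE_MAX)


if __name__ == "__main__":
    oracle_style = ToolRatings(5, 0, 0)      # opaque, hard to verify
    supervised_style = ToolRatings(0, 5, 5)  # transparent, easy to verify
    print(dependency_risk(oracle_style))
    print(dependency_risk(supervised_style))
```

Equal weighting is the simplest choice here. A reviewer who thinks verifiability matters more than the other two could swap in weights, as long as they are published alongside the score.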
The tools that will matter going forward are the ones that treat AI as a collaborator you still have to supervise — not an oracle you defer to. The ones that surface uncertainty. The ones that make you engage your brain rather than bypass it.
Critical Thinking Isn’t a Feature Request
The real lesson from that 911-comment thread is simple and uncomfortable: critical thinking cannot be automated away without consequences. The professionals who used GenAI as a thinking accelerator are fine. The ones who used it as a thinking replacement are now producing work they can’t defend, can’t debug, and can’t extend.
If you’re evaluating AI toolkits right now (and that’s literally why this site exists), ask yourself one question before you integrate anything new: does this tool make me better at my job, or does it let me stop understanding my job?
Those are very different outcomes. And after reading 911 comments from people who learned the difference the hard way, I’d suggest you figure out which one you’re optimizing for before the next six months of autopilot catches up with you too.