TL;DR
Now that GenAI has become agentic (it can plan steps and take actions using tools), our role as users has become even more important. Not because AI is “bad”, but because someone has to own the outcome. The best setup is: AI does the heavy lifting; humans define success, set boundaries, approve risky actions, and validate results.
1) First, what do I mean by “agentic workflow”?
Most of us started with GenAI as “chat” when ChatGPT launched in 2022: ask a question, get an answer.
Agentic workflows are what we work with today: the system can break a goal into steps, use tools/APIs, and complete multi-step tasks. In short, it doesn’t just suggest; it can actually do.
This is powerful. But it also changes the game, because once the AI system can take actions, the question becomes:
“If something goes wrong, who is accountable?”
2) Why does human-in-the-loop matter more in agentic systems?
In the early phase of GenAI chatbot usage, a wrong answer was annoying, but you could simply ignore it.
However, in agentic workflows, a wrong step can become a real action: sending an email, changing a record, triggering a workflow, or touching sensitive data (even dropping databases :P). That’s why many HITL (Human-in-the-Loop) guides advise against letting agents act without oversight. Not because agents are useless, but because they can hallucinate actions, misread intent, or overstep boundaries.
Oracle’s explanation makes the same point in simpler terms: keep a human in the loop when agents face ambiguity, risk, or high-stakes decisions. It is like cruise control: it helps, but you still stay ready to take the wheel.
So yes, automation is great, but in real systems, accountability doesn’t get automated away.
3) The simplest operating model I follow
This is the mental model that clicked for me:
AI generates possibilities. Humans provide meaning + accountability.
Not a debate. Just a practical division of responsibilities.
So what does “meaning + accountability” look like in day-to-day use?
It looks like you (the human) showing up properly.
4) Be a better human-in-the-loop: what you must bring every time
Whenever I use GenAI (especially agentic workflows), I make sure I bring four things:
✅ 1) Goal — what does “good” look like?
Don’t just say “make a report”. Say what “good” means:
- who is the audience
- what decision will it support
- what format you want (bullet points, table, email, etc.)
✅ 2) Constraints — what must NOT happen?
This is very important. Examples:
- don’t use confidential data
- don’t assume facts
- don’t send anything externally
- don’t change production systems
✅ 3) Context — why does this matter?
A simple 2–3 lines of context saves a lot of rework.
✅ 4) Evaluation — how will you validate the output?
This is the step most people skip, and then they blame the model.
If you don’t define how you’ll judge the output, you end up judging based on tone.
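To turn the four items into a habit rather than a memory exercise, you could keep them in a small structure and refuse to start until every field is filled. A minimal Python sketch, assuming a hypothetical `TaskBrief` class of my own (not from any library); the field names simply mirror the four headings above, and the example brief is made up.

```python
from dataclasses import dataclass, field


@dataclass
class TaskBrief:
    """Hypothetical container for the four things to bring to every GenAI task."""
    goal: str = ""                                        # what "good" looks like
    constraints: list[str] = field(default_factory=list)  # what must NOT happen
    context: str = ""                                     # why this matters
    evaluation: list[str] = field(default_factory=list)   # how the output will be judged

    def missing(self) -> list[str]:
        """Return the names of any fields that are still empty."""
        return [name for name in ("goal", "constraints", "context", "evaluation")
                if not getattr(self, name)]


brief = TaskBrief(
    goal="One-page summary of Q3 incidents for the ops leadership meeting, as bullet points",
    constraints=["don't use confidential customer data", "don't send anything externally"],
    context="Leadership will decide whether to change the on-call rotation.",
)

gaps = brief.missing()
if gaps:
    print("Not ready to prompt yet; missing:", ", ".join(gaps))
    # prints: Not ready to prompt yet; missing: evaluation
```

The point of the check is the one made above: if the evaluation field is empty, you find out before prompting, not after judging the output on tone.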
5) Where humans should intervene: approval gates (don’t approve everything!)
Human-in-the-loop does not mean you sit and approve each small thing. That becomes slow and irritating.
The better approach, and one many teams recommend, is risk-based approval gates: approve only when the action is high impact, irreversible, regulated, or has a wide blast radius.
A few examples where human approval should be required:
- External communication (emails/messages)
- Payments / money movement
- Permission/access changes
- Deleting data / destructive changes
- Production deployments / irreversible system changes
A few examples where it is usually safe to automate:
- Drafting content
- Summarising documents
- Exploring internal docs (within allowed scope)
- Suggesting options or plans
- Preparing a “recommendation”
If you think about it in GitHub terms, it looks like this:
Automate drafts. Gate commits.
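The gating rule can be written down as a simple policy: tag each proposed action with a category, and let only the low-risk categories run without a human. A minimal sketch, assuming hypothetical category names taken from the two lists above, a hypothetical `irreversible` flag, and hypothetical `requires_approval`/`run` functions; a real agent framework would have its own way of intercepting tool calls. Unknown categories default to needing approval, which is the safer way to fail.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical action categories, mirroring the two lists above.
GATED = {"external_communication", "payment", "permission_change",
         "delete_data", "production_deploy"}
AUTOMATED = {"draft_content", "summarise_document", "explore_internal_docs",
             "suggest_options", "prepare_recommendation"}


@dataclass
class ProposedAction:
    category: str
    description: str
    irreversible: bool = False  # hypothetical flag set by the agent or a reviewer


def requires_approval(action: ProposedAction) -> bool:
    """Automate drafts, gate commits: only known low-risk, reversible actions skip the human."""
    if action.irreversible or action.category in GATED:
        return True
    # Unknown categories are gated by default.
    return action.category not in AUTOMATED


def run(action: ProposedAction, approved_by: Optional[str] = None) -> str:
    """Execute (here: just report) an action, blocking gated ones that lack an approver."""
    if requires_approval(action) and approved_by is None:
        return f"BLOCKED (needs approval): {action.description}"
    return f"EXECUTED: {action.description}"


print(run(ProposedAction("summarise_document", "Summarise the incident report")))
# → EXECUTED: Summarise the incident report
print(run(ProposedAction("external_communication", "Email the summary to the customer")))
# → BLOCKED (needs approval): Email the summary to the customer
print(run(ProposedAction("external_communication", "Email the summary to the customer"),
          approved_by="alice"))
# → EXECUTED: Email the summary to the customer
```

Keeping the allow-list (AUTOMATED) explicit, rather than only listing what is dangerous, means a new kind of action lands behind the gate until someone decides it is safe.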
6) My favourite reliability habit
This is the habit I keep coming back to:
I typically write acceptance criteria + failure modes first, and only then write the prompt.
This becomes even more important in agentic workflows because evaluation is genuinely harder:
- multiple steps
- tool calls
- state changes
- mistakes can compound
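Writing acceptance criteria and failure modes first also means some of them can be checked mechanically after every step, not only at the end, so a mistake is caught before it compounds. A minimal sketch, assuming each criterion can be expressed as a small Python predicate over a step's text output; the rules, the step outputs, and the helper names `review_step`/`review_run` are all made-up examples, and many real criteria (correct reasoning, right tone) still need a human reviewer.

```python
import re
from typing import Callable

Check = Callable[[str], bool]

# Hypothetical acceptance criteria: every step's output must satisfy all of these.
ACCEPTANCE: dict[str, Check] = {
    "is not empty": lambda out: bool(out.strip()),
    "stays under 200 words": lambda out: len(out.split()) <= 200,
}

# Hypothetical failure modes: none of these may appear in any step's output.
FAILURE_MODES: dict[str, Check] = {
    "contains an email address":
        lambda out: re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", out) is not None,
    "contains destructive SQL":
        lambda out: re.search(r"\bdrop\s+(table|database)\b", out, re.IGNORECASE) is not None,
}


def review_step(step: int, output: str) -> list[str]:
    """Return human-readable problems for one step; an empty list means it passed."""
    problems = [f"step {step}: failed '{name}'"
                for name, ok in ACCEPTANCE.items() if not ok(output)]
    problems += [f"step {step}: hit '{name}'"
                 for name, bad in FAILURE_MODES.items() if bad(output)]
    return problems


def review_run(outputs: list[str]) -> list[str]:
    """Check steps in order and stop at the first failing one, so errors don't compound."""
    for step, output in enumerate(outputs, start=1):
        problems = review_step(step, output)
        if problems:
            return problems
    return []


steps = [
    "Plan: summarise the three incident tickets.",
    "Summary drafted. Next: DROP TABLE incidents_tmp to clean up.",
    "Sent summary to ops-team@example.com",
]
print(review_run(steps))
# → ["step 2: hit 'contains destructive SQL'"]
```

Stopping at the first failing step is a deliberate choice here: step 3 also breaks a rule, but once step 2 has gone wrong, everything after it is suspect anyway.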
Here is the sample prompt I use when starting a new project:
You are my agentic assistant.
Before starting, ask me 3 clarifying questions about goal, constraints, and evaluation.
Then propose a plan: steps, tools needed, risks, and which steps should require human approval.
Acceptance criteria:
- [paste your list]
Failure modes to avoid:
- [paste your list]
Rules:
- Label assumptions
- Flag uncertainty
- Stop and ask approval before any external or irreversible action
- Share a short execution flow at the end
- …..
This pushes the model to behave like a responsible teammate and keeps it from wandering off in random directions.
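If you reuse the kickoff prompt across projects, it can help to keep it as a template and fill in the two lists each time, so the acceptance criteria and failure modes are never skipped. A minimal sketch, assuming a hypothetical `build_kickoff_prompt` helper; the wording comes from the prompt above, and the open-ended last rule (“…..”) is left to you via `extra_rules` rather than filled in.

```python
from typing import Optional

# The rules listed in the prompt above; add your own via extra_rules.
BASE_RULES = [
    "Label assumptions",
    "Flag uncertainty",
    "Stop and ask approval before any external or irreversible action",
    "Share a short execution flow at the end",
]


def bullets(items: list[str]) -> str:
    """Render items as '- item' lines."""
    return "\n".join(f"- {item}" for item in items)


def build_kickoff_prompt(acceptance: list[str], failure_modes: list[str],
                         extra_rules: Optional[list[str]] = None) -> str:
    """Assemble the kickoff prompt; refuse to build it until both lists exist."""
    if not acceptance or not failure_modes:
        raise ValueError("Write acceptance criteria and failure modes before the prompt.")
    rules = BASE_RULES + (extra_rules or [])
    return "\n".join([
        "You are my agentic assistant.",
        "Before starting, ask me 3 clarifying questions about goal, constraints, and evaluation.",
        "Then propose a plan: steps, tools needed, risks, and which steps should require human approval.",
        "Acceptance criteria:",
        bullets(acceptance),
        "Failure modes to avoid:",
        bullets(failure_modes),
        "Rules:",
        bullets(rules),
    ])


print(build_kickoff_prompt(
    acceptance=["Summary fits on one page", "Every claim links to a source ticket"],
    failure_modes=["Inventing incident numbers", "Emailing anyone"],
))
```

Raising an error on empty lists encodes the habit itself: criteria and failure modes come first, the prompt second.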
7) Auditability: if you can’t explain it, you can’t scale it
If an agent takes actions, it should be possible to answer the following questions:
- what it did
- why it did it
- what it used (data/tools)
- who approved it (if approval was needed)
Without a trail of these items, it is difficult to debug what went wrong in an agentic workflow.
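One lightweight way to keep that trail is an append-only log with one record per action, holding exactly the four items above. A minimal sketch, assuming a hypothetical `AuditEntry` record written as JSON Lines to a local file named `agent_audit.jsonl`; a production system would more likely send these records to its existing logging or observability stack.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

LOG_PATH = Path("agent_audit.jsonl")  # hypothetical location


@dataclass
class AuditEntry:
    action: str                        # what it did
    reason: str                        # why it did it
    used: list[str]                    # data sources and tools it touched
    approved_by: Optional[str] = None  # who approved it, if approval was needed
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def record(entry: AuditEntry, log_path: Path = LOG_PATH) -> None:
    """Append one entry as a single JSON line; earlier lines are never rewritten."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(entry)) + "\n")


def read_trail(log_path: Path = LOG_PATH) -> list[AuditEntry]:
    """Load the trail back, e.g. to reconstruct what happened while debugging."""
    with log_path.open(encoding="utf-8") as f:
        return [AuditEntry(**json.loads(line)) for line in f if line.strip()]


record(AuditEntry(
    action="Sent weekly incident summary to the ops mailing list",
    reason="Step 4 of the approved plan",
    used=["incident_tracker (read-only)", "email_tool"],
    approved_by="alice",
))
for entry in read_trail():
    print(entry.timestamp, entry.action, "| approved by:", entry.approved_by)
```

Appending instead of overwriting matters for debugging: the log should show what the agent actually did at the time, not a tidied-up version after the fact.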
Closing thought
The goal is not “100% automation”.
The goal is: reliable outcomes with clear ownership.
In agentic workflows, the best pattern is “supervised autonomy”:
- agents do routine work fast
- humans own intent, risk, and accountability
That is how you get speed and safety.
What is one habit/checklist you follow to keep GenAI outputs reliable in real work?
(Thoughts are personal but polished by GenAI companions)