Founders


Selling AI To Enterprise

I’ve learned that enterprise AI doesn’t scale just because the technology is impressive — it scales when companies stop treating agents like clever chatbots and start treating them like supervised digital workers. The real opportunity is not full autonomy or endless human review, but a hybrid system where agents handle repetitive work, humans keep judgment and relationships, and AI slowly becomes part of the real operating rhythm of the business.

Why GTM Agents Stall In The Wild

When I look at where AI is starting to land inside companies, go-to-market work feels like one of the most obvious places. Sales, marketing, customer support, customer success, onboarding, follow-up, research, routing, summaries, drafts, handoffs — all of this work is language-heavy, repetitive, and full of patterns.

So it makes sense that companies are excited.

A sales team wants faster account research.
A marketing team wants more content variations.
A support team wants instant responses.
A founder wants more leverage.
A customer success team wants fewer things slipping through the cracks.

AI looks like the perfect answer.

But here is the thing I’ve learned: adoption is not the same as success.

A company can try AI.
A team can run a pilot.
A few employees can say, “This is cool.”
Someone can record a great demo.

And still, the thing may never become part of the real business.

That gap between excitement and actual scale is where most enterprise AI gets stuck.

At first, people ask, “Can the AI do the task?” But that is usually the wrong question. A better question is:

Can the AI do the task safely, repeatedly, visibly, and in a way that everyone still trusts?

That is a very different bar.

In real GTM work, a bad answer is not just a bad answer. It can become a bad sales email. A confused customer. A damaged relationship. A messy CRM. A wrong promise. A strange brand voice. A bad handoff. A frustrated employee who says, “I’m never using this again.”

That is why I do not think enterprise AI agents should be thought of as magic software. They should be thought of as supervised digital labor.

They need a job description.
They need boundaries.
They need managers.
They need logs.
They need escalation paths.
They need a way to earn more responsibility over time.

That is the real enterprise problem.

Not “Can we build an agent?”

But “Can we build a system where the agent does useful work without breaking trust?”

What The Market Already Knows And Still Misses

The more time I spend around AI products, teams, and GTM systems, the more I realize something: most companies are not really struggling with whether AI is useful.

They know it is useful.

The problem is deciding where it belongs.

Should the agent only suggest?
Should it draft but not send?
Should it update CRM fields?
Should it speak directly to customers?
Should it qualify leads?
Should it make recommendations?
Should it trigger workflows?
Should it handle pricing questions?
Should it escalate?

Every one of those questions has a different risk level.

That is where a lot of people go wrong. They talk about “AI agents” as if one level of autonomy applies everywhere. It does not.

An agent researching a company is different from an agent emailing a CEO.
An agent summarizing a support ticket is different from an agent promising a refund.
An agent drafting campaign ideas is different from an agent changing ad spend.
An agent suggesting a next step is different from an agent taking that step.

The task matters.
The risk matters.
The customer matters.
The relationship matters.
The reversibility matters.

I’ve also learned that trust is fragile.

When an AI tool works well ten times, people get excited. But when it fails once in a visible way, especially in front of a customer or executive, confidence drops fast.

That is not irrational. That is human.

People are willing to forgive a person because they understand how people make mistakes. But when software makes a mistake, especially a confident mistake, it feels different. It feels like the system itself may not be controllable.

That is why the goal should not be blind trust. The goal should be calibrated trust.

Employees should know when the agent is strong.
They should know when to check it.
They should know when to ignore it.
They should know when to escalate.
They should know what the agent is allowed to do.

This is also why “human plus AI” does not automatically mean better.

A bad hybrid system can be worse than either side alone. If the AI gives weak recommendations and the human blindly accepts them, the system gets worse. If the AI creates too much review work, the human becomes the bottleneck. If the AI is too cautious, it saves no time. If it is too confident, it creates risk.

The best hybrid systems are designed around strengths.

AI is good at speed, memory, volume, consistency, summaries, pattern recognition, and first drafts.

Humans are better at judgment, nuance, emotional context, negotiation, trust repair, and knowing when the rule should bend.

The magic is not in combining them.
The magic is in assigning the right work to each side.

That is what most AI discussions still miss.

They talk too much about intelligence and not enough about responsibility.

From Copilots To Supervised Digital Labor

The shift I care about is this: AI is moving from “help me think” to “help me operate.”

A copilot helps you write.
A worker helps you move work forward.

That difference changes everything.

Once an AI agent can touch business systems, update records, send messages, retrieve customer history, trigger workflows, book meetings, create tasks, or hand off conversations, you are no longer just evaluating writing quality.

You are evaluating business behavior.

Can it access the right information?
Can it take the right action?
Can it explain what happened?
Can it be audited?
Can it recover from mistakes?
Can a human override it?
Can the company prove what it did?

This is why governance becomes harder as agents become more useful.

A chatbot that only answers questions is annoying when it is wrong.

An agent that takes action is dangerous when it is wrong.

That does not mean companies should avoid agents. It means they need to design them like real operational systems.

The agent needs business context.
It needs rules.
It needs permissions.
It needs a clear scope.
It needs monitoring.
It needs humans around it.
It needs a way to start small and earn more autonomy.

I’ve come to believe that enterprise AI should expand in layers.

First, the agent observes.
Then it summarizes.
Then it drafts.
Then it recommends.
Then it acts with approval.
Then it acts alone in low-risk situations.
Then, only after enough proof, it handles higher-value workflows.

That is how trust compounds.

Not through one giant launch.

Through repeated evidence that the system behaves well.
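One way to make these layers concrete is a small promotion policy. Each layer becomes an explicit stage, and an agent moves up one stage only after enough human-reviewed actions with a low enough error rate, or down one stage if errors pile up. This is a minimal Python sketch. The stage names mirror the list above, but the `min_reviewed` and `max_error_rate` thresholds and the demote-on-failure rule are illustrative assumptions, not a standard.

```python
from dataclasses import dataclass
from enum import IntEnum


class Stage(IntEnum):
    """Autonomy levels, lowest to highest, mirroring the layers above."""
    OBSERVE = 0
    SUMMARIZE = 1
    DRAFT = 2
    RECOMMEND = 3
    ACT_WITH_APPROVAL = 4
    ACT_ALONE_LOW_RISK = 5
    HIGHER_VALUE_WORKFLOWS = 6


@dataclass
class AutonomyLadder:
    stage: Stage = Stage.OBSERVE
    min_reviewed: int = 200        # hypothetical: reviewed actions needed per decision
    max_error_rate: float = 0.02   # hypothetical: tolerated error rate at a stage
    reviewed: int = 0
    errors: int = 0

    def record(self, ok: bool) -> None:
        """Log one human-reviewed action taken at the current stage."""
        self.reviewed += 1
        if not ok:
            self.errors += 1

    def error_rate(self) -> float:
        return self.errors / self.reviewed if self.reviewed else 0.0

    def review_stage(self) -> Stage:
        """Promote one level on clean evidence, demote one level on too many errors."""
        if self.reviewed < self.min_reviewed:
            return self.stage  # not enough evidence yet; stay put
        rate = self.error_rate()
        if rate <= self.max_error_rate and self.stage < Stage.HIGHER_VALUE_WORKFLOWS:
            self.stage = Stage(self.stage + 1)
        elif rate > self.max_error_rate and self.stage > Stage.OBSERVE:
            self.stage = Stage(self.stage - 1)
        # Evidence is counted per stage, so start fresh after every decision.
        self.reviewed = 0
        self.errors = 0
        return self.stage


# Usage: 200 reviewed actions with 2 errors (1%) earns one step up.
ladder = AutonomyLadder()
for i in range(200):
    ladder.record(ok=(i % 100 != 0))
print(ladder.review_stage().name)  # SUMMARIZE
```

Moving one rung at a time, and resetting the evidence after each move, is a literal version of "earning more responsibility": every new stage has to prove itself from zero.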

Why Pilots Die Before Scale

I think most AI pilots die for a few simple reasons.

The first is messy data.

Agents sound smart when they have the right context. They sound dangerous when they do not. If customer data is scattered, stale, duplicated, incomplete, or trapped in different systems, the agent will still produce confident output. It will just be confidently wrong.

That is a brutal enterprise problem.

The second is inconsistent quality.

In a demo, the agent only needs to work once. In production, it needs to work thousands of times, across weird edge cases, bad inputs, emotional customers, missing data, unclear instructions, and changing business rules.

That is a much higher standard.

The third is weak governance.

Many teams want the productivity upside without doing the boring work: permissions, logs, review queues, escalation policies, role design, testing, and accountability. But those boring pieces are the difference between a toy and an operating system.

The fourth is trust whiplash.

People either overtrust the agent because it looks impressive, or they undertrust it after one mistake. Both are bad. The company needs to help employees understand where the agent is reliable and where it is not.

The fifth is weak economics.

It is easy to show that an agent saved time. It is harder to prove that it created durable business value. Did it improve conversion? Did it reduce churn? Did it improve response time without hurting quality? Did it make customers happier? Did it reduce manager load or just move the work somewhere else?

That is where a lot of AI projects get exposed.

The sixth is failure to redesign the work.

You cannot just drop an agent into an old process and expect a new company to emerge. Roles have to change. Review paths have to change. Incentives have to change. Managers have to know what they are managing. Employees need to know where the AI is doing the work and where their own judgment still makes the call.

Companies do not scale AI by turning it on.

They scale AI by redesigning the work around it.

The Hybrid Supervisor Model I Would Actually Deploy

If I had to reduce my view to one rule, it would be this:

Agents should own standardized, reversible, high-volume work. Humans should own ambiguous, relationship-sensitive, or irreversible work. Supervisors should own the boundary between the two.

That boundary is where the real system lives.
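That rule is simple enough to write down as a routing function, which is roughly what a supervisor would encode and then keep tuning. The Python sketch below is a minimal illustration. The `Task` fields and the `"agent"`, `"human"`, and `"supervisor"` labels are hypothetical names chosen to mirror the rule, not an existing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    standardized: bool            # follows a known template or playbook
    reversible: bool              # can be undone cheaply if the agent is wrong
    high_volume: bool             # recurs often enough to be worth automating
    ambiguous: bool               # needs interpretation or judgment
    relationship_sensitive: bool  # touches trust with a customer or partner


def route(task: Task) -> str:
    """Return who owns the task: 'human', 'agent', or 'supervisor' (the boundary)."""
    # Irreversible, ambiguous, or relationship-sensitive work stays with people.
    if not task.reversible or task.ambiguous or task.relationship_sensitive:
        return "human"
    # Standardized, reversible, high-volume work belongs to the agent.
    if task.standardized and task.high_volume:
        return "agent"
    # Everything else sits on the boundary and needs a supervisor's decision.
    return "supervisor"


# Usage:
print(route(Task("CRM cleanup", True, True, True, False, False)))               # agent
print(route(Task("Pricing promise", False, False, False, True, True)))          # human
print(route(Task("One-off research brief", False, True, False, False, False)))  # supervisor
```

Checking the human-owned conditions first is deliberate: any task that trips one of them goes to a person, even if it also looks standardized and high-volume.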

For sales, I would let agents handle research, enrichment, buying-signal monitoring, meeting prep, CRM cleanup, first-draft outreach, and lead prioritization.

But I would not let them freely handle strategic accounts, pricing promises, legal claims, sensitive objections, negotiation, or high-stakes relationship moments.

For marketing, I would let agents create campaign variants, repurpose content, summarize customer feedback, draft nurture sequences, generate segment ideas, and find performance anomalies.

But I would keep brand positioning, crisis communication, major spend decisions, regulated claims, and sensitive audience messaging under human control.

For customer success and support, I would let agents handle FAQs, triage, knowledge retrieval, basic handoffs, renewal-risk summaries, and suggested next actions.

But I would escalate emotional complaints, high-value accounts, churn-risk conversations, policy exceptions, billing disputes, and anything involving trust repair.

The missing role in many companies is the AI operations supervisor.

This person or team is not just a normal manager. They are responsible for the quality of digital labor.

They review exception traffic.
They inspect failures.
They tune prompts and policies.
They watch override rates.
They decide when autonomy can expand.
They protect the company from invisible drift.
They make sure the agent does not quietly become worse over time.

That role matters because AI systems do not stay perfect after launch. The business changes. Customers change. Offers change. Policies change. Data changes. The agent’s behavior has to be watched, tuned, and governed.

That is why I think “supervised digital labor” is the right frame.

The agent is not just a tool.
It is not fully an employee either.
It is a new kind of operational worker that needs a new kind of management layer.
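Watching override rates is one of the more mechanical parts of the supervisor's job, so it is a natural thing to automate. A minimal sketch, assuming each agent action is later marked as accepted or overridden by a human: keep a rolling window of recent actions and flag drift when the recent override rate rises clearly above the rate measured at launch. The window size and the tolerance are illustrative assumptions.

```python
from collections import deque


class OverrideMonitor:
    """Rolling override-rate tracker that flags drift against a launch baseline."""

    def __init__(self, baseline_rate: float, window: int = 500, tolerance: float = 0.05):
        self.baseline_rate = baseline_rate  # override rate observed at launch
        self.tolerance = tolerance          # hypothetical: allowed rise before alerting
        # Each entry is True if a human overrode the agent on that action.
        self.events = deque(maxlen=window)

    def record(self, overridden: bool) -> None:
        self.events.append(overridden)

    def override_rate(self) -> float:
        return sum(self.events) / len(self.events) if self.events else 0.0

    def drifting(self) -> bool:
        """True once the window is full and the recent rate exceeds baseline + tolerance."""
        full = len(self.events) == self.events.maxlen
        return full and self.override_rate() > self.baseline_rate + self.tolerance


# Usage: launched at a 10% override rate, now seeing 20%.
monitor = OverrideMonitor(baseline_rate=0.10, window=100)
for i in range(100):
    monitor.record(overridden=(i % 5 == 0))
print(monitor.drifting())  # True
```

A drift flag does not say what broke. It tells the supervisor where to start inspecting failures before deciding whether to retune the agent or pull back its autonomy.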

What The Best Implementations Have In Common

When I look at AI tools that seem more likely to survive inside companies, they usually have a few things in common.

First, they are close to the system of record.

They are not floating on top of the business. They are connected to customer data, CRM history, support context, product information, workflows, and internal knowledge.

Second, they have staged autonomy.

They do not jump immediately from “draft” to “fully autonomous.” They start with review. Then approval. Then limited automation. Then higher autonomy only after the team has confidence.

Third, they make handoffs normal.

A good agent should not pretend it can solve everything. It should know when to pass the work to a person. Escalation is not failure. Bad escalation is failure.

Fourth, they measure completed work, not just activity.

The future of agent pricing and evaluation will not be based only on messages, credits, or usage. It will move toward outcomes: resolved issues, qualified leads, completed workflows, booked meetings, retained customers, cleaner data, faster response time, better conversion.

Fifth, they give humans control.

The best systems let teams decide what the agent can touch, when it can act, when it should pause, when it should escalate, and how its work should be reviewed.

That is the pattern I trust.

Not “AI replaces the team.”

More like: “AI takes the repetitive work, humans keep the judgment, and supervisors manage the operating boundary.”

Pricing Beyond Credits

I do not think pricing is just a commercial detail anymore. Pricing tells you how a company thinks about work.

If a tool charges only for usage, it may encourage more activity.

If it charges for outcomes, it has to care more about whether the work actually got done.

That matters.

Early AI pricing often focused on credits, messages, seats, or usage. That makes sense when the product is new and the tasks are variable. But it does not fully match how enterprise buyers think. A business does not really want more AI activity. It wants more resolved work.

A support team wants resolved conversations.
A sales team wants qualified meetings.
A marketing team wants pipeline influence.
A customer success team wants retained accounts.
An operations team wants cleaner workflows.

The more AI matures, the more pricing will move toward completed work.

But outcome pricing has its own risk. Someone has to define the outcome. And if the vendor defines the outcome too generously, the buyer may pay for “success” that is not truly valuable.

So enterprises need to ask better questions.

What counts as a resolution?
What counts as a qualified lead?
What counts as a successful handoff?
What counts as an autonomous action?
How is quality checked?
Can the company audit the result?
Can humans dispute the outcome?

The best pricing model will probably be hybrid.

Some value comes from autonomous completion.
Some comes from human productivity.
Some comes from the control layer that lets the company govern the work.

That mirrors the operating model itself.

Not fully human.
Not fully autonomous.
Supervised.
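A simple way to picture such a hybrid invoice: a fixed platform fee for the control layer, a per-seat fee for human productivity, and a per-outcome fee charged only for outcomes that pass the company's own audit and are not disputed. The Python sketch below assumes hypothetical prices and a hypothetical `Outcome` record with `audited` and `disputed` flags. It illustrates the structure, not any vendor's actual pricing.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Outcome:
    kind: str        # e.g. "resolution" or "qualified_lead"
    audited: bool    # passed the company's quality check
    disputed: bool   # a human challenged whether it really counts


def hybrid_invoice(
    outcomes: List[Outcome],
    seats: int,
    platform_fee: float = 2000.0,  # hypothetical: control and governance layer
    seat_price: float = 50.0,      # hypothetical: human productivity
    outcome_prices: Optional[Dict[str, float]] = None,
) -> float:
    """Monthly charge: platform fee + seats + only audited, undisputed outcomes."""
    # Hypothetical per-outcome prices; unknown outcome kinds are not billable.
    prices = outcome_prices or {"resolution": 1.0, "qualified_lead": 20.0}
    billable = [o for o in outcomes if o.audited and not o.disputed]
    outcome_total = sum(prices.get(o.kind, 0.0) for o in billable)
    return platform_fee + seats * seat_price + outcome_total


# Usage: 3 clean resolutions, 1 disputed, 1 clean lead, 1 lead that failed audit.
outcomes = (
    [Outcome("resolution", audited=True, disputed=False)] * 3
    + [Outcome("resolution", audited=True, disputed=True)]
    + [Outcome("qualified_lead", audited=True, disputed=False)]
    + [Outcome("qualified_lead", audited=False, disputed=False)]
)
print(hybrid_invoice(outcomes, seats=10))  # 2523.0
```

Keeping the billable filter explicit forces both sides to agree on what counts as a resolution or a qualified lead before any money changes hands.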

Why Ecosystem Marketing Suddenly Matters

One thing I think founders underestimate is that ecosystem marketing is no longer just “partnerships.”

For AI agents, ecosystem is trust.

An enterprise buyer does not only ask, “Is the model good?”

They ask:

Can this connect to my systems?
Can it act where my team already works?
Can it be governed?
Can it be installed safely?
Can my procurement team understand it?
Can my security team approve it?
Can my managers monitor it?
Can my employees actually use it?

That is why marketplaces, partner networks, integrations, verified apps, and shared billing matter so much.

They reduce anxiety.

A standalone agent may feel risky.
An agent inside a trusted ecosystem feels more manageable.

This is also why the future of enterprise AI will not be won only by the smartest model. It will be won by the systems that feel safe enough to deploy.

The agent has to see enough context.
It has to do enough real work.
It has to be governed enough to trust.
It has to fit into the tools people already use.

That is why ecosystem is not just distribution.

It is part of the product.

How I Think This Topic Should Be Studied

If I were studying this more seriously, I would not only ask executives whether they are using AI.

That is too shallow.

I would want to know what work the agent is doing, what it is allowed to touch, how often humans override it, when customers get frustrated, when employees stop trusting it, and whether the business actually improves.

There are a few ways to study this properly.

The first is broad surveys.

These are useful because they show patterns across companies. But they are limited because everyone defines “AI agent,” “pilot,” “production,” and “success” differently.

The second is controlled experiments.

These help us understand trust, behavior, and how people react when AI makes mistakes. But experiments can be too narrow compared to the messy reality of enterprise GTM.

The third is long-term field studies.

This is where I think the most useful evidence will come from. Put AI into real sales, marketing, support, or success workflows. Compare teams over time. Measure not just speed, but quality, conversion, escalation, complaints, employee trust, and revenue quality.

The fourth is product telemetry.

This shows what people actually do inside tools. But it has to be read carefully, because product companies naturally highlight their best stories.

The studies I would most want to see are practical.

Give one sales team no AI, a second team AI drafts that require human approval, and a third team higher autonomy. Then compare meeting quality, conversion, complaints, and brand risk over multiple quarters.

Study when support agents should escalate. Not just whether they solved the ticket, but whether they passed it to a human at the right moment.

Study supervisor workload. Maybe the agent saves frontline time but creates a new hidden review burden. That matters.

Study pricing models. Do companies trust outcome pricing more than usage pricing? Does it improve adoption? Or does it just create arguments over definitions?

Study marketplaces. Do verified ecosystems make buyers more willing to enable higher-risk actions?

Those are the questions that matter.

Not just “How smart is the AI?”

But “Can the organization control the AI while still getting leverage from it?”

What Changes If We Get This Right

If companies get this right, the impact is not just faster email drafts or cheaper support.

The deeper change is that work gets redesigned.

A salesperson spends less time researching and more time building trust.
A marketer spends less time repurposing and more time positioning.
A support team spends less time answering repeated questions and more time handling complex situations.
A customer success team sees risk earlier.
A manager gets better visibility into what is happening.
A founder gets more leverage without adding endless headcount.

But if companies get it wrong, the future is messy.

Employees ignore the tools.
Customers get weird answers.
Managers lose confidence.
Pilots never scale.
Leadership says AI was overhyped.
The company ends up with a graveyard of half-used tools.

That is why the operating model matters so much.

The companies that win will not simply “adopt AI.” They will learn how to allocate work between humans and agents.

For business, the lesson is direct: start with low-ambiguity, high-volume, reversible GTM tasks. Prove value. Watch quality. Build trust. Then expand.

For technology teams, the lesson is that the winning stack will include more than a model. It will include memory, tools, permissions, policies, logs, checkpoints, evaluations, and human oversight.

For workers, the lesson is that the job will not just disappear in one clean motion. It will be reshaped. Some tasks will move to agents. Some human skills will become more valuable. New roles will appear around supervision, workflow design, quality control, and AI operations.

For governance, the lesson is that agents need accountability. Not in a vague way. In a practical way.

Who approved this workflow?
Who monitors it?
Who owns the escalation policy?
Who reviews failures?
Who decides when autonomy expands?
Who is responsible when something goes wrong?

Those questions need answers before the agent becomes deeply embedded in the business.

Where The Evidence Still Feels Thin

I want to be honest about where this whole space is still early.

First, people use the word “agent” too loosely.

Sometimes it means chatbot. Sometimes it means workflow automation. Sometimes it means a real system that can reason, use tools, and take action. That makes conversations messy.

Second, a lot of success stories are still selective.

Companies naturally talk about the best outcomes. Vendors naturally highlight the strongest case studies. That does not mean the results are fake, but it does mean we should be careful.

Third, short-term productivity is easier to prove than long-term business value.

Saving time is good. But the real question is whether the company becomes better. More revenue. Better retention. Happier customers. Faster cycles. Higher-quality work. Less operational drag.

Fourth, hybrid systems can hide labor.

A company may say the agent automated the work, but maybe a human is still reviewing, correcting, escalating, and cleaning up behind the scenes. That does not mean the system is bad, but the economics need to be honest.

Fifth, the field is moving quickly.

Specific tools, pricing models, and product names will change. But I think the underlying principle will hold: enterprise AI only scales when it becomes governable work.

That is the real lesson.

Future Research Directions

If I were pushing the field forward, I would focus on six practical studies.

First, a sales supervision study.

Compare human-only outreach, human-reviewed AI outreach, and higher-autonomy AI outreach. Measure not just meetings booked, but meeting quality, conversion, complaints, and brand risk.

Second, a customer success escalation dataset.

Collect examples of support and success interactions. Label emotional intensity, account value, policy exceptions, reversibility, and resolution quality. The question should be: did the agent know when to escalate?
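The core metric for such a dataset can stay small. Assuming each labeled interaction records whether the agent escalated and whether a reviewer judged that it should have, escalation quality splits into two error types: missed escalations (the agent kept work it should have handed off) and unnecessary ones (it handed off work it could have finished). A minimal Python sketch; the field names and label scheme are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Interaction:
    emotional_intensity: int  # hypothetical reviewer rating, 1 (calm) to 5 (furious)
    account_value: float      # e.g. annual contract value of the account
    policy_exception: bool    # customer asked for something outside policy
    reversible: bool          # the agent's action could be undone
    agent_escalated: bool     # what the agent actually did
    should_escalate: bool     # what a human reviewer says it should have done


def escalation_scores(data: List[Interaction]) -> Dict[str, float]:
    """Score the agent's escalation decisions against reviewer labels."""
    tp = sum(d.agent_escalated and d.should_escalate for d in data)
    fp = sum(d.agent_escalated and not d.should_escalate for d in data)
    fn = sum(not d.agent_escalated and d.should_escalate for d in data)
    return {
        # Of the hand-offs the agent made, how many were actually needed?
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the hand-offs that were needed, how many did the agent make?
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "missed_escalations": float(fn),
    }


# Usage: slice by the labels, e.g. only emotionally intense conversations.
labeled = [
    Interaction(5, 120_000.0, False, False, agent_escalated=True, should_escalate=True),
    Interaction(4, 80_000.0, True, True, agent_escalated=False, should_escalate=True),
    Interaction(1, 2_000.0, False, True, agent_escalated=False, should_escalate=False),
    Interaction(2, 5_000.0, False, True, agent_escalated=True, should_escalate=False),
]
intense = [d for d in labeled if d.emotional_intensity >= 4]
print(escalation_scores(intense))  # {'precision': 1.0, 'recall': 0.5, 'missed_escalations': 1.0}
```

The other label fields exist so the same score can be sliced by account value, policy exceptions, or reversibility. For the situations this essay worries about, low recall (missed escalations) is usually the costlier failure.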

Third, a pricing comparison study.

Compare outcome-based pricing with usage-based pricing. Does outcome pricing create more trust? Better adoption? Better quality? Or just more debate about what counts as success?

Fourth, a supervisor workload study.

Measure how much time humans spend reviewing, correcting, overriding, and cleaning up agent work. This is important because hidden labor can make AI look more efficient than it really is.
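The accounting is simple arithmetic, but it is often skipped. A minimal sketch, assuming a team logs how many items the agent handled, how many minutes each item used to take a person, and how long humans now spend reviewing and correcting agent output. Every number in the usage line is invented for illustration.

```python
def net_minutes_saved(
    items: int,
    manual_minutes_per_item: float,
    review_minutes_per_item: float,
    correction_rate: float,
    correction_minutes: float,
) -> float:
    """Frontline time saved minus the supervision time the agent creates."""
    gross = items * manual_minutes_per_item                # time the agent took off people
    review = items * review_minutes_per_item               # time spent checking its work
    cleanup = items * correction_rate * correction_minutes  # time spent fixing its mistakes
    return gross - review - cleanup


# Usage: 1,000 tickets that took 6 minutes each by hand, 2 minutes of review each,
# and 10% needing a 15-minute correction.
print(net_minutes_saved(1000, 6.0, 2.0, 0.10, 15.0))  # 2500.0
```

In this invented case the headline saving of 6,000 minutes shrinks to 2,500 once review and cleanup are counted, which is exactly the hidden labor the study should surface.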

Fifth, a marketplace trust study.

Test whether buyers are more willing to deploy AI agents when they come through a verified ecosystem with integrations, security review, shared billing, and governance controls.

Sixth, a controllability benchmark.

Instead of only benchmarking intelligence, benchmark whether companies can control the agent. Can they audit it? Restrict it? Escalate it? Explain it? Recover from mistakes? Assign responsibility?

That is the kind of research I think would actually help builders and operators.

What The Enterprise Must Learn Next

My conclusion is simple.

AI agents are becoming real, but the version that scales in the enterprise is not the fantasy version.

It is not an army of unsupervised digital employees running wild through the company.

It is supervised digital labor.

That may sound less exciting, but I think it is far more powerful.

The agent handles volume.
The human keeps judgment.
The supervisor manages the boundary.
The company earns trust over time.

That is how enterprise AI becomes real.

Not through one dramatic demo.
Not through a vague promise of autonomy.
Not through pretending the technology is already perfect.

It becomes real when it fits into the business as a reliable operating system for work.

That is the lesson I would leave founders, operators, and GTM teams with:

Do not sell AI as magic.
Do not deploy it as a toy.
Do not trust it blindly.
Do not bury it under endless review.

Design it like supervised work.

If we do that, AI agents can become one of the most important productivity layers inside modern companies.

If we do not, we will get what enterprise software always produces when the operating model is missing: beautiful demos, scattered pilots, frustrated teams, and a long list of tools that almost worked.