Managing AI Engineering Teams: A CTO's Playbook
Managing an AI or machine learning team is one of the most misunderstood challenges in engineering leadership. The skills that make you effective at managing traditional software engineering teams transfer partially — but the gaps are significant enough to trip up even experienced CTOs.
I have watched excellent engineering leaders stumble with AI teams because they applied the same playbook that worked for their platform or product teams. The timelines were wrong. The success metrics were wrong. The team structure was wrong. And the expectations they set with the business were catastrophically wrong.
This is a practical guide to what is actually different about managing AI engineering teams, based on patterns I have seen work — and fail — across organisations ranging from early-stage startups to large enterprises.
Embedded vs Centralised: The Structure Question
The first decision you will face is whether to build a centralised AI team or embed ML engineers within existing product teams. Both models work. Both models fail. The difference is context.
Centralised AI Team
A centralised team — sometimes called an AI platform team, ML centre of excellence, or data science team — sits as a shared resource that product teams request work from.
When this works:
- Your AI capabilities serve multiple products and need to be consistent (shared recommendation engine, common NLP pipeline, centralised feature store)
- You have fewer than five ML engineers and cannot afford to spread them thin
- You need to build foundational infrastructure (model training pipelines, feature stores, experiment tracking) before productionising any specific use case
- Your product teams do not have enough AI work to justify a dedicated ML engineer
When this fails:
- Product teams treat the AI team as a service desk, filing tickets and waiting weeks for results
- The AI team builds technically impressive models that do not match what product actually needs because they are too far from the user
- Prioritisation becomes political — which product team gets the AI team's time this quarter?
- ML engineers become disconnected from business outcomes and optimise for model accuracy instead of product impact
Embedded ML Engineers
In this model, ML engineers sit within product teams, reporting to the product team's engineering manager but with a dotted line to a central ML lead for technical standards and career development.
When this works:
- Your AI use cases are product-specific and benefit from deep context (personalisation for a specific user flow, domain-specific NLP for a particular industry)
- Product teams have enough ML work to keep a dedicated engineer busy
- You have a strong central ML lead who can maintain technical standards across embedded engineers without direct authority
- Your product engineering managers are willing to learn enough about ML to manage effectively
When this fails:
- Embedded ML engineers become isolated from the ML community within the company and stop growing technically
- Product managers treat ML as "just another feature" and apply the same estimation and scoping approaches
- There is no central investment in ML infrastructure, so each team builds its own training pipelines and monitoring tools
- The engineering manager has no idea how to evaluate ML work or set appropriate expectations
The Hybrid Approach
Most organisations that get this right eventually land on a hybrid: a small central platform team that builds and maintains ML infrastructure (training pipelines, model serving, feature stores, experiment tracking), with ML engineers embedded in product teams who build on that infrastructure.
The central team is a force multiplier. It keeps embedded engineers productive and ensures consistency. The embedded engineers stay close to the user and the business problem. The central ML lead sets technical standards and runs a community of practice.
Hiring: Build or Train?
Hiring ML Engineers
The ML hiring market is simultaneously oversaturated and undersupplied. There are thousands of people with "machine learning" on their CV who have completed an online course and built a toy classifier. There are very few people who have taken a model from research to production at scale.
What to look for:
- Production experience. Can they deploy a model, monitor its performance in production, and debug it when it degrades? A Kaggle ranking is not a substitute for this.
- Engineering fundamentals. ML engineers who cannot write clean, testable, maintainable code will produce models that work in notebooks and break in production. Software engineering skills matter as much as statistical skills.
- Business orientation. The best ML engineers ask "what problem are we solving and for whom?" before "what model architecture should we use?" This orientation is rare and enormously valuable.
- Communication skills. ML work requires constant communication with product managers, designers, and other engineers about what is feasible, what the trade-offs are, and what the results mean.
Training Existing Engineers
The alternative to hiring ML specialists is training your existing engineers. This works better than most CTOs expect, with important caveats.
When training works:
- Your AI use cases involve applying well-understood techniques (classification, recommendation, NLP with pre-trained models) rather than pushing the state of the art
- Your existing engineers have strong fundamentals in statistics and software engineering
- You can provide structured learning time (not just "go take an online course on weekends")
- You pair trained engineers with at least one experienced ML practitioner who can review their work
When you need to hire:
- Your use cases require deep expertise in specific ML domains (computer vision, reinforcement learning, large language models)
- You are building ML infrastructure that requires specialised knowledge of distributed training, model optimisation, or serving systems
- Time to market matters and you cannot afford a six-month ramp-up period
The pragmatic approach for most organisations: hire two or three experienced ML engineers to set standards and build infrastructure, then train interested engineers from your existing team to build on that foundation. Many of the same hiring principles apply as when building any engineering team; our guide on what a CTO does covers the broader team-building responsibilities that frame AI hiring decisions.
Setting Realistic Timelines
This is where most CTOs get burned. ML projects do not follow the same estimation patterns as traditional software engineering, and applying the same mental models leads to chronic disappointment.
Why ML Timelines Are Different
Data work is unpredictable. Before you can train a model, you need data. Getting that data — cleaning it, labelling it, handling edge cases, dealing with gaps and biases — typically takes longer than the modelling itself. And you often do not know how bad the data situation is until you start.
Iteration is non-linear. In traditional engineering, if a feature is 80% done, you can usually predict when it will be 100% done. In ML, you might get to 85% accuracy in two weeks and spend six more weeks fighting for the last 5% — only to discover that 85% is as good as it gets with your current data.
Research phases are genuinely uncertain. If you are applying a well-understood technique to a well-defined problem with good data, timelines are relatively predictable. If any of those three conditions is not met, you are doing research, and research timelines are inherently uncertain.
A Practical Framework for ML Timelines
Break every ML project into phases with explicit decision points:
1. Data assessment (1-2 weeks). Do we have the data? Is it clean enough? What are the gaps? This phase produces a data readiness report and a realistic assessment of data preparation effort.
2. Proof of concept (2-4 weeks). Can we demonstrate the approach works at all? This uses a subset of data, a simple model, and offline evaluation. The output is not production code — it is evidence that the approach is viable.
3. Go/no-go decision. Based on the POC, decide whether to invest in productionisation. This is the most important decision point and the one most organisations skip.
4. Productionisation (4-8 weeks). Build the production pipeline: data ingestion, feature computation, model training, serving, monitoring. This is software engineering work with ML characteristics.
5. Iteration and optimisation (ongoing). Improve model performance, handle edge cases, retrain on new data. This is continuous, not a one-time project.
The key insight: budget your timeline for phases 1-3 before committing to a delivery date for phases 4-5. Too many CTOs promise the business a delivery date before the POC is complete.
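The go/no-go decision is easier to make honestly if the bar is written down before the POC starts. Here is a minimal sketch of that check using synthetic data; the hand-tuned threshold model, the majority-class baseline, and the 0.58 minimum viable accuracy are all illustrative assumptions, not prescriptions from this guide.

```python
import random

random.seed(1)

# Synthetic held-out set: feature x loosely predicts label y.
holdout = []
for _ in range(2000):
    x = random.random()
    y = 1 if random.random() < 0.3 + 0.5 * x else 0
    holdout.append((x, y))

def baseline(x):
    """Trivial baseline: always predict the majority class."""
    return 1

def candidate(x):
    """Hypothetical POC model: a single hand-tuned threshold."""
    return 1 if x > 0.4 else 0

def accuracy(model):
    return sum(model(x) == y for x, y in holdout) / len(holdout)

# Agreed with the business BEFORE the POC began, not after seeing results.
MINIMUM_VIABLE = 0.58

base_acc, cand_acc = accuracy(baseline), accuracy(candidate)
go = cand_acc >= MINIMUM_VIABLE and cand_acc > base_acc
print(f"baseline={base_acc:.2f} candidate={cand_acc:.2f} go={go}")
```

The point of the sketch is the shape of the decision, not the model: the candidate must clear both a pre-agreed absolute bar and a trivial baseline, which keeps the go/no-go conversation from becoming a negotiation after the fact.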
Measuring AI Project Success
If you measure AI projects the same way you measure feature development, you will either kill good projects too early or let bad projects run too long.
Beyond Accuracy
Model accuracy is the metric ML engineers default to because it is easy to measure. But accuracy alone is a terrible measure of project success.
Business impact is what matters. A recommendation model that is 2% more accurate but does not change user behaviour is worthless. A fraud detection model that catches 5% more fraud but generates so many false positives that the operations team ignores all alerts is worse than worthless.
Measure what the business cares about:
- Revenue influenced by AI-powered features
- Cost reduction from automated processes
- User engagement changes attributable to AI features
- Time saved for internal users of AI tools
- Error rate reduction in processes augmented by AI
Leading Indicators
Business impact takes time to materialise. In the meantime, track leading indicators:
- Model performance on held-out data. Is the model getting better over time? Is performance consistent across different segments of your data?
- Inference latency and reliability. Can the model serve predictions fast enough for the use case? Is it available when needed?
- Data pipeline health. Are features being computed correctly? Is fresh data flowing through the pipeline?
- Experiment velocity. How quickly can the team test new approaches? A team that runs one experiment per month is not learning fast enough.
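Segment-level consistency, in particular, can be a few lines of analysis over a prediction log. A self-contained sketch on synthetic data; the segment names, sample sizes, and 5-point gap threshold are illustrative assumptions:

```python
import random
from collections import defaultdict

random.seed(2)

# Hypothetical prediction log: (segment, was_the_prediction_correct).
log = ([("web", random.random() < 0.90) for _ in range(3000)]
       + [("mobile", random.random() < 0.88) for _ in range(3000)]
       + [("api", random.random() < 0.70) for _ in range(300)])  # small, weak segment

totals = defaultdict(lambda: [0, 0])  # segment -> [correct, count]
for segment, correct in log:
    totals[segment][0] += correct
    totals[segment][1] += 1

overall = sum(c for c, _ in totals.values()) / sum(n for _, n in totals.values())
print(f"overall accuracy: {overall:.2f}")
for segment, (correct, count) in sorted(totals.items()):
    acc = correct / count
    # Flag segments more than 5 points below the overall number.
    flag = "  <-- investigate" if overall - acc > 0.05 else ""
    print(f"  {segment:<7} n={count:<5} acc={acc:.2f}{flag}")
```

A model that looks healthy in aggregate can be quietly failing a small segment; surfacing per-segment numbers next to the headline metric is what makes the "consistent across segments" indicator actionable.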
Setting Expectations With the Business
The most important thing you can do as a CTO is set appropriate expectations with your CEO and board about AI projects. This means:
- Being explicit about which phase each project is in (exploration, POC, productionisation, optimisation)
- Communicating confidence levels alongside timelines ("we are 70% confident this approach will work based on the POC; we will know more in three weeks")
- Establishing kill criteria upfront: what results would tell us to stop investing in this approach?
- Celebrating learning, not just launches: a POC that demonstrates an approach will not work is a success if it saves you six months of productionisation effort
Common Failure Modes
The Science Project
An ML team builds technically impressive models that never reach production. They publish internal papers, run beautiful experiments, and present exciting results at all-hands meetings. But no user ever sees the output.
Root cause: The team is disconnected from product priorities and optimising for technical novelty rather than business impact.
Fix: Ensure every ML project has a product owner who defines the use case, a production path that is agreed before work begins, and a timeline that includes productionisation — not just research.
The Demo That Never Ships
A POC or demo works brilliantly in controlled conditions. Leadership gets excited. A launch date is announced. Then the team discovers that the demo used hand-curated data, did not handle edge cases, requires ten seconds of latency per prediction, and breaks on inputs that differ slightly from the training distribution.
Root cause: Confusing a proof of concept with a production system. These are fundamentally different things.
Fix: Always present POCs as what they are: evidence that an approach might work. Budget the productionisation phase explicitly and do not commit to launch dates until that phase is underway.
The Accuracy Trap
The team spends months squeezing out incremental accuracy improvements — from 92% to 93% to 93.5% — while the business needed a "good enough" solution six months ago. The diminishing returns are invisible to the team because they are deeply invested in the technical challenge.
Root cause: Lack of clear business requirements translated into model performance targets. "As accurate as possible" is not a requirement.
Fix: Define minimum viable accuracy based on the business use case before the project starts. When you hit that threshold, shift to productionisation. Set a separate track for accuracy improvements that runs in parallel.
No Feedback Loop
A model is deployed and... nobody measures whether it is working. There is no monitoring for model drift. Nobody checks whether predictions match reality. The model gradually degrades and nobody notices until a customer complaint reveals it has been broken for months.
Root cause: Treating model deployment as the finish line rather than the starting line.
Fix: Model monitoring is not optional. Build it into the productionisation phase. Track prediction distributions, feature drift, and downstream business metrics. Set alerts for significant changes. Budget ongoing maintenance time.
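One common way to track prediction-distribution drift is the Population Stability Index (PSI); the guide above does not prescribe a specific metric, so treat this as one illustrative option. A self-contained sketch with synthetic model scores, using the conventional rules of thumb (PSI below 0.1 is stable, above 0.25 signals drift):

```python
import math
import random

def psi(expected, actual, bins=10):
    """Population Stability Index between two samples of model scores."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]

    def frac(sample, i):
        count = sum(1 for x in sample if edges[i] <= x < edges[i + 1])
        if i == bins - 1:  # fold the upper edge into the last bin
            count += sum(1 for x in sample if x == hi)
        return max(count / len(sample), 1e-6)  # clip to avoid log(0)

    return sum((frac(actual, i) - frac(expected, i))
               * math.log(frac(actual, i) / frac(expected, i))
               for i in range(bins))

random.seed(0)
reference = [random.gauss(0.5, 0.1) for _ in range(5000)]    # training-time scores
live_ok = [random.gauss(0.5, 0.1) for _ in range(5000)]      # similar distribution
live_bad = [random.gauss(0.65, 0.15) for _ in range(5000)]   # shifted distribution

print(f"stable PSI:  {psi(reference, live_ok):.3f}")
print(f"drifted PSI: {psi(reference, live_bad):.3f}")
```

In production you would run a check like this on a schedule against a stored reference distribution and wire the result into your alerting, rather than printing it; the same comparison applies to input features, which catches upstream data problems before they show up in business metrics.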
Building Your AI Management Capability
If you are a CTO who has not managed AI teams before, here is a practical development path:
1. Build your own understanding. Spend time with AI tools yourself. Build something small. Understand the development loop of data preparation, training, evaluation, and iteration. You do not need to become an expert, but you need enough intuition to ask good questions.
2. Hire your first ML lead carefully. This person needs to be both technically strong and business-oriented. They will set the culture for your entire AI practice. Prioritise communication skills and production experience over publication history.
3. Start with a well-defined problem. Your first AI project should have clear success criteria, available data, and a well-understood technique. Do not start with your hardest, most ambiguous problem.
4. Build the infrastructure early. Invest in ML infrastructure (experiment tracking, model serving, monitoring) before you scale the team. It is much easier to build infrastructure for two ML engineers than to retrofit it for ten.
5. Connect AI work to business outcomes from day one. Every AI project should have a business sponsor, a clear metric it is trying to move, and regular check-ins with stakeholders outside the ML team.
6. Establish a regular model review cadence. Monthly reviews where the ML team presents model performance, business impact, upcoming experiments, and blockers. This keeps AI work visible and connected to the broader engineering and product roadmap.
7. Create psychological safety for failed experiments. AI work involves more uncertainty than traditional engineering. If the team feels punished for experiments that do not work out, they will stop taking the risks that produce breakthroughs. Celebrate learning from failed experiments as aggressively as you celebrate successful launches.
For a broader view of how AI strategy fits into the CTO role, see the CTO AI strategy guide. And if you want to understand how the CTO role itself is evolving in response to AI, the article on engineering leadership in the AI era covers that ground.
Take the Next Step
Managing AI teams well is becoming a core competency for every CTO, not just those in "AI companies." The patterns in this article will help you avoid the most common mistakes and build a high-performing AI practice.
If you want to assess your overall readiness for the evolving CTO role, take the CTO Readiness Assessment. It covers all the key competencies — including technical leadership in an AI-augmented world — and gives you a personalised view of where to focus your development.
Looking for your next senior technology leadership role? FractionalChiefs connects CTOs and VPs of Engineering with companies that need experienced technical leadership.