We did not migrate because the Vercel AI SDK stopped being useful.
It is still a strong way to build streaming AI interfaces, structured output utilities, and fast product surfaces. The problem was that Zeiko had outgrown a chat-first model integration pattern.
Our product is no longer just "send a message, get a response." Zeiko runs business agents that:
- operate across chat, website widgets, Slack, Telegram, WhatsApp, GitHub, and scheduled ticks
- call tools with account-scoped permissions
- pause for approvals
- resume long-running work
- hand off to humans
- publish to external systems
- keep auditable memory, events, and runtime evidence
That is a different job. For that job, the OpenAI Agents SDK became the better runtime boundary.
The short version
We moved model-and-tool orchestration into the OpenAI Agents SDK because our agents needed a more explicit execution model.
The Vercel AI SDK helped us build fast UI and model utilities. The OpenAI Agents SDK gave us a clearer home for production agent behavior: tools, guardrails, handoffs, state, tracing, and rollout evidence.
We still keep UI transport and retained utility paths where they make sense. The migration was not "replace everything." It was "put durable agent work in the runtime designed for durable agent work."
Vercel AI SDK vs OpenAI Agents SDK
| Decision area | Vercel AI SDK pattern | OpenAI Agents SDK pattern | Why it mattered for Zeiko |
|---|
| Primary job | AI UI, streaming, structured generation | Agent execution, tools, handoffs, guardrails | Our core product became deployed agents, not isolated AI calls |
| Tool orchestration | App-defined tool loops around model calls | Agent/Runner execution with SDK tools | Easier to reason about account-safe tool policy and runtime outcomes |
| Long-running work | Usually app-level orchestration | Clearer agent execution boundary | Better fit for approvals, retries, scheduled ticks, and resumable workflows |
| Observability | Mostly our own wrappers and logs | SDK-native tracing concepts plus our event store | Less custom glue around agent outcomes |
| Risk control | Possible, but mostly app-defined | Guardrails and approvals are part of the agent model | Safer for publishing, data access, browser automation, and external channels |
| Best retained use | UI streaming, one-shot utilities, schema helpers | Production agent runtime paths | We kept each tool where it had the strongest job fit |
What changed in our product
The migration forced us to separate three categories that had started to blur.
1. Agent runtime work
This is work where the system is not just generating text. It is deciding, using tools, applying policy, recording state, and producing an operator-visible outcome.
Examples:
- main agent chat
- selected-agent execution
- website widget runtime
- external channel replies
- autonomous ticks
- delegated worker tasks
- approval-backed workflow execution
- YouTube Niche Finder tool execution
These paths now belong behind the OpenAI Agents SDK runtime boundary.
2. Retained utilities
Some AI calls are still just utilities. They do not need a full agent runner.
Examples:
- simple classifiers
- one-shot content generation
- schema normalization
- OCR-style helpers
- internal formatting calls
- UI streaming transport
Moving these for the sake of purity would add complexity without reducing risk. So we kept them as utilities and documented why.
3. Blocked high-risk lanes
Some surfaces needed evidence before they could become SDK-default:
- social publishing and visual generation
- workspace memory/data export
- GitHub operations
- Shopify inventory and finance tools
- browser/code execution
- broader YouTube growth workflows
For these, we built local certification gates before promotion. The point was not to "trust the migration." The point was to prove each risky lane had approval, quota, sandbox, credential, and rollback evidence.
The real reason: our old boundary was too implicit
When a product is young, implicit orchestration feels fast.
You call a model. You pass tools. You parse output. You add a few wrappers. You add more wrappers. Then approvals arrive. Then external channels arrive. Then scheduled jobs arrive. Then publishing arrives. Then one day, the "AI call" is actually a distributed business operation with credentials, audit logs, retries, and human review.
That is where our previous boundary started to hurt.
We needed a runtime that made the following things explicit:
- What agent is running?
- Which tools are allowed?
- What state was read?
- What approval did it require?
- What channel did it reply to?
- What happened when a tool failed?
- Can the same behavior run from chat, a widget, or a scheduled tick?
- Can we prove this lane is ready before making it default?
The OpenAI Agents SDK gave us a cleaner center of gravity for that.
Why not stay on the Vercel AI SDK forever?
The honest answer is switching has a cost.
This is where status quo bias can trick engineering teams. Staying with a familiar SDK feels safer because the failure modes are known. But known friction is still friction.
For us, the cost of staying looked like this:
- more app-specific wrappers around every tool loop
- more duplicated runtime logic across chat and channels
- more custom approval/resume handling
- more difficulty proving high-risk capability lanes were safe
- more ambiguity about which AI paths were true agent execution and which were utilities
The migration was worth it once the cost of keeping the old shape became larger than the cost of changing it.
What we kept from Vercel AI SDK patterns
We did not throw away everything around the Vercel AI SDK.
The most important retained idea is still excellent: build great AI product interfaces. Users need streaming, responsive UI, stable message transport, and predictable rendering.
So our migration kept the parts that were still doing the right job:
- UI message streaming where it is purely transport
- retained one-shot utilities where a full agent runner is unnecessary
- compatibility fallbacks during rollout
- structured output wrappers where the call site is not operating as an autonomous agent
This matters because migrations fail when they become ideological. We were not trying to win a framework argument. We were trying to make the runtime match the product.
Why OpenAI Agents SDK fit the next stage
Zeiko's product vision is deployable AI agents for small businesses. That means the agent is not a sidebar feature; it is the operating layer.
The OpenAI Agents SDK lined up with five product needs.
Agents need tools, but not every tool should be available in every context.
A support agent, SEO agent, Shopify agent, and social publishing agent need different permissions. The runtime needs to make tool availability visible and enforceable.
2. Handoffs are real product events
An agent should be able to escalate to a human, delegate to a specialist, or resume after a worker finishes. That cannot be treated as just another generated paragraph.
3. Approvals are part of execution
For destructive actions, external publishing, data export, and account mutations, approval is not optional UX. It is part of the runtime contract.
4. State needs one source of truth
The same agent may run from the web app, a website widget, Slack, Telegram, WhatsApp, or a scheduled tick. If each surface carries its own state model, behavior drifts.
5. Rollout evidence matters
The migration had to be provable:
- local smoke tests
- hosted smoke tests
- blocked-lane certification
- default-runtime promotion
- post-flip smoke
- fallback-removal readiness
That evidence is what let us move from "it should work" to "we can merge this without guessing."
What improved after the migration
The most important improvement was not raw model quality. It was operational clarity.
We can now answer questions like:
- Is this lane SDK-backed, retained, or blocked?
- What evidence allowed it to become default?
- Which account/agent cohort is promoted?
- Did post-flip smoke resolve the SDK runtime?
- Is the fallback still present for rollback only?
- Which utilities are intentionally retained?
That clarity compounds. Every future agent feature has a better place to live.
What got harder
The migration was not free.
The hardest parts were not syntax changes. They were product-boundary decisions:
- deciding which paths were true agent runtime versus utility calls
- keeping compatibility fallbacks without letting them become permanent ambiguity
- proving social publishing and external-channel behavior without breaking production
- documenting retained paths so future work does not accidentally reopen the migration
- keeping local evidence strong enough when hosted credits were constrained
This is the part most migration posts skip. The code change is only half the work. The other half is reducing regret risk for the team that has to operate the system after merge.
A practical migration checklist
If you are deciding whether to migrate from Vercel AI SDK patterns to OpenAI Agents SDK, use this checklist.
Stay with lighter AI SDK patterns when:
- the feature is mostly UI streaming
- the model call is one-shot and stateless
- tools are simple and low-risk
- no approval, handoff, or durable runtime state is needed
- failure is easy to retry manually
Move to an agent runtime when:
- tools mutate customer or business data
- work can run from multiple channels
- the agent must pause and resume
- approval is required before execution
- external publishing or integrations are involved
- you need operator-visible evidence and replay
- scheduled/autonomous execution matters
For Zeiko, the second list became the product.
FAQ
Is Vercel AI SDK bad for agents?
No. It is very useful for AI interfaces and app-level AI features. Our issue was not quality; it was fit. Zeiko needed a stronger runtime boundary for deployed agents, tools, approvals, and cross-channel state.
Did Zeiko remove every Vercel AI SDK usage?
No. We retained utility and transport paths where they still fit. The migration focused on production agent execution, not ideological removal.
Why migrate if the old system worked?
Because "works" is not the same as "scales safely." The old shape worked for narrower AI features. It became expensive once agents needed durable state, approvals, external publishing, and multi-channel behavior.
What was the biggest risk?
High-risk capability lanes: publishing, browser/code execution, data export, Shopify operations, GitHub operations, and broad YouTube workflows. We kept those behind certification gates until evidence existed.
What is the lesson for other teams?
Choose the runtime based on the job. Use lightweight AI SDK patterns for lightweight AI features. Use an agent runtime when the product is actually asking an agent to operate a business process.
The takeaway
The migration was not about chasing a newer SDK.
It was about admitting that Zeiko had crossed a product threshold. Once agents can use tools, publish, approve, hand off, remember, retry, and run across channels, the runtime becomes part of the product's trust model.
That is why we moved production agent execution to the OpenAI Agents SDK.
If your team is building AI features, keep the stack simple. If your team is building agents that operate real workflows, invest in the runtime boundary early.
That boundary is where reliability starts.