
AI Product Engineer Day by Day
2026. 05. 19. 22:41:53@侯佳林
Issue 02: Token Reckoning — Copilot's Billing Overhaul, Uber's Budget Blowout, and a New Agent Runtime Order
Five developments reshaping how engineers build and pay for AI: GitHub Copilot's token billing switch on June 1 (with the real pricing math), Uber burning its 2026 AI budget in four months on Claude Code, Anthropic's new Managed Agents and CI auto-fix platform with a candid postmortem, Microsoft retiring AutoGen for Foundry Agent Service, and MCP hitting 97M monthly downloads under Linux Foundation governance.
Research Brief
The past few weeks have forced a hard accounting across the AI engineering stack. On the cost side: GitHub is flipping from flat-rate to token billing on June 1, and Uber burned its entire 2026 AI budget before May — both Claude Code-adjacent stories with different lessons. On the capability side: Anthropic shipped a substantial agent platform update, published a candid postmortem on a month of degraded Claude Code quality, and Microsoft formally retired AutoGen in favor of a new unified agent framework. Underneath all of it, MCP crossed 97 million monthly SDK downloads and entered Linux Foundation governance. Five developments worth tracking closely.
Copilot's token switch: the math you need to know before June 1
On April 27, GitHub announced that all Copilot plans will move from premium request units to GitHub AI Credits on June 1, 2026. 1
The core mechanism: credits are consumed by token usage (input + output + cached tokens) at per-model rates. Code completions and next-edit suggestions stay unlimited and don't touch the credit bucket. After the promotional period, this is what the individual lineup looks like:
| Plan | Price | Base credits | Flex allotment | Total included |
|---|---|---|---|---|
| Pro | $10/month | $10 | $5 | $15 |
| Pro+ | $39/month | $39 | $31 | $70 |
| Max (new) | $100/month | $100 | $100 | $200 |
The "flex allotment" is the variable part: GitHub can adjust it as model pricing changes. The base credits are locked 1:1 to subscription price. 2
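To make the table concrete, here is a minimal sketch of how far each plan's included credits might stretch. The plan figures come from the table above. The per-million-token rates and the session shape are hypothetical placeholders chosen for round numbers, not GitHub's published per-model rates.

```python
from dataclasses import dataclass

# Included monthly credits in USD, taken from the plan table above.
PLANS = {
    "Pro": {"base": 10.0, "flex": 5.0},
    "Pro+": {"base": 39.0, "flex": 31.0},
    "Max": {"base": 100.0, "flex": 100.0},
}


@dataclass(frozen=True)
class ModelRate:
    """USD per one million tokens. Any values passed in are HYPOTHETICAL."""
    input_per_m: float
    output_per_m: float
    cached_per_m: float


def included_credits(plan: str) -> float:
    """Base credits plus flex allotment for a plan."""
    p = PLANS[plan]
    return p["base"] + p["flex"]


def session_cost(rate: ModelRate, input_tokens: int,
                 output_tokens: int, cached_tokens: int) -> float:
    """Credit cost of one session: each token class billed at its own rate."""
    total = (input_tokens * rate.input_per_m
             + output_tokens * rate.output_per_m
             + cached_tokens * rate.cached_per_m)
    return total / 1_000_000


def sessions_per_month(plan: str, cost_per_session: float) -> int:
    """Whole sessions that fit inside the plan's included credits."""
    return int(included_credits(plan) // cost_per_session)


# Hypothetical rate card and agent session, for illustration only.
EXAMPLE_RATE = ModelRate(input_per_m=5.0, output_per_m=25.0, cached_per_m=0.5)
EXAMPLE_COST = session_cost(EXAMPLE_RATE, 200_000, 20_000, 1_000_000)
```

Under these assumed rates, a session with 200k input, 20k output, and 1M cached tokens costs $2.00, so Pro covers 7 such sessions a month, Pro+ covers 35, and Max covers 100. Swapping in real rates from an April usage report would give an actual forecast.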
For Business and Enterprise teams, the billing pools credits across the entire org (eliminating stranded per-seat waste) and adds admin budget controls at the enterprise, cost-center, and user level. 3
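The effect of pooling can be shown with a small arithmetic sketch. The per-seat allotment and the usage figures below are hypothetical, and the functions only model the idea of stranded credits, not GitHub's actual billing logic.

```python
from typing import Sequence


def per_seat_overage(usage: Sequence[float], allotment: float) -> float:
    """Overage when each seat can only spend its own allotment."""
    return sum(max(0.0, u - allotment) for u in usage)


def stranded_credits(usage: Sequence[float], allotment: float) -> float:
    """Credits left unused on light seats under per-seat accounting."""
    return sum(max(0.0, allotment - u) for u in usage)


def pooled_overage(usage: Sequence[float], allotment: float) -> float:
    """Overage when all seats draw from one shared org-wide pool."""
    pool = allotment * len(usage)
    return max(0.0, sum(usage) - pool)


# Hypothetical team: three light users and one heavy agent user, in USD.
TEAM_USAGE = [5.0, 10.0, 60.0, 1.0]
SEAT_ALLOTMENT = 20.0  # assumed value, not a published plan figure
```

With these numbers, per-seat accounting strands $44 of credits while still producing $40 of overage from the one heavy user. The pooled model absorbs the same usage with no overage, because total usage of $76 fits inside the $80 pool.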
The practical tradeoff: if you're running longer agent sessions or switching to more capable models like Opus, your effective credit consumption per session rises fast. The fallback that previously degraded you to a cheaper model when you hit limits is gone. That's a real change in how budget exhaustion behaves. Teams that haven't run an April usage report should do so before June 1 — GitHub published those reports specifically to help admins forecast what the new system costs at current consumption levels. 4
Uber burned its 2026 AI budget in four months
Reports surfacing across Reddit and X in late April indicate that Uber exhausted its full-year AI budget before May, driven primarily by Claude Code adoption across its engineering organization. 5
The mechanism is straightforward even if the scale surprised Uber's finance team. Claude Code is priced by consumption, so the tokens spent on complex, large-context codebase tasks add up across hundreds of engineering teams working at once. Usage that looks modest in a pilot looks very different when it runs around the clock at company scale.
This isn't a governance failure unique to Uber. Consumption-based AI pricing is genuinely hard to forecast when the underlying tool is good enough to drive organic, enthusiastic adoption. The same dynamic played out during early cloud compute sprawl. What's notable here is that Uber is one of the more cost-disciplined tech companies in the sector after years of post-growth-era rationalization. If its controls couldn't contain the spend, the pattern is likely to repeat at similarly disciplined organizations.
For engineering leaders, the Uber episode is a concrete argument for building per-project token attribution and budget gates before broad rollout — not after. Claude Code running in CI/CD pipelines without visibility into which team is generating which costs is a structurally different problem than a single developer's seat license. 6
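A minimal sketch of what per-project attribution with a budget gate could look like follows. The `TokenLedger` class, the project names, and the token budgets are all hypothetical. A real deployment would persist usage somewhere durable and read actual token counts from the provider's usage metadata.

```python
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a job would push a project past its token budget."""


@dataclass
class TokenLedger:
    # project name -> monthly token budget (hypothetical values)
    budgets: dict[str, int]
    # project name -> tokens consumed so far this period
    used: dict[str, int] = field(default_factory=dict)

    def check(self, project: str, estimated_tokens: int) -> None:
        """Gate: call before starting an agent job in CI."""
        if project not in self.budgets:
            raise BudgetExceeded(f"no budget configured for {project!r}")
        projected = self.used.get(project, 0) + estimated_tokens
        if projected > self.budgets[project]:
            raise BudgetExceeded(
                f"{project!r} would use {projected} of "
                f"{self.budgets[project]} budgeted tokens"
            )

    def record(self, project: str, tokens: int) -> None:
        """Attribution: call after the job with the actual token count."""
        self.used[project] = self.used.get(project, 0) + tokens

    def remaining(self, project: str) -> int:
        return self.budgets[project] - self.used.get(project, 0)
```

The key design choice is that unattributed work fails closed: a job with no configured project budget is rejected rather than silently billed to a shared account.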
Anthropic's agent platform: new capabilities, and a month of degraded quality explained
At "Code with Claude 2026" in May, Anthropic announced several meaningful additions to the Claude Code/agent platform: 7
- Managed Agents: composable APIs that decouple agent execution (tools, sandboxed code execution, checkpoints, credentialing) from the "brain" — targeting the infrastructure bottleneck rather than model intelligence gaps
- Routines: async automation triggered by cron schedules, GitHub webhooks, or API endpoints, so Claude can run overnight and produce mergeable PRs
- CI auto-fix: Claude automatically reproduces failing tests and opens PRs only when it can make the regression test pass on the fix branch and fail on the old version
- Remote agents: sessions can move between devices (including mobile)
- Desktop GUI: full-screen interface with inline diff comments and auto-generated table of contents
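The CI auto-fix acceptance rule in the list above (the regression test must fail on the old version and pass on the fix branch) can be sketched as a small predicate. This only illustrates the rule as described, not Anthropic's implementation. The `run_test` callable and the ref names are hypothetical.

```python
from typing import Callable


def should_open_pr(run_test: Callable[[str], bool],
                   base_ref: str, fix_ref: str) -> bool:
    """Return True only if the regression test proves the fix.

    run_test(ref) is assumed to check out `ref`, run the regression
    test, and return True when the test passes.
    """
    fails_on_base = not run_test(base_ref)  # test reproduces the bug
    passes_on_fix = run_test(fix_ref)       # fix makes the test pass
    return fails_on_base and passes_on_fix


def fake_runner(results: dict[str, bool]) -> Callable[[str], bool]:
    """Stand-in runner for demonstration: looks up a canned result."""
    return lambda ref: results[ref]
```

Requiring the failure on the old version matters as much as the pass on the new one. A test that passes on both refs proves nothing about the fix.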
The capability headline: Opus 4.7 is at 87% on SWE-bench Verified, up from 62% for Sonnet 3.7 a year ago. Anthropic also cited 80x annualized revenue and usage growth in Q1 2026, against an internal target of 10x. 7
These announcements came alongside a detailed postmortem covering a rough March–April period. Three independent bugs degraded quality for Claude Code, Agent SDK, and Claude Cowork users: 8
- On March 4, default reasoning intensity was quietly downgraded from `high` to `medium` to fix UI latency. Wrong tradeoff. Most users don't change defaults.
- On March 26, a caching optimization bug caused the model to clear its reasoning history every turn after the first idle-session trigger, leading to apparent amnesia, repeated tool call errors, and accelerated credit consumption.
- On April 16, a system prompt change added to reduce verbosity in Opus 4.7 combined with other prompt changes to reduce code quality by 3% on internal benchmarks.
All three were fixed by April 20, and usage limits were reset for all subscribers on April 23. The postmortem is worth reading as an example of how to write one: it names specific dates, version numbers, and internal evaluation scores, and the remediation section includes structural changes (requiring employees to use release builds rather than test builds, stricter staged rollouts for any change that could affect reasoning intensity). 8
Microsoft retires AutoGen, ships Foundry Agent Service
Microsoft deprecated AutoGen on April 7, 2026, moving it to maintenance mode: bug fixes and security patches only, no new features. The replacement is a two-part stack. 9
Microsoft Agent Framework (public preview, .NET + Python): the unified SDK that merges AutoGen's multi-agent research capabilities with Semantic Kernel's production patterns. Core additions over AutoGen include graph-based workflows, streaming, checkpoints, human-in-the-loop, native MCP/A2A/OpenAPI support, and pluggable memory. Microsoft provides an official AutoGen migration guide. 9
Microsoft Foundry Agent Service: the managed hosting layer. Handles agent runtime, scaling, identity (Microsoft Entra with per-agent identity and RBAC), observability (end-to-end tracing + Application Insights), and built-in content filtering. It supports three agent types: prompt-based (no code, fully managed), workflow-based (declarative YAML, preview), and hosted agents (bring your own code, containerized, preview). 10
The A2A protocol support is notable: agents published to the Foundry Agent Service can be discovered and called by other agents via A2A, which means Microsoft is positioning its runtime as an enterprise-grade participant in the inter-agent protocol layer that MCP and A2A are defining.
For teams currently running AutoGen: the migration guide exists, but the new framework is still in preview and the hosted agent tier (closest equivalent to running complex AutoGen workflows) is also preview. This is a real migration, not a rename.
MCP at 97 million downloads: Linux Foundation governance and the 2026 roadmap
In December 2025, Anthropic donated MCP to the Agentic AI Foundation (AAIF), a directed fund under the Linux Foundation co-founded by Anthropic, Block, and OpenAI. By March 2026, the SDK was at 97 million monthly downloads across TypeScript and Python — up from roughly 2 million at launch, a 4,750% increase in 16 months. There are now 9,400+ public MCP servers, with private/enterprise deployments estimated at 3–4x that number. 11
The 2026 roadmap, published in March by lead maintainer David Soria Parra, defines four priority areas — no longer structured around release milestones:
Transport evolution: make Streamable HTTP stateless across multiple server instances so it behaves correctly behind load balancers. This is the single most-requested production fix. Add scalable session handling (so server restarts don't destroy context) and MCP Server Cards (a `.well-known` URL for server metadata discovery without needing a live connection).
Agent communication: async task primitives so agents can launch work and retrieve results later, which is critical for long-running, non-blocking workflows. Streaming for agents that need to process large outputs incrementally.
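The launch-now, retrieve-later pattern behind async task primitives can be illustrated with a small in-process sketch. This is not the MCP SDK or the proposed spec. `TaskRegistry`, its method names, and the status strings are hypothetical stand-ins built on Python's `asyncio`.

```python
import asyncio
import uuid
from typing import Any, Coroutine


class TaskRegistry:
    """Hypothetical task store: launch work, poll status, fetch result."""

    def __init__(self) -> None:
        self._tasks: dict[str, asyncio.Task] = {}

    def launch(self, coro: Coroutine[Any, Any, Any]) -> str:
        """Start work in the background and return a handle immediately.

        Must be called from inside a running event loop.
        """
        task_id = uuid.uuid4().hex
        self._tasks[task_id] = asyncio.create_task(coro)
        return task_id

    def status(self, task_id: str) -> str:
        task = self._tasks[task_id]
        if not task.done():
            return "working"
        if task.cancelled():
            return "cancelled"
        return "failed" if task.exception() else "completed"

    async def result(self, task_id: str) -> Any:
        """Wait for the task to finish and return its value."""
        return await self._tasks[task_id]


async def slow_job(n: int) -> int:
    await asyncio.sleep(0.01)  # stands in for long-running tool work
    return n * 2


async def demo() -> tuple[str, int, str]:
    registry = TaskRegistry()
    task_id = registry.launch(slow_job(21))  # returns without blocking
    before = registry.status(task_id)
    value = await registry.result(task_id)   # retrieve the result later
    return before, value, registry.status(task_id)
```

Running `asyncio.run(demo())` shows the caller holding only a task ID while the work proceeds. A protocol-level version would also need the ID to survive reconnects and server restarts, which is where the session-handling work in the transport item comes in.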
Governance maturation: a formal contributor ladder (community participant → WG contributor → lead maintainer → core maintainer), a delegation model so mature Working Groups can accept spec changes without full core-team review, and quarterly-reviewed charters for every Working Group.
Enterprise readiness: standardized audit trails, OAuth 2.1 with PKCE for browser agents, SAML/OIDC for enterprise identity providers, gateway behavior standards, and configuration portability. Most of this work lands as extensions rather than core spec changes. 11
The audit trail and authentication gaps are the most immediate blockers for regulated-industry deployments. Both are acknowledged in the roadmap, but neither has shipped yet. Teams building for compliance-sensitive environments need to implement structured tool-call logging themselves until the standard lands.
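As a stopgap, structured tool-call logging can be as simple as wrapping each tool function so every invocation emits one JSON line. The sketch below is a minimal illustration under stated assumptions. The field names, the `audited` helper, and the choice to hash arguments rather than store them are hypothetical and not part of any MCP standard.

```python
import hashlib
import io
import json
import time
from typing import Any, Callable, TextIO


def audited(tool_name: str, fn: Callable[..., Any],
            sink: TextIO, actor: str) -> Callable[..., Any]:
    """Wrap a tool so each call appends one JSON record to `sink`."""

    def wrapper(**kwargs: Any) -> Any:
        started = time.perf_counter()
        # Hash the arguments so the log proves what was sent
        # without storing potentially sensitive payloads.
        args_blob = json.dumps(kwargs, sort_keys=True, default=str)
        record = {
            "ts": time.time(),
            "actor": actor,
            "tool": tool_name,
            "args_sha256": hashlib.sha256(args_blob.encode()).hexdigest(),
            "status": "ok",
        }
        try:
            return fn(**kwargs)
        except Exception as exc:
            record["status"] = "error"
            record["error"] = type(exc).__name__
            raise
        finally:
            elapsed = time.perf_counter() - started
            record["duration_ms"] = round(elapsed * 1000, 3)
            sink.write(json.dumps(record) + "\n")

    return wrapper


def demo_log() -> list[dict]:
    """Run one successful and one failing call and return the records."""
    sink = io.StringIO()
    add = audited("add", lambda a, b: a + b, sink, actor="agent-1")
    div = audited("div", lambda a, b: a / b, sink, actor="agent-1")
    add(a=1, b=2)
    try:
        div(a=1, b=0)
    except ZeroDivisionError:
        pass
    return [json.loads(line) for line in sink.getvalue().splitlines()]
```

Writing the record in `finally` means failed calls are logged too, which is usually what an auditor wants to see. In production the sink would be an append-only store rather than an in-memory buffer.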
Next issue: watching for LangGraph 0.3 production release details, GitHub Copilot's first post-billing usage reports, and whether Foundry Agent Service's hosted-agent tier exits preview.
References
- 1. GitHub Copilot is moving to usage-based billing
- 2. Introducing flex allotments in Pro and Pro+, and a new Max plan
- 3. GitHub moves Copilot to usage-based billing as AI coding costs climb
- 4. April reports are now available to prepare for usage-based billing
- 5. Uber has burned through its entire 2026 AI budget in four months
- 6. Agentic Token Explosion: How to Attribute, Budget, and Control LLM Costs in CI/CD
- 7. Anthropic's Code with Claude Announces Managed Agents, Proactive Workflows, Capability Curve
- 8. An update on recent Claude Code quality reports
- 9. Deep Dive into Microsoft Agent Framework for AutoGen Users
- 10. What is Microsoft Foundry Agent Service?
- 11. The future of MCP: 2026 roadmap, enterprise adoption, and what comes next