Zoom AI Companion Summary via API: Why a Server-to-Server OAuth App Can't Read It (and the VTT Workaround)

The plan looked clean on a whiteboard. A consult call happens on Zoom. AI Companion writes a tidy summary. An automation grabs that summary, hands it to Claude along with the lawyer's notes, drafts a standardized consult note, and uploads it into the right matter in Clio. Two sources in, one document out, hands-free.

Then we tried to read the AI Companion summary from the API, and our Server-to-Server OAuth app couldn't see it.

We build and maintain the open-source Clio and MyCase MCP connectors, so we hit these integration edges as part of the job. This one is worth writing down, because the documentation makes the summary feel like a normal endpoint and it really isn't. If you're wiring Zoom into a CRM or a practice-management tool, this post will save you a day of confusion and one wrong architectural bet.

Can a Server-to-Server OAuth app read the AI Companion summary?

Short answer: not reliably. The read scope for AI Companion meeting summaries is not granted to Server-to-Server (S2S) OAuth apps in the general case. You install an S2S app, you request the scopes you think you need, and the summary scope simply isn't on the menu the way the recording scopes are.

This is easy to miss because S2S OAuth is the obvious choice for this kind of backend automation. There's no human clicking "allow." One admin installs the app once, it's scoped account-wide, and it can pull recordings for any user without an interactive authorization flow. For grabbing cloud recordings and transcripts, S2S is the cleanest path. So you reach for it, wire up the recording calls, and only later discover the AI summary won't come along for the ride.

The takeaway: A Zoom S2S OAuth app is excellent for pulling cloud recordings and VTT transcripts account-wide with no user authorization step. It is the wrong tool if reading the AI Companion summary through the API is a hard requirement. Those are two different app types with two different scope sets.
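To make the single-install model concrete, here is a minimal sketch of the token request an S2S app makes. It only builds the request, so nothing is sent until you choose to. The function name is ours, and the credential values are placeholders you would take from your own app's configuration page.

```python
import base64
import urllib.parse
import urllib.request

ZOOM_TOKEN_URL = "https://zoom.us/oauth/token"


def build_s2s_token_request(
    account_id: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build (but do not send) a Server-to-Server OAuth token request.

    S2S apps use the account_credentials grant: no redirect, no user consent
    screen, just the account ID plus the app's client ID and secret.
    """
    query = urllib.parse.urlencode(
        {"grant_type": "account_credentials", "account_id": account_id}
    )
    # Client ID and secret travel as HTTP Basic credentials.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        f"{ZOOM_TOKEN_URL}?{query}",
        method="POST",
        headers={"Authorization": f"Basic {basic}"},
    )
```

Passing the result to `urllib.request.urlopen` should return JSON containing an `access_token`, which is then sent as a Bearer token on the recording calls. The scopes on that token are whatever the app was granted at install time, which is where the summary read scope turns out to be missing.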

When do you need a General (account-managed) OAuth app instead?

If reading the AI Companion summary through the API is genuinely non-negotiable, you generally need a General (account-managed) OAuth app with the summary read scope granted, not an S2S app. That's the trade-off: a General app supports the broader scope set, but it brings an authorization flow and more setup overhead than the single-install S2S model.

Before you go down that road, ask whether you actually need the summary at all. In our experience, the answer is usually no, for one blunt reason.

The AI Companion summary is fragile even when you can read it

Two properties make the AI summary a poor backbone for any durable automation:

It auto-deletes at 30 days. Zoom expires AI Companion summaries after 30 days. If your pipeline lags, retries after a failure, or processes a backlog, the summary you were counting on may already be gone.
Availability depends on account configuration. Whether the summary exists, and whether your app type can read it, varies by Zoom plan and admin settings. You can't promise a workflow that quietly assumes it'll always be there.

So you end up with a feature that is both hard to reach (wrong app type) and unreliable when you do reach it (auto-deleting, config-dependent). That combination is a clear signal: don't build your automation on it. Treat the AI summary as best-effort. If it's there, great, fold it in. If it isn't, your pipeline should not care.
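One way to express "best-effort" in code is to make the summary an optional input that can fail without failing the run. This is a sketch under our own naming: `build_prompt_inputs` and the `fetch_summary` callable are hypothetical, and the callable stands in for whatever summary lookup your app type allows, if any.

```python
from typing import Callable, Optional


def build_prompt_inputs(
    transcript_text: str,
    lawyer_notes: str,
    fetch_summary: Callable[[], Optional[str]],
) -> dict:
    """Assemble model inputs; the AI Companion summary is strictly optional."""
    try:
        summary = fetch_summary()
    except Exception:
        # Missing scope, expired summary, feature disabled: all of these are
        # treated the same way, as "no summary this time".
        summary = None

    inputs = {"transcript": transcript_text, "lawyer_notes": lawyer_notes}
    if summary:
        inputs["zoom_ai_summary"] = summary
    return inputs
```

The transcript and the notes are always present; the summary key only appears when the lookup actually returned something.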

The VTT workaround: build on the cloud-recording transcript

The reliable backbone is the cloud-recording VTT transcript, not the AI summary. Turn on cloud recording (a paid feature) and enable audio transcription, and Zoom produces a WebVTT transcript of the meeting that you can pull through the API with the recording scopes an S2S app already has.
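As a rough sketch of what "pull the transcript" means in practice, the helper below walks a recordings-list response and picks out the transcript files. The field names (`meetings`, `recording_files`, `file_type`, `download_url`) reflect our understanding of the response shape and should be checked against the current Zoom API reference; the function name is ours.

```python
from typing import Any, Iterator


def iter_transcript_files(recordings_payload: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield one entry per transcript file found in a recordings-list response."""
    for meeting in recordings_payload.get("meetings", []):
        for rec_file in meeting.get("recording_files", []):
            # Assumption: the VTT transcript is listed with file_type "TRANSCRIPT".
            if rec_file.get("file_type") == "TRANSCRIPT":
                yield {
                    "meeting_uuid": meeting.get("uuid"),
                    "topic": meeting.get("topic"),
                    "download_url": rec_file.get("download_url"),
                }
```

Each `download_url` is then fetched with the same Bearer token used for the list call.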

If you specifically wanted the summary for its brevity, the move is simple: pull the full VTT and summarize it yourself. Hand the transcript to whatever model you're already using in the pipeline, with a prompt that produces exactly the structure you want. You get a consistent, controllable summary every time, instead of inheriting whatever Zoom generated and hoping the format holds. For a workflow that has to look identical across many runs, generating your own summary from the transcript is actually the better outcome, not a consolation prize.
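Before handing the transcript to a model, it usually helps to strip the WebVTT scaffolding so the prompt contains only the spoken text. A minimal sketch, assuming a simple transcript with a `WEBVTT` header and blank-line-separated cues; the function name is ours:

```python
def vtt_to_text(vtt: str) -> str:
    """Reduce a WebVTT transcript to its cue text, one line per caption line."""
    lines: list[str] = []
    for block in vtt.replace("\r\n", "\n").strip().split("\n\n"):
        rows = [row for row in block.split("\n") if row.strip()]
        # Skip the file header and any NOTE/STYLE/REGION blocks.
        if not rows or rows[0].startswith(("WEBVTT", "NOTE", "STYLE", "REGION")):
            continue
        for index, row in enumerate(rows):
            if "-->" in row:
                # Everything after the timing line is cue text.
                lines.extend(rows[index + 1:])
                break
    return "\n".join(lines)
```

For example:

```python
sample = (
    "WEBVTT\n\n"
    "1\n00:00:01.000 --> 00:00:04.000\nJane Doe: Thanks for joining.\n\n"
    "2\n00:00:04.500 --> 00:00:06.000\nClient: Happy to be here.\n"
)
print(vtt_to_text(sample))
# Jane Doe: Thanks for joining.
# Client: Happy to be here.
```

The resulting text goes into the summarization prompt alongside whatever structure you want the output to follow.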

What we standardized on: VTT transcript as the source of truth, our own model-generated summary on top of it, and the AI Companion summary treated as an optional extra that's nice to have when the account and timing line up. The transcript is the contract; the summary is a bonus.

Do you need a public server to do any of this?

No, and this is the part that surprises people. Polling GET /users/me/recordings works fully from a local process. There is no public endpoint requirement, no webhook receiver to host, no inbound traffic to expose. A single machine can poll every ten to fifteen minutes, notice a new recording, pull the VTT, and move on.

We had initially assumed the auto-trigger would force us to stand up a hosted server to receive Zoom webhooks. It doesn't. Polling covers the trigger entirely, which matters a lot if your design goal is to keep data local and the moving parts few. You only need a public, push-based webhook setup if you have a specific reason to react in real time, and for meeting-to-CRM work you almost never do.

Plan for the transcript-ready delay

Cloud-recording transcripts are not instant. Budget for roughly twice the meeting length. A 30-minute consult typically has its VTT ready about an hour after the call ends. If you poll the moment the meeting closes and expect a transcript, you'll get nothing and may wrongly conclude something broke.
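The rule of thumb translates into a one-line calculation. In this sketch the function name and the `factor` parameter are ours, and the default of 2.0 is the planning figure from this post rather than anything Zoom guarantees.

```python
from datetime import datetime, timedelta, timezone


def earliest_transcript_check(
    meeting_end: datetime, duration_minutes: float, factor: float = 2.0
) -> datetime:
    """Estimate when it first becomes worth looking for the VTT."""
    return meeting_end + timedelta(minutes=duration_minutes * factor)


# A 30-minute consult that ended at 15:00 UTC: look from about 16:00 UTC.
ended = datetime(2024, 1, 1, 15, 0, tzinfo=timezone.utc)
print(earliest_transcript_check(ended, 30))  # 2024-01-01 16:00:00+00:00
```

Treat the result as the earliest sensible check, not a deadline; a missing transcript before that time is expected, not an error.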

Design around the latency. Poll on an interval, check whether the transcript exists yet, and process it when it lands. The good news is that this delay also removes any argument for real-time webhooks. If the data isn't ready for an hour, a fifteen-minute poll is more than fast enough.
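A single polling pass can be sketched as follows. The `fetch_recordings` callable is a stand-in for your own call to the recordings endpoint, `handle` is whatever processes a ready meeting, and the payload field names are assumptions to verify against the Zoom API reference.

```python
from typing import Any, Callable


def poll_once(
    fetch_recordings: Callable[[], dict[str, Any]],
    seen: set[str],
    handle: Callable[[dict[str, Any]], None],
) -> int:
    """Process meetings whose transcript has landed; return how many were handled."""
    processed = 0
    for meeting in fetch_recordings().get("meetings", []):
        uuid = meeting.get("uuid")
        if not uuid or uuid in seen:
            continue
        has_transcript = any(
            rec_file.get("file_type") == "TRANSCRIPT"
            for rec_file in meeting.get("recording_files", [])
        )
        if not has_transcript:
            # Recording exists but the transcript is not ready yet.
            # Leave it unmarked so the next pass picks it up.
            continue
        handle(meeting)
        seen.add(uuid)
        processed += 1
    return processed
```

Wrapping this in a loop that sleeps ten to fifteen minutes between passes gives you the whole trigger. In a real deployment you would persist `seen` to disk so a restart does not reprocess old meetings.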

Data residency: pull-and-store-fast

If you're handling privileged or otherwise sensitive content, residency matters. Zoom operates a Canadian data center, but AI-content residency is not guaranteed to stay Canada-only. We don't treat "it's probably in the right region inside Zoom" as a control you can rely on.

The pattern we use is pull-and-store-fast: retrieve the transcript (and the AI summary if it happens to be available) as soon as it's ready, write it into a system you control with the residency you need, and stop depending on Zoom to hold the content in a specific region indefinitely. The same logic applies wherever the data flows next. If you send the transcript to a model for summarization, know where that model processes data. For Canadian legal work specifically, note that the major model providers don't all offer a Canadian region, so you may be making a documented cross-border call. That's a decision to make on purpose, with a zero-data-retention arrangement and a clear record, not something to discover after the fact.
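The "store" half of pull-and-store-fast can be as simple as an atomic write into a directory you control. This sketch uses only the standard library; the function name and the filename-sanitizing rule are ours, and the storage location (local disk here) is a placeholder for whatever system meets your residency requirement.

```python
import os
import tempfile
from pathlib import Path


def store_transcript(root: Path, meeting_uuid: str, vtt_bytes: bytes) -> Path:
    """Write a transcript atomically so a crash never leaves a partial file."""
    root.mkdir(parents=True, exist_ok=True)
    # Meeting UUIDs can contain characters such as "/" or "=", so map anything
    # outside a conservative set to "_" before using it as a filename.
    safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in meeting_uuid)
    target = root / f"{safe}.vtt"

    fd, tmp_name = tempfile.mkstemp(dir=root, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(vtt_bytes)
            handle.flush()
            os.fsync(handle.fileno())
        # Same-directory rename, so the final file appears all at once.
        os.replace(tmp_name, target)
    except Exception:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target
```

Once the file is safely stored, the pipeline reads from your copy and no longer depends on where Zoom keeps the original.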

Closing the loop into the CRM

The whole point of pulling a transcript is to do something with it. In our case that's drafting a note and writing it back into the matter. Three things are worth flagging if your destination is a practice-management tool like Clio, MyCase, or PracticePanther.

First, writing a document back usually isn't a single call. Clio's document upload, for example, is a presigned-S3 flow with three calls: you create the document record, PUT the bytes to the returned upload URL, then PATCH the record to mark it fully uploaded. It works well once you've built it, but it's not the one-liner people expect. We cover the broader integration surface in our Clio API integration notes and the Claude + Clio integration work the connectors came out of.
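The create, PUT, PATCH sequence can be sketched like this. Treat every field name here (`latest_document_version`, `put_url`, `put_headers`, `fully_uploaded`, the `parent` shape) as our recollection of Clio's API, to be verified against the current documentation. The `send` callable is a hypothetical HTTP helper you supply; it takes a method, URL, headers, and body, and returns the decoded JSON response.

```python
import json
import urllib.parse
from typing import Any, Callable

Send = Callable[[str, str, dict[str, str], bytes], dict[str, Any]]


def upload_document(
    send: Send, api_base: str, token: str, matter_id: int, name: str, content: bytes
) -> int:
    """Create a document record, upload its bytes, then mark it complete."""
    auth = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    fields = urllib.parse.urlencode(
        {"fields": "id,latest_document_version{uuid,put_url,put_headers}"}
    )

    # 1. Create the record and ask for the presigned upload details.
    create_body = {"data": {"name": name, "parent": {"id": matter_id, "type": "Matter"}}}
    created = send(
        "POST", f"{api_base}/documents?{fields}", auth, json.dumps(create_body).encode()
    )["data"]
    version = created["latest_document_version"]

    # 2. PUT the bytes to storage using exactly the headers the API returned.
    put_headers = {h["name"]: h["value"] for h in version["put_headers"]}
    send("PUT", version["put_url"], put_headers, content)

    # 3. Tell the API the upload finished.
    done_body = {"data": {"uuid": version["uuid"], "fully_uploaded": True}}
    send(
        "PATCH",
        f"{api_base}/documents/{created['id']}?fields=id",
        auth,
        json.dumps(done_body).encode(),
    )
    return created["id"]
```

Injecting `send` keeps the flow testable without network access and lets you put rate-limit handling in one place.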

Second, mind the rate limits on the CRM side. Clio's API should be treated conservatively at around 3 requests per second per app; honor the X-RateLimit-* headers and back off cleanly on a 429. A meeting-to-CRM flow is low-volume, so this rarely bites for a single consult, but it absolutely will if you ever batch a backlog.
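A small retry wrapper covers the 429 case. This sketch assumes the rate-limited response carries a `Retry-After` header in whole seconds and falls back to exponential backoff when it does not; check which headers your CRM actually returns. The `request` callable is hypothetical and returns a `(status, headers, body)` tuple.

```python
import time
from typing import Any, Callable, Mapping

Response = tuple[int, Mapping[str, str], Any]


def call_with_backoff(
    request: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, Any]:
    """Retry a call while it returns HTTP 429, waiting between attempts."""
    for attempt in range(max_attempts):
        status, headers, body = request()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # out of attempts; do not sleep before giving up
        retry_after = headers.get("Retry-After", "")
        # Prefer the server's hint; otherwise wait 1s, 2s, 4s, ...
        wait = float(retry_after) if retry_after.isdigit() else float(2 ** attempt)
        sleep(wait)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

For a backlog run, pairing this with a fixed pause between calls keeps you under the per-second ceiling instead of relying on 429s to slow you down.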

Third, keep a human in the loop on the output. For legal work, professional-responsibility guidance (the Law Society of Ontario's evolving practice guidance and the ABA's Formal Opinion 512 cover the same ground) points to a duty to verify AI-assisted work product. Bake a review gate into the workflow so the drafted note lands in front of the lawyer for approval before it's treated as final. That's not just compliance theater; it's also where the automation earns trust.
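A review gate does not need to be elaborate. The sketch below, with names of our own choosing, makes the upload path refuse any draft that has not been explicitly approved by a named reviewer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DraftNote:
    matter_id: int
    body: str
    approved_by: Optional[str] = None

    def approve(self, reviewer: str) -> None:
        """Record who reviewed the draft."""
        if not reviewer.strip():
            raise ValueError("reviewer name is required")
        self.approved_by = reviewer


def release_for_upload(note: DraftNote) -> str:
    """Return the note body only if a reviewer has signed off."""
    if note.approved_by is None:
        raise PermissionError("draft has not been reviewed")
    return note.body
```

Routing every write-back through `release_for_upload` means an unreviewed draft fails loudly instead of slipping into the matter.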

The short version

If you're wiring Zoom into a CRM and reached for the AI Companion summary, here's the decision in four lines:

A Server-to-Server OAuth app can't reliably read the AI Companion summary. The summary read scope generally needs a General (account-managed) app.
Even with the right app type, the summary auto-deletes at 30 days and depends on account config, so it's a bad backbone.
Build on the cloud-recording VTT transcript instead. Pull it with the recording scopes an S2S app already has, and generate your own summary if you want one.
You don't need a public server. Local polling of the recordings endpoint covers the trigger; just plan for the transcript to be ready about twice the meeting length after the call.

Plenty of teams will get this far with off-the-shelf glue. You can wire much of it together in something like n8n, and that's a perfectly good place to prototype. The parts that tend to need real engineering are the write-back into the CRM, the residency handling, and the human-review gate, which is the work we do. Generic integration shops and firms like Arkenea or Topflight can build apps, but the specific edges here (the S2S scope wall, the 30-day summary expiry, the multi-step Clio upload) are the kind of thing you only know by having hit them.

Frequently Asked Questions

Can a Zoom Server-to-Server OAuth app read the AI Companion meeting summary?

Not reliably. The read scope for AI Companion meeting summaries is not available to Server-to-Server (S2S) OAuth apps in the general case. To read the AI summary through the API you typically need a General (account-managed) OAuth app with the summary scope granted. Even then the summary is best-effort: it auto-deletes after 30 days. For a dependable automation, build on the cloud-recording VTT transcript and treat the AI summary as a bonus when it's available.

What's the difference between S2S OAuth and a General OAuth app for the Zoom API?

Server-to-Server OAuth is a single admin install scoped account-wide, with no end-user authorization flow. It's the cleanest way to pull recordings and transcripts across an account. A General (account-managed) OAuth app supports broader scopes, including the AI Companion summary read scope, but requires an authorization flow and more setup. Use S2S for transcripts; only reach for a General app if the AI summary is a hard requirement.

Do I need a public server to pull Zoom recordings and transcripts?

No. Polling GET /users/me/recordings works fully from a local process and needs no public endpoint or webhook receiver. You can run the entire recording-and-transcript pull on a single machine. A public server is only required if you want push-based webhooks, which most meeting-to-CRM workflows don't need given that the transcript itself isn't ready until roughly twice the meeting length after the call ends.

How long after a Zoom meeting is the VTT transcript ready?

Plan for roughly twice the meeting length. A 30-minute call typically has its cloud-recording VTT transcript available about an hour after the meeting ends. Cloud recording must be on (a paid feature) and audio transcription must be enabled. Design your automation around this delay rather than expecting the transcript the instant the meeting ends.

Where is Zoom AI Companion data processed for Canadian data residency?

Zoom operates a Canadian data center, but AI-content residency is not guaranteed to stay Canada-only. If you have a strict data-residency requirement, the safe pattern is pull-and-store-fast: retrieve the transcript and any AI summary through the API as soon as they're ready, store them in a system you control with the residency you need, and don't rely on the content living in the right region inside Zoom indefinitely.

Next Step

We build Zoom-to-Clio and Zoom-to-MyCase automations for legal teams, on top of the open-source MCP connectors we maintain. If you're trying to turn consult calls into clean, written-back matter notes, and you'd rather not rediscover the S2S scope wall and the 30-day summary expiry the hard way, we're happy to talk architecture.

We offer a free 30-minute review where we'll look at your current Zoom and CRM setup, your residency constraints, and the workflow you want to automate, then tell you honestly what's a config change versus real connector work. No pitch deck, just a technical conversation.

Book a free integration review →
