@danecando

Introduction to Cloudflare Calls

Cloudflare Calls is a new service in beta that was announced during Cloudflare's 2023 Developer Week. At the time of this writing, the documentation for the service is very succinct. If you don't have experience with the WebRTC APIs, it may be a bit challenging to understand on a first pass. I've had the opportunity to chat with some of the team working on this product, and I believe more documentation and content about the service is coming soon.

The Calls API is a pretty low-level one as far as WebRTC service providers go. It has a small surface area and leaves the developer free to build anything from simple to complex real-time media applications without many constraints. The other side of the coin is that building media streaming applications is a lot of work and can be pretty complex. Competing services on the market offer a lower barrier to entry with higher-level APIs and drop-in widgets. Like everything else in software, it's a set of trade-offs to consider. This post will focus on the Cloudflare Calls offering and its pros and cons compared to some of the other services out there.

Cloudflare overview

If you're familiar with Cloudflare, you know they have a pretty incredible suite of services and infrastructure available for websites and other software applications. I think they are best known for the globally distributed network that powers their CDN, web security services, and cloud infrastructure like storage and compute. Their Workers and related services fall into the 'serverless' category, where compute is distributed across the network at a large scale. Most (all?) of their offerings are specifically designed to leverage this global network infrastructure, and Calls is no different.

Here's a simple illustration to highlight how their global network works. This is also commonly referred to as an 'edge' network these days. A user will connect to a Cloudflare service that is nearby to minimize network latency.

Cloudflare Locations

WebRTC basics

WebRTC (Web Real-Time Communication) is a standard that was developed to support real-time communication and media sharing between users on the web. It provides a set of APIs that enable users to establish peer-to-peer connections and transfer data directly to one another with minimal latency.

Here's a simplistic illustration of the WebRTC peer-to-peer connection model.

WebRTC Peer-to-Peer connection

If WebRTC does all the heavy lifting, what do we need a third-party service for?

Peer-to-peer connections work great for a very small number of users, but it doesn't take long for scaling issues to arise with that model: every participant has to send their media to every other participant. There are several different solutions to this problem, and one of the most common is an SFU (Selective Forwarding Unit). An SFU is a server that acts as a central hub for all the connected participants and forwards media streams between them.
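To put some rough numbers on that scaling problem, here's a small sketch that counts streams and connections for a call. It assumes every participant publishes one stream and watches everyone else, and that the call has at least two people. The function and field names are my own, just for illustration.

```typescript
// Load for a call with `n` participants (n >= 2), assuming each participant
// publishes one stream and subscribes to everyone else's.
interface TopologyLoad {
  uploadsPerUser: number;   // streams each user sends
  downloadsPerUser: number; // streams each user receives
  totalConnections: number; // peer connections across the whole call
}

function meshLoad(n: number): TopologyLoad {
  // Full mesh: every user connects to, and uploads a copy to, every other user.
  return {
    uploadsPerUser: n - 1,
    downloadsPerUser: n - 1,
    totalConnections: (n * (n - 1)) / 2,
  };
}

function sfuLoad(n: number): TopologyLoad {
  // SFU: one connection per user and a single upload that the server fans out.
  return {
    uploadsPerUser: 1,
    downloadsPerUser: n - 1,
    totalConnections: n,
  };
}
```

For a 10-person call, the mesh needs 45 connections with 9 uploads per user, while the SFU needs 10 connections with 1 upload per user. The download side is the same in both cases; it's the upload bandwidth and connection count that the SFU takes off the client.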

All of this can get pretty complicated and requires a lot of expertise. If you want to learn more about it, I suggest reading the WebRTC content on MDN. In the rest of the article we will dig into what the Cloudflare Calls service provides for us and what it's doing from a high-level perspective.

Cloudflare + WebRTC = Calls

We needed to touch on the Cloudflare global network to set up the high-level view of how the Calls service works and what it provides. I think the simplest summary is that it's the 'Edge' or 'Serverless' version of SFU/media broadcasting servers. Instead of a centralized server or cluster of servers handling the routing and streaming of media to users, the work is distributed across their global network and media is delivered to each user from a server near them.

From this perspective you have the same sort of advantages you might get from using other types of more traditional serverless/edge compute.

  • Lower latency for the end user
  • Don't need to manage your own servers / infrastructure
  • Automatic scaling

The Calls vs SFUs page in the documentation covers all of this in pretty good detail. In the next section we will walk through the 'How Cloudflare Calls Works' section of that page.

Breaking down 'How Cloudflare Calls Works'

Establishing Peer Connections

To initiate a real-time communication session, an end user’s client establishes a WebRTC PeerConnection to the nearest Cloudflare location.

In the WebRTC section we highlighted how peers connect directly to each other. In a Cloudflare Calls application, instead of making the PeerConnection directly to the other user, it's made between the user and the Cloudflare server running near them.

Cloudflare has a nice diagram of this on their announcement blog post:

Cloudflare Calls diagram

The PeerConnection is established with the Cloudflare node where the user can publish and subscribe to media from the other users in the session. Simple enough, but there's a little bit more involved with this process from our end as developers.

Calls doesn't automatically deliver media from other users when the connection is established. You need your own mechanism that keeps track of all the users in a call room and can provide a list of available tracks to subscribe to. In the demo app this is all coordinated through a WebSocket server.
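As a rough sketch of what that bookkeeping could look like, here's a tiny in-memory room registry. This is not how the demo app is implemented; the `Room` class, its methods, and the `PublishedTrack` shape are all hypothetical names I made up, and a real app would keep this state behind its signaling channel.

```typescript
// Hypothetical in-memory room registry. In a real app this state would live
// behind your signaling channel (a WebSocket server in the demo app).
interface PublishedTrack {
  sessionId: string; // Calls session that published the track
  trackName: string; // name the publisher gave the track
}

class Room {
  // userId -> tracks that user has published
  private readonly tracksByUser = new Map<string, PublishedTrack[]>();

  join(userId: string): void {
    if (!this.tracksByUser.has(userId)) this.tracksByUser.set(userId, []);
  }

  leave(userId: string): void {
    this.tracksByUser.delete(userId);
  }

  publish(userId: string, track: PublishedTrack): void {
    const tracks = this.tracksByUser.get(userId);
    if (!tracks) throw new Error(`unknown user: ${userId}`);
    tracks.push(track);
  }

  // Tracks a given user could subscribe to: everything published by others.
  availableTracksFor(userId: string): PublishedTrack[] {
    const result: PublishedTrack[] = [];
    this.tracksByUser.forEach((tracks, otherId) => {
      if (otherId !== userId) result.push(...tracks);
    });
    return result;
  }
}
```

When a user joins or publishes, the signaling server would broadcast the updated list so each client knows which tracks it can ask Calls to deliver.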

Some of this is described in the 'Signaling and Media Stream Management' section. The PeerConnection between the user and the CF node will deliver the tracks that they request.

The "Calls as a Programmable 'Switchboard'" analogy that they use is great for understanding the Calls programming model.

Sessions, tracks, and the Calls API

The Calls API is pretty unique compared to other solutions that I've seen. It's a small API abstraction of the WebRTC model.

If you read the "Session and Tracks" page, they put an emphasis on distancing themselves from traditional domain concepts like "rooms" and "participants". While this seems nice in theory, I believe that most applications will probably end up with these abstractions one way or another.

With Calls you work with Sessions and Tracks. A Session is the WebRTC PeerConnection that we described above, and a Track is any audio or video stream belonging to a user.

Here's how you might start a session and subscribe to tracks with a couple of calls to the Calls API:

  • POST /apps/{appId}/sessions/new - User joins call, creates a new session
  • POST /apps/{appId}/sessions/{sessionId}/tracks/new - User adds tracks to the session (both their own tracks to publish and the tracks of any peers in the call that they want to subscribe to)
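Here's a minimal sketch of how those two requests could be put together. Only the two paths come from the list above. The base URL, the bearer-token header, and the body field names (`sessionDescription`, `tracks`, `location`, `mid`, `trackName`) are assumptions on my part, and the API may have changed during the beta, so check the current API reference before relying on any of it. The helper names are hypothetical.

```typescript
// ASSUMPTION: base URL, auth header, and body shapes are not from this post.
// Verify them against the current Calls API reference.
const CALLS_BASE_URL = "https://rtc.live.cloudflare.com/v1";

interface CallsRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// A track you publish ("local") or one you pull from another session ("remote").
type TrackRequest =
  | { location: "local"; mid: string; trackName: string }
  | { location: "remote"; sessionId: string; trackName: string };

function buildRequest(path: string, appToken: string, payload: unknown): CallsRequest {
  return {
    method: "POST",
    url: `${CALLS_BASE_URL}${path}`,
    headers: {
      Authorization: `Bearer ${appToken}`, // assumed auth scheme
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  };
}

// User joins the call: create a session from the client's SDP offer.
function newSessionRequest(appId: string, appToken: string, offerSdp: string): CallsRequest {
  return buildRequest(`/apps/${appId}/sessions/new`, appToken, {
    sessionDescription: { type: "offer", sdp: offerSdp },
  });
}

// User adds tracks: their own to publish, and peers' tracks to subscribe to.
function newTracksRequest(
  appId: string,
  appToken: string,
  sessionId: string,
  tracks: TrackRequest[],
): CallsRequest {
  return buildRequest(`/apps/${appId}/sessions/${sessionId}/tracks/new`, appToken, { tracks });
}
```

Each returned object can be handed to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. Keeping the request building separate from the network call makes it easy to unit test, and since the request carries your app token, it should be made from your backend rather than from the browser.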

In upcoming posts we will deconstruct the orange demo application to get a deeper understanding of all the bits and pieces required to develop an app with Calls.

Calls vs other services

We mentioned at the beginning that Calls provides a lower-level API compared to most other similar services out there. Let's highlight some of the scenarios where Calls might be a good fit.

  • Flexibility - If the grab-and-go style solutions won't work for whatever reason(s), Calls can provide the infrastructure you need and then get out of the way so you can build your application.
  • Network - Again, infrastructure. Cloudflare has this in spades. Low-latency and scalability.
  • Cost - Streaming media is expensive. Most other services charge per minute streamed, whereas Calls has a pricing model based on data transferred (per GB streamed). This leaves a lot more room for controlling costs, since you decide how much data each stream uses.
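To make the cost point a bit more concrete, here's a small sketch comparing the two pricing models. The prices are parameters you pass in, not real quotes from Cloudflare or anyone else, and the function names are my own. It uses decimal gigabytes and ignores protocol overhead.

```typescript
// Data transferred by one stream at a given bitrate (kilobits per second)
// over a given duration (minutes), in decimal gigabytes.
function gigabytesStreamed(bitrateKbps: number, minutes: number): number {
  const bits = bitrateKbps * 1000 * minutes * 60;
  return bits / 8 / 1e9;
}

// Per-minute pricing: the bill depends only on how long you stream.
function perMinuteCost(minutes: number, pricePerMinute: number): number {
  return minutes * pricePerMinute;
}

// Per-GB pricing: the bill depends on how much data you actually move.
function perGigabyteCost(bitrateKbps: number, minutes: number, pricePerGb: number): number {
  return gigabytesStreamed(bitrateKbps, minutes) * pricePerGb;
}
```

An hour of streaming at 1000 kbps comes out to 0.45 GB. With per-GB pricing, halving the bitrate (lower resolution, audio-only, pausing video for inactive speakers) halves the bill for that stream, while a per-minute bill stays the same no matter what you do to the stream.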

I'm not a complete shill, so I won't tell you that it's the best option, period, for every use case (especially at this point in time, considering it's a new Beta product). If you have an established business or product and you're confident that you're building something for the long term - I think it will be a solid option.

Building with Calls

Calls is a Beta product at the time of this writing and you can get started building with it by forking their demo application. If you decide to build on Calls currently, it's going to come at the cost of a lot of engineering hours. Building real-time media apps can be complex and time-consuming.

I have spent some time building with Calls myself. Following along with the orange repo, it seems that they have added some additional resources to expand and improve the product. This has made me hesitant to spend too much time working on it without knowing what their plan is. I would like to see a more plug-and-play solution around the service, though.

Summary

Calls is a compelling new WebRTC / real-time media solution powered by Cloudflare's global network and infrastructure. It may not be the right choice for everyone at the moment. In a future post I will dive into the source code of the orange demo application to get a better idea of what a Calls implementation looks like and how it works.