Real-time collaboration demands real-time infrastructure. Doing web-based, real-time interactions efficiently is a hard problem, but we've found a solution that reduces the number of open connections by a factor of 10 or more. TL;DR: We're multiplexing many tabs' network interactions via a single persistent WebSocket.
What the Heck is Multiplexing?
Wikipedia will tell you that multiplexing is:
In telecommunications and computer networks, multiplexing (sometimes contracted to muxing) is a method by which multiple analog or digital signals are combined into one signal over a shared medium. The aim is to share a scarce resource.
A good metaphor is taking public transport. If every person who takes the subway or the bus were to buy their own car and try to drive from their start point to their end point, the roads would be absolutely clogged with traffic. Instead, we multiplex ourselves. We share underground carriages and trains and buses. As anyone who has ridden a London night bus can attest, it ain't always pretty, but it is highly efficient.
An interesting detail about multiplexing people is that you have to do extra work to accommodate the shared transportation method. If you had your own car, you could drive directly from Point A to Point B. But when you share a public transport method, you have to go to the train station or bus stop. You have to catch the right bus or train or whatever. You have to get off at the right stop. It's more efficient, but it also requires some additional logistics to do it.
What's Our Scarce Resource?
Cord is a super useful addition to any SaaS tool. We add some sweet, sweet collaboration tooling. Real-time chat, presence, page annotations -- among many other useful features. These are the sort of features that make the best software tools in the world so good. Our sidebar integrates directly into the pages our users do their jobs in. Working in Google Analytics? Cord is there. Working in Hubspot? Cord is there, too. In fact, our sidebar adds chat and presence and all the rest to literally every tool your team uses. You get best-of-breed chat with your team, Slack integrations, all the rest -- for every single tool.
That's the good news.
The bad news is that all those tools mean lots and lots of browser tabs. It's not uncommon for our sidebar to be active on ten or more browser tabs, all open at the same time. The backbone of our chat experience is a real-time connection to one of our servers. So, ten browser tabs... ten socket connections because... well... maths. That's not good news at all.
Our users are working folks. They're on reasonably modern laptops or desktops. We could just do one socket connection per tab. In fact, for the first year of our evolution, that's exactly what we did. However, as our user base has grown and as the number of tools we support has expanded, the number of open socket connections has climbed and climbed and climbed. Going back to our public transport metaphor above, this is a lot like many people all starting their journey nearby to one another and ending their journey nearby to one another, but they're all still taking separate cars.
This clogs up our users' machines with redundant socket connections, which eat up additional CPU, RAM and battery power. It also clogs up our servers with lots of connections that are all from the same person. Worst of all, because our users generally only use one browser tab at a time, all these socket connections are mostly doing absolutely nothing. Surely, we can do better!
Yes, We Can! (And don't call me Shirley.)
When you have lots and lots of socket connections open on a single machine, all funnelling traffic to the exact same server, it creates a really exciting opportunity: Multiplexing!
We knew if we could find a smart way to send all the traffic from all these browser tabs through a single persistent socket connection, we could massively reduce the resource use on both sides of the network. This means less CPU and memory and battery usage for our users. It also means less server CPU power and RAM lost to redundant socket connections.
Sounds like a silver bullet, right? But how exactly do you merge the web traffic of many browser tabs together?
Chrome Extension Superpowers
Because Cord interacts with our users' tools via a Chrome extension, we get a set of superpowers that go way beyond the limitations of a standard webpage. One of the coolest things about working with a Chrome extension is that we have the ability to establish a connection via the background script.
The first step in our multiplexing effort was to find a way to create one persistent socket connection. Creating that connection via the background script solves this one. With that connection established, we can use the Chrome Runtime API to send the traffic from every browser tab to our background script. Likewise, each browser tab can listen for incoming events from our background script and respond to the incoming data. Two-way traffic, achieved!
The Next Challenge: Apollo
We use the popular Apollo framework for managing client-server GraphQL interactions. Most commonly, GraphQL queries and mutations are executed via one-off HTTP requests. To make our app as real-time as humanly possible, we've opted for sending data via a WebSocket connection. This reduces the latency on every client-server interaction. Instead of sending queries via simple GET or POST requests, we send and receive data via this persistent socket on each page. For the single-page case, this is pretty straightforward. Our React application knows what requests it has sent and it knows what subscriptions it's listening for.
But what happens if we re-route all of those requests to the background? And when the background script gets new subscription data, how does it know what browser tab to send that new data to? Without some sort of orchestration, this would become a big mess.
Ports into the Background
To untangle the communication between all these browser tabs and pages and one single socket connection, we're leveraging a piece of Chrome extension infrastructure called a Port. These Ports create a connection between a Chrome page or tab and our Chrome extension itself. Using Ports, each page makes a persistent connection to the Chrome extension. Within the background script of our Chrome extension, we funnel all the messages from the Port connections into a single open WebSocket. This is the substance of the multiplex.
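To make the funnel concrete, here is a minimal sketch of the idea, not our production code. `PortLike` and `SocketLike` are simplified stand-ins for `chrome.runtime.Port` and `WebSocket`, and the `Multiplexer` class, the numeric `portId` and the envelope format are all assumptions made up for this illustration.

```typescript
// Minimal stand-ins for chrome.runtime.Port and WebSocket so the sketch runs outside a browser.
interface PortLike {
  onMessage: { addListener(cb: (msg: unknown) => void): void };
}

interface SocketLike {
  send(data: string): void;
}

// Hypothetical wire format: each tab's message is tagged with the Port it came from.
interface Envelope {
  portId: number;
  payload: unknown;
}

class Multiplexer {
  private socket: SocketLike;
  private nextPortId = 1;

  constructor(socket: SocketLike) {
    this.socket = socket;
  }

  // Register one tab's Port and forward everything it sends over the shared socket.
  attach(port: PortLike): number {
    const portId = this.nextPortId++;
    port.onMessage.addListener((payload) => {
      const envelope: Envelope = { portId, payload };
      this.socket.send(JSON.stringify(envelope));
    });
    return portId;
  }
}
```

In a real background script, the wiring would look roughly like `chrome.runtime.onConnect.addListener((port) => mux.attach(port))`, with the multiplexer constructed around the one open WebSocket.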
In this structure, instead of each tab/page making a persistent network connection to one of our servers, that page will make an in-memory connection between the CPU thread running the page and the CPU thread running our background script. On a machine with 10 open tabs, this means 10 lightweight background script connections entirely on that machine, rather than 10 open socket connections stretching across the whole internet including our servers.
This does require some careful housekeeping work to maintain. Communicating between the page and the background script can only be done with data. It would be much simpler to be able to pass a callback function to the background script and have it execute when there is new data, but that's not how the Chrome runtime API works. Instead, it's a lot like communicating across frame boundaries, which means you can only pass data back and forth.
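One common way to live with a data-only boundary is to keep the callbacks on the page and send an identifier instead. The sketch below assumes a hypothetical `PageClient` class and a made-up `{ requestId, payload }` message shape, and it only covers one-shot requests. `PortLike` is a simplified stand-in for `chrome.runtime.Port`.

```typescript
// Simplified stand-in for chrome.runtime.Port.
interface PortLike {
  postMessage(msg: unknown): void;
  onMessage: { addListener(cb: (msg: unknown) => void): void };
}

// Assumed shape of what the background script sends back.
interface Reply {
  requestId: number;
  data: unknown;
}

class PageClient {
  private port: PortLike;
  private nextRequestId = 1;
  // Callbacks never leave the page; only their numeric key crosses the boundary.
  private pending: Record<number, (data: unknown) => void> = Object.create(null);

  constructor(port: PortLike) {
    this.port = port;
    port.onMessage.addListener((msg) => {
      const reply = msg as Reply;
      const callback = this.pending[reply.requestId];
      if (callback) {
        delete this.pending[reply.requestId];
        callback(reply.data);
      }
    });
  }

  // Store the callback locally, then send plain data tagged with its id.
  request(payload: unknown, onData: (data: unknown) => void): number {
    const requestId = this.nextRequestId++;
    this.pending[requestId] = onData;
    this.port.postMessage({ requestId, payload });
    return requestId;
  }
}
```

A long-lived subscription would keep its callback registered until it is explicitly stopped, rather than deleting it after the first reply.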
To keep track of which requests come from which pages, we've implemented a really straightforward reverse mapping. The Chrome runtime API creates a unique identifier for each Port, which we store in a dictionary along with the related objects responsible for sending data back and forth. Using this reverse lookup, it becomes easy to map the response data from the WebSocket connection back to its origin Port. The mapping is also important because it gives the page/tab the ability to stop active GraphQL subscriptions that the background script is maintaining on its behalf.
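Here is a rough sketch of what such a reverse mapping could look like. It is an illustration under assumptions rather than our actual code: the `SubscriptionRouter` class, the string identifiers and the `ServerFrame` shape are all hypothetical, and `PortLike` stands in for `chrome.runtime.Port`.

```typescript
// Simplified stand-in for chrome.runtime.Port.
interface PortLike {
  postMessage(msg: unknown): void;
}

// Assumed shape of a frame arriving on the shared WebSocket.
interface ServerFrame {
  subscriptionId: string;
  data: unknown;
}

class SubscriptionRouter {
  // portId -> Port, and subscriptionId -> owning portId (the reverse lookup).
  private ports: Record<string, PortLike> = Object.create(null);
  private ownerOf: Record<string, string> = Object.create(null);
  private stopOnServer: (subscriptionId: string) => void;

  constructor(stopOnServer: (subscriptionId: string) => void) {
    this.stopOnServer = stopOnServer;
  }

  addPort(portId: string, port: PortLike): void {
    this.ports[portId] = port;
  }

  subscribe(portId: string, subscriptionId: string): void {
    this.ownerOf[subscriptionId] = portId;
  }

  // Route a server frame back to the Port that asked for it.
  deliver(frame: ServerFrame): boolean {
    const portId = this.ownerOf[frame.subscriptionId];
    const port = portId === undefined ? undefined : this.ports[portId];
    if (!port) return false;
    port.postMessage(frame);
    return true;
  }

  // A page asked to stop a subscription held on its behalf.
  unsubscribe(subscriptionId: string): void {
    if (subscriptionId in this.ownerOf) {
      delete this.ownerOf[subscriptionId];
      this.stopOnServer(subscriptionId);
    }
  }

  // A tab went away: stop everything it owned.
  removePort(portId: string): void {
    delete this.ports[portId];
    Object.keys(this.ownerOf)
      .filter((id) => this.ownerOf[id] === portId)
      .forEach((id) => this.unsubscribe(id));
  }
}
```

Cleaning up in `removePort` matters in this design because a closed tab can no longer ask for its subscriptions to be stopped, so the background script has to do it.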
Beyond the Extension
One of the most useful aspects of how we've implemented this architecture is that it gracefully degrades when there is no Chrome extension present. The fallback behaviour is to go back to the socket-connection-per-page approach that we started with. Our application has an API boundary between the network layer and the user-facing React application (largely implemented via React Contexts). The application knows whether or not it has a connection. It has no idea if that is an exclusive WebSocket connection or a Port to the Chrome extension background script. This enables us to switch out the type of network connection we're using seamlessly. It's a big win on code maintenance and separation of concerns as well.
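The shape of that boundary might look something like the sketch below. Everything named here is an assumption for illustration: the `Transport` interface, the two factory functions, and `chooseTransport`, which takes a Port when the extension is available and `null` when it is not. `PortLike` and `SocketLike` are simplified stand-ins for `chrome.runtime.Port` and `WebSocket`.

```typescript
// The only surface the UI layer sees. It cannot tell which implementation it has.
interface Transport {
  send(payload: unknown): void;
  onData(cb: (data: unknown) => void): void;
}

// Simplified stand-ins for chrome.runtime.Port and WebSocket.
interface PortLike {
  postMessage(msg: unknown): void;
  onMessage: { addListener(cb: (msg: unknown) => void): void };
}

interface SocketLike {
  send(data: string): void;
  onmessage: ((event: { data: string }) => void) | null;
}

// Extension present: talk to the background script over a Port.
function portTransport(port: PortLike): Transport {
  return {
    send: (payload) => port.postMessage(payload),
    onData: (cb) => port.onMessage.addListener(cb),
  };
}

// No extension: fall back to a dedicated socket for this page.
function socketTransport(socket: SocketLike): Transport {
  return {
    send: (payload) => socket.send(JSON.stringify(payload)),
    onData: (cb) => {
      socket.onmessage = (event) => cb(JSON.parse(event.data));
    },
  };
}

// The socket is only opened when there is no Port to use.
function chooseTransport(port: PortLike | null, openSocket: () => SocketLike): Transport {
  return port ? portTransport(port) : socketTransport(openSocket());
}
```

Because the rest of the application only ever holds a `Transport`, swapping in a third implementation later would not touch the UI code.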
There is an as-yet unexplored extension to this work. This careful API boundary between the network interaction code and the UI gives us room to go even further with our implementation. In the absence of a browser extension, it's theoretically possible to implement a service worker that takes the place of the background script for multiplexing the network traffic. Implementing this would mean that even when we don't get our Chrome extension superpowers, we can still create a high-quality, low-latency, efficient user experience.
Sound like fun?
If that sounds like work you'd like to dive into, you should give us a shout. We're hiring across the stack for people who want to build tools that help people work together better. Check out our openings here.