When we set out to build device-sync in Muse, we had one strict requirement: it should be local-first.
Most apps work by storing all of your data on a server and sending only small pieces of it to each device as you use it. Then, if a document is modified on two devices at once, causing a conflict, the server mediates and chooses the winner.
With local-first sync, the world gets much more complicated. Every device you use contains a full copy of your data. If you edit the same document from multiple devices, there is no server to mediate which device wins. Instead, both devices get a full copy of each other’s changes, and they need to independently resolve to the same state.
The Sync Protocol
When building the sync framework for Muse, we want to support not only Muse’s current feature set but also any features we may build in the future. Ideally, implementing new features in Muse won’t require new development on the sync protocol.
Sync is hard. There’s no way around it: it’s a tough problem to solve, so an important priority for us is finding ways to simplify it. And our final solution is a bit counter-intuitive in this regard: we decided to build a sync protocol that is application-agnostic.
It sounds like building sync for only Muse would be simpler than building a general sync engine, but the opposite is true. Thinking about the general case has forced us to develop a simple protocol based around tiny atomic changes – which we’ll get to soon.
Importantly, this lets us iterate on the sync protocol without making any changes to Muse logic, and also allows us to build new Muse features without needing to add to the sync protocol at all. This decoupling simplifies development dramatically: an engineer working on Muse features doesn’t need to know the intricacies of the sync engine, and an engineer working on sync doesn’t need to know the details of a particular Muse feature implementation. This reduces both complexity and bugs.
Last-Write-Wins
Since the server won’t mediate which device’s changes win in the case of a conflict, each device needs to resolve conflicts itself. And since all devices might not be online at the same time, it’s important for every device to resolve any given conflict to the exact same resolution. We’ve decided to use the very simple last-write-wins method: in the case of a conflict, whichever device made the edit most recently wins, and its edits are used.
We use a hybrid logical clock (HLC) to determine the order of events between devices. These are built from the device’s wall clock, a counter, and the device id. We also considered a vector clock, which has some benefits over an HLC: specifically, it can differentiate between edits that were made sequentially and edits that were made in parallel. In the case of a conflict, that would give us the option to ask the user to choose between the two conflicting paths.
We decided on HLCs for a few reasons: vector clocks grow in size over time, while HLCs can be a fixed size. And our edits will be so small that asking the user to choose between conflicts is never an interaction we’d want to build into Muse, so that benefit is marginal at best.
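The post doesn’t show the clock itself, but the shape described above – wall clock, counter, device id – can be sketched roughly like this. All names and details below are assumptions for illustration, not Muse’s actual implementation:

```swift
// A minimal sketch of a fixed-size hybrid logical clock.
struct HLC {
    var millis: UInt64    // physical wall-clock component
    var counter: UInt32   // logical counter for same-millisecond events
    let deviceId: String  // final tiebreaker, so no two clocks compare equal

    // Lexicographic ordering: wall time, then counter, then device id.
    func isBefore(_ other: HLC) -> Bool {
        if millis != other.millis { return millis < other.millis }
        if counter != other.counter { return counter < other.counter }
        return deviceId < other.deviceId
    }

    // Advance for a local event: follow the wall clock when it moves
    // forward, otherwise bump the counter so ordering still advances.
    mutating func tick(now: UInt64) {
        if now > millis {
            millis = now
            counter = 0
        } else {
            counter += 1
        }
    }
}
```

Because the device id breaks any remaining tie, two devices can never produce clocks that compare equal, which is exactly what a deterministic last-write-wins rule needs.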
Another option would have been to use something like Operational Transforms. This method requires tight coupling to the application data structure itself. The number of transforms can grow enormous for large data structures, and these are significantly more complicated to reason about compared to last-write-wins.
For the above reasons, we decided to use fixed-size HLCs, small data changes, and a last-write-wins methodology. This gives us an application-agnostic sync strategy that’s very easy to reason about.
Our final strategy was inspired by the work many of the Muse team did while at Ink & Switch, James Long’s work on the Actual Budget app, and the Datomic distributed database.
Atomic Changes
In order to choose the most recent write to win, we needed a definition of what actually constitutes a write. We wanted each distinct write to be as small as possible, so that any conflicts that arise affect the smallest piece of the data structure. We want the inevitable conflicts to be small and self-healing.
The smallest change to the data model would be changing the value of a single attribute of a single object, so at a minimum our change tracking should record:
Object Id | Attribute | Value | Clock |
---|---|---|---|
UUID object identifier | String attribute name | Int, Double, String, or Binary | Hybrid Logical Clock |
We also need to know which device sent the changes. If a device is lost or stolen, we’d like the ability to ignore that device’s changes after some given clock value. We also plan for encryption, so we’d need to know which encryption keys to use to encode/decode a given message, as each device would have its own key pair.
The ability to add sharing and collaboration is on our minds as well. We want to provide an easy way for a user to share some portion of their data with someone else, without needing to share their entire corpus or share every application object individually. So we introduced the concept of Scopes. A Scope holds any number of objects, is identified by a UUID, and in practice maps 1:1 to a Muse document. This also means that each object in the database is now addressed by both its Scope Id and Object Id.
So our final model for an atomic change is:
Scope Id | Object Id | Attribute | Value | Clock | Device Id |
---|---|---|---|---|---|
UUID | UUID | String name | Int, Double, String, or Binary | HLC | UUID |
Each row in the database is called an `Atom`, and is the smallest atomic change that can be made in the sync engine. Each device sends its list of `Atoms` to other devices, and the most recent change per `scopeId`, `objectId`, and `attribute` defines the `value` for that property.
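To make the last-write-wins rule over Atoms concrete, here is a small sketch of the resolution step. The field names and types are assumptions for illustration – the value is simplified to a String and the clock to a plain integer rather than a full HLC:

```swift
// Simplified Atom shape for illustration only.
struct Atom {
    let scopeId: String
    let objectId: String
    let attribute: String
    let value: String
    let clock: UInt64
    let deviceId: String
}

// Last-write-wins: for each (scopeId, objectId, attribute) key, keep
// only the atom with the highest clock. Every device running this over
// the same set of atoms resolves to the same state, regardless of the
// order in which the atoms arrived.
func resolve(_ atoms: [Atom]) -> [String: Atom] {
    var winners: [String: Atom] = [:]
    for atom in atoms {
        let key = "\(atom.scopeId)/\(atom.objectId)/\(atom.attribute)"
        if let current = winners[key], atom.clock <= current.clock {
            continue  // an equal-or-newer write already won
        }
        winners[key] = atom
    }
    return winners
}
```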
Mapping the Sync-Schema to Swift
Pre-sync, Muse was built on top of CoreData. This gave us numerous advantages, the first being that the Sqlite schema used by CoreData is automatically translated to Swift classes. We don’t need to manually query Sqlite and populate objects ourselves; all of that database logic is handled under the hood by CoreData.
Property wrappers in Swift let us do something similar for our own sync engine. Below is an example class definition for a sync-backed object type.
```swift
class Board {
    let scopeId: UUID
    let objectId: UUID

    @Merged var title: String
}

class Card {
    let scopeId: UUID
    let objectId: UUID

    @Merged var position: CGPoint
    @Merged var size: CGSize
    @Merged var board: Board
}
```
The `@Merged` property wrapper automatically saves/loads its property to/from the Sqlite database. The `@Merged` wrapper requires that the property’s type implement a protocol to facilitate that transformation – the property should be translated to/from an Int, Double, String, or opaque binary data. For the String title, if device `A` at time `1` makes the following change: `myBoard.title = "some value"`, that will generate the sync change below:
Scope Id | Object Id | Attribute | Value | Clock | Device Id |
---|---|---|---|---|---|
scopeId | objectId | “title” | “some value” | 1 | A |
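The post doesn’t show `@Merged` itself, but a property wrapper with this general shape could intercept reads and writes. Everything below is an assumption-laden skeleton, not the real implementation – the persistence steps are only marked with comments:

```swift
// A skeletal @Merged-style wrapper. A real implementation would emit
// Atoms on set and read merged values from Sqlite on get.
@propertyWrapper
struct Merged<Value> {
    private var value: Value

    var wrappedValue: Value {
        get {
            // Real implementation: load the winning Atoms for this
            // attribute from Sqlite and decode them into Value.
            value
        }
        set {
            value = newValue
            // Real implementation: generate Atoms for newValue (one per
            // primitive subpart) and persist them with a fresh clock.
        }
    }

    init(wrappedValue: Value) {
        self.value = wrappedValue
    }
}
```

This is what lets application code write plain Swift assignments like `myBoard.title = "some value"` while the wrapper produces the sync changes behind the scenes.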
This change arrives in Sqlite as a row in the database. For cards, their `position` and `size` properties are more complicated than the board’s `title` property, as they contain two Doubles instead of one String. While we could store this structure in a single change inside a binary blob, it’s much easier and clearer to store each subpart separately.
To support such a world, compound Swift types such as `CGPoint` and `CGSize` are each composed of two `CGFloat` atomic changes. So setting `myCard.size = CGSize(width: 200, height: 200)` will generate the following two Atoms:
Scope Id | Object Id | Attribute | Value | Clock | Device Id |
---|---|---|---|---|---|
scopeId | objectId | “size.width” | 200 | 2 | A |
scopeId | objectId | “size.height” | 200 | 3 | A |
Since each Swift type can either be a single atomic row or be composed of multiple atomic rows, we can define arbitrary Swift data structures that sync atomically between devices, and the property wrapper handles the translation between Swift and Sqlite automatically.
Those changes can even be built up by multiple layers of Swift types, with `CGRect` being a simple example. The following pseudocode describes how changes for complex types can be built from changes to simpler types.
```swift
// pseudocode for simplicity, as attribute names would be
// separated with '.' for clarity

// CGFloats return a single atomic change
CGFloat: AtomicChanges {
    func changes(for prefix: String) -> [Atom] {
        return [prefix + propertyName: self]
    }
}

// CGPoint and CGSize return two changes - the changes from each CGFloat part
CGPoint: AtomicChanges {
    func changes(for prefix: String) -> [Atom] {
        return x.changes(for: prefix + "x") + y.changes(for: prefix + "y")
    }
}

CGSize: AtomicChanges {
    func changes(for prefix: String) -> [Atom] {
        return width.changes(for: prefix + "width") + height.changes(for: prefix + "height")
    }
}

// CGRect is built from 4 changes, 2 from each of its origin and size
CGRect: AtomicChanges {
    func changes(for prefix: String) -> [Atom] {
        return origin.changes(for: prefix + "origin") + size.changes(for: prefix + "size")
    }
}
```
If we had defined `Card` with a single `frame` property, its `CGRect` type would be built from `CGPoint` and `CGSize` types, which in turn build `CGFloat` `Atoms`:
```swift
class Card {
    let scopeId: UUID
    let objectId: UUID

    @Merged var frame: CGRect
    @Merged var board: Board
}
```
Then we could edit a card with `myCard.frame = CGRect(x: 125, y: 145, width: 200, height: 300)`, which would generate the following four Atoms:
Scope Id | Object Id | Attribute | Value | Clock | Device Id |
---|---|---|---|---|---|
scopeId | objectId | “frame.origin.x” | 125 | 4 | A |
scopeId | objectId | “frame.origin.y” | 145 | 5 | A |
scopeId | objectId | “frame.size.width” | 200 | 6 | A |
scopeId | objectId | “frame.size.height” | 300 | 7 | A |
To load an object, we can query Sqlite for all of its rows, grouped by `scopeId`, `objectId`, and `attribute`, taking the row with the max `clock` value in each group. The property type can then load its value in a similarly recursive way, decoding the `Atoms` into the final Swift value.
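Concretely, Sqlite’s documented bare-column behavior – in an aggregate query using `MAX()`, bare columns take their values from the row holding the maximum – means a single group-by query can fetch an object’s winning rows. The table and column names below are guesses, not the actual schema:

```swift
// Hypothetical schema: one `atoms` table with the columns from the
// tables above. For each attribute of the object, Sqlite returns the
// bare `value` column from the row holding MAX(clock) - i.e. the
// last write wins.
let loadObjectSQL = """
    SELECT attribute, value, MAX(clock) AS clock
    FROM atoms
    WHERE scope_id = ? AND object_id = ?
    GROUP BY attribute;
    """
```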
Referencing another object will simply save that object’s scope and object ids as the binary value. So setting `myCard.board = myBoard` will generate the following two `Atoms`:
Scope Id | Object Id | Attribute | Value | Clock | Device Id |
---|---|---|---|---|---|
scopeId | objectId | “board.scopeId” | board’s scopeId data | 8 | A |
scopeId | objectId | “board.objectId” | board’s objectId data | 9 | A |
This simple Sqlite schema, storing `Atoms` in a table and translating them to Swift types with property wrappers, gives us enormous flexibility in our application’s data structures while keeping the sync logic itself very simple.
The Sync Foundation
Now we have a flexible and powerful local-first sync framework. The synced data is composed of `Atoms` – application-agnostic atomic changes – and from these small changes each client can resolve to the exact same state as every other client, thanks to the hybrid logical clock.
From this list of changes, we can build up complex application-specific data types, and Swift property-wrappers let us hide the underlying sync implementation from the application logic.
While this post describes the structure of the synced changes into `Atoms`, it doesn’t describe how those `Atoms` are actually synced across devices. I may cover this in a future post, but the important piece is done – defining what it is that actually needs to be synced. The sync itself can happen over any number of pathways – peer to peer, a central server, flag semaphore, or morse code. No matter the pathway, all synced devices will agree on the state of the application, thanks to the simple sync data structure and hybrid logical clocks.
Local First at Muse
If you’ve read this far, then you’re a good fit for Muse – we’re hiring a local-first engineer!