Why are we refactoring THORN?

July 12, 2022

In November 2021, THORN opened a closed internal test with more than seven hundred users participating.

Many users praised THORN's emphasis on interaction design and the writing experience, but a number of them complained that THORN worked poorly when the network was unstable. Our first reaction was that this did not seem to be THORN's problem.

But then a question occurred to us: is a network really necessary for a writing application? Taking this as an opportunity, the team's discussion expanded to a series of further questions, especially about privacy and security.

We decisively paused development to rethink what kind of software THORN should be. After much research and experimentation focused on privacy, speed, and practicality, we began to see the many possibilities that local-first opens up for THORN:

  • We believe that data ownership and real-time collaboration are not contradictory. It is possible to create software that has all the benefits of a cloud application, while still allowing you to retain ownership and control of all the data you create.

  • We refer to this type of software as local-first software because it prioritizes the use of local storage over servers in remote data centers.

  • In a cloud application, the data on the server is considered the primary copy; if the client has a copy, it is merely a cache that is subordinate to the server. Any data modification must be sent to the server. In local-first applications, we swap these roles: we treat the copy of the data on the local device (laptop, tablet, or phone) as the primary copy. Servers remain, but they keep secondary copies of the data to help with access from multiple devices.

At that point everything clicked, as if all our problems had been solved at once.

Looking back at the description above, local-first software may not seem very different from ordinary local writing software with cloud backup. In fact it is different, so please read on.

Not abandoning the cloud, but local first

For multi-device synchronization and user collaboration, traditional cloud-based solutions have served many products and services well. They let you access your data from anywhere, but all access must go through the provider's servers, and you can only do what those servers allow. As a result, all of your data, including the full history of changes to it, is recorded on those servers.

In other words, you do not fully own the data. Ownership and control are in the hands of the service provider, and you have no recourse but to assume that the provider will follow its own user agreement and privacy policy and handle your data properly.

You are also at the mercy of the company providing the service. If the service becomes unavailable or is shut down, you may no longer be able to access the data you created with the software. Even if you can export the data, in most cases the software itself will not work properly without its server. This is why most users trust large or well-funded companies, which are less likely to go out of business and can therefore provide a longer-lived, more stable service.

You may have used software that synchronizes data between devices via iCloud or a WebDAV drive. Its data is usually stored as files on local disks, so you have full control and ownership of it: you can do whatever you want, including long-term archiving, making backups, and manipulating the files with other programs.

You can access your files at any time without anyone's permission and without going through a server operated by a company. These programs usually guarantee you absolute privacy and are free of any censorship. However, they do not offer real-time synchronization, online collaboration, and similar features.

So, you have a tough trade-off between privacy and collaboration. Can't we have the best of both worlds?

Of course we can, and the answer is "local first". Local-first software has seven features.

  1. Fast response: The primary copy of data is stored on the local device and users never have to wait for a network connection. Data synchronization with other devices and users takes place silently in the background.

  2. Multi-device synchronization: Data is stored in local storage on each device, and this data is also automatically synchronized across all devices the user is working on.

  3. Optional Network: Users can read and write data at any time, whether online or not. When a network connection is available, the local device is automatically synchronized with other devices.

  4. Collaboration: Your local device and other devices (whether they are yours or not) support real-time collaboration on the same data.

  5. Longevity: Your data should be accessible indefinitely. Since you have both the software and a copy of your data locally, the software can keep working indefinitely. Even if the software manufacturer goes out of business, you can continue to run the last released version. You can also export all your data to a common format and access it using other software.

  6. Security and privacy: Unlike traditional cloud-based solutions, local-first software does not have a centralized database that holds all users' data; your local device stores only your own data.

  7. Data ownership and control: Ownership and control here are not meant in the legal sense. They mean that the maker of local-first software does not restrict your access to the local copy of your data, and that you may copy and modify this data at any time and by any means, without having to go through the service provider's API.
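To make features 1 and 3 more concrete, here is a minimal sketch of a local-first write path. It is not THORN's implementation: the `LocalFirstStore` class, its `outbox` queue, and the `send` callback are hypothetical names invented for this illustration. The point is only that reads and writes complete against the local primary copy, and synchronization happens later if a connection exists.

```python
from dataclasses import dataclass, field


@dataclass
class LocalFirstStore:
    """Hypothetical local-first store: the local dict is the primary copy."""
    data: dict = field(default_factory=dict)     # primary copy on this device
    outbox: list = field(default_factory=list)   # changes not yet sent anywhere
    online: bool = False

    def write(self, key, value):
        # The write completes locally right away, with or without a network.
        self.data[key] = value
        self.outbox.append((key, value))

    def read(self, key):
        # Reads never touch the network.
        return self.data.get(key)

    def flush(self, send):
        # Background sync: hand pending changes to `send` once we are online.
        if not self.online:
            return 0
        sent = 0
        while self.outbox:
            send(self.outbox.pop(0))
            sent += 1
        return sent


store = LocalFirstStore()
store.write("note-1", "draft written offline")
offline_read = store.read("note-1")       # available immediately, still offline

remote_copy = []                           # stands in for a secondary copy elsewhere
store.online = True
flushed = store.flush(remote_copy.append)  # pending changes leave the outbox
```

The secondary copy only ever receives what the device chooses to send, which is the role swap described earlier: the server helps with access from other devices, but the device does not depend on it.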

At this point, you should have a good understanding of local-first software. But unfortunately, what is described above is only an ideal state.

So how exactly should we achieve the best possible implementation of local-first software?

Infrastructure for Local-First Software: CRDT

According to Ink & Switch's research, the CRDT (Conflict-free Replicated Data Type) is one of the most promising foundational technologies for implementing local-first software.

A CRDT is a special data structure that allows multiple devices to collaboratively edit the same data object. Specifically, if device A and device B edit a data object at the same time, two sequences of changes are generated over time, changes-on-A and changes-on-B. Both devices can then compute the final state of the data locally, as long as device A and device B send their changes to each other.

Sounds like nothing special, right? The remarkable part is that CRDTs are mathematically proven to guarantee that every device computes the same final state once it has received all changes to the data, regardless of the order in which they arrive.

We can imagine that there is no central server to handle and resolve data conflicts for all users, but rather each device resolves the conflicts itself, and CRDT ensures that all devices can compute identical results even if each device does its own computation.

The description above is simplified, but it already conveys the core advantage that CRDTs bring: decentralization.

Decentralization requires that the CRDT be applied and implemented on the client side, because existing software can also use CRDTs purely on the server side to resolve data conflicts. Representative examples of the latter include Azure Cosmos DB, Redis, Riak, Weave Mesh, SoundCloud's Roshi, and Facebook's OpenR.

If you would like to explore CRDTs further, see Alexei Baboulevitch's article "Data Laced with History" and Martin Kleppmann's talk "CRDTs and the Quest for Distributed Consistency".
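The order-independence claim can be checked with a small sketch. It uses a last-writer-wins map, one of the simplest CRDTs; this is an illustrative choice and not necessarily the data structure THORN uses. The change format `(key, value, stamp)` and the function names are assumptions made for this example.

```python
from itertools import permutations


def apply_change(state, change):
    """Apply one change to a last-writer-wins map and return the new state.

    A change is (key, value, stamp), where stamp = (logical_time, device_id).
    Stamps are unique, so comparing them picks the same winner on every device.
    """
    key, value, stamp = change
    current = state.get(key)
    if current is None or stamp > current[1]:
        state = {**state, key: (value, stamp)}
    return state


def replay(changes):
    """Compute the final state from changes received in the given order."""
    state = {}
    for change in changes:
        state = apply_change(state, change)
    return state


# Two devices edit the same object concurrently.
changes_on_a = [("title", "Draft", (1, "A")), ("title", "Why refactor?", (3, "A"))]
changes_on_b = [("title", "Notes", (2, "B")), ("tags", "local-first", (4, "B"))]

# Try every possible arrival order of the four changes.
all_changes = changes_on_a + changes_on_b
final_states = [replay(order) for order in permutations(all_changes)]
converged = all(state == final_states[0] for state in final_states)
```

Every arrival order yields the same map, so `converged` is true without any central server deciding the winner. Real text-editing CRDTs are considerably more involved, but they rest on the same property.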

By now, I think you understand the significance of CRDTs for local-first software.

We believe that CRDTs have the potential to become the foundation for a new generation of software. Just as packet switching is the technology that powers the Internet and the Web, or capacitive touch screens are the technology that powers smartphones, we think CRDT could be the foundation for collaborative software that gives users complete ownership of their data.

THORN's data synchronization engine

Let's return to the implementation. Even with CRDTs, a server is still needed so that a client connected to the network is always informed of changes made by other clients (and, of course, so that the client can report the changes it made while offline).

The diagram above shows the logical structure of THORN's data synchronization mechanism. Note that:

  1. Every client device of every user keeps a copy of the data in its local database.

  2. The THORN synchronization service also keeps a copy of the data, stored in AliCloud OSS.

Technically speaking, the official THORN Sync Service is simply a more reliable "client": it stores an encrypted copy of the data in AliCloud OSS with co-location redundancy and off-site disaster recovery enabled, which provides up to twelve nines (99.9999999999%) of data durability, well beyond what the local storage media on a user's device can offer.

When you delete a data object, the official synchronization service broadcasts the deletion event to all online devices. If a device is offline and therefore misses the broadcast, the deleted data remains on that device, and you can later resynchronize (restore) the data object from it at any time.

When a client device connects to the THORN Synchronization Service, the two first send each other the updates the other side is missing to complete the initial synchronization. From then on, the service pushes updates from other clients to that client, and the client sends its own changes to the service as they happen.
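The deletion behavior described above can be sketched in a few lines. This only models the behavior as stated in this article; the `Device` class and the `broadcast_delete` and `restore_from` helpers are hypothetical names and do not correspond to THORN's actual code.

```python
class Device:
    """Hypothetical client device holding a local copy of data objects."""

    def __init__(self, name, online=True):
        self.name = name
        self.online = online
        self.objects = {}


def broadcast_delete(devices, object_id):
    # Only devices that are online receive the deletion event.
    for device in devices:
        if device.online:
            device.objects.pop(object_id, None)


def restore_from(source, targets, object_id):
    # Resynchronize an object that survived on `source` back to other devices.
    for device in targets:
        device.objects[object_id] = source.objects[object_id]


laptop, phone = Device("laptop"), Device("phone")
tablet = Device("tablet", online=False)          # misses the broadcast
for device in (laptop, phone, tablet):
    device.objects["note-1"] = "meeting notes"

broadcast_delete([laptop, phone, tablet], "note-1")
survived_offline = "note-1" in tablet.objects    # the offline copy is untouched

restore_from(tablet, [laptop, phone], "note-1")  # bring the object back
```

In this sketch the offline tablet acts as an accidental backup: the object is gone from the devices that saw the deletion, but it can be restored from the one that did not.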

Therefore, there is a star architecture between the synchronization service and the client devices:

As shown in the figure, synchronization between clients 1, 3, and 4 and the THORN synchronization service keeps their copies of the data in a consistent state.

When client 3 reconnects to the network, it also synchronizes with the THORN synchronization service, so that all copies of the data reach eventual consistency.
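The connect-then-relay flow of this star architecture can be sketched as follows. This is a simplified model under stated assumptions, not THORN's protocol: changes are represented as entries in a dictionary keyed by a unique change id, and the `Replica` and `SyncService` classes with their methods are hypothetical.

```python
class Replica:
    """Hypothetical replica: a client device or the sync service itself."""

    def __init__(self, name):
        self.name = name
        self.changes = {}          # change_id -> payload

    def record(self, change_id, payload):
        self.changes[change_id] = payload

    def missing_from(self, other):
        # Changes this replica has that `other` has not seen yet.
        return {cid: p for cid, p in self.changes.items()
                if cid not in other.changes}


class SyncService(Replica):
    """Hub of the star: relays changes between connected clients."""

    def __init__(self):
        super().__init__("sync-service")
        self.connected = []

    def connect(self, client):
        # Step 1: client and service send each other what the other side lacks.
        to_service = client.missing_from(self)
        to_client = self.missing_from(client)
        self.changes.update(to_service)
        client.changes.update(to_client)
        # Step 2: relay the newly received changes to clients already online.
        for peer in self.connected:
            peer.changes.update(to_service)
        self.connected.append(client)

    def push(self, client, change_id, payload):
        # Step 3: a live change from one client fans out through the hub.
        client.record(change_id, payload)
        self.record(change_id, payload)
        for peer in self.connected:
            peer.record(change_id, payload)


service = SyncService()
client_1, client_3 = Replica("client-1"), Replica("client-3")

client_1.record("c1-1", "first paragraph")
service.connect(client_1)

client_3.record("c3-1", "written while offline")
service.connect(client_3)                      # client 3 comes back online

service.push(client_1, "c1-2", "live edit")
in_sync = client_1.changes == client_3.changes == service.changes
```

Because the service is modeled as just another replica with a relay role, replacing it with a different instance only requires connecting the clients to the new hub and exchanging the full set of changes once.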

Doesn't this seem unremarkable? Don't traditional cloud-based solutions use a star architecture too? Not quite. The traditional model is a centralized star architecture in which the client is subordinate to the server and can barely function without it.

THORN's model is a star architecture closer to a P2P network, in which the client and server are equals. Even without a synchronization service, the client can operate independently (though without user collaboration and multi-device synchronization).

This equality is reflected in the fact that the server can be replaced: you can use either the official THORN Ops synchronization service or a self-deployed THORN synchronization service. Switching between synchronization services is quick: all you need to do is select the new service and perform one full resynchronization of your data.

You probably have a big question by now: doesn't this model fail to guarantee the privacy and security that local-first software is supposed to provide?

How are privacy and security ensured?

The question of privacy and security really only concerns the THORN Sync Service itself. As described above, it is not part of your own assets and facilities, even though it exists to serve you better (multi-device sync and collaboration) and to assist you in good faith, not to snoop on you.

Aren't you worried that some malicious THORN team member might peek at your data? You have your data locally, but the sync service has it too. Maybe you are not concerned about hackers seeing your data, but in reality many people simply cannot use cloud applications because of legal restrictions and confidentiality obligations. So, how does THORN handle this?

In THORN, each user can create multiple spaces, each of which can use different THORN sync services.

This point is very important, because in the future we will implement different THORN synchronization services based on different technologies, and different implementations may offer completely different levels of privacy and security guarantees.

The current solution

The current THORN synchronization service is built on WebSocket, which places low demands on computing and network resources while delivering high performance.

The official THORN synchronization service uses AliCloud OSS with co-location redundancy and off-site disaster recovery as its storage medium, but we will also support connecting object storage services that users purchase themselves. In addition, the THORN Synchronization Service will support self-deployment.

It is worth noting that because the THORN Synchronization Service has no traditional database dependency, the resources and cost required to self-deploy it are extremely low, even for individual users.

Of course, all of this is optional and depends on what you choose:

  1. Trust the THORN team: use the official THORN synchronization service and save a copy of your data to the official THORN object storage service.

  2. Maximize data ownership and control: use the official THORN synchronization service and save a copy of your data to your own purchased object storage service.

  3. Maximize privacy and data sovereignty: use a self-deployed THORN synchronization service and save a copy of your data to your own purchased object storage service.

In case 1, your data may be subject to censorship when it is disseminated, as required by relevant laws and regulations; see the "Content related rules" section of the THORN Terms Of Use for details.

However, in all three cases you are expected to comply with the THORN Terms Of Use and to use THORN legally. If you violate the laws and regulations of your region or the THORN Terms Of Use, you will be prohibited from using the THORN Sync Service.
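The three options, combined with the fact that each space can use a different synchronization service, can be summarized as data. The sketch below is purely illustrative: `SpaceSyncConfig`, its field names, and the operator labels are assumptions for this example and do not reflect THORN's real configuration format.

```python
from dataclasses import dataclass

THORN_TEAM = "THORN team"   # illustrative label, not a real setting value
USER = "you"                # illustrative label, not a real setting value


@dataclass(frozen=True)
class SpaceSyncConfig:
    """Hypothetical per-space settings; field names are illustrative only."""
    sync_service_operator: str    # who runs the synchronization service
    object_storage_owner: str     # who owns the object storage


OPTIONS = {
    1: SpaceSyncConfig(THORN_TEAM, THORN_TEAM),  # trust the THORN team
    2: SpaceSyncConfig(THORN_TEAM, USER),        # official service, own storage
    3: SpaceSyncConfig(USER, USER),              # self-deployed, own storage
}


def operated_by_thorn(config):
    """List the components of a space that the THORN team operates."""
    parts = []
    if config.sync_service_operator == THORN_TEAM:
        parts.append("sync service")
    if config.object_storage_owner == THORN_TEAM:
        parts.append("object storage")
    return parts


# Each space can pick its own option independently.
spaces = {"work": OPTIONS[3], "personal": OPTIONS[1]}
summary = {name: operated_by_thorn(cfg) for name, cfg in spaces.items()}
```

Reading `summary` per space shows at a glance which parts of the pipeline sit outside your own control, which is the trade-off the three options are meant to expose.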

Future Solutions

Given THORN's simplicity, scalability, and compatibility, we expect to address data security and privacy at the level of the underlying technology in the future, using more mature Web 3.0 technologies such as web3.storage.

To sum up, we bring you the new THORN 2022.

This article draws on Ink & Switch's "Local-first software", Alexei Baboulevitch's article "Data Laced with History", and Martin Kleppmann's talk "CRDTs and the Quest for Distributed Consistency".


© 2022 Mooncyan Inc.