emadda a day ago

Some other interesting points:

- The write API is sync, but it has a hidden async await: when you do your next output with a response, if the write fails the runtime will replace the response with an HTTP failure. This allows the runtime to auto-batch writes and optimistically assume they will succeed, without the user explicitly handling the errors or awaits (see the sketch after this list).

- There are no read transactions, which would be useful to get a pointer to a snapshot at a point in time.

- Each runtime instance is limited to 128 MB of RAM.

- Websockets can hibernate and you do not have to pay for the time they are sleeping. This allows your clients to remain connected even when the DO is sleeping.

- They have a kind of auto-RPC ability where you can talk to other DOs or workers as if they were normal JS calls, even though the call may actually be going to another data center. The runtime handles the serialisation and parsing.
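
For example, a minimal sketch of the write pattern (assuming the SQLite-backed storage API on a class extending DurableObject; the table and names are illustrative):

  import { DurableObject } from "cloudflare:workers";

  export class Counter extends DurableObject {
    constructor(ctx: DurableObjectState, env: unknown) {
      super(ctx, env);
      // Synchronous DDL; the table is created once per object.
      ctx.storage.sql.exec("CREATE TABLE IF NOT EXISTS hits (path TEXT, at INTEGER)");
    }

    async fetch(request: Request): Promise<Response> {
      // exec() is synchronous - no await and no explicit error handling here.
      // The runtime batches the write and confirms durability before the
      // Response below is actually delivered; if the write fails, the
      // Response is replaced with an HTTP error.
      this.ctx.storage.sql.exec(
        "INSERT INTO hits (path, at) VALUES (?, ?)",
        new URL(request.url).pathname,
        Date.now()
      );
      return new Response("ok");
    }
  }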

  • ngrilly a day ago

    > The write api is sync, but it has a hidden async await: when you do your next output with a response, if the write fails the runtime will replace the response with a http failure. This allows the runtime to auto-batch writes and optimistically assume they will succeed, without the user explicitly handling the errors or awaits.

    It reminds me of PostgreSQL's commit_delay, even though it's not exactly the same principle: https://www.postgresql.org/docs/current/runtime-config-wal.h...

    Litestream, mentioned in the post, also suggests a similar technique.

  • matharmin a day ago

    Just wondering, do you have a specific use case for read transactions implemented on the database level here?

    In SQLite in general, read transactions are useful since you can access the same database from multiple processes at a time. Here, only a single process can access the database. So you can get the same effect as read transactions either by doing all reads in one synchronous function, or by implementing your own process-level locking.
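
    A sketch of the "all reads in one synchronous function" version (assuming the sql.exec API; table names are made up): because exec() is synchronous and a DO is single-threaded, nothing can write between the two queries, so both see the same state.

      // Inside a Durable Object method:
      function consistentSummary(sql: SqlStorage) {
        const orders = sql.exec("SELECT COUNT(*) AS n FROM orders").one();
        const items = sql.exec("SELECT COUNT(*) AS n FROM order_items").one();
        // No await above, so this pair of counts is a consistent snapshot.
        return { orders: orders.n, items: items.n };
      }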

    • emadda a day ago

      E.g. if you have many websocket connections and each holds a snapshot at a point in time (one that spans many different await function calls/ws messages).

      With WAL, SQLite can have many readers and a single writer, so many read transactions can exist while the writer moves the db state forward.

      • kentonv 19 hours ago

        We (Cloudflare) have considered adding an API to create multiple "database connections", especially to be able to stream a response from a long-running cursor while representing a consistent snapshot of the data.

        It's a bit tricky since if you hold open that old connection, the WAL could grow without bound and cannot be checkpointed back into the main database. What do we do when the WAL gets unreasonably large (e.g. bigger than the database)? Cancel old cursors so we can finally checkpoint? Will that be annoying for app developers to deal with, e.g. causing errors when traffic is high?

        SQLite itself calls an open database a "connection" even though there's no actual network involved.

        • emadda 15 hours ago

          I did guess it might be harder to do than in vanilla SQLite, where the WAL and main db sit on the same hard drive: there is more space to grow the WAL, and a large, un-checkpointed WAL is not an issue when the machine/instance reboots, as it just starts where it left off.

          To be honest this is an edge case. But I often start a read transaction on a SQLite connection just so I know multiple queries are reading from the same state (and to ensure state has not been changed between queries).

        • kentonv 15 hours ago

          Ugh didn't notice until too late to edit, but apparently HN interpreted my asterisk as an instruction to italicize everything between it and the footnote it referred to.

tmikaeld a day ago

> ..each DO constantly streams a sequence of WAL entries to object storage - batched every 16MB or every ten seconds.

Which also means it may take 10 seconds before you can (reliably) read the write globally.

I keep failing to see how this can replace regionally placed database clusters which can serve a continent in milliseconds.

Edit: I know it uses streams, but those go to only 5 followers and CF has hundreds of datacenters. There is no physical way to guarantee reads in seconds unless all instances of the SQLite database are always connected, and even then, packet latency will cause issues.

  • kentonv 15 hours ago

    As others have noted, you misunderstand how Durable Objects work. All traffic addressed to the same object is routed to a single machine where that object lives. That machine always has a consistent view of its SQLite database. You can have billions of objects, but each has its own separate database. There's no way to read from a database directly from a different machine than the one the DO is running on.
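
    In Worker code that looks roughly like this (a sketch; the Env type and the DOCUMENT binding and name are illustrative) - the same name always resolves to the same single instance:

      export default {
        async fetch(request: Request, env: Env): Promise<Response> {
          // Same name -> same id -> same object, wherever it currently lives.
          const id = env.DOCUMENT.idFromName("doc-123");
          const stub = env.DOCUMENT.get(id);
          // The request is routed to the machine hosting that object; its
          // SQLite database is never read directly from any other machine.
          return stub.fetch(request);
        }
      };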

  • firtoz a day ago

    AFAIK the writes and reads are done only from the same process, so the long-term storage comes into play only if the current process is hibernated. When you write something and then read it, it's immediate, because the writes and reads also update the current process's state in memory.

    For another process (e.g. another DO or another worker) to access the data, it needs to go through the DO which "contains" the data, so it would make an RPC or HTTP request to the DO and get the latest information.

    + the hibernation happens after x seconds of inactivity, so it feels like the only time a write would not be available as expected is when the DO or worker crashes right after a write.

    • tmikaeld a day ago

      You're right that reads and writes are immediate within the same client connection - this is how it works with CF KV as well - but not across the entire network.

      On KV they expect up to 30 seconds of latency before a write is visible everywhere; I expect similar here.

      • ec109685 a day ago

        Cloudflare ensures all operations on a DO happen on _the_ single instance of that DO, worldwide.

        There’s no such thing as the read-after-write problem here because only one host will ever do reads and writes (until that host dies).

        • dumbo-octopus a day ago

          Indeed. The entire purpose of DO’s is essentially to provide the consistency guarantees that KV cannot.

  • neamar a day ago

    The writes are streamed in near real time to five followers, which acknowledge them almost instantly. The Cloudflare blog article covers this in more depth. So writes remain fast while still being durable.

  • memothon a day ago

    The WAL entries streamed to object storage are, I think, just for backups.

    Each DO is globally unique (there's only ever one DO with a given id running anywhere) and runs SQLite on its own local storage in that datacenter.

skrebbel a day ago

I really love the Durable Object design, particularly because it's easy to understand how it works on the inside. Unlike lots of other solutions designed for realtime data stuff, Durable Objects have a simplicity to them, much like Redis and Italian food. You can see all the ingredients. Given enough time and resources (and datacenters :) ), a competent programmer could read the DO docs and reimplement something similar. This makes it easy to judge the tradeoffs involved.

I do worry that DOs are great for building fast, low-overhead, realtime experiences (eg five people editing a document in realtime), but make it very hard to build analyses and overviews (which groups of people have been editing which documents in the last week?). Putting the data inside SQLite might make that even harder - you'd have to somehow query lots and lots of little SQLite instances and then merge the results together. I wonder if there's anything for this with DOs, because this is what keeps bringing me back to Postgres time and time again: it works for core app features and for overviews, BI, etc.

simonw a day ago

One thing I don't understand about Durable Objects yet is where they are physically located.

Are they located in the region that hosted the API call that caused them to be created in the first place?

If so, is there a mechanism by which a DO can be automatically migrated to another location if it turns out that e.g. they were created in North America but actually all of the subsequent read/write traffic to them comes from Australia?

  • dantiberian a day ago

    https://where.durableobjects.live is a good website that shows you where they live. Only about 10-11% of Cloudflare PoPs host durable objects. Requests to another PoP to create a DO will get forwarded to one of the nearby PoPs which do host them.

  • masterj a day ago

    > Durable Objects do not currently change locations after they are created

    > Dynamic relocation of existing Durable Objects is planned for the future.

    https://developers.cloudflare.com/durable-objects/reference/....

    IIRC Orleans (https://www.microsoft.com/en-us/research/wp-content/uploads/...) allows actors to be moved between machines, which should map well to DOs being moved between locations.

    • pests a day ago

      As actors in Orleans are virtual and persistent, it can also be the case that an actor is running nowhere.

      If it's stateless it could be running in multiple locations.

      I worry "Dynamic relocation of DOs" might be going a bit too granular, this should be something the runtime takes care of.

  • ko_pivot a day ago

    Durable Objects have long-term storage. They get hydrated from that storage, so in that sense, they can move to any Cloudflare DC. However, there is no API call to move a Durable Object. It has to have no connections and then gets recreated in the DC nearest to the next/first connection. Memory gets dropped when that happens; storage survives. (This is slightly out of date as they have some nuanced hibernation stuff that is recent.)

jwblackwell a day ago

Does anyone else struggle to wrap their head around a lot of this new cloud stuff?

I have 15+ years experience of building for the web, using Laravel / Postgres / Redis stack and I read posts like this and just think, "not for me".

  • djtango a day ago

    From the article:

    > For useful background on the first version of Durable Objects take a look at Cloudflare's durable multiplayer moat by Paul Butler, who digs into its popularity for building WebSocket-based realtime collaborative applications.

    First apps that come to mind that have RT collaboration:

    - Google Docs/Sheets etc

    - Notion

    - Miro

    - Figma

    These are all global-scale collaborative apps, and I'm not sure a Laravel stack will support those use cases... Google had to in-house everything and probably spearheaded the usage of CRDTs (this is a guess!), but as the patterns emerge and the building blocks get SaaS-ified, mass RT collaboration stops being a giant engineering problem and more and more interesting products get unlocked.

    • jlokier a day ago

      > Google had to in house everything and probably spearheaded the usage of CRDTs ( this is a guess!)

      Fwiw, Google Docs/Sheets etc don't use CRDTs, they use the more server-oriented Operational Transforms (OT). CRDTs were spearheaded by others.

    • fastball a day ago

      Google actually uses OT for their collab.

stavros a day ago

This is a really interesting design, but these kinds of smart systems always inhabit an uncanny valley for me. You need them in exactly two cases:

1. You have a really high-load system that you need to figure out some clever ways to scale.

2. You're working on a toy project for fun.

If #2, fine, use whatever you want, it's great.

If this is production, or for Work(TM), you need something proven. If you don't know you need this, you don't need it, go with a boring Postgres database and a VM or something.

If you do know you need this, then you're kind of in a bind: It's not really very mature yet, as it's pretty new, and you're probably going to hit a bunch of weird edge cases, which you probably don't really want to have to debug or live with.

So, who are these systems for, in the end? They're so niche that they can't easily mature and be used by lots of serious players, and they're too complex with too many tradeoffs to be used by 99.9% of companies.

The only people I'm sure are the target market for this sort of thing are the developers who see something shiny, build a company (or, worse, build someone else's company) on it, and then regret it pretty soon and move to something else (hopefully much more boring).

Does anyone have more insight on this? I'd love to know.

  • crabmusket a day ago

    As far as I can tell, multiplayer is the killer app for Durable Objects. If you want to build another Figma, Google Docs, etc, the programming model of Durable Objects is super handy.

    This article goes into it more: https://digest.browsertech.com/archive/browsertech-digest-cl...

    I think this old article is quite relevant too: http://ithare.com/scaling-stateful-objects/

    Anyone who read the Figma multiplayer article and thought "that's kind of what I need" would be well served by Durable Objects, I think. https://www.figma.com/blog/rust-in-production-at-figma/

    There are other approaches - I've worked in the past with CRDTs over WebRTC which felt absolutely space-age. But that's a much more complicated foundation compared to a websocket and a single class instance "somewhere" in the cloud.

    • stavros a day ago

      That's a very interesting use case. Given that your "players" aren't guaranteed to be local to the DO, doesn't using DOs only make sense in high-traffic situations again? Otherwise you might as well just serve the players from a conventional server, no?

      CRDTs really do sound amazing, though.

      • crabmusket a day ago

        Best case, the players are co-located in a city or country, and they'll benefit from data center locality.

        Worst case, they're not co-located, and one participant has good latency, and the other doesn't. This is equivalent to the "deploy the backend in a single server/datacenter" approach.

        Aside from the data locality, I still find the programming model (a globally-unique and addressable single-threaded class instance) to be quite nice, and would want to emulate it even without the Cloudflare edge magic.

        • paulgb a day ago

          > Aside from the data locality, I still find the programming model (a globally-unique and addressable single-threaded class instance) to be quite nice, and would want to emulate it even without the Cloudflare edge magic.

          You might be interested in Plane (https://plane.dev/ / https://github.com/jamsocket/plane), which we sometimes describe as a sort of Durable Object-like abstraction that can run anywhere containers can.

          (I also wrote one of the articles you linked, thanks for the shoutout!)

          • crabmusket a day ago

            I am interested, and I really enjoy your work on Browsertech! I haven't needed Plane above/over what Cloudflare is providing, but I've got it in the back of my mind as an option.

            I've long hoped other providers might jump on the Durable Objects bandwagon and provide competing functionality so we're not locked in. Plane/Jamsocket looks like one way to go about mitigating that risk to a certain extent.

        • tlarkworthy a day ago

          It's the actor model essentially.

          You can have a DO proxy each user connection, then have the proxies forward messages to the multiplayer document DO. The user proxy deals with ordering and buffering that connection's message state across disconnects, and the document DO handles the shared state.

          • crabmusket a day ago

            It's actors plus a global routing system that means all messages addressed to a unique identifier will arrive in the actor instance. I haven't seen any other actor frameworks that provide that.

            • tlarkworthy a day ago

              Akka and Erlang both support distributed routing to their actors, but this is planetary scale and fully-managed out of the box, which is very cool.

      • skybrian a day ago

        Some games have regions and you only see players in the same region. For example, a “Europe” region. If you’re in the US and you connect to the Europe region, you know that you should expect some lag.

        And it seems like that would work just as well with durable objects.

      • dumbo-octopus a day ago

        In practice you’re most likely to be collaborating with other folks on your school project group, work team, close family, etc. Sure there are exceptions, but generally speaking picking a service location near your first group member ensures low latency for them (and they’re probably most engaged), and is likely to have lowish latency for everyone else.

        On the flip side, picking US-East-1 gives okayish latency to folks near that, and nobody else.

        • crabmusket a day ago

          And the corollary to that is that often your collaborations have a naturally low scale. While your entire app/customerbase as a whole needs to handle thousands of requests per second or more, one document/shard may only need to handle a handful of people.

  • klabb3 a day ago

    Databases are an extremely slow-maturing area, similar to programming languages, but are all deviations from Postgres shiny and hipster?

    The idea of colocating data and behavior is really a quantifiable reduction in complexity. It removes latency and bandwidth concerns, which means both operational concerns and development concerns (famously the impact of the N+1 problem is greatly reduced). You can absolutely argue that networked Postgres is better for other reasons (and you may be right) but SQLite is about as boring and predictable as you can get, with known strong advantages. This is the reason it’s getting popular on the server.

    That said, I don’t like the idea of creating many small databases very much - as they suggest with Durable Objects. That gives me noSQL nightmares - breaking all kinds of important invariants of relational dbs. I think it’s much preferable to use SQLite as a monolithic database, as is done in their D1 product.

    • crabmusket a day ago

      > That gives noSQL nightmares - breaking all kinds of important invariants of relational dbs

      IMO Durable Objects map well to use cases where there actually are documents. Think of Figma. There is a ton of data that lives inside the literal Figma document. It would be awful to have a relational table for like "shapes" with one row per rectangle across Figma's entire customer base. That's just not an appropriate use of a relational database.

      So let's say I built Figma on MongoDB, where each Figma document is a Mongo document. That corresponds fairly straightforwardly to each Figma document being a Durable Object instance, using either the built-in noSQL storage that Durable Objects already have, or a small Sqlite relational database which does have a "shapes" table, but only containing the shapes in this one document.
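
      Roughly, each per-document Durable Object could carry a schema like this (a sketch; the columns are made up):

        // Run once per object, e.g. in the constructor. The shapes table only
        // ever contains the shapes of this one document.
        this.ctx.storage.sql.exec(`
          CREATE TABLE IF NOT EXISTS shapes (
            id   TEXT PRIMARY KEY,
            kind TEXT NOT NULL,  -- 'rect', 'ellipse', ...
            x REAL, y REAL, w REAL, h REAL,
            fill TEXT
          )
        `);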

      • jchanimal a day ago

        We are wrestling with questions like this on the new document database we’re building. A database should correspond to some administrative domain object.

        Today in Fireproof a database is a unit of sharing, but we are working toward a broader model where a database corresponds to an individual application’s state. So one database holds all the shared documents, not just a single unit of sharing.

        These small changes early on can have big impact later. If you’re interested in these sort of design questions, the Fireproof Discord is where we are hashing out the v0.20 api.

        (I was an early contributor to Apache CouchDB. Damien Katz, creator of CouchDB, is helping with engineering and raised these questions recently, along with other team members.)

      • klabb3 20 hours ago

        > Durable Objects map well to use cases where there actually are documents

        Right. I wouldn’t dispute this. This is akin to a file format from software back in the day (say, Photoshop, but now with multiplayer). What this means is that you get different compatibility boundaries, and you relinquish centralized control and the ability to do transparent migrations and analysis. For all intents and purposes, the documents should be more or less opaque and self-contained. I personally like this, but I also recognize that most web engineers of our current generation are not used to thinking in this disciplined and defensive way upfront.

    • 8n4vidtmkvmk a day ago

      N+1 problem is also reduced if you keep your one and only server next to your one and only database.

      This was actually the solution we came up with at a very big global company. Well, not 1 server, but 1 data center. If your write leaders are all in one place it apparently doesn't matter that everything else is global, for certain write requests at least.

    • masterj a day ago

      If you adopt a wide-column db like Cassandra or DynamoDB, don’t you have to pick a shard for your table? The idea behind Durable Objects seems similar

      • simpsond a day ago

        You have a row key, which gets consistently hashed to a shard / node on the ring.

  • jmtulloss a day ago

    If you're in #1, you talk to Cloudflare. They need some great customer stories, and they have some great engineers who are most likely willing to work with you on how this will work/help you with bugs in exchange for some success stories. If it gets proven out this turns into a service relationship, but early on it's a partnership.

  • danpalmer a day ago

    I'd view the split here along the axes of debuggability/introspection.

    There are many services that just don't require performance tuning or deep introspection, things like internal tools. This is where I think serverless frameworks do well, because they avoid a lot of time spent on deployment. It's nice if these are fast, but that's rarely a key requirement. Usually the key requirement is that they are fast to build and low maintenance. It's possible that Cloudflare have got a good story for developer experience here that gets things working quickly, but that's not their pitch, and there are a lot of services competing to make this sort of development fast.

    However where I don't think these services work well is when you have high debuggability and introspection requirements. What metrics do I get out of this? What happens if some Durable Objects are just slow, do we have the information to understand why? Can we rectify it if they are? What's the logging story, and how much does it cost?

    I think these sorts of services may be a good idea for a startup on day 1 to build some clever distributed system in order to put off thinking about scaling, but I can't help but think that scale-up sized companies would be wanting to move off this onto something they can get into the details more with, and that transition would be a hard one.

  • camgunz a day ago

    First, this is very insightful--I think most people should go through this exact analysis before architecting a system.

    As others have said, the use case is multiplayer, and that's because you need everyone to see your changes ASAP for the app to feel good. But more broadly, the storage industry has been trying to build something that's consistent, low latency, and multiuser for a long time. That's super hard; just from a physics point of view there's generally a tradeoff between consistency and latency. So I think people are trying different models to get there, and a lot of that experimentation (not all, cf Yugabyte or Cockroach) is happening with SQLite.

  • yen223 a day ago

    I almost have the opposite view:

    When starting out you can get away with using a simple Postgres database. Postgres is fine for low-traffic projects with minimal latency constraints, and you probably want to spend your innovation tokens elsewhere.

    But in very high-traffic Production cases with tight latency requirements, you will start to see all kinds of weird and wacky traffic patterns, that barebones Postgres won't be able to handle. It's usually in these cases where you'd need to start exploring alternatives to Postgres. It's also in these cases where you can afford to hire people to manage your special database needs.

    • simonw a day ago

      Have you worked on any examples of projects that started on PostgreSQL and ended up needing to migrate to something specialized?

      • yen223 a day ago

        I did, twice.

        The second time, we had a reporting system that eventually stored billions of rows per day in a Postgres database. Processing times got so bad that we decided to migrate to Clickhouse, resulting in substantially faster queries. I maintain that we hadn't exhausted all available optimisations for Postgres, but I cannot deny that the migration made sense in the long run - OLTP vs OLAP and all that.

        (The first time is a funny story that I'm not quite ready to share.)

        • simonw a day ago

          That makes a lot of sense to me. One of my strongest hints that a non-relational data store might be a good idea is "grows by billions of rows a day".

          • adhamsalama a day ago

            Isn't Clickhouse relational?

            • crabmusket a day ago

              It does allow you to query with SQL, but it's meant for OLAP workloads, not OLTP. Its internal architecture and storage is different to what you'd usually think of as a relational database, like Postgres. See https://clickhouse.com/docs/en/concepts/why-clickhouse-is-so...

              The term "relational" is overloaded. Sometimes it means "you can use SQL" and sometimes it means "OLTP with data stored in an AoS btree".

              (And sometimes, a pet peeve of mine, it means "data with relationships" which is based on misunderstanding the term "relation". If someone asks you if "your data is relational" they are suffering from this confusion.)

            • yen223 a day ago

              Clickhouse is a SQL database, so I guess it is?

              (Strictly speaking since a "relation" in the original Codd-paper sense is a table, anything with tables is relational. I don't know if that's what people mean by "relational", plus I don't know what counts as "non-relational" in that sense)

            • simonw a day ago

              Kind of? By "relational" there I meant "traditional relational databases like MySQL and PostgreSQL that are optimized for transactions and aren't designed for large scale analytics".

        • xarope a day ago

          Right, OLTP vs OLAP are very different workloads (using the car analogy, that would be like using a ferrari to tow a trailer, and an F250 to... oh wait, an F250 can do anything!).

          But seriously though, even if you use postgres, as a former DBA (DB2 and Oracle) I would have tuned the OLTP database very differently to the OLAP database, and I don't mean just indexes, but even during ETL from OLTP->OLAP you might decide to de-normalize columns on the OLAP side simply to speed up queries (OLAP databases are the sort of database you were warned about, where indexes can be 10x the data size)

        • adhamsalama a day ago

          Well, this isn't specific to Postgres, is it?

          If you were storing billions of rows per day in MySQL, SQL Server, or Oracle, it still wouldn't be able to handle it, would it?

          • yen223 a day ago

            That's right. The key difference is row-based vs column-based databases (i.e. OLTP vs OLAP). Any good database person should cringe at the thought of using Postgres (or MySQL, Oracle, SQL Server, etc.) for pulling reporting data.

            That said, no regrets using Postgres there. If we had started with Clickhouse the project would not have launched as quickly as it did, and that would have given us more problems.

  • gregwebs a day ago

    There are a lot of low-traffic applications that aren’t toys but are instead internal tools - this could be a great option for those.

    For higher traffic they are asking you to figure out how to shard your data and its compute. That’s really hard to do without hitting edge cases.

    • stavros a day ago

      Why would you use this for an internal, low-traffic tool over Postgres?

      • alright2565 a day ago

        It's so low traffic that you don't want to pay the minimum $35/mo for a PostgreSQL instance on AWS maybe. Or you're required by policy to have a single-tenant architecture, but a full always-on database server would be overkill.

      • fracus a day ago

        Could this be used to get a time edge in trading? I'm not an expert, just thinking out loud. I remember hearing about firms laying wire in a certain way because getting a microsecond jump on changing rates could be everything for them.

        • crabmusket a day ago

          I'm also no expert, but from reading around the subject a little (Flash Boys by Michael Lewis was pretty cool, also Jane Street's podcast has some fantastic information)... no. I doubt you'd be on a public cloud if low-latency trading is what you're doing.

          • aldonius a day ago

            Aren't the HFT boxes usually stock exchange colocations? Each trader gets a rack (or multiple racks depending on size) in the exchange's datacenter, every rack has the same cable length to the switch, etc.

  • MuffinFlavored a day ago

    > If this is production, or for Work(TM), you need something proven.

    I feel like part of Cloudflare's business model is to try to convince businesses at scale to solve problems in a non-traditional way using technology they are cooking up, no matter the cost.

bluehatbrit 19 hours ago

This is probably a really stupid question, but how would one handle schema migrations with this kind of setup? My understanding is it's aimed at having a database per-tenant (or even more broken down than that). Is there a sane way of handling schema migrations, or is the expectation that these databases are more short-lived and so you support multiple versions of the db (DO) until it's deleted?

In my head, this would be a fun way to build a bookmark service with a DO per user. But as soon as you want to add a new field to an existing table, you meet a pretty tricky problem of getting that change to each individual DO. Perhaps that example is too long lived though, and this is designed for more ephemeral usage.

If anyone has any experience with this, I'd be really interested to know what you're doing.

  • simonw 18 hours ago

    You'd need to roll your own migrations.

    I have a version of that for SQLite written in Python, but I'm not sure if you could run that in Durable Objects - maybe via WASM and Pyodide? Otherwise you'd have to port it to JavaScript.

    https://github.com/simonw/sqlite-migrate

    • bluehatbrit 17 hours ago

      Appreciate the response (and the blog post itself)! I probably worded my question poorly, but I'm more wondering about executing schema migrations against a large number of DOs as part of a deployment (such as one per customer).

      I suppose the answer is "it's easier to have 1 central database/DO", but it feels like this approach to data storage really shines when you can have a DO per tenant.

      • simonw 16 hours ago

        A pattern where you check for and then execute any necessary migrations on initialization of a Durable Object would actually work pretty well I think - presumably you can update the code for these things without erasing the existing database?
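
        A sketch of what that could look like (assuming the SQLite-backed storage API; the migrations list and version table are illustrative):

          import { DurableObject } from "cloudflare:workers";

          // Append-only, ordered list shipped with the Worker code.
          const MIGRATIONS = [
            "CREATE TABLE bookmarks (id TEXT PRIMARY KEY, url TEXT NOT NULL)",
            "ALTER TABLE bookmarks ADD COLUMN title TEXT",
          ];

          export class UserBookmarks extends DurableObject {
            constructor(ctx: DurableObjectState, env: unknown) {
              super(ctx, env);
              // Hold incoming requests until the schema is up to date.
              ctx.blockConcurrencyWhile(async () => {
                ctx.storage.sql.exec(
                  "CREATE TABLE IF NOT EXISTS _meta (key TEXT PRIMARY KEY, value INTEGER)"
                );
                const row = ctx.storage.sql
                  .exec("SELECT value FROM _meta WHERE key = 'schema_version'")
                  .toArray()[0];
                let version = row ? Number(row.value) : 0;
                // Apply only the migrations this object hasn't seen yet.
                for (; version < MIGRATIONS.length; version++) {
                  ctx.storage.sql.exec(MIGRATIONS[version]);
                }
                ctx.storage.sql.exec(
                  "INSERT OR REPLACE INTO _meta (key, value) VALUES ('schema_version', ?)",
                  version
                );
              });
            }
          }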

        • bluehatbrit 16 hours ago

          Ah yes, that would work pretty well! You'd have to be able to guarantee any migrations can run within the timeouts, but at a per-tenant level that should be very doable for most cases. Not sure why I didn't think of that approach - great idea.

          I might have to try this out now.

          • simonw 16 hours ago

            I think those SQLite databases are capped at 1GB right now, so even complex migrations (that work by creating a new temporary table, copying old data to it and then atomically renaming it) should run in well under a second.

blixt a day ago

I'm constantly impressed by the design of DOs. I think it's easy to have a knee-jerk reaction that something is wrong with doing it this way, but in reality I think this is exactly how a lot of real products are implicitly structured: a lot of complex work done at very low scale per atomic thing (by which I mean, anything that needs to be transactionally consistent).

In retrospect, what we ended up building at Framer for projects with multiplayer support - where edits are replicated at 60 FPS while being correctly ordered for all clients - is a more applied version of what DOs are doing now. We also ended up with something like a WAL of JSON object edits, so that if a project instance crashed, its backup could pick up as if nothing had happened, even if committing the JSON patches into the (huge) project data object hadn't had time to occur (on an every-N-updates/M-seconds basis, just as described here).

segalord a day ago

Noticing CF pushing devs to use DOs for everything over workers these days. Even websocket connections on workers get timed out after ~30s, and the recommended way is to use a DO for them.

myflash13 a day ago

What I don’t understand is why, in the flight seat mapping example provided, you create a DO per flight. So does a DO correspond to a “model” in MVC architecture? What if I used DOs in a per-tenant way, so one DO per user? And then how do I query or “join” across all DOs to find all full flights? I guess you would have to design your DOs such that joins are not required?

  • ec109685 a day ago

    They support “function” calling between DOs, so you are able to compose a response from more than one DO.
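
    A sketch of what that composition could look like (assuming RPC methods on DO stubs; the Env type, FLIGHT binding, and seatsRemaining() method are made up):

      // In a Worker or another DO: fan out to the per-flight objects and
      // combine the results. The runtime handles routing and serialisation.
      async function fullFlights(env: Env, flightIds: string[]) {
        const counts = await Promise.all(
          flightIds.map(async (flightId) => {
            const stub = env.FLIGHT.get(env.FLIGHT.idFromName(flightId));
            return { flightId, seatsRemaining: await stub.seatsRemaining() };
          })
        );
        return counts.filter((c) => c.seatsRemaining === 0);
      }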

kondro a day ago

Does this mean SQLite for DO can lose up to 10 seconds of data in the event of a failing DO?

  • stavros a day ago

    > To ensure durability beyond that ten second window, writes are also forwarded to five replicas in separate nearby data centers as soon as they commit, and the write is only acknowledged once three of them have confirmed it.

    I think Simon meant "within", rather than "beyond", here.

    • simonw a day ago

      Thanks, I've updated that word.

vlaaad 21 hours ago

Re https://where.durableobjects.live/ — why the hell are they still operating in Russia?

  • esnard 20 hours ago

    From https://blog.cloudflare.com/steps-taken-around-cloudflares-s... :

    > Since the invasion, providing any services in Russia is understandably fraught. Governments have been united in imposing a stream of new sanctions and there have even been some calls to disconnect Russia from the global Internet. As discussed by ICANN, the Internet Society, the Electronic Frontier Foundation, and Techdirt, among others, the consequences of such a shutdown would be profound.

    > [...]

    > Beyond this, we have received several calls to terminate all of Cloudflare's services inside Russia. We have carefully considered these requests and discussed them with government and civil society experts. Our conclusion, in consultation with those experts, is that Russia needs more Internet access, not less.

braden-lk a day ago

Durable Objects seem so cool, but the pricing always scares me. (Specifically, having to worry about getting hibernation right.) They’d be a great fit for our Yjs document-based strategy, but while everything in prod still works on plain ol’ Redis and Postgres, it’s hard to justify an exploration.

  • attilakun a day ago

    Does Cloudflare have proper spending caps? If they do, I'd be open to trying DOs, but if they don't, it's a non-starter for an indie dev, as I can't risk bankruptcy due to a bad for loop.

    • viraptor a day ago

      It's not just the listed prices either. There was a story here not long ago where they essentially requested someone to migrate to an enterprise plan or get out. With AWS it's pretty common to get a refund for accidental abuse. From my contact so far and from stories here, I wouldn't expect anything close to that treatment from CF.

  • ignoramous a day ago

    > Specifically, having to worry about getting hibernation right.

    As long as the client doesn't exchange websocket messages with the DO, it'll hibernate. From what I can tell, ping/pong frames don't count towards uptime, if you're worried about that.
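
    The hibernation-friendly pattern looks roughly like this (a sketch using the WebSocket hibernation API):

      import { DurableObject } from "cloudflare:workers";

      export class Room extends DurableObject {
        async fetch(request: Request): Promise<Response> {
          const pair = new WebSocketPair();
          // acceptWebSocket() (rather than accept()) lets the runtime evict the
          // object from memory between messages while keeping the socket open,
          // so idle time is not billed as duration.
          this.ctx.acceptWebSocket(pair[1]);
          return new Response(null, { status: 101, webSocket: pair[0] });
        }

        // The object is woken up (if hibernated) to handle each message.
        async webSocketMessage(ws: WebSocket, message: string | ArrayBuffer) {
          ws.send(message); // echo
        }
      }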

  • anentropic a day ago

    What scares me is that it's super specific to Cloudflare.

    What is your option if you want to eject to another cloud?

    • paulgb 16 hours ago

      For the specific example of a Yjs backend, I happen to be working on one that can be hosted either on Cloudflare or as a native process. We’ve had people running in production migrate from Cloudflare to native just by swapping out the URL they connect to in their application config.

      https://github.com/jamsocket/y-sweet

9dev a day ago

I would love to work with Durable Objects and all the other cool stuff from Cloudflare, but I’m really hesitant to make a single cloud providers technology the backbone of my application. If CF decides to pull the plug, or charge a lot more, the only way to migrate elsewhere would be rebuilding the entire app.

As long as there aren’t any comparable technologies, or abstraction layers on top of DOs, I’m not going to make the leap of faith.

avinassh 19 hours ago

I'd love to know how they hooked the VFS into the WAL to monitor changes. SQLite's WAL layer deals with page numbers, whereas the VFS deals with files and byte offsets. I'm curious how they mapped the two - how they capture new writes to the WAL and read them back from it.

rcarmo a day ago

The first thing I wondered was how this plays with data residency and privacy/regulatory requirements.

kikimora 17 hours ago

This design does not handle hot partitions well, and they are ubiquitous in so many domains.

  • simonw 16 hours ago

    Your partition would have to be VERY hot for SQLite not to be able to handle it - anything up to several thousand writes per second would likely work fine.

    Since this is all running on Cloudflare you could scale reads with a 1 second cache TTL somewhere, which would drop your incoming read queries to around one per second no matter how much read traffic you had.
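
    One way to do that at the Worker level, sketched with the Cache API (assumes cacheable GET reads; the Env type and PARTITION binding are illustrative):

      export default {
        async fetch(request: Request, env: Env): Promise<Response> {
          const cache = caches.default;
          const hit = await cache.match(request);
          if (hit) return hit;

          const id = env.PARTITION.idFromName(new URL(request.url).pathname);
          const response = await env.PARTITION.get(id).fetch(request);

          // With a 1-second TTL the hot object sees roughly one read per edge
          // location per second, however much read traffic arrives.
          const cacheable = new Response(response.body, response);
          cacheable.headers.set("Cache-Control", "public, max-age=1");
          await cache.put(request, cacheable.clone());
          return cacheable;
        }
      };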

CyberDildonics 19 hours ago

What is the difference between a "durable object" and a file?

  • simonw 18 hours ago

    You can read the full article for an answer to that: https://blog.cloudflare.com/sqlite-in-durable-objects/

    Short version: it's replicated to five data centers on every transaction, and backed up as a stream to object storage as well.

    • CyberDildonics 14 hours ago

      So it's a file that gets backed up, like Dropbox

      • simonw 14 hours ago

        At a very high level, yes. But the details matter here - you commit a transaction to the SQLite database and know that the commit has been pushed out to 3/5 replicas by the time the write API request returns - and that it will be logged to object storage (supporting 30 days of rollback) within ten seconds. AND it will live in a data center physically close to the user who caused it to be created.

pajeets a day ago

wonder how this works with Pocketbase