Show HN: Hocuspocus 4 – self-hosted Yjs collaboration backend

(github.com)

38 points | by philipisik a day ago

6 comments

Terretta a day ago
Making this plug and play is fantastic, and the live "editor types" demo on tiptap.dev is spectacularly convincing.
So, say a data-privacy-conscious prospect is interested, goes a click up from the editor, considers the service, and pokes around. They can't find anything clarifying how you cannot, even if ordered to by warrant, see the content of a customer's documents. You have a sample app for legal; that type of client is going to care about this.
Also not readily seeing how security or auth actually works. Requests over TLS are sufficient for the "end to end military grade encryption" type marketing claims; every site with HTTPS or an S3-type storage can make the same claims about encryption in motion and encryption at rest. That relies on transport and provider. It's more interesting if the content is encrypted against you as the provider, like Apple's Advanced Data Protection for iCloud-stored content (e.g. Messages, Reminders, Bookmarks, iCloud Drive, Notes, Voice Memos…).
Any time a SaaS is asking a firm to keep all their documents on or run them through the SaaS, the data protection story should be stronger than this present security page.
Even Cybersecurity & Infrastructure Security Agency (CISA) might randomly write passwords into a notes document…
Alternatively, say that HIPAA etc. shouldn't be on it yet, and talk about when that is on the roadmap. But the security story is generally best when baked into the design from the start.
- philipisik a day ago
  I can definitely see your point for SaaS hosted documents, which, to some extent, applies to a lot of startup cloud services, and that's exactly why we open-sourced Hocuspocus: so you can host it yourself :)
youngbum 20 hours ago
Huge fan of Hocuspocus. Congrats on the new release and grateful for your efforts.
We have been using Hocuspocus to sync multiple users in our form builder editor (https://walla.my), and it has been very reliable and sturdy. We not only write descriptions with TipTap but also sync the fields and the logic itself.
Actually, the previous version could be built and deployed with Bun, though. We also tried to deploy it on Cloudflare infrastructure but… as you know, Cloudflare Workers were not built for Hocuspocus. We also tried Cloudflare Containers, but they weren’t as reliable as just spinning up a small VM.
1vCPU and 1GB RAM were enough to synchronize about 3,000 users.
One small complaint is the Yjs ecosystem itself. The docs are fragmented, and the nature of Y"JS" locks the whole codebase into JavaScript infrastructure. Hocuspocus might well have been ported to a more secure, memory-safe, and faster modern stack like Go or Rust if Yjs-compatible implementations existed in other languages.
Anyways, great work and we will definitely take a look at the new release. Again, thanks for sharing.
- philipisik 14 hours ago
  Awesome to hear you're using Hocuspocus in production! Yjs can indeed be hard to understand well, but when used together with Tiptap it's as easy as adding an extension to your editor :-) We're running Hocuspocus in small docker containers in swarm clusters, which makes it super cheap to run.
curtisblaine a day ago
When I try to do this kind of thing with y.js in a non-trivial way I always battle against two issues and ultimately quit because they're really hard to do efficiently:
1) Materializing documents. Assuming you don't have "live" yjs documents and you only merge diffs with diffUpdate, when one or more users are connected, it's always worth having the blob in RAM to quickly merge diffs into it and save it periodically; when the usages of a document go away, you save it one last time and "ice" it in long-term storage, offloading it from RAM. I typically use an LRU cache for that. The problem is when too many users are working on too many docs and they all have to fit in RAM. How do you solve that?
2) GC. Again, assuming you don't have live documents but you only merge diffs, those blobs need to be garbage collected to compact them after a while iirc (if the doc is live it's done automatically). This normally is a periodic process that eventually GCs all documents in turn, one after the other. If you handle that, how do you manage to not make your server essentially unpredictable when it comes to compacting big blobs? GC'ing takes a toll on your CPU, and not GC-ing takes a toll on your RAM and secondary storage.
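To make point 1 concrete, here is a minimal sketch of the LRU approach, assuming documents are held as opaque `Uint8Array` blobs and the cache is capped by total bytes rather than by entry count. The names `BlobLru` and `persist` are hypothetical and are not part of Yjs or Hocuspocus; `persist` stands in for whatever writes a blob to long-term storage.

```typescript
// Hypothetical sketch: BlobLru and persist are illustrative names,
// not Yjs or Hocuspocus APIs.
class BlobLru {
  // A Map iterates in insertion order, so the first entry is the
  // least recently used one.
  private readonly entries = new Map<string, Uint8Array>();
  private bytes = 0;
  private readonly maxBytes: number;
  private readonly persist: (docId: string, blob: Uint8Array) => void;

  constructor(
    maxBytes: number,
    persist: (docId: string, blob: Uint8Array) => void,
  ) {
    this.maxBytes = maxBytes;
    this.persist = persist;
  }

  has(docId: string): boolean {
    return this.entries.has(docId);
  }

  get residentBytes(): number {
    return this.bytes;
  }

  get(docId: string): Uint8Array | undefined {
    const blob = this.entries.get(docId);
    if (blob === undefined) return undefined;
    // Re-insert so this document becomes the most recently used.
    this.entries.delete(docId);
    this.entries.set(docId, blob);
    return blob;
  }

  put(docId: string, blob: Uint8Array): void {
    const previous = this.entries.get(docId);
    if (previous !== undefined) {
      this.bytes -= previous.byteLength;
      this.entries.delete(docId);
    }
    this.entries.set(docId, blob);
    this.bytes += blob.byteLength;

    // "Ice" the least recently used blobs until the budget is met,
    // but never evict the entry that was just written.
    while (this.bytes > this.maxBytes && this.entries.size > 1) {
      const [oldestId, oldestBlob] = this.entries.entries().next()
        .value as [string, Uint8Array];
      this.entries.delete(oldestId);
      this.bytes -= oldestBlob.byteLength;
      this.persist(oldestId, oldestBlob);
    }
  }
}
```

This only bounds RAM. It does not answer the question of what to do when the set of actively edited documents is larger than the budget, in which case the cache would evict and reload the same blobs repeatedly.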
- philipisik 14 hours ago
  Interesting. What kind of content do you store in the ydoc? We're mostly working with text-based documents and don't really have any kind of performance or storage issues. Yjs documents are, if created well, both really fast and small. Hocuspocus easily handles >25k concurrent user connections on single instances without any real scaling effort.