Coding agents have made development dramatically faster, but paperwork is still manual. Every time I need to update a dollar amount in a SAFE, fill in company details on a cloud services agreement, or negotiate an NDA, I'm back in Word doing it by hand. For anyone at a startup, more of your time is now spent on .docx tasks precisely because the coding got faster.
I built safe-docx at a legal AI startup where we use it in production for contract editing. One of our law firm customers asked for more transparency into how the tool works, so we open-sourced it. We ported it from Python to TypeScript so it runs anywhere JS runs with no native dependencies. It lets coding agents make format-preserving edits to existing Word documents: read and search without blowing up your context window, apply edits that preserve formatting, and export a clean copy or a tracked-changes version.
Why not just have the agent unzip the .docx and edit the raw XML? I tried that. Across 25 Common Paper and Bonterms templates, the XML is a median of 12x the size of the actual text. An agent could write regex or DOM parsing to extract the text, but that's code it generates each session — and sometimes there's a bug it has to debug. safe-docx gives the agent a compact view that only shows formatting where it's changing, stable paragraph IDs that don't drift, and a JSON edit format you can tweak and reapply. MIT licensed — fork it and customize as you like.
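To show why ID-addressed JSON edits can be tweaked and reapplied, here is a minimal sketch. The types, field names, and `applyEdits` function are all hypothetical and chosen for illustration. This is not safe-docx's actual schema or API, and it works on plain text only, with no formatting handling.

```typescript
// Illustrative only: these types and names are hypothetical,
// not safe-docx's real edit format.
interface Paragraph {
  id: string; // stable ID that stays with the paragraph
  text: string;
}

interface Edit {
  paragraphId: string; // which paragraph to change
  find: string; // exact text expected in that paragraph
  replace: string; // text to put in its place
}

function applyEdits(paragraphs: Paragraph[], edits: Edit[]): Paragraph[] {
  const knownIds = new Set(paragraphs.map((p) => p.id));
  for (const e of edits) {
    if (!knownIds.has(e.paragraphId)) {
      throw new Error(`unknown paragraph ID: ${e.paragraphId}`);
    }
  }
  return paragraphs.map((p) => {
    let text = p.text;
    for (const e of edits) {
      if (e.paragraphId !== p.id) continue;
      if (!text.includes(e.find)) {
        throw new Error(`"${e.find}" not found in paragraph ${p.id}`);
      }
      // A replacer function keeps "$" in dollar amounts literal.
      text = text.replace(e.find, () => e.replace);
    }
    return { ...p, text };
  });
}

// The edit list is plain JSON, so it can be saved, adjusted, and reapplied.
const editsJson = `[
  {"paragraphId": "p-cap", "find": "$5,000,000", "replace": "$8,000,000"}
]`;

const safeDraft: Paragraph[] = [
  { id: "p-title", text: "SAFE" },
  { id: "p-cap", text: "The Post-Money Valuation Cap is $5,000,000." },
];

console.log(applyEdits(safeDraft, JSON.parse(editsJson) as Edit[]));
```

Because the edit names a paragraph by ID rather than by position, adding or removing paragraphs elsewhere in the document does not change which paragraph it targets.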
I use it for NDAs, order forms, equity docs, and the SOC 2 policy templates auditors hand you in .docx: one-off documents you fill out once and move on from.
Install:
No .NET / Python / LibreOffice dependencies.
What .docx edge cases should I prioritize next?
Size ratio methodology: I measured 25 .docx templates from Common Paper (https://commonpaper.com) and Bonterms (https://bonterms.com) — open-source standard agreement templates anyone can download. I excluded our own templates to avoid cherry-picking. Short-form templates (amendments, term sheets) skew the high end because they're mostly formatting with little text; the median is representative of typical multi-page agreements.
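For readers who want to sanity-check the ratio on their own files, here is a rough sketch of the calculation. It is not the reproduction script: the function names are made up for illustration, it assumes you have already unzipped the .docx and read word/document.xml into a string, and it approximates the text by concatenating `<w:t>` runs (no paragraph breaks, no numeric character references).

```typescript
// Hypothetical helpers for illustration; not part of safe-docx
// or the reproduction script.

// Decode the five predefined XML entities (numeric references are not handled).
function decodeEntities(s: string): string {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" decodes to "&lt;"
}

// Concatenate the contents of <w:t> runs, which hold the visible text.
function extractText(documentXml: string): string {
  const run = /<w:t(?:\s[^>]*)?>([^<]*)<\/w:t>/g;
  let text = "";
  let m: RegExpExecArray | null;
  while ((m = run.exec(documentXml)) !== null) {
    text += decodeEntities(m[1]);
  }
  return text;
}

// UTF-8 byte length, so non-ASCII text is counted in bytes, not characters.
function byteLength(s: string): number {
  return new TextEncoder().encode(s).length;
}

// doc.xml bytes / text bytes for one document.
function sizeRatio(documentXml: string): number {
  const textBytes = byteLength(extractText(documentXml));
  if (textBytes === 0) throw new Error("document has no text runs");
  return byteLength(documentXml) / textBytes;
}

// Median of a non-empty list of ratios.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Usage: pass the document.xml contents of each template you measured.
function medianRatio(documentXmls: string[]): number {
  return median(documentXmls.map(sizeRatio));
}
```

The median is used rather than the mean so that a few formatting-heavy short templates do not dominate the summary number.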
Results (template: doc.xml bytes / text bytes = ratio):
Reproduction script and templates are in the open-agreements repo: https://github.com/open-agreements/open-agreements/blob/main...