Reflecting on the first 6 months of a rewrite

Many of us have heard that you should never rewrite code from scratch. Some of us have read Joel's fantastic article, "Things You Should Never Do, Part I", but there isn't much else written on why. Outside of Joel's article, I can't name any off the top of my head. So I thought I should write about my experience and the results I've gotten.

Context:

The software is a text editor with nearly 3,000 hours of work put into it (that's about a year and a half at a 40-hour-a-week job). It has no users: no one whose workflow breaks if everything changes, no one who'd be upset about a rewrite, and no one to disappoint if things don't go well (besides myself). It should be obvious why Joel's arguments don't apply to this project, but I added a section on them anyway. The editor has quite a few features: its own UI system, support for the most common LSP commands, DAP and GDB support for debugging, and source highlighting when the LSP provides none. There's more, but those are the bigger features.

The problem is that text editors and IDEs are big and offer a lot to explore. The first version I wrote was meant to be a prototype, but I got carried away, implementing as much as I could to explore as much of the problem space as possible before running into problems. The codebase ended up twice the size I estimated and took three times as long to get there. Some of the things I had no experience with were:

Text editing in general. What happens if a user types a letter in the middle of a 1 GB log file?
An event system between the main (UI) thread and a background thread.
A UI system. I knew to start with something basic and that it should be async, but I didn't know that all of it should be async. When typing in the find bar of a large enough document, a delay of just 75ms made typing feel clunky and gave the editor an amateur feel (in fairness, I did say I'd start with something basic).
Font rendering. Monospace doesn't really mean one fixed width; don't try to fix it, even if a few letters are 1 pixel too wide. You can't have one width because 'dz' is two codepoints while 'ǳ' is one codepoint (U+01F3), which should take up twice the width of an ASCII character, and you probably don't want to figure out which Unicode characters should be double width. There's also �, used to show an unrecognized character, which tends to take up about 1.5x the width.
Dealing with third-party software. I expected lots of invalid data and that LSP and DAP servers would be slow. But I didn't expect them to send some data, pause for a second, then send more, for something as simple as listing the local variables in a small function. This caused at least one UI bug (highlighting was reset because the code thought a new action had happened).
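For the find-bar problem, one common shape for "all of it async" is sketched below in Python (the post doesn't show the editor's code or language, so this is only an illustration under my own assumptions). The UI thread never searches: it tags each query with an increasing generation number and hands it to a worker thread, and when results come back, anything from an older generation is dropped. `AsyncFind`, `on_query_changed`, `poll`, and `close` are hypothetical names.

```python
import queue
import threading


class AsyncFind:
    """Find-bar search that never runs on the UI thread.

    Hypothetical sketch: AsyncFind and its methods are illustrative names,
    not taken from the editor described in this post.
    """

    def __init__(self, text: str):
        self.text = text
        self.generation = 0  # bumped on every keystroke; only the UI thread writes it
        self.requests: queue.Queue = queue.Queue()
        self.results: queue.Queue = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def on_query_changed(self, query: str) -> None:
        """UI thread: tag the query with a new generation and hand it off."""
        self.generation += 1
        self.requests.put((self.generation, query))

    def _run(self) -> None:
        """Worker thread: find every match offset for each queued query."""
        while True:
            gen, query = self.requests.get()
            if query is None:  # shutdown sentinel
                return
            hits = []
            pos = self.text.find(query) if query else -1
            while pos != -1:
                hits.append(pos)
                pos = self.text.find(query, pos + 1)
            self.results.put((gen, query, hits))

    def poll(self):
        """UI thread, once per frame: return the newest up-to-date result, or None."""
        latest = None
        while True:
            try:
                gen, query, hits = self.results.get_nowait()
            except queue.Empty:
                return latest
            if gen == self.generation:  # results for older queries are dropped
                latest = (query, hits)

    def close(self) -> None:
        """Stop the worker after it finishes any queued queries."""
        self.requests.put((self.generation, None))
        self.worker.join()
```

Typing "a" and then "bc" quickly queues two searches, but `poll()` only ever hands the UI the result for "bc". A fuller version would also abandon a search that is still running once a newer query arrives; this sketch only discards stale results.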

There was a lot to explore, which meant a lot of code, and that's a lot to throw out. Let's look at Joel's arguments.

Joel's Arguments:

(You can skip this section; it's for people who want to argue about whether this situation applies to what Joel wrote.)

Joel's arguments are:

It's harder to read code than to write it: I understood every line. I didn't think the codebase was messy; I thought it was wrong, with a high todo ratio.
The idea that new code is better than old is patently absurd: That's fairly true, but not in this case. My code was meant to be a prototype that shouldn't be kept. Joel has another article that suggests the solution is to refactor with every commit, keeping everything in working order. I'm not sure I could have rewritten the threading and messaging system without breaking anything, but the software also had no users, so there was no reason to keep its current functionality working.
You lose bugfixes to problems you don't know about: There aren't many, if any, bugfixes.
You are throwing away your market leadership. You are giving a gift of two or three years to your competitors: It would have been nice to have an editor for people who don't want AI, but a prototype IDE is no replacement for current text editors and IDEs.
It's important to remember that when you start from scratch there is absolutely no reason to believe that you are going to do a better job than you did the first time: Completely false. I don't have to guess what the problems are or how 90% of the code should look.

It's pretty clear Joel's rewrite warnings don't fit my situation, and I couldn't find anything online about why a rewrite like mine might go badly. Still, there's no guarantee I won't make large mistakes, like huge scope creep or attempting to solve the wrong problem. So let's move on to how I thought it would go and how it went.

How I Thought The Rewrite Would Go:

Have you ever intentionally (or accidentally) deleted code and rewritten it in a fraction of the time? I thought I'd be able to rewrite a significant number of features in 1/5 to 1/4 of the time. Of course, the whole rewrite wouldn't take just 1/4 of the time, since I'd also need to implement the many todos lying around my codebase. I suspected that, outside of the GUI, I'd be able to get everything mostly right on the first try, since I understand the problem space pretty well now. I thought that within two months I'd have the text editing ready and would be posting screenshots every week.

I was very wrong. It didn't go poorly, but it didn't go how I expected at all.

How It Went:

It's no surprise that the estimates were optimistic. It wasn't too bad, but far from perfect. I kept a devlog for people who want to read what's involved in writing this text editor. The weekly logs are short, and the monthly summaries aren't too long. I tried to keep it high-level enough for everyone to understand.

I implemented new things and ignored the core

I should have known I'd do that. I always focus on the large, hard tasks because they tend to dictate part of the architecture. Tasks I understand I normally don't find hard, so I leave them for later. One exception is the text editing code: it may be hard, but I understood that problem space so well that I didn't feel like touching it in the first 5 months.

Todos were larger than I expected

One of my todos was to use an async API so IO would never block. "Never" is the keyword here; file APIs aren't completely async. I spent 10 days researching and writing experimental code, then threw it out: I need one or more threads to deal with the IO. I thought that todo would take 2-3 weeks, but it's looking like 4-7, depending on how smoothly the other OSes go.
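The post doesn't show its IO code, so here is a minimal Python sketch of the "one or more threads" approach, assuming a standard-library thread pool stands in for the editor's own IO thread. Every call that might block (opening or reading a file) runs on a worker; the UI thread only gets back a future it can check without waiting. `IOThread` and `read_file` are hypothetical names.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from pathlib import Path


class IOThread:
    """Hypothetical sketch: all blocking file calls happen on worker threads.

    The UI thread submits a request, gets a Future back immediately, and
    checks Future.done() once per frame instead of waiting on the disk.
    """

    def __init__(self, workers: int = 1):
        self.pool = ThreadPoolExecutor(max_workers=workers, thread_name_prefix="io")

    def read_file(self, path: str) -> "Future[bytes]":
        # Opening and reading may block (slow disk, network mount), but only
        # an IO worker ever waits on it; any exception is stored in the Future.
        return self.pool.submit(Path(path).read_bytes)

    def shutdown(self) -> None:
        self.pool.shutdown(wait=True)
```

Each frame, the UI would check `pending.done()` and only then call `pending.result()`, which re-raises any exception from the worker (for example `FileNotFoundError`). Raising `workers` above 1 is the "or more" case: a slow network path then can't hold up a local file.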

I didn't get everything outside of UI mostly right on the first attempt

The event system has no third-party code, so anything I mess up is 100% my fault. The initial event system in the prototype was bad: a simple struct with a few generic variables that I set. It was simple, but changing things was extremely annoying.

This time around, I wrote a struct for every event. If there was more than one call site, I couldn't forget to change one, because I'd get a compile error. However, having two giant switch statements (one for requests, one for responses) was really annoying. I rewrote it to use an interface, which keeps the background-thread and main-thread code for each event next to each other. It's much better. Switching from one version to the other didn't take much time (less than 2 hours, if I remember correctly), but the first version wasn't "mostly right".
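Below is a Python sketch of the interface-based layout, assuming a request queue into the background thread and a response queue back to the main thread (the editor's real types aren't shown in the post). Each event class carries both halves, so neither thread needs a switch statement. Python has no compile step, so the nearest equivalent of the compile error is that an event missing one half raises `TypeError` when it is created. `Event`, `CountLines`, `start_background`, and `drain_responses` are hypothetical names.

```python
import queue
import threading
from abc import ABC, abstractmethod


class Event(ABC):
    """Each event keeps its background half and its main-thread half together."""

    @abstractmethod
    def run_background(self) -> None:
        """Runs on the background thread; may do slow work."""

    @abstractmethod
    def apply_on_main(self, ui_state: dict) -> None:
        """Runs on the main (UI) thread with whatever run_background produced."""


class CountLines(Event):
    # Hypothetical example event.
    def __init__(self, text: str):
        self.text = text
        self.count = 0

    def run_background(self) -> None:
        self.count = self.text.count("\n") + 1

    def apply_on_main(self, ui_state: dict) -> None:
        ui_state["line_count"] = self.count


def background_loop(requests: queue.Queue, responses: queue.Queue) -> None:
    # No switch statement: each event object knows what to do on this thread.
    # A None request is the shutdown sentinel.
    while (event := requests.get()) is not None:
        event.run_background()
        responses.put(event)


def start_background(requests: queue.Queue, responses: queue.Queue) -> threading.Thread:
    thread = threading.Thread(target=background_loop, args=(requests, responses), daemon=True)
    thread.start()
    return thread


def drain_responses(responses: queue.Queue, ui_state: dict) -> None:
    # Called by the main thread once per frame; never blocks.
    while True:
        try:
            event = responses.get_nowait()
        except queue.Empty:
            return
        event.apply_on_main(ui_state)
```

Adding an event now means writing one class. The dispatch loops on both threads stay the same.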

Another instance was after I wrote my DAP code. I was able to generate a significant amount of the DAP structs and JSON-related code, so I thought that, since the LSP is similar, it might be a good idea to do the same. It was not a good idea. The Language Server Protocol is a disaster. I wrote some quick code to estimate how much of the protocol I'd be able to generate and where the corner cases would be. It looked like enough to be worth the time; however, it turned out most language servers use only a small fraction of the protocol, and it's the fraction with the corner cases I couldn't cleanly generate: the part I already had code for. Oh well. It ate up a week, but that's OK for something the size of the LSP.
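The post doesn't describe the generator itself. Here is a minimal sketch of the general idea in Python, assuming a simplified JSON-Schema-style input (the DAP specification is also published as a JSON schema, but the input below is made up, not copied from it). Primitives, arrays, and `$ref`s map cleanly to types. Anything else, such as a union of types, gets flagged as a corner case that needs hand-written code. `generate_dataclass`, `field_type`, and `EXAMPLE_SCHEMA` are hypothetical names.

```python
# Hypothetical input: a simplified JSON-Schema-style definition; "Example"
# and its fields are made up for illustration.
EXAMPLE_SCHEMA = {
    "properties": {
        "name": {"type": "string"},
        "variablesReference": {"type": "integer"},
        "value": {"type": ["string", "integer"]},  # a union: no clean mapping
        "children": {"type": "array", "items": {"$ref": "#/definitions/Example"}},
    },
    "required": ["name", "variablesReference"],
}

PRIMITIVES = {"string": "str", "integer": "int", "number": "float", "boolean": "bool"}


def field_type(prop: dict) -> tuple[str, bool]:
    """Map one property schema to a Python type name; flag anything unmappable."""
    t = prop.get("type")
    if isinstance(t, str) and t in PRIMITIVES:
        return PRIMITIVES[t], False
    if t == "array" and isinstance(prop.get("items"), dict):
        inner, corner = field_type(prop["items"])
        return f"list[{inner}]", corner
    if "$ref" in prop:
        return prop["$ref"].rsplit("/", 1)[-1], False
    return "Any", True  # unions, oneOf/anyOf, inline objects, ...


def generate_dataclass(name: str, schema: dict) -> tuple[str, list[str]]:
    """Return (Python source for a dataclass, fields that need hand-written code)."""
    required = set(schema.get("required", []))
    required_fields, optional_fields, corner_cases = [], [], []
    for field, prop in schema.get("properties", {}).items():
        py_type, corner = field_type(prop)
        if corner:
            corner_cases.append(field)
        if field in required:
            required_fields.append(f"    {field}: {py_type}")
        else:
            # Dataclass fields with defaults must come after those without.
            optional_fields.append(f"    {field}: Optional[{py_type}] = None")
    body = required_fields + optional_fields or ["    pass"]
    header = [
        "from __future__ import annotations",  # lets a type refer to itself
        "from dataclasses import dataclass",
        "from typing import Any, Optional",
        "",
        "@dataclass",
        f"class {name}:",
    ]
    return "\n".join(header + body) + "\n", corner_cases
```

The corner-case list is the useful part of a quick estimate like this. If the flagged fields turn out to be the ones the servers actually use, most of the generated code isn't worth much.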

Many parts didn't take 1/4th of the time

The text editing took me a ridiculously long time the first time around, so long that I thought the rewrite couldn't possibly take even 1/4 of that time, even with multi-cursor support and better metadata handling added. The issue in my prototype was that I didn't understand the problem: I wasn't sure what the code needed to do, and I tackled it the wrong way, which led to a lot of debugging.

When I made the estimate, I forgot why I hadn't wanted to do multi-cursor in the first place: it affects a fair number of cases, which makes the code pretty different. The code didn't take much longer to write, but it took a lot longer to think about how the function signatures should change and how to reorder the code. I didn't account for the time to rearchitect code when I estimated the rewrite. I mostly got the metadata part right the first time, and it's larger than I remembered. It was barely quicker to write in the rewrite, and there were enough changes that I couldn't simply reuse my old code.
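Here is a small illustration of why multi-cursor changes function signatures and code order. It's a Python sketch, assuming cursors are plain offsets into a string (a real editor would use its own buffer type and offset units). The edit functions take a list of cursors instead of one. The edits are applied from the last cursor to the first, so each change only shifts text that no unprocessed cursor points into. Cursors that land on the same spot afterwards are merged. `insert_at_cursors` and `backspace_at_cursors` are hypothetical names.

```python
def insert_at_cursors(text: str, cursors: list[int], s: str) -> tuple[str, list[int]]:
    """Insert s at every cursor offset; return the new text and cursor offsets."""
    unique = sorted(set(cursors))  # two cursors on the same spot act as one
    # Last cursor first: inserting at a later offset never moves earlier ones.
    for pos in reversed(unique):
        text = text[:pos] + s + text[pos:]
    # Each cursor shifts right by len(s) for itself and for every cursor before it.
    new_cursors = [pos + len(s) * (i + 1) for i, pos in enumerate(unique)]
    return text, new_cursors


def backspace_at_cursors(text: str, cursors: list[int]) -> tuple[str, list[int]]:
    """Delete the character before every cursor; merge cursors that collide."""
    unique = sorted(set(cursors))
    for pos in reversed(unique):
        if pos > 0:  # a cursor at offset 0 has nothing to delete
            text = text[:pos - 1] + text[pos:]
    new_cursors = []
    deleted = 0
    for pos in unique:
        if pos > 0:
            deleted += 1  # this cursor's own deletion
        new_cursors.append(pos - deleted)
    return text, sorted(set(new_cursors))  # adjacent cursors can end up equal
```

For example, pressing backspace with cursors at offsets 1 and 2 of "abc" leaves "c" and a single cursor at 0. Cases like that are easy to miss when the code was written for one cursor.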

My utility code grew

This went smoothly, but it surprised me. Since I'd mostly be reimplementing what I already had, I thought I'd barely add anything to the utility file. But because I understood how the codebase would look, I was able to recognize what I'd want to reuse in other places. Half of the additions were array, text printing, and text parsing code. The other half was new code for multi-threading.

Code size

I didn't try to estimate this. I suspected I'd be able to implement most features faster, where "faster" mostly means I don't need to debug or think about the problems nearly as much, and I won't need to delete as many lines while working. I had no idea whether the implementation would be any smaller. I knew I'd be working on many todos, which could easily explode the code size by being simple but long and tedious to write, so I wasn't sure how this would turn out. After 6 months, the codebase is slightly larger than the prototype was after 3,000 hours. I was able to copy the GDB, text-search, font-loading, and utility code without changes, which accounts for a few thousand lines, but I didn't think the codebase would reach this size this soon.

What Remains?

I barely did any UI. I did some at the start but didn't go further, since there wasn't anything to do or show (no text editing, no debugging, no LSP, etc.). The IO thread has a way to go, and the LSP support is in a worse state than when I started. All the Mac and Windows code is broken or incomplete. This is why you don't want to rewrite from scratch: you end up losing more than you realize, and it will likely take longer than you expect to get back to where you were. That being said, I'm glad most of the todos are gone. They were more painful than not being able to debug in the editor right now.

What's New?

I switched to SDL3; since I had to rewrite the UI code anyway, I might as well use the current version. The threading situation is much better, and there are no locks in the codebase now. The event system is much better. DAP support is more complete, despite there being no UI for it at the moment. I improved it enough that I won't hit the bug I mentioned earlier when there's a long pause in the middle of a response.
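Both the LSP and DAP frame each message as a header block containing `Content-Length: <n>`, then a blank line (`\r\n\r\n`), then exactly n bytes of JSON. The post doesn't say how the editor handles a pause mid-response. One general technique, sketched here in Python under the assumption that the pause falls inside a single framed message, is an incremental reader that buffers bytes and only returns a message once its whole body has arrived. A pause then just means "nothing new yet" rather than something that looks like a new action. `MessageReader` and `feed` are hypothetical names.

```python
import json


class MessageReader:
    """Incremental decoder for Content-Length framed JSON messages.

    Bytes are fed in whatever chunks the pipe delivers; a message is only
    returned once its full body has arrived.
    """

    def __init__(self) -> None:
        self.buffer = bytearray()

    def feed(self, chunk: bytes) -> list:
        self.buffer += chunk
        messages = []
        while True:
            header_end = self.buffer.find(b"\r\n\r\n")
            if header_end == -1:
                break  # headers incomplete: wait for more bytes
            length = None
            for line in bytes(self.buffer[:header_end]).split(b"\r\n"):
                name, _, value = line.partition(b":")
                if name.strip().lower() == b"content-length":
                    length = int(value.strip())  # raises ValueError on garbage
            if length is None:
                raise ValueError("missing Content-Length header")
            body_start = header_end + 4
            if len(self.buffer) - body_start < length:
                break  # body incomplete: keep the partial data buffered
            body = bytes(self.buffer[body_start:body_start + length])
            del self.buffer[:body_start + length]
            messages.append(json.loads(body))
        return messages
```

Splitting one frame across three `feed` calls returns `[]`, `[]`, and then the decoded message. A chunk holding one and a half messages returns the first and keeps the rest buffered.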

There's code for user configuration. I'd like the config to be searchable, so I wrote the code with that in mind, but it'll be a while before I implement that feature. There's support for multiple windows, but I've only scratched the surface of it. The text editing code is much better, has multi-cursor support, and won't have a problem if a line is tens of thousands of characters long.
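The post doesn't say which data structure the new text editing code uses. One common answer to the "letter in the middle of a 1 GB log file" question is a piece table. The original file is never modified, typed text goes into an append-only buffer, and the document is a list of pieces pointing into one buffer or the other. Typing a letter splits one piece instead of moving everything after it. Below is a minimal Python sketch of that technique (not necessarily what the editor does). `Piece` and `PieceTable` are hypothetical names, and Python strings stand in for real buffers.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    buffer: str  # "orig" (the file as loaded) or "add" (text typed since)
    start: int
    length: int


class PieceTable:
    """Minimal piece table. Hypothetical sketch, not the editor's actual design."""

    def __init__(self, original: str):
        self.orig = original  # never modified; a real editor might memory-map the file
        self.add = ""         # append-only: inserted text is only ever added to the end
        self.pieces = [Piece("orig", 0, len(original))] if original else []

    def __len__(self) -> int:
        return sum(p.length for p in self.pieces)

    def insert(self, pos: int, text: str) -> None:
        if not 0 <= pos <= len(self):
            raise IndexError("insert position out of range")
        if not text:
            return
        new = Piece("add", len(self.add), len(text))
        self.add += text
        offset = 0
        for i, p in enumerate(self.pieces):
            if pos <= offset + p.length:
                # Split the piece containing pos; empty halves are dropped.
                split = pos - offset
                before = Piece(p.buffer, p.start, split)
                after = Piece(p.buffer, p.start + split, p.length - split)
                self.pieces[i:i + 1] = [x for x in (before, new, after) if x.length]
                return
            offset += p.length
        self.pieces.append(new)  # only reached when the document is empty

    def text(self) -> str:
        buffers = {"orig": self.orig, "add": self.add}
        return "".join(buffers[p.buffer][p.start:p.start + p.length] for p in self.pieces)
```

The linear scan over pieces keeps the sketch short. Implementations meant for huge files usually keep the pieces in a balanced tree, with per-node line counts, so finding an offset or a line stays fast after many edits.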

What's Next?

I'd like to release a beta sometime soon. The IO thread and filesystem code are next, so I can use real files with the text editing code. I'd like to source after that. I'll need to write the code to build files, then hook up the debugger. I'll probably focus on UI at that point, but I may try to get LSPs working well and hook that code up. It's pretty close to the end of the year, and I'd like to get a beta out before it's over.