August Summary

Bold is a fast text editor. You can find an overview on the home page.

During the first two weeks, I wanted to take it easy, so I turned up the warnings and fixed a few potential problems when handling large files.

Next, I wrote more LSP code. Last month, I wrote code that read in a JSON file describing the LSP specs. My code used it to generate structs, enums, parsing code, and more. However... the Language Server Protocol is total 💩.
I might write a full post on it, but I can summarize. It's inconsistent and difficult enough that there is no C SDK implementation (at least not on the official page). The biggest problem is that, all over the spec, a field can be either this type or that type, which isn't friendly to languages that prefer a plain struct, and is worse in a language without tagged unions.

For example, in the signature help response, there's a field called 'documentation' which can be either a string or a MarkupContent object. If you look at MarkupContent, it has two string fields called 'value' and 'kind'. Kind can be the string "markdown" or "plaintext". What's the difference between 'documentation' being a string and it being a MarkupContent object using the plaintext kind? From what I can tell, nothing. But I'll have to support both because an LSP might choose one over the other, or potentially both if there's an if-statement that leads to a different response.
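One way to cope with this is to collapse both wire forms into a single internal shape as soon as the JSON is parsed. The sketch below is only an illustration: `Doc` and `normalize_documentation` are made-up names, and it assumes (as guessed above) that a bare string means the same thing as a MarkupContent whose kind is the spec's `plaintext` value.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Doc:
    """Single internal shape for both wire forms (hypothetical name)."""
    kind: str   # "plaintext" or "markdown"
    value: str


def normalize_documentation(raw: Union[None, str, dict]) -> Optional[Doc]:
    """Turn `string | MarkupContent | missing` into Doc or None."""
    if raw is None:
        return None
    if isinstance(raw, str):
        # Assumption: a bare string is treated as plaintext.
        return Doc(kind="plaintext", value=raw)
    if isinstance(raw, dict):
        return Doc(kind=raw["kind"], value=raw["value"])
    raise TypeError(f"unexpected documentation type: {type(raw).__name__}")
```

With this, the rest of the editor only ever sees a `Doc`, so the "string or object" choice is handled in exactly one place.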

Also, right next to the 'documentation' field is a 'label' field. It can be either a string or an array of two uintegers, a start index and an end index, which select a substring of the 'label' field in the parent object. Why do they have this? I have no idea. If they wanted to save memory, they could have specified an alternative binary protocol in the spec. Both of these problems are in the 'signature help' response.
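Resolving either form of the label down to a plain string could look roughly like this. `resolve_param_label` is a hypothetical helper. It assumes the offsets are an inclusive start and exclusive end counted in UTF-16 code units, which is how the 3.17 spec appears to describe them, so the slice is done on the UTF-16 encoding rather than on Python characters.

```python
from typing import Sequence, Union


def resolve_param_label(signature_label: str,
                        param_label: Union[str, Sequence[int]]) -> str:
    """Return the parameter's label text for either wire form."""
    if isinstance(param_label, str):
        return param_label
    start, end = param_label
    # Assumption: offsets count UTF-16 code units (2 bytes each),
    # start inclusive, end exclusive.
    units = signature_label.encode("utf-16-le")
    return units[2 * start:2 * end].decode("utf-16-le")
```

For example, with the signature label `add(a: int, b: int)`, both `"a: int"` and `[4, 10]` resolve to the same text.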

Another issue is how frequently there's an LSPAny. For example, it's in the completion request, which triggers on every letter you write (to guess the function you're typing), and when you type a period. If I want (or am required) to keep the data from an LSPAny field, I'll have to either keep the JSON object (which can be large) or implement some kind of dynamic data object to hold it. I'm not sure why, but there are IDs that can be either an int or a string. Every time I see this, I convert the int to a string so my own code is simpler. A strange but not too annoying thing is the enums. In many places they're C-style enums (a list of ints with clear names); in other places they're string literals.
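The int-to-string trick for IDs can be sketched as below. `MessageId` and `normalize_id` are hypothetical names. The one wrinkle this sketch assumes matters is that an ID sometimes has to be sent back to the other side in its original type, so the raw value is kept next to the string key.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class MessageId:
    key: str               # string form, used for lookups and comparisons
    raw: Union[int, str]   # original wire value, in case it must be echoed back


def normalize_id(raw: Union[int, str]) -> MessageId:
    """Collapse `integer | string` IDs into one string-keyed shape."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(raw, bool) or not isinstance(raw, (int, str)):
        raise TypeError(f"unexpected id type: {type(raw).__name__}")
    return MessageId(key=str(raw), raw=raw)
```

Note that `7` and `"7"` end up with the same key here, which is the simplification being traded for; code that needs to tell them apart can still look at `raw`.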

If you look at the DAP specs, they're so different that I wouldn't doubt anyone who claims they were written by a completely different team. The DAP specs are underspecified, but relatively speaking, they're more reasonable. Generating structs and code from the DAP spec was a great idea, but I'm on the fence about doing the same for the LSP spec. Pretty early on, I had checked how many requests and responses I could easily generate; however, I failed to notice that many languages & LSPs use only a fraction of the spec. All the popular requests were the ones I couldn't generate properly, so I had to write those manually. If I don't end up using half of the LSP spec, then it would have been quicker to skip the code generation and manually write all the requests/responses I use. However, there could be a popular LSP that uses the half of the LSP spec my code generated, so maybe it will turn out to be a good idea; we'll see.
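The "how much can I easily generate" check can be approximated by walking the spec description and flagging every structure that has a union-typed field. The input shape below is a simplified, hypothetical one invented for this sketch (it is not the real spec file and not the actual generator), and `has_union`, `split_by_ease`, and `SAMPLE` are made-up names.

```python
def has_union(type_node: dict) -> bool:
    """True if a type is, or contains, an either/or ("or") type."""
    kind = type_node.get("kind")
    if kind == "or":
        return True
    if kind == "array":
        return has_union(type_node["element"])
    return False


def split_by_ease(structures: list) -> tuple:
    """Split structure names into (easy to generate, needs manual code)."""
    easy, manual = [], []
    for struct in structures:
        if any(has_union(prop["type"]) for prop in struct["properties"]):
            manual.append(struct["name"])
        else:
            easy.append(struct["name"])
    return easy, manual


# Hypothetical, hand-written sample input for illustration only.
SAMPLE = [
    {"name": "Position", "properties": [
        {"name": "line", "type": {"kind": "base", "name": "uinteger"}},
        {"name": "character", "type": {"kind": "base", "name": "uinteger"}},
    ]},
    {"name": "ParameterInformation", "properties": [
        {"name": "label", "type": {"kind": "or", "items": [
            {"kind": "base", "name": "string"},
            {"kind": "array", "element": {"kind": "base", "name": "uinteger"}},
        ]}},
    ]},
]
```

A count like this says how much of the spec is easy, but not how often each structure is actually used, which is the part that was missed above.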

The last thing I worked on was fixing my utility code. I ran into two bugs while writing the LSP generator, and writing unit tests for util code was on my to-do list. So I went ahead and wrote those tests. To my surprise, I completely rewrote one util function (which had one of the bugs); I had thought it was mostly, if not completely, correct. All my code that doesn't involve third-party code (i.e. LSP, debuggers, and some OS specifics) has high test coverage. I might need a week to implement a good test for third-party code, but hopefully I can put that off until I'm in the mood for another easy week.