Building an LSP for dbt Core - Jonathan Clemons

TLDR: I built an LSP for dbt Core. Check it out on GitHub.

Why build it?

dbt Core is a solid set of tooling for data transformations, but it lacks some of the developer tooling that is common in other languages and frameworks. For a long time, one of the missing pieces was a proper language server for the framework.

When I started this project there was another tool available, but it was bundled as a VSCode extension and was no longer supported. I use Neovim (btw), so my motivation to try and repackage that extension was low. With no clear option for my use case, I started thinking a lot about writing my own language server. Beyond the lack of options, I also wanted control over which features were available and the ability to add what I wanted. That began a journey of building a language server for dbt Core.

Having not done this before, I started with research. Thankfully, I found a very helpful video from TJ DeVries about building a language server from scratch. The example in this video ended up being the starting point for this project.

What about dbt Fusion? Doesn't it already have a language server?

Yes, it does within their VSCode extension, and if you are using VSCode or one of its forks you should probably be using it. It has a native integration with dbt Fusion and supports the common features.

All of that said... dbt has communicated that they will not be distributing an unbundled version of the language server and they do not plan on open sourcing it either.

slack-reply

This is frustrating for a user like myself who wants to use a different editor and would contribute to the open source project if I could. I will hold other thoughts on this behavior for another time, but in short I think language servers for open source frameworks should also be open source.

Design Decisions

Written in Golang using primarily core libraries.

dbt Core is not a dependency of the language server.
Since dbt Core can be very slow to compile, the language server does its own parsing of the project. A Python environment is not required to run the language server.
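To make "its own parsing" a little more concrete, here is a minimal sketch of pulling `ref()` calls out of a model file with nothing but the Go standard library. This is not the project's actual parser: the `Ref` type, the `findRefs` function, and the regular expression are all illustrative assumptions, and the pattern only handles string-literal arguments.

```go
package main

import (
	"fmt"
	"regexp"
)

// refPattern matches {{ ref('model') }} and {{ ref('package', 'model') }},
// with optional Jinja whitespace-control dashes.
// Assumption: arguments are plain string literals; keyword arguments such as
// version= and any other Jinja expressions are not handled.
var refPattern = regexp.MustCompile(
	`\{\{-?\s*ref\(\s*['"]([^'"]+)['"]\s*(?:,\s*['"]([^'"]+)['"]\s*)?\)\s*-?\}\}`)

// Ref is a hypothetical record of one ref() call found in a model.
type Ref struct {
	Package string // empty for the one-argument form
	Model   string
	Offset  int // byte offset of the match within the file
}

// findRefs scans raw (uncompiled) model SQL and returns every ref() call.
func findRefs(sql string) []Ref {
	var refs []Ref
	for _, m := range refPattern.FindAllStringSubmatchIndex(sql, -1) {
		first := sql[m[2]:m[3]]
		r := Ref{Model: first, Offset: m[0]}
		if m[4] >= 0 { // second argument present: ref('package', 'model')
			r.Package = first
			r.Model = sql[m[4]:m[5]]
		}
		refs = append(refs, r)
	}
	return refs
}

func main() {
	sql := "select * from {{ ref('stg_orders') }} o\n" +
		"join {{ ref(\"shop\", 'customers') }} c on o.customer_id = c.id"
	for _, r := range findRefs(sql) {
		fmt.Printf("%+v\n", r)
	}
}
```

Because this works on the raw template text, nothing has to be compiled first, which is the point of the design decision. The trade-off is that anything dynamic in the Jinja is invisible to a scan like this.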

dbt Fusion is changing things.
Since dbt Fusion is fast enough to leverage for features like diagnostics or model compilation, it is used for some of the more recent features. These are documented in the project README, and a CLI flag is required to enable the Fusion features.

So what features are available?

The base features are the ones I wanted most in my workflow and felt added value. If there are features you have seen and want included in the tool, feel free to open a PR or an issue for discussion.

Go to definition

Project navigation was one of my biggest wants, since a dbt project already forms a graph and navigating to upstream models is a common task. In the current state of the language server, this is supported for models, seeds, sources, macros, and variables. It can also handle macros and models that are defined in packages. Within a model, this feature is also supported for CTE names.

definition
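As a rough illustration of how go to definition for a model can work, the sketch below builds a name-to-file index and answers a lookup with an LSP `Location`. The `Location`, `Range`, and `Position` shapes follow the LSP specification; everything else (`indexModels`, `definition`, the example paths and root URI) is hypothetical and not taken from the project. It assumes a model's name is its `.sql` file name without the extension, and it leaves out seeds, sources, macros, variables, and CTEs.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Position, Range, and Location mirror the LSP types of the same names.
type Position struct {
	Line      int `json:"line"`
	Character int `json:"character"`
}

type Range struct {
	Start Position `json:"start"`
	End   Position `json:"end"`
}

type Location struct {
	URI   string `json:"uri"`
	Range Range  `json:"range"`
}

// indexModels maps model name -> project-relative path.
// Assumption: the model name is the file name minus ".sql".
func indexModels(paths []string) map[string]string {
	index := make(map[string]string)
	for _, p := range paths {
		if path.Ext(p) == ".sql" {
			index[strings.TrimSuffix(path.Base(p), ".sql")] = p
		}
	}
	return index
}

// definition resolves a model name to the top of its file.
// Assumption: rootURI and the path need no percent-encoding.
func definition(index map[string]string, rootURI, model string) (Location, bool) {
	p, ok := index[model]
	if !ok {
		return Location{}, false
	}
	// The zero Range points at line 0, character 0.
	return Location{URI: rootURI + "/" + p}, true
}

func main() {
	index := indexModels([]string{
		"models/staging/stg_orders.sql",
		"models/marts/orders.sql",
		"models/schema.yml",
	})
	if loc, ok := definition(index, "file:///work/shop", "stg_orders"); ok {
		fmt.Println(loc.URI)
	}
}
```

Collecting the paths (for example with `filepath.WalkDir`) is left out to keep the sketch short. Package models would need a second index keyed by package name.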

Context in hover

dbt projects have context stored in many different places. The hover feature should give some context for whatever is underneath the cursor. Currently this includes: model docs, macro inputs, sources, variable values, and function documentation (dialect specific).

hover
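For a sense of what a hover response looks like on the wire, here is a small sketch that renders model docs as markdown inside an LSP `Hover` result. The `Hover` and `MarkupContent` shapes come from the LSP specification; the `ModelDoc` and `Column` types and the markdown layout are assumptions made for illustration, not the project's actual output.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// MarkupContent and Hover mirror the LSP types of the same names.
type MarkupContent struct {
	Kind  string `json:"kind"` // "markdown" or "plaintext"
	Value string `json:"value"`
}

type Hover struct {
	Contents MarkupContent `json:"contents"`
}

// Column and ModelDoc are hypothetical holders for docs read from a
// project's YAML files.
type Column struct {
	Name        string
	Description string
}

type ModelDoc struct {
	Name        string
	Description string
	Columns     []Column
}

// hoverForModel renders a model's description and columns as markdown.
func hoverForModel(doc ModelDoc) Hover {
	var b strings.Builder
	fmt.Fprintf(&b, "**%s**\n\n%s\n", doc.Name, doc.Description)
	for _, c := range doc.Columns {
		fmt.Fprintf(&b, "\n- `%s`: %s", c.Name, c.Description)
	}
	return Hover{Contents: MarkupContent{Kind: "markdown", Value: b.String()}}
}

func main() {
	h := hoverForModel(ModelDoc{
		Name:        "stg_orders",
		Description: "One row per order.",
		Columns:     []Column{{Name: "order_id", Description: "Primary key."}},
	})
	out, err := json.MarshalIndent(h, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```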

Completions

Completions with context are available for models, seeds, sources, macros, variables, and functions (dialect specific).

completions
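"Completions with context" roughly means working out what the cursor is inside of before suggesting anything. The sketch below handles one case only: an unfinished `ref('` call, for which it offers matching model names. The `CompletionItem` fields and the kind value 18 (`Reference`) come from the LSP specification; the `openRef` pattern and `completeRef` function are hypothetical and not the project's implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// openRef matches an unfinished ref(' call at the end of the text to the
// left of the cursor and captures whatever has been typed so far.
var openRef = regexp.MustCompile(`ref\(\s*['"]([A-Za-z0-9_]*)$`)

// CompletionItem is a subset of the LSP type of the same name.
type CompletionItem struct {
	Label  string `json:"label"`
	Kind   int    `json:"kind"`
	Detail string `json:"detail,omitempty"`
}

const kindReference = 18 // CompletionItemKind.Reference

// completeRef returns model names matching the partial ref() argument,
// or nil when the cursor is not inside a ref() call.
func completeRef(linePrefix string, models []string) []CompletionItem {
	m := openRef.FindStringSubmatch(linePrefix)
	if m == nil {
		return nil
	}
	var items []CompletionItem
	for _, name := range models {
		if strings.HasPrefix(name, m[1]) {
			items = append(items, CompletionItem{Label: name, Kind: kindReference, Detail: "model"})
		}
	}
	sort.Slice(items, func(i, j int) bool { return items[i].Label < items[j].Label })
	return items
}

func main() {
	models := []string{"stg_orders", "customers", "stg_customers"}
	for _, item := range completeRef("select * from {{ ref('stg_", models) {
		fmt.Println(item.Label)
	}
}
```

Sources, macros, and variables would each need their own context check (for example, an open `source(` or `var(` call) feeding the same kind of item list.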

Diagnostics

This is currently the one place where dbt Fusion is required. Since dbt Fusion can compile and run static analysis on models, the language server can provide diagnostics for errors and warnings. This integration will continue to be explored as Fusion becomes more complete.
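On the protocol side, whatever the compiler reports has to be turned into LSP `Diagnostic` objects before an editor can show it. The sketch below shows that mapping under loud assumptions: the `Finding` type is hypothetical and does not reflect dbt Fusion's real output format, the 1-based line and column convention is a guess, and the `"dbt-fusion"` source label is made up. Only the `Diagnostic`, `Range`, and `Position` shapes and the severity numbers (1 = Error, 2 = Warning) come from the LSP specification.

```go
package main

import "fmt"

// Position, Range, and Diagnostic mirror the LSP types of the same names.
type Position struct {
	Line      int `json:"line"`
	Character int `json:"character"`
}

type Range struct {
	Start Position `json:"start"`
	End   Position `json:"end"`
}

type Diagnostic struct {
	Range    Range  `json:"range"`
	Severity int    `json:"severity"`
	Source   string `json:"source"`
	Message  string `json:"message"`
}

// Finding is a hypothetical, already-parsed compiler message.
// Assumption: Line and Column are 1-based.
type Finding struct {
	Severity string // "error" or "warning"
	Line     int
	Column   int
	Message  string
}

// zeroBased converts a 1-based index to 0-based, clamping at zero.
func zeroBased(n int) int {
	if n < 1 {
		return 0
	}
	return n - 1
}

// toDiagnostic maps a Finding to a zero-width LSP Diagnostic.
func toDiagnostic(f Finding) Diagnostic {
	severity := 2 // DiagnosticSeverity.Warning
	if f.Severity == "error" {
		severity = 1 // DiagnosticSeverity.Error
	}
	pos := Position{Line: zeroBased(f.Line), Character: zeroBased(f.Column)}
	return Diagnostic{
		Range:    Range{Start: pos, End: pos},
		Severity: severity,
		Source:   "dbt-fusion", // hypothetical label
		Message:  f.Message,
	}
}

func main() {
	d := toDiagnostic(Finding{Severity: "error", Line: 3, Column: 8, Message: "unknown column"})
	fmt.Printf("%+v\n", d)
}
```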

What got explored, but left out

As with any project, there have been a number of things that I wanted to add but have opted to leave out for now. A few examples:

Linting support via sqlfluff
This actually got built out pretty significantly, but it had a lot of limitations and I didn't feel it worked well enough to outweigh its shortcomings. The major blocker was that the sqlfluff dbt templater requires dbt Core to compile the models in order to lint, so it ends up being very slow, too slow for what I consider a good developer experience. The jinja templater is faster, but it misses certain items (which is precisely why the dbt templater exists). Because of those inaccuracies and the differences from the dbt templater's results, the feature was scrapped. This will be re-explored as dbt Fusion becomes more complete.

Integration with a more complete SQL parser like sqlglot
This was also explored significantly, but it ran into similar issues to sqlfluff, primarily the requirement that dbt Core compile the models. For now, the features of sqlglot are not exactly what I am looking for. Again, this might get re-explored as dbt Fusion becomes more complete.

More code completion suggestions
With the proliferation of inline AI code assistants, I think it's less important for the language server to generate completions that don't need dbt-specific context. I chose to focus on areas where the developer would want more context and insight. Join conditions, alias patterns, and starter code snippets are all things that AI assistants are good at figuring out and completing. So these kinds of completion features are unlikely to be prioritized within this project, but if you have a compelling reason I would be happy to discuss.

The Future

I have a few things I would like to add in the future, but I am also very interested in hearing what other people might want. This has been built to support my own workflow, so the features have come from a specific point of view.

Some things that I have in mind:

Add support for more dialects
Use dbt Fusion compilation for linting results in diagnostics
Increase parser functionality for intra-model navigation

If you have made it this far and want to check it out you can find it on GitHub. Any feedback is appreciated.