What I Learned Creating Junction

Jim Hughes, April 9th 2020 - 10 min read

Junction is a Python package that allows developers to publish markdown files tracked in a Git repository to Confluence. With Junction, you can use existing Git workflows for managing code (pull requests, code review, release branches, etc.) to also manage how documentation gets released to your wiki.

It's available on PyPI as a pure-Python package:

pip install confluence-junction

Keep your documentation by your code

The idea for Junction came about in 2017 when I worked at Two Sigma. As one of the firm's lead architects, I advocated for the adoption of lightweight architecture decision records.

Lightweight architecture decision records capture the reasoning and context behind changes to your software's design. They work best when kept near the code they pertain to because developers will find them in the course of working, and they can be created as part of the PR introducing a change—all ensuring the pertinent context is available with minimal searching. You can see an example of this in use by the UK government.

The problem was that over the course of years, Confluence had become the center of gravity for internal documentation and was the first place people went to find designs, best practices, and reference material. Moreover, not all documentation fits naturally into the codebase as markdown files, so many teams wished to have a single view that combined in-repo docs with richer wiki pages.

Since then I've seen this same problem play out again and again, everywhere I see development teams using Confluence. With some free time on my hands, and an itch to work on a side project, Junction was born.

As a side project, I wanted not only to solve a problem (how do I make in-repo docs available on a wiki?) but also to broaden my horizons and try something new. I didn't have unlimited time and wanted to reach a reasonable level of completion in only a few weeks...hence an entirely new language or toolkit was out of bounds.

Therefore I chose to use my current-strongest language, Python, but to try and make use of newer features and emerging standards:

Python 3.8
Typed Python (with static type checking)
Poetry and pyproject.toml
GitHub Actions
Managing the entire project from GitHub (issues, projects, etc.)

It turned out to be quite informative! Here are my top 5 takeaways that may help you evaluate and get started with any of the above:

1. Don't go overboard with Python typing

The Junction codebase uses statically typed Python everywhere, no exceptions. In fact, the requirement to type-annotate every single function signature is enforced by mypy as part of a pre-commit hook and build rule. Here is an example:

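from typing import Optional, Union

def markdown_to_storage(text: Optional[Union[str, bytes]]) -> str:
    # Accepts None, str, or bytes; returns Confluence storage representation
    ...  # body elided, only the signature matters here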

This function converts markdown text to a string containing Confluence storage representation. The markdown text can come in as None, a string, or bytes. The function will handle the different argument types without complaint. Besides self-documenting how to safely use this function, it allows for static type checking.

I have lusted for static typing in Python for a long time. Witnessing brutally simple bugs slip into production because of developer error is painful. Don't get me wrong: Python is my primary language these days for a reason. I don't necessarily believe statically-typed languages are inherently safer than dynamically-typed ones; in fact, compilers often lend developers a false sense of confidence.

That doesn't mean type-checking is a waste of time! After I went back and added in all of the type annotations, mypy managed to find ~4 different bugs in my code! Nice!

In the end, though, I do not recommend typed Python for every project under the sun.

Typing support didn't really reach maturity until Python 3.8, which is still inconsistently supported by other community projects and is less likely to be available in your runtime environment than, say, Python 3.7.
mypy will happily report 0 errors when your code is not fully typed, leading to overconfidence. You need to explicitly configure mypy to disallow this. Moreover, it's very easy to accidentally let an Any or two into your typing and inadvertently cause an entire critical path to go essentially unchecked, with no one the wiser. This isn't a mypy flaw so much as a sign that mixing duck typing and static typing is inherently tricky.
Trying to actually make use of Python's fast-and-fluid duck-typing and "trust the developer" mentality is almost impossible. In fact, those aspects of the language actively fight you when statically typing and can make you very unproductive as a result.
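To make the Any point concrete, here is a small hypothetical example (load_settings and next_port are made up for illustration). json.loads is annotated as returning Any, and that Any flows through next_port unchecked: under mypy's default settings this reports no errors, yet a string-valued port fails at runtime. Assuming a reasonably recent mypy, --warn-return-any would flag the return in next_port, --disallow-any-expr goes further and reports any expression whose type is Any, and --disallow-untyped-defs addresses the partially-typed-code problem.

```python
import json
from typing import Any


def load_settings(raw: str) -> Any:
    # json.loads is annotated as returning Any, so nothing downstream is checked
    return json.loads(raw)


def next_port(raw: str) -> int:
    settings = load_settings(raw)
    # settings["port"] is Any, so `Any + 1` is Any, and returning Any from a
    # function declared `-> int` passes under mypy's default settings
    return settings["port"] + 1
```

A config like {"port": "8080"} sails through type checking and only blows up with a TypeError when next_port actually runs.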

In effect, using Python with totally enforced static typing makes the dynamic aspect of the language a burden, while loading you up with the overconfidence of static type checking to boot.

My advice? The choice is not between dynamically typed or statically typed Python. You should view static typing as a safety tool: it makes code safer to change by pointing out unexpected side-effects early, and documents your code in a way that is very accessible to other contributors. Does your project actually have a requirement for that level of safety? If yes, then typing is for you...and even then, maybe only the parts of your project that are relatively stable and mature.

2. New Python features have a long bake time

I normally use a fairly standard Python project layout which makes use of setup.py, requirements.txt, src/, test/, pip, and virtualenv. Before jumping in with my usual boilerplate, I decided to take a look around because I knew some exciting things were on the horizon back in 2017 when I last assessed the environment management and packaging landscape.

Pipenv and Pipfiles were high on my list for solving the pain of using a mosaic of different tools and configs across dev, test, and release. However, Pipenv has two gaps: (1) you still need to maintain setup.py if you intend to release, and (2) you can't use pre-release packages without permitting pre-release packages for everything.

No matter; what other options exist for leaving behind setup.py and friends? PEP 518 and pyproject.toml. Nothing makes this new standard more appealing and user-friendly than Poetry. From getting set up with the right version of Python, to activating my environment and resolving dependencies...it was painless. With Poetry in place, I could lean into pyproject.toml and avoid littering the root of my project with one-off ini files!

Despite years of maturation and community adoption, not all projects have picked up support for this new configuration option: Flake8 still uses .flake8, mypy requires mypy.ini, and even Tox only has partial support.

These little compatibility land mines pop up in other arenas as well. Python 3.8 adds support for assignment expressions (:=, a.k.a. PEP 572), but both pyflakes and pycodestyle lack support and will report false positives...unless you're using the latest pre-release builds.
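For anyone who hasn't used assignment expressions yet, here is the kind of 3.8-only code that trips up those linters. It's a hypothetical helper (first_heading is made up for illustration) that binds a regex match and tests it in the same expression; linter releases without PEP 572 support may flag the := line even though it is valid Python 3.8.

```python
import re
from typing import Iterable, Optional

HEADING = re.compile(r"#+\s+(.*)")


def first_heading(lines: Iterable[str]) -> Optional[str]:
    """Return the text of the first markdown ATX heading, if any."""
    for line in lines:
        # PEP 572: bind the match object and test it in a single expression
        if (match := HEADING.match(line)) is not None:
            return match.group(1)
    return None
```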

All annoyances, no major roadblocks, but the experience has left me doubtful. When you choose to develop in Python, you must acknowledge how critical these community-supported tools are to providing a fully fleshed-out SDLC. Python 3.8 is 6 months old at this point, and pyproject.toml has been around for ~3 years.

This paints a pretty bleak picture about how long advancements in Python take before they can be adopted and embraced by developers at large. So what's to be done? Well, if you can, start contributing to these tools yourself! PyCQA and PyPA may sound super-official, but they aren't paid endeavors. It's still just people like you and me largely donating their time to creating amazing tools that make our favorite programming language more usable. You can make a difference by lending a hand. In fact, the next release of pyflakes and pycodestyle with full 3.8 support appears to be orchestrated by a very helpful community member.

3. OpenAPI is still painful

Confluence Cloud publishes an OpenAPI (formerly known as Swagger) specification for their API. Just finding that specification was an exercise in frustration, because it's hidden behind a link in a "..." menu in the corner of their website.

Once I found the actual spec, I thought my troubles were over and I could get a fully working API client with the help of OpenAPI Generator. I mean, that's the entire value proposition of OpenAPI, isn't it? And surely things have gotten better since I last tried this with Swagger (OpenAPI 2.0)?

lol, no.

The Confluence Cloud API spec has bugs in it. It's massive, clocking in at 13,000 lines of JSON. It's clearly generated from the actual API stubs in their server code, and whatever is generating this is broken.
After you manually fix the bugs, OpenAPI Generator doesn't produce valid Python code. It doesn't put down the proper imports for handling the object hierarchy found inside Confluence's API.
Apparently not everyone agrees with the direction of OpenAPI Generator, so the original Swagger Codegen is still maintained and under active development; however, the code it generates is even more broken. Open source fragmentation is cool.

In the end I made my own lightweight API client. It took forever and was mostly mechanical...sounds like a job for codegen. OpenAPI still has a ton of potential, but the codegen feels like the same toy/tech demo it was back in 2016 when I last tried it. Without reliable codegen, I honestly see very little value in publishing OpenAPI specs for my own APIs...it's just setting my users up for frustration.
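For a sense of what "mostly mechanical" means, here is a minimal sketch of a hand-rolled client for a single endpoint. It is not Junction's client; it assumes Confluence Cloud's v1 REST content endpoint (POST /wiki/rest/api/content) and basic auth with an account email plus API token, and the names create_page_payload and ConfluenceClient are hypothetical. It only builds the request, so it runs without network access.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, Optional


def create_page_payload(
    space_key: str, title: str, storage_xhtml: str, parent_id: Optional[str] = None
) -> Dict[str, Any]:
    """JSON body for creating a page (assumes the v1 content endpoint's shape)."""
    payload: Dict[str, Any] = {
        "type": "page",
        "title": title,
        "space": {"key": space_key},
        "body": {"storage": {"value": storage_xhtml, "representation": "storage"}},
    }
    if parent_id is not None:
        # Nest the new page under an existing page
        payload["ancestors"] = [{"id": parent_id}]
    return payload


class ConfluenceClient:
    """Hypothetical, tiny client: builds requests but never sends them itself."""

    def __init__(self, base_url: str, email: str, api_token: str) -> None:
        self.base_url = base_url.rstrip("/")
        token = base64.b64encode(f"{email}:{api_token}".encode()).decode()
        self.headers: Dict[str, str] = {
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        }

    def create_page_request(self, payload: Dict[str, Any]) -> urllib.request.Request:
        return urllib.request.Request(
            f"{self.base_url}/wiki/rest/api/content",
            data=json.dumps(payload).encode("utf-8"),
            headers=self.headers,
            method="POST",
        )
```

Sending it would be a matter of urllib.request.urlopen(request). Multiply this by every endpoint and model object you need, and it's easy to see why working codegen is so tempting.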

4. GitHub Actions are great; YAML is not

Actions are the new(ish) CI/CD offering from GitHub, built into your repository. The reception of Actions has been universally positive and enthusiastic in my extended social circle, so I was excited to try it out. It definitely lives up to the hype. Here are my favorite parts:

There is a large (and growing) library of community-contributed actions which make building out your workflow almost effortless.
It's free for public repositories, and you get up to 2000 minutes/month for private repositories...which is way more than enough for hobby projects.
Sooo fast, with only ~10s per job spent on setup/teardown.
Beautiful documentation, seriously a gold standard that leaves nothing to the imagination. No need to spin up an experiment just to see how something works.

In spite of my new-found love, Actions is hampered by writing workflows in YAML, software engineering's most ubiquitous ~~accomplishment~~ mistake. GitHub provides a competent web UI for editing workflows, but in my opinion there is an upper bound on how usable any YAML-driven tool can be. For any sufficiently expressive configuration system (e.g. CI pipelines), the YAML devolves into a domain-specific language. This sort of "meta programming" is a poor replacement for a full-fledged imperative programming language that has mature IDE support and can be run locally. Even when you keep your workflows lightweight and defer the heavy lifting to in-repo scripts, you'll still be keeping the workflow YAML specification close at hand to muddle through ever so slowly.

None of this is really the fault of GitHub Actions, and maybe it's just an expression of some pent-up frustration with Kubernetes, but the nail in the coffin is that there is no way to run workflows locally (or directly from the web editor) without pushing a commit. A way to test workflows quickly would tighten the feedback loop and ease the pain associated with idiosyncratic YAML DSLs.
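The "defer the heavy lifting to in-repo scripts" approach can be sketched as a small Python entry point that both CI and developers call. It doesn't let you run the workflow itself locally, but it does mean the steps the workflow runs can be exercised before you push. The script name ci.py, the step list, and the src/tests paths are all assumptions for illustration.

```python
"""ci.py: run locally the same checks a CI workflow would run (hypothetical)."""
import subprocess
import sys
from typing import List, Sequence

# Assumed project layout and tools; adjust to your own repository
STEPS: List[List[str]] = [
    [sys.executable, "-m", "mypy", "src"],
    [sys.executable, "-m", "flake8", "src", "tests"],
    [sys.executable, "-m", "pytest"],
]


def run_steps(steps: Sequence[Sequence[str]]) -> int:
    """Run each command in order; stop at the first failure and return its exit code."""
    for cmd in steps:
        print("$ " + " ".join(cmd), flush=True)
        code = subprocess.run(list(cmd)).returncode
        if code != 0:
            return code
    return 0
```

Wire it up with the usual if __name__ == "__main__": sys.exit(run_steps(STEPS)) guard, and the workflow YAML shrinks to a checkout step, a Python setup step, and a single run: python ci.py.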

Matters of taste aside, I still recommend Actions to everyone, even if that means migrating your project from another source control platform.

5. Click is my new go-to tool for CLIs

I make a lot of CLIs. They are a quick way to provide functionality to a developer, as well as a cheap "API" when something more serious is overkill (see: CI/CD workflows). The usability of a CLI is largely a function of its predictable (and forgiving) parsing of arguments/flags, and toward that end pretty much every language has a go-to library for argument parsing:

Java has Apache Commons CLI, which I despise.
C# has PowerShell cmdlets, which I enjoy.
Python has argparse, which I find toilsome.
Bash has getopt(1), which I use for simple cases.

The CLI for Junction is fairly tricky to use, so I wanted strict parsing and detailed help text; doing that with argparse alone would have been a large effort. After a short search, I stumbled upon Click.

I will never look back.

Click made producing Junction's CLI take less than 2 hours...and that includes figuring out how to use the library. My CLI has gloriously detailed help text, as well as clean output with ANSI coloring on supported terminals! The whole thing fits in less than 200 SLOC. Seriously, I have never felt "good" about any CLI I have ever released because I know they are full of bugs and edge cases, but are juuust good enough to pass. With Click I am confident this CLI is going to work, and if I want to unit test the hell out of it, I can with ease.
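To give a flavor of why, here is a minimal Click sketch. The group, command, and option names are hypothetical and are not Junction's actual CLI; it shows a required group-level option, a boolean flag, variadic arguments, help text pulled from docstrings and help= strings, and colored output via click.secho.

```python
from typing import Tuple

import click


@click.group()
@click.option("--space", "-s", required=True, help="Key of the Confluence space to publish into.")
@click.pass_context
def cli(ctx: click.Context, space: str) -> None:
    """Publish markdown files to Confluence (illustrative sketch)."""
    # Stash group-level options so subcommands can read them
    ctx.ensure_object(dict)
    ctx.obj["space"] = space


@cli.command()
@click.option("--dry-run", is_flag=True, help="Show what would be published without publishing.")
@click.argument("paths", nargs=-1, type=click.Path())
@click.pass_context
def publish(ctx: click.Context, dry_run: bool, paths: Tuple[str, ...]) -> None:
    """Publish the markdown files at PATHS."""
    for path in paths:
        verb = "Would publish" if dry_run else "Publishing"
        # secho = echo + style; ANSI colors are dropped when output isn't a terminal
        click.secho(f"{verb} {path} to space {ctx.obj['space']}", fg="green")
```

For the unit-testing side, Click ships click.testing.CliRunner: CliRunner().invoke(cli, ["--space", "DOCS", "publish", "--dry-run", "a.md"]) runs the command in-process and returns a result with exit_code and output attributes, with ANSI styling stripped from the captured output by default.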

High-quality side projects are hard

The hardest part about this entire experience wasn't grappling with these unfamiliar elements... it was the basics of maintaining good developer hygiene: writing good issue descriptions, leaving useful commit messages, documenting architecture, and unit testing. In fact I probably failed at 3/4 of the above because the only thing I really did was document the architecture (and it felt like pulling teeth despite being one of my favorite activities at my day job).

What was missing? Well, obviously not everyone finds the above intrinsically enjoyable. More than that, though, what I felt was missing was a sense of community. What motivates me to do these hygiene tasks well, and to do them regularly, is the obligation I feel to my peers and the sense of pride I take as a craftsman.

I can draw a few fundamental takeaways from this whole experience, then:

Don't sweat the small stuff, just do what you can. If not writing enough unit tests had stopped me from starting (or finishing) this project, I would have missed out on a wealth of experience.
When starting a new project, prioritize building a sense of team. Shared commitments and interpersonal morality are strong motivators for all the things that keep development projects healthy: documentation, testing, and communication.
As a hiring manager, don't hold a lack of side projects against someone. If I had a child, a spouse, or other commitments, there is no way in hell this project would have happened...it was a slog. GitHub and Stack Overflow profiles can be useful, but at most they can be used to abridge the interview process...never to filter someone out.
