Introducing the Broken k8s Project
Posted on Tue 22 October 2019 in k8s • 3 min read
One of the most common things I see with people new to development or devops is a lack of resources for learning good troubleshooting tactics on their own in a near-real-life environment. There are great tutorials, but outside of a corporate setting I haven't been able to find a good, free sandbox filled with deliberately broken systems that need fixing. Sure, you learn tactics on the job, or on your own machine when you accidentally bork it, but those scenarios can be very frustrating for a beginner who doesn't know where to start looking when a system is broken for multiple reasons at once. Most of the time, you need a more experienced mentor to help debug, and those mentors can be hard to find for some folks in the community due to factors such as geography, time, or biases against them. In addition, containerized systems have the added drawback of being ephemeral, which makes it harder for newcomers to figure out how to begin when they don't even know how to access the broken system in the first place.
As a result, I thought it would be great to take a number of systems, deliberately break them in the specific ways I often see in real broken systems, and then provide them to the community as a set of stepping stones for debugging more and more complex Kubernetes systems. I started with a fairly bare-bones minikube build to serve as a control environment. This environment is mainly there to ensure that the user's system works and is configured properly. Then I started tweaking things here and there to break the system in fairly straightforward ways, giving people a chance to get some easy wins. These easier systems can also teach a new user to hunt for problems in specific places, cutting through the noise in the logs to surface common errors and their symptoms so they will be recognizable in the next set of systems. Each system lives in its own namespace on minikube to avoid accidental cross-contamination. Next, I aim to build out more complex systems, break them in multiple ways to get newcomers used to the idea that solutions are rarely perfect, and introduce some variability into the mix as well. Hopefully, building a set of troubleshooting sandboxes in this pyramid-like fashion will teach good practices and hone troubleshooting skills before those skills are truly needed.
I'm mainly posting about this idea to ask for help, even if it's just sharing the tweet I'm sending out with this post or sharing the repo with people you know. I've started building some systems, but I know I haven't seen everything out there, and I know there are people with a lot more experience than me. The GitHub repo I started, https://github.com/nimbinatus/broken-k8s, is public, open source under the MIT license, and open to contributions. It includes an initial contributor's guide describing what should go in each broken system's directory.
Join me in making the devops world a bit easier to break into. I'd love your help.