As with a lot of things in life, you often start out with a clear end goal in mind. Think of any idea you’ve had: paint the fence, put up a curtain. More often than not it’s an apparently simple idea; we don’t usually embark on moonshot projects. Inevitably, even the simplest projects are more complicated than they seem. Surprisingly, keeping the simple original goal in mind and staying on target can be the hardest things to do.
My simple goal was to improve our code quality by getting a basic pipeline to run ansible-lint and ansible syntax checks on merge requests for our gitlab installation. It’s a good first step: it’s self-contained and doesn’t require any interaction with test systems or the like.
Or so I thought.
First off, I had to figure out a few things. The first was getting gitlab-runner installed and registered as a runner with our gitlab. I chose the docker executor rather than shell or ssh, as I wanted to keep it all on the one server (we’re a relatively small setup) for simplicity. Next, I had to get docker installed, which was straightforward enough (the online docs are great for gitlab and docker). Then, I needed a base image to use. I didn’t want to use official (and potentially untrusted) docker.io images, so the answer was to set up a tiny registry on the local machine. Finally, I worked out how to build a debian image using debootstrap. This is great, I’m getting places!
I’m still pretty close to the original goal – if anyone asked at this point what I was doing, it would still sound like I’m on target, right? I have a gitlab-runner registered, I have a docker image to use. I’ve created my .gitlab-ci.yml and set up the run tasks, stages and settings for the docker image. Great!
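For illustration, the two checks such a pipeline job runs can be sketched as a small Python wrapper. This is not the actual .gitlab-ci.yml from this setup; the function names and the playbook path are hypothetical, and it assumes ansible-lint and ansible-playbook are on the PATH inside the image.

```python
import subprocess


def build_commands(playbooks):
    """Return the lint and syntax-check command lines for each playbook."""
    commands = []
    for playbook in playbooks:
        commands.append(["ansible-lint", playbook])
        commands.append(["ansible-playbook", "--syntax-check", playbook])
    return commands


def run_checks(playbooks, runner=subprocess.call):
    """Run every check and return how many commands exited non-zero.

    `runner` is injectable so the logic can be exercised without
    ansible installed; by default it really executes the commands.
    """
    failures = 0
    for command in build_commands(playbooks):
        if runner(command) != 0:
            failures += 1
    return failures
```

A job script could call run_checks(["site.yml"]) (site.yml being a placeholder name) and fail the pipeline whenever the result is greater than zero.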
Or not. It runs, but then it throws me a weird error.
Couldn't parse task at /foo/bar/task.yml:3 (conflicting action statements: yum, __file__ The error appears to be in '<unicode string>': line 3, column 3, but may be elsewhere in the file depending on the exact syntax problem. (could not open file to display line))
That really threw me, because I couldn’t reproduce it on the laptop, and this playbook was working perfectly well in daily operations. Was it some strange encoding problem? I tried changing the playbook around, and eventually I ran the docker image directly and copied over a minimal set of two playbooks to re-create the issue. Again, it only happened in the docker image. LANG was unset there, which seemed to fit the encoding theory. So I ran python -v and compared the output with what came out on the laptop. The last lines differed: ascii.py in the container, utf_8.py on the laptop.
/usr/lib/python2.7/encodings/ascii.pyc matches /usr/lib/python2.7/encodings/ascii.py
import encodings.ascii # precompiled from /usr/lib/python2.7/encodings/ascii.pyc
vs
/usr/lib/python2.7/encodings/utf_8.pyc matches /usr/lib/python2.7/encodings/utf_8.py
import encodings.utf_8 # precompiled from /usr/lib/python2.7/encodings/utf_8.pyc
But setting LANG made no difference, it seems. I even uninstalled and reinstalled python in the container.
Exact,
same
error.
Now, we’re off down the rabbit hole. How does python decide the encoding?
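A quick way to see what Python has actually settled on, in the container and on the laptop, is to print the relevant settings side by side. This is a minimal sketch using only standard-library calls; the function name encoding_report is made up for illustration, and it should run on both python 2.7 and python 3.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the settings that influence how Python encodes text."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "default_encoding": sys.getdefaultencoding(),
        "filesystem_encoding": sys.getfilesystemencoding(),
        "preferred_encoding": locale.getpreferredencoding(False),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


# Print one setting per line so two environments can be diffed easily.
for name, value in sorted(encoding_report().items()):
    print("%-20s %s" % (name, value))
```

Running it in both environments and diffing the output shows at a glance whether the locale really differs, which is less noisy than reading through python -v.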
Some googling led me to look at the sitecustomize.py script, and, lo, in site.py in the same folder, I saw:
def setencoding():
    """Set the string encoding used by the Unicode implementation. The
    default is 'ascii', but if you're willing to experiment, you can
    change this."""
    encoding = "ascii" # Default value set by _PyUnicode_Init()
    if 0:
        # Enable to support locale aware default string encodings.
        import locale
        loc = locale.getdefaultlocale()
        if loc[1]:
            encoding = loc[1]
    if 0:
        # Enable to switch off string to Unicode coercion and implicit
        # Unicode to string conversion.
        encoding = "undefined"
    if encoding != "ascii":
        # On Non-Unicode builds this will raise an AttributeError…
        sys.setdefaultencoding(encoding) # Needs Python Unicode build !
Now, this looks promising. So after thinking it was a lint problem, I’m now seeing it as a python problem. Apologies for asking questions about lint in every IRC channel I could find 🙂
But none of that worked; it made no difference. At least I was able to quickly reproduce the error by running the docker image with a bash shell entrypoint.
I’m now definitely shaving a yak, and what I’m doing is incomprehensible to anyone. Do we even remember our original goal?
When you’re down in the weeds, it’s sometimes best to just leave it, go back and look at it fresh the next day.
The next day brought some fresh googling, and I found this unanswered stackoverflow question: https://stackoverflow.com/questions/64054313/ansible-lint-do-not-understand-confusing-linting-error
There’s a comment suggesting it might be a bug in ansible-lint 4.2.0 with ansible 2.10. And what versions am I running? Those exact ones, in my docker image.
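Checking the installed versions can be scripted rather than eyeballed. The sketch below assumes python 3.8 or later (for importlib.metadata) and that the tools were installed as Python packages under the names ansible-lint and ansible; the helper names are hypothetical.

```python
import re
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version):
    """Return up to three leading numeric parts, ignoring suffixes like 'a'."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


for package in ("ansible-lint", "ansible"):
    print(package, installed_version(package))
```

With release_tuple, a pipeline could refuse to run when release_tuple(installed_version("ansible-lint")) equals (4, 2, 0), the version named in that comment.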
So am I having an obscure error and the solution could be to upgrade lint? When you’ve been lost on a problem for long enough even the possibility of a solution is exciting.
So I try, in my still-running docker container, but I can’t: the latest version available, even on python3, is 4.3.0a, which doesn’t sound stable. Maybe upgrading the docker image from the stretch to the buster release of Debian will help. So off we go with a trial upgrade, while I also run debootstrap to create a fresh base image for docker.
Some time later, and boom! Confirmed: it is a bug in ansible-lint 4.2.0, or so says one suggestion on that question. Upgrading to 4.3.6 fixed the problem.
Finally, I can get back to my original goal – lint and syntax pipelines for gitlab.
Yay!