Debugging adventures: ncmpcpp and taglib

Wynn Wolf Arbor

I have a pretty extensive music library which I manage with mpd(1) and ncmpcpp(1). Whilst I would love to find a standalone tag editor that is easily usable from the command line (or alternatively a nicely designed GUI one), ncmpcpp’s built-in tag editor has always been enough for my needs.

About two weeks ago I was in the process of adding a new album to my collection. This always includes a quick glance at the tags and some cleanup. This time, however, after I saved my changes, the following happened:

ncmpcpp displaying duplicate entries

Every time I saved my changes, another duplicate entry would be added to each tag. I had not updated ncmpcpp(1) in a while and was initially confused as to why it failed suddenly. I left it, removed all tags using metaflac(1), and tagged the album manually. At the time I could not really be bothered to fix it, but I still wanted to investigate it at a later point, so I added an entry to my to-do list.

Diving in

Today I decided to look into this finally. First stop: the ncmpcpp(1) source code. I already knew that it used taglib to access and modify Vorbis comments, so I quickly found the part of the code that was responsible for saving changes:

void writeXiphComments(const MPD::MutableSong &s, TagLib::Ogg::XiphComment *tag)
{
    auto writeXiph = [&](const TagLib::String &type, const TagLib::StringList &list) {
        tag->removeField(type);
        for (auto it = list.begin(); it != list.end(); ++it)
            tag->addField(type, *it, false);
    };
    // remove field previously used as album artist
    tag->removeField("ALBUM ARTIST");
    // remove field TRACK, some taggers use it as TRACKNUMBER
    tag->removeField("TRACK");
    // remove field DISC, some taggers use it as DISCNUMBER
    tag->removeField("DISC");
    // remove field DESCRIPTION, it's displayed as COMMENT
    tag->removeField("DESCRIPTION");
    writeXiph("TITLE", tagList(s, &MPD::Song::getTitle));
    // [...]
}

Here, ncmpcpp(1) calls removeField(type) before adding a tag, which looked fine. Maybe the list contains more than one entry? Stepping through this code in gdb(1) revealed that addField was only called once per tag, meaning it was only added once. At this point I remembered that I was pretty sure ncmpcpp(1) had not been updated in a while, so I refocused on taglib instead: Maybe removeField stopped working for some reason.

Unorthodox packaging decisions

A quick query against the Gentoo package database revealed that I was using taglib-1.11.1_p20190920-r1 which looked very suspicious already. The latest stable version on the website is 1.11.1 from October 24, 2016.

nabokov ~$ eix taglib
[I] media-libs/taglib (1.11.1_p20190920-r1@11/01/20)

So what is going on here? taglib is still under active development, but there has not been a stable release for about four years. Gentoo instead decided to package the latest commit (at time of writing) as a “stable” version. For projects that have some kind of stability guarantee regarding their API, this is a very dangerous and strange choice on the part of the packagers; quite frankly, it is inviting random breakage.

Exploring taglib

Digging around in taglib’s github repo revealed that removeField had been marked deprecated internally in mid-2015 because of an apparent linking error when using String::null. Let’s have a look at the prototype of removeField from back then:

void removeField(const String &key, const String &value = String::null);

This method removes a tag entry of a specific type and optionally matches it against a given value. For example, removeField("TITLE") removes all TITLE tags, and removeField("TITLE", "Garbagemx36") only removes TITLE entries if they match “Garbagemx36”.

The issue mentions that changing the default parameter for value to an empty string would change the behaviour of the method. Suddenly, removeField("TITLE") would only remove TITLE entries if they matched the empty string - leaving everything else intact.

Sounds like we found our culprit. But didn’t the issue say that they would not make this change? What gives?

Enter GDB

At this point I started doubting the correctness of five year old issue comments, and decided to have a look at things myself. Thankfully, Gentoo makes it very easy to add debugging information to packages and have the source code installed alongside binaries and library files. A single addition to /etc/portage/package.env later and I was debugging removeField in gdb(1).

Stepping ahead to where the method checks whether value is null or not, I printed its contents:

(gdb) p value
$1 = (const TagLib::String &) @0x7fffffffc240: {
  _vptr.String = 0x7ffff5be83b8 <vtable for TagLib::String+16>, static null = {
    _vptr.String = 0x7ffff5be83b8 <vtable for TagLib::String+16>, 
    static null = <same as static member of an already seen type>, 
    static WCharByteOrder = TagLib::String::UTF16LE, d = 0x5555559a95c0}, 
  static WCharByteOrder = TagLib::String::UTF16LE, d = 0x555555b9f700}

Ah, like any good project, taglib is using a custom string type. Let’s convert it to a C string:

(gdb) print value.toCString(false)
$5 = 0x555555b9f740 ""

And there it was. taglib indeed was using an empty string (and not null) as the default for value in removeField.

Closing Blame

One question remained: Why was this changed after all? It seemed pretty clear back then that the developers did not want to break the API. Perhaps there was a different bug at work here?

As I had learnt earlier, the default value for a parameter is found in the header file, so I used git-blame(1) to find commits that introduced changes to the prototype. Surprisingly, what I found was a very recent change from September that added actual deprecation warnings… and quietly changed String::null to String(). I was not able to find a reason for this; to my knowledge, no commit or issue mentions this change.

So in the end, whilst wanting to keep up API promises, the behaviour of an already deprecated method was changed in what can only be called the “development branch” of the project, and it hit me because Gentoo decided that it was a good idea to use work-in-progress code as a stable candidate for a package.

Around the time I discovered this, someone else had already opened an issue on GitHub. I shared my findings and later opened a pull request. I am pondering bringing this to the package maintainer’s attention, but am not confident it will result in any positive change.