Parsing conText: irregularities in existing data

Just wanted to drop a note regarding the format and parsing of conText (preset metadata in data stream 1).

There isn’t a formal specification for the mini-language used in conText. On casual inspection it appears simple, but anomalies in the existing corpus of conText make it less simple than it looks.

At a casual glance there appear to be three levels of delimited segments. Whitespace (space or line break) delimits the sections (Macro, Category, and Author, in that order). Each section then separates on ‘=’, and each value separates on ‘_’.
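To make the naive reading concrete, here is a minimal sketch of that three-level split. The function name `naive_split` and the sample strings are my own assumptions for illustration, not taken from any real preset.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Naive reading of conText (hypothetical helper, for illustration only):
// level 1: whitespace separates sections,
// level 2: '=' separates the tag from its value,
// level 3: '_' separates the parts of the value.
std::map<std::string, std::vector<std::string>> naive_split(const std::string& text)
{
    std::map<std::string, std::vector<std::string>> sections;
    std::istringstream in(text);
    std::string token;
    while (in >> token) {                        // level 1: whitespace
        const auto eq = token.find('=');         // level 2: '='
        if (eq == std::string::npos) continue;   // a token without a tag is dropped
        std::vector<std::string> parts;
        std::istringstream value(token.substr(eq + 1));
        std::string piece;
        while (std::getline(value, piece, '_'))  // level 3: '_'
            parts.push_back(piece);
        sections[token.substr(0, eq)] = parts;
    }
    return sections;
}
```

Any token that carries no tag of its own is silently dropped by this approach, so a value that contains a space loses everything after the space.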

Unfortunately, the existing data doesn’t follow this naive first impression. There are exceptions that need to be considered:

  • a few presets that begin with the Author (the three v1-v3 MicroDelay WaveGuide v* by C.Duquense).

  • a few presets where the Author segment contains spaces. These all contain “R. Kram” with a space.

These anomalies force me to rewrite some of my conText parsing code around a different strategy for extracting the interesting data. Instead of splitting on simple delimiters, I’ll need to do pattern recognition to find each segment start: [<non-alpha>]<alpha>=. Then within a segment I can treat _ as a space. I haven’t written the code yet, but I think this will work.
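A rough sketch of that tag-recognition idea follows. It is not the code I ended up with. The name `split_by_tags` is hypothetical, and it assumes a tag is any run of letters or digits directly before an ‘=’, which is a little broader than the single-letter pattern above.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Segment = std::pair<std::string, std::string>;  // (tag, value)

// Hypothetical sketch: locate segments by their "<tag>=" label instead of
// splitting on whitespace, so values may contain spaces and sections may
// appear in any order.
std::vector<Segment> split_by_tags(const std::string& text)
{
    auto is_alnum = [](char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; };
    auto is_space = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };

    // Pass 1: for every '=', walk left over letters/digits to find the tag start.
    struct Tag { std::size_t begin, eq; };
    std::vector<Tag> tags;
    for (std::size_t eq = text.find('='); eq != std::string::npos; eq = text.find('=', eq + 1)) {
        std::size_t begin = eq;
        while (begin > 0 && is_alnum(text[begin - 1])) --begin;
        if (begin != eq) tags.push_back({begin, eq});  // skip an '=' that has no tag
    }

    // Pass 2: each value runs from just after '=' up to the next tag (or the end).
    std::vector<Segment> out;
    for (std::size_t i = 0; i < tags.size(); ++i) {
        const std::size_t start = tags[i].eq + 1;
        std::size_t stop = (i + 1 < tags.size()) ? tags[i + 1].begin : text.size();
        while (stop > start && is_space(text[stop - 1])) --stop;  // trim the separator
        std::string value = text.substr(start, stop - start);
        std::replace(value.begin(), value.end(), '_', ' ');       // '_' reads as a space
        out.emplace_back(text.substr(tags[i].begin, tags[i].eq - tags[i].begin), value);
    }
    return out;
}
```

With a made-up input such as `"C=Lead A=R. Kram"` this yields the pairs `C`/`Lead` and `A`/`R. Kram`, and it does not care whether the Author section comes first or last.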


I’ve found a way to parse that solves all the anomalies noted previously and uses a simple state machine. The trick is to scan backwards for the section tags.

I found another anomaly in the data that I can’t fix. Effect Modman, Effect Modman 1, Karplus Effect, Phaser and others fail to use A= to indicate the author. The author is effectively an untagged section that can only be found by whitespace separation, not by recognizing section labels. So I can’t win either way (splitting on whitespace or recognizing section labels).

Overall, I think I miss fewer pieces of information by recognizing section labels, so that’s what I’m going with. Sorry, the tooltips for Effect Modman* will not credit Christophe properly.

For a completely unique variation, check out the conText for RingMod Voice.


Yes, in my overlays I have the same issue: I was using the Author string as a terminator, so parsing breaks when it is missing. I was thinking about simply appending a terminator of my own to the end of the whole conText string and then parsing it.
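A minimal sketch of that sentinel idea, in plain C++ rather than JUCE. The name `macro_block`, the choice of `" A="` as the terminator, and the sample strings are all assumptions made for illustration.

```cpp
#include <string>

// Hypothetical sketch: append our own terminator so that the search for the
// end of the macro block always succeeds, even when the preset has no
// Author section.
std::string macro_block(std::string text)
{
    const std::string terminator = " A=";
    text += terminator;                            // sentinel: guarantees a match
    return text.substr(0, text.find(terminator));  // text before the first " A="
}
```

With `"i=Cutoff iv=Res A=Someone"` and with `"i=Cutoff iv=Res"` the result is the same, `"i=Cutoff iv=Res"`. It does not help with presets that begin with the Author or that append the author without the A= tag.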


So, I just updated my code that extracts the Author segment, and it now handles the cases where the Author is appended without the A= tag. I think I may be getting them all now. At least Christophe gets a little more credit, in spite of the problematic authoring :-). This does expose a couple of anomalies: a few presets now credit the Author “A” (someone forgot to type the rest of A=something; I blame “A.Nonymous”).

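#include <algorithm>
#include <cctype>
#include <string>

// make_result(begin, end) is a helper that is not shown in this post.
std::string parse_author(const std::string &text)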
{
    if (text.empty()) return text;

    const char * end{text.c_str() + text.size()};
    const char * scan{end - 1};
    const char * data{scan};
    const char * eq{nullptr};
    const char * id{nullptr};

    enum State { Data, Id };
    State state = Data;
    const char * lim = end - text.size();
    while (scan >= lim) {
        switch (state) {
        case Data:
            if ('=' == *scan) {
                eq = scan;
                state = Id;
            } else {
                data = scan;
            }
            break;
        case Id:
            if (std::isspace(static_cast<unsigned char>(*scan))) {
                if (!id) return "";
                if ((1 == (eq - id)) && 'A' == *id) {
                    return make_result(data, end);
                } else {
                    data = scan;
                    end = scan + 1;
                    state = Data;
                }
            } else {
                id = scan;
            }
            break;
        }
        --scan;
    }
    if (id && eq && (1 == (eq - id)) && 'A' == *id) {
        return make_result(data, end);
    }

    // Didn't find an Author section (A=). So,
    // handle observed pattern where the author is simply appended
    // with a whitespace separator and without the A= section tag
    auto pos = text.rfind('=');
    if (std::string::npos != pos) {
        auto scan = text.cbegin() + pos;
        while (scan != text.cend() && *scan != ' ') scan++;
        while (scan != text.cend() && *scan == ' ') scan++;
        if (scan != text.cend()) {
            std::string result(scan, text.cend());
            std::replace(result.begin(), result.end(), '_', ' ');
            return result;
        }
    }
    return "";
}
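The `make_result` helper is not shown above. A plausible stand-in, written here as an assumption about what it does rather than a copy of the real helper, trims the trailing separator and turns underscores into spaces, in the same way as the fallback path at the end of `parse_author`:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical stand-in for the make_result helper used by parse_author.
// It builds the author string from the half-open range [begin, end).
std::string make_result(const char* begin, const char* end)
{
    if (begin >= end) return "";
    // Drop trailing whitespace left over from the section separator.
    while (end > begin && std::isspace(static_cast<unsigned char>(end[-1]))) --end;
    std::string result(begin, end);
    // Within a segment, '_' stands in for a space.
    std::replace(result.begin(), result.end(), '_', ' ');
    return result;
}
```

For example, the range covering `"R._Kram "` comes back as `"R. Kram"`.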

It would be nice if a future update to the firmware and manuals would clean up the presets and provide a formal definition of the syntax.


Here is my solution in JUCE C++:
String mns = makroNamString;
String aktP = aktuellesPreset;
StringArray strA = { "1=", "2=", "3=", "4=", "5=", "6=" };
// Replace the Roman numbering with digits. The order matters: each replace
// also rewrites the tail of longer tags, e.g. "i=" inside "vi=" gives "v1=".
if (mns.contains("i=")) mns = mns.replace("i=", "1=", false);
if (mns.contains("i1=")) mns = mns.replace("i1=", "2=", false);
if (mns.contains("i2=")) mns = mns.replace("i2=", "3=", false);
if (mns.contains("iv=")) mns = mns.replace("iv=", "4=", false);
if (mns.contains("v=")) mns = mns.replace("v=", "5=", false);
if (mns.contains("v1=")) mns = mns.replace("v1=", "6=", false);
if (mns.contains("g1=")) mns = mns.replace("g1=", "5=", false);
if (mns.contains("g2=")) mns = mns.replace("g2=", "6=", false);
if (mns.contains("ovl=LorisCycle") || mns.contains("ovl=LorisChoral")) isOverlayLoris = 1;
else isOverlayLoris = 0;
// Fill one label per macro: take the text after "1=", "2=", ... and cut it
// at the next delimiter or section tag.
for (auto lab : makroLabelArr)
{
    lab->setText("", dontSendNotification);
    str = mns.fromFirstOccurrenceOf(strA[k], false, false);
    if (str != "")
    {
        str = str.upToFirstOccurrenceOf(" ", false, false);
        // The delimiter literal on the next line was lost when this was posted.
        if (str.contains("")) str = str.upToFirstOccurrenceOf("", false, false);
        if (str.contains("C=")) str = str.upToFirstOccurrenceOf("C=", false, false);
        if (str.contains("A=")) str = str.upToFirstOccurrenceOf("A=", false, false);
    }
    lab->setText(str, dontSendNotification);
    lab->repaint();
    k++;
}
