insert_separator

I recently realized that there is a problem that has kept cropping up regularly in my code for the last few months: transform a sequence of words in a comma separated list. Not a big issue in itself, but I find it interesting for two reasons: first, it is almost impossible to specify a solution that matches the regularity and simplicity of the expected outcome; second, it’s a good test case for checking how well different languages support writing a generic solution.

I have to confess that I wouldn’t have given it much thought if this wasn’t a kind of problem that really irks my sense of aesthetics; what you want is something as simple as:

string1, string2, string3

A very plain sequence, without any exception or special case. Yet when you try to code it you can’t avoid making a special case of either the first or the last string, as in this Python function:

def insert_separator1(l):
    ret = l[0]
    for s in l[1:]:
        ret += ", "
        ret += s
    return ret

Otherwise you have to insert a conditional inside the loop, which conceptually you should only need to compute once, as in this alternative Python implementation:

def insert_separator2(l):
    ret = ""
    first = True
    for s in l:
        if first:
            first = False
        else:
            ret += ", "
        ret += s
    return ret

By the way, I’m well aware that Python’s standard library provides a ready-made solution:

print ", ".join([ "string1", "string2", "string3" ])

While Python is very convenient to expose programming concepts, the code where this little problem keeps coming up is actually written in C#. This language also has a library solution, which is very similar to Python’s:

using System;

class InsertSeparator1
{
    static void Main(string[] args)
    {
        string[] l = { "string1", "string2",
                "string3" };
        System.Console.Out.WriteLine(
                string.Join( ", ", l));
    }
}

These library solutions are undoubtedly convenient, but they are limited to the scenario where you want to concatenate your list elements into a single string. Sometimes, however, you would rather output your strings to a file, or you might want to interleave other kinds of things, rather than just strings.

Let’s see then how different languages support a generic solution. Thanks to generators and duck typing Python makes it very easy:

import sys

def insert_separator4(elements, sep):
    first = True
    for e in elements:
        if first:
            first = False
        else:
            yield sep
        yield e

if __name__ == "__main__":
    for e in insert_separator4([ "string1",
            "string2", "string3" ], ", "):
        sys.stdout.write(e)
    sys.stdout.write( '\n' )

Slightly less so C#, which is constrained by static typing and the subordination of generics to inheritance as a means of generalization:

class InsertSeparator2
{
    static IEnumerable<ValueType>
            InsertSeparator<ValueType>(
                    IEnumerable<ValueType> elements,
                    ValueType sep)
    {
        bool first = true;
        foreach (ValueType e in elements)
        {
            if (first)
                first = false;
            else
                yield return sep;
            yield return e;
        }
    }

    static void Main(string[] args)
    {
        string[] l = { "string1", "string2",
                "string3" };
        foreach ( string e in
                InsertSeparator<string>(l, ", " ) )
        {
            System.Console.Out.Write(e);
        }
        System.Console.Out.WriteLine();
    }
}

Note that you need to implement the IEnumerable<T> interface in order to exploit this function.

C++ is also statically typed, but thanks to a form of compile time duck typing supported by templates it doesn’t require an abstract base class:

#include <iterator>
#include <iostream>
#include <string>

template <typename InputIter, typename OutputIter, 
        typename ValueType>
void insert_separator(InputIter first, 
        InputIter last, 
        OutputIter out, 
        ValueType sep)
{
    bool initial = true;
    while ( first != last )
    {
        if ( initial )
            initial = false;
        else
            *out++ = sep;
        *out++ = *first++;
    }
}

int main()
{
    std::string l[] = { "string1", "string2",
            "string3" };
    insert_separator(l, l + 3, 
            std::ostream_iterator<std::string>(
                    std::cout), 
            ", ");
}

At first sight the C++ solution isn’t much better than the C# one. However generalization is not just a matter of language constructs; it is also a matter of conceptual framework. With little effort from our part the function above integrates perfectly with the standard library’s iterators and containers, as well as with any user defined ones, provided they adhere to the standard library constraints.

Finally, there is a solution to this problem which satisfies my quest for elegance, but it’s currently late at night and this post is already long enough as it is 😉

Advertisements

4 Responses to “insert_separator”

  1. xpmatteo Says:

    Hi Nick,

    your first solution fails when the input array is empty!

    My next-to-favourite solution (in Ruby) would be

    def join(array, separator)
    result = “”
    for element in array
    result += separator unless result.empty?
    result += element
    end
    result
    end

    And this would be the algorithm I would use in old school languages like Java or C++. But if you have nice “higher order” methods in your language, you can shorten it to something as cool as:

    def join(array, separator)
    array.inject {|a,b| a + separator + b} || “”
    end

  2. nmusatti Says:

    Hi Matt,
    You’re right of course. I was more concerned with making my point than with writing production quality code.
    Your first solution is more compact than mines whenever you actually do have an underlying accumulator, otherwise you need different means for distinguishing the first iteration.
    I have to say that I’m not sure I understand the details of your second solution. Would you care to explain?

  3. xpmatteo Says:

    Ah yes, the inject trick. Inject takes a block of code and gives it two parameters; the first is the accumulator and the second is each element of the array in turn. Functional programming people call it “fold”.

    [1,2,3].inject {|sum, element| sum + element } gives 6.

    When the array is empty, inject returns nil. So I add the

    … || ”

    to cover that case. If inject returns nil, we return empty string.

    So there are two “ifs” in the second solution, the first is in the “||” which treats the special case of empty array, and the second is hidden in “inject” which will just return the first element if the array is a singleton.

    I think your C# solution would be prettier if you used

    if (e == elements.first)

    in place of the ugly boolean flag. You need a test for the first case, anyways. One way is to test if the accumulator is empty, the other is to test if the current array element is the first.

    Here’s another solution that is more similar to yours. The intertwine function captures the behaviour of joining and might be useful in other places.

    require “test/unit”

    def intertwine(array, separator)
    for element in array
    yield separator unless element == array.first
    yield element
    end
    end

    def join(array, separator)
    result = “”
    intertwine(array, separator) do |element|
    result += element
    end
    result
    end

    class TestLibraryFileName < Test::Unit::TestCase
    def test_case_name
    assert_equal "", join([], ", ")
    assert_equal "x", join(["x"], ", ")
    assert_equal "x, y", join(["x", "y"], ", ")
    end
    end

  4. nmusatti Says:

    Thank you for your explanation.

    You’re right again about my C# solution; the boolean variable is indeed ugly, but on the other hand it makes it possible to represent the same idea in the same way in all the examples.

    There are situations where there isn’t any superior alternative: for instance C++ iterators do not have a notion of first element so in order to check for it an additional loop control variable would be required, so as not to move the starting iterator.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: