I recently realized that there is a problem that has kept cropping up regularly in my code for the last few months: transform a sequence of words in a comma separated list. Not a big issue in itself, but I find it interesting for two reasons: first, it is almost impossible to specify a solution that matches the regularity and simplicity of the expected outcome; second, it’s a good test case for checking how well different languages support writing a generic solution.
I have to confess that I wouldn’t have given it much thought if this wasn’t a kind of problem that really irks my sense of aesthetics; what you want is something as simple as:
string1, string2, string3
A very plain sequence, without any exception or special case. Yet when you try to code it you can’t avoid making a special case of either the first or the last string, as in this Python function:
def insert_separator1(l):
ret = l[0]
for s in l[1:]:
ret += ", "
ret += s
return ret
Otherwise you have to insert a conditional inside the loop, which conceptually you should only need to compute once, as in this alternative Python implementation:
def insert_separator2(l):
ret = ""
first = True
for s in l:
if first:
first = False
else:
ret += ", "
ret += s
return ret
By the way, I’m well aware that Python’s standard library provides a ready-made solution:
print ", ".join([ "string1", "string2", "string3" ])
While Python is very convenient to expose programming concepts, the code where this little problem keeps coming up is actually written in C#. This language also has a library solution, which is very similar to Python’s:
using System;
class InsertSeparator1
{
static void Main(string[] args)
{
string[] l = { "string1", "string2",
"string3" };
System.Console.Out.WriteLine(
string.Join( ", ", l));
}
}
These library solutions are undoubtedly convenient, but they are limited to the scenario where you want to concatenate your list elements into a single string. Sometimes, however, you would rather output your strings to a file, or you might want to interleave other kinds of things, rather than just strings.
Let’s see then how different languages support a generic solution. Thanks to generators and duck typing Python makes it very easy:
import sys
def insert_separator4(elements, sep):
first = True
for e in elements:
if first:
first = False
else:
yield sep
yield e
if __name__ == "__main__":
for e in insert_separator4([ "string1",
"string2", "string3" ], ", "):
sys.stdout.write(e)
sys.stdout.write( '\n' )
Slightly less so C#, which is constrained by static typing and the subordination of generics to inheritance as a means of generalization:
class InsertSeparator2
{
static IEnumerable<ValueType>
InsertSeparator<ValueType>(
IEnumerable<ValueType> elements,
ValueType sep)
{
bool first = true;
foreach (ValueType e in elements)
{
if (first)
first = false;
else
yield return sep;
yield return e;
}
}
static void Main(string[] args)
{
string[] l = { "string1", "string2",
"string3" };
foreach ( string e in
InsertSeparator<string>(l, ", " ) )
{
System.Console.Out.Write(e);
}
System.Console.Out.WriteLine();
}
}
Note that you need to implement the IEnumerable<T> interface in order to exploit this function.
C++ is also statically typed, but thanks to a form of compile time duck typing supported by templates it doesn’t require an abstract base class:
#include <iterator>
#include <iostream>
#include <string>
template <typename InputIter, typename OutputIter,
typename ValueType>
void insert_separator(InputIter first,
InputIter last,
OutputIter out,
ValueType sep)
{
bool initial = true;
while ( first != last )
{
if ( initial )
initial = false;
else
*out++ = sep;
*out++ = *first++;
}
}
int main()
{
std::string l[] = { "string1", "string2",
"string3" };
insert_separator(l, l + 3,
std::ostream_iterator<std::string>(
std::cout),
", ");
}
At first sight the C++ solution isn’t much better than the C# one. However generalization is not just a matter of language constructs; it is also a matter of conceptual framework. With little effort from our part the function above integrates perfectly with the standard library’s iterators and containers, as well as with any user defined ones, provided they adhere to the standard library constraints.
Finally, there is a solution to this problem which satisfies my quest for elegance, but it’s currently late at night and this post is already long enough as it is
November 8, 2009 at 7:12 am |
Hi Nick,
your first solution fails when the input array is empty!
My next-to-favourite solution (in Ruby) would be
def join(array, separator)
result = “”
for element in array
result += separator unless result.empty?
result += element
end
result
end
And this would be the algorithm I would use in old school languages like Java or C++. But if you have nice “higher order” methods in your language, you can shorten it to something as cool as:
def join(array, separator)
array.inject {|a,b| a + separator + b} || “”
end
November 8, 2009 at 4:12 pm |
Hi Matt,
You’re right of course. I was more concerned with making my point than with writing production quality code.
Your first solution is more compact than mines whenever you actually do have an underlying accumulator, otherwise you need different means for distinguishing the first iteration.
I have to say that I’m not sure I understand the details of your second solution. Would you care to explain?
November 8, 2009 at 5:27 pm |
Ah yes, the inject trick. Inject takes a block of code and gives it two parameters; the first is the accumulator and the second is each element of the array in turn. Functional programming people call it “fold”.
[1,2,3].inject {|sum, element| sum + element } gives 6.
When the array is empty, inject returns nil. So I add the
… || ”
to cover that case. If inject returns nil, we return empty string.
So there are two “ifs” in the second solution, the first is in the “||” which treats the special case of empty array, and the second is hidden in “inject” which will just return the first element if the array is a singleton.
I think your C# solution would be prettier if you used
if (e == elements.first)
in place of the ugly boolean flag. You need a test for the first case, anyways. One way is to test if the accumulator is empty, the other is to test if the current array element is the first.
Here’s another solution that is more similar to yours. The intertwine function captures the behaviour of joining and might be useful in other places.
require “test/unit”
def intertwine(array, separator)
for element in array
yield separator unless element == array.first
yield element
end
end
def join(array, separator)
result = “”
intertwine(array, separator) do |element|
result += element
end
result
end
class TestLibraryFileName < Test::Unit::TestCase
def test_case_name
assert_equal "", join([], ", ")
assert_equal "x", join(["x"], ", ")
assert_equal "x, y", join(["x", "y"], ", ")
end
end
November 8, 2009 at 6:16 pm |
Thank you for your explanation.
You’re right again about my C# solution; the boolean variable is indeed ugly, but on the other hand it makes it possible to represent the same idea in the same way in all the examples.
There are situations where there isn’t any superior alternative: for instance C++ iterators do not have a notion of first element so in order to check for it an additional loop control variable would be required, so as not to move the starting iterator.