Tuesday, August 16, 2005

Erlang and strings

A lot of people complain about strings in erlang. I am one of them. Right now I am under the impression a string type needs to be added to erlang. Joe Armstrong thinks that instead of a string we simply need a character type. My complaint with that is there is no decent container for the character type. In erlangs we have tuples, binaries, and lists. Tuples are meant to store a fixed number of objects and do not have operations on them to perform operations such as iterate through them. The element/2 function allows you to access an index of a tuple but that isnt' very useful for iterating through. Binaries might be nice, but there are no functions to nicely deal with binaries as strings. Lists are what we currently have and I am not pleased. Using a linked list to store a string certinaly seems unreasonable in any other language. You have to store a character value and a link to the next node. This is a lot of memory for a string. The other problem with these containers is none of them allow O(1) access to indecies as far as I know. Am I wrong here? I suppose the question then is, is that a problem? I am under the impression that one generally wants O(1) access. For instance, if you have an index in the string and need to access it and surrounding indecies repeatedly.

http://schemecookbook.org/view/Erlang/StringBasics has a quote supposedly from the sendmail people:
But Erlang's treatment of strings as lists of bytes is as elegant as it is impractical. The factor-of-eight storage expansion of text, as well as the copying that occurs during message-passing, cripples Erlang for all but the most performance-insensitive text-processing applications.

This is in reference to their load balancing software. Is this true? I am inclined to think that it certinaly uses a lot of memory used up, but most text-processing is going to require touching every character in a string anyways won't it? What exactly is performance intensive text-processing? Does anyone have any ideas? If one is going to be iterating through the string they can use a binary to store the byte values. The problem with this is that none of the string functions work on binaries. I'm under the impression a string container type will solve some problems. Using integers as the character values seems like a fine idea to me, as people like to point out it makes dealing with unicode slightly easier.

What problems would having a character type solve? Maybe in the morning I'll be able to think of something.

0 Comments:

Post a Comment

<< Home