Sunday, September 06, 2009

Impressed with Haskell's concurrency

I have been playing with both Ocaml and Haskell* lately. In Ocaml, I have been rewriting a lot of simple Python scripts I use at work to see how they compare. The Ocaml versions tend to be faster and more trustworthy. Ocaml, though, has pretty bad concurrency support: there is a thread library that wraps OS threads, but no really good message-passing interface. Even the guys at Jane St have listed that as a complaint**.

So I decided to check out Haskell again. I have always thought Haskell was a great language, but I could never grok it. That sounds a little weird, but reading Haskell code is simply great: once you figure out what it does, it tends to be very expressive and flexible. After looking at what Haskell offers for concurrency, I think that in theory it could give Erlang a serious run for its money. Here's why:

- Haskell, in general and in my opinion, is a safer language to write code in than Erlang. One can do a lot of work with Dialyzer to verify Erlang code, but it is still somewhat limited. Haskell's type system is very expressive and powerful. Haskell also has ADTs (algebraic data types), which as far as I know Erlang still does not. For an idea of why ADTs are so powerful when it comes to writing safe code, see the Caml Trading video. And in this case, by 'safe' I mean writing correct code.

- Haskell produces generally fast code. GHC does some impressive optimizations, and supercompilation*** is most likely going to be part of GHC in the near future. It should be noted that while Haskell can produce fast results, its laziness can also make the optimizations it performs (both speed and memory) unpredictable. This is believed, by many, to be a big drawback to Haskell. I haven't done enough work in Haskell to really say how bad this is, but Real World Haskell does give a complete chapter to diagnosing performance problems and optimizing them.

- Haskell's threading implementation uses green threads with a many-to-many mapping to OS threads: many lightweight threads are multiplexed over a small pool of OS threads. This means you can take advantage of multiple cores without modifying your Haskell code (you do need GHC's threaded runtime, enabled with the -threaded flag). The Erlang VM does this too, with SMP support. The green thread implementation also appears to be blazing fast. Haskell is number one in the threadring solution on the language shootout. Erlang is about 4x slower. Take from that what you will.

- You can implement Erlang-like best-case programming in Haskell, from what I can tell. Haskell supports exceptions, and you can throw exceptions to other threads using asynchronous exceptions. This gives you a way to 'link' threads like you can link Erlang processes. It is still programmer driven, though, so you can make a mistake: point for Erlang.

- STM (software transactional memory). I haven't read up on this yet, but it seems like a pretty great way to handle synchronization between threads.

- Monads. When reading up on concurrency in Haskell, monads seemed to often come up as valuable in ensuring correctness. For example, in an STM transaction, one wants to restrict the transaction from doing things that cannot be rolled back, such as IO. This is done by the 'atomically' function, which takes an action in the STM monad as input and returns its result wrapped in the IO monad. What this all means is that one can't do IO inside an atomically block without resorting to explicitly unsafe functions. The ability to restrict what the user can do, when one wants to, with the type system is quite powerful.
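To make the STM and monad points above a bit more concrete, here is a minimal sketch, not code from any real project. It assumes the Control.Concurrent.STM module from the stm package that ships with GHC. The names transfer and runTransfers, the two "accounts" and their starting balances are all made up for illustration.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM
  (STM, TVar, atomically, newTVarIO, readTVar, writeTVar)
import Control.Monad (forM_, replicateM_)

-- Hypothetical example: move `amount` between two shared counters.
-- The result type is STM (), not IO (), so an IO action such as
-- putStrLn cannot be sequenced in here; it would be a type error.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  fromBalance <- readTVar from
  writeTVar from (fromBalance - amount)
  toBalance <- readTVar to
  writeTVar to (toBalance + amount)

-- Fork ten green threads that shuffle money in both directions,
-- wait for all of them, then read both balances in one transaction.
runTransfers :: IO (Int, Int)
runTransfers = do
  a <- newTVarIO 100
  b <- newTVarIO 100
  done <- newEmptyMVar
  let workers = 10 :: Int
      rounds = 100 :: Int
  forM_ [1 .. workers] $ \i -> forkIO $ do
    -- Each call to atomically runs one whole transfer as a transaction.
    replicateM_ rounds $ atomically $
      if even i then transfer a b 1 else transfer b a 1
    putMVar done ()          -- signal that this worker has finished
  replicateM_ workers (takeMVar done)
  atomically $ (,) <$> readTVar a <*> readTVar b

main :: IO ()
main = runTransfers >>= print
```

Five workers move money one way and five move it the other, the same number of times, so the program should print (100,100) however the threads interleave. Writing something like atomically (putStrLn "hi") is rejected by the type checker, which is the restriction described in the monads point.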

That being said, Haskell does have some clear losses next to Erlang. The biggest drawback is the lack of a distributed model. There is distributed Haskell; I have not researched it much, but I'm under the impression it is not 'there' yet. Erlang is easy to learn, very easy. Haskell is not. I have found that, in order to write really good Haskell code, one has to keep a lot of stuff in one's head at once. Perhaps this is just because I am new, but I have found Haskell better to read than to write. Erlang is not like this. While the syntax has some peculiarities to it, it is not hard to pick up and start writing good Erlang. I think Ocaml has more in common with Erlang in this regard. The hump one needs to get over in order to write solid Haskell code is a real and legitimate reason to not choose Haskell.


All that being said, I admittedly am fairly new to Haskell, so these opinions will change over time. I'm planning on putting more effort into writing real projects in Haskell to see how it goes. Needless to say, I am impressed by what Haskell has to offer for concurrency. If I'm factually wrong on anything here, please correct me. This is all based on some reading and research I've been doing on Haskell, not my actual experiences.

* When I say Haskell, I really mean GHC here

** Yaron Minsky's Caml Trading lecture, very good! Makes a great, practical argument for Ocaml - http://ocaml.janestreet.com/?q=node/61

*** Supercompilation http://community.haskell.org/~ndm/downloads/slides-supercompilation_for_haskell-03_mar_2009.pdf


5 Comments:

Anonymous Anonymous said...

I'd be curious to get your take on mythryl: http://mythryl.org/

thx!

07 September, 2009 05:05  
Blogger Jake McArthur said...

Yes, Haskell does have exceptions, but I and a lot of others discourage exceptions whenever possible. They are not good functional style, in my subjective opinion. However, things like the Maybe or Error monad also allow you to do best-case programming as long as you are willing to use some monadic combinators in place of purely functional combinators. (Thinking about it, I hate the terminology for that distinction. There is nothing about monads that is inherently impure.)

I disagree with your deeming of laziness as unpredictable in optimizations. I have mistakenly introduced laziness or strictness to a function before, which caused problems until I figured out what I did, but it was never the fault of the compiler. My own mistakes have dramatically dropped in frequency now that I have internalized most of what (I think) there is to know about programming with laziness.

I definitely agree with the lack of a reasonable distributed model being a negative for Haskell. It also turns out that Haskell's semantics doesn't lend itself to easy distributed execution, either. For example, the behavior of Int is platform specific (and a source of impurity, by my perhaps overly-strict definition of purity). I think the only manageable way to get distributed execution would be with some sort of embedded DSL, which is a shame considering the potential we could have if only we would guarantee that our pure computations return the same things on any machine.

I've not found that there is much you must keep in your head when coding in Haskell. The main problem for beginners is realizing that when you are working with an abstraction, the point is to pretend it's not there. Haskell has a lot of abstractions, and a lot of things can happen, operationally, in very few characters, so the only way to manage that complexity is to treat the abstractions as real abstractions.

07 September, 2009 11:02  
Blogger StoneCypher said...

Try Mythryl.

07 September, 2009 13:13  
Blogger Greg said...

From my experience (both learning and teaching Haskell) it doesn't take any more effort to learn to write code in Haskell to the same quality as you write in another language (though obviously you're learning slightly different stuff - there are numerous code smells Haskell lacks support for ;) but Haskell gives you a lot of opportunities to learn a little _more_ than that and write even _better_ code.

07 September, 2009 22:46  
Blogger Isaac Gouy said...

... take advantage of multiple cores without modifying your Haskell code. ... Haskell is number one in the threadring solution on the language shootout. Erlang is about 4x slower.


Note the faster thread-ring programs only use one core - check the ≈ CPU Load column.

Thread-ring is just about task-switching.

17 June, 2010 13:13  
