Sunday, October 18, 2009

This blog has moved

Since I'd like to talk about more than just Erlang I decided to move to a new location. Check it out at http://functional-orbitz.blogspot.com

Sunday, September 06, 2009

Impressed with Haskell's concurrency

I have been playing with both Ocaml and Haskell* lately. In Ocaml, I have been rewriting a lot of simple Python scripts I use at work to see how they compare. They tend to be faster and more trustworthy. Ocaml, though, has pretty bad concurrency support. They have some thread library that wraps the OS threads, no real good message passing interface. Even the guys at Jane St have listed that as a complaint**.

So I decided to check out Haskell again. With Haskell I have always thought it was a great language, but could never grok it. That sounds a little weird, but reading Haskell code is simply great. When you figure out what it does, it tends to be very expressive and flexible. After looking at what Haskell offers for concurrency, I think that in theory it could give Erlang a serious run for its money. Here's why:

- Haskell, in general and in my opinion, is a safer language to write code in than Erlang. One can do a lot of work with Dialyzer to verify their code, but they are still somewhat limited. Haskells type system is very expressive and powerful. Haskell also has ADT's, which as far as I know Erlang still does not. For an idea of why ADT's are so powerful when it comes to writing safe code, see the Caml Trading video. And in this case, by 'safe' I mean writing correct code.

- Haskell produces generally fast code. GHC does some impressive optimizations, and supercompilation*** is most likely getting going to be part of GHC in the near future. It should noted that while Haskell can produce fast results, its lazyness can also be unpredictable in the optimizations (both speed and memory) that it performs. This is believed, by many, to be a big drawback to Haskell. I haven't done enough work in Haskell to really say how bad this is but Real World Haskell does give a complete chapter to diagnosing performance problems and optimizing them.

- Haskell's threading implementation uses green threads with a many-to-one mapping to OS threads. This means you can take advantage of multiple cores without modifying your Haskell code. The Erlang VM does this too, with SMP support. The greenthread implementation also appears to be blazing fast. Haskell is number one in the threadring solution on the language shootout. Erlang is about 4x slower. Take from that what you will.

- You can implement Erlang-like best-case programming in Haskell, from what I can tell. Haskell supports Exceptions, and you can throw exceptions to other threads using asynchronous exceptions. This gives you a way to 'link' threads like you can link Erlang processes. It is still programmer driven, so you can make a mistake, point for Erlang.

- STM, while I haven't read up on this, it seems like a pretty great way to handle synchronization between threads.

- Monads. When reading up on concurrency in Haskell, monads seemed to often come up as valuable in ensuring correctness. For example, in an STM transaction, one wants to restrict the transaction from doing things that cannot be rolled back, such as IO. This is done by the 'atomically' function taking an STM monad as input and wrapping the output in an IO monad. What this all means is one can't do IO in the atomically block unless it's unsafe. The ability to restrict what the user can do, when one wants to, with the type system is quite powerful.

That being said, Haskell does have some clear losses next to Erlang. The biggest drawback is the lack of a distributed model. There is distributed Haskell, I have not researched it much but I'm under the impression it is not 'there' yet. Erlang is easy to learn, very easy. Haskell is not. I have found that, in order to write really good Haskell code, one has to keep a lot of stuff in their head at once. Perhaps this is just because I am new but I have found Haskell better to read than to write. Erlang is not like this, while the syntax has some peculiarities to it, it is not hard to pick up and start writing good Erlang. I think Ocaml shares more in common to Erlang in this regard. The hump one needs to get over in order to write solid Haskell code is a real and legitimate reason to not choose Haskell.


All that being said, I admittedly am fairly new to Haskell so these opinions will change over time, I'm planning on putting more effort into writing real projects in Haskell to see how it goes. Needless to say, I am impressed by what Haskell has to offer for concurrency. If I'm factually wrong on anything here, please correct me, this is all based on some reading research I've been doing on Haskell, not my actual experiences.

* When I say Haskell, I really mean GHC here

** Yaron Minsky's Caml Trading lecture, very good! Makes a great, practical argument, for Ocaml - http://ocaml.janestreet.com/?q=node/61

*** Supercompilation http://community.haskell.org/~ndm/downloads/slides-supercompilation_for_haskell-03_mar_2009.pdf

Labels: , ,

Saturday, April 05, 2008

Amanda Bynes is awesome

Not particularly Erlang related but, Amanda Bynes is simply amazing. I just watched Sydney White, while it wasn't as awesome as She's The Man, it did rule.

Amanda Bynes presents a Hollywood image that all girls can look up to as a role model.

All in all, she rocks.

Wednesday, April 02, 2008

Scaling webapps

DISCLAIMER: I am not an expert in web apps and simply have been doing some research on them. Given that, my claims and concerns may be completely unfounded and simply a result of my ignorance of web app design, if so, please let me know.

I have been doing a little bit of research on webapps and I'm not sure how I feel about the standard model. The standard model, as I understand it, works by every incoming http request results in some number of DB requests where the DB is the holder of all state information. The web app itself, whether it be in PHP, or Rails, etc is stateless. I certainly don't doubt that for a large majority of applications this works fine. Take something like the yellow pages which is a Rails application according to this. The standard model seems quite reasonable for this. You have a page that is fairly static, people are mostly doing requests for data which requires walking through some large database and the results are being displayed and not much is happening again until the user initiates another page request.

Web apps are changing though. Today, we expect our web apps to look and feel like native applications, which means snappy responses and updates of it even without the user explicitly reloading the page. We have things like AJAX and Comet-style serverpush. In the end, this means more requests are going to the http server even when the user is not doing anything. The Comet style is pretty neat but from what I can see, polling with AJAX is the most widely used way. My concern is how well the standard model is going to scale as web apps are required to act more and more like native applications. The reason for my concern is because this method is simply polling the DB for state changes on every request and in every other situation polling quite clearly has not scaled. This is the exact reason Comet exists.

For the sake of simplicity, let's imagine that the web app is AJAX with polling. Some portion of the page is going to be asking for updates every 5 seconds. The standard model would have us querying the DB every 5 seconds for that request to see if the state has changed. If you have a couple thousand people on this page that is a lot of work. If the component of the page is changing often for a majority of the people then it's not too big of a deal but if it's not changing often for a majority we are doing a lot of worthless DB calls. I'm sure we would design our database so that this call is hopefully very lightweight but how well does that really scale in the end?

Now, the reason we want our wep app (such as something in Rails) to be stateless is because for each request we can be pushed around to a separate VM. So if we hold onto session state information there is a good chance we might not even get a chance to use it on the next request. On top of that, if you are load balancing between several hosts, you may not even be in a good place to share state information between VMs.

I'm sure almost all of this is not news to anyone who has made it this far in the post. What can be done to make a webapp scale better? I'm not sure but here is my suggestion and hopefully someone with more experience than myself can come back and say if it's a horrible idea or not. Let's say I want to make a web app that will be handling something like instant messaging such as google talk or meebo (which are both using Comet).

For starters, I want this to be fairly real time, when I send a message to someone, I want it to get to them ASAP. Secondly, there will be a lot of interaction between users. Clearly, page-by-page viewing is not going to work here. People can't be refreshing their page every few seconds to see if there is an update. AJAX with polling or Comet are clearly your two choices currently. How should this look after the HTTP request comes in though?

Here is how I am suggesting it should:
Each request should have some sort of session ID. Each session ID will be associated with a process. By process here, it could be an Erlang process or some OS process written in some other language, whatever. The point here is, for a particular session ID, its request will get forwarded to the same process every time for the life time of its session. This way, the process can store all the state information. We don't have to do worry about sharing state information in something like memcache. The process would have some sort of timeout value so they die off eventually. At worst case, we are back to the standard model where if a user does multiple requests that happen to have a pause between them longer than the timeout we now have to initiate a new process and it has to do the DB calls to initiate itself. In the best case though, we are consistently getting sent to the same process which is holding onto our data. The upside to this method is the process can also do things specific for that user such as opening up other connections. For instance, in the instant messaging example, a user logs in, they get a session ID and a process created for them somewhere that will be mapped to this session ID. The process opens up a connection to an event server for that user so it can listen for IMs and push IM's out. We now have an application that is quite event based on all sides of it. We won't be hitting the DB too often. Given that we don't really care where this process lives, we can also scale it out to multiple machines and not have to worry about replicating the same data over many machines because we don't know where the users request will go to.

Downsides?
Certainly. Clearly if we have 2000 people using our IM application at once, we need to have 2000 processes a live. If we wrote this in Erlang where each process for a session ID maps to an Erlang process (sorry about the terminology here all)? That's childsplay. We could host this application on one machine! But not everyone wants to write things in Erlang, so what about them? If I were in this situation, I would probably have some machines each running some amount of applications that will be doing some I/O multiplexing to handle this. So you would have some amount of OS processes and each one can handle some amount of session ID's. Pretty standard for something that isn't Erlang but needs to handle a bunch of things at once. If you are into Python, Twisted comes to mind. I'm sure other languages have their own way of doing this.

Another edge case here, and I think this is probably not too hard to deal with, is what if an event comes in (such as an IM) and the process times out due to lack of activity? You could have each event have a timeout and if it is not ACK'd in that time it gets saved to a DB and the next process that is created for that user picks it up as part of its initiation.


To reiterate, I do not have much experience in webapps, I simply have been doing some research and this is the impression that I have gotten. Are my concerns valid? Is the system I described what people are already going to or is it broken? How would someone write Meebo or google talk? Let me know.

Labels: , , , , , , ,

Wednesday, March 19, 2008

Back in the saddle

So I have been working on Wall Street for the last year+ and last week decided to quit my job. And non-too-soon, as my job will most likely no longer exist in a few months if you have been following the news over the past few weeks.

I am going back to school, moving out of NYC and off to Maryland so over this summer I should have a lot of free time and I am working on possibly being involved in sort of a freelance project in order to make money (details are still pending). Anyways, the current design of the project has a typical web-app interface, then a manager layer that encapsulates the database. The manager layer handles caching data and moving it back to the database when needed, and event dispatching, blah blah blah.

Immediately I think this is the perfect use case for Erlang. We want it to be fault tolerant of course (the downside of a web-app is when things go wrong it affects everyone) and be able to handle a lot of data moving back and forth (although I'm unsure of the specifics since this is so early). Basically this sounds like a standard Erlang app with mnesia as the cache most likely, spanning a few nodes and moving data back to a DB in a write-behind method.

Some people have a bit of a concern about Erlang, how will we find developers and so on, which are perfectly valid. My response to that is:
Erlang is such a simple language it does not take much time to learn, and while OTP is not as simple, it does not take much time to learn either. In the end, one needs to be less of an expert in Erlang to get an equal or better application (in terms of Erlang's strengths) than they would have to be in another language such as Java.

One possible alternative language being considered is Java because of JBoss. I haven't looked into JBoss in-depth yet, but at a quick glance it looks like it has some really nice and really mature features. The JMS implementation sounds pretty solid and the clustering. Everyone knows Java, or at least puts it on their CV, but this sounds misleading. How good does one need to be in Java in order to not make a mess of an application written using JBoss compared to Erlang? My opinion is that the way one learns Erlang is fairly similar to how they would write production software with Erlang, but perhaps not the same for Java. The features we use in our 'Hello World' are the same that we use in a production environment, this is not true of Java in my experience.

We are still looking into things but I'm currently hoping for Erlang.

Friday, January 25, 2008

Erlang Job Advert

Well, it's been over a year, sorry. Today I received an email with an Erlang job offer for a company based in Boston. I don't know anything about the company at all, perhaps it is junk, but the project seems like something Erlang is good at. It's a telecommuting job, so that might raise a red flag with some people. If you are interested I'll put you in touch with the recruiter or whatever he is, email me at orbitz AT ortdotlove.net (that is really ortdotlove.net, not fancy spelling for ort.love.net).

Here are the job details:
Job Description:

Our client, who is based in Boston, is seeking an adept Erlang developer to help build a next generation metrics and monitoring system on the GNU/Linux platform. This system will watch and report on hundreds and thousands of systems, ranging from network devices to servers to software applications.



Required Skills:

· Must have experience with the Erlang programming language

· Must have expert level network programming experience, preferably in Erlang, C++ and/or C

· Strong background in developing on GNU/Linux for mission-critical production deployments

· Strong understanding of MySQL 4.1-5.x, including database design patterns

· Strong experience programming network servers/clients, including knowledge of fundamental protocols such as TCP, UDP, IPV4/6, and SSL/TLS

Monday, November 20, 2006

Cryptogram solver

Newspaper puzzles beware. I have been working on soving newspaper cryptograms for the past few days in my free time. It is a bit more interesting of a problem to solve than sudoku I think. The algorithm I've implementing is nothing alltogether impressive and it relies on a good dictionary file to get solutions.

The algorithm works like so:
First you need an index which is a map of abstract words to lists of words.
An abstract word is taking a word and turning it into a pattern, for instance, 'cat' has the pattern '123', as does 'hat', and 'fat'. 'mom' has the pattern '121'.

Then it takes the sentence you have given it, splits it on spaces.
Popping the first word off the list, finds the list in the index of possible words it could be, iterates over it generating a map of letters to letters and recurses on the rest of the sentence, once it has reached the end of the sentence it puts the map it has generated into a list of possible solutions. If a generated map for a word conflicts with teh current map it is not a valid solution and moves onto the next possible solution.
The solve function returns a list of solution maps that can eb applied to any sentence to get the result.

Little things that make it helpful include being able to give an intial map, if you are sure some letters map to other letters you can give that as a hint. It can also remove any words whos pattern does not appear in the map.

Todo:
Solve only those subsets of words that, if solved, will result in the entire alphabet being solved. It should do this for all possible combinations of words in teh sentence that will result in this. This is not an optimizations, it should probably make it take longer to solve actually, however it allows it to solve sentences with words that might not exist in the dictionary but could come about as a result of solving the other words. I have a few ideas of how to do this but havn't had time to implement it yet.

The current code can be downloaded here.