Lisp for Erlangers

I was always curious about Lisp family of programming languages.
There are two reasons for that.
Firstly, there is Scheme[1], Closure[2], Common Lisp[3], Shen[4]
and they come in different flavors,
so there must be some really valuable property in Lisp,
that people want to have in their language.
Secondly, many people say, that you can “expand your programming horizons”
and that the best way to do it, is to learn about Lisp macros[5].

There is one book, that is particularly about Lisp and its macros
– “On Lisp” by Paul Graham[6].
I am an Erlang programmer, so chapters 2 – 5 were pretty basic for me.
Functions as data? Used every day. Closures? Easy.
Interactive shell? Can’t live without it now!
Recursion? Even looping over list is done this way.
Chapter 6 is called “Functions as Representation”
and I was curious, what does it mean,
so instead of jumping straight to “Macros” (chapter 7),
I started reading how can you use functions to model networks.

Simple problem was chosen to illustrate the idea:
write a 20 questions game.
User has to think of a person and then he
is asked question with only “yes” or “no” answers about that person.
Based on the answer, he is either asked another question
or he gets the person name from the program.

The tree of questions and answers was represented as a list of nodes.
Based on that data structure, there were two ways to solve the problem:
1. The usual way, where each time, we search for the node in a list,
ask question and based on the answer, we search for the new node.
2. The clever way, where we first preprocess the list to create a tree of functions.

As an exercise, I wanted to implement it in Erlang
and this is what I came up with:

-module(test).

-compile([export_all]).
%% node is either:
%% {name, question, node for yes answer, node for no answer}
%% {name, answer, undefined, undefined}
%% undefined is a special value for not implemented parts of the tree
list_of_nodes() ->
    [
     {people, "Is the person a man?", male, undefined},
     {male, "Is he living?", undefined, deadman},
     {deadman, "Was he American?", us, undefined},
     {us, "Is he on a coin?", coin, undefined},
     {coin, "Is the coin a penny?", penny, undefined},
     {penny, "Lincoln!", undefined, undefined}
    ].

usual_way() ->
    Nodes = list_of_nodes(),
    run_nodes(people, Nodes).

%% searches for node and runs actual code
run_nodes(NodeName, Nodes) ->
    Node = get_node(NodeName, Nodes),
    run_nodes1(Node, Nodes).

%% if both "yes node name" and "no node name" is undefined
%% this must be a leaf with answer
run_nodes1({_Name, Answer, undefined, undefined}, _Nodes) ->
    Answer;
%% ask question and run recursively
run_nodes1({_Name, Question, YesNodeName, NoNodeName}, Nodes) ->
    {ok, [Answer]} = io:fread(Question, "~s"),
    case Answer of
        "yes" -> run_nodes(YesNodeName, Nodes);
        _ -> run_nodes(NoNodeName, Nodes)
    end.

%% searching for undefined? it is either not implemented
%% or we simply don't know
get_node(undefined, _Nodes) ->
    {undefined, "I have no idea!", undefined, undefined};
%% if there is no such NodeName - we made a spelling mistake
%% otherwise, return the node
get_node(NodeName, Nodes) ->
    case lists:keyfind(NodeName, 1, Nodes) of
        false ->
            invalid_node_name;
        Node ->
            Node
    end.

clever_way() ->
    Nodes = list_of_nodes(),
    RootFun = compile_net(people, Nodes),
    RootFun().

%% After compiliation a node is a fun(),
%% which has pointers to other nodes.
compile_net(Root, Nodes) ->
    %% it uses get_node as the previous one
    Node = get_node(Root, Nodes),
    case Node of
        %% if this is leaf node,
        %% we have to return Answer, but it has to be wrapped with a fun()
        {_Name, Answer, undefined, undefined} ->
            fun() -> Answer end;
        %% if this is some other node,
        %% we immidietly call compilie_net on both children
        %% this means, that we will process the whole tree
        {_Name, Question, YesNodeName, NoNodeName} ->
            YesFun = compile_net(YesNodeName, Nodes),
            NoFun = compile_net(NoNodeName, Nodes),
            %% this entire fun() is returned
            %% it asks the question when called
            %% but it doesn't have to look for the node in the node list,
            %% because nodes were precalculated above,
            %% even in other scope, YesFun and NoFun
            %% will still point to right functions
            %% based on the answer, it runs one of the nodes
            fun() -> 
                    {ok, [Answer]} = io:fread(Question, "~s"),
                    Fun = case Answer of
                              "yes" ->
                                  YesFun;
                              _ ->
                                  NoFun
                          end,
                    Fun()
            end
    end.

You can see, that the clever way seems to be way more complicated.
One of the reasons for writing this program was to show,
that you can represent data with closures.
What are the advantages of this representation?
Firstly, it runs faster after compiling
(from now, by “compiling”, I mean calling “compile_net”).
Usual implementation searches for the node on the list each time,
user is asked a question.
In balanced tree, with N elements, we have to ask log(N) questions,
so the runtime of one game is O(N*logN).
Clever implementation takes only O(logN) to run single game,
because all the nodes are found earlier.
The compilation itself has to search entire node list for each node it compiles,
so it takes O(N^2).

This pattern of precomputing something expensive first,
so that next calculations can be performed faster, is really important.
Think of LU decomposition[7] for solving systems of linear equations.
You need O(n^3) operations to use Gaussian elimination[8],
but you can firstly decompose the matrix, which also takes O(n^3) operations,
but after that, you can solve systems of linear equations
for different vectors in O(N^2) time.

Second advantage of precompilation is finding bugs earlier.
In the usual way, if I misspell a node name,
I’ll get an error in runtime.
In the clever way, I’ll get the error during compilation.
This is huge win for me.

Last, but not least – at the end of the chapter,
you can read:

“To represent with closures” is another way of saying “to compile”
and since macros do their work at compile-time,
they are a natural vehicle for this technique.

For now, I don’t fully understand why representing with closures
is the same as compilation.
I can see, that is useful for precomputing things.
During my studies at AGH University of Science and Technology[9],
I had entire semester about compilers,
scanners, parsers, grammars and final code optimization,
but I have never fought about it this way.

This makes me even more interested in Lisp macros!

[1] http://groups.csail.mit.edu/mac/projects/scheme/
[2] http://clojure.org/
[3] http://www.clisp.org/
[4] http://shenlanguage.org/
[5] http://www.paulgraham.com/icad.html
[6] http://www.paulgraham.com/onlisp.html
[7] http://en.wikipedia.org/wiki/LU_decomposition
[8] http://en.wikipedia.org/wiki/Gaussian_elimination
[9] http://www.agh.edu.pl/en

Advertisements

Why Erlang modules have long names or how to troll Erlang developer?

Yesterday, my friend, who is learning Erlang, asked me to show him, how to use funs in Erlang. I could have just typed the answer in Adium window, but I like to be sure, that everything, I am sending always compiles and works, so I quickly created Erlang module, scribbled an example and tried to compile it.

While in a rush, I didn’t think of a file name and just named the file file.erl. When I compiled it, I got an error saying:

121> c(file).
{error,sticky_directory}

=ERROR REPORT==== 20-Sep-2014::10:08:33 ===
Can't load module that resides in sticky dir

Sticky dir is something with file permissions, right?
So I quit the Erlang shel, checked the permissions and restarted it:

122> q().
ok
$ erl
Erlang R16B02 (erts-5.10.3)  [smp:2:2] [async-threads:10] [hipe] [kernel-poll:false]

{"init terminating in do_boot",{undef,[{file,path_eval,[[".","/Users/tomaszkowal"],".erlang"],[]},{c,f_p_e,2,[{file,"c.erl"},{line,474}]},{init,eval_script,8,[]},{init,do_boot,3,[]}]}}

Crash dump was written to: erl_crash.dump
init terminating in do_boot ()

WTF?! I am opening fresh Erlang shell and it crashes?! Did I just broke the ErlangVM?! How? And then, while reading the error, it struck me: “.” is in the path, so my module called file overshadowed file module, which is used during boot to search for .erlang and execute its contents [1].

Of course, I didn’t come up with that idea during the first reading. Somehow my brain did not associate the file from error message with file that I just created. They were in different contexts, because error is from the guts of ErlangVM and my module is just 4 lines of code including module declaration and exports.

So – to answer the question from post title: Send an Erlang developer module called file and let him compile it. If he or she is not careful enough to delete the .beam file, he/she won’t be able to use erl in this directory and he/she will get cryptic message, that puzzled couple of people [2]. It took me couple of minutes to realise how stupid I was, so maybe someone will fall for it too!

This also explains, why in most Erlang applications module names are so long and prefixed with application name. There are no namespaces in Erlang, so all modules should have unique names. It is not as bad as it seems. I am working with Erlang for a couple of years now and it was the first time, I had this kind of problem. Next time, I’ll name my file asdf.

[1] http://erlang.org/doc/man/erl.html#id168387
[2] http://erlang.2086793.n4.nabble.com/Installing-a-module-from-code-td2113022.html

Erlang OTP gen_server boilerplate.

gen_server [1] is the most basic OTP behaviour. It is also very convenient and used in almost every bigger project. Nevertheless, people sometimes ask: “Why there is so much boilerplate? [2]”. Usually, we can find three sections in the module implementing gen_server:

1. At the top are the API functions to the server
2. In the middle, there are callbacks,
that you have to specify for gen_server behaviour.
3. At the end, there are helper functions.

Useful stuff happens in the second section. There you can see actual operations on the data and state management. So why do we want repeat everything, that is already in handle_call and handle_cast to provide API? Why gen_server cannot generate the API for us?

Lets take an example from Learn You Some Erlang[3]: a gen_server for storing cats. It is described in detail here [4].

-module(kitty_gen_server).
-behaviour(gen_server).

-export([start_link/0, order_cat/4, return_cat/2, close_shop/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-record(cat, {name, color=green, description}).

%%% Client API
start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Synchronous call
order_cat(Pid, Name, Color, Description) ->
   gen_server:call(Pid, {order, Name, Color, Description}).

%% This call is asynchronous
return_cat(Pid, Cat = #cat{}) ->
    gen_server:cast(Pid, {return, Cat}).

%% Synchronous call
close_shop(Pid) ->
    gen_server:call(Pid, terminate).

%%% Server functions
init([]) -> {ok, []}. %% no treatment of info here!

handle_call({order, Name, Color, Description}, _From, Cats) ->
    if Cats =:= [] ->
        {reply, make_cat(Name, Color, Description), Cats};
       Cats =/= [] ->
        {reply, hd(Cats), tl(Cats)}
    end;
handle_call(terminate, _From, Cats) ->
    {stop, normal, ok, Cats}.

handle_cast({return, Cat = #cat{}}, Cats) ->
    {noreply, [Cat|Cats]}.

handle_info(Msg, Cats) ->
    io:format("Unexpected message: ~p~n",[Msg]),
    {noreply, Cats}.

terminate(normal, Cats) ->
    [io:format("~p was set free.~n",[C#cat.name]) || C <- Cats],
    ok.

code_change(_OldVsn, State, _Extra) ->
    %% No change planned. The function is there for the behaviour,
    %% but will not be used. Only a version on the next
    {ok, State}. 

%%% Private functions
make_cat(Name, Col, Desc) ->
    #cat{name=Name, color=Col, description=Desc}.

Not a single line of code was changed, the only thing, I added is background color.

The code with green background is executed in the client process. Sometimes, it is easy to think about all code in gen_server module as code, that runs on the server side, but this is wrong.

The API functions are called in the client process and client process sends messages to the server.
Why is it important?

Lets look closer at return_cat function. Second parameter must be a valid cat record. If you try to call something like this:

return_cat(Pid, dog).

your client will crash, but if you call it like this:

gen_server:cast(Pid, {return, dog}).

your server will crash.

This makes huge difference. “Let it crash” philosophy provides confidence, that errors will not propagate, so it is really important to crash as fast as possible. If you can do some validation on the client side – do it. Let the programmers know, that it is client, that sent bad data and not gen_server, that has a bug.

What is even more important, you will not loose the precious state. Sometimes, it is good to crash the gen_server. For example, when somehow its internal state became invalid. I had a gen_server, that was a frontend to database connection. It kept the connection in its state. If some operation failed, the server crashed and supervisor tried to restart it couple of times and create a new connection in init. Failed operation, usually required reconnecting, so it worked, if the connection problem was temporary, but if the database was down permanently, supervisor crashed itself and shut down entire application. But more often than not, you would like to preserve the state and you wouldn’t like to let invalid data to contaminate it.

So the API functions, which at first glance look like boilerplate are really important for your application.

[1] http://www.erlang.org/doc/man/gen_server.html
[2] http://en.wikipedia.org/wiki/Boilerplate_code
[3] http://learnyousomeerlang.com/
[4] http://learnyousomeerlang.com/clients-and-servers

Functional JavaScript – passing additional arguments to callback.

This post will show perfect use case for closure and higher order functions.

In work, I am developing websites in Erlang, but sometimes, there are things, that just have to be done on client side using JavaScript.
I knew JavaScript basics, but I wanted to truely know, what I am doing, so I picked up “Programming JavaScript Applications” [1].
For a quick reference, i also use JavaScript “JavaScript: The Good Parts” [2].
I like to get my hands dirty early, so I decided to make a simple game.
Searching for resources, I’ve found tutorial on building game main loop [3].

The main function looked like this:

var mainloop = function() {
    updateGame();
    drawGame();
};

and it was called like this:

var animFrame = window.requestAnimationFrame
var recursiveAnim = function(timestamp) {
    mainloop();
    animFrame( recursiveAnim );
};
animFrame( recursiveAnim );

Actually, it was more complicated, but it is not important for making my point about functional programming.
If you want to see the details, check out orignal blog post [3].

There are two things, you have to know:

1. animFrame [4] takes a callback, that you want to call before repeainting the window.
This callback takes exactly one argument – a timestamp.

2. mainloop, updateGame and drawGame take no parameters,
so they just operate on variables, that are defined out of their scope.

I don’t like second point, I would like to write functions,
that don’t depend on external state – pure functions [5].
They are easier to test and reason about.
For example, updateGame function could take game state as an argument
and return new game state.
drawGame should also take canvas context and game state as arguments.
It won’t be pure, because it will draw to the canvas,
but at least, it will always work the same way, given the same inputs.

In short, I wanted something like this:

var mainloop = function mainloop(gameState, ctx) {
    var newGameState = updateGame(gameState);
    drawGame(newGameState, ctx);
    return newGameState;
};

But there is a problem: how to pass it to animFrame?
animFrame doesn’t know anything about game state or other variables,
that I want to pass to the mainloop.
This is a perfect job for closures [6].

I can create a function, that takes exactly one argument,
but “carries” two more from otuer scope:

function (timestamp) {
    //uses arguments from outside its scope
    //use only inside other function, that has gameState and ctx defined
    recursiveAnime(timestamp, gameState, ctx);
}

and use it like this:

var recursiveAnim = function(timestamp, gameState, ctx) {
    var newGameState = mainloop(gameState, ctx);
    animFrame( function (timestamp) {
        recursiveAnime(timestamp, newGameState, ctx);
    });
};

// start the mainloop
animFrame( function (timestamp) {
    recursiveAnime(timestamp, newGameState, ctx);
});

This allows passing additional arguments to callback
without exposing newGameState or ctx to any other code.
Its already great, but it can be better.
Lets extract the pattern, that wraps three argument function into one argument function:

var toRecursiveAnim = function toRecursiveAnime(callback, gameState, ctx) {
    return function(ts) {
        callback(ts, gameState, ctx);
    };
};

This one takes callback and two additonal arguments
and creates one argument function, which
calls given callback with those arguments.
Now, the code can be simplified:

var recursiveAnim = function(timestamp, gameState, ctx) {
    var newGameState = mainloop(gameState, ctx);
    animFrame( toRecursiveAnim(recursiveAnim, newGameState, ctx) );
};

// start the mainloop
animFrame( toRecursiveAnim(recursiveAnim, gameState, ctx) );

JavaScript allows extracting patterns and writing code, that is really easy to test.
This example showed good use case for closures and higher order functions in JavaScript.
Hope, you enjoyed it :)

[1] http://chimera.labs.oreilly.com/books/1234000000262/index.html
[2] http://it-ebooks.info/book/274/
[3] http://www.playmycode.com/blog/2011/08/building-a-game-mainloop-in-javascript/
[4] https://developer.mozilla.org/en-US/docs/Web/API/window.requestAnimationFrame
[5] http://en.wikipedia.org/wiki/Pure_function
[6] http://stackoverflow.com/questions/111102/how-do-javascript-closures-work