Observer in Erlang/Elixir release

When I teach OTP, I always show the observer application. Observer is a graphical user interface capable of displaying supervision trees and providing information about processes. It is really cool for showing how processes get restarted and what structure they follow. It is also suitable for use with production systems!

Erlang example for teaching purposes

You can start observer by typing this inside the Erlang shell:

observer:start().

Cowboy and Ranch are great examples of OTP principles in action.

WARNING: You need at least Erlang 18 installed to work with the cowboy examples from master.

git clone https://github.com/ninenines/cowboy.git
cd cowboy/examples/rest_hello_world
make run
(rest_hello_world_example@127.0.0.1)1> observer:start().
** exception error: undefined function observer:start/0

You can easily start the example but not observer.

Cowboy uses erlang.mk to build the apps. It automatically pulls dependencies, compiles code and generates a release. Releases are self-contained packages that include Erlang. They can be copied to another server and they should run fine even if Erlang is not installed there (as long as the other server has the same CPU architecture). They also strip all unnecessary applications to make the final output lightweight. This includes observer.

Releases are great for deploying to production, but for teaching purposes I would like to use observer with the generated release. How can I do that? After running make run for the first time, an ebin directory is created. Edit the file ebin/rest_hello_world.app like this:

{applications, [kernel,stdlib,cowboy,runtime_tools,wx,observer]},
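
For context, a sketch of how the whole .app file might look after the change; apart from the applications list, the fields below are illustrative and the generated file will differ:

{application, rest_hello_world,
 [{description, "Cowboy REST Hello World example"},
  {vsn, "1"},
  {modules, []},
  {registered, []},
  %% runtime_tools, wx and observer added so the release can start observer
  {applications, [kernel, stdlib, cowboy, runtime_tools, wx, observer]},
  {mod, {rest_hello_world_app, []}},
  {env, []}]}.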

Those three applications, runtime_tools, wx and observer, are all you need to launch the observer GUI. Just type make run again. For production systems you probably don’t want to include the entire observer with wx widgets. It would be better to leave out the graphical part and just add the bare minimum to the release, so we can inspect it from the outside. Let’s do it this time with an Elixir example.

Elixir example for production purposes

We can apply a similar trick. Let’s build an Elixir release using exrm. After adding it as a dependency, we can run:

MIX_ENV=prod mix compile
MIX_ENV=prod mix release
rel/my_app_name/bin/my_app_name console

This should result in something like this:

iex(my_app_name@MacBook-Pro-Tomasz)1> Node.get_cookie
:my_app_name
iex(my_app_name@MacBook-Pro-Tomasz)2> :observer.start
** (UndefinedFunctionError) undefined function :observer.start/0 (module :observer is not available)

As before, our release stripped all the additional stuff. This time, instead of adding all three applications needed to start observer, let’s only add the bare minimum that will enable inspecting the running node from the outside. Inside the mix.exs application section, add :runtime_tools:

applications: [:phoenix, :phoenix_html, :cowboy, :logger, :gettext,
               :phoenix_ecto, :postgrex, :runtime_tools]]

Now repeat those three steps:

MIX_ENV=prod mix compile
MIX_ENV=prod mix release
rel/my_app_name/bin/my_app_name console

But this time open another iex console, starting a completely different Erlang node:

iex --sname watcher --cookie my_app_name
iex(watcher@MacBook-Pro-Tomasz)1> :net_adm.ping :"my_app_name@MacBook-Pro-Tomasz"
:pong
iex(watcher@MacBook-Pro-Tomasz)2> :observer.start
:ok

Now in observer choose Nodes from the menu and then my_app_name@MacBook-Pro-Tomasz. Adding just one application to our release (:runtime_tools) enabled us to connect to our release from the outside and start inspecting it! Runtime tools is an application that delivers “low footprint tracing/debugging tools suitable for inclusion in a production system”.
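
The same works from a plain Erlang shell, assuming a standard Erlang installation on the watcher side and the same node name and cookie as above:

erl -sname watcher -setcookie my_app_name
(watcher@MacBook-Pro-Tomasz)1> net_adm:ping('my_app_name@MacBook-Pro-Tomasz').
pong
(watcher@MacBook-Pro-Tomasz)2> observer:start().
ok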

If you want to play with it, try going to the Applications tab. You can right-click on any process and kill it with an arbitrary reason. What happens with the children PIDs?


Failing fast and slow in Erlang and Elixir

I have recently been teaching programming with Elixir and Phoenix in Kraków. During classes, I saw that new Erlang and Elixir programmers have problems with the concept of “failing fast”, so I’ll explain it with examples, but first I need to show you…

The Golden Trinity of Erlang

Torben Hoffman, in many of his webinars about the Erlang programming language, mentions “The Golden Trinity of Erlang”.

The three principles are:

  • Fail fast
  • Share nothing
  • Failure handling

There are many articles explaining how sharing nothing is great for concurrency. Failure handling is usually explained when teaching OTP supervisors. But what does it mean to “fail fast”?

Failing fast principle

The “fail fast” principle isn’t exclusive to Erlang. In agile methodologies, it expands to:

  1. don’t be afraid to try something new;
  2. evaluate it quickly;
  3. if it works – stick with it;
  4. if not – abandon it fast, before it sucks too much money/energy.

This business approach translates almost directly to programming practice in Erlang.

Happy path programming

When I write in Erlang, I usually don’t program for errors. I can treat most errors or edge cases as if they don’t exist. For example:

{ok, Data} = file:read_file(Filename),
do_something_with_data(Data)

There is no code for handling the situation where the file does not exist or I don’t have permissions to open it. That makes the code more readable. I only specify business logic instead of a myriad of edge cases.

Of course, I can match on some errors. Maybe I want to create a file if it doesn’t exist. But in that case it becomes application logic, so my argument about not programming for edge cases still holds.

case file:read_file(Filename) of
  {error, enoent} -> create_file();
  {ok, Data} -> do_something_with_data(Data)
end,

This style of programming is called “happy path programming”. It doesn’t mean that I don’t anticipate errors. It just means that I handle them somewhere else (by supervision trees and restarting).
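
As an illustration, a minimal sketch of the kind of supervisor that does the restarting, assuming a hypothetical file_reader worker that runs the happy-path code above:

%% Minimal one_for_one supervisor: if file_reader crashes on an
%% unexpected error (e.g. a badmatch), it is restarted with a clean state.
-module(file_reader_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    {ok, {{one_for_one, 5, 10},
          [{file_reader, {file_reader, start_link, []},
            permanent, 5000, worker, [file_reader]}]}}.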

Failing fast case study 1 – reading from a file

This style of programming requires the code to fail quickly when a problem occurs. Consider this code:

{_, Data} = file:read_file(Filename),
do_something_with_data(Data)

Reading the file could actually return {error, Reason} and then I would treat the Reason atom as Data. This propagates the error further, where it is harder to debug and can pollute the state of other processes. Erlang is a dynamically typed language, so do_something_with_data/1 can pass the atom many levels down the call stack. The displayed error will say that it can’t treat an atom as text, and the bug gets tricky to find. Even functions that are used purely for their side effects should match on something to check if they worked, so instead of:

file:write_file(FileName, Bytes)

it is usually better to use:

ok = file:write_file(FileName, Bytes)

Failing fast case study 2 – calling gen_server

It is even more important to fail before sending anything wrong to another process. I once wrote about it in this blog post. Sending messages through module interface functions helps keep the damage made by errors contained. Crashing the caller instead of the server is “quicker”, so it doesn’t contaminate the application state. It allows failure handling strategies that are much simpler than preparing for all possible edge cases. Most of the time those strategies are based on restarting processes with a clean state. Processes and computation are cheap and can be restarted, but data is sacred.
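
A minimal sketch of such an interface function, assuming a hypothetical counter gen_server:

%% The guard is evaluated in the caller's process, so counter:add(Pid, "ten")
%% crashes the caller immediately instead of sending bad data that would
%% crash (or pollute) the server later.
add(Pid, N) when is_integer(N) ->
    gen_server:call(Pid, {add, N}).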

Case study 3 – tight assertions

Let’s consider another example, a test suite:

ok = db:insert(Value),
Value = hd(db:get(Query))

It tests database code by inserting a single value into an empty database and then retrieving it. And if we assume that the database was empty before test execution, we can also make sure that it doesn’t return anything else. The second line above is equivalent to:

[Value | _] = db:get(Query)

But I can make the assertion stronger by writing:

[Value] = db:get(Query)

It asserts both the value and the number of elements in the list. Sweet!
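
To see the difference, here is an illustrative shell session with made-up values; the weaker pattern happily ignores the extra element, while the stronger one fails fast:

1> Value = 42.
42
2> [Value | _] = [42, 43].
[42,43]
3> [Value] = [42, 43].
** exception error: no match of right hand side value [42,43]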

“Fail fast” is another example of applying the “immediate feedback principle” in programming. It allows happy path programming, which makes programs more readable, but requires treating each line as an assertion. Pattern matching makes that easy.

Failing fast and supervision trees = ♥♥♥

Building docker images for Elixir applications

TL;DR: Use exrm to speed up working with Elixir and Docker. The time to run docker pull dropped from 5 minutes to 16 seconds.

Docker, among many other things, solves the problem of deploys. It makes them easy to perform and repeatable. I can deploy the same docker image many times on different machines. Developers, testers, QAs and Ops can work with an almost identical environment. Performing manual tests, automated tests and stress tests in parallel saves a lot of time.

Up until last week, docker pull had to perform a number of steps:

  • Get the base image
  • Pull the Erlang layer
  • Pull the Elixir layer
  • Pull our application files

During a pull, docker waits for a layer if the next layer depends on it. That means it can’t pull the Elixir layer before the Erlang layer is ready, and it can’t pull the application before Elixir is ready.
On my development machine (with the base image precached) docker pull took about five minutes to complete. In case you are interested, we used this docker image, which installed Elixir and Erlang.

As I said before, those 5-minute pulls are performed many times during the day.

5 minutes * number of environments * number of features we want to push to production adds up quickly.

Can we somehow speed up the process? Yes! Erlang introduces the concept of releases. A release is a minimal self-contained build. Releases include the Erlang runtime, so you don’t have to have Elixir or Erlang installed.

Releases were historically painful to build, so there are tools that do it for you using sane defaults: relx for Erlang and exrm for Elixir.

Now docker pull performs only two steps:

  • Get the base image
  • Pull our application files

And it takes only 16s!

Is there a catch? Yes, there is.

We also liked to run unit tests inside the docker image with mix test, but an exrm release contains only application code. No test code, no mix at all.

We use the old image with all dependencies for unit tests. After they finish, we build a release and create a new docker image, which is then pulled many times by other teams.

If you are working with Phoenix web framework, there is a great step by step guide for setting up relx with Phoenix.

Why Erlang modules have long names, or how to troll an Erlang developer?

Yesterday, my friend, who is learning Erlang, asked me to show him how to use funs in Erlang. I could have just typed the answer in the Adium window, but I like to be sure that everything I send compiles and works, so I quickly created an Erlang module, scribbled an example and tried to compile it.

While in a rush, I didn’t think of a file name and just named the file file.erl. When I compiled it, I got an error saying:

121> c(file).
{error,sticky_directory}

=ERROR REPORT==== 20-Sep-2014::10:08:33 ===
Can't load module that resides in sticky dir

Sticky dir is something to do with file permissions, right?
So I quit the Erlang shell, checked the permissions and restarted it:

122> q().
ok
$ erl
Erlang R16B02 (erts-5.10.3)  [smp:2:2] [async-threads:10] [hipe] [kernel-poll:false]

{"init terminating in do_boot",{undef,[{file,path_eval,[[".","/Users/tomaszkowal"],".erlang"],[]},{c,f_p_e,2,[{file,"c.erl"},{line,474}]},{init,eval_script,8,[]},{init,do_boot,3,[]}]}}

Crash dump was written to: erl_crash.dump
init terminating in do_boot ()

WTF?! I am opening a fresh Erlang shell and it crashes?! Did I just break the Erlang VM?! How? And then, while reading the error, it struck me: “.” is in the path, so my module called file overshadowed the file module, which is used during boot to search for .erlang and execute its contents [1].

Of course, I didn’t come up with that idea during the first reading. Somehow my brain did not associate the file from the error message with the file that I had just created. They were in different contexts, because the error comes from the guts of the Erlang VM and my module is just 4 lines of code including the module declaration and exports.
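
The exact module is not shown here, but any hypothetical four-liner demonstrating funs, saved as file.erl, is enough to reproduce the problem, for example:

-module(file).
-export([twice/1]).

%% Returns a fun that applies F two times.
twice(F) -> fun(X) -> F(F(X)) end.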

So, to answer the question from the post title: send an Erlang developer a module called file and let them compile it. If they are not careful enough to delete the .beam file, they won’t be able to use erl in that directory and they will get a cryptic message that has puzzled a couple of people [2]. It took me a couple of minutes to realise how stupid I was, so maybe someone will fall for it too!

This also explains why in most Erlang applications module names are so long and prefixed with the application name. There are no namespaces in Erlang, so all modules should have unique names. It is not as bad as it seems. I have been working with Erlang for a couple of years now and this was the first time I had this kind of problem. Next time, I’ll name my file asdf.

[1] http://erlang.org/doc/man/erl.html#id168387
[2] http://erlang.2086793.n4.nabble.com/Installing-a-module-from-code-td2113022.html

Erlang OTP gen_server boilerplate.

gen_server [1] is the most basic OTP behaviour. It is also very convenient and used in almost every bigger project. Nevertheless, people sometimes ask: “Why is there so much boilerplate? [2]”. Usually, we can find three sections in a module implementing gen_server:

1. At the top are the API functions to the server.
2. In the middle, there are the callbacks that you have to specify for the gen_server behaviour.
3. At the end, there are helper functions.

The useful stuff happens in the second section. There you can see the actual operations on the data and the state management. So why do we want to repeat everything that is already in handle_call and handle_cast to provide the API? Why can’t gen_server generate the API for us?

Let’s take an example from Learn You Some Erlang [3]: a gen_server for storing cats. It is described in detail here [4].

-module(kitty_gen_server).
-behaviour(gen_server).

-export([start_link/0, order_cat/4, return_cat/2, close_shop/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-record(cat, {name, color=green, description}).

%%% Client API
start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Synchronous call
order_cat(Pid, Name, Color, Description) ->
   gen_server:call(Pid, {order, Name, Color, Description}).

%% This call is asynchronous
return_cat(Pid, Cat = #cat{}) ->
    gen_server:cast(Pid, {return, Cat}).

%% Synchronous call
close_shop(Pid) ->
    gen_server:call(Pid, terminate).

%%% Server functions
init([]) -> {ok, []}. %% no treatment of info here!

handle_call({order, Name, Color, Description}, _From, Cats) ->
    if Cats =:= [] ->
        {reply, make_cat(Name, Color, Description), Cats};
       Cats =/= [] ->
        {reply, hd(Cats), tl(Cats)}
    end;
handle_call(terminate, _From, Cats) ->
    {stop, normal, ok, Cats}.

handle_cast({return, Cat = #cat{}}, Cats) ->
    {noreply, [Cat|Cats]}.

handle_info(Msg, Cats) ->
    io:format("Unexpected message: ~p~n",[Msg]),
    {noreply, Cats}.

terminate(normal, Cats) ->
    [io:format("~p was set free.~n",[C#cat.name]) || C <- Cats],
    ok.

code_change(_OldVsn, State, _Extra) ->
    %% No change planned. The function is there for the behaviour,
    %% but will not be used. Only a version on the next
    {ok, State}. 

%%% Private functions
make_cat(Name, Col, Desc) ->
    #cat{name=Name, color=Col, description=Desc}.

Not a single line of code was changed; the only thing I added in the original listing was a background color marking which part runs where.

The code in the %%% Client API section is executed in the client process. Sometimes it is easy to think about all the code in a gen_server module as code that runs on the server side, but this is wrong.

The API functions are called in the client process, and the client process sends messages to the server.
Why is it important?

Let’s look closer at the return_cat function. The second parameter must be a valid cat record. If you try to call something like this:

return_cat(Pid, dog).

your client will crash, but if you call it like this:

gen_server:cast(Pid, {return, dog}).

your server will crash.

This makes a huge difference. The “let it crash” philosophy provides confidence that errors will not propagate, so it is really important to crash as fast as possible. If you can do some validation on the client side – do it. Let the programmers know that it is the client that sent bad data, not the gen_server that has a bug.
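
For example (a sketch, not part of the original kitty module), order_cat could be tightened with guards so that a call with bad arguments crashes in the caller:

%% Hypothetical variant: the guards are evaluated in the calling process,
%% so bad arguments never reach the server.
order_cat(Pid, Name, Color, Description)
  when is_atom(Name), is_atom(Color), is_list(Description) ->
    gen_server:call(Pid, {order, Name, Color, Description}).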

What is even more important, you will not lose the precious state. Sometimes it is good to crash the gen_server, for example when its internal state has somehow become invalid. I had a gen_server that was a frontend to a database connection. It kept the connection in its state. If some operation failed, the server crashed and the supervisor tried to restart it a couple of times, creating a new connection in init. A failed operation usually required reconnecting, so this worked if the connection problem was temporary, but if the database was down permanently, the supervisor crashed itself and shut down the entire application. More often than not, though, you would like to preserve the state and you wouldn’t want invalid data to contaminate it.

So the API functions, which at first glance look like boilerplate, are really important for your application.

[1] http://www.erlang.org/doc/man/gen_server.html
[2] http://en.wikipedia.org/wiki/Boilerplate_code
[3] http://learnyousomeerlang.com/
[4] http://learnyousomeerlang.com/clients-and-servers