torsdag, april 09, 2009

Backtracking in Erlang, part 2 - passing data

In the first part I showed how we could program backtracking in a generic way in Erlang. While Erlang does not directly support it, it is relatively easy to do. In this part we will look at some ways of handling data within this context. The basic problem is how to get data out of a function as in this context functions never return.

For a specific application this may not be a problem as either each function always explcitly knows what the contunation is, or a it is possible to have a local convention of how to pass arguments. For an example of this see the Erlog parser in the file erlog_parse.erl where each continuation function has two arguments, the remaining tokens and the term parsed so far. This works well for this specific case.

This is not possible in the general case or when writing generic functions. Seeing we cannot return values we will adopt the method used in logic languages like prolog (or C) of "returning" values through the function arguments. For example the following "function" fubar/2 (written in a logic style) which takes input in the argument X and "returns" data through the argument Y:

fubar(X, Y) :- foo(X, M, N), bar(M, O), baz(N, O, Y).

Each of the functions foo/3, bar/2 and baz/3 behave in a similar way but they have varying numbers of input and output arguments.

I will show one way how this can done. We will define a variables structure which contains the bindings of variables. It has the following interface:

new_vars() -> Vars.
new_var(Vars) -> {Var,Vars}.
bind_var(Var, Value, Vars) -> Vars.
get_var(Var, Vars) -> Value.

A requirement on the Vars structure is that all variable bindings are undone on backtracking. Note that we are not really defining logical variables here or implementing unification, we are just creating variables and binding them. There is no default value for a variable, trying to get the value of an unbound variable will generate an error.

We will adopt the convention that all continuation functions have one argument and they will be called with the current Vars structure. We will also adopt the convention that each normal function will have its arguments in the following order: first come normal Erlang arguments, then the Vars structure, then the arguments which use the Vars structure with the input arguments first followed by the output arguments, and finally the continuation. Using Vars and this convention we could translate fubar/2 into the following Erlang function:

fubar(Vars0, X, Y, Next) ->
%% Create the new variables in fubar.
{M,Vars1} = new_var(Vars0),
{N,Vars2} = new_var(Vars1),
{O,Vars3} = new_var(Vars2),
Next2 = fun (Vars) -> baz(Vars, N, O, Y, Next) end,
Next1 = fun (Vars) -> bar(Vars, M, O, Next2) end,
foo(Vars, X, M, N, Next1).

Here we have assumed that fubar is part of a larger application which has created the Vars structure. The continuation function is called with the current Vars. So for example the (trivial) function bar/2 which has M as input and binds O for output could be written:

bar(Vars0, M, O, Next) ->
Mval = get_var(M, Vars),
Oval = do_stuff(Mval),
Vars1 = bind_var(O, Oval, Vars),
Next(Vars1).

Choice points are now introduced by:

cp([fun (Vs) -> do_stuff_a(Vs, Next) end,
fun (Vs) -> do_stuff_b(Vs, Next) end,
fun (Vs) -> do_stuff_c(Vs, Next) end,
...], Vars).

The code for cp/2, is still simple:

cp(Vars, [G|Gs]) ->
case G(Vars) of
{succeed,Value} -> {succeed,Value};
fail -> cp(Vars, Gs)
end;
cp(_, []) -> fail.

We see here that if the Vars structure does not use side effects then we will automatically undo all bindings on backtrackíng.

I will give a simple definition of the Vars structure and its interface. It uses dicts to store the variable bindings.

-record(vars, {vc,dict}).

new_vars() -> #vars{vc=0,dict=dict:new()}.

new_var(#vars{vc=Vc,dict=Dict}=Vars) ->
{{var,Vc},Vars#vars{vc=Vc+1}}.

bind_var({var,Var}, Value, #vars{dict=Dict}=Vars) ->
Vars#vars{dict=dict:store(Var, Value, Dict)}.

get_var({var,Var}, #vars{dict=Dict}) ->
dict:fetch(Var, Dict).

That's it, we're done. Again a lot of explanation for quite a simple concept and very little code.

In the next installment I will give an example of how this can be used and some problems with the current implementation.