Wednesday, 18 March 2009

Backtracking in Erlang, part 1 - control

Sometimes you need to be able to do search algorithms which can step back and try other alternate ways of finding a solution, "backtracking". Generally this is difficult to do in functional languages like Erlang (most other languages too for that matter). A classic example of this type of problem is parsing ambiguous grammars.

Prolog syntax is like that, in some cases you need almost indefinite look-ahead before you can decide how input is to be parsed. When I implemented Erlog (a prolog in and for Erlang) I decided to do the parser in Erlang so that it could run stand-alone and so I ran straight into this problem.

Here is *a* solution to the problem. It is in two parts, the first describing the control structures and the second (more difficult) describing handling of data. There may also be a third part discussing some problems with the solution to control, if anyone is still interested.

The main problem we have to overcome is that normally when you choose a path you "commit" to it and can't go back and try an alternate path. For example in:

case get_a_value() of
Choice1 -> ... ;
Choice2 -> ... ;
...
end

once you have chosen a path, say Choice1, there is no (simple) way to go back try Choice2 if you later decide that Choice1 was wrong. We can't backtrack.

Another problem is that if we have:

a() -> foo(), bar(), baz().

and once we have completed foo() then even if it gave an answer which later was shown to be wrong there is no way to step back into it, backtrack, and let it give a new answer.

In this first section we will (almost) completely ignore arguments and return values.

When looking at control we will (almost) completely ignore arguments and returning values. We will use the convention that functions return either '{succeed,Value}' when they succeed or 'fail' when they can't find a solution.

The trick is to ensure that functions only return at all when they fail, otherwise they just continue by explicitly calling the next thing which should be done. We will do this by using a Continue Passing Style (CPS) where each function will have an argument which is the next function call to continue. (CPS is not new and is generally known) So functions will usually look like this:

a(Next) ->
do_stuff(),
OurNext = fun () -> even_more_stuff(Next) end,
more_stuff(OurNext).

We will use the convention that when a function fails and cannot go on it will return 'fail'. When *the* solution has been found then the final test will return {succeed,Value}. So the top level function will look something like:

top_call() ->
Succeed = fun () -> {succeed,yes} end, %We have succeeded
find_solution(Succeed).

So now our program will keep going forwards until it finally calls the Succeed fun when it all returns back to the initial caller.

The next problem is to introduce 'choice points' where we can choose one path but still allow the code to backtrack and make another choice. This is the done in the function cp/1 where the argument is a list functions which are separate paths to choose from:

cp([fun () -> do_stuff_a(Next) end,
fun () -> do_stuff_b(Next) end,
fun () -> do_stuff_c(Next) end,
...]).

Cp/1 will try the functions in order, if one succeeds then the choice point will succeed and return the same succeed value, if it fails then the next function in the list will be tried, until no more exist when the choice point will fail: The code for cp is simple:

cp([G|Gs]) ->
case G() of
{succeed,Value} -> {succeed,Value};
fail -> cp(Gs)
end;
cp([]) -> fail.

That's it, we're done. A lot of explanation for very little code.

A slight variation is to not chain the continuation together as shown here but to have Next as a list of functions to call. Instead of directly calling Next you call next/1 with argument Next, next(Next), where next/1 is defined as:

next([N|Ns]) -> N(Ns).

There is no functional difference it just make writing some code a bit easier. For example the function defined above:

a() -> foo(), bar(), baz().

would look like this in the normal way:

a(Next) ->
Next1 = fun () -> baz(Next) end,
Next2 = fun () -> bar(Next1) end,
foo(Next2).

while using the list variant it would look like:

a(Next) ->
foo([fun (Ns) -> bar(Ns) end,
fun (Ns) -> baz(Ns) end | Next]).

which can be easier to read, especially if there many calls in the continuation.

After I introduce handling data in the next installment I can give a proper example.

8 comments:

  1. I like this, it's very similar to parser combinators with [memoizing] PEG/Packrat parsers, which can do infinite lookahead and handle ambiguous left-recursive grammars, but with a large processing [memory] overhead.

    I hope Part II Data Handling discusses passing back failure details to provide useful error messages. One interesting problem is coming up with an inverse merge for the error details (co-combinator ?) to pick the 'best' diagnostic - usually the deepest (rightmost) error in the input stream, but could be the most 'likely' error, or even a set of possible errors.

    Mik

    ReplyDelete
  2. References

    1. Parser combinators in Erlang:

    eParSec

    2. Papers on PEG/Packrat parsing, including memoizing and handling left-recursive grammars:

    Brian Ford's PEG/Packrat page

    Mik

    ReplyDelete
  3. The idea is similar but my suggestion can be used for anything, not just parsers. I just used parsing as an example. Returning failure details is simple but picking the right one can be difficult. I will try to give some examples of doing it but it may have to wait until part 3, it depends on how long part 2 gets.

    ReplyDelete
  4. Here's a simple example of where you could use continued passing techniques. I use it to handle query strings on a web page...

    %%checking for querystrings
    out(Arg)->
      continue(Arg).

    continue(Arg) ->
      Check = "days",
      case queryvar(Arg, Check) of
        {ok,DayString}->
          case string:to_integer(DayString) of
            {DaysInt,_} ->
               continue(Arg,Days);
            _ ->
               error(Check)
            end;
          _ ->
          error(Check)
    end.

    continue(Arg,Days)->
        Check = "hours",
        case queryvar(Arg, Check) of
          {ok,HoursString}->
            case string:to_integer(HoursString) of
               {HoursInt,_} when DaysInt =:= 24 ->
                 continue(Arg,Days + 1,0);
              {HoursInt,_} ->
            continue(Arg,Days,HoursInt);
          _ ->
            error(Check)
        end;
        _ ->
          error(Check)
      end.

    continue(Arg,Days,Hours) ->
      and_so_on.

    .
    .
    .

    Even REST calls that require certain headers/data passed in, if not they take the default parameters.

    out(Arg)->
      handle(Arg).

    handle(Arg)->
      %% use format, else take default as xml
      case yaws_api:queryvar(Arg,"format") of
        {ok, "xml"} ->
          continue({Arg,xml});
        {ok, "json"} ->
          continue({Arg,json});
        _ ->
          continue({Arg,?DEFAULT_FORMAT})
      end.

    But while ^ might not be rocket science, I can see your wisdom in having a list of attempt functions that are called when the previous attempt fails. eg: I can envison using this in another common scenario.

    attempts_for_trying_to_manipulate_data(Arg)->
      [
        does_the_request_come_with_a_logged_in_cookie(Arg),
        well_does_the_request_come_with_secure_REST_call(Arg),
        for_heavens_sake_check_if_this_guy_is_an_admin(Arg),
        i_give_up(failed)
      ].

    The funny thing is that if your used to keeping your functions nice and small (mine are typically <10 lines ) this sort of catching the failure sort of comes naturally, but that said - again, having a list of attempt functions will help in making code that fails gracefully, until it infact succeeds.

    Keep Clicking,
    ~B
    *hints at blogger to get code friendly commenting!*

    ReplyDelete
  5. @Blosky: How did you manage to indent code? Haven't found a way to do that yet.

    Part two on data almost done. Then all I need to do is an example. This searching type of coding can be quite fun, the use of generators interacting with testers.

    ReplyDelete
  6. ya was tricky, prefixed everyline with &nbsp;&nbsp; and doubled it for every depth of indent

    ; )

    ReplyDelete
  7. Thanks. I found this blog helpful to build a checker (here) that uses abnf types to parse and validate binaries.

    ReplyDelete
  8. Those are the possible values which would allow students out of all those probabilities and meaning which one must needed to occupy for the better success and meaning.

    ReplyDelete