2.6. Parsing
In this section we describe monadic parsing. Recursion relies on progress
indicators. The type Progress is a builtin type and it can be used only in
monadic types which are some specific forms of functions.
Progress: Any
yes: Progress
no: Progress
(and): Progress -> Progress -> Progress
(or): Progress -> Progress -> Progress
2.6.1. String Parser
For a module we need one or more types, constructors of objects of the types, combinators which combine objects of the types and eliminators which use objects of the types to return objects which no longer contain the types.
-- module interface
module "String Parser"
-- Type
Parser: Any -> Progress -> Any
section {A B: Any} {i j}: Progress :=
-- Elementary constructors for the parser
return: A -> Parser A no
fail: Parser A no
char: (Char -> Bool) -> Parser Char yes
-- Combinators
(>>=): Parser A i -> (A -> Parser B j) -> Parser B (i or j)
(</>): Parser A i -> Parser A j -> Parser A (i and j)
-- Run a parser
run: Parser A i -> String -> (Maybe A, String)
This interface declares Parser to be a type constructor. One of the type
arguments is Progress. This is meaningful only if Parser A i is a
function type.
A string parser has a monadic bind operator >>=. This enables the use of
do expressions.
Next we define the module and start with the type
module "String Parser" :=
Parser (A: Any) (_: Progress): Any :=
String -> (Maybe A, String)
A parser is a function with a string argument (i.e. only one state) and the string appears in the result. This imposes the requirement that no function is allowed to increase the string and all progressing parser have to decrease the string in the success case.
From the declaration of the monadic bind the compiler knows that A is the
monadic argument. The argument type A in the result type has to within a
container with two constructors, one for the success case and one for the
failure case. In the success case the result has to contain an object of type
A, in the failure case it must not contain an object of type A. The
inductive type Maybe satisfies this requirement.
The remainder of the definition is understood to be within a section as in the module interface in order to avoid repetition of formal arguments.
First we define the elementary constructors which construct non progresssing parsers.
return (a: A): Parser A no :=
\ s := (just a, s)
fail: Parser A no :=
\ s := (empty, s)
Both functions make non progressing parsers, they don’t change the string.
Now the only constructor for a progressing parser.
char (d: Char -> Bool): Parser Char yes := case
\ (s := []) := (empty, s) -- string not decreased
\ (s := c :: s2) :=
if d c then (just c, s2) -- string decreased
else (empty, s) -- string not decreased
In the failure cases the string is kept the same and in the success case the string is decreased.
We have to define the monadic bind.
(>>=) (p: Parser A i) (f: A -> Parser B j): Parser B (i or j) :=
\ s :=
match p s case
\ (just a, s2) := f a s2
\ (empty, s2) := (empty, s2)
The function f can be called only in the success case (otherwise there is no
object of type A.
The parser p might make progress or not in the success
case. I.e. in the success case the function f is guaranteed to be called
with a string s2 which is the same as s or decreased, depending on the
progress indicator i. The parser f a makes progress depending on j.
The combined parser makes progress if p or f a make progress.
In the failure case the parser p must not change the string. The combined
parser fails as well and does not change the string either.
The remaining combinator is the biased choice.
</> (p: Parser A i) (q: Parser A j): Parser A (i and j) :=
\ s :=
match p s case
\ (just a, s2) := (just a, s2)
\ (empty, s2) := q s
If p makes progress in the success case, then the result parser makes
progress as well.
p might make progress in the failure case. Therefore the progress of the
combined parser depends of the progress of q.
The combined parser p </> q is guaranteed to make progress in the success
case only if both guarantee progress in the success case.
Running the parser.
run (p: Parser A i) (s: String): (Maybe A, String) :=
p s
This completes the definition of the string parser module. The next functions
are defined outside the module and therefore cannot use the knowledge that an
object of type Parser A i is a function. Outside the module an object of
type Parser A i can be applied to a string only by the function call
run p s.
Recursive functions:
many (p: Parser A yes): Parser (List A) no :=
do
a := p
lst := many
return (a :: lst)
</>
return []
many1 (p: Parser A yes): Parser (A, List A) yes :=
do
a := p
lst := many
return (a :: lst)
2.6.2. Incremental Parsers
All the following code is within a section
section {A B R: Any} {i j: Progress}
:=
...
A parser is an object which is either done with a number of consumed characters and an optional result, or it needs more input. In the second case it contains a function which accepts more input and returns a new incremental parser.
-- Parser
type Parser (A: Any) :=
done: Nat -> Maybe A -> Parser
more: (String -> Parser) -> Parser
put: String -> Parser A -> Parser A
:= case
\ _, (p: = done _ _ ) := p
\ s, more f := f s
A partial parser is a continuation monad with state where the state consists of the number of consumed characters and a lookahead string. The lookahead string is a prefix of the remainder of input which is not yet consumed.
A partial parser receives the state, parses a certain part of the lookahead and then calls the continuation with success or failure and the new state.
Mon (A R: Any) (_: Progress): Any
-- Continuation Monad
:= Nat -- consumed characters
-> String -- lookaheads to parse
-> (Maybe A -> Nat -> String -> Parser R) -- continuation
-> Parser R
(>>=) (m: Mon A R i) (f: A -> Mon B R j): Mon B R (i or j)
-- monadic bind
:=
\ n s k :=
m n s (case
\ just a, n, s := f a n s k
\ empty, n, s := k empty n s)
Recursion rule: Whenever there is a recursive call, then the continuation function is called in the success case with a structurally smaller argument and in failure cases the string argument is not increased.
-- Elementary monads
return (a: A): Mon A R no :=
\ n s k := k (just a) n s
fail: Mon A R no :=
\ n s k := k empty n s
char (d: Char -> Bool): Mon Char R yes
:= case
\ n, (s := []), k := more (\ s := char s k)
-- ^^^^ no recursion, partial call
\ m, (s := c :: s2), k :=
if d c then k (just c) (succ n) s2
else k empty n s
The monad char d is a function which makes progress. It calls the
continuation function with a shorter string in the success case and with the
original string in the failure case.
-- Biased choice
(</>) (p: Mon A R i) (q: Mon A R j): Mon A R (i and j) :=
\ n s k :=
p zero (case
\ empty, zero, s2 :=
-- no consumption, try 'q'
q (zero + n) s2 k
\ empty, succ n2, s2 :=
-- 'p' has consumed and failed
k empty (n2 + n)
\ just a, n2, s2 :=
-- 'p' has succeeded
k (just a) (n2 + n) s2
)
-- Make the parser from a partial parser
make (m: Cont R R i): Parser R :=
m 0 "" (\ res n _ := done res n)
2.6.3. Backtracking (DRAFT)
PROBLEM:
When backtracking is possible we have to buffer some consumed characters. The buffer has to be part of the state. When a parser which shall be made backtrackable fails and consumes characters, then part of the consumed characters have to be shifted back to the lookahead. This breaks the progress and termination proof!!
POSSIBLE SOLUTION:
Each stripping off on one constructor adds at most one constructor to another inductive type. A backtrackable parser has to initialize the other object to zero level. When backtracking, the other object can be eliminated in case of failure.
char (d: Char -> Bool): Mon Char R yes
:= case
\ pre, (la := c :: la2), k :=
if d c then
k (just c) (c :: pre) la2
-- prefix increased by one constructor
-- lookahead decreased by one constructor
else
k empty pre la
-- prefix and lookahead unchanged
\ pre, (la := ""), k :=
more (\ la := char pre la k)
-- Not a recursive call, just a closure.
-- Continuation paused until a new lookahead is
-- available. Prefix and continuation unchanged.
backtrack (p: Mon A R i): Mon A R no :=
\ pre la k :=
p "" la -- pre initialized to zero level
(case
\ empty, "", la2 :=
-- 'p' failed without consumption
k empty "" la2
\ empty, pre2, la2 :=
-- 'p' failed and consumed
-- consumption is in 'pre2'
k empty pre (revPrepend pre2 la2)
-- ^ ^ elimination of 'pre2'
-- | restoring the old value
\ just a, pre2, la2 :=
-- 'p' has succeeded and consumed 'pre2' since its start
-- The continuation has to be called with a consumption
-- of 'pre2 + pre'.
k (just a) (pre2 + pre) la2
)