3.12. Pattern Match
3.12.1. Syntax
The general form of a pattern match expression:
case
{ ∀ (x₁: A₁) (x₂: A₂) ... : R } -- optional type
λ (p₁₁: A₁₁) (p₁₂: A₁₂) ... : R₁ := e₁ -- pattern clauses
λ (p₂₁: A₂₁) (p₂₂: A₂₂) ... : R₂ := e₂
...
Type annotations for the pattern and the results in the pattern match expression
are optional. Note that R can be a function type of the form ∀ (y: B):
C.
Pattern can be explicit (p or (p: A)) or implicit ({p} or {p:
A}.
Syntax for pattern:
p ::= identifier/operator
| _
| constant -- number, character, string
| p p
I.e. a pattern is a head term (identifier, operator, _, constant) followed by
zero or more argument pattern with the restriction that _ and constants
cannot have arguments. Operators can be used in prefix or infix form. In the
abstract syntax we use only the prefix form.
Example:
case
{ List (ℕ, String) → ℕ }
λ (::) ((,) (succ i) "hello") ((::) ((,) x _) _ -- prefix form
:=
i + x
λ (succ i, "hello") :: (x,_) :: _ -- infix form
:=
i + x
3.12.2. Type
The type of a pattern match expression is a function type which has the general form
∀ (x₁: A₁) (x₂: A₂) ... : R
-- example
∀ {n m}: succ n ≤ succ m → n ≤ m
-- in long form
∀ {n m: ℕ} (_: succ n ≤ succ m): n ≤ m
Note that type annotations can be ommitted as long as the compiler can infer
them and A → B is a shorthand for ∀ (_: A): B. Braces are used to mark
implicit arguments.
Variables which occur in types are inferrable variables and the corresponding
types are dependent types. In the example n and m are inferrable
variables and succ n ≤ succ m and n ≤ m are dependent types.
In a type of a pattern match expression, all implicit variables must be inferrable variables. The reverse is not true in general.
3.12.3. Rules
- Distinct pattern variables:
All variables used in the explicit pattern of the same pattern clause have to be distinct.
Variables in inferable pattern of the same pattern clause need not be distinct.
- Number of pattern:
The number of arguments in the type and the number of matched patterns in each line must be the same.
However if there are implicit arguments in the type, the corresponing pattern in the pattern match lines can be ommitted because the compiler can infer them.
The compiler adds wildcard arguments
{_}for the missing implicit arguments in the pattern clauses.- Implicit arguments in braces:
The pattern corresponding to implicit arguments in the type of the pattern match expression have to be put in braces.
- Type completeness:
All variables occuring in the types
A₁, A₂, ..., R(i.e. all inferable variables) of the type must occur as variables in the type. E.g. the typen ≤ mis not a legal type of a pattern match expression.∀ {n m}: n ≤ mis a legal type.- Welltyped:
The types in the
ith pattern clause (Ai₁ Ai₂ ... Ri) must be unifiable with the corresponding types (A₁ A₂ ... R) of the type of the pattern match expression where all inferable variables are considered as unification variables.This consistency requirement excludes pattern clauses with some pattern combinations where the types of the pattern clause are not unifiable with the corresponding types in the type of the pattern match. In the extreme case there are no allowed pattern clauses and the pattern match is empty.
For the details of type checking see section Welltyped.
- Exhaustive:
For all possible arguments which do not contain variables at least one of the pattern clauses must match. The check for being exhaustive can be done by transforming a pattern match expression into its canonical form (see section Canonical Forms). For the details to check exhaustiveness see section Exhaustiveness.
- Reachable:
All clauses must be reachable. I.e. for each clause there is at least one set of arguments which matches the clause and fails to match all previous clauses.
3.12.4. Welltyped
The general form of a pattern match expression is
case
{ ∀ (x: A) (y: B) ... : R}
λ p q ... := e
...
where p and q are pattern. For each argument in the type (x: A)
there is a corresponding pattern p.
In order to typecheck a clause we typecheck from left to right all arguments and
finally the result type. We consider all variables x, y, … in the type
as substitutable. Each typecheck step for one argument replaces the
coresponding variable in the type by an expression from the pattern.
At the start of the checking we have all variables in the type unassigned. In
the ith step all variables before the ith variable of the type are
assigned. We look at the ith argument and the corresponding pattern.
-- explicit argument implicit argument
∀ (x: A): R ∀ {x: A}: R
λ p := ... λ {p}
Note that the variables in the type before x can occur in the type A and
they have already been replaced by their corresponding expressions. R
represents the remaining type where all substitutions have been done as well.
If an implicit argument in the type does not have a corresponding pattern, we
add the wildcard pattern _.
In order to elaborate the pattern we have to distinguish various cases:
Head term is a constant: For a number
Ahas to be a numerical type. For a characterAhas to beChar. For a stringAhas to beString. Constants cannot have argument pattern.Head term is a wildcard
_: In that case no argument pattern are allowed. We introduce a metavariable?m: ∀ (v0: V0) (v1: V1) ... : Awherev0,v1, … are all pattern variables introduced up to now and the patternpis elaborated as?m v0 v1 .... The elaborated pattern has the typeAby construction.Head term is an identifier which is an already introduced pattern variable: This is allowed only if the type of the pattern variable is unifiable with
Aand the identifier represents an implicit argument. The implicitness is necessary in order to have some unification which will verify the sameness of the pattern in both positions.Head term is an identifier which is not a constructor of the type
A: In that case no argument pattern are allowed. We introduce a new pattern variablevof typeA. The elaborated pattern has the typeAby construction.Head term is an identifier which is a constructor of the inductive type
A. The implicit arguments which represent the parameters of the inductive type have to match exactly the parameters of the inductive typeA.Then we construct recursively each argument pattern of the constructor arguments.
Finally we unify the actual type of the expression
id p0 p1 ...with the required typeAand in case of success replace the variablexin the type by the elaborated expressionid p0 p1 ....
After the successful elaboration of all pattern there might remain some
unassigned metavariables. Unassigned metavariables ?m occur only in the form
?m v0 v1 ... where v0, v1 are pattern variables which existed at the
point of the introduction of the metavariable. For each pattern ?m v0 v1 ...
we introduce a new pattern variable of the correponding type and replace the
pattern by the pattern variable.
As a last step the expression e has to be elaborated with the required type
R. Note that at that point of the elaboration all variables in the type of
the whole pattern match expression have already been replaced by expressions
depending only on pattern variables.
3.12.5. Canonical Forms
The transformation into canonical form works by case splitting on variable pattern, reordering of the pattern clauses and dropping of non reachable clauses.
3.12.5.1. Focus of Subsequent Clauses
We consider two pattern as equivalent if the have the same structure and only have different variables at the same position. Furthermore inferable pattern are always considered as equivalent.
The pattern in focus of two subsequent clauses is the first pattern on which both clauses are different. If there is no focal pattern, then the second one is unreachable.
-- Example 1:
\ p1 p2 ... zero ... := ...
\ p1 p2 ... (succ n) ... := ...
-- ^ focal point
-- Example 2:
\ p1 p2 ... (succ n) ... := ...
\ p1 p2 ... (succ (succ m)) ... := ...
-- ^ focal point
The focal point of two pattern is the first subpattern when scanned from left to right where they are different. The difference can be because of two different constructors at the focal point or a constructor and a variable at the focal point.
Reorder:
If both clauses have different constructors at the focal point, then the order does not matter (because at most one of them can succeed).
If one clause has a constructor and the other a variable at the focal point, then the order is significant. Both can succeed on the constructor but produce different results.
If both have a variable at the focal point, then the order is significant. Both can succeed on all constructors but produce different results.
Specialization of variables:
Assume
c(x)is a clause which has the variablexat a certain pattern position. We can always convert this clause into the two semantically equivalent clauses:c(x:=p) -- replace 'x' by some constructor pattern c(x)We can use such a splitting to generate missing clauses. Let’s assume we have two clauses
c1andc2:c1(p1, p2, ... , pn, qi) c2(x1, x2, ... , xn, y)such that:
c1(p1, p2, ... , pn, qi) c2(x1:=p1, x2:=p2, ... , xn:=pn, y) -- ^ focal pointhave a focal point at
y / qi. In that case the following three clauses have the same semantics as the original two clauses.:c1(p1, p2, ... , pn, qi) c2(x1:=p1, x2:=p2, ... , xn:=pn, y:=qj) c2(x1, x2, ... , xn, y)which can be reordered without changing the semantics:
c2(x1:=p1, x2:=p2, ... , xn:=pn, y:=qj) c1(p1, p2, ... , pn, qi) c2(x1, x2, ... , xn, y)If the pattern
qjhad been missing in the set of clauses beforec1, then the first of the three can bubble up and provide the missing clause.
3.12.5.2. Reorder Clauses
We reorder clauses in order to transform them into the lexicographic order. The order is induced by the order in which the constructors are introduced in the corresponding inductive type.
We swap the order of two subsequent clauses if there is a focal point where both have different constructors and the constructor of the second clause comes before the constructor in the first clause in the corresponding inductive type.
Examples of out of order clauses:
λ p₁ p₂ ... (succ (succ n)) ... := ...
λ p₁ p₂ ... (zero ) ... := ...
-- ^ focal point with out of order constructors
λ p₁ p₂ ... (succ (succ n)) ... := ...
λ p₁ p₂ ... (succ zero ) ... := ...
-- ^ focal point with out of order constructors
The swapping of the clauses does not change the semantics of the pattern match expression.
3.12.5.3. Split a Variable Pattern
Case splitting of a variable occurs if we have two subsequent clauses with a focal point where the first one has a constructor at the focal point and the other has a variable at the focal point.
Examples of overlapping clauses:
λ p₁ p₂ ... (succ (succ n)) ... := ...
λ p₁ p₂ ... m ... := ...
-- ^ focal point with overlap
λ p₁ p₂ ... (succ m ) ... := ...
λ p₁ p₂ ... (succ (succ n)) ... := ...
-- ^ focal point with overlap
We do a case split on the variable. The case splitting does not change the semantics of the pattern match expression.
Example 1:
λ p₁ p₂ ... (succ (succ n)) ... := ...
λ p₁ p₂ ... m ... := ...
-- ^ focal point with overlap
-- case split 'm'
λ p₁ p₂ ... (succ (succ n)) ... := ...
λ p₁ p₂ ... (zero ) ... := ...
λ p₁ p₂ ... (succ m ) ... := ...
Example 2:
λ p₁ p₂ ... (succ m ) ... := ...
λ p₁ p₂ ... (succ (succ n)) ... := ...
-- ^ focal point with overlap
-- case split 'm'
λ p₁ p₂ ... (succ zero ) ... := ...
λ p₁ p₂ ... (succ (succ n)) ... := ...
λ p₁ p₂ ... (succ (succ n)) ... := ...
Example 3:
λ p₁ p₂ ... zero ... := ...
λ p₁ p₂ ... m ... := ...
-- ^ focal point with overlap
-- case split 'm'
λ p₁ p₂ ... zero ... := ...
λ p₁ p₂ ... zero ... := ...
λ p₁ p₂ ... (succ m) ... := ...
3.12.5.4. Transform into Canonical Form
- Definition of canonical form:
A pattern match expression is in canonical form if there are no two subsequent clauses with a focal pattern where the pattern are out of order or overlapping.
- Transformation into canonical form:
Search for a focal pattern in two subsequent clauses and do a reordering or a case splitting until no more focal pattern which are out of order or overlapping can be found in subsequent clauses.
It remains to be shown that the algorithm terminates.
The pattern match expression has an initial maximal constructor nesting \(m\). This maximal constructor nesting \(m\) remains constant during the algorithm
- Proof:
A reordering does not change the maximal constructor nesting.
A variable case split does not change the maximal constructor nesting. During a variable case split, the splitted clauses have a new constructor at the place of the variable. At that place the other clause had already a constructor. Therefore the maximal constructor nesting does not change.
Now we create a sequence of numbers \(n_0 n_1 n_2 \ldots n_m i\) for each step. \(n_k\) is the number of variables which are nested below \(k\) constructors and \(i\) is the number of out of order clauses in the pattern match expression. Clearly there cannot be any variable nested below more than \(m\) constructors, because \(m\) is the maximal constructor nesting during the algorithm.
We consider a lexicographic order on the sequence \(n_0 n_1 n_2 \ldots n_m i\) and claim that this sequence decreases lexicographically at each step of the algorithm.
- Proof:
Reordering does not change \(n_0 n_1 \ldots n_m\), it only decreases \(i\).
Variable case splitting decreases the sequence lexicographically. The case splitted variable occurs at a certain nesting depth \(k\). After the split the number \(n_k\) has decreased by one. The numbers \(n_{k+1} \ldots n_m i\) might increase. But the number \(n_k\) has higher significance in the lexicographic order.
3.12.6. Reachability
Reachability can be checked by transforming a pattern match expression into its canonical form. Clauses which are unreachable follow immediately the clause which shadows the unreachable clauses. The unreachable clauses have to be eliminated.
Each clause in the canonical form stems exactly from one original clause. If all clauses stemming from the same original clause are unreachable, then the original clause is unreachable which has to be flagged as an error.
3.12.7. Exhaustiveness
Exhaustiveness can be easily checked in the canonical form where all nonreachable clauses have been removed.
In the canonical form the sequence of clauses are nicely grouped. The pattern vary from left to right from low frequency to the highest frequence. Therefore missing variations can be easily spotted.
We can ignore all missing variations in inferable pattern. We concentrate only on the non inferable pattern. If a clause is missing and it is unifiable with the type, then the pattern match is not exhaustive. If all missing clauses are not unifiable, then the pattern match is exhaustive even if not all combinations are present.
We demonstrate the check on the following inductive types:
class (=) {α: Any} (x: α): α → Prop :=
identical: (=) x
class (≤): Endorelation ℕ :=
start {n} : zero ≤ n
next {n m} : n ≤ m → succ n ≤ succ m
class Vector (α: Any): ℕ → Any :=
[] : Vector zero
(::) : ∀ {n}: α → Vector n → Vector (succ n)
We look at the follwing pattern match expressions in canonical form
Example 1:
case
{ ∀ {n: ℕ}: zero = succ n → False }
-- no clauses
Since there are no clauses, the expression is certainly in canonical form. The missing clause has the form:
λ {i} identical := ...
Typechecking of the first argument leads to the substitution n := i.
Therefore the type of the second argument is zero = succ i. However the
typing judgement
identical: zero = succ i
is invalid because the only valid typing judgement is
identical: zero = zero
Therefore the clause is not really missing. It is not typable.
The same reasoning applies if we flip the arguments:
case
{∀ {n: ℕ}: succ n = zero → False }
-- no clauses
-- missing clause
λ {i} identical
-- type check the first argument
n := i
-- required typing judgement for the second argument
identical: succ i = zero
-- actual typing judgement for the second argument
identical: succ i = succ i
Example 2:
case
{ ∀ {n m: ℕ}: succ n ≤ succ m → n ≤ m }
λ {i j} (next {i j} le) := le
The obviously missing clause has the form:
λ {i j} (start {k}) :=
Typechecking of the first two arguments leads to the substitutions n := i
and m := j and the required type succ i ≤ succ j for the third argument
i.e. the required typing judgement:
start {k}: succ i ≤ succ j
However the only valid typing judgment is:
start {k}: zero ≤ k
There is no possible substitution for the variables i, j and k which
unifies the types succ i ≤ succ j and zero ≤ k.
Therefore the obviously missing clause is not really missing.
Example 3:
case
{ ∀ {n: ℕ}:
Vector ℕ n → Vector ℕ n → Vector ℕ n
}
λ {zero} [] [] := ...
λ {i} ((::) {j} x xs) ((::) {k} y ys) := ...
The obviously missing clauses are the mixed cases:
λ {i} [] ((::) {k} y ys) := ...
λ {i} ((::) {j} x xs) [] := ...
Let’s look at the unification problems generated by the first seemingly missing case. Type checking of the first argument leads to:
n := i -- substitution
Vector ℕ i -- required type for the second argument
Typechecking of the second argument leads to:
[]: Vector ℕ i -- required typing judgement
[]: Vector ℕ zero -- actual typing judgement
i := zero -- substitution
Vector ℕ zero -- required type for the third argument
However the required typing judgement for the third argument:
(::) {k} y ys: Vector ℕ zero
is not satisfiable because the constructor :: constructs an object of type
Vector ℕ (succ k).
A similar reasoning applies to the second missing case:
-- type check the first argument
n := i -- substitution
Vector ℕ i -- required type for the second argument
-- type check the second argument
(::) {j} x xs: Vector ℕ i -- required typing judgement
(::) {j} x xs: Vector ℕ (succ j) -- actual typing judgement
i := succ j -- substitution
Vector ℕ (succ j) -- required type for the third argument
-- type check the third argument
[]: Vector ℕ (succ j) -- required typing judgement
[]: Vector ℕ zero -- actual typing judgment
-- unsatisfiable !!
3.12.8. Pattern Match Compiler
A pattern expression is a function with one or more arguments. In order to execute an pattern match expression in the runtime efficiently, the pattern match expression has to be compiled into a branching with e.g. a switch case.
Each branching step can decide only on one object by looking at the tag.
-- In the source code
(<=?): Natural -> Natural -> Bool := case
(succ n) (succ m) := n <=? m
zero _ := true
_ _ := false
-- More efficient
(<=?): Natural -> Natural -> Bool := case
zero :=
\ _ := true
(succ n) := case
zero :=
false
(succ m) :=
n <=? m
Compilation to javascript:
function le (a, b) {
switch (a[0]) {
case 'zero':
return true
case 'succ':
switch (b[0]){
case 'zero':
return false
case 'succ':
return le (a[1], b[1])
}
}
}
In order to get the more efficient form we can transform the original cases into its canonical form.
-- original
(succ n) (succ m) := n <=? m
zero _ := true
_ _ := false
-- swap
zero _ := true
(succ n) (succ m) := n <=? m
_ _ := false
-- split
zero _ := true
(succ n) (succ m) := n <=? m
zero _ := false
(succ _) _ := false
-- swap
zero _ := true
zero _ := false
(succ n) (succ m) := n <=? m
(succ _) _ := false
-- remove shadowed
zero _ := true
(succ n) (succ m) := n <=? m
(succ _) _ := false
-- split
zero _ := true
(succ n) (succ m) := n <=? m
(succ _) zero := false
(succ _) (succ _) := false
-- swap
zero _ := true
(succ _) zero := false
(succ n) (succ m) := n <=? m
(succ _) (succ _) := false
-- remove shadowed
zero _ := true
(succ _) zero := false
(succ n) (succ m) := n <=? m
This pattern match can be directly compiled into the nested branching where all cases are covered. I.e. the pattern match is exhaustive. Each of the original clauses occurs at least once, therefore there are no non-reachable clauses.
nodups: List Nat -> List Nat := case
(x :: y :: tl) :=
if x =? y then res else y :: res where
res := nodups (y :: tl)
xs :=
xs
nodups: List Nat -> List Nat := case
[] :=
[]
(x :: xs) :=
inspect xs case
[] :=
[x]
(y :: tl) :=
if x =? y then res else y :: res
where res := nodups (y :: tl)
Transformation into canonical form:
-- original
(x :: y :: tl) := exp
xs := xs
-- split
(x :: y :: tl) := exp
[] := []
(x :: xs) := x :: xs
-- swap
[] := []
(x :: y :: tl) := exp
(x :: xs) := x :: xs
-- split
[] := []
(x :: y :: tl) := exp
(x :: []) := x :: []
(x :: y :: tl) := x :: y :: xs
-- swap
[] := []
(x :: []) := x :: []
(x :: y :: tl) := exp
(x :: y :: tl) := x :: y :: xs
-- remove shadowed
[] := []
(x :: []) := x :: []
(x :: y :: tl) := exp
map2 {A B C: Any} (f: A -> B -> C): List A -> List B -> List C
:= case
(x :: xs) (y :: ys) :=
f x y :: map2 xs xs
_ _ :=
[]
map2 {A B C: Any} (f: A -> B -> C): List A -> List B -> List C
:= case
[] _ :=
[]
(x :: xs) :=
case
[] :=
[]
(y :: ys) :=
f x y :: map2 xs ys
Transformation into canonical form
-- original
(x :: xs) (y :: ys) := exp
_ _ := []
-- split
(x :: xs) (y :: ys) := exp
[] _ := []
(x :: xs) _ := []
-- swap
[] _ := []
(x :: xs) (y :: ys) := exp
(x :: xs) _ := []
-- split
[] _ := []
(x :: xs) (y :: ys) := exp
(x :: xs) [] := []
(x :: xs) (y :: ys) := []
-- swap
[] _ := []
(x :: xs) [] := []
(x :: xs) (y :: ys) := exp
(x :: xs) (y :: ys) := []
-- remove shadowed
[] _ := []
(x :: xs) [] := []
(x :: xs) (y :: ys) := exp
3.12.9. Modified Pattern Compiler
Let’s revisit the example:
(<=?): Natural -> Natural -> Bool := case
(succ n) (succ m) := n <=? m
zero _ := true
_ _ := false
The first and the second case are inverted. Since the constructors are different, we can swap them:
zero _ := true
(succ n) (succ m) := n <=? m
_ _ := false
Now for the first argument sufficient cases are available. Looking only at the
first argument, the last case is redundant. However the second argument still
needs a case for zero. I.e. we have to factor out the corresponding case:
zero _ := true
(succ n) (succ m) := n <=? m
_ zero := false
_ _ := false
We cannot yet swap. Before we have to put in the third line the same constructor as in the previous line because we don’t have catch all cases (i.e. variable pattern) not as the last case.:
-- split
zero _ := true
(succ n) (succ m) := n <=? m
(succ n) zero := false
_ _ := false
-- and then swap
zero _ := true
(succ n) zero := false
(succ n) (succ m) := n <=? m
_ _ := false
Now the last case is unreachable and can be deleted:
zero _ := true
(succ n) zero := false
(succ n) (succ m) := n <=? m