3.12. Pattern Match

3.12.1. Syntax

The general form of a pattern match expression:

case
    { ∀ (x: A₁) (x: A₂) ... : R }             -- optional type
    λ (p₁₁: A₁₁) (p₁₂: A₁₂) ... : R:= e-- pattern clauses
    λ (p₂₁: A₂₁) (p₂₂: A₂₂) ... : R:= e₂
    ...

Type annotations for the pattern and the results in the pattern match expression are optional. Note that R can be a function type of the form (y: B): C.

Pattern can be explicit (p or (p: A)) or implicit ({p} or {p: A}.

Syntax for pattern:

p   ::=     identifier/operator
    |       _
    |       constant            -- number, character, string
    |       p p

I.e. a pattern is a head term (identifier, operator, _, constant) followed by zero or more argument pattern with the restriction that _ and constants cannot have arguments. Operators can be used in prefix or infix form. In the abstract syntax we use only the prefix form.

Example:

case
    { List (ℕ, String) → ℕ }

    λ (::) ((,) (succ i) "hello") ((::) ((,) x _) _     -- prefix form
    :=
        i + x

    λ (succ i, "hello") :: (x,_) :: _                   -- infix form
    :=
        i + x

3.12.2. Type

The type of a pattern match expression is a function type which has the general form

∀ (x: A₁) (x: A₂) ... : R

-- example

∀ {n m}: succ nsucc mnm

-- in long form

∀ {n m: ℕ} (_: succ nsucc m): nm

Note that type annotations can be ommitted as long as the compiler can infer them and A B is a shorthand for (_: A): B. Braces are used to mark implicit arguments.

Variables which occur in types are inferrable variables and the corresponding types are dependent types. In the example n and m are inferrable variables and succ n succ m and n m are dependent types.

In a type of a pattern match expression, all implicit variables must be inferrable variables. The reverse is not true in general.

3.12.3. Rules

Distinct pattern variables:

All variables used in the explicit pattern of the same pattern clause have to be distinct.

Variables in inferable pattern of the same pattern clause need not be distinct.

Number of pattern:

The number of arguments in the type and the number of matched patterns in each line must be the same.

However if there are implicit arguments in the type, the corresponing pattern in the pattern match lines can be ommitted because the compiler can infer them.

The compiler adds wildcard arguments {_} for the missing implicit arguments in the pattern clauses.

Implicit arguments in braces:

The pattern corresponding to implicit arguments in the type of the pattern match expression have to be put in braces.

Type completeness:

All variables occuring in the types A₁, A₂, ..., R (i.e. all inferable variables) of the type must occur as variables in the type. E.g. the type n m is not a legal type of a pattern match expression. {n m}: n m is a legal type.

Welltyped:

The types in the ith pattern clause (Ai₁ Ai₂ ... Ri) must be unifiable with the corresponding types (A₁ A₂ ... R) of the type of the pattern match expression where all inferable variables are considered as unification variables.

This consistency requirement excludes pattern clauses with some pattern combinations where the types of the pattern clause are not unifiable with the corresponding types in the type of the pattern match. In the extreme case there are no allowed pattern clauses and the pattern match is empty.

For the details of type checking see section Welltyped.

Exhaustive:

For all possible arguments which do not contain variables at least one of the pattern clauses must match. The check for being exhaustive can be done by transforming a pattern match expression into its canonical form (see section Canonical Forms). For the details to check exhaustiveness see section Exhaustiveness.

Reachable:

All clauses must be reachable. I.e. for each clause there is at least one set of arguments which matches the clause and fails to match all previous clauses.

3.12.4. Welltyped

The general form of a pattern match expression is

case
    { ∀ (x: A) (y: B) ... : R}
    λ p q ... := e
    ...

where p and q are pattern. For each argument in the type (x: A) there is a corresponding pattern p.

In order to typecheck a clause we typecheck from left to right all arguments and finally the result type. We consider all variables x, y, … in the type as substitutable. Each typecheck step for one argument replaces the coresponding variable in the type by an expression from the pattern.

At the start of the checking we have all variables in the type unassigned. In the ith step all variables before the ith variable of the type are assigned. We look at the ith argument and the corresponding pattern.

-- explicit argument                    implicit argument
∀ (x: A): R                             ∀ {x: A}: R
λ p := ...                              λ {p}

Note that the variables in the type before x can occur in the type A and they have already been replaced by their corresponding expressions. R represents the remaining type where all substitutions have been done as well.

If an implicit argument in the type does not have a corresponding pattern, we add the wildcard pattern _.

In order to elaborate the pattern we have to distinguish various cases:

  • Head term is a constant: For a number A has to be a numerical type. For a character A has to be Char. For a string A has to be String. Constants cannot have argument pattern.

  • Head term is a wildcard _: In that case no argument pattern are allowed. We introduce a metavariable ?m: (v0: V0) (v1: V1) ... : A where v0, v1, … are all pattern variables introduced up to now and the pattern p is elaborated as ?m v0 v1 .... The elaborated pattern has the type A by construction.

  • Head term is an identifier which is an already introduced pattern variable: This is allowed only if the type of the pattern variable is unifiable with A and the identifier represents an implicit argument. The implicitness is necessary in order to have some unification which will verify the sameness of the pattern in both positions.

  • Head term is an identifier which is not a constructor of the type A: In that case no argument pattern are allowed. We introduce a new pattern variable v of type A. The elaborated pattern has the type A by construction.

  • Head term is an identifier which is a constructor of the inductive type A. The implicit arguments which represent the parameters of the inductive type have to match exactly the parameters of the inductive type A.

    Then we construct recursively each argument pattern of the constructor arguments.

    Finally we unify the actual type of the expression id p0 p1 ... with the required type A and in case of success replace the variable x in the type by the elaborated expression id p0 p1 ....

After the successful elaboration of all pattern there might remain some unassigned metavariables. Unassigned metavariables ?m occur only in the form ?m v0 v1 ... where v0, v1 are pattern variables which existed at the point of the introduction of the metavariable. For each pattern ?m v0 v1 ... we introduce a new pattern variable of the correponding type and replace the pattern by the pattern variable.

As a last step the expression e has to be elaborated with the required type R. Note that at that point of the elaboration all variables in the type of the whole pattern match expression have already been replaced by expressions depending only on pattern variables.

3.12.5. Canonical Forms

The transformation into canonical form works by case splitting on variable pattern, reordering of the pattern clauses and dropping of non reachable clauses.

3.12.5.1. Focus of Subsequent Clauses

We consider two pattern as equivalent if the have the same structure and only have different variables at the same position. Furthermore inferable pattern are always considered as equivalent.

The pattern in focus of two subsequent clauses is the first pattern on which both clauses are different. If there is no focal pattern, then the second one is unreachable.

-- Example 1:
\ p1 p2 ... zero     ...        := ...
\ p1 p2 ... (succ n) ...        := ...
--          ^ focal point

-- Example 2:
\ p1 p2 ... (succ n)        ...        := ...
\ p1 p2 ... (succ (succ m)) ...        := ...
--                ^ focal point

The focal point of two pattern is the first subpattern when scanned from left to right where they are different. The difference can be because of two different constructors at the focal point or a constructor and a variable at the focal point.

Reorder:

  • If both clauses have different constructors at the focal point, then the order does not matter (because at most one of them can succeed).

  • If one clause has a constructor and the other a variable at the focal point, then the order is significant. Both can succeed on the constructor but produce different results.

  • If both have a variable at the focal point, then the order is significant. Both can succeed on all constructors but produce different results.

Specialization of variables:

Assume c(x) is a clause which has the variable x at a certain pattern position. We can always convert this clause into the two semantically equivalent clauses:

c(x:=p)         -- replace 'x' by some constructor pattern
c(x)

We can use such a splitting to generate missing clauses. Let’s assume we have two clauses c1 and c2:

c1(p1, p2, ... , pn, qi)
c2(x1, x2, ... , xn, y)

such that:

c1(p1,     p2,     ... , pn,     qi)
c2(x1:=p1, x2:=p2, ... , xn:=pn, y)
--                               ^ focal point

have a focal point at y / qi. In that case the following three clauses have the same semantics as the original two clauses.:

c1(p1,     p2,     ... , pn,     qi)
c2(x1:=p1, x2:=p2, ... , xn:=pn, y:=qj)
c2(x1,     x2,     ... , xn,     y)

which can be reordered without changing the semantics:

c2(x1:=p1, x2:=p2, ... , xn:=pn, y:=qj)
c1(p1,     p2,     ... , pn,     qi)
c2(x1,     x2,     ... , xn,     y)

If the pattern qj had been missing in the set of clauses before c1, then the first of the three can bubble up and provide the missing clause.

3.12.5.2. Reorder Clauses

We reorder clauses in order to transform them into the lexicographic order. The order is induced by the order in which the constructors are introduced in the corresponding inductive type.

We swap the order of two subsequent clauses if there is a focal point where both have different constructors and the constructor of the second clause comes before the constructor in the first clause in the corresponding inductive type.

Examples of out of order clauses:

λ pp₂ ... (succ (succ n)) ...     := ...
λ pp₂ ... (zero         ) ...     := ...
--           ^ focal point with out of order constructors

λ pp₂ ... (succ (succ n)) ...     := ...
λ pp₂ ... (succ zero    ) ...     := ...
--                ^ focal point with out of order constructors

The swapping of the clauses does not change the semantics of the pattern match expression.

3.12.5.3. Split a Variable Pattern

Case splitting of a variable occurs if we have two subsequent clauses with a focal point where the first one has a constructor at the focal point and the other has a variable at the focal point.

Examples of overlapping clauses:

λ pp₂ ... (succ (succ n)) ...     := ...
λ pp₂ ... m               ...     := ...
--          ^ focal point with overlap

λ pp₂ ... (succ m       ) ...     := ...
λ pp₂ ... (succ (succ n)) ...     := ...
--                ^ focal point with overlap

We do a case split on the variable. The case splitting does not change the semantics of the pattern match expression.

Example 1:

λ pp₂ ... (succ (succ n)) ...     := ...
λ pp₂ ... m               ...     := ...
--          ^ focal point with overlap

-- case split 'm'

λ pp₂ ... (succ (succ n)) ...     := ...
λ pp₂ ... (zero         ) ...     := ...
λ pp₂ ... (succ m       ) ...     := ...

Example 2:

λ pp₂ ... (succ m       ) ...     := ...
λ pp₂ ... (succ (succ n)) ...     := ...
--                ^ focal point with overlap

-- case split 'm'

λ pp₂ ... (succ zero    ) ...     := ...
λ pp₂ ... (succ (succ n)) ...     := ...
λ pp₂ ... (succ (succ n)) ...     := ...

Example 3:

λ pp₂ ... zero        ...                := ...
λ pp₂ ... m           ...     := ...
--          ^ focal point with overlap

-- case split 'm'

λ pp₂ ... zero        ...     := ...
λ pp₂ ... zero        ...     := ...
λ pp₂ ... (succ m)    ...     := ...

3.12.5.4. Transform into Canonical Form

Definition of canonical form:

A pattern match expression is in canonical form if there are no two subsequent clauses with a focal pattern where the pattern are out of order or overlapping.

Transformation into canonical form:

Search for a focal pattern in two subsequent clauses and do a reordering or a case splitting until no more focal pattern which are out of order or overlapping can be found in subsequent clauses.

It remains to be shown that the algorithm terminates.

The pattern match expression has an initial maximal constructor nesting \(m\). This maximal constructor nesting \(m\) remains constant during the algorithm

Proof:

A reordering does not change the maximal constructor nesting.

A variable case split does not change the maximal constructor nesting. During a variable case split, the splitted clauses have a new constructor at the place of the variable. At that place the other clause had already a constructor. Therefore the maximal constructor nesting does not change.

Now we create a sequence of numbers \(n_0 n_1 n_2 \ldots n_m i\) for each step. \(n_k\) is the number of variables which are nested below \(k\) constructors and \(i\) is the number of out of order clauses in the pattern match expression. Clearly there cannot be any variable nested below more than \(m\) constructors, because \(m\) is the maximal constructor nesting during the algorithm.

We consider a lexicographic order on the sequence \(n_0 n_1 n_2 \ldots n_m i\) and claim that this sequence decreases lexicographically at each step of the algorithm.

Proof:

Reordering does not change \(n_0 n_1 \ldots n_m\), it only decreases \(i\).

Variable case splitting decreases the sequence lexicographically. The case splitted variable occurs at a certain nesting depth \(k\). After the split the number \(n_k\) has decreased by one. The numbers \(n_{k+1} \ldots n_m i\) might increase. But the number \(n_k\) has higher significance in the lexicographic order.

3.12.6. Reachability

Reachability can be checked by transforming a pattern match expression into its canonical form. Clauses which are unreachable follow immediately the clause which shadows the unreachable clauses. The unreachable clauses have to be eliminated.

Each clause in the canonical form stems exactly from one original clause. If all clauses stemming from the same original clause are unreachable, then the original clause is unreachable which has to be flagged as an error.

3.12.7. Exhaustiveness

Exhaustiveness can be easily checked in the canonical form where all nonreachable clauses have been removed.

In the canonical form the sequence of clauses are nicely grouped. The pattern vary from left to right from low frequency to the highest frequence. Therefore missing variations can be easily spotted.

We can ignore all missing variations in inferable pattern. We concentrate only on the non inferable pattern. If a clause is missing and it is unifiable with the type, then the pattern match is not exhaustive. If all missing clauses are not unifiable, then the pattern match is exhaustive even if not all combinations are present.

We demonstrate the check on the following inductive types:

class (=) {α: Any} (x: α): α → Prop :=
    identical: (=) x

class (≤): Endorelation:=
    start {n}   : zeron
    next  {n m} : nmsucc nsucc m

class Vector: Any): ℕ → Any :=
    []      : Vector zero
    (::)    : ∀ {n}: α → Vector nVector (succ n)

We look at the follwing pattern match expressions in canonical form

Example 1:

case
    { ∀ {n: ℕ}: zero = succ nFalse }
    -- no clauses

Since there are no clauses, the expression is certainly in canonical form. The missing clause has the form:

λ {i} identical    :=  ...

Typechecking of the first argument leads to the substitution n := i. Therefore the type of the second argument is zero = succ i. However the typing judgement

identical: zero = succ i

is invalid because the only valid typing judgement is

identical: zero = zero

Therefore the clause is not really missing. It is not typable.

The same reasoning applies if we flip the arguments:

case
    {∀ {n: ℕ}: succ n = zeroFalse }
    -- no clauses

-- missing clause
λ {i} identical

-- type check the first argument
n := i

-- required typing judgement for the second argument
identical: succ i = zero

-- actual typing judgement for the second argument
identical: succ i = succ i

Example 2:

case
    { ∀ {n m: ℕ}: succ nsucc mnm }
    λ {i j} (next {i j} le) := le

The obviously missing clause has the form:

λ {i j} (start {k})   :=

Typechecking of the first two arguments leads to the substitutions n := i and m := j and the required type succ i succ j for the third argument i.e. the required typing judgement:

start {k}: succ isucc j

However the only valid typing judgment is:

start {k}: zerok

There is no possible substitution for the variables i, j and k which unifies the types succ i succ j and zero k.

Therefore the obviously missing clause is not really missing.

Example 3:

case
    { ∀ {n: ℕ}:
        VectornVectornVectorn
    }
    λ {zero}    []              []                  :=  ...
    λ {i}       ((::) {j} x xs) ((::) {k} y ys)     :=  ...

The obviously missing clauses are the mixed cases:

λ {i}       []                  ((::) {k} y ys)     :=  ...
λ {i}       ((::) {j} x xs)     []                  :=  ...

Let’s look at the unification problems generated by the first seemingly missing case. Type checking of the first argument leads to:

n := i              -- substitution
Vectori          -- required type for the second argument

Typechecking of the second argument leads to:

[]: Vectori      -- required typing judgement
[]: Vectorzero   -- actual typing judgement
i := zero           -- substitution
Vectorzero       -- required type for the third argument

However the required typing judgement for the third argument:

(::) {k} y ys:  Vectorzero

is not satisfiable because the constructor :: constructs an object of type Vector (succ k).

A similar reasoning applies to the second missing case:

-- type check the first argument
n := i              -- substitution
Vectori          -- required type for the second argument

-- type check the second argument
(::) {j} x xs: Vectori           -- required typing judgement
(::) {j} x xs: Vector ℕ (succ j)    -- actual typing judgement
i := succ j                         -- substitution
Vector ℕ (succ j)                   -- required type for the third argument

-- type check the third argument
[]: Vector ℕ (succ j)           -- required typing judgement
[]: Vectorzero               -- actual typing judgment
-- unsatisfiable !!

3.12.8. Pattern Match Compiler

A pattern expression is a function with one or more arguments. In order to execute an pattern match expression in the runtime efficiently, the pattern match expression has to be compiled into a branching with e.g. a switch case.

Each branching step can decide only on one object by looking at the tag.

-- In the source code
(<=?): Natural -> Natural -> Bool := case
    (succ n)    (succ m)    := n <=? m
    zero        _           := true
    _           _           := false

-- More efficient
(<=?): Natural -> Natural -> Bool := case
    zero :=
        \ _ := true
    (succ n) := case
        zero :=
            false
        (succ m) :=
            n <=? m

Compilation to javascript:

function le (a, b) {
    switch (a[0]) {
        case 'zero':
            return true
        case 'succ':
            switch (b[0]){
                case 'zero':
                    return false
                case 'succ':
                    return le (a[1], b[1])
            }
    }
}

In order to get the more efficient form we can transform the original cases into its canonical form.

-- original
     (succ n)    (succ m)    := n <=? m
     zero        _           := true
     _           _           := false

 -- swap
     zero        _           := true
     (succ n)    (succ m)    := n <=? m
     _           _           := false

 -- split
     zero        _           := true
     (succ n)    (succ m)    := n <=? m
     zero        _           := false
     (succ _)    _           := false

 -- swap
     zero        _           := true
     zero        _           := false
     (succ n)    (succ m)    := n <=? m
     (succ _)    _           := false

 -- remove shadowed
     zero        _           := true
     (succ n)    (succ m)    := n <=? m
     (succ _)    _           := false

 -- split
     zero        _           := true
     (succ n)    (succ m)    := n <=? m
     (succ _)    zero        := false
     (succ _)    (succ _)    := false

 -- swap
     zero        _           := true
     (succ _)    zero        := false
     (succ n)    (succ m)    := n <=? m
     (succ _)    (succ _)    := false

 -- remove shadowed
     zero        _           := true
     (succ _)    zero        := false
     (succ n)    (succ m)    := n <=? m

This pattern match can be directly compiled into the nested branching where all cases are covered. I.e. the pattern match is exhaustive. Each of the original clauses occurs at least once, therefore there are no non-reachable clauses.

nodups: List Nat -> List Nat := case
    (x :: y :: tl) :=
        if x =? y then res else y :: res where
            res := nodups (y :: tl)
    xs :=
        xs

nodups: List Nat -> List Nat := case
    [] :=
        []
    (x :: xs) :=
        inspect xs case
            [] :=
                [x]
            (y :: tl) :=
                if x =? y then res else y :: res
                where res := nodups (y :: tl)

Transformation into canonical form:

-- original
    (x :: y :: tl)  := exp
    xs              := xs

-- split
    (x :: y :: tl)  := exp
    []              := []
    (x :: xs)       := x :: xs

-- swap
    []              := []
    (x :: y :: tl)  := exp
    (x :: xs)       := x :: xs

-- split
    []              := []
    (x :: y :: tl)  := exp
    (x :: [])       := x :: []
    (x :: y :: tl)  := x :: y :: xs

-- swap
    []              := []
    (x :: [])       := x :: []
    (x :: y :: tl)  := exp
    (x :: y :: tl)  := x :: y :: xs

-- remove shadowed
    []              := []
    (x :: [])       := x :: []
    (x :: y :: tl)  := exp
map2 {A B C: Any} (f: A -> B -> C): List A -> List B -> List C
:= case
    (x :: xs) (y :: ys) :=
        f x y :: map2 xs xs
    _ _ :=
        []

map2 {A B C: Any} (f: A -> B -> C): List A -> List B -> List C
:= case
    [] _ :=
        []
    (x :: xs) :=
        case
            [] :=
                []
            (y :: ys) :=
                f x y :: map2 xs ys

Transformation into canonical form

-- original
    (x :: xs) (y :: ys) := exp
    _ _  := []

-- split
    (x :: xs) (y :: ys) := exp
    []        _  := []
    (x :: xs) _  := []

-- swap
    []        _  := []
    (x :: xs) (y :: ys) := exp
    (x :: xs) _  := []

-- split
    []        _  := []
    (x :: xs) (y :: ys) := exp
    (x :: xs) []  := []
    (x :: xs) (y :: ys) := []

-- swap
    []        _   := []
    (x :: xs) []  := []
    (x :: xs) (y :: ys) := exp
    (x :: xs) (y :: ys)  := []

-- remove shadowed
    []        _   := []
    (x :: xs) []  := []
    (x :: xs) (y :: ys) := exp

3.12.9. Modified Pattern Compiler

Let’s revisit the example:

(<=?): Natural -> Natural -> Bool := case
    (succ n)    (succ m)    := n <=? m
    zero        _           := true
    _           _           := false

The first and the second case are inverted. Since the constructors are different, we can swap them:

zero        _           := true
(succ n)    (succ m)    := n <=? m
_           _           := false

Now for the first argument sufficient cases are available. Looking only at the first argument, the last case is redundant. However the second argument still needs a case for zero. I.e. we have to factor out the corresponding case:

zero        _           := true
(succ n)    (succ m)    := n <=? m
_           zero        := false
_           _           := false

We cannot yet swap. Before we have to put in the third line the same constructor as in the previous line because we don’t have catch all cases (i.e. variable pattern) not as the last case.:

-- split
zero        _           := true
(succ n)    (succ m)    := n <=? m
(succ n)    zero        := false
_           _           := false

-- and then swap
zero        _           := true
(succ n)    zero        := false
(succ n)    (succ m)    := n <=? m
_           _           := false

Now the last case is unreachable and can be deleted:

zero        _           := true
(succ n)    zero        := false
(succ n)    (succ m)    := n <=? m