.. _Pattern Match:

********************************************************************************
Pattern Match
********************************************************************************


Syntax
================================================================================

The general form of a pattern match expression:

.. code-block::

    case
        { ∀ (x₁: A₁) (x₂: A₂) ... : R }             -- optional type
        λ (p₁₁: A₁₁) (p₁₂: A₁₂) ... : R₁ := e₁      -- pattern clauses
        λ (p₂₁: A₂₁) (p₂₂: A₂₂) ... : R₂ := e₂
        ...


Type annotations for the pattern and the results in the pattern match expression
are optional. Note that ``R`` can be a function type of the form ``∀ (y: B):
C``.

Pattern can be explicit (``p`` or ``(p: A)``) or implicit (``{p}`` or ``{p:
A}``.


Syntax for pattern::

    p   ::=     identifier/operator
        |       _
        |       constant            -- number, character, string
        |       p p


I.e. a pattern is a head term (identifier, operator, _, constant) followed by
zero or more argument pattern with the restriction that ``_`` and constants
cannot have arguments. Operators can be used in prefix or infix form. In the
abstract syntax we use only the prefix form.

Example::

    case
        { List (ℕ, String) → ℕ }

        λ (::) ((,) (succ i) "hello") ((::) ((,) x _) _     -- prefix form
        :=
            i + x

        λ (succ i, "hello") :: (x,_) :: _                   -- infix form
        :=
            i + x


Type
================================================================================


The type of a pattern match
expression is a function type which has the general form

.. code-block::

    ∀ (x₁: A₁) (x₂: A₂) ... : R

    -- example

    ∀ {n m}: succ n ≤ succ m → n ≤ m

    -- in long form

    ∀ {n m: ℕ} (_: succ n ≤ succ m): n ≤ m


Note that type annotations can be ommitted as long as the compiler can infer
them and ``A → B`` is a shorthand for ``∀ (_: A): B``. Braces are used to mark
implicit arguments.

Variables which occur in types are inferrable variables and the corresponding
types are dependent types. In the example ``n`` and ``m`` are inferrable
variables and ``succ n ≤ succ m`` and ``n ≤ m`` are dependent types.

In a type of a pattern match expression, all implicit variables must be
inferrable variables. The reverse is not true in general.


Rules
================================================================================

Distinct pattern variables:
    All variables used in the explicit pattern of the same pattern clause have
    to be distinct.

    Variables in inferable pattern of the same pattern clause need not be
    distinct.


Number of pattern:
    The number of arguments in the type and the number of matched patterns in
    each line must be the same.

    However if there are implicit arguments in the type, the corresponing pattern
    in the pattern match lines can be ommitted because the compiler can infer
    them.

    The compiler adds wildcard arguments ``{_}`` for the missing implicit
    arguments in the pattern clauses.


Implicit arguments in braces:
    The pattern corresponding to implicit arguments in the type of the pattern
    match expression have to be put in braces.


Type completeness:
    All variables occuring in the types ``A₁, A₂, ..., R`` (i.e. all *inferable*
    variables) of the type must occur as variables in the type. E.g. the type
    ``n ≤ m`` is not a legal type of a pattern match expression. ``∀ {n m}: n ≤
    m`` is a legal type.


Welltyped:
    The types in the ``i``\ th pattern clause (``Ai₁ Ai₂ ... Ri``) must be
    unifiable with the corresponding types (``A₁ A₂ ... R``) of the type of the
    pattern match expression where all inferable variables are considered as
    unification variables.

    This consistency requirement excludes pattern clauses with some pattern
    combinations where the types of the pattern clause are not unifiable with the
    corresponding types in the type of the pattern match. In the extreme case
    there are no allowed pattern clauses and the pattern match is empty.

    For the details of type checking see section :ref:`welltyped`.


Exhaustive:
    For all possible arguments which do not contain variables at least one of
    the pattern clauses must match. The check for being exhaustive can be done
    by transforming a pattern match expression into its canonical form (see
    section :ref:`canonical-forms`). For the details to check exhaustiveness see
    section :ref:`exhaustiveness`.

Reachable:
    All clauses must be reachable. I.e. for each clause there is at least one
    set of arguments which matches the clause and fails to match all previous
    clauses.


.. _welltyped:

Welltyped
================================================================================

The general form of a pattern match expression is ::

    case
        { ∀ (x: A) (y: B) ... : R}
        λ p q ... := e
        ...

where ``p`` and ``q`` are pattern. For each argument in the type ``(x: A)``
there is a corresponding pattern ``p``.

In order to typecheck a clause we typecheck from left to right all arguments and
finally the result type. We consider all variables ``x``, ``y``, ... in the type
as substitutable.  Each typecheck step for one argument replaces the
coresponding variable in the type by an expression from the pattern.

At the start of the checking we have all variables in the type unassigned. In
the ``i``\ th step all variables before the ``i``\ th variable of the type are
assigned. We look at the ``i``\ th argument and the corresponding pattern. ::

    -- explicit argument                    implicit argument
    ∀ (x: A): R                             ∀ {x: A}: R
    λ p := ...                              λ {p}

Note that the variables in the type before ``x`` can occur in the type ``A`` and
they have already been replaced by their corresponding expressions. ``R``
represents the remaining type where all substitutions have been done as well.

If an implicit argument in the type does not have a corresponding pattern, we
add the wildcard pattern ``_``.

In order to elaborate the pattern we have to distinguish various cases:

- Head term is a constant: For a number ``A`` has to be a numerical type.
  For a character ``A`` has to be ``Char``. For a string ``A`` has to be
  ``String``. Constants cannot have argument pattern.

- Head term is a wildcard ``_``: In that case no argument pattern are allowed.
  We introduce a metavariable ``?m: ∀ (v0: V0) (v1: V1) ... : A`` where ``v0``,
  ``v1``, ... are all pattern variables introduced up to now and the pattern
  ``p`` is elaborated as ``?m v0 v1 ...``. The elaborated pattern has the type
  ``A`` by construction.

- Head term is an identifier which is an already introduced pattern variable:
  This is allowed only if the type of the pattern variable is unifiable with
  ``A`` and the identifier represents an implicit argument. The implicitness is
  necessary in order to have some unification which will verify the sameness of
  the pattern in both positions.

- Head term is an identifier which is not a constructor of the type ``A``: In
  that case no argument pattern are allowed. We introduce a new pattern variable
  ``v`` of type ``A``. The elaborated pattern has the type ``A`` by
  construction.

- Head term is an identifier which is a constructor of the inductive type
  ``A``. The implicit arguments which represent the parameters of the inductive
  type have to match exactly the parameters of the inductive type ``A``.

  Then we construct recursively each argument pattern of the constructor
  arguments.

  Finally we unify the actual type of the expression ``id p0 p1 ...`` with the
  required type ``A`` and in case of success replace the variable ``x`` in
  the type by the elaborated expression ``id p0 p1 ...``.


After the successful elaboration of all pattern there might remain some
unassigned metavariables. Unassigned metavariables ``?m`` occur only in the form
``?m v0 v1 ...`` where ``v0``, ``v1`` are pattern variables which existed at the
point of the introduction of the metavariable. For each pattern ``?m v0 v1 ...``
we introduce a new pattern variable of the correponding type and replace the
pattern by the pattern variable.

As a last step the expression ``e`` has to be elaborated with the required type
``R``. Note that at that point of the elaboration all variables in the type of
the whole pattern match expression have already been replaced by expressions
depending only on pattern variables.


.. _canonical-forms:

Canonical Forms
================================================================================


The transformation into canonical form works by case splitting on variable
pattern, reordering of the pattern clauses and dropping of non reachable
clauses.


Focus of Subsequent Clauses
---------------------------

We consider two pattern as equivalent if the have the same structure and only
have different variables at the same position. Furthermore inferable pattern are
always considered as equivalent.

The pattern in focus of two subsequent clauses is the first pattern on which
both clauses are different. If there is no focal pattern, then the second one is
unreachable.

.. code::

    -- Example 1:
    \ p1 p2 ... zero     ...        := ...
    \ p1 p2 ... (succ n) ...        := ...
    --          ^ focal point

    -- Example 2:
    \ p1 p2 ... (succ n)        ...        := ...
    \ p1 p2 ... (succ (succ m)) ...        := ...
    --                ^ focal point


The focal point of two pattern is the first subpattern when scanned from left to
right where they are different. The difference can be because of two different
constructors at the focal point or a constructor and a variable at the focal
point.


Reorder:

    - If both clauses have different constructors at the focal point, then the order
      does not matter (because at most one of them can succeed).

    - If one clause has a constructor and the other a variable at the focal point,
      then the order is significant. Both can succeed on the constructor but produce
      different results.

    - If both have a variable at the focal point, then the order is significant.
      Both can succeed on all constructors but produce different results.


Specialization of variables:

    Assume ``c(x)`` is a clause which has the variable ``x`` at a certain
    pattern position. We can always convert this clause into the two
    semantically equivalent clauses::

        c(x:=p)         -- replace 'x' by some constructor pattern
        c(x)

    We can use such a splitting to generate missing clauses. Let's assume we
    have two clauses ``c1`` and ``c2``::

        c1(p1, p2, ... , pn, qi)
        c2(x1, x2, ... , xn, y)

    such that::

        c1(p1,     p2,     ... , pn,     qi)
        c2(x1:=p1, x2:=p2, ... , xn:=pn, y)
        --                               ^ focal point

    have a focal point at ``y / qi``. In that case the following three clauses
    have the same semantics as the original two clauses.::

        c1(p1,     p2,     ... , pn,     qi)
        c2(x1:=p1, x2:=p2, ... , xn:=pn, y:=qj)
        c2(x1,     x2,     ... , xn,     y)

    which can be reordered without changing the semantics::

        c2(x1:=p1, x2:=p2, ... , xn:=pn, y:=qj)
        c1(p1,     p2,     ... , pn,     qi)
        c2(x1,     x2,     ... , xn,     y)

    If the pattern ``qj`` had been missing in the set of clauses before ``c1``,
    then the first of the three can bubble up and provide the missing clause.


Reorder Clauses
---------------

We reorder clauses in order to transform them into the lexicographic order. The
order is induced by the order in which the constructors are introduced in the
corresponding inductive type.

We swap the order of two subsequent clauses if there is a focal point where both
have different constructors and the constructor of the second clause comes
before the constructor in the first clause in the corresponding inductive type.

Examples of *out of order* clauses::

    λ p₁ p₂ ... (succ (succ n)) ...     := ...
    λ p₁ p₂ ... (zero         ) ...     := ...
    --           ^ focal point with out of order constructors

    λ p₁ p₂ ... (succ (succ n)) ...     := ...
    λ p₁ p₂ ... (succ zero    ) ...     := ...
    --                ^ focal point with out of order constructors

The swapping of the clauses does not change the semantics of the pattern match
expression.


Split a Variable Pattern
------------------------

Case splitting of a variable occurs if we have two subsequent clauses with a
focal point where the first one has a constructor at the focal point and the
other has a variable at the focal point.


Examples of overlapping clauses::

    λ p₁ p₂ ... (succ (succ n)) ...     := ...
    λ p₁ p₂ ... m               ...     := ...
    --          ^ focal point with overlap

    λ p₁ p₂ ... (succ m       ) ...     := ...
    λ p₁ p₂ ... (succ (succ n)) ...     := ...
    --                ^ focal point with overlap

We do a case split on the variable. The case splitting does not change the
semantics of the pattern match expression.


Example 1::

    λ p₁ p₂ ... (succ (succ n)) ...     := ...
    λ p₁ p₂ ... m               ...     := ...
    --          ^ focal point with overlap

    -- case split 'm'

    λ p₁ p₂ ... (succ (succ n)) ...     := ...
    λ p₁ p₂ ... (zero         ) ...     := ...
    λ p₁ p₂ ... (succ m       ) ...     := ...


Example 2::

    λ p₁ p₂ ... (succ m       ) ...     := ...
    λ p₁ p₂ ... (succ (succ n)) ...     := ...
    --                ^ focal point with overlap

    -- case split 'm'

    λ p₁ p₂ ... (succ zero    ) ...     := ...
    λ p₁ p₂ ... (succ (succ n)) ...     := ...
    λ p₁ p₂ ... (succ (succ n)) ...     := ...


Example 3::

    λ p₁ p₂ ... zero        ...                := ...
    λ p₁ p₂ ... m           ...     := ...
    --          ^ focal point with overlap

    -- case split 'm'

    λ p₁ p₂ ... zero        ...     := ...
    λ p₁ p₂ ... zero        ...     := ...
    λ p₁ p₂ ... (succ m)    ...     := ...


Transform into Canonical Form
------------------------------

Definition of *canonical form*:
    A pattern match expression is in canonical form if there are no two
    subsequent clauses with a focal pattern where the pattern are out of order
    or overlapping.


Transformation into *canonical form*:
    Search for a focal pattern in two subsequent clauses and do a reordering or
    a case splitting until no more focal pattern which are out of order or
    overlapping can be found in subsequent clauses.


It remains to be shown that the algorithm terminates.

The pattern match expression has an initial maximal constructor nesting
:math:`m`. This maximal constructor nesting :math:`m` remains constant during
the algorithm

Proof:
    A reordering does not change the maximal constructor nesting.

    A variable case split does not change the maximal constructor nesting.
    During a variable case split, the splitted clauses have a new constructor at
    the place of the variable. At that place the other clause had already a
    constructor.  Therefore the maximal constructor nesting does not change.


Now we create a sequence of numbers :math:`n_0 n_1 n_2 \ldots n_m i` for each
step. :math:`n_k` is the number of variables which are nested below :math:`k`
constructors and :math:`i` is the number of out of order clauses in the pattern
match expression. Clearly there cannot be any variable nested below more than
:math:`m` constructors, because :math:`m` is the maximal constructor nesting
during the algorithm.

We consider a lexicographic order on the sequence :math:`n_0 n_1 n_2 \ldots n_m
i` and claim that this sequence decreases lexicographically at each step of the
algorithm.

Proof:
    Reordering does not change :math:`n_0 n_1 \ldots n_m`, it only decreases
    :math:`i`.

    Variable case splitting decreases the sequence lexicographically. The
    case splitted variable occurs at a certain nesting depth :math:`k`. After
    the split the number :math:`n_k` has decreased by one. The numbers
    :math:`n_{k+1} \ldots n_m i` might increase. But the number :math:`n_k` has
    higher significance in the lexicographic order.


Reachability
================================================================================

Reachability can be checked by transforming a pattern match expression into its
canonical form. Clauses which are unreachable follow immediately the clause
which shadows the unreachable clauses. The unreachable clauses have to be
eliminated.

Each clause in the canonical form stems exactly from one original clause. If all
clauses stemming from the same original clause are unreachable, then the
original clause is unreachable which has to be flagged as an error.


.. _exhaustiveness:

Exhaustiveness
================================================================================

Exhaustiveness can be easily checked in the canonical form where all
nonreachable clauses have been removed.

In the canonical form the sequence of clauses are nicely grouped. The pattern
vary from left to right from low frequency to the highest frequence. Therefore
missing variations can be easily spotted.

We can ignore all missing variations in inferable pattern. We concentrate only
on the non inferable pattern. If a clause is missing and it is unifiable with
the type, then the pattern match is not exhaustive. If all missing clauses are
not unifiable, then the pattern match is exhaustive even if not all combinations
are present.

We demonstrate the check on the following inductive types::

    class (=) {α: Any} (x: α): α → Prop :=
        identical: (=) x

    class (≤): Endorelation ℕ :=
        start {n}   : zero ≤ n
        next  {n m} : n ≤ m → succ n ≤ succ m

    class Vector (α: Any): ℕ → Any :=
        []      : Vector zero
        (::)    : ∀ {n}: α → Vector n → Vector (succ n)


We look at the follwing pattern match expressions in canonical form


Example 1::

    case
        { ∀ {n: ℕ}: zero = succ n → False }
        -- no clauses

Since there are no clauses, the expression is certainly in canonical form. The
missing clause has the form::

    λ {i} identical    :=  ...

Typechecking of the first argument leads to the substitution ``n := i``.
Therefore the type of the second argument is ``zero = succ i``. However the
typing judgement ::

    identical: zero = succ i

is invalid because the only valid typing judgement is ::

    identical: zero = zero

Therefore the clause is not really missing. It is not typable.

The same reasoning applies if we flip the arguments::

    case
        {∀ {n: ℕ}: succ n = zero → False }
        -- no clauses

    -- missing clause
    λ {i} identical

    -- type check the first argument
    n := i

    -- required typing judgement for the second argument
    identical: succ i = zero

    -- actual typing judgement for the second argument
    identical: succ i = succ i


Example 2::

    case
        { ∀ {n m: ℕ}: succ n ≤ succ m → n ≤ m }
        λ {i j} (next {i j} le) := le

The obviously missing clause has the form::

    λ {i j} (start {k})   :=

Typechecking of the first two arguments leads to the substitutions ``n := i``
and ``m := j`` and the required type ``succ i ≤ succ j`` for the third argument
i.e. the required typing judgement::

    start {k}: succ i ≤ succ j

However the only valid typing judgment is::

    start {k}: zero ≤ k

There is no possible substitution for the variables ``i``, ``j`` and ``k`` which
unifies the types ``succ i ≤ succ j`` and ``zero ≤ k``.

Therefore the obviously missing clause is not really missing.


Example 3::

    case
        { ∀ {n: ℕ}:
            Vector ℕ n → Vector ℕ n → Vector ℕ n
        }
        λ {zero}    []              []                  :=  ...
        λ {i}       ((::) {j} x xs) ((::) {k} y ys)     :=  ...


The obviously missing clauses are the *mixed* cases::

    λ {i}       []                  ((::) {k} y ys)     :=  ...
    λ {i}       ((::) {j} x xs)     []                  :=  ...


Let's look at the unification problems generated by the first seemingly missing
case. Type checking of the first argument leads to::

    n := i              -- substitution
    Vector ℕ i          -- required type for the second argument

Typechecking of the second argument leads to::

    []: Vector ℕ i      -- required typing judgement
    []: Vector ℕ zero   -- actual typing judgement
    i := zero           -- substitution
    Vector ℕ zero       -- required type for the third argument

However the required typing judgement for the third argument::

    (::) {k} y ys:  Vector ℕ zero

is not satisfiable because the constructor ``::`` constructs an object of type
``Vector ℕ (succ k)``.

A similar reasoning applies to the second missing case::

    -- type check the first argument
    n := i              -- substitution
    Vector ℕ i          -- required type for the second argument

    -- type check the second argument
    (::) {j} x xs: Vector ℕ i           -- required typing judgement
    (::) {j} x xs: Vector ℕ (succ j)    -- actual typing judgement
    i := succ j                         -- substitution
    Vector ℕ (succ j)                   -- required type for the third argument

    -- type check the third argument
    []: Vector ℕ (succ j)           -- required typing judgement
    []: Vector ℕ zero               -- actual typing judgment
    -- unsatisfiable !!


Pattern Match Compiler
============================================================

A pattern expression is a function with one or more arguments. In order to
execute an pattern match expression in the runtime efficiently, the pattern
match expression has to be compiled into a branching with e.g. a switch case.

Each branching step can decide only on one object by looking at the tag.


.. code-block::

    -- In the source code
    (<=?): Natural -> Natural -> Bool := case
        (succ n)    (succ m)    := n <=? m
        zero        _           := true
        _           _           := false

    -- More efficient
    (<=?): Natural -> Natural -> Bool := case
        zero :=
            \ _ := true
        (succ n) := case
            zero :=
                false
            (succ m) :=
                n <=? m

Compilation to javascript:

.. code-block:: javascript

    function le (a, b) {
        switch (a[0]) {
            case 'zero':
                return true
            case 'succ':
                switch (b[0]){
                    case 'zero':
                        return false
                    case 'succ':
                        return le (a[1], b[1])
                }
        }
    }

In order to get the more efficient form  we can transform the original cases
into its canonical form.

.. code-block::

   -- original
        (succ n)    (succ m)    := n <=? m
        zero        _           := true
        _           _           := false

    -- swap
        zero        _           := true
        (succ n)    (succ m)    := n <=? m
        _           _           := false

    -- split
        zero        _           := true
        (succ n)    (succ m)    := n <=? m
        zero        _           := false
        (succ _)    _           := false

    -- swap
        zero        _           := true
        zero        _           := false
        (succ n)    (succ m)    := n <=? m
        (succ _)    _           := false

    -- remove shadowed
        zero        _           := true
        (succ n)    (succ m)    := n <=? m
        (succ _)    _           := false

    -- split
        zero        _           := true
        (succ n)    (succ m)    := n <=? m
        (succ _)    zero        := false
        (succ _)    (succ _)    := false

    -- swap
        zero        _           := true
        (succ _)    zero        := false
        (succ n)    (succ m)    := n <=? m
        (succ _)    (succ _)    := false

    -- remove shadowed
        zero        _           := true
        (succ _)    zero        := false
        (succ n)    (succ m)    := n <=? m

This pattern match can be directly compiled into the nested branching where all
cases are covered. I.e. the pattern match is exhaustive. Each of the original
clauses occurs at least once, therefore there are no non-reachable clauses.


.. code-block::

    nodups: List Nat -> List Nat := case
        (x :: y :: tl) :=
            if x =? y then res else y :: res where
                res := nodups (y :: tl)
        xs :=
            xs

    nodups: List Nat -> List Nat := case
        [] :=
            []
        (x :: xs) :=
            inspect xs case
                [] :=
                    [x]
                (y :: tl) :=
                    if x =? y then res else y :: res
                    where res := nodups (y :: tl)


Transformation into canonical form:

.. code-block::

    -- original
        (x :: y :: tl)  := exp
        xs              := xs

    -- split
        (x :: y :: tl)  := exp
        []              := []
        (x :: xs)       := x :: xs

    -- swap
        []              := []
        (x :: y :: tl)  := exp
        (x :: xs)       := x :: xs

    -- split
        []              := []
        (x :: y :: tl)  := exp
        (x :: [])       := x :: []
        (x :: y :: tl)  := x :: y :: xs

    -- swap
        []              := []
        (x :: [])       := x :: []
        (x :: y :: tl)  := exp
        (x :: y :: tl)  := x :: y :: xs

    -- remove shadowed
        []              := []
        (x :: [])       := x :: []
        (x :: y :: tl)  := exp


.. code-block::

    map2 {A B C: Any} (f: A -> B -> C): List A -> List B -> List C
    := case
        (x :: xs) (y :: ys) :=
            f x y :: map2 xs xs
        _ _ :=
            []

    map2 {A B C: Any} (f: A -> B -> C): List A -> List B -> List C
    := case
        [] _ :=
            []
        (x :: xs) :=
            case
                [] :=
                    []
                (y :: ys) :=
                    f x y :: map2 xs ys


Transformation into canonical form

.. code-block::

    -- original
        (x :: xs) (y :: ys) := exp
        _ _  := []

    -- split
        (x :: xs) (y :: ys) := exp
        []        _  := []
        (x :: xs) _  := []

    -- swap
        []        _  := []
        (x :: xs) (y :: ys) := exp
        (x :: xs) _  := []

    -- split
        []        _  := []
        (x :: xs) (y :: ys) := exp
        (x :: xs) []  := []
        (x :: xs) (y :: ys) := []

    -- swap
        []        _   := []
        (x :: xs) []  := []
        (x :: xs) (y :: ys) := exp
        (x :: xs) (y :: ys)  := []

    -- remove shadowed
        []        _   := []
        (x :: xs) []  := []
        (x :: xs) (y :: ys) := exp


Modified Pattern Compiler
============================================================

Let's revisit the example::

    (<=?): Natural -> Natural -> Bool := case
        (succ n)    (succ m)    := n <=? m
        zero        _           := true
        _           _           := false

The first and the second case are inverted. Since the constructors are
different, we can swap them::

    zero        _           := true
    (succ n)    (succ m)    := n <=? m
    _           _           := false


Now for the first argument sufficient cases are available. Looking only at the
first argument, the last case is redundant. However the second argument still
needs a case for ``zero``. I.e. we have to factor out the corresponding case::

    zero        _           := true
    (succ n)    (succ m)    := n <=? m
    _           zero        := false
    _           _           := false


We cannot yet swap. Before we have to put in the third line the same constructor
as in the previous line because we don't have catch all cases (i.e. variable
pattern) not as the last case.::

    -- split
    zero        _           := true
    (succ n)    (succ m)    := n <=? m
    (succ n)    zero        := false
    _           _           := false

    -- and then swap
    zero        _           := true
    (succ n)    zero        := false
    (succ n)    (succ m)    := n <=? m
    _           _           := false

Now the last case is unreachable and can be deleted::

    zero        _           := true
    (succ n)    zero        := false
    (succ n)    (succ m)    := n <=? m