OutsideIn(X) Modular type inference with local assumptions


Download OutsideIn(X) Modular type inference with local assumptions


Preview text

Under consideration for publication in J. Functional Programming

1

OutsideIn(X) Modular type inference with local assumptions
24 March 2011
Dimitrios Vytiniotis
Microsoft Research
Simon Peyton Jones
Microsoft Research
Tom Schrijvers
Universiteit Gent
Martin Sulzmann
Informatik Consulting Systems AG
Abstract Advanced type system features, such as GADTs, type classes, and type families have proven to be invaluable language extensions for ensuring data invariants and program correctness among others. Unfortunately, they pose a tough problem for type inference, because they introduce local type assumptions.
In this article we present a novel constraint-based type inference approach for local type assumptions. Our system, called OutsideIn(X), is parameterised over the particular underlying constraint domain X, in the same way as HM(X). This stratification allows us to use a common metatheory and inference algorithm.
Going beyond the general framework, we also give a particular constraint solver for X = type classes + GADTs + type families, a non-trivial challenge in its own right.

2

Vytiniotis, Peyton Jones, Schrijvers, Sulzmann

Contents

1 Introduction

4

2 The challenge we address

6

2.1 Modular type inference and principal types

8

2.2 The challenge of local constraints

8

2.3 The challenge of axiom schemes

9

2.4 Recovering principal types by enriching the type syntax

10

2.5 Summary

11

3 Constraint-based type systems

11

3.1 Syntax

11

3.2 Typing rules

13

3.3 Type soundness

15

3.4 Type inference, informally

16

3.5 Type inference, precisely

17

3.6 Soundness and principality of type inference

20

4 Constraint-based type systems with local assumptions

24

4.1 Data constructors with local constraints

24

4.2 let should not be generalized

27

4.3 The lack of principal types

34

5 Type inference with OutsideIn(X)

35

5.1 Type inference, informally

35

5.2 Overview of the OutsideIn(X) solving algorithm

36

5.3 Top-level algorithmic rules

38

5.4 Generating constraints

39

5.5 Solving constraints

40

5.6 Variations on the design

42

5.7 Soundness and principality of type inference

43

6 Incompleteness and ambiguity

46

6.1 Incompleteness due to ambiguity

46

6.2 Incompleteness due to inconsistency

47

6.3 Incompleteness of the OutsideIn(X) strategy

48

6.4 Guess-free completeness

49

6.5 Our position on incompleteness and ambiguity

50

7 Instantiating X for GADTs, type classes, and type families

51

7.1 The entailment relation

52

7.2 Solving equality constraints is tricky

53

7.3 The simplifier

54

7.4 Rewriting constraints

56

7.5 The rule SIMPLES

65

7.6 Soundness and principality

66

7.7 Termination

67

8 Implementation

69

8.1 Evidence

70

8.2 Brief sketch of the implementation

70

OutsideIn(X)

3

9 Related work

72

9.1 Constraint-based type inference

72

9.2 The special case of GADTs

73

9.3 The special case of multi-parameter type classes

75

9.4 Solving equalities involving type families

75

9.5 Let generalization for units of measure and type families

76

9.6 Ambiguity

77

9.7 Is the emphasis on principal types well-justified?

78

10 Future Work

79

References

79

4

Vytiniotis, Peyton Jones, Schrijvers, Sulzmann

1 Introduction
The Hindley-Milner type system (Damas & Milner, 1982; Milner, 1978) is a masterpiece of design. It offered a big step forward in expressiveness (parametric polymorphism) at very low cost. The cost is low in several dimensions: the type system is technically easy to describe, and a straightforward inference algorithm is both sound and complete with respect to the specification. And it does all this for programs with no type annotations at all!
Over the following thirty years, type systems have advanced rapidly, both in expressiveness and (less happily) in complexity. One particular direction in which they have advanced lies in the type constraints that they admit, such as type classes, implicit parameters, record constraints, subtype constraints, and non-structural equality constraints. Type inference obviously involves constraint solving, but it is natural to ask whether one can design a system that is somehow independent of the particular constraint system. The HM(X) system embodies precisely such an approach, by abstracting over the constraint domain “X” (Odersky et al., 1999).
However HM(X) is not expressive enough to describe the systems we need. The principal difficulty is caused by so-called local constraints. By local constraints we mean type constraints that hold in some parts of the program but not others. Consider, for example, the following program, which uses a Generalised Algebraic Data Type (GADT), a recently-introduced and wildly popular feature of Haskell (Peyton Jones et al., 2006):
data T :: * -> * where T1 :: Int -> T Bool T2 :: T a

test (T1 n) _ = n > 0

test T2

r=r

The pattern match on T1 introduces a local constraint that the type of (T1 n) be equal to T1 Bool inside the body n > 0. But that constraint need not hold outside that first pattern match. In fact, the second line for test allows any type T a for its first argument.
What type should be inferred for function test? Alas, there are two possible most-general System F types, neither of which is an instance of the other1:

test :: ∀a . T a → Bool → Bool test :: ∀a . T a → a → a

The second type for test arises from the fact that if its first argument has type T a, then the first branch allows the return type Bool to be replaced with a, since the local constraint that T a is equal to T Bool holds in that branch (although not elsewhere). The loss of principal types is both well-known and unavoidable (Cheney & Hinze, 2003).

1 We write “System F” since the answer to this question varies when more or less expressive types are considered.

OutsideIn(X)

5

A variety of papers have tackled the problem of local constraints in the specific context of GADTs, by a combination of user-supplied type annotations and/or constraint-based inference (Peyton Jones et al., 2006; Pottier & R´egis-Gianas, 2006; Simonet & Pottier, 2007; Sulzmann et al., 2008). Unfortunately, none of these approaches is satisfying, even to their originators, for a variety of reasons (Section 9). Simonet and Pottier give an excellent summary in the closing sentences of their paper (Simonet & Pottier, 2007):
We believe that one should, instead, strive to produce simpler constraints, whose satisfiability can be efficiently determined by a (correct and complete) solver. Inspired by Peyton Jones et al.’s wobbly types (Peyton Jones et al., 2006), recent work by Pottier and R´egisGianas (2006) proposes one way of doing so, by relying on explicit, user-provided type annotations and on an ad hoc local shape inference phase. It would be interesting to know whether it is possible to do better, that is, not to rely on an ad hoc preprocessing phase.
Our system produces simpler constraints than the constraints Simonet and Pottier use, and does not rely on an ad hoc local shape inference phase. Furthermore, it does this not only for the specific case of GADTs, but for the general case of an arbitrary constraint system in the style of HM(X). Specifically, we make the following contributions:
• We describe a constraint-based type system that, like HM(X), is parameterised over the underlying constraint system X (Section 4), and includes:
— Data constructors with existential type variables. — Data constructors that introduce local constraints, of which GADTs are
a special case. — Type signatures on local let-bound definitions. — Top-level axiom schemes (such as Haskell’s instance declarations).
These extensions offer substantially improved expressiveness, but at significant cost to the specification and implementation. Local constraints from data constructors or signatures are certainly not part of HM(X); existential variables can sometimes be accommodated by the techniques found in some presentations (Pottier & R´emy, 2005), and top-level axiom schemes is only part of Mark Jones’ qualified types (Jones, 1992). • While developing our type system we show a surprising result: while sound and complete implicit generalisation for local let bindings is straightforward in Hindley-Milner, it becomes prohibitively complicated when combined with a rich constraint system that includes local assumptions (Section 4.2). Happily, we demonstrate that local generalisation is almost never used, and when it absolutely has to be used, a local type signature makes these complications go away. Thus motivated, albeit controversially, we propose to simplify the language by removing implicit generalisation of local let bindings. • We give an inference algorithm, OutsideIn(X), that is stratified into (a) an inference engine that is independent of the constraint system X, and (b) a constraint solver for X itself (Section 5). We show that our approach is not ad-hoc: any program accepted by our algorithm can be typed with a principal type in the simple natural constraint-based type system. Previous

6

Vytiniotis, Peyton Jones, Schrijvers, Sulzmann

work (Peyton Jones et al., 2006; Pottier & R´egis-Gianas, 2006) only infers principal types with respect to specialized type systems, but not with respect to the natural constraint-based type system. • A particularly useful, but particularly delicate, class of constraints are typeequality constraints, including those introduced by GADTs, and by type-level functions (type families in the Haskell jargon). Section 7 gives a concrete instantiation of X with the constraints arising from these features, as well as with the more traditional Haskell type class constraints. Our concrete solver subsumes and simplifies our previous work on solving constraints involving type families (Schrijvers et al., 2008a), and is now part of the GHC implementation.
This paper draws together, in a single uniform framework, the results of a multiyear research project, documented in several earlier papers (Peyton Jones et al., 2006; Schrijvers et al., 2007; Peyton Jones et al., 2004; Schrijvers et al., 2008a; Schrijvers et al., 2009; Vytiniotis et al., 2010). By taking the broader perspective of abstracting over the constraint domain “X” we hope to bring out the core challenges in higher relief, and contribute towards building a single generic solution rather than a multitude of ad-hoc compromises.
The principal shortcoming of our system is shared by every other paper on the subject: an unsatisfactory account of ambiguity, a notion first characterized by Jones (1993). We discuss the issue, along with a detailed account of incompleteness, in Section 6. There is also a good deal of related work, which we describe in Section 9.

2 The challenge we address
We begin by briefly reviewing part of the type-system landscape, to identify the problems that we tackle.
The vanilla Hindley-Milner system has just one form of type constraint, namely the equality of two types, which we write τ1 ∼ τ2. For example, the application of a function of type τ1 → τ2 to an argument of type τ3 gives rise to an equality constraint τ1 ∼ τ3. These equality constraints are structural and hence can be solved easily by unification, with unique most general solutions. However, subsequent developments have added many new forms of type constraints:
• Haskell’s type classes add type-class constraints (Jones, 1992; Wadler & Blott, 1989; Hall et al., 1996). For example, the constraint Eq τ requires that the type τ be an instance of the class Eq. Haskell also allows types that quantify over constraints (often called qualified types in the HM(X) literature). For example the member function has type
member :: Eq a => a -> [a] -> Bool
which says that member may be called at any type τ , but that the constraint Eq τ must be satisfied at the call site.

OutsideIn(X)

7

• An early extension to the original definition of Haskell 98 was to allow multiparameter type classes, which Mark Jones subsequently extended with functional dependencies (Jones, 2000). This pair of features turned out to be tremendously useful in practice, and gave rise to a whole cottage industry of programming techniques that amount to performing arbitrary computation at the type level. We omit the details here but the underlying idea was that the conjunction of two class constraints C τ υ1 and C τ υ2 gives rise to an additional equality constraint υ1 ∼ υ2.
• Generalised Algebraic Data Types (GADTs) added a substantial new twist to equality constraints by supporting local equalities introduced by pattern matching (Xi et al., 2003; Peyton Jones et al., 2006). User type signatures with constrained types have a similar effect, also introducing local assumptions. We will discuss GADTs further in Section 2.
• Kennedy’s thesis (Kennedy, 1996) describes how to accommodate units of measure in the type system so that one may write
calcDistance :: num (m/s) -> num s -> num m calcDistance speed time = speed * time
thereby ensuring that the first argument is a speed in metres/second, and similarly for the other argument and result. The system supports polymorphism, for example
(*) :: num u1 -> num u2 -> num (u1*u2)
There is, necessarily, a non-structural notion of type equality. For example, to type check the definition of calcDistance the type engine must reason that (m/s)*s ∼ m. This is an ordinary equality constraint, but there is now a nonstandard equality theory so the constraint solver becomes more complicated. • More recently, inspired by object-oriented languages, we have proposed and implemented a notion of associated types in Haskell (Chakravarty et al., 2005b; Chakravarty et al., 2005a). The core feature is that of a type family (Kiselyov et al., 2010). For instance the user may declare type family axioms:
type family F :: * -> *

type instance F [a] = F a type instance F Bool = Int
In this example F is a type family with two defining axioms, F [a] ∼ F a and F Bool ∼ Int. This means that any expression of type F [Bool] can be considered as an expression of type F Bool (using the first axiom), which in turn can be considered as having type Int (using the second axiom). Hence, like in the case of units of measure, equalities involving type families are also non-structural.
Type inference for HM(X) is tractable: it boils down to constraint solving for existentially-quantified conjunctions of primitive constraints. However, the type system features discussed above go beyond HM(X), in two particular ways. First,

8

Vytiniotis, Peyton Jones, Schrijvers, Sulzmann

GADTs bring into scope local type constraints, and existentially-quantified variables (Section 2.2). Second, we must allow top-level axiom schemes, such as Haskell’s instance declarations (Section 2.3). In this paper we address modular type inference with principal types for the aforementioned type system features and beyond. In the following sections we explain the problem in more detail, but we begin with a brief discussion of principal types to set the scene.

2.1 Modular type inference and principal types
Consider this expression, which defines f without a type signature, expecting the type inference engine to figure out a suitable type for f:
let f x = in
It is very desirable that type inference should be modular ; that is, it can infer a type for f by looking only at f’s definition, and not at its uses in . After all, might be very large or, in a system supporting separate compilation, f’s definition might be in a library, while its call sites might be in other modules.
In a language supporting polymorphism it is common for f to have many types; for example, reverse has types [Int] → [Int] and [Bool] → [Bool]. For modular type inference to work, it vital for f to have a unique principal type, that is more general than all its other types. For example reverse :: ∀a.[a] → [a].
When the programmer supplies an explicit type signature, the issue does not arise: we should simply check that the function indeed has the specified type, and use that type at each call site. However, for modular type inference, we seek a type system for which an un-annotated definition has a unique principal type – or else is rejected altogether. The latter is acceptable, because the programmer can always resolve the ambiguity by adding a type signature.

2.2 The challenge of local constraints
Generalized Algebraic Data Types (GADTs) have proved extremely popular with programmers, but they present the type inference engine with tricky choices. Notably, as mentioned in the introduction, functions involving GADTs may lack a principal type. Recall the example:
data T :: * -> * where T1 :: Int -> T Bool T2 :: T a

test (T1 n) _ = n > 0

test T2

r=r

One can see that test has two possible most-general System F types, neither of which is an instance of the other:

test :: ∀a . T a → Bool → Bool test :: ∀a . T a → a → a

OutsideIn(X)

9

Since test has no principal type we argue that rather than making an arbitrary choice, the type inference engine should reject the program. The programmer can fix the problem by specifying the desired type with an explicit type signature, such as:

test :: T a -> Bool -> Bool

But exactly which GADT programs should be rejected? For example, consider test2:

test2 (T1 n) _ = n > 0

test2 T2

r = not r

Since T2 is an ordinary (non-GADT) data constructor, the only possible type of r is Bool, so the programmer might be confused at being required to say so.

2.3 The challenge of axiom schemes
A further challenge for type inference are top-level universally quantified constraints, which we refer to as axiom schemes. Early work by Fax´en (Fax´en, 2003) already shows how the interaction between type signatures and axiom schemes can lead to the loss of principal types. Here is another example of the same phenomenon, taken from (Sulzmann et al., 2006a):
class Foo a b where foo :: a -> b -> Int instance Foo Int b instance Foo a b => Foo [a] b
g y = let h :: forall c. c -> Int h x = foo y x
in h True
In this example the two instance declarations give rise to two axiom schemes. The instance Foo Int b provides the constraint Foo Int b for any possible b. Similarly, the second instance provides the constraint Foo [a] b for any possible a and b, as long as we can show Foo a b.
Suppose now that y has type [Int]. Then, the inner expression foo y x gives rise to the constraint Foo [Int] c This constraint can be reduced via the above instance declarations. and thus the program type checks. We can generalize this example and conclude that function g can be given the infinite set of types
g :: Int → Int g :: [Int] → Int g :: [[Int]] → Int g :: . . .
but there is no most general Haskell type for g. Here is a similar example, this time showing the interaction of type families with
existential data types (L¨aufer & Odersky, 1994), yet another extension of vanilla HM(X).

10

Vytiniotis, Peyton Jones, Schrijvers, Sulzmann

type family FB :: * -> * -> * type instance FB Int b = Bool type instance FB [a] b = FB a b

data Bar a where K :: a -> b -> Bar a

h (K x y) = not (fb x y)

-- Assume fb :: a -> b -> FB a b

--

not :: Bool -> Bool

To type check the body of h we must be able to determine some type variable αx (for the type of x) such that, for any b (the type of y), the result type of the call to fb is compatible with not; that is: FB αx b ∼ Bool. In combination with the above type instances, function h can be given the infinite set of types
h :: Bar Int → Bool h :: Bar [Int] → Bool h :: Bar [[Int]] → Bool h :: . . .
Unsurprisingly, axiom schemes combined with local assumptions are no better, even in the absence of GADTs. Consider the following example, using only type classes.

class C a class B a b where op :: a -> b instance C a => B a [a]

data R a where MkR :: C a => a -> T a

k (MkR x) = op x
The function k has both these (incomparable) types:
k :: ∀ab . B a b ⇒ R a → b k :: ∀a . R a → [a]
The first is straightforward; it makes no use of the local (C a) constraint in MkR’s type. The second fixes the return type to be [a], but in return it can make use of the local (C a) constraint to discharge the constraint arising from the call of op.

2.4 Recovering principal types by enriching the type syntax
There is a well-known recipe for recovering principal types ((Simonet & Pottier, 2007)): enrich the language of types to allow quantification over constraint schemes and implications. To be concrete, here are the principal types of the problematic

Preparing to load PDF file. please wait...

0 of 0
100%
OutsideIn(X) Modular type inference with local assumptions