Multi-stage Programming for Mainstream Languages
Edwin Westbrook Mathias Ricken Jun Inoue Yilong Yao Tamer Abdelatif1 Walid Taha
Rice University {emw4,mgricken,ji2,yy3}@cs.rice.edu, eng.tamerabdo@gmail.com, taha@cs.rice.edu
Abstract
Multi-stage programming (MSP) constructs enable a disciplined approach to program generation. In the purely functional setting, it is possible to statically type-check MSP constructs to ensure that they can only generate well-typed programs. Despite numerous attempts, it has been difficult to extend this guarantee in the presence of key features of mainstream languages, especially imperative constructs. This paper proposes a new method for achieving this guarantee and shows that it is powerful enough to express classic applications of MSP in Java. Our key insight is that safety can be regained by ensuring that the bodies of escapes are weakly separable from the rest of the code. This means that computational effects occurring inside an escape can only be visible outside the escape through types guaranteed to not contain code. Our method is simpler than prior proposals, and we expect that it can be intuitively understood by programmers. We formalize a calculus to demonstrate the soundness of the proposed approach. An implementation called Mint, which extends the Java OpenJDK compiler, is used to validate both the expressivity of the system and the performance gains attainable by using MSP in this setting.
Categories and Subject Descriptors D.3.1 [Programming Languages]: Formal Definitions and Theory; D.3.3 [Programming Languages]: Language Constructs and Features
General Terms Languages
Keywords Multi-staged languages, Multi-stage programming, Type systems, Java
1 Ain Shams University.
1. Introduction
Multi-stage programming (MSP) languages provide a hygienic quasi-quotation mechanism intended for program generation. Hygiene ensures that generated programs are free of accidental variable capture, a problem that makes string-based program generation in preprocessors like cpp notoriously hard to use. Research on functional languages has shown that it is possible to statically check MSP programs to ensure that they can only be used to generate well-typed programs [22, 23, 4]. Unfortunately, extending this static typing guarantee to mainstream languages has proved to be challenging. In particular, standard features of mainstream languages, such as imperative assignment, lead to scope extrusion, in which variables in code escape the scopes where they are defined. Several approaches to solving this problem have been proposed. Two of these proposals use record polymorphism and index the types of code objects with their free variables [13, 1], while another one uses delimited control to express effects and to limit their scope [11]. These are powerful systems that give the expert MSP user fine-grained control over scoping in code. However, there is still a need for a type system that makes MSP accessible to general programmers and domain experts.
1.1 Contributions
To address this need, we propose a new approach to type-safe MSP. We argue that this approach is better suited for type-safe MSP in mainstream languages than previous proposals. Our contributions include:
• A minimal language extension to support MSP in Java, combining the three standard MSP constructs with a reflection library of staged versions of the standard Java reflection classes (Section 2.2).
• The notion of weak separability, which limits the computational effects that can occur inside the bodies of escapes (Section 3). Weak separability is enforced by a small set of restrictions that ensure that any effects that can be observed outside escape expressions do not involve code objects. We expect that the restrictions will be easily and intuitively understood by mainstream programmers.
• Demonstration of the expressivity of a language with these restrictions through both standard pure examples and examples with imperative features (Section 4).
• A type system based on weak separability, an operational semantics that formalizes the runtime behavior of an object-oriented MSP language with effects, and a proof that running any well-typed program is guaranteed to be free of runtime errors, including possible scope extrusion and generation (and execution) of ill-formed code (Section 5). Full proofs are available in Appendix A.
• An implementation of this proposal, published online at http://plresearch.org/JavaMint (Section 6). The implementation is based on the Java OpenJDK compiler from Sun Microsystems.
• Validation of the performance impact of MSP in Mint, showing that it is consistent with prior studies (Section 7).
1.2 Comparisons with Related Efforts
Several efforts have been made to accommodate effects in the context of multi-stage programming, as well as to accommodate object-oriented features. In what follows we summarize the most closely related efforts.
Early efforts to develop sound type systems for MSP languages with effects focused on introducing imperative features to functional MSP languages [22, 3, 2, 13, 11]. All of these support manipulation of open terms and guarantee well-formedness of the generated code, but they differ significantly in the approaches and extents to which they support effects. Calcagno et al. [3] allow imperative operations on code but do not support imperative operations on open terms. Kim et al. [13] support unrestricted imperative operations on open terms but choose not to provide α-equivalence for future-stage code. Their system delegates hygiene to a specialized binder λ∗, whose operation can be explained only in terms of an implicit “gensym.” They present an inferable polymorphic type system. Ancona and Moggi [2] incorporate imperative operations on open terms and provide hygiene. The imperative primitives in all of these, except for Kameyama et al. [11], are ML-style “boxed” references, which are not in line with Java semantics. Pervasive, unboxed references, an essential feature of Java, exacerbate the problem. (See Section 3 for a detailed discussion.) Kameyama et al. [11] use delimited control as their imperative primitive, which is more general than mutable stores. They maintain hygiene and support imperative operations on open terms, but they choose not to allow any side effect to occur inside a future-stage binder that is visible from the code outside.
Until recently, efforts to introduce MSP to the object-oriented setting focused on engineering aspects. The staged extensions of Java by Schultz et al. [16], Kamin et al. [12], and Zook et al. [24] focus on implementation, applications, and on quantifying the performance benefits. These extensions were not formalized. Neverov and Roe [14] formalize a core typed, Java-like calculus but leave the type soundness unproved. Their calculus also does not have side effects. Huang et al. [9] state that their system guarantees well-formedness and well-typedness of generated code, but they do not prove such a result or formalize their system. In later work, Huang et al. [8] focus on reflection, and do not allow manipulation of arbitrary code values (in particular open terms). They prove soundness, but their system does not model side effects. Aktemur [1] and Kim et al. [13] rely on a form of record typing that makes the type and the type system complex.
As such, our approach is closest to that of Calcagno et al. [3], in which code values involved in effects are checked against certain closedness criteria. In contrast, we identify and solve the problem of finding an appropriate notion that can work with Java’s complex object model.

    // Computes x^n for n >= 1 by repeated multiplication.
    public static Integer power(Integer x, Integer n) {
        if (n == 1) return x;
        else return x * power(x, n - 1);
    }

Figure 1. The unstaged power function

    // Staged power: returns code that multiplies x by itself n times.
    public static Code<Integer> spower(Code<Integer> x, int n) {
        if (n == 1) return x;
        else return <| `x * `(spower(x, n - 1)) |>;
    }

    public static abstract class PowerFun {
        public abstract int apply(int x);
    }

    // Code for an anonymous PowerFun whose body is generated by spower.
    Code<? extends PowerFun> CodePower17 = <|
        new PowerFun() {
            public int apply(final int x) {
                return `(spower(<| x |>, 17));
            }
        }
    |>;
    PowerFun spower17 = CodePower17.run();
    int val = spower17.apply(2);
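The two-stage structure of the power functions in Figures 1 and 2 can be approximated in standard Java, which has no brackets, escapes, or run. The sketch below is not Mint and not the authors' implementation. It assumes that a java.util.function.IntUnaryOperator closure can stand in for a Code<Integer> fragment, so that composing closures plays the role of splicing code. The class name StagedPowerSketch and the helper specialize are hypothetical names chosen for illustration. Because the sketch builds closures instead of generating and compiling code, it mirrors only the separation of stages and says nothing about the performance gains discussed in this paper.

```java
import java.util.function.IntUnaryOperator;

public class StagedPowerSketch {
    // Unstaged power, as in Figure 1: computes x^n for n >= 1.
    public static int power(int x, int n) {
        return n == 1 ? x : x * power(x, n - 1);
    }

    // First stage: the recursion over n runs here, once, and builds a
    // composed closure. The closure stands in for a Code<Integer> fragment
    // that mentions the future-stage input (an assumption of this sketch).
    public static IntUnaryOperator spower(IntUnaryOperator x, int n) {
        if (n == 1) return x;
        IntUnaryOperator rest = spower(x, n - 1);
        return v -> x.applyAsInt(v) * rest.applyAsInt(v);
    }

    // Hypothetical helper that plays the role of building and running
    // CodePower17: it fixes the exponent and returns the specialized function.
    public static IntUnaryOperator specialize(int n) {
        return spower(v -> v, n);
    }

    public static void main(String[] args) {
        IntUnaryOperator spower17 = specialize(17);
        System.out.println(spower17.applyAsInt(2)); // 2^17 = 131072
        System.out.println(power(2, 17));           // same value, unstaged
    }
}
```

Calling specialize(17) performs the recursion over the exponent once, and each later applyAsInt call only runs the resulting chain of multiplications. In Mint, the first stage would instead produce a code object that is compiled and executed by run().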
Figure 2. The staged power function.
2. Programming in Mint
As noted earlier, Mint extends Java with three MSP constructs and a library of staged reflection primitives. The guiding principle in Mint’s design is parsimony. In this section we introduce the design from the programmer’s perspective.
2.1 Staging Constructs
Mint extends Java 1.6 with the three standard MSP constructs: brackets, escape, and run [22, 23, 4]. Brackets are written as <| |> and delay the enclosed computation by returning it as a code object. For example, <| 2 + 3 |> is a value. Brackets can contain a block of statements if it is surrounded by curly braces:

    <| { C.foo(); C.bar(); } |>;

Code objects have type Code<T>, where T is the type of the expression contained. For example, <| 2 |> has type Code<Integer>. A bracketed block of statements always has type Code<Void>.
Code objects can be escaped or run. Escapes are written as ` and allow code objects to be spliced into other brackets to create bigger code objects. For example,

    Code<Integer> x = <| 2 + 3 |>;
    Code<Integer> y = <| 1 + `x |>;

stores <| 1 + (2 + 3) |> into y. Run is provided as a method run() that code objects support. For example, executing

    int z = y.run();

after the above example sets z to 6.
Basic MSP in Mint can be illustrated using the classic power function example. Figure 1 displays the unstaged power function in Java. Figure 2 displays a staged version. This staged method spower takes in an argument x that is a piece of code for an integer, along with an integer n, and returns a piece of code that multiplies x by itself n times. To use spower we create code for an anonymous inner class PowerFun. The generated class, which is assigned to the variable CodePower17, has an apply method with a body that is generated by spower called with exponent 17. This creates code that multiplies the input by itself 17 times. The code CodePower17 is then compiled and run with the run() method, which produces a PowerFun and assigns it to spower17. Finally, val is bound to the result of calling the apply method of spower17 on 2, computing 2^17.
2.2 Staged Reflection Primitives
Neverov observed that staging and reflection in languages like C# and Java can be highly synergistic [14]. He also noticed that fully exploiting this synergy requires providing a special library of staged reflection primitives. Mint provides such a library. The primitives are based on the standard reflection primitives in the Java library, including the Class and Field classes.1
1 The Mint reflection library does not support all reflection primitives. For example, the Method and Constructor classes require multiple arguments. This requires adding indexed types to Java, and is therefore outside the scope of this work.
To represent these in Mint, the library adds two corresponding types, ClassCode and FieldCode. The ClassCode type is indexed by the class itself, just like the type it is modelled after. For example, the corresponding class for Integer objects has type ClassCode<Integer>. Any ClassCode object provides methods
for manipulating class corresponding to the methods of Class. For example, the cast method of ClassCode takes any code object of type Code