From a34f2c4979877d2d824309f03db070a706144b51 Mon Sep 17 00:00:00 2001
From: Michael Kay
Date: Tue, 16 Oct 2018 10:12:37 +0100
Subject: [PATCH 1/2] Proposal (3 options) for concise inline function syntax

Proposes various options for simplified syntax for declaring inline functions, and makes recommendations
---
 concise-inline-functions.md | 57 +++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)
 create mode 100644 concise-inline-functions.md

diff --git a/concise-inline-functions.md b/concise-inline-functions.md
new file mode 100644
index 0000000..d3c48a1
--- /dev/null
+++ b/concise-inline-functions.md
@@ -0,0 +1,57 @@
+# New Conditional Expression Syntax
+
+**Author**: Michael Kay, Saxonica
+
+Proposal for extensions to the unary and binary lookup operators.
+
+
+## Description
+
+The lookup operator in XPath 3.1 allows the RHS operand to be any of the following:
+
+* NCName
+* Integer literal
+* Parenthesized expression
+* "*"
+
+This proposal adds two further options
+
+* String literal
+* Variable reference
+
+The effect of using a string literal or variable reference is the same as using a parenthesized expression containing the string literal or variable reference: it enables you to write `$emp?"date of birth"` in place of `$emp?("date of birth")`, and `$array?$i` in place of `$array?($i)`.
+
+Note: the justification for not allowing a general expression on the RHS of the lookup operator is essentially so that an unquoted string (an NCName) can be treated differently from its usual meaning of `child::elemName`. It also allows special meaning to be attached to `*` as the right-hand operand. Integer literals were allowed without parentheses because they are clearly useful and clearly cause no parsing problems. The same argument applies to string literals and variable references; the parentheses in these cases are totally unnecessary, and all this proposal does is to remove the need for them.
+
+
+
+## Use Cases
+
+The two new options bring no new functionality but some extra conciseness. See examples below.
+
+Note that `$array?$i` means `$array?($i)` which is not quite the same as `$array($i)` - the meaning is different if `$array` is a sequence of arrays rather than a single array. Avoiding the parentheses differentiates the two expressions more clearly.
+
+## Examples
+
+Map lookup using a string-valued key containing spaces can now be written
+
+```
+$map?"date of birth"
+$map?"123-reg"
+```
+
+Such keys arise commonly with JSON data. In XPath 3.1 the string literal must be enclosed by parentheses.
+
+Similarly, map lookup and array lookup where the key is held in a variable no longer need parentheses:
+
+```
+for $i in 1 to 50 return $array?$i
+for $k in map:keys($map1) return $map2?$k
+```
+
+Disadvantages: the benefits are fairly cosmetic and users may feel that the benefits do not compensate for the costs of being non-standard.
+
+
+## Grammar
+
+TBA.

From 95676fd84266c13c5a4ace01af69783dd017a5c9 Mon Sep 17 00:00:00 2001
From: Michael Kay
Date: Tue, 16 Oct 2018 10:31:39 +0100
Subject: [PATCH 2/2] Proposal for new syntax for inline functions

(Entire document replaced; previous version was the wrong document)
---
 concise-inline-functions.md | 126 +++++++++++++++++++++++++++++-------
 1 file changed, 102 insertions(+), 24 deletions(-)

diff --git a/concise-inline-functions.md b/concise-inline-functions.md
index d3c48a1..6a7dd57 100644
--- a/concise-inline-functions.md
+++ b/concise-inline-functions.md
@@ -1,55 +1,133 @@
-# New Conditional Expression Syntax
+# Concise Inline Functions
 
 **Author**: Michael Kay, Saxonica
 
-Proposal for extensions to the unary and binary lookup operators.
+Discussion of alternative syntax for inline functions, and a proposal.
 
 
 ## Description
 
-The lookup operator in XPath 3.1 allows the RHS operand to be any of the following:
+The inline function syntax for XPath/XQuery 3.1 is clumsy and verbose. No one wants to write
 
-* NCName
-* Integer literal
-* Parenthesized expression
-* "*"
+```
+sort(//employee, function($emp as element(employee)) as xs:decimal {$emp/@salary})
+```
+when one could write, for example
+```
+sort(//employee, 【@salary】)
+```
+where 【@salary】 is some simple function representation whose concrete syntax is discussed in this proposal.
+
+The benefits of a more concise representation of inline functions are not merely cosmetic. Higher-order functions are a tough concept for people to
+get to grips with, and a simpler, more intuitive syntax will aid understanding and uptake. No one has problems with
+XPath's simple predicate syntax `//emp[@salary > 20000]`, and the reason is that no one needs to think of the
+predicate as a function or of the filter expression as containing a higher-order operator. They just learn the syntax,
+and it works.
+
+In this working paper I present three alternatives for inline functions (each with possible syntax alternatives),
+and then make recommendations.
+
+### Focus Functions
 
-This proposal adds two further options
+A focus function (like a predicate in a filter expression) is a function with a single argument, of type `item()`, which
+is bound to the context item ".". Many of the contexts where inline functions are useful (for example,
+the arguments to `fn:filter`, `fn:sort`, `fn:for-each`, and the proposed `houtil:highest`, `houtil:lowest`,
+`houtil:index-of`, `houtil:before-first`, etc.) expect functions of arity one where the argument is a single item.
+A concise notation for this special case would therefore be widely applicable. Above, I used the abstract
+syntax `【@salary】` to represent such a function. Here we will discuss possible concrete syntax.
 
-* String literal
-* Variable reference
+Whatever the concrete syntax, the semantics of `【EXPR】` are defined to be equivalent to `function($x as item()) as item()* {$x!EXPR}`,
+where `EXPR` is any expression, and `$x` is a variable name that is otherwise unused. Note in particular that, like
+regular inline functions, the function has a closure and can thus refer to variables declared outside the function itself.
 
-The effect of using a string literal or variable reference is the same as using a parenthesized expression containing the string literal or variable reference: it enables you to write `$emp?"date of birth"` in place of `$emp?("date of birth")`, and `$array?$i` in place of `$array?($i)`.
+Alternatives for concrete syntax:
 
-Note: the justification for not allowing a general expression on the RHS of the lookup operator is essentially so that an unquoted string (an NCName) can be treated differently from its usual meaning of `child::elemName`. It also allows special meaning to be attached to `*` as the right-hand operand. Integer literals were allowed without parentheses because they are clearly useful and clearly cause no parsing problems. The same argument applies to string literals and variable references; the parentheses in these cases are totally unnecessary, and all this proposal does is to remove the need for them.
+- `{ EXPR }` - Bare braces. Nice and simple, parses without difficulty. Might clash with other existing extensions (e.g. JSONiq), might reduce options available for other language extensions in the future, might lead to syntax errors being difficult to diagnose.
+- `fn{ EXPR }` - Currently implemented as an extension in Saxon. Easily recognized visually, easy to parse, low risk of conflict with other grammatical constructs; but not particularly appealing visually.
+- `{| EXPR |}` - Or various other possibilities for composite brackets.
+- `-> EXPR` - For example `sort(employee, ->@salary)`. Concise; the arrow symbol is associated with inline functions in other languages; easy to parse; visually distinctive; slightly suggestive of the semantics.
 
+I'm going to propose we adopt the arrow notation. The grammar here is that `ExprSingle` is extended to allow an additional option `FocusFunction` whose syntax is `"->" ExprSingle`.
 
 
-## Use Cases
+Other examples of this syntax:
 
-The two new options bring no new functionality but some extra conciseness. See examples below.
+* `filter(//employee, ->@salary gt 20000)`
+* `houtil:index-of(*, ->boolean(self::h1))`
 
-Note that `$array?$i` means `$array?($i)` which is not quite the same as `$array($i)` - the meaning is different if `$array` is a sequence of arrays rather than a single array. Avoiding the parentheses differentiates the two expressions more clearly.
+Like other constructs that rely on the context item, it breaks down if you want to do more complex things like joins. It's a simple syntax for simple cases.
 
-## Examples
+### Short Inline Functions
 
-Map lookup using a string-valued key containing spaces can now be written
+This proposal is for a slightly more powerful syntax that can handle multiple arguments, and arguments of any sequence type, while still being considerably more concise than the current inline function syntax.
+
+The idea is to bind the arguments of a function to the variables `$1`, `$2`, etc, thus avoiding the need for explicit declarations of arguments.
+
+For example: `map:for-each($map, ->($1 || "=" || $2))`
+
+The higher-order function `map:for-each` takes a second argument that is a function designed to process key-value pairs, so it has signature
+`function($k as xs:anyAtomicType, $v as item()*) as item()*`; this example supplies a function that concatenates the key and value
+into a single string. The parentheses around the function body are not essential, but are added for readability. I have again used the same syntax,
+a leading arrow, as the syntactic marker, but other choices are possible.
+
+This example appears in the F&O specification:
 
 ```
-$map?"date of birth"
-$map?"123-reg"
+let $dimensions := map{'height': 3, 'width': 4, 'depth': 5}
+return
+  <box>{
+    map:for-each($dimensions, function ($k, $v) { attribute {$k} {$v} })
+  }</box>
 ```
 
-Such keys arise commonly with JSON data. In XPath 3.1 the string literal must be enclosed by parentheses.
+This can now be rewritten:
 
-Similarly, map lookup and array lookup where the key is held in a variable no longer need parentheses:
+```
+let $dimensions := map{'height': 3, 'width': 4, 'depth': 5}
+return
+  <box>{
+    map:for-each($dimensions, -> attribute {$1} {$2})
+  }</box>
+```
+
+Detailed rules:
+
+* The grammar introduces a new construct `ParamRef` which is a new kind of `Primary` and which has the syntax `"$" integer-literal`.
+* `ExprSingle` is extended in the same way as for Focus Functions, above.
+* The arity of the inline function is the largest integer appearing in a `ParamRef`
+directly contained in the function body (where
+ "directly contained" means that it is not contained in a nested inline function). It is not required that all lower-numbered parameters be referenced (a function is not required to reference all its arguments).
+
+If we choose to support both focus functions and short inline functions then we could use the same syntax for both (distinguishing them by the presence or absence of a `ParamRef`), or we could use distinct syntax. If we use the same syntax, then it would not be possible to represent functions of arity zero.
+
+### Arrow Syntax with Declared Parameters
+
+The third proposal is to allow inline functions to be expressed using the arrow syntax familiar
+to C# and Java users:
 
 ```
-for $i in 1 to 50 return $array?$i
-for $k in map:keys($map1) return $map2?$k
+fn:sort(//employee, $e -> $e/@salary)
+map:for-each($map, ($k, $v) -> $k || "=" || $v)
 ```
 
-Disadvantages: the benefits are fairly cosmetic and users may feel that the benefits do not compensate for the costs of being non-standard.
+I believe that this syntax is unambiguous but we would need to verify this very carefully. Perhaps more problematically, it requires unbounded lookahead to parse it,
+which could cause problems for some implementations if they are based on parser technology with
+insufficient power. Lookahead like this also tends to result in poor diagnostics on errors,
+and causes difficulty for syntax-directed editors. Nevertheless, the final result looks
+attractive, at least to users familiar with similar notations from other languages.
+
+In these examples I haven't given any types for the parameters or the function result. The syntax could probably
+be embellished to add this, but once you do that it seems to have no benefits over the current inline function syntax.
+
+
+
+
+## Recommendation
+
+My preference is to add both "focus functions" and "short inline functions" as described above,
+using the same syntax with a leading arrow.
+
+I don't feel that the third proposal ("Arrow Syntax with Declared Parameters") gives sufficient benefits to justify the complexity.
 
 
 ## Grammar