Re: Suggestions for implementing IS DISTINCT FROM? - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Suggestions for implementing IS DISTINCT FROM? |
Date | |
Msg-id | 8698.1025022964@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Suggestions for implementing IS DISTINCT FROM? (Thomas Lockhart <lockhart@fourpalms.org>) |
List | pgsql-hackers |
Thomas Lockhart <lockhart@fourpalms.org> writes: > Let's talk about the preferred technique for doing so, especially with > row-style argument lists. I want to implement IS NULL for rows also > (actually, already did so using a transformation in gram.y -- no need to > jump on that one Tom ;), and found a comment from Joe on the NullTest > code saying that he wanted to do just that "someday". Should we honk > around the existing NullTest node to handle both lists of arguments and > single arguments, or should we have a completely different set of nodes > for handling row arguments? I'll guess the latter, but we should talk > whether that scales properly, about the preferred style for this new > kind of node, and about how to minimize performance hits which we might > see if we have a large number of new node types being handled by the > executor. I have a mild preference for minimizing the number of expression node types. If we can represent two similar kinds of expressions with one node type instead of two, we roughly halve the amount of overhead code needed. My inclination would be to divvy things up on the basis of the kinds of things to be compared, so we'd have these classes of nodes: 1. Test a single <row value expression>; this would cover <null predicate> ::= <row value expression> IS [ NOT ] NULL 2. Compare two <row value expressions>; this would cover <comparison predicate> ::= <row value expression> <comp op> <row value expression> <distinct predicate> ::= <row value expression> IS DISTINCT FROM <row value expression> (OVERLAPS is a special case that should remain separate, IMHO) 3. Compare three <row value expressions>: <between predicate> ::= <row value expression> [ NOT ] BETWEEN [ ASYMMETRIC | SYMMETRIC] <row value expression> AND <row value expression> 4. Compare a <row value expression> to a list of <row value expressions>: <in predicate> ::= <row value expression> [ NOT ] IN <left paren> <in value list> <rightparen> <in value list> ::= <row value expression> { <comma> <row value expression> }... 5. Compare a <row value expression> to the outputs of a <subquery>: <in predicate> ::= <row value expression> [ NOT ] IN <table subquery> <quantified comparison predicate> ::= <row value expression> <comp op> <quantifier> <tablesubquery> <match predicate> ::= <row value expression> MATCH [ UNIQUE ] [ SIMPLE | PARTIAL | FULL ] <table subquery> Case 5 corresponds exactly to the existing SubLink node type (although it's missing the MATCH options at the moment, and doesn't implement all the <comp op> cases it should). The spec intends all of these constructs to include the case where the <row value expression> is a single scalar expression. I'm feeling ambivalent about whether we should have the same node type or different ones for the single-value and row-value cases. For SubLink we currently handle single-value and row-value left hand sides with the same code, and that seems fine. For <comparison predicate> I definitely *don't* want to burden scalar comparisons with row-value overhead, so we need two separate representations for that. The other cases seem to be on the borderline. We don't currently have scalar-case node types for these, except for NullTest, so new node-type coding is needed anyway --- it could be either a separate scalar node type or merged with the row-value case. No strong preference here, but slight leaning towards merging. > Using the SubLink node does not seem quite right because IS DISTINCT > FROM does not seem to make sense with an embedded select as one of the > arguments, but maybe it does?? I agree it doesn't make sense. However, we should look at SubLink and see if we can't clean it up and maybe share some code. For all of the cases that compare rows, you need to have a list of the appropriate scalar comparison operators to use for each column. The way SubLink does that is pretty ugly (especially its use of a modifiable Const node). Let's see if we can't improve that before we copy it ;-). I'm thinking that the expression tree itself ought to contain just a list of Oper nodes dangling from the SubLink or row comparison node. We'd need an alternate entry point similar to ExecEvalOper/ExecMakeFunctionResult that would accept two Datum values rather than a list of subexpressions to evaluate, but that seems very doable. regards, tom lane
pgsql-hackers by date: