Re: Suggestions for implementing IS DISTINCT FROM? - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Suggestions for implementing IS DISTINCT FROM?
Date	June 25, 2002 15:36:51
Msg-id	8698.1025022964@sss.pgh.pa.us
In response to	Re: Suggestions for implementing IS DISTINCT FROM? (Thomas Lockhart <lockhart@fourpalms.org>)
List	pgsql-hackers

Thomas Lockhart <lockhart@fourpalms.org> writes:
> Let's talk about the preferred technique for doing so, especially with
> row-style argument lists. I want to implement IS NULL for rows also
> (actually, already did so using a transformation in gram.y -- no need to
> jump on that one Tom ;), and found a comment from Joe on the NullTest
> code saying that he wanted to do just that "someday". Should we honk
> around the existing NullTest node to handle both lists of arguments and
> single arguments, or should we have a completely different set of nodes
> for handling row arguments? I'll guess the latter, but we should talk
> whether that scales properly, about the preferred style for this new
> kind of node, and about how to minimize performance hits which we might
> see if we have a large number of new node types being handled by the
> executor.

I have a mild preference for minimizing the number of expression node
types.  If we can represent two similar kinds of expressions with one
node type instead of two, we roughly halve the amount of overhead code
needed.

My inclination would be to divvy things up on the basis of the kinds
of things to be compared, so we'd have these classes of nodes:

1. Test a single <row value expression>; this would cover
        <null predicate> ::=
            <row value expression> IS [ NOT ] NULL

2. Compare two <row value expressions>; this would cover
        <comparison predicate> ::=
            <row value expression> <comp op> <row value expression>
        <distinct predicate> ::=
            <row value expression> IS DISTINCT FROM <row value expression>
  (OVERLAPS is a special case that should remain separate, IMHO)

3. Compare three <row value expressions>:
        <between predicate> ::=
            <row value expression> [ NOT ] BETWEEN
              [ ASYMMETRIC | SYMMETRIC ]
              <row value expression> AND <row value expression>

4. Compare a <row value expression> to a list of <row value expressions>:
        <in predicate> ::=
            <row value expression>
              [ NOT ] IN <left paren> <in value list> <right paren>
        <in value list> ::=
            <row value expression> { <comma> <row value expression> }...

5. Compare a <row value expression> to the outputs of a <subquery>:
        <in predicate> ::=
            <row value expression>
              [ NOT ] IN <table subquery>
        <quantified comparison predicate> ::=
            <row value expression> <comp op> <quantifier> <table subquery>
        <match predicate> ::=
            <row value expression> MATCH [ UNIQUE ]
              [ SIMPLE | PARTIAL | FULL ]
              <table subquery>

Case 5 corresponds exactly to the existing SubLink node type (although
it's missing the MATCH options at the moment, and doesn't implement all
the <comp op> cases it should).
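
As a rough illustration of the five-way split above, the classes could be
tagged by how many <row value expression> operands each one takes. This is
only a sketch: the enum, its member names, and pred_row_operands() are
hypothetical and do not correspond to any existing PostgreSQL node type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for the five predicate classes listed above. */
typedef enum PredClass
{
    PRED_TEST_ONE = 1,      /* 1: IS [NOT] NULL */
    PRED_COMPARE_TWO,       /* 2: <comp op>, IS DISTINCT FROM */
    PRED_COMPARE_THREE,     /* 3: BETWEEN */
    PRED_COMPARE_LIST,      /* 4: IN (value list) */
    PRED_COMPARE_SUBQUERY   /* 5: IN / quantified / MATCH with a subquery */
} PredClass;

/*
 * Number of <row value expression> operands a predicate of class c takes.
 * n_list_items is only consulted for class 4 (the length of the IN list).
 */
size_t
pred_row_operands(PredClass c, size_t n_list_items)
{
    switch (c)
    {
        case PRED_TEST_ONE:
            return 1;
        case PRED_COMPARE_TWO:
            return 2;
        case PRED_COMPARE_THREE:
            return 3;
        case PRED_COMPARE_LIST:
            return 1 + n_list_items;
        case PRED_COMPARE_SUBQUERY:
            return 1;       /* the right-hand side is a <table subquery> */
    }
    assert(0 && "unknown predicate class");
    return 0;
}
```

Grouping by operand shape rather than by keyword is what lets one node type
serve both <comparison predicate> and <distinct predicate> in class 2.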

The spec intends all of these constructs to include the case where the
<row value expression> is a single scalar expression.  I'm feeling
ambivalent about whether we should have the same node type or different
ones for the single-value and row-value cases.  For SubLink we currently
handle single-value and row-value left hand sides with the same code,
and that seems fine.  For <comparison predicate> I definitely *don't*
want to burden scalar comparisons with row-value overhead, so we need
two separate representations for that.  The other cases seem to be on the
borderline.  We don't currently have scalar-case node types for these,
except for NullTest, so new node-type coding is needed anyway --- it
could be either a separate scalar node type or merged with the row-value
case.  No strong preference here, but slight leaning towards merging.
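
To make the "merged" option concrete, here is a minimal sketch of a null
test that treats a scalar as a row of one column. It assumes the SQL rule
that a row IS NULL only when every column is null, and IS NOT NULL only
when every column is non-null. The type and function names are invented
for illustration and are not the existing NullTest code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test kinds for a merged scalar/row null test. */
typedef enum RowNullTestType
{
    TEST_IS_NULL,
    TEST_IS_NOT_NULL
} RowNullTestType;

/*
 * isnull[i] is true when column i of the already-evaluated row is NULL.
 * A scalar argument is simply passed with ncols == 1.
 *
 * A row with a mix of NULL and non-NULL columns fails both tests, so
 * IS NOT NULL is not the negation of IS NULL once ncols > 1.
 */
bool
row_null_test(RowNullTestType type, const bool *isnull, size_t ncols)
{
    bool        want_null = (type == TEST_IS_NULL);

    assert(ncols > 0);
    for (size_t i = 0; i < ncols; i++)
    {
        if (isnull[i] != want_null)
            return false;
    }
    return true;
}
```

With this shape the scalar case pays only for a loop of length one, which
is part of why merging looks tolerable here but not for ordinary scalar
comparisons.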

> Using the SubLink node does not seem quite right because IS DISTINCT
> FROM does not seem to make sense with an embedded select as one of the
> arguments, but maybe it does??

I agree it doesn't make sense.  However, we should look at SubLink and
see if we can't clean it up and maybe share some code.  For all of the
cases that compare rows, you need to have a list of the appropriate
scalar comparison operators to use for each column.  The way SubLink
does that is pretty ugly (especially its use of a modifiable Const node).
Let's see if we can't improve that before we copy it ;-).  I'm thinking
that the expression tree itself ought to contain just a list of Oper
nodes dangling from the SubLink or row comparison node.  We'd need an
alternate entry point similar to ExecEvalOper/ExecMakeFunctionResult
that would accept two Datum values rather than a list of subexpressions
to evaluate, but that seems very doable.
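
A minimal sketch of that idea, applied to IS DISTINCT FROM: the row
comparison carries one comparison operator per column, and each operator
is invoked on two already-evaluated values instead of on subexpressions.
Everything here is an assumption made for illustration. Datum is a plain
integer stand-in, function pointers stand in for the list of Oper nodes,
and none of the names are existing executor entry points.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's Datum (assumption for this sketch). */
typedef intptr_t Datum;

/* Hypothetical per-column equality operator on two evaluated values. */
typedef bool (*ColumnEqFn) (Datum left, Datum right);

/* Example operator: integer equality. */
bool
int_eq(Datum left, Datum right)
{
    return left == right;
}

/*
 * Scalar IS DISTINCT FROM: two NULLs are not distinct, a NULL and a
 * non-NULL are distinct, otherwise fall back to the equality operator.
 */
bool
datum_is_distinct(ColumnEqFn eq, Datum l, bool lnull, Datum r, bool rnull)
{
    if (lnull || rnull)
        return lnull != rnull;
    return !eq(l, r);
}

/*
 * Row IS DISTINCT FROM: the rows are distinct if any pair of
 * corresponding columns is distinct. eq_ops[i] is the operator chosen
 * for column i.
 */
bool
row_is_distinct(const ColumnEqFn *eq_ops,
                const Datum *l, const bool *lnull,
                const Datum *r, const bool *rnull,
                size_t ncols)
{
    assert(ncols > 0);
    for (size_t i = 0; i < ncols; i++)
    {
        if (datum_is_distinct(eq_ops[i], l[i], lnull[i], r[i], rnull[i]))
            return true;
    }
    return false;
}
```

Because the operator list is fixed when the tree is built, nothing in the
tree needs to be modified at run time, which is the property the
modifiable Const node in SubLink lacks.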
        regards, tom lane
