Re: Joins on TID - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Joins on TID
Date
Msg-id 12958.1546376438@sss.pgh.pa.us
Whole thread Raw
In response to Re: Joins on TID  (Edmund Horner <ejrh00@gmail.com>)
List pgsql-hackers
Edmund Horner <ejrh00@gmail.com> writes:
> On Sat, 22 Dec 2018 at 12:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I decided to spend an afternoon seeing exactly how much work would be
>> needed to support parameterized TID scans, ie nestloop-with-inner-TID-
>> scan joins, as has been speculated about before, most recently here:
>> ...

> It seems good, and I can see you've committed it now.  (I should have
> commented sooner, but it's the big summer holiday period here, which
> means I have plenty of time to work on PostgreSQL, but none of my
> usual resources.  In any case, I was going to say "this looks useful
> and not too complicated, please go ahead".)

OK.

> I did notice that multiple tidquals are no longer removed from scan_clauses:
> EXPLAIN SELECT * FROM pg_class WHERE ctid = '(1,1)' OR ctid = '(2,2)';
>  Tid Scan on pg_class  (cost=0.01..8.03 rows=2 width=265)
>    TID Cond: ((ctid = '(1,1)'::tid) OR (ctid = '(2,2)'::tid))
>    Filter: ((ctid = '(1,1)'::tid) OR (ctid = '(2,2)'::tid))

I fixed that in the committed version, I believe.  (I'd been
overoptimistic about whether logic could be removed from
create_tidscan_plan.)

>> I haven't really looked at how much of a merge problem there'll be
>> with Edmund Horner's work for TID range scans.  My feeling about it
>> is that we might be best off treating that as a totally separate
>> code path, because the requirements are significantly different (for
>> instance, a range scan needs AND semantics not OR semantics for the
>> list of quals to apply).

> Well, I guess it's up to me to merge it.  I can't quite see which
> parts we'd use a separate code path for.  Can you elaborate?

The thing that's bothering me is something I hadn't really focused on
before, but which looms large now that I've thought about it: the
TID-quals list of a TidPath or TidScan has OR semantics, viz it can
directly represent

    ctid = this OR ctid = that OR ctid = the_other

as a list of tideq OpExprs.  But what you want for a range scan on
TID is implicit-AND, because you might have either a one-sided
condition, say

    ctid >= this

or a range condition

    ctid >= this AND ctid <= that

I see that what you've done to make this sort-of work in the existing
patch is to insist that a range scan have just one member at the OR-d list
level and that has to be an AND'ed sublist, but TBH I think that's a mess;
for instance I wonder whether the code works correctly if faced with cases
like

    ctid >= this OR ctid <= that

I don't think it's at all practical to have tidpath.c dealing with both
cases in one scan of the quals --- even if you can make it work, it'll be
unreasonably complicated and hard to understand.  I'd be inclined to just
have it thumb through the restrictinfo or joininfo list a second time,
looking for inequalities, and build a path for that case separately.

I suspect that on the whole, you'd be better off treating the range-scan
case as completely separate, with a different Path type and different
Plan type too (ie, separate executor support).  Yes, this would involve
some duplication of support code, but I think the end result would be
a lot cleaner and easier to understand.

            regards, tom lane


pgsql-hackers by date:

Previous
From: Nikolay Shaplov
Date:
Subject: Re: [PATCH] get rid of StdRdOptions, use individual binary reloptions representation for each relation kind instead
Next
From: Nikolay Shaplov
Date:
Subject: Re: [PATCH][PROPOSAL] Add enum releation option type