Re: Joins on TID - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Joins on TID
Date
Msg-id 29739.1545496280@sss.pgh.pa.us
In response to Re: Joins on TID  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Joins on TID  (Simon Riggs <simon@2ndquadrant.com>)
Re: Joins on TID  (Darafei "Komяpa" Praliaskouski <me@komzpa.net>)
List pgsql-hackers
Simon Riggs <simon@2ndquadrant.com> writes:
> On Sat, 22 Dec 2018 at 04:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> BTW, if we're to start taking joins on TID seriously, we should also
>> add the missing hash opclass for TID, so that you can do hash joins
>> when dealing with a lot of rows.

> I don't think we are trying to do TID joins more seriously, just fix a
> special case.
> The case cited requires the batches of work to be small, so nested loops
> works fine.
> Looks to me that Edmund is trying to solve the same problem. If so, this is
> the best solution.

No, I think what Edmund is on about is unrelated, except that it touches
some of the same code.  He's interested in problems like "find the last
few tuples in this table".  You can solve that today, with e.g.
"SELECT ... WHERE ctid >= '(n,1)'", but you get a stupidly inefficient
plan.  If we think that's a use-case worth supporting then it'd be
reasonable to provide less inefficient implementation(s).
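
A toy illustration of why the filter-style plan is wasteful, as a minimal
sketch and not PostgreSQL code: the heap is modelled as a list of pages, a
TID as a (block, offset) pair, and the names make_heap, filter_scan and
range_scan are hypothetical.  Both functions return the same TIDs; they
differ only in how many blocks they touch.

```python
from typing import List, Tuple

TID = Tuple[int, int]      # (block number, offset in block); offsets start at 1
Heap = List[List[str]]     # toy heap: one inner list per block


def make_heap(n_blocks: int, tuples_per_block: int) -> Heap:
    """Build a toy heap in which every block is full."""
    return [
        [f"row-{b}-{o}" for o in range(1, tuples_per_block + 1)]
        for b in range(n_blocks)
    ]


def filter_scan(heap: Heap, lower: TID) -> Tuple[List[TID], int]:
    """Read every block and test each tuple's TID against the bound."""
    matches, blocks_read = [], 0
    for b, page in enumerate(heap):
        blocks_read += 1
        for o, _row in enumerate(page, start=1):
            if (b, o) >= lower:
                matches.append((b, o))
    return matches, blocks_read


def range_scan(heap: Heap, lower: TID) -> Tuple[List[TID], int]:
    """Start at the block named in the bound and read only from there on."""
    matches, blocks_read = [], 0
    for b in range(max(lower[0], 0), len(heap)):
        blocks_read += 1
        for o, _row in enumerate(heap[b], start=1):
            if (b, o) >= lower:
                matches.append((b, o))
    return matches, blocks_read
```

With a 100-block toy heap and a bound of (98, 1), the two functions agree
on the result, but the first reads all 100 blocks and the second reads 2.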

What I'm thinking about in this thread is joins on TID, which we have only
very weak support for today --- you'll basically always wind up with a
mergejoin, which requires a full-table scan and sort of each input.  Still,
that's better than a naive nestloop, and for years we've been figuring
that that was good enough.  Several people in the other thread that
I cited felt that that isn't good enough.  But if we think it's worth
taking seriously, then IMO we need to add both parameterized scans (for
nestloop-with-inner-fetch-by-tid) and hash join, because each of those
can dominate depending on how many tuples you're joining.
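
The three strategies can be compared in a toy model.  This is a minimal
sketch under stated assumptions, not the planner's or executor's code: the
inner relation is a list of pages, the outer input is a list of
(tid, payload) pairs, and every function name here is hypothetical.  All
three return the same rows; what differs is the work done.  The merge join
scans and sorts both inputs, the nestloop does one direct fetch per outer
row, and the hash join builds one hash table and scans the heap once.

```python
from collections import defaultdict
from typing import Iterator, List, Optional, Tuple

TID = Tuple[int, int]                  # (block number, offset); offsets start at 1
Heap = List[List[str]]                 # toy heap: one inner list per block
Outer = List[Tuple[TID, str]]          # outer rows: (tid to join on, payload)
Joined = List[Tuple[TID, str, str]]    # (tid, outer payload, heap row)


def scan_heap(heap: Heap) -> Iterator[Tuple[TID, str]]:
    """Full scan: yield every (tid, row) in physical order."""
    for b, page in enumerate(heap):
        for o, row in enumerate(page, start=1):
            yield (b, o), row


def fetch_by_tid(heap: Heap, tid: TID) -> Optional[str]:
    """Direct fetch of one tuple by TID; None if the TID does not exist."""
    b, o = tid
    if 0 <= b < len(heap) and 1 <= o <= len(heap[b]):
        return heap[b][o - 1]
    return None


def merge_join(outer: Outer, heap: Heap) -> Joined:
    """Sort both inputs on TID, then merge them in a single pass."""
    left = sorted(outer)
    right = sorted(scan_heap(heap))    # requires the full-table scan
    out, i = [], 0
    for tid, payload in left:
        while i < len(right) and right[i][0] < tid:
            i += 1
        # Do not advance past a match: the next outer row may repeat the TID.
        if i < len(right) and right[i][0] == tid:
            out.append((tid, payload, right[i][1]))
    return out


def nestloop_fetch_by_tid(outer: Outer, heap: Heap) -> Joined:
    """For each outer row, fetch the inner tuple directly by its TID."""
    out = []
    for tid, payload in outer:
        row = fetch_by_tid(heap, tid)
        if row is not None:
            out.append((tid, payload, row))
    return out


def hash_join(outer: Outer, heap: Heap) -> Joined:
    """Hash the outer rows on TID, then probe once per scanned heap tuple."""
    table = defaultdict(list)
    for tid, payload in outer:
        table[tid].append(payload)
    out = []
    for tid, row in scan_heap(heap):
        for payload in table.get(tid, ()):
            out.append((tid, payload, row))
    return out
```

In this model the nestloop's cost grows with the number of outer rows and
never touches the rest of the heap, so it wins for small batches, while the
hash join pays for one full scan but avoids the sort, so it wins once the
outer side is large.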

            regards, tom lane
