Simon Riggs <simon@2ndquadrant.com> writes:
> On Sat, 22 Dec 2018 at 04:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> BTW, if we're to start taking joins on TID seriously, we should also
>> add the missing hash opclass for TID, so that you can do hash joins
>> when dealing with a lot of rows.
> I don't think we are trying to do TID joins more seriously, just fix a
> special case.
> The case cited requires the batches of work to be small, so nested loops
> work fine.
> Looks to me that Edmund is trying to solve the same problem. If so, this is
> the best solution.

No, I think what Edmund is on about is unrelated, except that it touches
some of the same code.  He's interested in problems like "find the last
few tuples in this table".  You can solve that today, with e.g.
"SELECT ... WHERE ctid >= '(n,1)'", but you get a stupidly inefficient
plan.  If we think that's a use-case worth supporting then it'd be
reasonable to provide less inefficient implementation(s).
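
To make the inefficiency concrete, here is a toy Python sketch. It is not PostgreSQL code: the page layout and the function names (make_heap, seqscan_filter, tid_range_scan) are made up for illustration. It contrasts filtering every page against starting directly at block n, which is roughly the saving a range-aware TID scan could offer.

```python
# Toy heap: a list of pages, each page a list of (block, offset) TIDs.
# Everything here is hypothetical; it only models how many pages get read.

def make_heap(n_pages, tuples_per_page):
    return [[(blk, off) for off in range(1, tuples_per_page + 1)]
            for blk in range(n_pages)]

def seqscan_filter(heap, lower):
    """Read every page and keep TIDs >= lower (a plain filtered scan)."""
    pages_read = 0
    out = []
    for page in heap:
        pages_read += 1
        out.extend(tid for tid in page if tid >= lower)
    return out, pages_read

def tid_range_scan(heap, lower):
    """Skip straight to lower's block and read only pages from there on."""
    start_blk = lower[0]
    pages_read = 0
    out = []
    for page in heap[start_blk:]:
        pages_read += 1
        out.extend(tid for tid in page if tid >= lower)
    return out, pages_read

heap = make_heap(n_pages=1000, tuples_per_page=50)
rows_a, read_a = seqscan_filter(heap, (998, 1))
rows_b, read_b = tid_range_scan(heap, (998, 1))
```

Both functions return the same rows; only the number of pages touched differs, and that gap grows with the size of the table.
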
What I'm thinking about in this thread is joins on TID, which we have only
very weak support for today --- you'll basically always wind up with a
mergejoin, which requires a full-table scan and sort of its inputs.  Still,
that's better than a naive nestloop, and for years we've been figuring
that that was good enough.  Several people in the other thread that
I cited felt that that isn't good enough.  But if we think it's worth
taking seriously, then IMO we need to add both parameterized scans (for
nestloop-with-inner-fetch-by-tid) and hash join, because each of those
can dominate depending on how many tuples you're joining.
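
A rough Python sketch of why neither strategy wins everywhere. This is a toy model, not PostgreSQL code: the heap layout, the function names, and the "page touches" cost counter are all assumptions made for illustration, and every outer TID is assumed to point at a live tuple.

```python
# Toy model: join a list of outer TIDs against an inner heap.
# heap[blk][off - 1] holds the row stored at TID (blk, off).

def make_heap(n_pages, per_page):
    return [[f"row-{blk}-{off}" for off in range(1, per_page + 1)]
            for blk in range(n_pages)]

def nestloop_fetch_by_tid(outer_tids, heap):
    """For each outer TID, fetch the inner tuple directly by its TID."""
    out, page_fetches = [], 0
    for blk, off in outer_tids:
        page_fetches += 1                  # one random page fetch per outer row
        out.append(((blk, off), heap[blk][off - 1]))
    return out, page_fetches

def hash_join(outer_tids, heap):
    """Scan the inner heap once to build a hash table, then probe it."""
    table, page_reads = {}, 0
    for blk, page in enumerate(heap):      # sequential scan of the inner side
        page_reads += 1
        for i, row in enumerate(page):
            table[(blk, i + 1)] = row
    out = [(tid, table[tid]) for tid in outer_tids if tid in table]
    return out, page_reads

heap = make_heap(n_pages=100, per_page=50)
few = [(3, 7), (42, 1), (99, 50)]
many = [(blk, off) for blk in range(100) for off in range(1, 51)]

few_nl, few_nl_cost = nestloop_fetch_by_tid(few, heap)
few_hj, few_hj_cost = hash_join(few, heap)
many_nl, many_nl_cost = nestloop_fetch_by_tid(many, heap)
many_hj, many_hj_cost = hash_join(many, heap)
```

With a handful of outer rows the fetch-by-TID loop touches only a few pages while the hash join still reads the whole inner table. With an outer side covering every tuple, the loop pays one fetch per tuple while the hash join reads each page once. A real cost model would also weigh random against sequential I/O and the memory needed for the hash table.
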
            regards, tom lane