Re: Foreign join pushdown vs EvalPlanQual - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Foreign join pushdown vs EvalPlanQual |
Date | |
Msg-id | CA+Tgmoba0+5G2USTmaW8v33cooDLFm2kZw0V3L6b9w9Q3ZTNDg@mail.gmail.com Whole thread Raw |
In response to | Re: Foreign join pushdown vs EvalPlanQual (Kouhei Kaigai <kaigai@ak.jp.nec.com>) |
Responses |
Re: Foreign join pushdown vs EvalPlanQual
|
List | pgsql-hackers |
On Fri, Oct 16, 2015 at 6:12 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > I think, it is right approach to pretend EPQ doesn't exist if scanrelid==0, > because what replaced by these ForeignScan/CustomScan node are local join > node like NestLoop. They don't have its own EPQ slot, but constructs joined- > tuple based on the underlying scan-tuple originally stored within EPQ slots. I think you've got that backwards. The fact that they don't have their own EPQ slot is the problem we need to solve. When an EPQ recheck happens, we rescan every relation in the query. Each relation needs to return 0 or 1 tuples. If it returns a tuple, the tuple it returns must be either the same tuple it previously returned, or an updated version of that tuple. But "the tuple it previously returned" does not necessarily mean the tuple it returned most recently. It means the tuple that it returned which, when passed through the rest of the plan, contributed to generate the result tuple that is being rechecked. Now, if you don't have an EPQ slot, how are you going to do this? When the EPQ machinery engages, you need to somehow get the tuple you previously returned stored someplace. And the first time thereafter that you get called by ExecProcNode, you need to return that tuple, provided that it still passes the quals. The second time you get called, and any subsequent times, you need to return an empty slot. The EPQ slot is well-suited to this task. It's got a TupleTableSlot to store the tuple you need to return, and it's got a flag indicating whether you've already returned that tuple. So you're good. But with Etsuro Fujita's patch, and I think what you have proposed has been similar, how are you going to do it? The proposal is to call the recheck method and hope for the best, but what is the recheck method going to do? Where is it going to get the previously-returned tuple? How will it know if it has already returned it during the lifetime of this EPQ check? Offhand, it looks to me like, at least in some circumstances, you're probably going to return whatever tuple you returned most recently (which has a good chance of being the right one, but not necessarily) over and over again. That's not going to fly. The bottom line is that a foreign scan that is a pushed-down join is still a *scan*, and every already-existing scan type has an EPQ slot *for a reason*. They *need* it in order to deliver the correct behavior. And foreign scans and custom scans need it to, and for the same reason. >> Again, the root of the problem here is that the EPQ machinery provides >> 1 slot per RTI, and it uses scanrelid to figure out which EPQ slot is >> applicable for a given scan node. Here, we have scanrelid == 0, so it >> gets confused. But it's not the case that a pushed-down join has NO >> scanrelid. It actually has, in effect, *multiple* scanrelids. So we >> should pick any one of those, say the lowest-numbered one, and use >> that to decide which EPQ slot to use. The attached patch shows >> roughly what I have in mind, although this is just crap code to >> demonstrate the basic idea and doesn't pretend to adjust everything >> that needs fixing. >> > One tricky point of this idea is ExecStoreTuple() in ExecScanFetch(), > because the EPQ slot picked up by get_proxy_scanrelid() contains > a tuple of base relation then it tries to put this tuple on the > TupleTableSlot initialized to save the joined-tuple. > Of course, recheckMtd is called soon, so callback will be able to > handle the request correctly. However, it is a bit unnatural to store > a tuple on incompatible TupleTableSlot. I think that the TupleTableSlot is incompatible because the dummy patch I posted only addresses half of the problem. I didn't do anything about the code that stores stuff into an EPQ slot. If that were also fixed, then the tuple which the patch retrieves from the slot would not be incompatible. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: