On Thu, Dec 11, 2014 at 12:29 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>> On Sat, Dec 6, 2014 at 10:08 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>>> select a.i, b.i from a join b on (a.i = b.i);
>>
>> I think the concern is that the inner side might be something more
>> elaborate than a plain table scan, like an aggregate or join. I might
>> be all wet, but my impression is that you can make rescanning
>> arbitrarily expensive if you work at it.
>
> I'm not sure I'm following. Let's use a function to select from b:
>
> create or replace function fb()
> returns setof b
> language plpgsql
> rows 1
> as $$
> begin
> return query select i from b;
> end;
> $$;
>
> explain (analyze, buffers, verbose)
> select a.i, b.i from a join fb() b on (a.i = b.i);
>
> I used the low row estimate to cause the planner to put this on the inner side.
>
> 16 batches
> Execution time: 1638.582 ms
>
> Now let's make it slow.
>
> create or replace function fb()
> returns setof b
> language plpgsql
> rows 1
> as $$
> begin
> perform pg_sleep(2.0);
> return query select i from b;
> end;
> $$;
> explain (analyze, buffers, verbose)
> select a.i, b.i from a join fb() b on (a.i = b.i);
>
> 16 batches
> Execution time: 3633.859 ms
>
> Under what conditions do you see the inner side get loaded into the
> hash table multiple times?
Huh, interesting. I guess I was thinking that the inner side got
rescanned for each new batch, but I guess that's not what happens.
Maybe there's no real problem here, and we just win.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company