Re: row estimation off the mark when generate_series calls are involved - Mailing list pgsql-hackers

From Nikhil Sontakke
Subject Re: row estimation off the mark when generate_series calls are involved
Date
Msg-id w2pa301bfd91004192347xd1396f95qe4caa934b1a97d77@mail.gmail.com
Whole thread Raw
In response to Re: row estimation off the mark when generate_series calls are involved  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

> I'm not very impressed with that patch, even discounting the
> sum-vs-product thinko.  Trawling the tlist for SRFs will add a significant
> number of cycles, to modify the rowcount in a way that is practically
> always wrong (since the estimates for SRF output rowcounts are so bad).

It's trawling *just* the tlist and not the entire query. Given that
rowcount estimates are way off the mark for SRF targets, it seems
worth the cycles. It's not like sane queries will have very many tlist
entries to make this a costly affair IMO.

> What's more, most of the time we don't really care, because the
> top-level rowcount estimate is of no interest for future planning
> purposes.  It might be worth doing something about this inside
> sub-selects, but not till we have less-bogus SRF rowcount estimates.
>

But SRF rowcount estimates will generally be bogus, no? Unless we can
come up with a mechanism to gather plan-time arguments based
statistics mechanism, that will still be the case.

To mention Itagaki san's example again:

INSERT INTO public.x SELECT generate_series(1,1000);

If we have valid row estimates we might use a proper plan to maybe
materialize the SELECT portion into a temp table for example and
insert into the target. The performance might be much better in that
case..

Regards,
Nikhils
--
http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Thoughts on pg_hba.conf rejection
Next
From: Takahiro Itagaki
Date:
Subject: Re: [GENERAL] trouble with to_char('L')