Home > mailing lists

Re: Improve rowcount estimate for UNNEST(column) - Mailing list pgsql-hackers

From	Paul Jungwirth
Subject	Re: Improve rowcount estimate for UNNEST(column)
Date	December 7, 2023 09:32:07
Msg-id	166ae46a-9f77-4a72-987c-54fc3e63b3c6@illuminatedcomputing.com Whole thread Raw
In response to	Re: Improve rowcount estimate for UNNEST(column) (jian he <jian.universality@gmail.com>)
Responses	Re: Improve rowcount estimate for UNNEST(column)
List	pgsql-hackers

Tree view

Hello,

On 11/26/23 12:22, Tom Lane wrote:
 > Yes, this regression test is entirely unacceptable; the numbers will
 > not be stable enough.  Even aside from the different-settings issue,
 > you can't rely on ANALYZE deriving exactly the same stats every time.
 > Usually what we try to do is devise a query where the plan shape
 > changes because of the better estimate.

Here is a patch with an improved test. With the old "10" estimate we get a Merge Join, but now that 
the planner can see there are only ~4 elements per array, we get a Nested Loop.

It was actually hard to get a new plan, since all our regress tables' arrays have around 5 elements 
average, which isn't so far from 10. Adding a table with 1- or 2- element arrays would be more 
dramatic. So I resorted to tuning the query with `WHERE seqno <= 50`. Hopefully that's not cheating 
too much.

I thought about also adding a test where the old code *underestimates*, but then I'd have to add a 
new table with big arrays. If it's worth it let me know.

On 11/26/23 23:05, jian he wrote:
 > I found using table array_op_test test more convincing.

True, arrtest is pretty small. The new test uses array_op_test instead.

On 11/29/23 20:35, jian he wrote:
 > I did a minor change. change estimate_array_length return type to double

I'm not sure I want to change estimate_array_length from returning ints to returning doubles, since 
it's called in many places. I can see how that might give plans that are more accurate yet, so maybe 
it's worth it? It feels like it ought to be a separate patch to me, but if others want me to include 
it here please let me know.

I did add clamp_row_est since it's essentially free and maybe gives some safety.

Rebased onto d43bd090a8.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Attachment

v2-0001-Use-statitics-for-estimating-UNNEST-column-rows.patch

pgsql-hackers by date:

From: David Rowley
Date: 07 December 2023, 09:29:51
Subject: Re: catalog access with reset GUCs during parallel worker startup

From: David Rowley
Date: 07 December 2023, 09:33:13
Subject: Re: catalog access with reset GUCs during parallel worker startup

Re: Improve rowcount estimate for UNNEST(column) - Mailing list pgsql-hackers

Attachment

Previous

Next