Re: Improve statistics estimation considering GROUP-BY as a 'uniqueiser' - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Improve statistics estimation considering GROUP-BY as a 'uniqueiser'
Date
Msg-id CAPpHfduirKBHvBuy-ZAht5fxv++jCUeToHnRWxuGHmzAmcb54A@mail.gmail.com
In response to Re: Improve statistics estimation considering GROUP-BY as a 'uniqueiser'  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
On Tue, Feb 18, 2025 at 2:52 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
> On 17/2/2025 02:06, Alexander Korotkov wrote:
> > On Thu, Nov 28, 2024 at 4:39 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
> >> Here we also could count number of scanned NULLs separately in
> >> vardata_extra and use it in upper GROUP-BY estimation.
> >
> > What could be the type of vardata_extra?  And what information could
> > it store?  Yet seems too sketchy for me to understand.
> It is actually sketchy. Our estimation routines have no information
> about intermediate modifications of the data. Left-join generated NULLs
> is a good example here. So, my vague idea is to maintain that info and
> change statistical estimations somehow.
> Of course, it is out of the scope here.
> >
> > But, I think for now we should go with the original patch.  It seems
> > to be quite straightforward extension to what 4767bc8ff2 does.  I've
> > revised commit message and applied pg_indent to sources.  I'm going to
> > push this if no objections.
> Ok, I added one regression test to check that feature works properly.

Andrei, thank you.  I've pushed the patch, with some simplification of
the regression test.

------
Regards,
Alexander Korotkov
Supabase


