Re: POC: GROUP BY optimization - Mailing list pgsql-hackers

From Andrei Lepikhov
Subject Re: POC: GROUP BY optimization
Date
Msg-id f5dae142-d351-4991-a2aa-92d12dcea004@postgrespro.ru
Whole thread Raw
In response to Re: POC: GROUP BY optimization  (Richard Guo <guofenglinux@gmail.com>)
Responses Re: POC: GROUP BY optimization  (Richard Guo <guofenglinux@gmail.com>)
List pgsql-hackers
On 2/2/2024 11:06, Richard Guo wrote:
> 
> On Fri, Feb 2, 2024 at 11:32 AM Richard Guo <guofenglinux@gmail.com 
> <mailto:guofenglinux@gmail.com>> wrote:
> 
>     On Fri, Feb 2, 2024 at 10:02 AM Tom Lane <tgl@sss.pgh.pa.us
>     <mailto:tgl@sss.pgh.pa.us>> wrote:
> 
>         One of the test cases added by this commit has not been very
>         stable in the buildfarm.  Latest example is here:
> 
>         https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04
<https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04>
> 
>         and I've seen similar failures intermittently on other machines.
> 
>         I'd suggest building this test atop a table that is more stable
>         than pg_class.  You're just waving a red flag in front of a bull
>         if you expect stable statistics from that during a regression run.
>         Nor do I see any particular reason for pg_class to be especially
>         suited to the test.
> 
> 
>     Yeah, it's not a good practice to use pg_class in this place.  While
>     looking through the test cases added by this commit, I noticed some
>     other minor issues that are not great.  Such as
> 
>     * The table 'btg' is inserted with 10000 tuples, which seems a bit
>     expensive for a test.  I don't think we need such a big table to test
>     what we want.
> 
>     * I don't see why we need to manipulate GUC max_parallel_workers and
>     max_parallel_workers_per_gather.
> 
>     * I think we'd better write the tests with the keywords being all upper
>     or all lower.  A mixed use of upper and lower is not great. Such as in
> 
>          explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;
> 
>     * Some comments for the test queries are not easy to read.
> 
>     * For this statement
> 
>          CREATE INDEX idx_y_x_z ON btg(y,x,w);
> 
>     I think the index name would cause confusion.  It creates an index on
>     columns y, x and w, but the name indicates an index on y, x and z.
> 
>     I'd like to write a draft patch for the fixes.
> 
> 
> Here is the draft patch that fixes the issues I complained about in
> upthread.
I passed through the patch. Looks like it doesn't break anything. Why do 
you prefer to use count(*) in EXPLAIN instead of plain targetlist, like 
"SELECT x,y,..."?
Also, according to the test mentioned by Tom:
1. I see, PG uses IndexScan on (x,y). So, column x will be already 
sorted before the MergeJoin. Why not use Incremental Sort on (x,z,w) 
instead of full sort?
2. For memo, IMO, this test shows us the future near perspective of this 
feature: It is cheaper to use grouping order (w,z) instead of (z,w).

-- 
regards,
Andrei Lepikhov
Postgres Professional




pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: Synchronizing slots from primary to standby
Next
From: Amit Kapila
Date:
Subject: Re: Synchronizing slots from primary to standby