Re: Use generation memory context for tuplestore.c - Mailing list pgsql-hackers

From David Rowley
Subject Re: Use generation memory context for tuplestore.c
Date
Msg-id CAApHDvr72_Nur=3Zq3_R-V5JB8u-EFeApTQ2mFVDOtB59=aS=w@mail.gmail.com
Whole thread Raw
In response to Re: Use generation memory context for tuplestore.c  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List pgsql-hackers
On Sat, 4 May 2024 at 03:51, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
>
> On Fri, 3 May 2024 at 15:55, David Rowley <dgrowleyml@gmail.com> wrote:
> > master @ 8f0a97dff
> > Storage: Memory  Maximum Storage: 16577kB
> >
> > patched:
> > Storage: Memory  Maximum Storage: 8577kB
>
> Those are some impressive numbers.

This patch needed to be rebased, so updated patches are attached.

I was also reflecting on what Bruce wrote in [1] about having to parse
performance numbers from the commit messages, so I decided to adjust
the placeholder commit message I'd written to make performance numbers
more clear to Bruce, or whoever does the next major version release
notes.  That caused me to experiment with finding the best case for
this patch.  I could scale the improvement much further than I have,
but here's something I came up with that's easy to reproduce.

create table winagg (a int, b text);
insert into winagg select a,repeat('a', 1024) from generate_series(1,10000000)a;
set work_mem = '1MB';
set jit=0;
explain (analyze, timing off) select sum(l1),sum(l2) from (select
length(b) l1,length(lag(b, 800) over ()) as l2 from winagg limit
1600);

master:
Execution Time: 6585.685 ms

patched:
Execution Time: 4.159 ms

1583x faster.

I've effectively just exploited the spool_tuples() behaviour of what
it does when the tuplestore goes to disk to have it spool the entire
remainder of the partition, which is 10 million rows.  I'm just taking
a tiny portion of those with the LIMIT 1600.  I just set work_mem to
something that the patched version won't have the tuplestore spill to
disk so that spool_tuples() only spools what's needed in the patched
version. So, artificial is a word you could use, but certainly,
someone could find this performance cliff in the wild and be prevented
from falling off it by this patch.

David

[1] https://www.postgresql.org/message-id/Zk5r2XyI0BhXLF8h%40momjian.us

Attachment

pgsql-hackers by date:

Previous
From: Zaid Shabbir
Date:
Subject: Cluster forcefully removal without user input
Next
From: Daniel Gustafsson
Date:
Subject: Re: Fix possible dereference null pointer (PQprint)