On Sat, 4 May 2024 at 03:51, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
>
> On Fri, 3 May 2024 at 15:55, David Rowley <dgrowleyml@gmail.com> wrote:
> > master @ 8f0a97dff
> > Storage: Memory Maximum Storage: 16577kB
> >
> > patched:
> > Storage: Memory Maximum Storage: 8577kB
>
> Those are some impressive numbers.
This patch needed to be rebased, so updated patches are attached.
I was also reflecting on what Bruce wrote in [1] about having to parse
performance numbers from the commit messages, so I decided to adjust
the placeholder commit message I'd written to make performance numbers
more clear to Bruce, or whoever does the next major version release
notes. That caused me to experiment with finding the best case for
this patch. I could scale the improvement much further than I have,
but here's something I came up with that's easy to reproduce.
create table winagg (a int, b text);
insert into winagg select a,repeat('a', 1024) from generate_series(1,10000000)a;
set work_mem = '1MB';
set jit=0;
explain (analyze, timing off) select sum(l1),sum(l2) from (select
length(b) l1,length(lag(b, 800) over ()) as l2 from winagg limit
1600);
master:
Execution Time: 6585.685 ms
patched:
Execution Time: 4.159 ms
1583x faster.
I've effectively just exploited the spool_tuples() behaviour of what
it does when the tuplestore goes to disk to have it spool the entire
remainder of the partition, which is 10 million rows. I'm just taking
a tiny portion of those with the LIMIT 1600. I just set work_mem to
something that the patched version won't have the tuplestore spill to
disk so that spool_tuples() only spools what's needed in the patched
version. So, artificial is a word you could use, but certainly,
someone could find this performance cliff in the wild and be prevented
from falling off it by this patch.
David
[1] https://www.postgresql.org/message-id/Zk5r2XyI0BhXLF8h%40momjian.us