Re: Prefetch the next tuple's memory during seqscans - Mailing list pgsql-hackers

From David Rowley
Subject Re: Prefetch the next tuple's memory during seqscans
Date
Msg-id CAApHDvqWexy_6jGmB39Vr3OqxZ_w6stAFkq52hODvwaW-19aiA@mail.gmail.com
In response to Re: Prefetch the next tuple's memory during seqscans  (Aleksander Alekseev <aleksander@timescale.com>)
List pgsql-hackers
On Tue, 1 Nov 2022 at 03:12, Aleksander Alekseev
<aleksander@timescale.com> wrote:
> I wonder if we can be sure and/or check that there is no performance
> degradation under different loads and different platforms...

Testing on different platforms would be good. Certainly, one platform isn't a
good enough indication that this is going to be useful.

As for different loads, I imagine the worst case for this will be that
the prefetched tuple is flushed from the cache by some other operation
in the plan, making the prefetch useless.

I tried the following so that we read 10 million tuples from a Sort
node before coming back and reading another tuple from the seqscan.

create table a as select 1 as a from generate_series(1,2) a;
create table b as select 1 as a from generate_series(1,10000000) a;
vacuum freeze a,b;

create extension if not exists pg_prewarm; -- pg_prewarm() lives in contrib
select pg_prewarm('a'),pg_prewarm('b');
set work_mem = '256MB';
-- 10 million sorted rows of b are read for each of the 2 rows in a;
-- the OFFSET just discards the output
select * from a, lateral (select * from b order by a) b offset 20000000;

Master (@ a9f8ca600)
Time: 1414.590 ms (00:01.415)
Time: 1373.584 ms (00:01.374)
Time: 1373.057 ms (00:01.373)
Time: 1383.033 ms (00:01.383)
Time: 1378.865 ms (00:01.379)

Master + 0001 + 0003:
Time: 1352.726 ms (00:01.353)
Time: 1348.306 ms (00:01.348)
Time: 1358.033 ms (00:01.358)
Time: 1354.348 ms (00:01.354)
Time: 1353.971 ms (00:01.354)

As I'd have expected, I see no regression.  It's hard to imagine we'd
be able to measure any regression over the overhead of an operation
that evicts everything from the cache.

FWIW, this CPU has a 256MB L3 cache and the Sort node's EXPLAIN
ANALYZE looks like:

Sort Method: quicksort  Memory: 262144kB

> Also I see 0001 and 0003 but no 0002. Just wanted to double check that
> there is no patch missing.

Perhaps I should resequence the patches to avoid confusion. I left
0002 out on purpose; it's Andres' patch to prefetch during HOT
pruning. Here I'm only interested in seeing if we can get the
pg_prefetch_mem macros into core to reduce the number of AIO patches
by one.
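
For anyone not following the AIO thread: the macro itself is tiny. As a
rough sketch of the idea only (not the actual 0001 patch, which would also
need the usual configure-time detection of the builtin), it amounts to
something like:

#if defined(__GNUC__) || defined(__clang__)
/* hint the CPU to pull the cache line at addr into cache ahead of use */
#define pg_prefetch_mem(addr)  __builtin_prefetch(addr)
#else
/* no known prefetch support: evaluate the argument and do nothing */
#define pg_prefetch_mem(addr)  ((void) (addr))
#endif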

Another thing about this is that I'm really only prefetching the first
cache line of the tuple.  All columns in the t2 table (from the
earlier email) are fixed width, so accessing the a16 column uses a
cached offset. I ran a benchmark using the same t2 table as in my
earlier email, i.e.:

-- table with 64 bytes of user columns
create table t2 as
select a, a a2, a a3, a a4, a a5, a a6, a a7, a a8,
       a a9, a a10, a a11, a a12, a a13, a a14, a a15, a a16
from generate_series(1,10000000) a;
vacuum freeze t2;

My test is to run 16 queries, changing the WHERE clause each time:
WHERE a = 0, then WHERE a2 = 0, ... WHERE a16 = 0.  I wanted to
know if prefetching only the first cache line of the tuple would be
less useful when we require evaluation of, say, the "a16" column vs
the "a" column.

The times below (in milliseconds) are what I got from a 10-second pgbench run:

column   master    patched
a        490.571   409.748
a2       428.004   430.927
a3       449.156   453.858
a4       474.945   479.73
a5       514.646   507.809
a6       517.525   519.956
a7       543.587   539.023
a8       562.718   559.387
a9       585.458   584.63
a10      609.143   604.606
a11      645.273   638.535
a12      658.848   657.377
a13      696.395   685.389
a14      702.779   716.722
a15      727.161   723.567
a16      756.186   749.396

I'm not sure how to explain why only the "a" column seems to improve
and the rest seem mostly unaffected.

David

