Re: Streaming read-ready sequential scan code - Mailing list pgsql-hackers

From Alexander Lakhin
Subject Re: Streaming read-ready sequential scan code
Date
Msg-id 7e090ca5-1d3d-b4f3-9060-8ec88f444c6c@gmail.com
Whole thread Raw
In response to Re: Streaming read-ready sequential scan code  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Streaming read-ready sequential scan code
List pgsql-hackers
Hello Thomas,

27.08.2024 09:52, Thomas Munro wrote:
> Here's a really simple way to see the new unfairness at the end of a
> parallel scan:
>
> drop table if exists t;
> create table t (i int);
> insert into t select generate_series(1, 100000);
> alter table t set (parallel_workers = 2);
> set parallel_setup_cost = 0;
> set parallel_leader_participation = off;
> explain (analyze, buffers, verbose) select count(*) from t;
>
> On my machine, unpatched master shows:
>
>                       Worker 0:  actual time=0.036..12.452 rows=51076 loops=1
>                         Buffers: shared hit=226
>                       Worker 1:  actual time=0.037..12.003 rows=48924 loops=1
>                         Buffers: shared hit=217
>
> The attached patch, which I'd like to push, is effectively what
> Alexander tested (blocknums[16] -> blocknums[1]).  There's no point in
> using an array of size 1, so I've turned it into a simple variable and
> deleted the relevant comments.  My machine shows:
>
>                       Worker 0:  actual time=0.038..12.115 rows=49946 loops=1
>                         Buffers: shared hit=221
>                       Worker 1:  actual time=0.038..12.109 rows=50054 loops=1
>                         Buffers: shared hit=222
>
> That difference may not seem huge, but other pre-existing things are
> going pathologically wrong in the reported query that magnify it (see
> my earlier analysis).  It's an interesting problem that will require
> more study (my earlier analysis missed a detail that I'll write about
> separately), but it doesn't seem to be new or have easy fixes, so that
> will have to be for later work.

I've tried your query and could not get sustainable results, unfortunately.
The following script:
rm -rf "$PGDATA"; initdb -D "$PGDATA" >initdb.log 2>&1

pg_ctl -s -l server.log start

cat << EOF | psql | grep 'Parallel Seq Scan' -A10 | grep 'Worker ' -A1
create table t (i int);
insert into t select generate_series(1, 100000);
alter table t set (parallel_workers = 2);
set parallel_setup_cost = 0;
set parallel_leader_participation = off;
explain (analyze, buffers, verbose) select count(*) from t;
EOF

pg_ctl -s stop

gives me unstable numbers on unpatched master:
                      Worker 0:  actual time=0.024..5.814 rows=51076 loops=1
                        Buffers: shared hit=226
                      Worker 1:  actual time=0.023..5.614 rows=48924 loops=1
                        Buffers: shared hit=217
---
                      Worker 0:  actual time=0.027..5.130 rows=36612 loops=1
                        Buffers: shared hit=162
                      Worker 1:  actual time=0.013..5.605 rows=63388 loops=1
                        Buffers: shared hit=281
---
                      Worker 0:  actual time=0.025..5.447 rows=47460 loops=1
                        Buffers: shared hit=210
                      Worker 1:  actual time=0.019..5.688 rows=52540 loops=1
                        Buffers: shared hit=233

and also with the patch applied:
                      Worker 0:  actual time=0.012..4.486 rows=55478 loops=1
                        Buffers: shared hit=246
                      Worker 1:  actual time=0.014..4.430 rows=44522 loops=1
                        Buffers: shared hit=197
---
                      Worker 0:  actual time=0.013..4.269 rows=55822 loops=1
                        Buffers: shared hit=247
                      Worker 1:  actual time=0.017..4.238 rows=44178 loops=1
                        Buffers: shared hit=196
---
                      Worker 0:  actual time=0.014..4.974 rows=50624 loops=1
                        Buffers: shared hit=224
                      Worker 1:  actual time=0.016..4.932 rows=49376 loops=1
                        Buffers: shared hit=219
---
                      Worker 0:  actual time=0.012..5.459 rows=65648 loops=1
                        Buffers: shared hit=291
                      Worker 1:  actual time=0.022..5.109 rows=34352 loops=1
                        Buffers: shared hit=152

Please correct me, if I'm doing something wrong.

Best regards,
Alexander



pgsql-hackers by date:

Previous
From: Anthonin Bonnefoy
Date:
Subject: Re: Segfault in jit tuple deforming on arm64 due to LLVM issue
Next
From: Matthias van de Meent
Date:
Subject: Re: Parallel CREATE INDEX for GIN indexes