Re: Streaming read-ready sequential scan code - Mailing list pgsql-hackers
From | Alexander Lakhin |
---|---|
Subject | Re: Streaming read-ready sequential scan code |
Date | |
Msg-id | 7e090ca5-1d3d-b4f3-9060-8ec88f444c6c@gmail.com Whole thread Raw |
In response to | Re: Streaming read-ready sequential scan code (Thomas Munro <thomas.munro@gmail.com>) |
Responses |
Re: Streaming read-ready sequential scan code
|
List | pgsql-hackers |
Hello Thomas, 27.08.2024 09:52, Thomas Munro wrote: > Here's a really simple way to see the new unfairness at the end of a > parallel scan: > > drop table if exists t; > create table t (i int); > insert into t select generate_series(1, 100000); > alter table t set (parallel_workers = 2); > set parallel_setup_cost = 0; > set parallel_leader_participation = off; > explain (analyze, buffers, verbose) select count(*) from t; > > On my machine, unpatched master shows: > > Worker 0: actual time=0.036..12.452 rows=51076 loops=1 > Buffers: shared hit=226 > Worker 1: actual time=0.037..12.003 rows=48924 loops=1 > Buffers: shared hit=217 > > The attached patch, which I'd like to push, is effectively what > Alexander tested (blocknums[16] -> blocknums[1]). There's no point in > using an array of size 1, so I've turned it into a simple variable and > deleted the relevant comments. My machine shows: > > Worker 0: actual time=0.038..12.115 rows=49946 loops=1 > Buffers: shared hit=221 > Worker 1: actual time=0.038..12.109 rows=50054 loops=1 > Buffers: shared hit=222 > > That difference may not seem huge, but other pre-existing things are > going pathologically wrong in the reported query that magnify it (see > my earlier analysis). It's an interesting problem that will require > more study (my earlier analysis missed a detail that I'll write about > separately), but it doesn't seem to be new or have easy fixes, so that > will have to be for later work. I've tried your query and could not get sustainable results, unfortunately. The following script: rm -rf "$PGDATA"; initdb -D "$PGDATA" >initdb.log 2>&1 pg_ctl -s -l server.log start cat << EOF | psql | grep 'Parallel Seq Scan' -A10 | grep 'Worker ' -A1 create table t (i int); insert into t select generate_series(1, 100000); alter table t set (parallel_workers = 2); set parallel_setup_cost = 0; set parallel_leader_participation = off; explain (analyze, buffers, verbose) select count(*) from t; EOF pg_ctl -s stop gives me unstable numbers on unpatched master: Worker 0: actual time=0.024..5.814 rows=51076 loops=1 Buffers: shared hit=226 Worker 1: actual time=0.023..5.614 rows=48924 loops=1 Buffers: shared hit=217 --- Worker 0: actual time=0.027..5.130 rows=36612 loops=1 Buffers: shared hit=162 Worker 1: actual time=0.013..5.605 rows=63388 loops=1 Buffers: shared hit=281 --- Worker 0: actual time=0.025..5.447 rows=47460 loops=1 Buffers: shared hit=210 Worker 1: actual time=0.019..5.688 rows=52540 loops=1 Buffers: shared hit=233 and also with the patch applied: Worker 0: actual time=0.012..4.486 rows=55478 loops=1 Buffers: shared hit=246 Worker 1: actual time=0.014..4.430 rows=44522 loops=1 Buffers: shared hit=197 --- Worker 0: actual time=0.013..4.269 rows=55822 loops=1 Buffers: shared hit=247 Worker 1: actual time=0.017..4.238 rows=44178 loops=1 Buffers: shared hit=196 --- Worker 0: actual time=0.014..4.974 rows=50624 loops=1 Buffers: shared hit=224 Worker 1: actual time=0.016..4.932 rows=49376 loops=1 Buffers: shared hit=219 --- Worker 0: actual time=0.012..5.459 rows=65648 loops=1 Buffers: shared hit=291 Worker 1: actual time=0.022..5.109 rows=34352 loops=1 Buffers: shared hit=152 Please correct me, if I'm doing something wrong. Best regards, Alexander
pgsql-hackers by date: