Re: Parallel Seq Scan vs kernel read ahead - Mailing list pgsql-hackers

From: Ranier Vilela
Subject: Re: Parallel Seq Scan vs kernel read ahead
Msg-id: CAEudQApHw1Meay=Xdp0sws3femxsh4BduYo+qXWKHCZpfAz4fA@mail.gmail.com
In response to: Re: Parallel Seq Scan vs kernel read ahead (David Rowley <dgrowleyml@gmail.com>)
Responses: Re: Parallel Seq Scan vs kernel read ahead
List: pgsql-hackers
On Mon, 22 Jun 2020 at 02:53, David Rowley <dgrowleyml@gmail.com> wrote:
On Mon, 22 Jun 2020 at 16:54, David Rowley <dgrowleyml@gmail.com> wrote:
> I also tested this on an AMD machine running Ubuntu 20.04 on kernel
> version 5.4.0-37. I used the same 100GB table I mentioned in [1], but
> with the query "select * from t where a < 0;", which saves having to
> do any aggregate work.
I just wanted to add a note here that Thomas and I just discussed this
a bit offline. He recommended I try setting the kernel readahead a bit
higher.
It was set to 128kB, so I cranked it up to 2MB with:
sudo blockdev --setra 4096 /dev/nvme0n1p2
I didn't want to run the full test again as it took quite a long time,
so I just tried with 32 workers.
The first two results here are taken from the test results I just
posted 1 hour ago.
Master readahead=128kB = 89921.283 ms
v2 patch readahead=128kB = 36085.642 ms
Master readahead=2MB = 60984.905 ms
v2 patch readahead=2MB = 22611.264 ms
Notebook with i5, 8GB RAM, 256GB SSD
Windows 10 64-bit (build 2004)
MSVC 2019 64-bit
PostgreSQL head (with v2 patch)
Configuration: none (defaults)
Connection: local IPv4 (not localhost)
create table t (a int, b text);
insert into t select x,md5(x::text) from
generate_series(1,1000000*1572.7381809)x;
vacuum freeze t;
insert into t select x,md5(x::text) from
generate_series(1,1000000*1572.7381809)x;
vacuum freeze t;
set max_parallel_workers_per_gather = 0;
Time: 354211.826 ms (05:54.212)
set max_parallel_workers_per_gather = 1;
Time: 332805.773 ms (05:32.806)
set max_parallel_workers_per_gather = 2;
Time: 282566.711 ms (04:42.567)
set max_parallel_workers_per_gather = 3;
Time: 263383.945 ms (04:23.384)
set max_parallel_workers_per_gather = 4;
Time: 255728.259 ms (04:15.728)
set max_parallel_workers_per_gather = 5;
Time: 238288.720 ms (03:58.289)
set max_parallel_workers_per_gather = 6;
Time: 238647.792 ms (03:58.648)
set max_parallel_workers_per_gather = 7;
Time: 231295.763 ms (03:51.296)
set max_parallel_workers_per_gather = 8;
Time: 232502.828 ms (03:52.503)
set max_parallel_workers_per_gather = 9;
Time: 230970.604 ms (03:50.971)
set max_parallel_workers_per_gather = 10;
Time: 232104.182 ms (03:52.104)
set max_parallel_workers_per_gather = 8;
postgres=# explain select count(*) from t;
QUERY PLAN
-------------------------------------------------------------------------------------------
Finalize Aggregate (cost=15564556.43..15564556.44 rows=1 width=8)
-> Gather (cost=15564555.60..15564556.41 rows=8 width=8)
Workers Planned: 8
-> Partial Aggregate (cost=15563555.60..15563555.61 rows=1 width=8)
-> Parallel Seq Scan on t (cost=0.00..15072074.88 rows=196592288 width=0)
(5 rows)
Questions:
1. Why acquire and release the lock inside the retry: loop?
Wouldn't this be better?
    /* Grab the spinlock. */
    SpinLockAcquire(&pbscan->phs_mutex);
retry:
    /*
     * If the scan's startblock has not yet been initialized, we must do so
     * now.  If this is not a synchronized scan, we just start at block 0, but
     * if it is a synchronized scan, we must get the starting position from
     * the synchronized scan machinery.  We can't hold the spinlock while
     * doing that, though, so release the spinlock, get the information we
     * need, and retry.  If nobody else has initialized the scan in the
     * meantime, we'll fill in the value we fetched on the second time
     * through.
     */
    if (pbscan->phs_startblock == InvalidBlockNumber)
    {
        if (!pbscan->base.phs_syncscan)
            pbscan->phs_startblock = 0;
        else if (sync_startpage != InvalidBlockNumber)
            pbscan->phs_startblock = sync_startpage;
        else
        {
            sync_startpage = ss_get_location(rel, pbscan->phs_nblocks);
            goto retry;
        }
    }
    SpinLockRelease(&pbscan->phs_mutex);
That is, acquire the lock once, before the retry: label?
2. Is there any configuration setting that would improve this performance?
regards,
Ranier Vilela