
Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation) - Mailing list pgsql-hackers

From	David Rowley
Subject	Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)
Date	May 8, 2017 07:39:00
Msg-id	CAKJS1f-iRis_adcymdZOguytKrapTvcRSXHPwz4Kx9=AccvvzQ@mail.gmail.com
In response to	Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation) (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses	Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation) (Thomas Munro <thomas.munro@enterprisedb.com>) Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation) (Haribabu Kommi <kommi.haribabu@gmail.com>)
List	pgsql-hackers


On 6 May 2017 at 13:44, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> In Linux, each process that opens a file gets its own 'file'
> object[1][5].  Each of those has its own 'file_ra_state'
> object[2][3], used by ondemand_readahead[4] for sequential read
> detection.  So I speculate that page-at-a-time parallel seq scan must
> look like random access to Linux.
>
> In FreeBSD the situation looks similar.  Each process that opens a
> file gets a 'file' object[8] which has members 'f_seqcount' and
> 'f_nextoff'[6].  These are used by the 'sequential_heuristics'
> function[7] which affects the ioflag which UFS/FFS uses to control
> read ahead (see ffs_read).  So I speculate that page-at-a-time
> parallel seq scan must look like random access to FreeBSD too.
>
> In both cases I suspect that if you'd inherited (or sent the file
> descriptor to the other process via obscure tricks), it would actually
> work because they'd have the same 'file' entry, but that's clearly not
> workable for md.c.
>

Interesting!

> Experimentation required...

Indeed. I do remember long discussions on this before Parallel seq
scan went in, but I don't recall if anyone checked any OS kernels to
see what they did.

To test these things out, we really need a machine with good IO
concurrency and not too much RAM. It could well be that for a suitably
large table we'd want to scan a whole 1GB extent per worker.
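
As a rough illustration of why the chunk size matters for a per-process
readahead heuristic, here is a toy model in C. It is not PostgreSQL code:
count_sequential_reads() and its parameters are hypothetical names, and it
assumes workers claim chunks in strict round-robin order, which real workers
would not do. It counts how many of one worker's reads directly follow that
worker's previous read, which is roughly what a per-'file' sequential
detector would get to see.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model (assumption: strict round-robin chunk assignment).
 *
 * The table has nblocks blocks, split into chunks of chunk_blocks
 * consecutive blocks.  Chunk number c is read by worker (c % nworkers).
 *
 * Returns how many of the given worker's reads hit the block right after
 * its previous read, and stores the worker's total reads in *total_reads.
 */
uint32_t
count_sequential_reads(uint32_t nblocks, uint32_t nworkers,
                       uint32_t chunk_blocks, uint32_t worker,
                       uint32_t *total_reads)
{
    uint32_t    sequential = 0;
    uint32_t    total = 0;
    uint32_t    prev = 0;
    int         have_prev = 0;

    assert(nworkers > 0 && chunk_blocks > 0 && worker < nworkers);

    for (uint32_t blk = 0; blk < nblocks; blk++)
    {
        uint32_t    chunk = blk / chunk_blocks;

        /* Skip blocks that belong to some other worker's chunk. */
        if (chunk % nworkers != worker)
            continue;

        /* A read counts as sequential if it continues the previous one. */
        if (have_prev && blk == prev + 1)
            sequential++;

        prev = blk;
        have_prev = 1;
        total++;
    }

    *total_reads = total;
    return sequential;
}
```

In this model, with 4 workers and 1-block chunks, none of a worker's reads
continue its previous read. With 64-block chunks, all but the first read of
each chunk do.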

I did post a patch to have heap_parallelscan_nextpage() use atomics
instead of locking over in [1], but I think doing atomics there does
not rule out also adding batching later. In fact, I think it
structures things so batching would be easier than it is today.
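
To show how an atomic counter and batching can fit together, here is a
minimal sketch using C11 atomics. It is not the patch in [1] and does not
use PostgreSQL's own atomics API. ParallelScanSketch, scan_sketch_init() and
scan_sketch_next_batch() are hypothetical names, and it assumes the scan
always starts at block 0. Each call claims a range of blocks with a single
fetch-and-add, so a batch size of 1 gives page-at-a-time behaviour and a
larger value gives batching without changing the structure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared scan state; would live in shared memory. */
typedef struct ParallelScanSketch
{
    _Atomic uint64_t next_block;    /* next block not yet claimed */
    uint64_t    nblocks;            /* total blocks in the relation */
    uint64_t    batch_blocks;       /* blocks handed out per claim */
} ParallelScanSketch;

static void
scan_sketch_init(ParallelScanSketch *scan, uint64_t nblocks,
                 uint64_t batch_blocks)
{
    assert(batch_blocks > 0);
    atomic_init(&scan->next_block, 0);
    scan->nblocks = nblocks;
    scan->batch_blocks = batch_blocks;
}

/*
 * Claim the next batch of blocks.  Returns false once the relation is
 * exhausted; otherwise sets *first and *count to the claimed range.
 *
 * The counter keeps advancing after exhaustion.  A 64-bit counter is used
 * so that wraparound is not a practical concern in this sketch.
 */
static bool
scan_sketch_next_batch(ParallelScanSketch *scan, uint64_t *first,
                       uint64_t *count)
{
    uint64_t    start = atomic_fetch_add(&scan->next_block,
                                         scan->batch_blocks);
    uint64_t    remaining;

    if (start >= scan->nblocks)
        return false;

    /* The last batch may be shorter than batch_blocks. */
    remaining = scan->nblocks - start;
    *first = start;
    *count = remaining < scan->batch_blocks ? remaining : scan->batch_blocks;
    return true;
}
```

A worker would loop on scan_sketch_next_batch() and read blocks first
through first + count - 1 before asking for more.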

[1] https://www.postgresql.org/message-id/CAKJS1f9tgsPhqBcoPjv9_KUPZvTLCZ4jy=B=bhqgaKn7cYzm-w@mail.gmail.com

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
