Re: Parallel Seq Scan vs kernel read ahead - Mailing list pgsql-hackers

From David Rowley
Subject Re: Parallel Seq Scan vs kernel read ahead
Date
Msg-id CAApHDvq+mXCDE61qEWHLBCOVxHQMaF1S_Z8vhU_KsvhAowg+5w@mail.gmail.com
In response to Re: Parallel Seq Scan vs kernel read ahead  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On Fri, 19 Jun 2020 at 11:34, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Fri, 19 Jun 2020 at 03:26, Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Thu, Jun 18, 2020 at 6:15 AM David Rowley <dgrowleyml@gmail.com> wrote:
> > > With a 32TB relation, the code will make the chunk size 16GB.  Perhaps
> > > I should change the code to cap that at 1GB.
> >
> > It seems pretty hard to believe there's any significant advantage to a
> > chunk size >1GB, so I would be in favor of that change.
>
> I could certainly make that change.  With the standard page size, 1GB
> is 131072 pages and a power of 2. That would change for non-standard
> page sizes, so we'd need to decide if we want to keep the chunk size a
> power of 2, or just cap it exactly at whatever number of pages 1GB is.
>
> I'm not sure how much of a difference it'll make, but I also just want
> to note that synchronous scans can mean we'll start the scan anywhere
> within the table, so capping to 1GB does not mean we read an entire
> extent. It's more likely to span 2 extents.

Here's a patch which caps the maximum chunk size at 131072 pages,
which is 1GB with the standard 8kB page size.  If someone doubles the
page size then that'll be 2GB instead of 1GB. I'm not personally
worried about that.
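
To make the arithmetic concrete, here is a minimal sketch of how a
fixed cap of 131072 pages translates into bytes for different page
sizes. It is only an illustration and is not taken from the patch; the
names MAX_CHUNK_PAGES, cap_bytes and capped_chunk_pages are
hypothetical.

```python
# Illustrative only: these names are hypothetical and do not come from the patch.
MAX_CHUNK_PAGES = 131072   # cap on the chunk size, expressed in pages
GIB = 1024 ** 3


def cap_bytes(page_size: int, max_chunk_pages: int = MAX_CHUNK_PAGES) -> int:
    """Size in bytes of the largest allowed chunk for a given page size."""
    return page_size * max_chunk_pages


def capped_chunk_pages(chunk_pages: int, max_chunk_pages: int = MAX_CHUNK_PAGES) -> int:
    """Clamp a computed chunk size (in pages) to the cap."""
    return min(chunk_pages, max_chunk_pages)


# The cap is a page count, so its size in bytes scales with the page size.
for page_size in (8192, 16384, 32768):
    print(f"page size {page_size}: cap = {cap_bytes(page_size) // GIB} GB")

# A 16GB chunk with 8kB pages is 2097152 pages, which the cap reduces to 131072.
print(capped_chunk_pages(16 * GIB // 8192))
```

Because the cap is expressed in pages rather than bytes, it stays a
power of 2 for any power-of-2 page size, at the cost of the byte size
growing with the page size.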

I tested the performance on a Windows 10 laptop using the test case from [1].

Master:

workers=0: Time: 141175.935 ms (02:21.176)
workers=1: Time: 316854.538 ms (05:16.855)
workers=2: Time: 323471.791 ms (05:23.472)
workers=3: Time: 321637.945 ms (05:21.638)
workers=4: Time: 308689.599 ms (05:08.690)
workers=5: Time: 289014.709 ms (04:49.015)
workers=6: Time: 267785.270 ms (04:27.785)
workers=7: Time: 248735.817 ms (04:08.736)

Patched:

workers=0: Time: 155985.204 ms (02:35.985)
workers=1: Time: 112238.741 ms (01:52.239)
workers=2: Time: 105861.813 ms (01:45.862)
workers=3: Time: 91874.311 ms (01:31.874)
workers=4: Time: 92538.646 ms (01:32.539)
workers=5: Time: 93012.902 ms (01:33.013)
workers=6: Time: 94269.076 ms (01:34.269)
workers=7: Time: 90858.458 ms (01:30.858)
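
For anyone who wants the ratios rather than the raw times, the sketch
below divides each Master time by the corresponding Patched time. The
timings are copied from the lists above; the speedup helper is just a
hypothetical name for this illustration.

```python
# Timings in milliseconds, copied from the Master and Patched results above.
master_ms = {
    0: 141175.935, 1: 316854.538, 2: 323471.791, 3: 321637.945,
    4: 308689.599, 5: 289014.709, 6: 267785.270, 7: 248735.817,
}
patched_ms = {
    0: 155985.204, 1: 112238.741, 2: 105861.813, 3: 91874.311,
    4: 92538.646, 5: 93012.902, 6: 94269.076, 7: 90858.458,
}


def speedup(workers: int) -> float:
    """Master time divided by Patched time; values above 1.0 favour the patch."""
    return master_ms[workers] / patched_ms[workers]


for workers in sorted(master_ms):
    print(f"workers={workers}: {speedup(workers):.2f}x")
```

A ratio below 1.0 means the patched run was slower for that worker
count, which is the case only for workers=0 in this single set of
runs.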

David

[1] https://www.postgresql.org/message-id/CAApHDvrfJfYH51_WY-iQqPw8yGR4fDoTxAQKqn%2BSa7NTKEVWtg%40mail.gmail.com

