From Thomas Munro
Subject effective_io_concurrency's steampunk spindle maths
Date
Msg-id CA+hUKGJUw08dPs_3EUcdO6M90GnjofPYrWp4YSLaBkgYwS-AqA@mail.gmail.com
Responses Re: effective_io_concurrency's steampunk spindle maths  (Andres Freund <andres@anarazel.de>)
Re: effective_io_concurrency's steampunk spindle maths  (Michael Banck <michael.banck@credativ.de>)
List pgsql-hackers
Hello,

I was reading through some old threads[1][2][3] while trying to figure
out how to add a new GUC to control I/O prefetching for new kinds of
things[4][5], and enjoyed Simon Riggs' reference to Jules Verne in the
context of RAID spindles.

On 2 Sep 2015 14:54, "Andres Freund" <andres(at)anarazel(dot)de> wrote:
> On 2015-09-02 18:06:54 +0200, Tomas Vondra wrote:
> > Maybe the best thing we can do is just completely abandon the "number of
> > spindles" idea, and just say "number of I/O requests to prefetch". Possibly
> > with an explanation of how to estimate it (devices * queue length).
>
> I think that'd be a lot better.

+many, though I doubt I could describe how to estimate it myself,
considering cloud storage, SANs, multi-lane NVMe, etc.  You basically
have to experiment, and like most of our resource consumption limits
it's a per-backend limit anyway, so it's pretty complicated.  Either
way, I don't see how the harmonic series helps anyone.

Should we rename it?  Here are my first suggestions:

random_page_prefetch_degree
maintenance_random_page_prefetch_degree

Rationale for this naming pattern:
* "random_page" from "random_page_cost"
* leaves room for a different setting for sequential prefetching
* "degree" conveys the idea without using loaded words like "queue"
that might imply we know something about the I/O subsystem or that
it's system-wide like kernel and device queues
* "maintenance_" prefix is like other GUCs that establish (presumably
larger) limits for processes working on behalf of many user sessions

Whatever we call it, I don't think it makes sense to try to model the
details of any particular storage system.  Let's use a simple counter
of I/Os initiated but not yet known to have completed (for now, an I/O
is known to have completed once the associated pread() has returned;
perhaps something involving real asynchronous I/O completion
notification could replace that in later releases).

[1] https://www.postgresql.org/message-id/flat/CAHyXU0yaUG9R_E5%3D1gdXhD-MpWR%3DGr%3D4%3DEHFD_fRid2%2BSCQrqA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/Pine.GSO.4.64.0809220317320.20434%40westnet.com
[3] https://www.postgresql.org/message-id/flat/FDDBA24E-FF4D-4654-BA75-692B3BA71B97%40enterprisedb.com
[4] https://www.postgresql.org/message-id/flat/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
[5] https://www.postgresql.org/message-id/CA%2BTgmoZP-CTmEPZdmqEOb%2B6t_Tts2nuF7eoqxxvXEHaUoBDmsw%40mail.gmail.com


