Re: Streaming relation data out of order - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Streaming relation data out of order |
Date | |
Msg-id | CA+hUKG+tiy_u5fpf_c0cNHcZ5ZQOZ7A7a48SHY8nrU7AhQQP6Q@mail.gmail.com Whole thread Raw |
In response to | Re: Streaming relation data out of order (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
On Thu, Apr 10, 2025 at 7:35 AM Andres Freund <andres@anarazel.de> wrote: > 1) Increase distance by * 2 + read_size on a miss > > This would help us to increase with distance = 1, where * 2 only increases > distance by 1, just as -1 obviously reduces it by 1. > > It'd also somewhat address the over-weighing of hits compared to misses. > > 2) Only reduce distance if we have no outstanding IOs Interesting. That's similar in some ways to one of my own attempts: I had an extra thing called sustain, and that had to be counted down to zero before distance started decrementing. I didn't want the distance itself to get higher than it already did by doubling once per (combined) IO, I just wanted to defer the decay. (Names from an earlier abandoned attempt to use synthesiser terminology: attack, decay, sustain, release; cute but useless.) I tried a few different recipes, but just adding read sizes to sustain works out pretty close to your pleasingly simple rule 2: sustain until we can't see any IOs in the lookahead window. But I also tried adding a bit more for longer sustain, sustain += read_size * 2 or whatever, so you don't give up your window so easily on a run of hits, and it also achieve escape velocity from the gravity of 1. IDK, I hadn't formed any strong opinions yet and need to test more stuff, hence lack of concrete patches. Thanks, this was helpful, I'll play around with it! > > +/* > > + * Increment buffer index by n, wrapping around at queue size. > > + */ > > +static inline void > > +read_stream_advance_buffer_n(ReadStream *stream, int16 *index, int16 n) > > +{ > > + Assert(*index >= 0); > > + Assert(*index < stream->queue_size); > > + Assert(n <= stream->io_combine_limit); > > + > > + *index += n; > > + if (*index >= stream->queue_size) > > + *index -= stream->queue_size; > > + > > + Assert(*index >= 0); > > + Assert(*index < stream->queue_size); > > +} > > Not sure if that's it's possible to have a large enough queue_size, but this > doesn't look like it's safe against *index + n > PG_INT16_MAX. The value would > wrap into the negative. No, because the queue is sized such that indexes into this overflow space also fit under INT16_MAX: /* * If starting a multi-block I/O near the end of the queue, we might * temporarily need extra space for overflowing buffers before they are * moved to regular circular position. This is the maximum extra space we * could need. */ queue_overflow = io_combine_limit - 1; The patch clearly lacks a comment to explain that mystery. Will add. Thanks! > I think all these variables really ought to be unsigned. None of them ever > could be negative afaict. Maybe... I tend to default to signed variables when there isn't a good reason to prefer unsigned. But I have also been mulling a change here since: commit 06fb5612c970b3af95aca3db5a955669b07537ca Author: Thomas Munro <tmunro@postgresql.org> Date: Wed Mar 19 15:23:12 2025 +1300 Increase io_combine_limit range to 1MB. Unfortunately uint16 still needs to worry about UINT16_MAX (effective_io_concurrency = 1000 * io_combine_limit = 128 = 128,000 buffers + a few, still one bit short; before the above commit the cap was 32,000 buffers + a few, which is why int16 was enough when the code was written). Maybe we should just go directly to int, or uint32 if you insist, and stop worrying about overflow. I think we agreed that 128,000 buffers = 1000MB of in-progress I/O seemed unrealistic and 32K ought to be enough for anybody™, but really we're talking about saving a handful of bytes in struct ReadStream in exchange for a risk of bugs. Will write a patch to try that soon...
pgsql-hackers by date: