Home > mailing lists

Re: Tuplesort merge pre-reading - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: Tuplesort merge pre-reading
Date	September 9, 2016 12:01:16
Msg-id	03323fef-6d84-e53d-be28-bbfa68cf8751@iki.fi Whole thread Raw
In response to	Re: Tuplesort merge pre-reading (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses	Re: Tuplesort merge pre-reading
List	pgsql-hackers

Tree view

(Resending with compressed files, to get below the message size limit of
the list)

On 09/09/2016 02:13 PM, Heikki Linnakangas wrote:
> On 09/08/2016 09:59 PM, Heikki Linnakangas wrote:
>> On 09/06/2016 10:26 PM, Peter Geoghegan wrote:
>>> On Tue, Sep 6, 2016 at 12:08 PM, Peter Geoghegan <pg@heroku.com> wrote:
>>>> Offhand, I would think that taken together this is very important. I'd
>>>> certainly want to see cases in the hundreds of megabytes or gigabytes
>>>> of work_mem alongside your 4MB case, even just to be able to talk
>>>> informally about this. As you know, the default work_mem value is very
>>>> conservative.
>>
>> I spent some more time polishing this up, and also added some code to
>> logtape.c, to use larger read buffers, to compensate for the fact that
>> we don't do pre-reading from tuplesort.c anymore. That should trigger
>> the OS read-ahead, and make the I/O more sequential, like was the
>> purpose of the old pre-reading code. But simpler. I haven't tested that
>> part much yet, but I plan to run some tests on larger data sets that
>> don't fit in RAM, to make the I/O effects visible.
>
> Ok, I ran a few tests with 20 GB tables. I thought this would show any
> differences in I/O behaviour, but in fact it was still completely CPU
> bound, like the tests on smaller tables I posted yesterday. I guess I
> need to point temp_tablespaces to a USB drive or something. But here we go.

I took a different tact at demonstrating the I/O pattern effects. I
added some instrumentation code to logtape.c, that prints a line to a
file whenever it reads a block, with the block number. I ran the same
query with master and with these patches, and plotted the access pattern
with gnuplot.

I'm happy with what it looks like. We are in fact getting a more
sequential access pattern with these patches, because we're not
expanding the pre-read tuples into SortTuples. Keeping densely-packed
blocks in memory, instead of SortTuples, allows caching more data overall.

Attached is the patch I used to generate these traces, the gnuplot
script, and traces from I got from sorting a 1 GB table of random
integers, with work_mem=16MB.

Note that in the big picture, what appear to be individual dots, are
actually clusters of a bunch of dots. So the access pattern is a lot
more sequential than it looks like at first glance, with or without
these patches. The zoomed-in pictures show that. If you want to inspect
these in more detail, I recommend running gnuplot in interactive mode,
so that you can zoom in and out easily.

I'm happy with the amount of testing I've done now, and the results.
Does anyone want to throw out any more test cases where there might be a
regression? If not, let's get these reviewed and committed.

- Heikki

Attachment

pgsql-hackers by date:

From: Tom Lane
Date: 09 September 2016, 11:57:48
Subject: Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

From: Michael Paquier
Date: 09 September 2016, 12:14:04
Subject: Re: [COMMITTERS] pgsql: Fix corruption of 2PC recovery with subxacts

Re: Tuplesort merge pre-reading - Mailing list pgsql-hackers

Attachment

Previous

Next