Re: ANALYZE sampling is too good - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: ANALYZE sampling is too good
Date
Msg-id 52A64EDE.60105@vmware.com
Whole thread Raw
In response to Re: ANALYZE sampling is too good  (Claudio Freire <klaussfreire@gmail.com>)
Responses Re: ANALYZE sampling is too good
Re: ANALYZE sampling is too good
List pgsql-hackers
On 12/09/2013 11:56 PM, Claudio Freire wrote:
> On Mon, Dec 9, 2013 at 6:47 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> On 12/09/2013 11:35 PM, Jim Nasby wrote:
>>>
>>> On 12/8/13 1:49 PM, Heikki Linnakangas wrote:
>>>>
>>>> On 12/08/2013 08:14 PM, Greg Stark wrote:
>>>>>
>>>>> The whole accounts table is 1.2GB and contains 10 million rows. As
>>>>> expected with rows_per_block set to 1 it reads 240MB of that
>>>>> containing nearly 2 million rows (and takes nearly 20s -- doing a full
>>>>> table scan for select count(*) only takes about 5s):
>>>>
>>>> One simple thing we could do, without or in addition to changing the
>>>> algorithm, is to issue posix_fadvise() calls for the blocks we're
>>>> going to read. It should at least be possible to match the speed of a
>>>> plain sequential scan that way.
>>>
>>> Hrm... maybe it wouldn't be very hard to use async IO here either? I'm
>>> thinking it wouldn't be very hard to do the stage 2 work in the callback
>>> routine...
>>
>> Yeah, other than the fact we have no infrastructure to do asynchronous I/O
>> anywhere in the backend. If we had that, then we could easily use it here. I
>> doubt it would be much better than posix_fadvising the blocks, though.
>
> Without patches to the kernel, it is much better.
>
> posix_fadvise interferes with read-ahead, so posix_fadvise on, say,
> bitmap heap scans (or similarly sorted analyze block samples) run at 1
> IO / block, ie horrible, whereas aio can do read coalescence and
> read-ahead when the kernel thinks it'll be profitable, significantly
> increasing IOPS. I've seen everything from a 2x to 10x difference.

How did you test that, given that we don't actually have an asynchronous
I/O implementation? I don't recall any recent patches floating around
either to do that. When Greg Stark investigated this back in 2007-2008
and implemented posix_fadvise() for bitmap heap scans, posix_fadvise
certainly gave a significant speedup on the test data he used. What kind
of a data distribution gives a slowdown like that?

I took a stab at using posix_fadvise() in ANALYZE. It turned out to be
very easy, patch attached. Your mileage may vary, but I'm seeing a nice
gain from this on my laptop. Taking a 30000 page sample of a table with
717717 pages (ie. slightly larger than RAM), ANALYZE takes about 6
seconds without the patch, and less than a second with the patch, with
effective_io_concurrency=10. If anyone with a good test data set loaded
would like to test this and post some numbers, that would be great.

- Heikki

Attachment

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: ANALYZE sampling is too good
Next
From: Jim Nasby
Date:
Subject: Re: Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?