Re: ANALYZE sampling is too good - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: ANALYZE sampling is too good
Date
Msg-id 52B081D3.9090907@vmware.com
Whole thread Raw
In response to Re: ANALYZE sampling is too good  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
On 12/17/2013 12:06 AM, Jeff Janes wrote:
> On Mon, Dec 9, 2013 at 3:14 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com>wrote:
>
>>   I took a stab at using posix_fadvise() in ANALYZE. It turned out to be
>> very easy, patch attached. Your mileage may vary, but I'm seeing a nice
>> gain from this on my laptop. Taking a 30000 page sample of a table with
>> 717717 pages (ie. slightly larger than RAM), ANALYZE takes about 6 seconds
>> without the patch, and less than a second with the patch, with
>> effective_io_concurrency=10. If anyone with a good test data set loaded
>> would like to test this and post some numbers, that would be great.
>
> Performance is often chaotic near transition points, so I try to avoid data
> sets that are slightly bigger or slightly smaller than RAM (or some other
> limit).
>
> Do you know how many io channels your SSD has (or whatever the term of art
> is for SSD drives)?

No idea. It's an Intel 335.

> On a RAID with 12 spindles, analyzing pgbench_accounts at scale 1000 (13GB)
> with 4 GB of RAM goes from ~106 seconds to ~19 seconds.
>
> However, I'm not sure what problem we want to solve here.

The case that Greg Stark mentioned in the email starting this thread is 
doing a database-wide ANALYZE after an upgrade. In that use case, you 
certainly want to get it done as quickly as possible, using all the 
available resources.

> I certainly would not wish to give a background maintenance process
> permission to confiscate my entire RAID throughput for its own
> operation.

Then don't set effective_io_concurrency. If you're worried about that, 
you probably wouldn't want any other process to monopolize the RAID 
array either.

> Perhaps this could only be active for explicit analyze, and only if
> vacuum_cost_delay=0?

That would be a bit weird, because ANALYZE in general doesn't obey 
vacuum_cost_delay. Maybe it should, though...

> Perhaps there should be something like "alter background role autovac set
> ...".  Otherwise we are going to end up with an "autovacuum_*" shadow
> parameter for many of our parameters, see "autovacuum_work_mem" discussions.

Yeah, so it seems.

- Heikki



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: shared memory message queues
Next
From: Pavel Stehule
Date:
Subject: Re: patch: make_timestamp function