Re: ANALYZE sampling is too good - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: ANALYZE sampling is too good
Date
Msg-id CAMkU=1w6qOQadwCwVCMkGGpGptfe2R6k5Cc0JzseARt49Z5LPQ@mail.gmail.com
Whole thread Raw
In response to Re: ANALYZE sampling is too good  (Florian Pflug <fgp@phlo.org>)
Responses Re: ANALYZE sampling is too good
List pgsql-hackers
On Mon, Dec 9, 2013 at 3:14 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

 
I took a stab at using posix_fadvise() in ANALYZE. It turned out to be very easy, patch attached. Your mileage may vary, but I'm seeing a nice gain from this on my laptop. Taking a 30000 page sample of a table with 717717 pages (ie. slightly larger than RAM), ANALYZE takes about 6 seconds without the patch, and less than a second with the patch, with effective_io_concurrency=10. If anyone with a good test data set loaded would like to test this and post some numbers, that would be great.

Performance is often chaotic near transition points, so I try to avoid data sets that are slightly bigger or slightly smaller than RAM (or some other limit).

Do you know how many io channels your SSD has (or whatever the term of art is for SSD drives)?

On a RAID with 12 spindles, analyzing pgbench_accounts at scale 1000 (13GB) with 4 GB of RAM goes from ~106 seconds to ~19 seconds.

However, I'm not sure what problem we want to solve here.  I certainly would not wish to give a background maintenance process permission to confiscate my entire RAID throughput for its own operation.  Perhaps this could only be active for explicit analyze, and only if vacuum_cost_delay=0?

Perhaps there should be something like "alter background role autovac set ...".  Otherwise we are going to end up with an "autovacuum_*" shadow parameter for many of our parameters, see "autovacuum_work_mem" discussions.

Cheers,

Jeff

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: pg_rewarm status
Next
From: David Rowley
Date:
Subject: Re: [PATCH] Negative Transition Aggregate Functions (WIP)