ANALYZE sampling is too good - Mailing list pgsql-hackers

From Greg Stark
Subject ANALYZE sampling is too good
Date
Msg-id CAM-w4HOjRbNPMW=SHjHw_Qfapcuu5Ege1tMdR0ZQU+kqX8Qeug@mail.gmail.com
Whole thread Raw
Responses Re: ANALYZE sampling is too good
Re: ANALYZE sampling is too good
Re: ANALYZE sampling is too good
Re: ANALYZE sampling is too good
List pgsql-hackers
At multiple conferences I've heard about people trying all sorts of
gymnastics to avoid ANALYZE which they expect to take too long and
consume too much I/O. This is especially a big complain after upgrades
when their new database performs poorly until the new statistics are
in and they did pg_upgrade to avoid an extended downtime and complain
about ANALYZE taking hours.

I always gave the party line that ANALYZE only takes a small
constant-sized sample so even very large tables should be very quick.
But after hearing the same story again in Heroku I looked into it a
bit further. I was kind of shocked but the numbers.

ANALYZE takes a sample of 300 * statistics_target rows. That sounds
pretty reasonable but with default_statistics_target set to 100 that's
30,000 rows. If I'm reading the code right It takes this sample by
sampling 30,000 blocks and then (if the table is large enough) taking
an average of one row per block. Each block is 8192 bytes so that
means it's reading 240MB of each table.That's a lot more than I
realized.

It means if your table is anywhere up to 240MB you're effectively
doing a full table scan and then throwing out nearly all the data
read.

Worse, my experience with the posix_fadvise benchmarking is that on
spinning media reading one out of every 16 blocks takes about the same
time as reading them all. Presumably this is because the seek time
between tracks dominates and reading one out of every 16 blocks is
still reading every track. So in fact if your table is up to about
3-4G ANALYZE is still effectively going to do a full table scan, at
least as far as I/O time goes.

The current algorithm seems like it was designed with a 100G+ table in
mind but the consequences on the more common 100M-100G tables weren't
really considered. Consider what this means for partitioned tables. If
they partition their terabyte table into 10 partitions ANALYZE will
suddenly want to use 10x as much I/O which seems like a perverse
consequence.

I'm not sure I have a prescription but my general feeling is that
we're spending an awful lot of resources going after a statistically
valid sample when we can spend a lot less resources and get something
90% as good. Or if we're really going to read that much data that we
might as well use more of the rows we find.

-- 
greg



pgsql-hackers by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Why we are going to have to go DirectIO
Next
From: "Joshua D. Drake"
Date:
Subject: Re: Why we are going to have to go DirectIO