Re: ANALYZE sampling is too good - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: ANALYZE sampling is too good
Date
Msg-id CA+U5nMLW0yZ3JyuLc5=gcBj4RtV-BdC4zewmwaPu-tFABXaBqA@mail.gmail.com
In response to Re: ANALYZE sampling is too good  (Greg Stark <stark@mit.edu>)
List pgsql-hackers

On 11 December 2013 12:08, Greg Stark <stark@mit.edu> wrote:

> So there is something clearly wonky in the histogram stats that's
> affected by the distribution of the sample.

...in the case where the avg width changes in a consistent manner
across the table.

Well spotted.

ISTM we can have a specific cross-check for bias of that nature in the
sample. We just calculate the avg width per block and then check for
correlation of the avg width against block number. If we find bias, we
can calculate how many extra blocks to sample and from where.
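
A rough sketch of that cross-check, in Python rather than the C that
ANALYZE is written in, and not taken from any patch. It assumes the sample
is available as (block_number, row_width) pairs and uses a plain Pearson
correlation; the function names are hypothetical, and what counts as
"bias" (the cutoff on the coefficient) is left open because nothing above
specifies it.

```python
from collections import defaultdict
from math import sqrt


def avg_width_per_block(sample):
    """Average row width per block.

    sample: iterable of (block_number, row_width_bytes) pairs
    (a hypothetical representation of the sampled rows).
    """
    totals = defaultdict(lambda: [0, 0])  # block -> [sum of widths, row count]
    for block, width in sample:
        totals[block][0] += width
        totals[block][1] += 1
    return {block: s / n for block, (s, n) in totals.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def width_block_correlation(sample):
    """Correlation of per-block average width against block number."""
    per_block = avg_width_per_block(sample)
    blocks = sorted(per_block)
    widths = [per_block[b] for b in blocks]
    return pearson(blocks, widths)
```

A coefficient near +1 or -1 would suggest that width drifts consistently
across the table, which is the case described above; a value near 0
would not. How many extra blocks to sample, and from where, is not
covered by this sketch.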

There may be other biases also, so we can check for them and respond
accordingly.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services