Re: Thoughts on statistics for continuously advancing columns - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Thoughts on statistics for continuously advancing columns
Date
Msg-id 407d949e0912301042w3e1c14b6s5247587c5fc07ba5@mail.gmail.com
Whole thread Raw
In response to Re: Thoughts on statistics for continuously advancing columns  ("Joshua D. Drake" <jd@commandprompt.com>)
Responses Re: Thoughts on statistics for continuously advancing columns  ("Joshua D. Drake" <jd@commandprompt.com>)
List pgsql-hackers
On Wed, Dec 30, 2009 at 4:31 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
> On the other hand ANALYZE also:
>
> 1. Uses lots of memory
> 2. Lots of processor
> 3. Can take a long time
>
> We normally don't notice because most sets won't incur a penalty. We got a
> customer who
> has a single table that is over 1TB in size... We notice. Granted that is
> the extreme
> but it would only take a quarter of that size (which is common) to start
> seeing issues.

I'm a bit puzzled by people's repeated suggestion here that large
tables take a long time to analyze. The sample analyze takes to
generate statistics is not heavily influenced by the size of the
table. Your 1TB table should take basically the same amount of time as
a 1GB table or a 1MB table (if it wasn't already in cache).

Unless the reason why it's 1TB is that the columns are extremely wide
rather than that it has a lot of rows? Or unless you've raised the
statistics target in (a misguided*) belief that larger tables require
larger statistics targets to achieve the same level of accuracy. Or
unless when you say "ANALYZE" you're really running "VACUUM ANALYZE".

[*] except for ndistinct estimates :(

-- 
greg


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Thoughts on statistics for continuously advancing columns
Next
From: Simon Riggs
Date:
Subject: Re: Cancelling idle in transaction state