Re: [HACKERS] Slow count(*) again... - Mailing list pgsql-performance

From Greg Smith
Subject Re: [HACKERS] Slow count(*) again...
Date
Msg-id 4D4B2DA0.3040601@2ndquadrant.com
Whole thread Raw
In response to Re: [HACKERS] Slow count(*) again...  (david@lang.hm)
List pgsql-performance
david@lang.hm wrote:
> I am making the assumption that an Analyze run only has to go over the
> data once (a seqential scan of the table if it's >> ram for example)
> and gathers stats as it goes.

And that's the part there's some confusion about here.  ANALYZE grabs a
random set of samples from the table, the number of which is guided by
the setting for default_statistics_target.  The amount of time it takes
is not proportional to the table size; it's only proportional to the
sampling size.  Adding a process whose overhead is proportional to the
table size, such as the continuous analyze idea you're proposing, is
quite likely to be a big step backwards relative to just running a
single ANALYZE after the loading is finished.

What people should be doing if concerned about multiple passes happening
is something like this:

CREATE TABLE t (s serial, i integer) WITH (autovacuum_enabled=off);
[populate table]
ANALYZE t;
ALTER TABLE t SET (autovacuum_enabled=on);

I'm not optimistic the database will ever get smart enough to recognize
bulk loading and do this sort of thing automatically, but as the
workaround is so simple it's hard to get motivated to work on trying.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


pgsql-performance by date:

Previous
From: Mark Kirkwood
Date:
Subject: Re: [HACKERS] Slow count(*) again...
Next
From: Mladen Gogala
Date:
Subject: Re: [HACKERS] Slow count(*) again...