How exactly does Analyze work? - Mailing list pgsql-performance

From Richard Neill
Subject How exactly does Analyze work?
Date
Msg-id 4B0D2452.2080606@cam.ac.uk
Whole thread Raw
Responses Re: How exactly does Analyze work?  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: How exactly does Analyze work?  (Kevin Kempter <kevink@consistentstate.com>)
List pgsql-performance
Dear All,

Thanks very much for your help so far. My understanding of PG is getting
a lot better!

I wonder if I've understood analyze properly: I'm not sure I quite
understand how specific the statistics gathered actually are.


In particular, what happens in the following case:
   1. I start with have a table with 100 million rows, and column wid has
      linearly distributed values from 45-90.  (wid is indexed)

   2. I run vacuum analyze

   3. I insert about 2 million rows, all of which have the new wid of 91.

   4. I then do a select * WHERE wid = 91.

How smart is analyze? Will it actually say "well, I've never seen 91 in
this table, because all the values only go up to 90, so you'd better do
a sequential scan"?


-----

On another note, I notice that if I ever manually run vacuum or analyze,
the performance of the database drops to the point where many of the
operators get kicked out. Is there any way to run them "nice" ?

We need to maintain a response time of under 1 second all day for simple
queries (which usually run in about 22ms). But Vacuum or Analyze seem to
lock up the system for a few minutes, during which other queries block
on them, although there is still plenty of CPU spare.

-----


Also, I find that, even with the autovacuum daemon running, there was
one query last night that I had to terminate after an hour. In
desperation, I restarted postgres, let it take 15 mins to vacuum the
entire DB, and then re-ran the query (in 8 minutes)

Any ideas how I can troubleshoot this better? The database is only 30GB
in total - it should (if my intuition is right) be impossible that any
simple select (even over a modestly complex view) should take longer
than a multiple of the time required to read all the data from disk?



Thanks very much,

Richard









pgsql-performance by date:

Previous
From: Richard Neill
Date:
Subject: Re: Query times change by orders of magnitude as DB ages
Next
From: Robert Haas
Date:
Subject: Re: Best possible way to insert and get returned ids