Thread: Changing optimizations

Changing optimizations

From: Philip Molter
If someone could, please explain the following.  Here is an EXPLAIN of
a query taken when database performance was less than optimal just
after a VACUUM ANALYZE had been performed on all tables in the
database:

Aggregate  (cost=16941.27..17273.88 rows=370 width=350)
 ->  Group  (cost=16941.27..17246.16 rows=3696 width=350)
  ->  Sort  (cost=16941.27..16941.27 rows=3696 width=350)
   ->  Hash Join  (cost=15021.96..16722.27 rows=3696 width=350)
    ->  Hash Join  (cost=15017.87..16561.04 rows=3696 width=314)
     ->  Hash Join  (cost=15012.78..16408.03 rows=3696 width=286)
      ->  Hash Join  (cost=14593.72..15663.68 rows=3696 width=250)
       ->  Merge Join  (cost=14128.72..14933.33 rows=3696 width=244)
        ->  Merge Join  (cost=14128.72..14787.09 rows=3696 width=236)
         ->  Merge Join  (cost=14128.72..14181.88 rows=3696 width=222)
          ->  Sort  (cost=14128.72..14128.72 rows=3696 width=194)
           ->  Hash Join  (cost=5.95..13909.72 rows=3696 width=194)
            ->  Nested Loop  (cost=0.00..13720.48 rows=4576 width=98)
             ->  Index Scan using percepttype_pkey on percepttype pt
                 (cost=0.00..6.42 rows=3 width=66)
**           ->  Index Scan using ptid_p_index on percept p
                 (cost=0.00..4716.13 rows=2484 width=32)
            ->  Hash  (cost=5.90..5.90 rows=21 width=96)
             ->  Index Scan using active_h_index on hosts h
                 (cost=0.00..5.90 rows=21 width=96)
          ->  Index Scan using perceptthreshold_pkey on perceptthreshold pth
              (cost=0.00..6.45 rows=41 width=28)
>>       ->  Index Scan using statesummary_pkey on statesummary sl
             (cost=0.00..441.51 rows=9401 width=14)
        ->  Index Scan using perceptdepcache_pkey on perceptdepcache pdc
            (cost=0.00..84.60 rows=1236 width=8)
       ->  Hash  (cost=441.51..441.51 rows=9401 width=6)
        ->  Index Scan using statesummary_pkey on statesummary sd
            (cost=0.00..441.51 rows=9401 width=6)
      ->  Hash  (cost=402.04..402.04 rows=6806 width=36)
       ->  Index Scan using perceptlogfield_pkey on perceptlogfield plf
           (cost=0.00..402.04 rows=6806 width=36)
     ->  Hash  (cost=5.08..5.08 rows=7 width=28)
      ->  Index Scan using percepttypethreshold_pkey on percepttypethreshold ptt
          (cost=0.00..5.08 rows=7 width=28)
    ->  Hash  (cost=4.07..4.07 rows=6 width=36)
     ->  Index Scan using warehousefieldtype_pkey on warehousefieldtype wft
         (cost=0.00..4.07 rows=6 width=36)

In this case, I'm specifically interested in why the index scan on the
indicated line ('>>') needs to search through 9401 rows when that table
has only 9386 rows (and, hopefully, only 9386 index entries) in it.
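
For reference, the planner's row estimate here comes from the statistics
recorded in pg_class at the last VACUUM, so a small drift (9401 estimated
vs. 9386 actual) can be checked directly; a minimal sketch, assuming the
table names above:

  -- reltuples/relpages are whatever the last VACUUM recorded
  SELECT relname, reltuples, relpages
    FROM pg_class
   WHERE relname = 'statesummary';

  -- compare against the live row count
  SELECT count(*) FROM statesummary;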

Here is a second EXPLAIN on the same query.  Just before this query,
though, I dropped and recreated the indices on the 'percept' table (in
this query, apparently, only the 'ptid_p_index' index is being used).
Now, the system is performing at the level I want:

Aggregate  (cost=949.14..958.93 rows=11 width=350)
 ->  Group  (cost=949.14..958.11 rows=109 width=350)
  ->  Sort  (cost=949.14..949.14 rows=109 width=350)
   ->  Hash Join  (cost=164.10..945.46 rows=109 width=350)
    ->  Hash Join  (cost=160.01..936.66 rows=109 width=314)
     ->  Nested Loop  (cost=154.92..927.13 rows=109 width=286)
      ->  Nested Loop  (cost=154.92..706.28 rows=109 width=250)
       ->  Merge Join  (cost=154.92..485.46 rows=109 width=244)
        ->  Nested Loop  (cost=154.92..384.06 rows=109 width=236)
         ->  Merge Join  (cost=154.92..163.24 rows=109 width=222)
          ->  Sort  (cost=154.92..154.92 rows=109 width=194)
           ->  Hash Join  (cost=5.95..151.24 rows=109 width=194)
            ->  Nested Loop  (cost=0.00..139.64 rows=135 width=98)
             ->  Index Scan using percepttype_pkey on percepttype pt
                 (cost=0.00..6.42 rows=3 width=66)
**           ->  Index Scan using ptid_p_index on percept p
                 (cost=0.00..45.88 rows=19 width=32)
            ->  Hash  (cost=5.90..5.90 rows=21 width=96)
             ->  Index Scan using active_h_index on hosts h
                 (cost=0.00..5.90 rows=21 width=96)
          ->  Index Scan using perceptthreshold_pkey on perceptthreshold pth
              (cost=0.00..6.45 rows=41 width=28)
>>       ->  Index Scan using statesummary_pkey on statesummary sl
             (cost=0.00..2.02 rows=1 width=14)
        ->  Index Scan using perceptdepcache_pkey on perceptdepcache pdc
            (cost=0.00..84.60 rows=1236 width=8)
       ->  Index Scan using statesummary_pkey on statesummary sd
           (cost=0.00..2.02 rows=1 width=6)
     ->  Index Scan using perceptlogfield_pkey on perceptlogfield plf
         (cost=0.00..2.02 rows=1 width=36)
    ->  Hash  (cost=5.08..5.08 rows=7 width=28)
     ->  Index Scan using percepttypethreshold_pkey on percepttypethreshold ptt
         (cost=0.00..5.08 rows=7 width=28)
   ->  Hash  (cost=4.07..4.07 rows=6 width=36)
    ->  Index Scan using warehousefieldtype_pkey on warehousefieldtype wft
        (cost=0.00..4.07 rows=6 width=36)

Here, I can see the index is being used ('**'), and because of that,
the stateSummary primary key is being used correctly too (only one row
should need to be matched).  Why does the ptid_p_index stop being used,
and why do I need to stop activity in my database, recreate the index,
and restart that activity for it to begin working again?  It gets to be
a real pain in the ass to have to stop a system simply because the
database appears to stop recognizing its indices.  While yes, this is a
deep join, it's not that complicated (everything joins pretty much on
1-to-1 indices), and that particular index isn't changing *at all* (no
UPDATEs, no DELETEs, no INSERTs, nothing).
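
For reference, the drop-and-recreate workaround described above amounts
to something like this (the index definition matches the percept schema
posted later in the thread):

  -- recreating the index is what restores the good plan here
  DROP INDEX ptid_p_index;
  CREATE INDEX ptid_p_index ON percept (ptid);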


* Philip Molter
* DataFoundry.net
* http://www.datafoundry.net/
* philip@datafoundry.net

Re: Changing optimizations

From: Tom Lane
Philip Molter <philip@datafoundry.net> writes:
> If someone could, please explain the following.

Difficult to do, when you haven't shown us the query nor the table
schemas.

            regards, tom lane

Re: Changing optimizations

From: Philip Molter
On Thu, Jul 05, 2001 at 11:19:01AM -0400, Tom Lane wrote:
: Philip Molter <philip@datafoundry.net> writes:
: > If someone could, please explain the following.
:
: Difficult to do, when you haven't shown us the query nor the table
: schemas.

Well, I hesitated to do it since the table definitions are so long.

Here's the query (UNIX_TIMESTAMP() and IFNULL() have been defined as
stand-in functions for the time being):

       SELECT h.hid, h.sysname, h.snmpaddr, h.snmpcomm, h.hostgroupid,
             h.active, h.lowalert, h.os, h.osrev, h.platform,
             UNIX_TIMESTAMP( p.nextrun ) AS nextrun, p.pid, p.pkdwid as dwid,
             p.deleted, pt.units, pt.tablename, pt.classname, pt.description,
             pt.logtype, IFNULL( sl.state, 0 ) AS state,
             UNIX_TIMESTAMP( sl.start_time ) AS sctstamp,
             sl.state AS log_state,
             IFNULL( p.runinterval, pt.runinterval ) AS runinterval,
             IFNULL( SUM( sd.state >> 1 ), 0 ) AS dephold,
             IFNULL( pth.d1_time, ptt.d1_time ) AS d1_time,
             IFNULL( pth.d1_min, ptt.d1_min ) AS d1_min,
             IFNULL( pth.d1_max, ptt.d1_max ) AS d1_max,
             IFNULL( pth.d2_time, ptt.d2_time ) AS d2_time,
             IFNULL( pth.d2_min, ptt.d2_min ) AS d2_min,
             IFNULL( pth.d2_max, ptt.d2_max ) AS d2_max,
             p.running, plf.logfield, plf.min, plf.max,
             wft.maptype AS logfield_type
        FROM percept p
             INNER JOIN perceptType pt ON pt.ptid=p.ptid
               AND pt.runinterval IS NOT NULL
             INNER JOIN hosts h ON h.hid=p.hid
             LEFT JOIN perceptThreshold pth ON pth.pid=p.pid
             LEFT JOIN stateSummary sl ON sl.pid=p.pid
             LEFT JOIN perceptDepCache pdc ON pdc.pid=p.pid
             LEFT JOIN stateSummary sd ON pdc.dep_pid=sd.pid
             LEFT JOIN perceptLogField plf ON p.pid=plf.pid
             LEFT JOIN perceptTypeThreshold ptt ON p.ptid=ptt.ptid
             LEFT JOIN warehouseFieldType wft ON plf.type=wft.fieldtype AND
               pt.logtype=wft.logtype
       WHERE p.deleted=0 AND UNIX_TIMESTAMP( p.nextrun )<=NOW() AND
             pt.runinterval IS NOT NULL AND p.running=0 AND h.active=1
    GROUP BY h.hid, h.sysname, h.snmpaddr, h.snmpcomm, h.hostgroupid,
             h.active, h.lowalert, h.os, h.osrev, h.platform,
             p.nextrun, p.pid, p.deleted, pt.units, pt.tablename,
             pt.classname, pt.description, sl.state, sl.start_time,
             p.running, plf.logfield, plf.min, plf.max,
             pt.logtype, p.pkdwid, wft.maptype,
             IFNULL( p.runinterval, pt.runinterval ),
             IFNULL( pth.d1_time, ptt.d1_time ),
             IFNULL( pth.d1_min, ptt.d1_min ),
             IFNULL( pth.d1_max, ptt.d1_max ),
             IFNULL( pth.d2_time, ptt.d2_time ),
             IFNULL( pth.d2_min, ptt.d2_min ),
             IFNULL( pth.d2_max, ptt.d2_max )
      HAVING SUM( sd.state >> 1 ) = 0 OR SUM( sd.state >> 1 ) IS NULL

Every field being used in the JOINs and the WHERE clause is indexed (or
a primary key).
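
For context, those stand-in functions aren't shown in this thread; a
hypothetical sketch of what they might look like on 7.x, built on
COALESCE and date_part:

  -- hypothetical MySQL-compatibility shims; not the actual definitions
  CREATE FUNCTION IFNULL(integer, integer) RETURNS integer
      AS 'SELECT COALESCE($1, $2)' LANGUAGE 'sql';
  CREATE FUNCTION UNIX_TIMESTAMP(timestamp) RETURNS integer
      AS 'SELECT date_part(''epoch'', $1)::integer' LANGUAGE 'sql';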

I can post the schemas for all the tables if it's really necessary, but
since the only two tables that seem to be affected are percept and
stateSummary, I'll post those for now.

  CREATE SEQUENCE percept_pid_seq;

  CREATE TABLE percept (
    pid INTEGER DEFAULT nextval('percept_pid_seq'),
    hid INTEGER NOT NULL DEFAULT 0,
    pkdwid INTEGER NOT NULL DEFAULT 1,
    ptid INTEGER NOT NULL DEFAULT 0,
    nextrun TIMESTAMP NOT NULL DEFAULT 'epoch',
    runinterval INTEGER DEFAULT NULL,
    pkwid INTEGER DEFAULT NULL,
    deleted SMALLINT NOT NULL DEFAULT 0,
    running SMALLINT NOT NULL DEFAULT 0,
    PRIMARY KEY (pid)
  );

  CREATE INDEX deleted_p_index ON percept (deleted);
  CREATE INDEX hid_p_index ON percept (hid);
  CREATE INDEX ptid_p_index ON percept (ptid);
  CREATE INDEX nextrun_p_index ON percept (nextrun);
  CREATE INDEX running_p_index ON percept (running);

  CREATE TABLE stateSummary (
    pid INTEGER NOT NULL DEFAULT 0,
    state SMALLINT DEFAULT NULL,
    start_time TIMESTAMP NOT NULL,
    PRIMARY KEY (pid)
  );

* Philip Molter
* DataFoundry.net
* http://www.datafoundry.net/
* philip@datafoundry.net

Re: Changing optimizations

From: Tom Lane
Hmm.  Nothing out of the ordinary about the table schemas.  The relevant
bits of the query seem to be

>         FROM percept p
>              INNER JOIN perceptType pt ON pt.ptid=p.ptid
>                AND pt.runinterval IS NOT NULL

>        WHERE p.deleted=0 AND UNIX_TIMESTAMP( p.nextrun )<=NOW() AND
>              pt.runinterval IS NOT NULL AND p.running=0 AND h.active=1

What seems to be happening is that as you repeatedly VACUUM ANALYZE,
the statistics shift causing a shift in the estimated number of percept
rows that match the WHERE clauses.  As that estimate rises, you get a
change in the selected plan types for the later joins, in a direction
that isn't favorable if the correct number of rows is small.
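
A minimal way to watch that estimate move, assuming the percept table
posted earlier, is to re-run a cut-down EXPLAIN after each VACUUM and
compare the rows= figures:

  VACUUM ANALYZE percept;
  EXPLAIN SELECT * FROM percept WHERE deleted = 0 AND running = 0;
  -- repeat after the next VACUUM ANALYZE and compare the estimates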

But it seems odd that you'd get a factor-of-100 change in that estimate
if the true underlying data distribution isn't changing much.  Could
you keep track of the results of these two queries:

select * from pg_class where relname = 'percept';

select attname,attdispersion,s.*
from pg_statistic s, pg_attribute a, pg_class c
where starelid = c.oid and attrelid = c.oid and staattnum = attnum
and relname = 'percept';

(this is for 7.1, s/attdispersion/attdisbursion/ if you're using 7.0)
and see how they change between the state where you're getting a good
plan and the state where you're getting a not-so-good plan?

Another possibility is that what looks to be the same bottom-level join
plan isn't really the same, but is using different restriction/join
clauses for some weird reason.  It would be good to look at EXPLAIN
VERBOSE output not just EXPLAIN output for the two plans, just to rule
that out.
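
As a minimal illustration of the syntax (in practice it's the full query
posted earlier that needs the VERBOSE treatment):

  EXPLAIN VERBOSE
  SELECT p.pid FROM percept p WHERE p.deleted = 0 AND p.running = 0;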

            regards, tom lane

Re: Changing optimizations

From: Philip Molter
On Thu, Jul 05, 2001 at 12:05:52PM -0400, Tom Lane wrote:
: Hmm.  Nothing out of the ordinary about the table schemas.  The relevant
: bits of the query seem to be
:
: >         FROM percept p
: >              INNER JOIN perceptType pt ON pt.ptid=p.ptid
: >                AND pt.runinterval IS NOT NULL
:
: >        WHERE p.deleted=0 AND UNIX_TIMESTAMP( p.nextrun )<=NOW() AND
: >              pt.runinterval IS NOT NULL AND p.running=0 AND h.active=1
:
: What seems to be happening is that as you repeatedly VACUUM ANALYZE,
: the statistics shift causing a shift in the estimated number of percept
: rows that match the WHERE clauses.  As that estimate rises, you get a
: change in the selected plan types for the later joins, in a direction
: that isn't favorable if the correct number of rows is small.

Well, the row count is always small (or should be; fewer than 15 rows
per return).  p.nextrun constantly adjusts to a time within the next 5
minutes and p.running constantly shifts between one and zero, but the
other fields (deleted, runinterval, active, etc.) stay the same.  Now,
it could be that it's having a problem with the running or nextrun
columns, but that should only affect approximately 750 of the rows.
Certainly the query optimizer should never have to worry about the
other 8500.  Furthermore, at any moment, the number of rows matching
any of those fields should be about the same.  If it weren't, I'd see
problems elsewhere.

: But it seems odd that you'd get a factor-of-100 change in that estimate
: if the true underlying data distribution isn't changing much.  Could
: you keep track of the results of these two queries:
:
: select * from pg_class where relname = 'percept';
:
: select attname,attdispersion,s.*
: from pg_statistic s, pg_attribute a, pg_class c
: where starelid = c.oid and attrelid = c.oid and staattnum = attnum
: and relname = 'percept';

Is there anything I should specifically be looking for?  I'd assume
that attdispersion should stay relatively constant if the data
distribution doesn't change much, but I'm not really familiar with how
the system tables factor into the optimizations.

: Another possibility is that what looks to be the same bottom-level join
: plan isn't really the same, but is using different restriction/join
: clauses for some weird reason.  It would be good to look at EXPLAIN
: VERBOSE output not just EXPLAIN output for the two plans, just to rule
: that out.

My brain hurts when I do that (literally thousands of lines of output).

Thanks for the help,
Philip

* Philip Molter
* DataFoundry.net
* http://www.datafoundry.net/
* philip@datafoundry.net

Re: Changing optimizations

From: Tom Lane
Philip Molter <philip@datafoundry.net> writes:
> Is there anything I should specifically be looking for?

The only guess I have at the moment is that maybe the stacommonval
(most common value) of the 'deleted' or 'running' column matches the
query in one state and not the other.  But please, just send along the
output, even if you don't understand what it means.

Come to think of it, the same stats for perceptType would be interesting
to look at too.
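
A narrowed variant of the pg_statistic query posted earlier, restricted
to just the two columns in question (same join, plus an attname filter):

  SELECT a.attname, s.*
    FROM pg_statistic s, pg_attribute a, pg_class c
   WHERE s.starelid = c.oid
     AND a.attrelid = c.oid
     AND s.staattnum = a.attnum
     AND c.relname = 'percept'
     AND a.attname IN ('deleted', 'running');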

            regards, tom lane

Re: Changing optimizations

From: Philip Molter
On Thu, Jul 05, 2001 at 02:24:07PM -0400, Tom Lane wrote:
: Philip Molter <philip@datafoundry.net> writes:
: > Is there anything I should specifically be looking for?
:
: The only guess I have at the moment is that maybe the stacommonval
: (most common value) of the 'deleted' or 'running' column matches the
: query in one state and not the other.  But please, just send along the
: output, even if you don't understand what it means.

When the system changes behavior, I'll send the output from before and
the output from after (the queries are running hourly right now).

: Come to think of it, the same stats for perceptType would be interesting
: to look at too.

Added, although perceptType never changes (no database operations are
performed on that table outside of SELECTs).

* Philip Molter
* DataFoundry.net
* http://www.datafoundry.net/
* philip@datafoundry.net