Re: How to optimize monstrous query, sorts instead of - Mailing list pgsql-performance

From Michael Mattox
Subject Re: How to optimize monstrous query, sorts instead of
Date
Msg-id CJEBLDCHAADCLAGIGCOOGEADCLAA.michael.mattox@verideon.com
Whole thread Raw
In response to Re: How to optimize monstrous query, sorts instead of  (Rod Taylor <rbt@rbt.ca>)
List pgsql-performance
> Shared buffers is probably too high.  How much memory in this machine?
> Is there anything else running aside from PostgreSQL?  What does top say
> about cached / buffered data (number)

I was using the 25% of RAM guideline posted recently.  The machine has
1.5gig but it also has a couple other java applications running on it
including tomcat.

  1:56pm  up 6 days,  2:58,  6 users,  load average: 2.60, 2.07, 1.78
193 processes: 191 sleeping, 2 running, 0 zombie, 0 stopped
CPU0 states: 14.0% user,  9.0% system,  0.0% nice, 75.1% idle
CPU1 states: 31.0% user,  0.1% system,  0.0% nice, 67.0% idle
CPU2 states:  5.0% user,  0.1% system,  0.1% nice, 93.0% idle
CPU3 states:  0.0% user,  0.1% system,  0.1% nice, 98.0% idle
Mem:  1547572K av, 1537848K used,    9724K free,       0K shrd,   25104K
buff
Swap: 1044216K av,   51352K used,  992864K free                 1245460K
cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
10184 veriguar   9   0 94760  83M  2612 S    36.5  5.5  13:29 java
 8990 postgres  17   0 54864  53M 53096 R    11.5  3.5   0:00 postmaster
 8988 veriguar  11   0  1164 1164   836 R     1.9  0.0   0:00 top
10161 veriguar  13   5 69504  60M  2600 S N   0.9  3.9  13:11 java
10206 veriguar  13   5 27952  23M  2580 S N   0.9  1.5   7:21 java
10699 postgres   9   0 31656  30M 30396 S     0.9  2.0   0:02 postmaster

             total       used       free     shared    buffers     cached
Mem:       1547572    1532024      15548          0      23820    1239024
-/+ buffers/cache:     269180    1278392
Swap:      1044216      51368     992848

> I see you reduced the random_page_cost to 1.5.  Why did you do this (how
> is your disk subsystem configured)?

Someone suggested I lower it to 1.5 or 1.0, not sure what the reasoning was.
The disks are both SCSI 10,000 RPM.  My data directory is on one disk by
itself, and the pg_xlog is on the other disk as well as the operating system
and everything else.  I was told it's best to have them on seperate disks,
however I'm wondering because my system only has two disks and the one with
the operating system isn't big enough to hold my database therefore I must
put my DB on the 2nd disk and if pg_xlog is to be separate, it has to be
with the OS & Java apps.

> Standard questions, did you VACUUM? Regularly?  Want to try again and
> send us the output from VACUUM VERBOSE?

I vacuum the monitor table every 5 minutes and I do a vacuum full analyze
every night at midnight (cron job).
I just did a vacuum verbose, output is attached.

> Sounds like you created a ton of test data, then removed a bunch?  Did
> you REINDEX that table?

I haven't deleted any of the data, I've been continuously adding new data.
I added about 6 million rows at once, and they all had the same date.  Since
then my application has been stress testing over about 2 weeks now so
there's now 7693057 rows in monitorstatusx and monitorstatusitemx as well as
the necessary rows for the join tables.

> During normal use, what is your query spread like?  Mostly selects with
> some inserts?  Any updates or deletes?  How often to updates or deletes
> come in, and how many rows do they effect?

There is a query on monitorx by datex every 10 seconds (monitors are updated
every 5 minutes, so every 10 seconds I get the monitors that are due for an
update).  Each monitor is then saved with its status field modified, and a
new status item is inserted.  This happens every 5 minutes.  There are 8-16
monitors being run in parallel, although typically it's 8 or less.  This is
the main application.  The reporting application does a few queries but
nothing major except the query that is the subject of this email.  It's the
reporting app that is slow due to this one big query.  Finally the web app
executes the same query as the reporting app, except there is a lot less
data to be returned since it's only for the current day.

> >                            ->  Index Scan using
> monitorstatusxmonitori on
> > monitorstatusx ms  (cost=0.00..5996.18 rows=163 width=84) (actual
> > time=4383.38..14936.39 rows=225 loops=1)
> >                                  Index Cond: ("outer".jdoidx =
> ms.monitorx)
> >                                  Filter: ((datex >= '2003-07-01
> > 00:00:00'::timestamp without time zone) AND (datex <= '2003-07-01
> > 23:59:59'::timestamp without time zone))
>
> The above index scan is taking a vast majority of the time (nearly 15
> seconds of the 16 second total -- stop thinking about sorts!)..  What
> happened to the index on monitorx and datex?

I just did

reindex table monitorstatux;

which didn't help, in fact query times went up.  I then did

create index monitorstatus_monitor_date_i on monitorstatusx(monitorx,
datex);

and this seemed to help a little:


QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
----------------------------------------------------------------
 Sort  (cost=1133.13..1133.38 rows=98 width=827) (actual
time=9754.06..9754.25 rows=226 loops=1)
   Sort Key: ms.datex
   ->  Nested Loop  (cost=0.00..1129.90 rows=98 width=827) (actual
time=50.81..9753.17 rows=226 loops=1)
         ->  Nested Loop  (cost=0.00..833.47 rows=98 width=162) (actual
time=50.74..7149.28 rows=226 loops=1)
               ->  Nested Loop  (cost=0.00..537.04 rows=98 width=146)
(actual time=50.67..4774.45 rows=226 loops=1)
                     ->  Nested Loop  (cost=0.00..240.44 rows=98 width=130)
(actual time=50.61..1515.10 rows=226 loops=1)
                           ->  Index Scan using monitorx_id_index on
monitorx mx  (cost=0.00..3.45 rows=1 width=46) (actual time=0.09..0.11
rows=1 loops=1)
                                 Index Cond: (idx =
'M-TEST_170-TEST_00_10560857890510173779233'::character varying)
                           ->  Index Scan using monitorstatus_monitor_date_i
on monitorstatusx ms  (cost=0.00..234.93 rows=165 width=84) (actual
time=50.51..1513.21 rows=226 loops=1)
                                 Index Cond: (("outer".jdoidx = ms.monitorx)
AND (ms.datex >= '2003-07-01 00:00:00'::timestamp without time zone) AND
(ms.datex <= '2003-07-01 23:59:59'::timestamp without time zone))
                     ->  Index Scan using monitorstatus_stjdoidb742c9b3i on
monitorstatus_statusitemsx mssisx  (cost=0.00..3.01 rows=1 width=16) (actual
time=14.40..14.41 rows=1 loops=226)
                           Index Cond: ("outer".jdoidx = mssisx.jdoidx)
               ->  Index Scan using monitorstatusitejdoid7db0befci on
monitorstatusitemlistd8ea58a5x litem  (cost=0.00..3.01 rows=1 width=16)
(actual time=10.49..10.49 rows=1 loops=226)
                     Index Cond: ("outer".statusitemsx = litem.jdoidx)
         ->  Index Scan using monitorstatusitemx_pkey on monitorstatusitemx
msi  (cost=0.00..3.01 rows=1 width=665) (actual time=11.50..11.50 rows=1
loops=226)
               Index Cond: ("outer".statusitemlistx = msi.jdoidx)
 Total runtime: 9754.64 msec
(17 rows)

Before I guess the index with monitorx,datex didn't do much because all the
data had the same date.  But now that I have over 2 weeks of real data, it
makes a difference.

Thanks,
Michael

Attachment

pgsql-performance by date:

Previous
From: Rod Taylor
Date:
Subject: Re: How to optimize monstrous query, sorts instead of
Next
From: Paul Thomas
Date:
Subject: Re: raise or lower transaction timeout?