Thread: How to optimize monstrous query, sorts instead of using index

How to optimize monstrous query, sorts instead of using index

From
"Michael Mattox"
Date:
I've used indexes to speed up my queries but this query escapes me.  I'm
curious if someone can suggest an index or a way to modify the query to use
the index.  The query is:

select ms.averageconnecttimex as ms_averageconnecttime, ms.averagedurationx
as ms_averageduration, ms.datex as ms_date, ms.idx as ms_id,
ms.statusstringx as ms_statusstring, ms.statusx as ms_status,
msi.actualcontentx as msi_actualcontent, msi.connecttimex as
msi_connecttime, msi.correctcontentx as msi_correctcontent, msi.datex as
msi_date, msi.descriptionx as msi_description, msi.durationx as
msi_duration, msi.errorcontentx as msi_errorcontent, msi.idx as msi_id,
msi.monitorlocationx as msi_monitorlocation, msi.statusstringx as
msi_statusstring, msi.statusx as msi_status from monitorstatusx ms,
monitorstatusitemx msi where monitorx.idx =
'M-TEST_1444-TEST_00_10560561260561463219352' AND monitorx.jdoidx =
ms.monitorx AND ms.datex >= '2003-06-20 08:57:21.36' AND ms.datex <=
'2003-06-29 08:57:21.36' AND ms.jdoidx = monitorstatus_statusitemsx.jdoidx
AND monitorstatus_statusitemsx.statusitemsx =
monitorstatusitemlistd8ea58a5x.jdoidx AND
monitorstatusitemlistd8ea58a5x.statusitemlistx = msi.jdoidx ORDER BY ms_date
DESC;

Here is the result of explain:

 Sort  (cost=9498.85..9500.16 rows=525 width=788)
   Sort Key: ms.datex
   ->  Nested Loop  (cost=0.00..9475.15 rows=525 width=788)
         ->  Nested Loop  (cost=0.00..7887.59 rows=525 width=123)
               ->  Nested Loop  (cost=0.00..6300.03 rows=525 width=107)
                     ->  Nested Loop  (cost=0.00..4712.02 rows=525 width=91)
                           ->  Index Scan using monitorx_id_index on
monitorx  (cost=0.00..5.37 rows=1 width=8)
                                 Index Cond: (idx =
'M-TEST_1444-TEST_00_10560561260561463219352'::character varying)
                           ->  Index Scan using monitorstatusxmonitori on
monitorstatusx ms  (cost=0.00..4695.65 rows=880 width=83)
                                 Index Cond: ("outer".jdoidx = ms.monitorx)
                                 Filter: ((datex >= '2003-06-20
08:57:21.36'::timestamp without time zone) AND (datex <= '2003-06-29
08:57:21.36'::timestamp without time zone))
                     ->  Index Scan using monitorstatus_stjdoidb742c9b3i on
monitorstatus_statusitemsx  (cost=0.00..3.01 rows=1 width=16)
                           Index Cond: ("outer".jdoidx =
monitorstatus_statusitemsx.jdoidx)
               ->  Index Scan using monitorstatusitejdoid7db0befci on
monitorstatusitemlistd8ea58a5x  (cost=0.00..3.01 rows=1 width=16)
                     Index Cond: ("outer".statusitemsx =
monitorstatusitemlistd8ea58a5x.jdoidx)
         ->  Index Scan using monitorstatusitemx_pkey on monitorstatusitemx
msi  (cost=0.00..3.01 rows=1 width=665)
               Index Cond: ("outer".statusitemlistx = msi.jdoidx)
(17 rows)

As you can see, it's doing a sort on ms.datex.  I created an index on the
monitorstatusx (ms) table for the datex, but it doesn't use it.  Is it
possible to create an index to prevent this sort?

Thanks,
Michael


Michael Mattox
cunparis@yahoo.fr / http://www.advweb.com/michael




Re: How to optimize monstrous query, sorts instead of using index

From
Rod Taylor
Date:
Is this 7.3.x?  Can we see explain analyze output for the query?

On Wed, 2003-06-25 at 07:46, Michael Mattox wrote:
> I've used indexes to speed up my queries but this query escapes me.  I'm
> curious if someone can suggest an index or a way to modify the query to use
> the index.  The query is:
>
> select ms.averageconnecttimex as ms_averageconnecttime, ms.averagedurationx
> as ms_averageduration, ms.datex as ms_date, ms.idx as ms_id,
> ms.statusstringx as ms_statusstring, ms.statusx as ms_status,
> msi.actualcontentx as msi_actualcontent, msi.connecttimex as
> msi_connecttime, msi.correctcontentx as msi_correctcontent, msi.datex as
> msi_date, msi.descriptionx as msi_description, msi.durationx as
> msi_duration, msi.errorcontentx as msi_errorcontent, msi.idx as msi_id,
> msi.monitorlocationx as msi_monitorlocation, msi.statusstringx as
> msi_statusstring, msi.statusx as msi_status from monitorstatusx ms,
> monitorstatusitemx msi where monitorx.idx =
> 'M-TEST_1444-TEST_00_10560561260561463219352' AND monitorx.jdoidx =
> ms.monitorx AND ms.datex >= '2003-06-20 08:57:21.36' AND ms.datex <=
> '2003-06-29 08:57:21.36' AND ms.jdoidx = monitorstatus_statusitemsx.jdoidx
> AND monitorstatus_statusitemsx.statusitemsx =
> monitorstatusitemlistd8ea58a5x.jdoidx AND
> monitorstatusitemlistd8ea58a5x.statusitemlistx = msi.jdoidx ORDER BY ms_date
> DESC;
>
> Here is the result of explain:
>
>  Sort  (cost=9498.85..9500.16 rows=525 width=788)
>    Sort Key: ms.datex
>    ->  Nested Loop  (cost=0.00..9475.15 rows=525 width=788)
>          ->  Nested Loop  (cost=0.00..7887.59 rows=525 width=123)
>                ->  Nested Loop  (cost=0.00..6300.03 rows=525 width=107)
>                      ->  Nested Loop  (cost=0.00..4712.02 rows=525 width=91)
>                            ->  Index Scan using monitorx_id_index on
> monitorx  (cost=0.00..5.37 rows=1 width=8)
>                                  Index Cond: (idx =
> 'M-TEST_1444-TEST_00_10560561260561463219352'::character varying)
>                            ->  Index Scan using monitorstatusxmonitori on
> monitorstatusx ms  (cost=0.00..4695.65 rows=880 width=83)
>                                  Index Cond: ("outer".jdoidx = ms.monitorx)
>                                  Filter: ((datex >= '2003-06-20
> 08:57:21.36'::timestamp without time zone) AND (datex <= '2003-06-29
> 08:57:21.36'::timestamp without time zone))
>                      ->  Index Scan using monitorstatus_stjdoidb742c9b3i on
> monitorstatus_statusitemsx  (cost=0.00..3.01 rows=1 width=16)
>                            Index Cond: ("outer".jdoidx =
> monitorstatus_statusitemsx.jdoidx)
>                ->  Index Scan using monitorstatusitejdoid7db0befci on
> monitorstatusitemlistd8ea58a5x  (cost=0.00..3.01 rows=1 width=16)
>                      Index Cond: ("outer".statusitemsx =
> monitorstatusitemlistd8ea58a5x.jdoidx)
>          ->  Index Scan using monitorstatusitemx_pkey on monitorstatusitemx
> msi  (cost=0.00..3.01 rows=1 width=665)
>                Index Cond: ("outer".statusitemlistx = msi.jdoidx)
> (17 rows)
>
> As you can see, it's doing a sort on ms.datex.  I created an index on the
> monitorstatusx (ms) table for the datex, but it doesn't use it.  Is it
> possible to create an index to prevent this sort?
>
> Thanks,
> Michael
>
>
> Michael Mattox
> cunparis@yahoo.fr / http://www.advweb.com/michael
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly
--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

Attachment

Re: How to optimize monstrous query, sorts instead of using index

From
"Michael Mattox"
Date:
Sorry, I neglected to say the version, yes I'm using Postgres 7.3.2 on
Linux.

Here's the output of explain analyze.  The query typically takes 0-4 seconds
depending on the time frame.  It's run very frequently especially to process
the nightly reports.

veriguard=# explain analyze select ms.averageconnecttimex as
ms_averageconnecttime, ms.averagedurationx as ms_averageduration, ms.datex
as ms_date, ms.idx as ms_id, ms.statusstringx as ms_statusstring, ms.statusx
as ms_status, msi.actualcontentx as msi_actualcontent, msi.connecttimex as
msi_connecttime, msi.correctcontentx as msi_correctcontent, msi.datex as
msi_date, msi.descriptionx as msi_description, msi.durationx as
msi_duration, msi.errorcontentx as msi_errorcontent, msi.idx as msi_id,
msi.monitorlocationx as msi_monitorlocation, msi.statusstringx as
msi_statusstring, msi.statusx as msi_status from monitorstatusx ms,
monitorstatusitemx msi where monitorx.idx =
'M-TEST_1444-TEST_00_10560561260561463219352' AND monitorx.jdoidx =
ms.monitorx AND ms.datex >= '2003-06-20 08:57:21.36' AND ms.datex <=
'2003-06-29 08:57:21.36' AND ms.jdoidx = monitorstatus_statusitemsx.jdoidx
AND monitorstatus_statusitemsx.statusitemsx =
monitorstatusitemlistd8ea58a5x.jdoidx AND
monitorstatusitemlistd8ea58a5x.statusitemlistx = msi.jdoidx ORDER BY ms_date
DESC;

QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
----------------------------
 Sort  (cost=9498.96..9500.27 rows=525 width=788) (actual
time=6720.91..6721.44 rows=623 loops=1)
   Sort Key: ms.datex
   ->  Nested Loop  (cost=0.00..9475.26 rows=525 width=788) (actual
time=145.16..6718.65 rows=623 loops=1)
         ->  Nested Loop  (cost=0.00..7887.69 rows=525 width=123) (actual
time=126.84..4528.85 rows=623 loops=1)
               ->  Nested Loop  (cost=0.00..6300.13 rows=525 width=107)
(actual time=95.37..3470.55 rows=623 loops=1)
                     ->  Nested Loop  (cost=0.00..4712.13 rows=525 width=91)
(actual time=40.44..1892.06 rows=625 loops=1)
                           ->  Index Scan using monitorx_id_index on
monitorx  (cost=0.00..5.48 rows=1 width=8) (actual time=0.25..19.90 rows=1
loops=1)
                                 Index Cond: (idx =
'M-TEST_1444-TEST_00_10560561260561463219352'::character varying)
                           ->  Index Scan using monitorstatusxmonitori on
monitorstatusx ms  (cost=0.00..4695.65 rows=880 width=83) (actual
time=40.17..1868.12 rows=625 loops=1)
                                 Index Cond: ("outer".jdoidx = ms.monitorx)
                                 Filter: ((datex >= '2003-06-20
08:57:21.36'::timestamp without time zone) AND (datex <= '2003-06-29
08:57:21.36'::timestamp without time zone))
                     ->  Index Scan using monitorstatus_stjdoidb742c9b3i on
monitorstatus_statusitemsx  (cost=0.00..3.01 rows=1 width=16) (actual
time=2.51..2.51 rows=1 loops=625)
                           Index Cond: ("outer".jdoidx =
monitorstatus_statusitemsx.jdoidx)
               ->  Index Scan using monitorstatusitejdoid7db0befci on
monitorstatusitemlistd8ea58a5x  (cost=0.00..3.01 rows=1 width=16) (actual
time=1.68..1.69 rows=1 loops=623)
                     Index Cond: ("outer".statusitemsx =
monitorstatusitemlistd8ea58a5x.jdoidx)
         ->  Index Scan using monitorstatusitemx_pkey on monitorstatusitemx
msi  (cost=0.00..3.01 rows=1 width=665) (actual time=3.50..3.50 rows=1
loops=623)
               Index Cond: ("outer".statusitemlistx = msi.jdoidx)
 Total runtime: 6722.43 msec
(18 rows)




Re: How to optimize monstrous query, sorts instead of

From
Rod Taylor
Date:
> Here's the output of explain analyze.  The query typically takes 0-4 seconds
> depending on the time frame.  It's run very frequently especially to process
> the nightly reports.

The plan picked seems reasonable (estimated costs / tuples is close to
actual).

I think the biggest hit is this index scan.  Thats a substantial cost to
pull out less than a thousand lines:

                           ->  Index Scan using monitorstatusxmonitori
on
monitorstatusx ms  (cost=0.00..4695.65 rows=880 width=83) (actual
time=40.17..1868.12 rows=625 loops=1)
                                 Index Cond: ("outer".jdoidx =
ms.monitorx)
                                 Filter: ((datex >= '2003-06-20
08:57:21.36'::timestamp without time zone) AND (datex <= '2003-06-29
08:57:21.36'::timestamp without time zone))


Are jdoidx and monitorx integers?

You might try a multi-column index on (ms.monitorx, ms.datex).

Are monitorx assigned roughly ordered by date?  It must be, otherwise
the sort step would not be so cheap (hardly any impact on the query --
see actual cost number).  The multi-column index above should give you a
bit of a boost.

Depending on the data in the table, the index (ms.datex, monitorx) may
give better results along with a single index on (ms.monitorx) as you
currently have.  It's not very likely though.

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

Attachment

Re: How to optimize monstrous query, sorts instead of

From
"Michael Mattox"
Date:
> Are jdoidx and monitorx integers?

Yes both are integers:

-- Table: public.monitorstatusx
CREATE TABLE public.monitorstatusx (
  averageconnecttimex numeric(65535, 65532),
  averagedurationx numeric(65535, 65532),
  datex timestamp,
  idx varchar(255),
  jdoclassx varchar(255),
  jdoidx int8 NOT NULL,
  jdolockx int4,
  monitorx int8,
  statusstringx varchar(255),
  statusx varchar(255),
  CONSTRAINT monitorstatusx_pkey PRIMARY KEY (jdoidx)
) WITH OIDS;


> You might try a multi-column index on (ms.monitorx, ms.datex).

Just tried it, it didn't prevent the sort.  But it sounds like the sort
isn't the problem, correct?

-- Index: public.monitorstatusx_datex_monitorx_index
CREATE INDEX monitorstatusx_datex_monitorx_index ON monitorstatusx USING
btree (monitorx, datex);

veriguard=# explain analyze select ms.averageconnecttimex as
ms_averageconnecttime, ms.averagedurationx as ms_averageduration, ms.datex
as ms_date, ms.idx as ms_id, ms.statusstringx as ms_statusstring, ms.statusx
as ms_status, msi.actualcontentx as msi_actualcontent, msi.connecttimex as
msi_connecttime, msi.correctcontentx as msi_correctcontent, msi.datex as
msi_date, msi.descriptionx as msi_description, msi.durationx as
msi_duration, msi.errorcontentx as msi_errorcontent, msi.idx as msi_id,
msi.monitorlocationx as msi_monitorlocation, msi.statusstringx as
msi_statusstring, msi.statusx as msi_status from monitorstatusx ms,
monitorstatusitemx msi where monitorx.idx =
'M-TEST_1444-TEST_00_10560561260561463219352' AND monitorx.jdoidx =
ms.monitorx AND ms.datex >= '2003-06-20 08:57:21.36' AND ms.datex <=
'2003-06-29 08:57:21.36' AND ms.jdoidx = monitorstatus_statusitemsx.jdoidx
AND monitorstatus_statusitemsx.statusitemsx =
monitorstatusitemlistd8ea58a5x.jdoidx AND
monitorstatusitemlistd8ea58a5x.statusitemlistx = msi.jdoidx ORDER BY ms_date
DESC;

QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
----------------------------------------------------------------------
 Sort  (cost=6014.53..6015.86 rows=529 width=788) (actual
time=4286.35..4286.88 rows=626 loops=1)
   Sort Key: ms.datex
   ->  Nested Loop  (cost=0.00..5990.59 rows=529 width=788) (actual
time=131.57..4283.76 rows=626 loops=1)
         ->  Nested Loop  (cost=0.00..4388.44 rows=529 width=123) (actual
time=106.23..3398.54 rows=626 loops=1)
               ->  Nested Loop  (cost=0.00..2786.29 rows=529 width=107)
(actual time=90.29..2518.20 rows=626 loops=1)
                     ->  Nested Loop  (cost=0.00..1175.81 rows=532 width=91)
(actual time=55.15..1345.88 rows=628 loops=1)
                           ->  Index Scan using monitorx_id_index on
monitorx  (cost=0.00..5.36 rows=1 width=8) (actual time=54.94..55.03 rows=1
loops=1)
                                 Index Cond: (idx =
'M-TEST_1444-TEST_00_10560561260561463219352'::character varying)
                           ->  Index Scan using
monitorstatusx_datex_monitorx_index on monitorstatusx ms
(cost=0.00..1159.33 rows=890 width=83) (actual time=0.19..1287.02 rows=628
loops=1)
                                 Index Cond: (("outer".jdoidx = ms.monitorx)
AND (ms.datex >= '2003-06-20 08:57:21.36'::timestamp without time zone) AND
(ms.datex <= '2003-06-29 08:57:21.36'::timestamp without time zone))
                     ->  Index Scan using monitorstatus_stjdoidb742c9b3i on
monitorstatus_statusitemsx  (cost=0.00..3.01 rows=1 width=16) (actual
time=1.85..1.86 rows=1 loops=628)
                           Index Cond: ("outer".jdoidx =
monitorstatus_statusitemsx.jdoidx)
               ->  Index Scan using monitorstatusitejdoid7db0befci on
monitorstatusitemlistd8ea58a5x  (cost=0.00..3.01 rows=1 width=16) (actual
time=1.39..1.39 rows=1 loops=626)
                     Index Cond: ("outer".statusitemsx =
monitorstatusitemlistd8ea58a5x.jdoidx)
         ->  Index Scan using monitorstatusitemx_pkey on monitorstatusitemx
msi  (cost=0.00..3.01 rows=1 width=665) (actual time=1.40..1.40 rows=1
loops=626)
               Index Cond: ("outer".statusitemlistx = msi.jdoidx)
 Total runtime: 4288.71 msec
(17 rows)

veriguard=#


> Are monitorx assigned roughly ordered by date?  It must be, otherwise
> the sort step would not be so cheap (hardly any impact on the query --
> see actual cost number).  The multi-column index above should give you a
> bit of a boost.

monitorx is a foreign key to the monitorx table.

If the query can't be optimized it's OK, I can live it the speed.  I just
couldn't figure out why it'd sort on datex if I had an index on datex.

Thanks,
Michael



Re: How to optimize monstrous query, sorts instead of

From
Rod Taylor
Date:
> > You might try a multi-column index on (ms.monitorx, ms.datex).
>
> Just tried it, it didn't prevent the sort.  But it sounds like the sort
> isn't the problem, correct?

The sort isn't actually doing any sorting, so it's virtually free.  The
sort is taking less than 3ms as the data is already 99% sorted due to
the correlation between datex and monitorx.

For similar reasons, the datex index will not be used, as it has no
advantage to being used.

>                            ->  Index Scan using
> monitorstatusx_datex_monitorx_index on monitorstatusx ms
> (cost=0.00..1159.33 rows=890 width=83) (actual time=0.19..1287.02 rows=628
> loops=1)
>                                  Index Cond: (("outer".jdoidx = ms.monitorx)
> AND (ms.datex >= '2003-06-20 08:57:21.36'::timestamp without time zone) AND
> (ms.datex <= '2003-06-29 08:57:21.36'::timestamp without time zone))

You can see that it used the new multi-key index for both items, rather
than finding for monitorx, then filtering out unwanted results by datex.

It doesn't appear to have made much difference (looks like data was
partially cached for this new run), but it changed a bit for the better.

I'm afraid thats the best I can do on the query itself I think.


Oh, and using tables in your where clause that aren't in the from clause
is non-portable and often hides bugs:

 from monitorstatusx ms
    , monitorstatusitemx msi
where monitorx.idx = 'M-TEST_1444-TEST_00_10560561260561463219352'

Are you sure you sure you don't have any duplicated constraints by
pulling information in from other tables that you don't need to?
Removing some of those nested loops would make a significant impact to
the results.

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

Attachment

Re: How to optimize monstrous query, sorts instead of

From
"Michael Mattox"
Date:
> Oh, and using tables in your where clause that aren't in the from clause
> is non-portable and often hides bugs:
>
>  from monitorstatusx ms
>     , monitorstatusitemx msi
> where monitorx.idx = 'M-TEST_1444-TEST_00_10560561260561463219352'
>
> Are you sure you sure you don't have any duplicated constraints by
> pulling information in from other tables that you don't need to?
> Removing some of those nested loops would make a significant impact to
> the results.

I didn't notice that before, thanks for pointing that out.  I just tried
adding monitorx.idx to the select and it ended up making my query take
several minutes long.  Any ideas how I can fix this and keep my performance?

new query:

veriguard=# explain select m.idx, ms.averageconnecttimex as
ms_averageconnecttime, ms.averagedurationx as ms_averageduration, ms.datex
as ms_date, ms.idx as ms_id, ms.statusstringx as ms_statusstring, ms.statusx
as ms_status, msi.actualcontentx as msi_actualcontent, msi.connecttimex as
msi_connecttime, msi.correctcontentx as msi_correctcontent, msi.datex as
msi_date, msi.descriptionx as msi_description, msi.durationx as
msi_duration, msi.errorcontentx as msi_errorcontent, msi.idx as msi_id,
msi.monitorlocationx as msi_monitorlocation, msi.statusstringx as
msi_statusstring, msi.statusx as msi_status from monitorx m, monitorstatusx
ms, monitorstatusitemx msi where m.idx =
'M-TEST_1444-TEST_00_10560561260561463219352' AND monitorx.jdoidx =
ms.monitorx AND ms.datex >= '2003-06-20 08:57:21.36' AND ms.datex <=
'2003-06-29 08:57:21.36' AND ms.jdoidx = monitorstatus_statusitemsx.jdoidx
AND monitorstatus_statusitemsx.statusitemsx =
monitorstatusitemlistd8ea58a5x.jdoidx AND
monitorstatusitemlistd8ea58a5x.statusitemlistx = msi.jdoidx ORDER BY ms_date
DESC;

QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
-------------------------------------
 Sort  (cost=1653384.42..1655402.97 rows=807418 width=826)
   Sort Key: ms.datex
   ->  Hash Join  (cost=820308.66..1112670.42 rows=807418 width=826)
         Hash Cond: ("outer".monitorx = "inner".jdoidx)
         ->  Merge Join  (cost=820132.71..1098364.65 rows=807418 width=780)
               Merge Cond: ("outer".jdoidx = "inner".statusitemlistx)
               ->  Index Scan using monitorstatusitemx_pkey on
monitorstatusitemx msi  (cost=0.00..247616.27 rows=6596084 width=665)
               ->  Sort  (cost=820132.71..822151.59 rows=807554 width=115)
                     Sort Key:
monitorstatusitemlistd8ea58a5x.statusitemlistx
                     ->  Hash Join  (cost=461310.87..685820.13 rows=807554
width=115)
                           Hash Cond: ("outer".jdoidx =
"inner".statusitemsx)
                           ->  Seq Scan on monitorstatusitemlistd8ea58a5x
(cost=0.00..104778.90 rows=6597190 width=16)
                           ->  Hash  (cost=447067.98..447067.98 rows=807554
width=99)
                                 ->  Merge Join  (cost=0.00..447067.98
rows=807554 width=99)
                                       Merge Cond: ("outer".jdoidx =
"inner".jdoidx)
                                       ->  Index Scan using
monitorstatusx_pkey on monitorstatusx ms  (cost=0.00..272308.56 rows=811754
width=83)
                                             Filter: ((datex >= '2003-06-20
08:57:21.36'::timestamp without time zone) AND (datex <= '2003-06-29
08:57:21.36'::timestamp without time zone))
                                       ->  Index Scan using
monitorstatus_stjdoidb742c9b3i on monitorstatus_statusitemsx
(cost=0.00..146215.58 rows=6596680 width=16)
         ->  Hash  (cost=172.22..172.22 rows=1493 width=46)
               ->  Nested Loop  (cost=0.00..172.22 rows=1493 width=46)
                     ->  Index Scan using monitorx_id_index on monitorx m
(cost=0.00..5.36 rows=1 width=38)
                           Index Cond: (idx =
'M-TEST_1444-TEST_00_10560561260561463219352'::character varying)
                     ->  Seq Scan on monitorx  (cost=0.00..151.93 rows=1493
width=8)
(23 rows)

old query:

veriguard=# explain select ms.averageconnecttimex as ms_averageconnecttime,
ms.averagedurationx as ms_averageduration, ms.datex as ms_date, ms.idx as
ms_id, ms.statusstringx as ms_statusstring, ms.statusx as ms_status,
msi.actualcontentx as msi_actualcontent, msi.connecttimex as
msi_connecttime, msi.correctcontentx as msi_correctcontent, msi.datex as
msi_date, msi.descriptionx as msi_description, msi.durationx as
msi_duration, msi.errorcontentx as msi_errorcontent, msi.idx as msi_id,
msi.monitorlocationx as msi_monitorlocation, msi.statusstringx as
msi_statusstring, msi.statusx as msi_status from monitorstatusx ms,
monitorstatusitemx msi where monitorx.idx =
'M-TEST_1444-TEST_00_10560561260561463219352' AND monitorx.jdoidx =
ms.monitorx AND ms.datex >= '2003-06-20 08:57:21.36' AND ms.datex <=
'2003-06-29 08:57:21.36' AND ms.jdoidx = monitorstatus_statusitemsx.jdoidx
AND monitorstatus_statusitemsx.statusitemsx =
monitorstatusitemlistd8ea58a5x.jdoidx AND
monitorstatusitemlistd8ea58a5x.statusitemlistx = msi.jdoidx ORDER BY ms_date
DESC;

QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
-------------------------
 Sort  (cost=9590.52..9591.87 rows=541 width=788)
   Sort Key: ms.datex
   ->  Nested Loop  (cost=0.00..9565.97 rows=541 width=788)
         ->  Nested Loop  (cost=0.00..7929.22 rows=541 width=123)
               ->  Nested Loop  (cost=0.00..6292.48 rows=541 width=107)
                     ->  Nested Loop  (cost=0.00..4647.22 rows=544 width=91)
                           ->  Index Scan using monitorx_id_index on
monitorx  (cost=0.00..5.36 rows=1 width=8)
                                 Index Cond: (idx =
'M-TEST_1444-TEST_00_10560561260561463219352'::character varying)
                           ->  Index Scan using monitorstatusxmonitori on
monitorstatusx ms  (cost=0.00..4630.29 rows=926 width=83)
                                 Index Cond: ("outer".jdoidx = ms.monitorx)
                                 Filter: ((datex >= '2003-06-20
08:57:21.36'::timestamp without time zone) AND (datex <= '2003-06-29
08:57:21.36'::timestamp without time zone))
                     ->  Index Scan using monitorstatus_stjdoidb742c9b3i on
monitorstatus_statusitemsx  (cost=0.00..3.01 rows=1 width=16)
                           Index Cond: ("outer".jdoidx =
monitorstatus_statusitemsx.jdoidx)
               ->  Index Scan using monitorstatusitejdoid7db0befci on
monitorstatusitemlistd8ea58a5x  (cost=0.00..3.01 rows=1 width=16)
                     Index Cond: ("outer".statusitemsx =
monitorstatusitemlistd8ea58a5x.jdoidx)
         ->  Index Scan using monitorstatusitemx_pkey on monitorstatusitemx
msi  (cost=0.00..3.01 rows=1 width=665)
               Index Cond: ("outer".statusitemlistx = msi.jdoidx)
(17 rows)

veriguard=#



Re: How to optimize monstrous query, sorts instead of using index

From
SZUCS Gábor
Date:
Michael,

This whole query looks like a mess to me. Since I don't know the exact model
and the table stats, I don't even try to rewrite your query, however, here
are the weak points I can think of:

* as Rod pointed out, there are more tables in WHERE that aren't in FROM.
This can be a bug, but the very least, it makes the query far less readable.
These are:

    monitorx
    monitorstatus_statusitemsx.jdoidx
    monitorstatusitemlistd8ea58a5x.jdoidx

* there are 3 index scans that basically steal your time.
They are 1.6..3.5 ms x 625 ~ 1..2 sec each (or I'm reading exp ana wrong,
I'm not an expert indeed):

    - Index Scan using monitorstatus_stjdoidb742c9b3i on
      monitorstatus_statusitemsx
      (cost=0.00..3.01 rows=1 width=16)
      (actual time=2.51..2.51 rows=1 loops=625)
      Index Cond: ("outer".jdoidx = monitorstatus_statusitemsx.jdoidx)
    - Index Scan using monitorstatusitejdoid7db0befci on
      monitorstatusitemlistd8ea58a5x
      (cost=0.00..3.01 rows=1 width=16)
      (actual time=1.68..1.69 rows=1 loops=623)
      Index Cond: ("outer".statusitemsx =
monitorstatusitemlistd8ea58a5x.jdoidx)
    - Index Scan using monitorstatusitemx_pkey on monitorstatusitemx msi
      (cost=0.00..3.01 rows=1 width=665)
      (actual time=3.50..3.50 rows=1 loops=623)
      Index Cond: ("outer".statusitemlistx = msi.jdoidx)

* another killer index: I think this one takes about the rest of the time
(i.e. 3-4 secs):

    ->  Index Scan using monitorstatusxmonitori on monitorstatusx ms
        (cost=0.00..4695.65 rows=880 width=83)
        (actual time=40.17..1868.12 rows=625 loops=1)
        Index Cond: ("outer".jdoidx = ms.monitorx)
        Filter: ((datex >= '2003-06-20 08:57:21.36'::timestamp without time
zone)
          AND (datex <= '2003-06-29 08:57:21.36'::timestamp without time
zone))

Since the number of rows probably can't be reduced (as I read it, the query
actually returned that many rows), I'd think about clever joins in the FROM
part and fewer tables, to use fewer index scans.

Finally, decided to do an ad-hoc adjustment. Try this, or (wild guess) try
to completely eliminate the WHERE part by subselects on ms and monitorx.

This may be faster, slower, or even give different results, based on whether
I guessed the 1:N relationships right or not.

G.
------------------------------- cut here -------------------------------
select
    ms.averageconnecttimex as ms_averageconnecttime,
    ms.averagedurationx as ms_averageduration,
    ms.datex as ms_date, ms.idx as ms_id,
    ms.statusstringx as ms_statusstring, ms.statusx as ms_status,

    msi.actualcontentx as msi_actualcontent, msi.connecttimex as
    msi_connecttime, msi.correctcontentx as msi_correctcontent, msi.datex as
    msi_date, msi.descriptionx as msi_description, msi.durationx as
    msi_duration, msi.errorcontentx as msi_errorcontent, msi.idx as msi_id,
    msi.monitorlocationx as msi_monitorlocation, msi.statusstringx as
    msi_statusstring, msi.statusx as msi_status

from monitorstatusx ms
     LEFT JOIN monitorx ON (monitorx.jdoidx = ms.monitorx)
     LEFT JOIN monitorstatus_statusitemsx ms_si ON (ms.jdoidx =
ms_si.jdoidx)
     LEFT JOIN monitorstatusitemlistd8ea58a5x msil ON
        (ms_si.statusitemsx = msil.jdoidx)
     LEFT JOIN monitorstatusitemx msi ON (msil.statusitemlistx = msi.jdoidx)
where
    monitorx.idx = 'M-TEST_1444-TEST_00_10560561260561463219352'
    ms.datex >= '2003-06-20 08:57:21.36' AND ms.datex <= '2003-06-29
08:57:21.36'
------------------------------- cut here -------------------------------


Re: How to optimize monstrous query, sorts instead of

From
Rod Taylor
Date:
> I didn't notice that before, thanks for pointing that out.  I just tried
> adding monitorx.idx to the select and it ended up making my query take
> several minutes long.  Any ideas how I can fix this and keep my performance?

By using it aliased and non-aliased (2 different references to the same
table) you've caused it to join itself.

Try this:

SELECT m.idx
     , ms.averageconnecttimex AS ms_averageconnecttime
     , ms.averagedurationx AS ms_averageduration
     , ms.datex AS ms_date
     , ms.idx AS ms_id
     , ms.statusstringx AS ms_statusstring
     , ms.statusx AS ms_status
     , msi.actualcontentx AS msi_actualcontent
     , msi.connecttimex AS msi_connecttime
     , msi.correctcontentx AS msi_correctcontent
     , msi.datex AS msi_date
     , msi.descriptionx AS msi_description
     , msi.durationx AS msi_duration
     , msi.errorcontentx AS msi_errorcontent
     , msi.idx AS msi_id
     , msi.monitorlocationx AS msi_monitorlocation
     , msi.statusstringx AS msi_statusstring
     , msi.statusx AS msi_status

  FROM monitorstatusx AS ms
     , monitorstatusitemx AS msi

     , monitorx AS mx
     , monitorstatus_statusitemsx AS mssisx
     , monitorstatusitemlistd8ea58a5x AS litem

 WHERE ms.jdoidx = mssisx.jdoidx
   AND mssisx.statusitemsx = litem.jdoidx
   AND litem.statusitemlistx = msi.jdoidx
   AND mx.jdoidx = ms.monitorx
   AND ms.datex BETWEEN '2003-06-20 08:57:21.36'
                    AND '2003-06-29 08:57:21.36'
   AND m.idx = 'M-TEST_1444-TEST_00_10560561260561463219352'

ORDER BY ms.datex DESC;

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

Attachment

Re: How to optimize monstrous query, sorts instead of

From
SZUCS Gábor
Date:
Michael,

Actually, you missed an alias :) the select now returned 800k rows!
(according to explain)

pointed it out below. See my prev mail for more.

If it's possible, try your query on a backend and look for notices like
"Adding missing FROM clause for table ..."

G.
------------------------------- cut here -------------------------------
----- Original Message -----
From: "Michael Mattox" <michael.mattox@verideon.com>
Cc: "Postgresql Performance" <pgsql-performance@postgresql.org>
Sent: Wednesday, June 25, 2003 4:09 PM


> from monitorx m, monitorstatusx ms, monitorstatusitemx msi
> where m.idx = 'M-TEST_1444-TEST_00_10560561260561463219352' AND
> monitorx.jdoidx = ms.monitorx AND
  ^^^^^^^^
  substitute the same alias "m" here.



Re: How to optimize monstrous query, sorts instead of using index

From
"Michael Mattox"
Date:
> Finally, decided to do an ad-hoc adjustment. Try this, or (wild guess) try
> to completely eliminate the WHERE part by subselects on ms and monitorx.
>
> This may be faster, slower, or even give different results, based
> on whether
> I guessed the 1:N relationships right or not.

It's much slower but I appreciate you taking the time to try.  I'm pretty
new to SQL so I must admin this query is very confusing for me.  I'm using
Java Data Objects (JDO, an O/R mapping framework) but the implementation I'm
using (Kodo) isn't smart enough to do all the joins efficiently, which is
why I had to rewrite this query by hand.

Here's the output:

veriguard=# explain select ms.averageconnecttimex as ms_averageconnecttime,
ms.averagedurationx as ms_averageduration, ms.datex as ms_date, ms.idx as
ms_id, ms.statusstringx as ms_statusstring, ms.statusx as ms_status,
msi.actualcontentx as msi_actualcontent, msi.connecttimex as
msi_connecttime, msi.correctcontentx as msi_correctcontent, msi.datex as
msi_date, msi.descriptionx as msi_description, msi.durationx as
msi_duration, msi.errorcontentx as msi_errorcontent, msi.idx as msi_id,
msi.monitorlocationx as msi_monitorlocation, msi.statusstringx as
msi_statusstring, msi.statusx as msi_status from monitorstatusx ms LEFT JOIN
monitorx ON (monitorx.jdoidx = ms.monitorx) LEFT JOIN
monitorstatus_statusitemsx ms_si ON (ms.jdoidx = ms_si.jdoidx) LEFT JOIN
monitorstatusitemlistd8ea58a5x msil ON (ms_si.statusitemsx = msil.jdoidx)
LEFT JOIN monitorstatusitemx msi ON (msil.statusitemlistx = msi.jdoidx)
where     monitorx.idx = 'M-TEST_1444-TEST_00_10560561260561463219352' AND
ms.datex >= '2003-06-20 08:57:21.36' AND ms.datex <= '2003-06-29
08:57:21.36';

QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
-------------------------------------
 Merge Join  (cost=1006209.47..1283529.68 rows=751715 width=826)
   Merge Cond: ("outer".jdoidx = "inner".statusitemlistx)
   ->  Index Scan using monitorstatusitemx_pkey on monitorstatusitemx msi
(cost=0.00..247679.64 rows=6595427 width=665)
   ->  Sort  (cost=1006209.47..1008088.76 rows=751715 width=161)
         Sort Key: msil.statusitemlistx
         ->  Merge Join  (cost=697910.17..864079.59 rows=751715 width=161)
               Merge Cond: ("outer".jdoidx = "inner".statusitemsx)
               ->  Index Scan using monitorstatusitejdoid7db0befci on
monitorstatusitemlistd8ea58a5x msil  (cost=0.00..136564.80 rows=6595427
width=16)
               ->  Sort  (cost=697910.17..699789.46 rows=751715 width=145)
                     Sort Key: ms_si.statusitemsx
                     ->  Merge Join  (cost=385727.49..561594.96 rows=751715
width=145)
                           Merge Cond: ("outer".jdoidx = "inner".jdoidx)
                           ->  Index Scan using
monitorstatus_stjdoidb742c9b3i on monitorstatus_statusitemsx ms_si
(cost=0.00..146268.80 rows=6595427 width=16)
                           ->  Sort  (cost=385727.49..387606.78 rows=751715
width=129)
                                 Sort Key: ms.jdoidx
                                 ->  Hash Join  (cost=155.66..255240.65
rows=751715 width=129)
                                       Hash Cond: ("outer".monitorx =
"inner".jdoidx)
                                       Filter: ("inner".idx =
'M-TEST_1444-TEST_00_10560561260561463219352'::character varying)
                                       ->  Seq Scan on monitorstatusx ms
(cost=0.00..240050.69 rows=751715 width=83)
                                             Filter: ((datex >= '2003-06-20
08:57:21.36'::timestamp without time zone) AND (datex <= '2003-06-29
08:57:21.36'::timestamp without time zone))
                                       ->  Hash  (cost=151.93..151.93
rows=1493 width=46)
                                             ->  Seq Scan on monitorx
(cost=0.00..151.93 rows=1493 width=46)
(22 rows)

veriguard=#




Re: How to optimize monstrous query, sorts instead of

From
"Michael Mattox"
Date:
With a slight correction (you had m & mx so I changed them to be all mx, I
hope this is what you intended) this query works.  It's exactly the same
speed, but it doesn't give me the warnings I was getting:

NOTICE:  Adding missing FROM-clause entry for table "monitorx"
NOTICE:  Adding missing FROM-clause entry for table
"monitorstatus_statusitemsx"
NOTICE:  Adding missing FROM-clause entry for table
"monitorstatusitemlistd8ea58a5x"

I never knew what those were from, I even searched Google trying to find out
and I couldn't understand it so I gave up.  Thanks for pointing this out for
me, and thanks for fixing my query.

Michael


> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org]On Behalf Of Rod Taylor
> Sent: Wednesday, June 25, 2003 4:28 PM
> To: michael.mattox@verideon.com
> Cc: Postgresql Performance
> Subject: Re: [PERFORM] How to optimize monstrous query, sorts instead of
>
>
> > I didn't notice that before, thanks for pointing that out.  I just tried
> > adding monitorx.idx to the select and it ended up making my query take
> > several minutes long.  Any ideas how I can fix this and keep my
> performance?
>
> By using it aliased and non-aliased (2 different references to the same
> table) you've caused it to join itself.
>
> Try this:
>
> SELECT m.idx
>      , ms.averageconnecttimex AS ms_averageconnecttime
>      , ms.averagedurationx AS ms_averageduration
>      , ms.datex AS ms_date
>      , ms.idx AS ms_id
>      , ms.statusstringx AS ms_statusstring
>      , ms.statusx AS ms_status
>      , msi.actualcontentx AS msi_actualcontent
>      , msi.connecttimex AS msi_connecttime
>      , msi.correctcontentx AS msi_correctcontent
>      , msi.datex AS msi_date
>      , msi.descriptionx AS msi_description
>      , msi.durationx AS msi_duration
>      , msi.errorcontentx AS msi_errorcontent
>      , msi.idx AS msi_id
>      , msi.monitorlocationx AS msi_monitorlocation
>      , msi.statusstringx AS msi_statusstring
>      , msi.statusx AS msi_status
>
>   FROM monitorstatusx AS ms
>      , monitorstatusitemx AS msi
>
>      , monitorx AS mx
>      , monitorstatus_statusitemsx AS mssisx
>      , monitorstatusitemlistd8ea58a5x AS litem
>
>  WHERE ms.jdoidx = mssisx.jdoidx
>    AND mssisx.statusitemsx = litem.jdoidx
>    AND litem.statusitemlistx = msi.jdoidx
>    AND mx.jdoidx = ms.monitorx
>    AND ms.datex BETWEEN '2003-06-20 08:57:21.36'
>                     AND '2003-06-29 08:57:21.36'
>    AND m.idx = 'M-TEST_1444-TEST_00_10560561260561463219352'
>
> ORDER BY ms.datex DESC;
>
> --
> Rod Taylor <rbt@rbt.ca>
>
> PGP Key: http://www.rbt.ca/rbtpub.asc
>



Re: How to optimize monstrous query, sorts instead of

From
Tom Lane
Date:
Rod Taylor <rbt@rbt.ca> writes:
>> monitorstatusx_datex_monitorx_index on monitorstatusx ms
>> (cost=3D0.00..1159.33 rows=3D890 width=3D83) (actual time=3D0.19..1287.02=
>  rows=3D628
>> loops=3D1)
>> Index Cond: (("outer".jdoidx =3D ms.moni=
> torx)
>> AND (ms.datex >=3D '2003-06-20 08:57:21.36'::timestamp without time zone)=
>  AND
>> (ms.datex <=3D '2003-06-29 08:57:21.36'::timestamp without time zone))

> You can see that it used the new multi-key index for both items, rather
> than finding for monitorx, then filtering out unwanted results by datex.

What is the column ordering of the combined index?  Unless datex is the
first column, there is no chance of using it to create the required sort
order anyway.  I think this index condition is suggesting that monitorx
is the first column.

However, I agree with Rod's point that "avoid the sort" is not the
mindset to use to optimize this query. The joins are the problem.
You might try forcing different join types (see enable_nestloop and
friends) to get an idea of whether a different plan is likely to help.

            regards, tom lane

Re: How to optimize monstrous query, sorts instead of using index

From
Tom Lane
Date:
"Michael Mattox" <michael.mattox@verideon.com> writes:
> It's much slower but I appreciate you taking the time to try.  I'm pretty
> new to SQL so I must admin this query is very confusing for me.  I'm using
> Java Data Objects (JDO, an O/R mapping framework) but the implementation I'm
> using (Kodo) isn't smart enough to do all the joins efficiently, which is
> why I had to rewrite this query by hand.

It wasn't till I read that :-( that I noticed that you were doing nested
left joins.  Fooling with the join order may be your best route to a
solution --- have you read
http://www.postgresql.org/docs/view.php?version=7.3&idoc=0&file=explicit-joins.html

            regards, tom lane

Re: How to optimize monstrous query, sorts instead of

From
"Michael Mattox"
Date:
> Try this:

Rod, you improved my query last week (thank you very much) but I'm not sure
why but my performance is getting worse.  I think I know what happened, when
I did my load testing I created data that all had the same date, so sorting
on the date was very fast.  But now I've been running the system for a few
weeks I've got a range of dates and now the sort is very costly.  I'm
curious if it's possible to optimize this with an index?  I've tried
creating some indexes but they're never used.

explain analyze SELECT mx.idx , ms.averageconnecttimex AS
ms_averageconnecttime , ms.averagedurationx AS ms_averageduration , ms.datex
AS ms_date , ms.idx AS ms_id , ms.statusstringx AS ms_statusstring ,
ms.statusx AS ms_status , msi.actualcontentx AS msi_actualcontent ,
msi.connecttimex AS msi_connecttime , msi.correctcontentx AS
msi_correctcontent , msi.datex AS msi_date , msi.descriptionx AS
msi_description , msi.durationx AS msi_duration , msi.errorcontentx AS
msi_errorcontent , msi.idx AS msi_id , msi.monitorlocationx AS
msi_monitorlocation , msi.statusstringx AS msi_statusstring , msi.statusx AS
msi_status FROM monitorstatusx AS ms , monitorstatusitemx AS msi , monitorx
AS mx , monitorstatus_statusitemsx AS mssisx ,
monitorstatusitemlistd8ea58a5x AS litem WHERE ms.jdoidx = mssisx.jdoidx AND
mssisx.statusitemsx = litem.jdoidx AND litem.statusitemlistx = msi.jdoidx
AND mx.jdoidx = ms.monitorx AND ms.datex BETWEEN '2003-07-01
00:00:00.000000+01' AND '2003-07-01 23:59:59.000000+01' AND mx.idx =
'M-TEST_150-TEST_01_10560776551771895174239' ORDER BY ms.datex DESC;


QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
-----------------------------------
 Sort  (cost=6882.84..6883.08 rows=97 width=827) (actual
time=16712.46..16712.65 rows=225 loops=1)
   Sort Key: ms.datex
   ->  Nested Loop  (cost=0.00..6879.66 rows=97 width=827) (actual
time=4413.12..16711.62 rows=225 loops=1)
         ->  Nested Loop  (cost=0.00..6587.53 rows=97 width=162) (actual
time=4406.06..15941.16 rows=225 loops=1)
               ->  Nested Loop  (cost=0.00..6295.38 rows=97 width=146)
(actual time=4383.59..15424.96 rows=225 loops=1)
                     ->  Nested Loop  (cost=0.00..6003.22 rows=97 width=130)
(actual time=4383.53..14938.02 rows=225 loops=1)
                           ->  Index Scan using monitorx_id_index on
monitorx mx  (cost=0.00..5.01 rows=1 width=46) (actual time=0.13..0.21
rows=1 loops=1)
                                 Index Cond: (idx =
'M-TEST_150-TEST_01_10560776551771895174239'::character varying)
                           ->  Index Scan using monitorstatusxmonitori on
monitorstatusx ms  (cost=0.00..5996.18 rows=163 width=84) (actual
time=4383.38..14936.39 rows=225 loops=1)
                                 Index Cond: ("outer".jdoidx = ms.monitorx)
                                 Filter: ((datex >= '2003-07-01
00:00:00'::timestamp without time zone) AND (datex <= '2003-07-01
23:59:59'::timestamp without time zone))
                     ->  Index Scan using monitorstatus_stjdoidb742c9b3i on
monitorstatus_statusitemsx mssisx  (cost=0.00..3.01 rows=1 width=16) (actual
time=2.15..2.15 rows=1 loops=225)
                           Index Cond: ("outer".jdoidx = mssisx.jdoidx)
               ->  Index Scan using monitorstatusitejdoid7db0befci on
monitorstatusitemlistd8ea58a5x litem  (cost=0.00..3.01 rows=1 width=16)
(actual time=2.28..2.28 rows=1 loops=225)
                     Index Cond: ("outer".statusitemsx = litem.jdoidx)
         ->  Index Scan using monitorstatusitemx_pkey on monitorstatusitemx
msi  (cost=0.00..3.01 rows=1 width=665) (actual time=3.41..3.41 rows=1
loops=225)
               Index Cond: ("outer".statusitemlistx = msi.jdoidx)
 Total runtime: 16713.25 msec
(18 rows)

As you can see it takes 16 seconds to return only 18 rows.  The
monitorstatusx table has over 7 million rows, and for each monitor status
there's one row in each of the monitorstatusitemx and the join tables.  So I
think the size of the database is just too high for this sort.  I run my
reports offline, but what I'm finding is that at 16 seconds per report, the
reports aren't finished by morning.  My postgresql.conf is attached in case
I have it configured incorrectly.

Thanks,
Michael



Re: How to optimize monstrous query, sorts instead of

From
"Michael Mattox"
Date:
> My postgresql.conf is attached in case I have it configured incorrectly.

Forgot my postgres.conf..


Attachment

Re: How to optimize monstrous query, sorts instead of

From
Rod Taylor
Date:
On Wed, 2003-07-02 at 10:28, Michael Mattox wrote:
> > My postgresql.conf is attached in case I have it configured incorrectly.
>
> Forgot my postgres.conf..

Shared buffers is probably too high.  How much memory in this machine?
Is there anything else running aside from PostgreSQL?  What does top say
about cached / buffered data (number)

I see you reduced the random_page_cost to 1.5.  Why did you do this (how
is your disk subsystem configured)?

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

Attachment

Re: How to optimize monstrous query, sorts instead of

From
Rod Taylor
Date:
On Wed, 2003-07-02 at 10:24, Michael Mattox wrote:
> > Try this:
>
> Rod, you improved my query last week (thank you very much) but I'm not sure
> why but my performance is getting worse.  I think I know what happened, when
> I did my load testing I created data that all had the same date, so sorting
> on the date was very fast.  But now I've been running the system for a few
> weeks I've got a range of dates and now the sort is very costly.  I'm
> curious if it's possible to optimize this with an index?  I've tried
> creating some indexes but they're never used.

Standard questions, did you VACUUM? Regularly?  Want to try again and
send us the output from VACUUM VERBOSE?

Sounds like you created a ton of test data, then removed a bunch?  Did
you REINDEX that table?

During normal use, what is your query spread like?  Mostly selects with
some inserts?  Any updates or deletes?  How often to updates or deletes
come in, and how many rows do they effect?

>                            ->  Index Scan using monitorstatusxmonitori on
> monitorstatusx ms  (cost=0.00..5996.18 rows=163 width=84) (actual
> time=4383.38..14936.39 rows=225 loops=1)
>                                  Index Cond: ("outer".jdoidx = ms.monitorx)
>                                  Filter: ((datex >= '2003-07-01
> 00:00:00'::timestamp without time zone) AND (datex <= '2003-07-01
> 23:59:59'::timestamp without time zone))

The above index scan is taking a vast majority of the time (nearly 15
seconds of the 16 second total -- stop thinking about sorts!)..  What
happened to the index on monitorx and datex?

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

Attachment

Re: How to optimize monstrous query, sorts instead of

From
"Michael Mattox"
Date:
> Shared buffers is probably too high.  How much memory in this machine?
> Is there anything else running aside from PostgreSQL?  What does top say
> about cached / buffered data (number)

I was using the 25% of RAM guideline posted recently.  The machine has
1.5gig but it also has a couple other java applications running on it
including tomcat.

  1:56pm  up 6 days,  2:58,  6 users,  load average: 2.60, 2.07, 1.78
193 processes: 191 sleeping, 2 running, 0 zombie, 0 stopped
CPU0 states: 14.0% user,  9.0% system,  0.0% nice, 75.1% idle
CPU1 states: 31.0% user,  0.1% system,  0.0% nice, 67.0% idle
CPU2 states:  5.0% user,  0.1% system,  0.1% nice, 93.0% idle
CPU3 states:  0.0% user,  0.1% system,  0.1% nice, 98.0% idle
Mem:  1547572K av, 1537848K used,    9724K free,       0K shrd,   25104K
buff
Swap: 1044216K av,   51352K used,  992864K free                 1245460K
cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
10184 veriguar   9   0 94760  83M  2612 S    36.5  5.5  13:29 java
 8990 postgres  17   0 54864  53M 53096 R    11.5  3.5   0:00 postmaster
 8988 veriguar  11   0  1164 1164   836 R     1.9  0.0   0:00 top
10161 veriguar  13   5 69504  60M  2600 S N   0.9  3.9  13:11 java
10206 veriguar  13   5 27952  23M  2580 S N   0.9  1.5   7:21 java
10699 postgres   9   0 31656  30M 30396 S     0.9  2.0   0:02 postmaster

             total       used       free     shared    buffers     cached
Mem:       1547572    1532024      15548          0      23820    1239024
-/+ buffers/cache:     269180    1278392
Swap:      1044216      51368     992848

> I see you reduced the random_page_cost to 1.5.  Why did you do this (how
> is your disk subsystem configured)?

Someone suggested I lower it to 1.5 or 1.0, not sure what the reasoning was.
The disks are both SCSI 10,000 RPM.  My data directory is on one disk by
itself, and the pg_xlog is on the other disk as well as the operating system
and everything else.  I was told it's best to have them on seperate disks,
however I'm wondering because my system only has two disks and the one with
the operating system isn't big enough to hold my database therefore I must
put my DB on the 2nd disk and if pg_xlog is to be separate, it has to be
with the OS & Java apps.

> Standard questions, did you VACUUM? Regularly?  Want to try again and
> send us the output from VACUUM VERBOSE?

I vacuum the monitor table every 5 minutes and I do a vacuum full analyze
every night at midnight (cron job).
I just did a vacuum verbose, output is attached.

> Sounds like you created a ton of test data, then removed a bunch?  Did
> you REINDEX that table?

I haven't deleted any of the data, I've been continuously adding new data.
I added about 6 million rows at once, and they all had the same date.  Since
then my application has been stress testing over about 2 weeks now so
there's now 7693057 rows in monitorstatusx and monitorstatusitemx as well as
the necessary rows for the join tables.

> During normal use, what is your query spread like?  Mostly selects with
> some inserts?  Any updates or deletes?  How often to updates or deletes
> come in, and how many rows do they effect?

There is a query on monitorx by datex every 10 seconds (monitors are updated
every 5 minutes, so every 10 seconds I get the monitors that are due for an
update).  Each monitor is then saved with its status field modified, and a
new status item is inserted.  This happens every 5 minutes.  There are 8-16
monitors being run in parallel, although typically it's 8 or less.  This is
the main application.  The reporting application does a few queries but
nothing major except the query that is the subject of this email.  It's the
reporting app that is slow due to this one big query.  Finally the web app
executes the same query as the reporting app, except there is a lot less
data to be returned since it's only for the current day.

> >                            ->  Index Scan using
> monitorstatusxmonitori on
> > monitorstatusx ms  (cost=0.00..5996.18 rows=163 width=84) (actual
> > time=4383.38..14936.39 rows=225 loops=1)
> >                                  Index Cond: ("outer".jdoidx =
> ms.monitorx)
> >                                  Filter: ((datex >= '2003-07-01
> > 00:00:00'::timestamp without time zone) AND (datex <= '2003-07-01
> > 23:59:59'::timestamp without time zone))
>
> The above index scan is taking a vast majority of the time (nearly 15
> seconds of the 16 second total -- stop thinking about sorts!)..  What
> happened to the index on monitorx and datex?

I just did

reindex table monitorstatux;

which didn't help, in fact query times went up.  I then did

create index monitorstatus_monitor_date_i on monitorstatusx(monitorx,
datex);

and this seemed to help a little:


QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
----------------------------------------------------------------
 Sort  (cost=1133.13..1133.38 rows=98 width=827) (actual
time=9754.06..9754.25 rows=226 loops=1)
   Sort Key: ms.datex
   ->  Nested Loop  (cost=0.00..1129.90 rows=98 width=827) (actual
time=50.81..9753.17 rows=226 loops=1)
         ->  Nested Loop  (cost=0.00..833.47 rows=98 width=162) (actual
time=50.74..7149.28 rows=226 loops=1)
               ->  Nested Loop  (cost=0.00..537.04 rows=98 width=146)
(actual time=50.67..4774.45 rows=226 loops=1)
                     ->  Nested Loop  (cost=0.00..240.44 rows=98 width=130)
(actual time=50.61..1515.10 rows=226 loops=1)
                           ->  Index Scan using monitorx_id_index on
monitorx mx  (cost=0.00..3.45 rows=1 width=46) (actual time=0.09..0.11
rows=1 loops=1)
                                 Index Cond: (idx =
'M-TEST_170-TEST_00_10560857890510173779233'::character varying)
                           ->  Index Scan using monitorstatus_monitor_date_i
on monitorstatusx ms  (cost=0.00..234.93 rows=165 width=84) (actual
time=50.51..1513.21 rows=226 loops=1)
                                 Index Cond: (("outer".jdoidx = ms.monitorx)
AND (ms.datex >= '2003-07-01 00:00:00'::timestamp without time zone) AND
(ms.datex <= '2003-07-01 23:59:59'::timestamp without time zone))
                     ->  Index Scan using monitorstatus_stjdoidb742c9b3i on
monitorstatus_statusitemsx mssisx  (cost=0.00..3.01 rows=1 width=16) (actual
time=14.40..14.41 rows=1 loops=226)
                           Index Cond: ("outer".jdoidx = mssisx.jdoidx)
               ->  Index Scan using monitorstatusitejdoid7db0befci on
monitorstatusitemlistd8ea58a5x litem  (cost=0.00..3.01 rows=1 width=16)
(actual time=10.49..10.49 rows=1 loops=226)
                     Index Cond: ("outer".statusitemsx = litem.jdoidx)
         ->  Index Scan using monitorstatusitemx_pkey on monitorstatusitemx
msi  (cost=0.00..3.01 rows=1 width=665) (actual time=11.50..11.50 rows=1
loops=226)
               Index Cond: ("outer".statusitemlistx = msi.jdoidx)
 Total runtime: 9754.64 msec
(17 rows)

Before I guess the index with monitorx,datex didn't do much because all the
data had the same date.  But now that I have over 2 weeks of real data, it
makes a difference.

Thanks,
Michael

Attachment