Thread: slow query

slow query

From
Oleg Lebedev
Date:
Hi,
I am having problems with my master database now. It used to work
extremely good just two days ago, but then I started playing with
adding/dropping schemas and added another database and performance went
down.

I also have a replica database on the same server and when I run the
same query on it, it runs good.
Interestingly the planner statistics for this query are the same on the
master and replica databases.
However, query execution on the master database is about 4min. and on
the replica database is 6 sec.!

I VACUUM ANALYZED both databases and made sure they have same indexes on
the tables.
I don't know where else to look, but here are the schemas I have on the
master and replica database. The temp schemas must be the ones that I
created and then dropped.
master=# select * from pg_namespace;
nspname | nspowner | nspacl
------------+----------+--------
pg_catalog | 1 | {=U}
pg_toast | 1 | {=}
public | 1 | {=UC}
pg_temp_1 | 1 |
pg_temp_3 | 1 |
pg_temp_10 | 1 |
pg_temp_28 | 1 |
replica=> select * from pg_namespace;
nspname | nspowner | nspacl
------------+----------+--------
pg_catalog | 1 | {=U}
pg_toast | 1 | {=}
public | 1 | {=UC}
pg_temp_1 | 1 |
pg_temp_39 | 1 |
india | 105 |

Here is the query:
SELECT * FROM media m, speccharacter c
WHERE m.mediatype IN (SELECT objectid FROM mediatype WHERE
medianame='Audio')
AND m.mediachar = c.objectid
AND (m.activity='178746'
    OR
        (EXISTS (SELECT ism.objectid
        FROM intsetmedia ism, set s
        WHERE ism.set = s.objectid
        AND ism.media = m.objectid AND s.activity='178746' )
        )
    OR
        (EXISTS (SELECT dtrm.objectid
        FROM dtrowmedia dtrm, dtrow dtr, dtcol dtc, datatable dt

        WHERE dtrm.dtrow = dtr.objectid
        AND dtrm.media = m.objectid
        AND dtr.dtcol = dtc.objectid
        AND dtc.datatable = dt.objectid
        AND dt.activity = '178746')
        )
    )
ORDER BY medianame ASC, status DESC;

*************************************

This email may contain privileged or confidential material intended for the named recipient only.
If you are not the named recipient, delete this message and all attachments.
Any review, copying, printing, disclosure or other use is prohibited.
We reserve the right to monitor email sent through our network.

*************************************


Re: slow query

From
Tom Lane
Date:
Oleg Lebedev <oleg.lebedev@waterford.org> writes:
> Interestingly the planner statistics for this query are the same on the
> master and replica databases.
> However, query execution on the master database is about 4min. and on
> the replica database is 6 sec.!

What does EXPLAIN ANALYZE show in each case?

            regards, tom lane

Re: slow query

From
Josh Berkus
Date:
Oleg,

> I VACUUM ANALYZED both databases and made sure they have same indexes on
> the tables.

Have you VACUUM FULL the main database?  And how about REINDEX?

> Here is the query:
> SELECT * FROM media m, speccharacter c
> WHERE m.mediatype IN (SELECT objectid FROM mediatype WHERE
> medianame='Audio')

The above should use an EXISTS clause, not IN, unless you are absolutely sure
that the subquery will never return more than 12 rows.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: slow query

From
Ryan Bradetich
Date:
On Sun, 2003-02-23 at 13:52, Josh Berkus wrote:
> Oleg,
>
> > I VACUUM ANALYZED both databases and made sure they have same indexes on
> > the tables.
>
> Have you VACUUM FULL the main database?  And how about REINDEX?
>
> > Here is the query:
> > SELECT * FROM media m, speccharacter c
> > WHERE m.mediatype IN (SELECT objectid FROM mediatype WHERE
> > medianame='Audio')
>
> The above should use an EXISTS clause, not IN, unless you are absolutely sure
> that the subquery will never return more than 12 rows.

I am assuming you said this because EXISTS is faster for > 12 rows?
Interesting :)

thanks,

- Ryan


--
Ryan Bradetich <rbradetich@uswest.net>


Re: slow query

From
Josh Berkus
Date:
Ryan,

> > The above should use an EXISTS clause, not IN, unless you are absolutely
> > sure that the subquery will never return more than 12 rows.
>
> I am assuming you said this because EXISTS is faster for > 12 rows?
> Interesting :)

That's my rule of thumb, *NOT* any kind of relational-calculus-based truth.

Basically, one should only use IN for a subquery when one is *absolutely* sure
that the subquery will only return a handful of records, *and* the subquery
doesn't have to do an complex work like aggregating or custom function
evaluation.

You're safer using EXISTS for all subqueries, really.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: slow query

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
>> I am assuming you said this because EXISTS is faster for > 12 rows?

> That's my rule of thumb, *NOT* any kind of relational-calculus-based truth.

Keep in mind also that the tradeoffs will change quite a lot when PG 7.4
hits the streets, because the optimizer has gotten a lot smarter about
how to handle IN, but no smarter about EXISTS.  Here's one rather silly
example using CVS tip:

regression=# explain analyze select * from tenk1 a where
regression-# unique1 in (select hundred from tenk1 b);
                                                              QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=486.32..504.11 rows=100 width=248) (actual time=453.19..468.86 rows=100 loops=1)
   Merge Cond: ("outer".unique1 = "inner".hundred)
   ->  Index Scan using tenk1_unique1 on tenk1 a  (cost=0.00..1571.87 rows=10000 width=244) (actual time=0.12..5.25
rows=101loops=1) 
   ->  Sort  (cost=486.32..486.57 rows=100 width=4) (actual time=452.91..453.83 rows=100 loops=1)
         Sort Key: b.hundred
         ->  HashAggregate  (cost=483.00..483.00 rows=100 width=4) (actual time=447.59..449.80 rows=100 loops=1)
               ->  Seq Scan on tenk1 b  (cost=0.00..458.00 rows=10000 width=4) (actual time=0.06..276.47 rows=10000
loops=1)
 Total runtime: 472.06 msec
(8 rows)

regression=# explain analyze select * from tenk1 a where
regression-# exists (select 1 from tenk1 b where b.hundred = a.unique1);
                                                             QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on tenk1 a  (cost=0.00..35889.66 rows=5000 width=244) (actual time=3.69..1591.78 rows=100 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Index Scan using tenk1_hundred on tenk1 b  (cost=0.00..354.32 rows=100 width=0) (actual time=0.10..0.10 rows=0
loops=10000)
           Index Cond: (hundred = $0)
 Total runtime: 1593.88 msec
(6 rows)

The EXISTS case takes about the same time in 7.3, but the IN case is off
the charts (I got bored of waiting after 25 minutes...)

            regards, tom lane

Re: slow query

From
Oleg Lebedev
Date:
Thanks everybody for your help.
VACUUM FULL did the job, and now the query performance is the same in
both databases. I am surprised that FULL option makes such a dramatic
change to the query performance: from 4min. to 5sec.!!! It also changed
planner stats from ~9 sec to ~8sec.
I haven't tried to REINDEX yet, though.
Regarding IN vs. EXISTS. The sub-query in the IN clause will always
return fewer records than 12. I tried using EXISTS instead of IN with
Postgres7.2.1 and it slowed down query performance. With postgres 7.3,
when I use EXISTS instead of IN the planner returns the same stats and
query performance does not improve. However, if I use
m.mediatype=(SELECT objectid FROM mediatype WHERE medianame='Audio')
the planner returns ~7 sec., which is the same as if I the query is
changed like this:
SELECT * FROM media m, speccharacter c, mediatype mt
WHERE mt.objectid=m.mediatype and mt.medianame='Audio'
...
So, using JOIN and =(SELECT ...) is better than using IN and EXISTS in
this case.
Thanks.

Oleg


-----Original Message-----
From: Josh Berkus [mailto:josh@agliodbs.com]
Sent: Sunday, February 23, 2003 1:53 PM
To: Oleg Lebedev; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] slow query
Importance: Low


Oleg,

> I VACUUM ANALYZED both databases and made sure they have same indexes
> on the tables.

Have you VACUUM FULL the main database?  And how about REINDEX?

> Here is the query:
> SELECT * FROM media m, speccharacter c
> WHERE m.mediatype IN (SELECT objectid FROM mediatype WHERE
> medianame='Audio')

The above should use an EXISTS clause, not IN, unless you are absolutely
sure
that the subquery will never return more than 12 rows.

--
Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster


*************************************

This email may contain privileged or confidential material intended for the named recipient only.
If you are not the named recipient, delete this message and all attachments.
Any review, copying, printing, disclosure or other use is prohibited.
We reserve the right to monitor email sent through our network.

*************************************


Re: slow query

From
Robert Treat
Date:
On Mon, 2003-02-24 at 10:59, Oleg Lebedev wrote:
> Thanks everybody for your help.
> VACUUM FULL did the job, and now the query performance is the same in
> both databases. I am surprised that FULL option makes such a dramatic
> change to the query performance: from 4min. to 5sec.!!! It also changed
> planner stats from ~9 sec to ~8sec.

If your seeing wildly dramatic improvments from vacuum full, you might
want to look into running regular vacuums more often (especially for
high turnover tables), increase your max_fsm_relations to 1000, and
increasing your max_fsm_pages.

Robert Treat



Re: slow query

From
Oleg Lebedev
Date:
I run VACUUM (not FULL though) every night, which I thought would be
enough for good query performance. Moreover, the data in table does not
really change significantly since database usage is pretty low right
now, therefore I thought that VACUUM FULL was an overkill.
I think that creating, populating and dropping schemas in the master
database could have affected the query performance and required VACUUM
FULL.
I will definitely look at max_fsm_relations and max_fsm_pages parameter
settings.
Thank you.

Oleg Lebedev



-----Original Message-----
From: Robert Treat [mailto:xzilla@users.sourceforge.net]
Sent: Monday, February 24, 2003 9:59 AM
To: Oleg Lebedev
Cc: Josh Berkus; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] slow query


On Mon, 2003-02-24 at 10:59, Oleg Lebedev wrote:
> Thanks everybody for your help.
> VACUUM FULL did the job, and now the query performance is the same in
> both databases. I am surprised that FULL option makes such a dramatic
> change to the query performance: from 4min. to 5sec.!!! It also
> changed planner stats from ~9 sec to ~8sec.

If your seeing wildly dramatic improvments from vacuum full, you might
want to look into running regular vacuums more often (especially for
high turnover tables), increase your max_fsm_relations to 1000, and
increasing your max_fsm_pages.

Robert Treat



*************************************

This email may contain privileged or confidential material intended for the named recipient only.
If you are not the named recipient, delete this message and all attachments.
Any review, copying, printing, disclosure or other use is prohibited.
We reserve the right to monitor email sent through our network.

*************************************


Re: slow query

From
Clarence Gardner
Date:
On 24 Feb 2003, Robert Treat wrote:

>
> If your seeing wildly dramatic improvments from vacuum full, you might
> want to look into running regular vacuums more often (especially for
> high turnover tables), increase your max_fsm_relations to 1000, and
> increasing your max_fsm_pages.

I don't know about the settings you mention, but a frequent vacuum
does not at all obviate a vacuum full. My database is vacuumed every
night, but a while ago I found that a vacuum full changed a simple
single-table query from well over 30 seconds to one or two. We now
do a vacuum full every night.



Re: slow query

From
Oleg Lebedev
Date:
What about REINDEXing and other DB tricks?
Do you use any of these regularly?
Thanks.

Oleg

-----Original Message-----
From: Clarence Gardner [mailto:clarence@silcom.com]
Sent: Monday, February 24, 2003 10:28 AM
To: Robert Treat
Cc: Oleg Lebedev; Josh Berkus; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] slow query


On 24 Feb 2003, Robert Treat wrote:

>
> If your seeing wildly dramatic improvments from vacuum full, you might

> want to look into running regular vacuums more often (especially for
> high turnover tables), increase your max_fsm_relations to 1000, and
> increasing your max_fsm_pages.

I don't know about the settings you mention, but a frequent vacuum does
not at all obviate a vacuum full. My database is vacuumed every night,
but a while ago I found that a vacuum full changed a simple single-table
query from well over 30 seconds to one or two. We now do a vacuum full
every night.



*************************************

This email may contain privileged or confidential material intended for the named recipient only.
If you are not the named recipient, delete this message and all attachments.
Any review, copying, printing, disclosure or other use is prohibited.
We reserve the right to monitor email sent through our network.

*************************************


Re: slow query

From
Robert Treat
Date:
On Mon, 2003-02-24 at 12:27, Clarence Gardner wrote:
> On 24 Feb 2003, Robert Treat wrote:
>
> >
> > If your seeing wildly dramatic improvments from vacuum full, you might
> > want to look into running regular vacuums more often (especially for
> > high turnover tables), increase your max_fsm_relations to 1000, and
> > increasing your max_fsm_pages.
>
> I don't know about the settings you mention, but a frequent vacuum
> does not at all obviate a vacuum full. My database is vacuumed every
> night, but a while ago I found that a vacuum full changed a simple
> single-table query from well over 30 seconds to one or two. We now
> do a vacuum full every night.
>

Actually if you are vacuuming frequently enough, it can (and should*)
obviate a vacuum full. Be aware that frequently enough might mean really
frequent, for instance I have several tables in my database that update
every row within a 15 minute timeframe, so I run a "lazy" vacuum on
these tables every 10 minutes. This allows postgresql to reuse the space
for these tables almost continuously so I never have to vacuum full
them.

(* this assumes your max_fsm_pages/relations settings are set correctly)

Robert Treat



Re: slow query

From
Josh Berkus
Date:
Robert,

> Actually if you are vacuuming frequently enough, it can (and should*)
> obviate a vacuum full. Be aware that frequently enough might mean really
> frequent, for instance I have several tables in my database that update
> every row within a 15 minute timeframe, so I run a "lazy" vacuum on
> these tables every 10 minutes. This allows postgresql to reuse the space
> for these tables almost continuously so I never have to vacuum full
> them.

This would assume absolutely perfect FSM settings, and that the DB never gets
thrown off by unexpected loads.  I have never been so fortunate as to work
with such a database.   However, I agree that good FSM tuning and frequent
regular VACUUMs can greatly extend the period required for running FULL.

I have not found, though, that this does anything to prevent the need for
REINDEX on frequently-updated tables.  How about you, Robert?

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: slow query

From
Rod Taylor
Date:
On Mon, 2003-02-24 at 12:27, Clarence Gardner wrote:
> On 24 Feb 2003, Robert Treat wrote:
>
> >
> > If your seeing wildly dramatic improvments from vacuum full, you might
> > want to look into running regular vacuums more often (especially for
> > high turnover tables), increase your max_fsm_relations to 1000, and
> > increasing your max_fsm_pages.
>
> I don't know about the settings you mention, but a frequent vacuum
> does not at all obviate a vacuum full. My database is vacuumed every
> night, but a while ago I found that a vacuum full changed a simple
> single-table query from well over 30 seconds to one or two. We now
> do a vacuum full every night.

Sure is does.  If your free-space-map (FSM) is up to date, tuples are
not appended to the end of the table, so table growth will not occur
(beyond a 'settling' point.

Unless you remove more data from the table than you ever expect to have
in the table again, this is enough.  That said, the instant the FSM is
empty, you're table starts to grow again and a VACUUM FULL will be
required to re-shrink it.

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

Attachment

Re: slow query

From
Andrew Sullivan
Date:
On Mon, Feb 24, 2003 at 09:27:56AM -0800, Clarence Gardner wrote:
>
> I don't know about the settings you mention, but a frequent vacuum
> does not at all obviate a vacuum full. My database is vacuumed every
> night, but a while ago I found that a vacuum full changed a simple
> single-table query from well over 30 seconds to one or two. We now
> do a vacuum full every night.

This probably means that either some of your FSM settings should be
different, or that you have long-running queries, or both.  _Some_
advantage to vacuum full is expected, but 30 seconds to one or two is
a pretty big deal.

Except in cases where a large percentage of the table is a vacuum
candidate, the standard vacuum should be more than adequate.  But
there are a couple of gotchas.

You need to have room in your free space map to hold information
about the bulk of the to-be-freed tables.  So perhaps your FSM
settings are not big enough, even though you tried to set them
higher.  (Of course, if you're replacing, say, more than half the
table, setting the FSM high enough isn't practical.)

Another possibility is that you have multiple long-running
transactions that are keeping non-blocking vacuum from being very
effective.  Since those transactions are possibly referencing old
versions of a row, when the non-blocking vacuum comes around, it just
skips the "dead" tuples which are nevertheless alive to someone.
(You can see the effect of this by using the contrib/pgstattuple
function).  Blocking vacuum doesn't have this problem, because it
just waits on the table until everything ahead of it has committed or
rolled back.  So you pay in wait time for all transactions during the
vacuum.  (I have encountered this very problem on a table which gets
a lot of update activity on just on just one row.  Even vacuuming
every minute, the table grew and grew, because of another misbehaving
application which was keeping a transaction open when it shouldn't
have.)

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: slow query

From
Robert Treat
Date:
On Mon, 2003-02-24 at 13:45, Josh Berkus wrote:
> Robert,
>
> > Actually if you are vacuuming frequently enough, it can (and should*)
> > obviate a vacuum full. Be aware that frequently enough might mean really
> > frequent, for instance I have several tables in my database that update
> > every row within a 15 minute timeframe, so I run a "lazy" vacuum on
> > these tables every 10 minutes. This allows postgresql to reuse the space
> > for these tables almost continuously so I never have to vacuum full
> > them.
>
> This would assume absolutely perfect FSM settings, and that the DB never gets
> thrown off by unexpected loads.  I have never been so fortunate as to work
> with such a database.   However, I agree that good FSM tuning and frequent
> regular VACUUMs can greatly extend the period required for running FULL.
>

It's somewhat relative. On one of my tables, it has about 600 rows, each
row gets updated within 15 minutes. I vacuum it every 10 minutes, which
should leave me with around 1000 tuples (dead and alive) for that table,
Even if something overload the updates on that table, chances are that I
wouldn't see enough of a performance drop to warrant a vacuum full. Of
course, its a small table, so YMMV.  I think the point is though that if
your running nightly vacuum fulls just to stay ahead of the game, your
not maintaining the database optimally.

> I have not found, though, that this does anything to prevent the need for
> REINDEX on frequently-updated tables.  How about you, Robert?
>

Well, this touches on a different topic. On those tables where I get
"index bloat", I do have to do REINDEX's. But that's not currently
solvable with vacuum (remember indexes dont even use FSM) though IIRC
Tom & Co. have done some work toward this for 7.4

Robert Treat


Re: slow query

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> ... However, I agree that good FSM tuning and frequent
> regular VACUUMs can greatly extend the period required for running FULL.

> I have not found, though, that this does anything to prevent the need for
> REINDEX on frequently-updated tables.  How about you, Robert?

As of 7.3, FSM doesn't have anything to do with indexes.  If you have
index bloat, it's because of the inherent inability of btree indexes to
reuse space when the data distribution changes over time.  (Portions of
the btree may become empty, but they aren't recycled.)  You'll
particularly get burnt by indexes that are on OIDs or sequentially
assigned ID numbers, since the set of IDs in use just naturally tends to
migrate higher over time.  I don't think that the update rate per se has
much to do with this, it's the insertion of new IDs and deletion of old
ones that causes the statistical shift.  The tree grows at the right
edge, but doesn't shrink at the left.

As of CVS tip, however, the situation is different ;-).  Btree indexes
will recycle space using FSM in 7.4.

            regards, tom lane

Re: slow query

From
Robert Treat
Date:
On Mon, 2003-02-24 at 15:50, Tom Lane wrote:
> You'll
> particularly get burnt by indexes that are on OIDs or sequentially
> assigned ID numbers, since the set of IDs in use just naturally tends to
> migrate higher over time.  I don't think that the update rate per se has
> much to do with this, it's the insertion of new IDs and deletion of old
> ones that causes the statistical shift.

Would it be safe to say that tables with high update rates where the
updates do not change the indexed value would not suffer from index
bloat? For example updates to non-index columns or updates that
overwrite, but don't change the value of indexed columns; do these even
need to touch the index?

Robert Treat


Re: slow query

From
Tom Lane
Date:
Robert Treat <xzilla@users.sourceforge.net> writes:
> Would it be safe to say that tables with high update rates where the
> updates do not change the indexed value would not suffer from index
> bloat?

I would expect not.  If you vacuum often enough to keep the main table
size under control, the index should stay under control too.

> For example updates to non-index columns or updates that
> overwrite, but don't change the value of indexed columns; do these even
> need to touch the index?

Yes, they do.  Think MVCC.

            regards, tom lane

Re: slow query

From
Peter Childs
Date:
On Tue, 25 Feb 2003, Tom Lane wrote:

> Robert Treat <xzilla@users.sourceforge.net> writes:
> > Would it be safe to say that tables with high update rates where the
> > updates do not change the indexed value would not suffer from index
> > bloat?
>
> I would expect not.  If you vacuum often enough to keep the main table
> size under control, the index should stay under control too.

    Yes and No, From what I can work out the index is a tree and if
many updates occus the tree can become unbalanced (eventually it get so
unbalanced its no better than a seq scan.... Its not big its just all the
data is all on one side of the tree. Which is why Reindexing is a good
plan. What is really needed is a quicker way of rebalancing the tree. So
the database notices when the index is unbalanced and picks a new root
node and hangs the old root to that. (Makes for a very intresting
algorithim if I remeber my University lectures....)
    Now I'm trying to sort out a very large static table that I've
just finished updating. I am beginning to think that the quickest way of
sorting it out is to dump and reload it. But I'm trying a do it in place
method. (Of Reindex it, vaccum full analyse) but what is the correct order
to do this in?

Reindex, Vaccum

or

Vaccum, Reindex.

Peter Childs

>
> > For example updates to non-index columns or updates that
> > overwrite, but don't change the value of indexed columns; do these even
> > need to touch the index?
>
> Yes, they do.  Think MVCC.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html
>