Thread: how does pg handle concurrent queries and same queries

how does pg handle concurrent queries and same queries

From
Faludi Gábor
Date:

Hi All,

 

I have taken over the maintenance of a server farm , recently. 2 webserver on  db server. They are quite powerful 2 processor xeon w/ 6Gig of ram .

 

Couple of days ago we had a serious performance hit and the db server (pg. v7.4) was overloaded w/ something in a way that operating system was almost not able to respond or in cases it did not.

 

 

After some analysis i suspect that there is a query that takes up to 1 second and that is the cause. Upon  each page loading this query fires and takes the one second and blocks the page to load completly . The load was roughly ~300 connections in one minute .

 

So here are my questions :

 

·         Why does the second and the later queries take the whole on second if the dataset is the same . Shouldn’t PG realise that the query is the same so i give the user the same resultset ?

·         How do I know if one query blocks the other ?

·         Is there a way to log the long running queries in 7.4 ? If not is it available in any newer version ?

 

thanks for your help !

 

Üdvözlettel/kind regards,

Faludi, Gábor

FITS Magyarország Kft.

http://www.FITS.hu

Tel.:+36 30 4945862

Email: falu@fits.hu

Ingyenes videó tanfolyamok(Excel,Access,Word) : http://www.fits.hu/trainings

 

Re: how does pg handle concurrent queries and same queries

From
"Claus Guttesen"
Date:
> I have taken over the maintenance of a server farm , recently. 2 webserver
> on  db server. They are quite powerful 2 processor xeon w/ 6Gig of ram .
>
> Couple of days ago we had a serious performance hit and the db server (pg.
> v7.4) was overloaded w/ something in a way that operating system was almost
> not able to respond or in cases it did not.
>
> After some analysis i suspect that there is a query that takes up to 1
> second and that is the cause. Upon  each page loading this query fires and
> takes the one second and blocks the page to load completly . The load was
> roughly ~300 connections in one minute .
>
> So here are my questions :
>
> ·         Why does the second and the later queries take the whole on second
> if the dataset is the same . Shouldn't PG realise that the query is the same
> so i give the user the same resultset ?
>
> ·         How do I know if one query blocks the other ?
>
> ·         Is there a way to log the long running queries in 7.4 ? If not is
> it available in any newer version ?

Can you post the queries? Can you provide an 'analyze explain'? Do you
perform a 'vacuum analyze' on a regular basis?

--
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare

Re: how does pg handle concurrent queries and same queries

From
Faludi Gábor
Date:
Hi,

here is what the original query was which was obviously nonsense :
EXPLAIN ANALYZE SELECT DISTINCT letoltes.cid, s.elofordulas FROM letoltes
INNER JOIN (select letoltes.cid, count(letoltes.cid) AS elofordulas FROM
letoltes GROUP BY cid) s ON s.cid=letoltes.cid ORDER BY s.elofordulas DESC
LIMIT 5;

QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
------------
 Limit  (cost=73945.35..73945.65 rows=5 width=12) (actual
time=4191.396..4351.966 rows=5 loops=1)
   ->  Unique  (cost=73945.35..77427.99 rows=58800 width=12) (actual
time=4191.390..4351.956 rows=5 loops=1)
         ->  Sort  (cost=73945.35..75106.23 rows=464351 width=12) (actual
time=4191.386..4283.545 rows=175944 loops=1)
               Sort Key: s.elofordulas, letoltes.cid
               ->  Merge Join  (cost=9257.99..30238.65 rows=464351 width=12)
(actual time=652.535..2920.304 rows=464351 loops=1)
                     Merge Cond: ("outer".cid = "inner".cid)
                     ->  Index Scan using idx_letoltes_cid on letoltes
(cost=0.00..12854.51 rows=464351 width=4) (actual time=0.084..1270.588
rows=464351 loops=1)
                     ->  Sort  (cost=9257.99..9258.73 rows=294 width=12)
(actual time=652.434..810.941 rows=464176 loops=1)
                           Sort Key: s.cid
                           ->  Subquery Scan s  (cost=9242.26..9245.94
rows=294 width=12) (actual time=651.343..652.028 rows=373 loops=1)
                                 ->  HashAggregate  (cost=9242.26..9243.00
rows=294 width=4) (actual time=651.339..651.661 rows=373 loops=1)
                                       ->  Seq Scan on letoltes
(cost=0.00..6920.51 rows=464351 width=4) (actual time=0.014..307.469
rows=464351 loops=1)
 Total runtime: 4708.434 ms
(13 sor)

However after fixing the query this is 1/4 th of the time  but still blocks
the site :


EXPLAIN ANALYZE SELECT DISTINCT letoltes.cid, count(letoltes.cid)  AS
elofordulas FROM letoltes GROUP BY cid ORDER BY elofordulas DESC LIMIT 5;
                                                               QUERY PLAN
----------------------------------------------------------------------------
-------------------------------------------------------------
 Limit  (cost=9255.05..9255.09 rows=5 width=4) (actual time=604.734..604.743
rows=5 loops=1)
   ->  Unique  (cost=9255.05..9257.26 rows=294 width=4) (actual
time=604.732..604.737 rows=5 loops=1)
         ->  Sort  (cost=9255.05..9255.79 rows=294 width=4) (actual
time=604.730..604.732 rows=5 loops=1)
               Sort Key: count(cid), cid
               ->  HashAggregate  (cost=9242.26..9243.00 rows=294 width=4)
(actual time=604.109..604.417 rows=373 loops=1)
                     ->  Seq Scan on letoltes  (cost=0.00..6920.51
rows=464351 width=4) (actual time=0.022..281.413 rows=464351 loops=1)
 Total runtime: 604.811 ms


here is the table :
\d letoltes
                     TĂĄbla "public.letoltes"
 Oszlop | TĂ­pus  |                  MĂłdosĂ­tĂł
--------+---------+------------------------------------------------
 id     | integer | not null default nextval('letoltes_seq'::text)
 cid    | integer |
Indexes:
    "idx_letoltes_cid" btree (cid)
    "idx_letoltes_id" btree (id)

select count(1) from letoltes;
 count
--------
 464351


VACUM ANALYZE runs overnight every day.

thanks,
Gabor

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Claus Guttesen
Sent: Monday, July 28, 2008 8:56 AM
To: Faludi Gábor
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] how does pg handle concurrent queries and same
queries

> I have taken over the maintenance of a server farm , recently. 2 webserver
> on  db server. They are quite powerful 2 processor xeon w/ 6Gig of ram .
>
> Couple of days ago we had a serious performance hit and the db server (pg.
> v7.4) was overloaded w/ something in a way that operating system was
almost
> not able to respond or in cases it did not.
>
> After some analysis i suspect that there is a query that takes up to 1
> second and that is the cause. Upon  each page loading this query fires and
> takes the one second and blocks the page to load completly . The load was
> roughly ~300 connections in one minute .
>
> So here are my questions :
>
> .         Why does the second and the later queries take the whole on
second
> if the dataset is the same . Shouldn't PG realise that the query is the
same
> so i give the user the same resultset ?
>
> .         How do I know if one query blocks the other ?
>
> .         Is there a way to log the long running queries in 7.4 ? If not
is
> it available in any newer version ?

Can you post the queries? Can you provide an 'analyze explain'? Do you
perform a 'vacuum analyze' on a regular basis?

--
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

No virus found in this incoming message.
Checked by AVG - http://www.avg.com
Version: 8.0.138 / Virus Database: 270.5.6/1576 - Release Date: 2008.07.27.
16:16


Re: how does pg handle concurrent queries and same queries

From
Craig Ringer
Date:
Faludi Gábor wrote:

> .         Why does the second and the later queries take the whole on second
> if the dataset is the same . Shouldn't PG realise that the query is the same
> so i give the user the same resultset ?

That would require a result cache. I don't know if Pg even has a query
result cache - I don't think so, but I'm not sure. Even if it does, it'd
still only be useful if the queries were issued under *exactly* the same
conditions - in other words, no writes had been made to the database
since the cached query was issued, and the first query had committed
before the second began (or was read-only). Additionally, no volatile
functions could be called in the query, because their values/effects
might be different when the query is executed a second time. That
includes triggers, etc.

Since 7.4 doesn't do lazy xid allocation it can't really tell that
nothing has been changed since the previous query was cached. So, if I'm
not missing something here, a query result cache would be useless anyway.

> .         How do I know if one query blocks the other ?

Examination of pg_catalog.pg_locks is certainly a start. It's trickier
with lots of short-running queries, though.

> .         Is there a way to log the long running queries in 7.4 ? If not is
> it available in any newer version ?

It's certainly available in 8.3, as log_min_duration_statement in
postgresql.conf . You can find out if it's in 7.4, and if not what
version it was introduced in, by looking through the documentation for
versions 7.4 and up.

--
Craig Ringer

Re: how does pg handle concurrent queries and same queries

From
Matthew Wakeling
Date:
On Mon, 28 Jul 2008, Faludi Gábor wrote:
> EXPLAIN ANALYZE SELECT DISTINCT letoltes.cid, count(letoltes.cid)  AS
> elofordulas FROM letoltes GROUP BY cid ORDER BY elofordulas DESC LIMIT 5;
>                                                               QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------
> Limit  (cost=9255.05..9255.09 rows=5 width=4) (actual  time=604.734..604.743 > rows=5 loops=1)
>   ->  Unique  (cost=9255.05..9257.26 rows=294 width=4) (actual time=604.732..604.737 rows=5 loops=1)
>         ->  Sort  (cost=9255.05..9255.79 rows=294 width=4) (actual time=604.730..604.732 rows=5 loops=1)
>               Sort Key: count(cid), cid
>               ->  HashAggregate  (cost=9242.26..9243.00 rows=294 width=4) (actual time=604.109..604.417 rows=373
loops=1)
>                     ->  Seq Scan on letoltes  (cost=0.00..6920.51 rows=464351 width=4) (actual time=0.022..281.413
rows=464351loops=1) 
> Total runtime: 604.811 ms

So this query is doing a sequential scan of the letoltes table for each
query. You may get some improvement by creating an index on cid and
clustering on that index, but probably not much.

Moving to Postgres 8.3 will probably help a lot, as it will allow multiple
queries to use the same sequential scan in parallel. That's assuming the
entire table isn't in cache.

Another solution would be to create an additional table that contains the
results of this query, and keep it up to date using triggers on the
original table. Then query that table instead.

However, probably the best solution is to examine the problem and work out
if you can alter the application to make it avoid doing such an expensive
query so often. Perhaps it could cache the results.

Matthew

--
Psychotics are consistently inconsistent. The essence of sanity is to
be inconsistently inconsistent.

Re: how does pg handle concurrent queries and same queries

From
Alvaro Herrera
Date:
Craig Ringer wrote:
> Faludi Gábor wrote:
>
> > .         Why does the second and the later queries take the whole on second
> > if the dataset is the same . Shouldn't PG realise that the query is the same
> > so i give the user the same resultset ?
>
> That would require a result cache. I don't know if Pg even has a query
> result cache - I don't think so, but I'm not sure.

It doesn't.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: how does pg handle concurrent queries and same queries

From
"Dennis Brakhane"
Date:
Slightly off-topic, but judging from the fact that you were able to
"fix" the query, it seems you have some way to modify the application
code itself. In that case, I'd try to implement caching (at least for
this statement) on the application side, for example with memcached.