Thread: same query run a high number of times
Hey folks!
Still kind of analyzing the situation, I realized that I do have reasonably high shared_buffers and effective_cache_size settings, yet when the same query is run many times across ~100-200 concurrent connections, its result does not seem to be cached.
Shouldn't PG realize that if the table data is the same, the query result set will also be the same? Instead, each run of the query takes up to 1-2 seconds.
Where do I see what PG does? I can see the queries that take a long time, but I have no information about what the optimizer does, nor about when the DB decides to table-scan or to use the cache.
cheers,
Peter
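A quick first check on whether reads are actually being served from the buffer cache is the standard statistics view below; this is only a sketch, and on older releases the block-level counters are only populated if stats_block_level is enabled:

-- buffer cache hit ratio per database: blks_hit were served from shared_buffers,
-- blks_read had to come from the OS cache or disk
SELECT datname,
       blks_hit,
       blks_read,
       round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 3) AS hit_ratio
FROM pg_stat_database;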
Without knowing how much memory goes to each of those settings, and how much work_mem each connection gets, it's hard to tell what is going on.
We also need the PG version, the OS, and how big the tables are. It would also be nice to see the query itself with EXPLAIN ANALYZE output.
PG does not cache the results of a query, only the tables themselves.
The tables could be completely cached, but there may be some nasty nested loops causing the problem.
What are you expecting the query time to be?
Check out http://wiki.postgresql.org/wiki/Performance_Optimization; there is a lot of info there on how to tune and diagnose problem queries.
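For instance, the basics asked for above could be collected along these lines; a minimal sketch, with nothing here specific to the poster's schema:

SELECT version();              -- PostgreSQL version

SHOW shared_buffers;           -- size of the shared buffer cache
SHOW effective_cache_size;     -- planner's assumption about total cache
SHOW work_mem;                 -- per-sort / per-hash memory

-- rough table sizes from the catalog (relpages is counted in 8 kB blocks, an estimate)
SELECT relname,
       relpages * 8 / 1024 AS approx_mb,
       reltuples::bigint   AS approx_rows
FROM pg_class
WHERE relkind = 'r'
  AND relname NOT LIKE 'pg_%'
ORDER BY relpages DESC
LIMIT 10;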
On Sun, Jun 21, 2009 at 6:54 AM, Peter Alban<peter.alban2@gmail.com> wrote:
> Shouldn't PG realize that if the table data is the same, the query result set
> will also be the same?

No. That's not so easy to implement as you might think. Saving the results of each previous query, in case someone issues the same query again without having changed anything in the meantime, would probably cost more in performance on average than you'd get out of it.

> Where do I see what PG does? I can see the queries that take a long time, but
> I have no information about what the optimizer does, nor about when the DB
> decides to table-scan or to use the cache.

Can't you get this from EXPLAIN and EXPLAIN ANALYZE?

...Robert
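As a sketch of the difference, with a hypothetical table and column standing in for the real query (which appears further down the thread): EXPLAIN only shows the plan the optimizer chose, while EXPLAIN ANALYZE actually executes the statement and reports real row counts and timings per plan node.

-- plan and cost estimates only; the statement is not executed
EXPLAIN
SELECT * FROM some_table WHERE some_column = 42;    -- hypothetical names

-- executes the statement and adds actual rows and actual time to each node
EXPLAIN ANALYZE
SELECT * FROM some_table WHERE some_column = 42;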
Hi,
Here is the query:
duration: 2533.734 ms  statement:
SELECT news.url_text, news.title, comments.name, comments.createdate, comments.user_id, comments.comment FROM news, comments WHERE comments.cid=news.id AND comments.published='1' GROUP BY news.url_text, news.title, comments.name, comments.createdate, comments.user_id, comments.comment ORDER BY comments.createdate DESC LIMIT 3
And here is the query plan :
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4313.54..4313.55 rows=3 width=595) (actual time=288.525..288.528 rows=3 loops=1)
-> Sort (cost=4313.54..4347.26 rows=13486 width=595) (actual time=288.523..288.523 rows=3 loops=1)
Sort Key: comments.createdate
-> HashAggregate (cost=3253.60..3388.46 rows=13486 width=595) (actual time=137.521..148.132 rows=13415 loops=1)
-> Hash Join (cost=1400.73..3051.31 rows=13486 width=595) (actual time=14.298..51.049 rows=13578 loops=1)
Hash Cond: ("outer".cid = "inner".id)
-> Seq Scan on comments (cost=0.00..1178.72 rows=13480 width=522) (actual time=0.012..17.434 rows=13418 loops=1)
Filter: (published = 1)
-> Hash (cost=1391.18..1391.18 rows=3818 width=81) (actual time=14.268..14.268 rows=3818 loops=1)
-> Seq Scan on news (cost=0.00..1391.18 rows=3818 width=81) (actual time=0.021..10.072 rows=3818 loops=1)
The same query is being requested from different sessions. So why is it not being cached?
postgresql.conf -- current --
shared_buffers = 410000 # min 16 or max_connections*2, 8KB each
temp_buffers = 11000 # min 100, 8KB each
#max_prepared_transactions = 5 # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 51024 # min 64, size in KB
#maintenance_work_mem = 16384 # min 1024, size in KB
#max_stack_depth = 2048 # min 100, size in KB
#---------------------------------------------------------------------------
# QUERY TUNING
#---------------------------------------------------------------------------
# - Planner Method Configuration -
#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on
# - Planner Cost Constants -
effective_cache_size = 692674 # typically 8KB each
#random_page_cost = 4 # units are one sequential page fetch
# cost
#cpu_tuple_cost = 0.01 # (same)
#cpu_index_tuple_cost = 0.001 # (same)
#cpu_operator_cost = 0.0025 # (same)
# - Genetic Query Optimizer -
#geqo = on
#geqo_threshold = 12
#geqo_effort = 5 # range 1-10
#geqo_pool_size = 0 # selects default based on effort
#geqo_generations = 0 # selects default based on effort
#geqo_selection_bias = 2.0 # range 1.5-2.0
# - Other Planner Options -
#default_statistics_target = 10 # range 1-1000
cheers,
Peter
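One way to check whether those two tables are really resident in shared_buffers, rather than being read back in each time, is the pg_buffercache contrib module, if it happens to be installed (it ships with 8.1 and later); a rough sketch:

-- counts 8 kB buffers currently held per relation; the relfilenode join is
-- approximate across databases but good enough for a quick look
SELECT c.relname,
       count(*)            AS buffers,
       count(*) * 8 / 1024 AS approx_mb
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = c.relfilenode
WHERE c.relname IN ('news', 'comments')
GROUP BY c.relname;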
Peter Alban wrote:
> The same query is being requested from different sessions. So why is it not being cached?

Because the query results are not cached, only the raw tables are. The query is rerun every time it is requested.

What is the GROUP BY clause accomplishing? The sorting and HashAggregate are eating up all the time.

> work_mem = 51024                        # min 64, size in KB

That's a lot of memory dedicated to work_mem; if you have 30 connections open, this could eat up 1.5 GB, pushing the data out of cache.
On Sun, Jun 21, 2009 at 9:01 PM, Justin Graf<justin@emproshunts.com> wrote:
> work_mem = 51024                        # min 64, size in KB
>
> That's a lot of memory dedicated to work_mem; if you have 30 connections open,
> this could eat up 1.5 GB, pushing the data out of cache.

I thought work_mem is the maximum memory that can be allocated per connection for sorting, etc. I think it is not allocated when the connection is opened, but only on an 'if needed' basis.

--
GJ
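Both points fit together with a quick back-of-the-envelope: work_mem is a per-sort / per-hash ceiling that is only allocated when needed (and can be used more than once within a single query), so thirty busy backends each using one full allocation is roughly the worst case described above. A sketch of the arithmetic, run as plain SQL against the posted setting:

-- work_mem = 51024 kB in the posted postgresql.conf; worst case if ~30
-- backends each used one full allocation at the same time
SELECT current_setting('work_mem') AS work_mem_setting,
       51024 * 30 / 1024.0         AS worst_case_total_mb;   -- roughly 1.5 GB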
On Sun, Jun 21, 2009 at 10:01 PM, Justin Graf <justin@emproshunts.com> wrote:
So does this mean that having, say, a 5 MB table in memory, such a query still takes 2 seconds, in memory?
Assuming that, we probably have really slow memory :)
Besides, the query makes little sense to me, but I don't write the queries (yet); I'm simply looking at the server side.
So do you suggest tuning the queries, or should I rather look for other monitoring tools?
cheers,
Peter
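On the "5 MB table" assumption, it may be worth confirming the actual on-disk sizes; a small sketch using the size functions available from 8.1 on (on older releases, relpages in pg_class gives a rough figure instead):

-- total on-disk size (including indexes and toast) of the two tables in the query
SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
       c.reltuples::bigint                           AS approx_rows
FROM pg_class c
WHERE c.relname IN ('news', 'comments');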
On Sun, Jun 21, 2009 at 12:28 PM, Peter Alban<peter.alban2@gmail.com> wrote:
> Here is the query:
> duration: 2533.734 ms  statement:

SNIP

> Limit (cost=4313.54..4313.55 rows=3 width=595) (actual
> time=288.525..288.528 rows=3 loops=1)

According to this query plan, your query is taking up 288 milliseconds. I'm guessing the rest of the time is actually spent transferring data.
That's a really tiny table; it should be processed in sub-millisecond time. Something else is going on. The actual time in the EXPLAIN of the query is 288 milliseconds, not the 2533.734 ms you state above.
You have not told us the version of PG or the OS it's running on.
Is there anything else running on the server?
On Sun, Jun 21, 2009 at 4:59 PM, Peter Alban<peter.alban2@gmail.com> wrote:
> On Sun, Jun 21, 2009 at 10:01 PM, Justin Graf <justin@emproshunts.com> wrote:
>> Because the query results are not cached, only the raw tables are. The
>> query is rerun every time it is requested.
>>
>> What is the GROUP BY clause accomplishing?
>> The sorting and HashAggregate are eating up all the time.
>
> So does this mean that having, say, a 5 MB table in memory, such a query
> still takes 2 seconds, in memory?

Nope. But as others have pointed out, you need to figure out why it's taking 2.5 s but EXPLAIN ANALYZE is only saying 300 ms.

There are other things you can do to optimize this query; for example:

1. Try creating an index on comments (createdate), and don't forget to ANALYZE the table afterward, or

2. Modify the query to remove the probably-unnecessary GROUP BY.

But figuring out the times may be the first thing. My guess is that the 2.5 s time is a time from your logs, maybe at a time when the system was busy, and the 300 ms time was what you got when you ran it some other time. But maybe there's some other explanation. You should try to figure it out.

...Robert
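A sketch of the two suggestions above; the index name is made up for illustration, and dropping the GROUP BY assumes it was only de-duplicating identical rows, which is worth confirming against what the application expects:

-- suggestion 1: index the ORDER BY column, then refresh statistics
CREATE INDEX comments_createdate_idx ON comments (createdate);   -- illustrative name
ANALYZE comments;

-- suggestion 2: the same query with the probably-unnecessary GROUP BY removed
SELECT news.url_text, news.title,
       comments.name, comments.createdate, comments.user_id, comments.comment
FROM news, comments
WHERE comments.cid = news.id
  AND comments.published = '1'
ORDER BY comments.createdate DESC
LIMIT 3;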
Hey folks!
Eventually, removing the GROUP BY did improve things, but my concern is still why we can't take the result from memory, given it is the same result set.
But I keep pushing for the developers to move to memcached so we overcome this limitation.
cheers,
Peter
On 6/22/09 2:11 PM, "Peter Alban" <peter.alban2@gmail.com> wrote:
> Eventually, removing the GROUP BY did improve things, but my concern is still
> why we can't take the result from memory, given it is the same result set.
> But I keep pushing for the developers to move to memcached so we overcome
> this limitation.

Caching of that sort is better suited to client code or a middle-tier caching technology. It isn't that simple to resolve that the same query will return the same result for most queries; between two executions another client could have made a change. But more importantly, the client is what knows what level of 'staleness' is appropriate for the data. An RDBMS operates with a no-tolerance policy for returning stale data and with strict transactional visibility rules.

If your application only needs a result that is fresh within some window of time, it (or another component) should do the caching. This is far more efficient. An RDBMS is a very poorly performing data cache; it's built to quickly resolve arbitrary relational queries with strict transactional guarantees, not to cache a set of answers. Although, given a hammer, everything looks like a nail . . .
On Mon, Jun 22, 2009 at 12:06 AM, Scott Marlowe<scott.marlowe@gmail.com> wrote:
> On Sun, Jun 21, 2009 at 12:28 PM, Peter Alban<peter.alban2@gmail.com> wrote:
>> Here is the query:
>> duration: 2533.734 ms  statement:
>
> SNIP
>
>> Limit (cost=4313.54..4313.55 rows=3 width=595) (actual
>> time=288.525..288.528 rows=3 loops=1)
>
> According to this query plan, your query is taking up 288
> milliseconds. I'm guessing the rest of the time is actually spent
> transferring data.

Huuuuuu ...
The cost is _certainly_ not the time in ms.
See the planner cost constants in a config file, or in any good documentation.

--
F4FQM
Kerunix Flan
Laurent Laborde
On Tue, Jun 23, 2009 at 10:52 AM, Laurent Laborde<kerdezixe@gmail.com> wrote:
> On Mon, Jun 22, 2009 at 12:06 AM, Scott Marlowe<scott.marlowe@gmail.com> wrote:
>> According to this query plan, your query is taking up 288
>> milliseconds. I'm guessing the rest of the time is actually spent
>> transferring data.
>
> Huuuuuu ...
> The cost is _certainly_ not the time in ms.
> See the planner cost constants in a config file, or in any good documentation.

Woooops... cost... time... my mistake ... :)
*duck and cover*

--
F4FQM
Kerunix Flan
Laurent Laborde