Thread: how could select id=xx be so slow?

how could select id=xx be so slow?

From
Yan Chunlu
Date:
I grabbed one day's slow query log and analyzed it with pgFouine. To my surprise, the slowest query is just a simple select statement:

select diggcontent_data_message.thing_id, diggcontent_data_message.KEY, diggcontent_data_message.value, diggcontent_data_message.kind FROM diggcontent_data_message WHERE diggcontent_data_message.thing_id = 3570882;


where thing_id is the primary key. Guess how long it takes?

754.61 seconds!! 

I tried EXPLAIN ANALYZE on it and below is the result, which is very fast:


explain analyze select diggcontent_data_message.thing_id, diggcontent_data_message.KEY, diggcontent_data_message.value, diggcontent_data_message.kind FROM diggcontent_data_message WHERE diggcontent_data_message.thing_id = 3570882;
                                                                            QUERY PLAN                                                                             
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_thing_id_diggcontent_data_message on diggcontent_data_message  (cost=0.00..15.34 rows=32 width=51) (actual time=0.080..0.096 rows=8 loops=1)
   Index Cond: (thing_id = 3570882)
 Total runtime: 0.115 ms
(3 rows)


So I wonder: could this simple select be innocent and just badly affected by other queries? How can I find the queries that are really slowing down the database?
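For a start, I suppose I could watch pg_stat_activity for statements that have been running a long time. A rough sketch (assuming the pre-9.2 catalog columns procpid and current_query; 9.2 renames them to pid and query):

-- currently executing statements, longest-running first
SELECT procpid, now() - query_start AS runtime, current_query
FROM pg_stat_activity
WHERE current_query <> '<IDLE>'
ORDER BY runtime DESC;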
thanks!

Re: how could select id=xx be so slow?

From
Daniel Farina
Date:
On Thu, Jul 5, 2012 at 11:17 PM, Yan Chunlu <springrider@gmail.com> wrote:
> I grabbed one day's slow query log and analyzed it with pgFouine. To my
> surprise, the slowest query is just a simple select statement:
>
> select diggcontent_data_message.thing_id, diggcontent_data_message.KEY,
> diggcontent_data_message.value, diggcontent_data_message.kind FROM
> diggcontent_data_message WHERE diggcontent_data_message.thing_id = 3570882;
>
>
> where thing_id is the primary key. Guess how long it takes?
>
> 754.61 seconds!!

Is it possible that the size of the tuple is enormous?  Because one
area where I've noticed EXPLAIN ANALYZE blows away normal performance
is when a lot of the work would be in reassembling, decompressing
(collectively: de-TOASTING) and sending the data.

Even then, that time seems excessive...but something to think about.
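If you want to check, here's a rough sketch; note that pg_total_relation_size counts the heap plus all indexes plus TOAST, so the difference is only an upper bound on the TOAST share:

-- heap size vs. everything else hanging off the table
SELECT pg_size_pretty(pg_relation_size('diggcontent_data_message')) AS heap,
       pg_size_pretty(pg_total_relation_size('diggcontent_data_message')
                      - pg_relation_size('diggcontent_data_message')) AS toast_plus_indexes;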

--
fdr

Re: how could select id=xx be so slow?

From
Craig Ringer
Date:
On 07/06/2012 02:17 PM, Yan Chunlu wrote:

> so I wonder: could this simple select be innocent and just badly
> affected by other queries? How can I find the queries that are really
> slowing down the database?

It might not be other queries. Your query could be taking that long
because it was blocked by a lock during maintenance work (say, an ALTER
TABLE). It's also quite possible that it was held up by a slow
checkpoint; check your logs to see if there are warnings about
checkpoint activity.
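If you catch it happening live, a sketch like this against pg_locks shows who is waiting on whom (it assumes pre-9.2 catalogs with procpid/current_query, and only catches waits on transaction IDs; the wiki has fuller variants):

-- blocked sessions and the sessions holding the lock they want
SELECT bl.pid AS blocked_pid, a.current_query AS blocked_query,
       kl.pid AS blocking_pid, ka.current_query AS blocking_query
FROM pg_locks bl
JOIN pg_stat_activity a ON a.procpid = bl.pid
JOIN pg_locks kl ON kl.transactionid = bl.transactionid AND kl.pid <> bl.pid
JOIN pg_stat_activity ka ON ka.procpid = kl.pid
WHERE NOT bl.granted;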

--
Craig Ringer

Re: how could select id=xx be so slow?

From
"Albe Laurenz"
Date:
Yan Chunlu wrote:
> I grabbed one day's slow query log and analyzed it with pgFouine. To my
> surprise, the slowest query is just a simple select statement:
>
> select diggcontent_data_message.thing_id, diggcontent_data_message.KEY,
> diggcontent_data_message.value, diggcontent_data_message.kind
> FROM diggcontent_data_message
> WHERE diggcontent_data_message.thing_id = 3570882;
>
> where thing_id is the primary key. Guess how long it takes?
>
> 754.61 seconds!!
>
> I tried EXPLAIN ANALYZE on it and below is the result, which is very fast:
>
> explain analyze select diggcontent_data_message.thing_id,
> diggcontent_data_message.KEY, diggcontent_data_message.value,
> diggcontent_data_message.kind FROM diggcontent_data_message
> WHERE diggcontent_data_message.thing_id = 3570882;
>
>  Index Scan using idx_thing_id_diggcontent_data_message on
>      diggcontent_data_message  (cost=0.00..15.34 rows=32 width=51)
>      (actual time=0.080..0.096 rows=8 loops=1)
>    Index Cond: (thing_id = 3570882)
>  Total runtime: 0.115 ms
> (3 rows)
>
> So I wonder: could this simple select be innocent and just badly affected
> by other queries? How can I find the queries that are really slowing down
> the database?

Are these by any chance the aggregated costs in pgFouine?
Could it be that the statement just ran very often and used that time in
total?

Other than that, it could have been blocked by something that takes an
exclusive lock on the table.

There are no ON SELECT DO INSTEAD rules or similar things on the table,
right?
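A quick way to check for rules, as a sketch:

-- any rewrite rules on the table show up here
SELECT rulename, definition
FROM pg_rules
WHERE tablename = 'diggcontent_data_message';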

Yours,
Laurenz Albe

Re: how could select id=xx be so slow?

From
Yan Chunlu
Date:
Thanks for all the help. I checked the possibilities and found that:
1, the size of the tuples is small
2, I checked the log manually and it indeed cost that much time, not aggregated

The value of "log_min_messages" in postgresql.conf was 'error'; I have changed it to 'warning'. So far I have not received any warnings, still waiting.

Besides, I do see some COMMITs which are relatively slow, for example:
   60 2012-07-08 00:00:29 CST [19367]: [131-1] LOG:  duration: 375.851 ms  statement: COMMIT
   61 2012-07-08 00:00:30 CST [19367]: [132-1] LOG:  duration: 327.964 ms  statement: COMMIT

but there is only one "BEGIN" in the same one-day log file. Did that influence the query time too?



Re: how could select id=xx be so slow?

From
Craig Ringer
Date:
On 07/09/2012 05:20 PM, Yan Chunlu wrote:
>
> The value of "log_min_messages" in postgresql.conf was 'error'; I have
> changed it to 'warning'. So far I have not received any warnings, still
> waiting.
When trying to track down performance issues, increasing logging to at
least `info' would seem to be sensible.

I suggest increasing your logging and enabling the auto_explain module
so it logs the plans of slow queries. If you can afford the substantial
performance hit, you could enable its analyze mode to get details on why
they are slow.
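A minimal per-session setup could look like this (a sketch; auto_explain ships in contrib since 8.4, and for permanent use you'd preload it via shared_preload_libraries instead):

-- log the plans of statements slower than 500 ms in this session
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '500ms';
SET auto_explain.log_analyze = on;  -- optional; adds real timing overhead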

>
> Besides, I do see some COMMITs which are relatively slow, for example:
>    60 2012-07-08 00:00:29 CST [19367]: [131-1] LOG:  duration: 375.851
> ms  statement: COMMIT
>    61 2012-07-08 00:00:30 CST [19367]: [132-1] LOG:  duration: 327.964
> ms  statement: COMMIT
That certainly is slow. Again, I suspect checkpoint activity could be at
fault. You may need to tune to spread your checkpoints out and use more
aggressive bgwriter settings. See the wiki for performance tuning info.

>
> but there is only one "BEGIN" in the same one-day log file. Did that
> influence the query time too?

Only one BEGIN in the whole day? Or do you mean "only one BEGIN slow
enough to trigger slow query logging" ?

Do you have a "log_min_duration_statement" directive set in your
postgresql.conf ?

--
Craig Ringer

Re: how could select id=xx be so slow?

From
Craig Ringer
Date:
On 07/10/2012 10:25 AM, Yan Chunlu wrote:
> I didn't set log_min_duration_statement in the postgresql.conf, but execute
> dbapi_con.cursor().execute("SET log_min_duration_statement to 30")
> for every connection.

OK, same effect: You're only logging slow statements.

It's not at all surprising that BEGIN doesn't appear when a log_min_duration_statement is set. It's an incredibly fast operation. What's amazing is that it appears even once - that means your database must be in serious performance trouble, as BEGIN should take tenths of a millisecond on an unloaded system. For example my quick test here:

LOG:  statement: BEGIN;
LOG:  duration: 0.193 ms

... which is actually a lot slower than I expected, but hardly slow statement material.

The frequent appearance of slow (multi-second) COMMIT statements in your slow statement logs suggests there's enough load on your database that there's real contention for disk, and/or that checkpoints are stalling transactions.


First, you need to set log_min_messages = 'info' to allow Pg to complain about things like checkpoint frequency.

Now temporarily set log_checkpoints = on to record when checkpoints happen and how long they take. Most likely you'll find you need to tune checkpoint behaviour. Some information, albeit old, on that is here:

  http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm

Basically you might want to try increasing your checkpoint_completion_target and making the bgwriter more aggressive - assuming that your performance issues are in fact checkpoint related.
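To see where you stand before touching anything, a quick sketch:

-- current checkpoint/bgwriter settings; change them in postgresql.conf
-- and reload. Raising checkpoint_completion_target toward 0.9 spreads
-- the write phase over more of the checkpoint interval.
SHOW checkpoint_segments;
SHOW checkpoint_timeout;
SHOW checkpoint_completion_target;
SHOW bgwriter_delay;
SHOW bgwriter_lru_maxpages;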

It's also possible that they're just overall load, especially if you have lots and lots (hundreds) of connections to the database all trying to do work at once without any kind of admission control or pooling/queuing. In that case, introducing a connection pool like PgBouncer may help.

--
Craig Ringer

Re: how could select id=xx be so slow?

From
Yan Chunlu
Date:
I have logged one day of data and found that checkpoints happen rather frequently (details: https://gist.github.com/3088338). Not sure if it is normal, but the average checkpoint takes about 100~200 seconds, which seems related to my settings:

checkpoint_segments = 64
wal_keep_segments = 5000

I set checkpoint_segments to a very large value because otherwise the standby server can never keep up with the master. Should I lower that value?

Or is the slow query about something else? Thanks!


Re: how could select id=xx be so slow?

From
"Albe Laurenz"
Date:
Yan Chunlu wrote:
> I have logged one day of data and found that checkpoints happen rather
> frequently (details: https://gist.github.com/3088338). Not sure if it is
> normal, but the average checkpoint takes about 100~200 seconds, which
> seems related to my settings:
>
> checkpoint_segments = 64
> wal_keep_segments = 5000
>
> I set checkpoint_segments to a very large value because otherwise the
> standby server can never keep up with the master. Should I lower that
> value?

You mean, you set wal_keep_segments high for the standby, right?

wal_keep_segments has no impact on checkpoint frequency and intensity.

You are right that your checkpoint frequency is high. What is your value
of checkpoint_timeout?

You can increase the value of checkpoint_segments to decrease the
checkpoint frequency, but recovery will take longer then.

> Or is the slow query about something else? Thanks!

I guess the question is how saturated the I/O system is during
checkpoints. But even if it is very busy, I find it hard to believe
that such a trivial statement can take extremely long.

Yours,
Laurenz Albe

Re: how could select id=xx be so slow?

From
Yan Chunlu
Date:
Could that be because my system is really busy?
1, postgresql always has 400+ connections (dozens of python processes using a client pool)
2, the query peak is 50+/s
3, I do have some badly performing SQL executing periodically, needing 100+ seconds to complete. Could that bad SQL influence the others? Because when I execute those simple statements directly they are fast, but the slow query log shows they took too much time.





Re: how could select id=xx be so slow?

From
Ants Aasma
Date:
On Wed, Jul 11, 2012 at 9:24 AM, Yan Chunlu <springrider@gmail.com> wrote:
> I have logged one day of data and found that checkpoints happen rather
> frequently (details: https://gist.github.com/3088338). Not sure if it is
> normal, but the average checkpoint takes about 100~200 seconds, which
> seems related to my settings:
>
> checkpoint_segments = 64
> wal_keep_segments = 5000
>
> I set checkpoint_segments to a very large value because otherwise the
> standby server can never keep up with the master. Should I lower that
> value?
>
> Or is the slow query about something else? Thanks!

Some things to notice from the checkpoint log:
* All checkpoints are triggered by checkpoint_timeout, using up only a
couple of log files
* Checkpoints write out around 40MB of buffers
* The write-out period is spread out nicely, like it's supposed to be,
but the sync phase is occasionally taking a very long time (more than 2
minutes)

This looks like something (not necessarily the checkpoint sync itself)
is overloading the IO system. You might want to monitor the IO load
with iostat and correlate it with the checkpoints and slow queries to
find the culprit. It's also possible that something else is causing
the issues.
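On the database side you can sample the cumulative counters in pg_stat_bgwriter (available since 8.3) alongside iostat and diff successive readings, as a sketch:

-- who is writing buffers: checkpoints, the bgwriter, or backends
SELECT checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_clean, buffers_backend
FROM pg_stat_bgwriter;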

If the cause is checkpoints, just making them less frequent might make
the problem worse. I'm assuming you have 16GB+ of RAM because you have
4GB of shared_buffers. Just making checkpoint_timeout longer will
accumulate a larger number of dirty buffers that will clog up the IO
queues even worse. If you are on Linux, lowering
dirty_expire_centisecs or dirty_background_bytes might help to spread
the load out but will make overall throughput worse.

On the other hand, if the I/O overload is from queries (more likely,
because some checkpoints sync quickly), there are no easy tuning
answers. Making queries less I/O intensive is probably the best you can
do. From the tuning side, newer Linux kernels handle I/O fairness a
lot better, and you could also try tweaking the I/O scheduler to
achieve better throughput and avoid congestion, or at least provide
better latency for trivial queries. And of course it's always possible
to throw more hardware at the problem and upgrade the I/O subsystem.

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

Re: how could select id=xx be so slow?

From
Yan Chunlu
Date:
Huge thanks for the patient explanations. I think you are right: it is really related to the I/O.
I monitored the I/O using iostat -x and found that the %util column reaches 100% frequently. postgresql is the only service running on that machine, so I think either checkpoints or queries caused the problem.

And I agree that the checkpoints may not be the problem; I guess I need to tackle those damn queries.
Currently the data dir (pgsql/data/base) uses 111GB of disk space, and some tables have tens of millions of records. Could that cause the queries' heavy disk I/O? When should I split the data across other machines (aka sharding)?


And you are right: the machine has 16GB of memory and a commodity 500GB disk.
kernel: Linux adams 2.6.26-2-amd64 #1 SMP Mon Jun 13 16:29:33 UTC 2011 x86_64 GNU/Linux

By "newer kernels", which version do you mean?

And about those I/O intensive queries: I can only tell the time taken from the slow query log. Is there anything like "explain analyze" that shows specific information about I/O usage?
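Perhaps something like EXPLAIN (ANALYZE, BUFFERS) is what I'm after; as I understand it, that option exists from 9.0 on (a sketch):

-- BUFFERS adds per-node buffer usage (shared/local blocks hit and read,
-- temp blocks), which is the I/O detail plain ANALYZE lacks
EXPLAIN (ANALYZE, BUFFERS)
SELECT thing_id, KEY, value, kind
FROM diggcontent_data_message
WHERE thing_id = 3570882;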





Re: how could select id=xx be so slow?

From
Craig Ringer
Date:
On 07/11/2012 07:40 PM, Yan Chunlu wrote:
> Could that be because my system is really busy?
> 1, postgresql always has 400+ connections (dozens of python processes using a client pool)
> 2, the query peak is 50+/s
> 3, I do have some badly performing SQL executing periodically, needing 100+ seconds to complete. Could that bad SQL influence the others? Because when I execute those simple statements directly they are fast, but the slow query log shows they took too much time.

Oh, come on, these are the sorts of things you tell us when you ask your question, not days later after lots of back-and-forth discussion.

--
Craig Ringer

Re: how could select id=xx be so slow?

From
Yan Chunlu
Date:
Really sorry for the lack of information, but I did ask whether the slow queries could affect the simple ones:
 'so I wonder: could this simple select be innocent and just badly affected by other queries?'

I didn't mention the connection count because I didn't think my app was that busy, and I assumed the large number of connections was caused by the slow queries.

I was wrong: everything is connected, and too many factors can lead to this result. I am really sorry; I will tell you everything I know next time :)
I learned a lot during the back and forth!




Re: how could select id=xx be so slow?

From
Maciek Sakrejda
Date:
On Wed, Jul 11, 2012 at 5:47 PM, Yan Chunlu <springrider@gmail.com> wrote:
> I learned a lot during the back and forth!

Great to hear.

>> 1, postgresql always has 400+ connections (dozens of python processes using a client pool)

Note that Postgres does not deal well with a large number of
connections[1]: consider shrinking the size of the pool.

[1]: http://wiki.postgresql.org/wiki/Number_Of_Database_Connections

Re: how could select id=xx be so slow?

From
Craig Ringer
Date:
On 07/12/2012 08:47 AM, Yan Chunlu wrote:
> Really sorry for the lack of information

I shouldn't have grumped like that either, sorry about that.

> I didn't mention the connection count because I didn't think my app was that busy, and I assumed the large number of connections was caused by the slow queries.

Yep - assumptions are a killer like that.

Now you know to watch your system load with iostat, vmstat, top, etc and to monitor your overall load.

--
Craig Ringer

Re: how could select id=xx be so slow?

From
Yan Chunlu
Date:
After checking out the wiki page Maciek mentioned, it turns out that a heavy connection count also burdens the disk heavily.
Looks like I am in a vicious circle:
1, slow queries cause connections to block, so the clients request more connections.
2, more connections cause high disk I/O and make even the simplest query slow and blocked.


I guess I should optimize those queries first...



Re: how could select id=xx be so slow?

From
Craig Ringer
Date:
On 07/12/2012 01:10 PM, Yan Chunlu wrote:
> After checking out the wiki page Maciek mentioned, it turns out that a heavy connection count also burdens the disk heavily.
> Looks like I am in a vicious circle:
> 1, slow queries cause connections to block, so the clients request more connections.
> 2, more connections cause high disk I/O and make even the simplest query slow and blocked.

While true, you can often control this by making sure you don't completely overload your hardware, queuing queries instead of running them all at once.

You may still discover that your hardware can't cope with the workload in that your queues may just keep on getting deeper or time out. In that case, you certainly need to optimise your queries, tune your database, and/or get bigger hardware.

--
Craig Ringer

Re: how could select id=xx be so slow?

From
Yan Chunlu
Date:
Yes, the system seems overloaded. I am dealing with a simple "INSERT", but I'm not sure if it is normal that it took more time than the explain estimated:


explain analyze INSERT INTO vote_content ( thing1_id, thing2_id, name, date) VALUES (1,1, E'1', '2012-07-12T12:34:29.926863+00:00'::timestamptz)

                                       QUERY PLAN                                        
------------------------------------------------------------------------------------------
 Insert  (cost=0.00..0.01 rows=1 width=0) (actual time=79.610..79.610 rows=0 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.058..0.060 rows=1 loops=1)
 Total runtime: 79.656 ms

It is a table with 50 million rows, so I'm not sure if it is too large... I have attached the schema below:

  Column   |           Type           |                                     Modifiers                                      
-----------+--------------------------+------------------------------------------------------------------------------------
 rel_id    | bigint                   | not null default nextval('vote_content_rel_id_seq'::regclass)
 thing1_id | bigint                   | not null
 thing2_id | bigint                   | not null
 name      | character varying        | not null
 date      | timestamp with time zone | not null
Indexes:
    "vote_content_pkey" PRIMARY KEY, btree (rel_id)
    "vote_content_thing1_id_key" UNIQUE, btree (thing1_id, thing2_id, name)
    "idx_date_vote_content" btree (date)
    "idx_name_vote_content" btree (name)
    "idx_thing1_id_vote_content" btree (thing1_id)
    "idx_thing1_name_date_vote_content" btree (thing1_id, name, date)
    "idx_thing2_id_vote_content" btree (thing2_id)
    "idx_thing2_name_date_vote_content" btree (thing2_id, name, date)

Besides, it is not the rush hour, so the disk I/O is not the problem currently (I think):
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00    44.50    9.50   21.50    76.00   264.00    21.94     0.16    5.10   12.42    1.86   4.39  13.60
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00




Re: how could select id=xx be so slow?

From
Craig Ringer
Date:
On 07/12/2012 08:48 PM, Yan Chunlu wrote:


> explain analyze INSERT INTO vote_content ( thing1_id, thing2_id, name, date) VALUES (1,1, E'1', '2012-07-12T12:34:29.926863+00:00'::timestamptz)
>
>                                        QUERY PLAN
> ------------------------------------------------------------------------------------------
>  Insert  (cost=0.00..0.01 rows=1 width=0) (actual time=79.610..79.610 rows=0 loops=1)
>    ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.058..0.060 rows=1 loops=1)
>  Total runtime: 79.656 ms
>
> It is a table with 50 million rows, so I'm not sure if it is too large... I have attached the schema below:

You have eight indexes on that table according to the schema you showed. Three of them cover three columns. Those indexes are going to be expensive to update; frankly I'm amazed it's that FAST to update them when they're that big.

Use pg_size_pretty(pg_relation_size('index_name')) to get the index sizes and compare to the pg_relation_size of the table. It might be informative.

You may see some insert performance benefits with a non-100% fill factor on the indexes, but with possible performance costs to index scans.
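As a sketch, per-index sizes for the whole table in one go:

-- each index on vote_content, biggest first, plus the heap for reference
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'vote_content'
ORDER BY pg_relation_size(indexrelid) DESC;

SELECT pg_size_pretty(pg_relation_size('vote_content')) AS table_size;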

--
Craig Ringer

Re: how could select id=xx be so slow?

From
Ants Aasma
Date:
On Thu, Jul 12, 2012 at 3:48 PM, Yan Chunlu <springrider@gmail.com> wrote:
> Yes, the system seems overloaded. I am dealing with a simple "INSERT", but
> I'm not sure if it is normal that it took more time than the explain estimated:

The estimated cost is in arbitrary units; its purpose is to compare
different execution plans, not to estimate the time taken. So it's
completely normal that it doesn't match the actual time taken.

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

Re: how could select id=xx be so slow?

From
Jeff Janes
Date:
On Thu, Jul 12, 2012 at 9:07 AM, Ants Aasma <ants@cybertec.at> wrote:
> On Thu, Jul 12, 2012 at 3:48 PM, Yan Chunlu <springrider@gmail.com> wrote:
>> Yes, the system seems overloaded. I am dealing with a simple "INSERT", but
>> I'm not sure if it is normal that it took more time than the explain estimated:
>
> The estimated cost is in arbitrary units; its purpose is to compare
> different execution plans, not to estimate the time taken. So it's
> completely normal that it doesn't match the actual time taken.

Right.  And to make explicit what you implied: when there is only one way
to do something (like insert a row, or do maintenance on an index), the
planner often doesn't even attempt to cost it at all, as there is no
choice.  So it is not just a matter of units.

Cheers,

Jeff

Re: how could select id=xx be so slow?

From
Yan Chunlu
Date:
Got it, thanks! Without your help I really had no idea what should be fast and what is supposed to be slower.

I also found that a "select" can involve a lot of writes:

iotop shows:

 2789 be/4 postgres      0.00 B     57.34 M  0.00 %  0.00 % postgres: goov conta 192.168.1.129(27300) SELECT

I knew that a select could cause writes, but not at this magnitude...





Re: how could select id=xx be so slow?

From
Yan Chunlu
Date:
The transaction part is weird. I have filtered BEGIN and COMMIT from a one-day log using:
cat /usr/local/pgsql/data/pg_log/Saturday.log  |grep -E "BEGIN|COMMIT" >trans.txt

and pasted it to a gist (only three BEGINs and many COMMITs):

I didn't set log_min_duration_statement in postgresql.conf, but execute
dbapi_con.cursor().execute("SET log_min_duration_statement to 30")
for every connection.
The system is in production under relatively heavy load; I thought it would be better not to "reload" the db too frequently, and with "SET log_min_duration_statement to 30" I can turn the logging off from within my application.




Re: how could select id=xx be so slow?

From
Yan Chunlu
Date:
Great thanks for the help and explanations. I will start logging the information you mentioned and do some analysis.


