Thread: how could a select by id be so slow?
I grabbed one day's slow query log and analyzed it with pgFouine. To my surprise, the slowest query is just a simple select statement:
select diggcontent_data_message.thing_id, diggcontent_data_message.KEY, diggcontent_data_message.value, diggcontent_data_message.kind FROM diggcontent_data_message WHERE diggcontent_data_message.thing_id = 3570882;
where thing_id is the primary key. Guess how long it takes?
754.61 seconds!!
I ran EXPLAIN ANALYZE on it and below is the result, which is very fast:
explain analyze select diggcontent_data_message.thing_id, diggcontent_data_message.KEY, diggcontent_data_message.value, diggcontent_data_message.kind FROM diggcontent_data_message WHERE diggcontent_data_message.thing_id = 3570882;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_thing_id_diggcontent_data_message on diggcontent_data_message (cost=0.00..15.34 rows=32 width=51) (actual time=0.080..0.096 rows=8 loops=1)
Index Cond: (thing_id = 3570882)
Total runtime: 0.115 ms
(3 rows)
So I wonder: could this simple select itself be innocent and just be badly affected by other queries? How can I find the queries that are really slowing down the database?
thanks!
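[One way to catch this in the act, not part of the original post, is to look at pg_stat_activity while the select is stuck. A sketch only; the column names below are for the 8.4/9.0/9.1-era catalogs (9.2 renamed procpid/current_query to pid/query):

```sql
-- Long-running and lock-waiting sessions, longest first
SELECT procpid,
       waiting,                         -- true if the session is blocked on a lock
       now() - query_start AS runtime,
       current_query
FROM pg_stat_activity
WHERE current_query <> '<IDLE>'
ORDER BY runtime DESC;
```

If the stuck select shows waiting = t, pg_locks will tell you which lock it is queued behind.]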
On Thu, Jul 5, 2012 at 11:17 PM, Yan Chunlu <springrider@gmail.com> wrote:
> where thing_id is the primary key, guess how long it takes?
>
> 754.61 seconds!!

Is it possible that the size of the tuple is enormous? Because one area where I've noticed EXPLAIN ANALYZE blows away normal performance is when a lot of the work would be in reassembling, decompressing (collectively: de-TOASTing) and sending the data.

Even then, that time seems excessive... but something to think about.

--
fdr
On 07/06/2012 02:17 PM, Yan Chunlu wrote:
> so I wonder could this simple select is innocent and affected badly by
> other queries? how could I find those queries that really slow down
> the database?

It might not be other queries. Your query could be taking that long because it was blocked by a lock during maintenance work (say, an ALTER TABLE). It's also quite possible that it was held up by a slow checkpoint; check your logs to see if there are warnings about checkpoint activity.

--
Craig Ringer
Yan Chunlu wrote:
> where thing_id is the primary key, guess how long it takes?
>
> 754.61 seconds!!
>
> I tried explain analyze it and below is the result, which is very fast:
> [...]
> Total runtime: 0.115 ms

Are these by any chance the aggregated costs in pgFouine? Could it be that the statement just ran very often and used that time in total?

Other than that, it could have been blocked by something that takes an exclusive lock on the table. There are no ON SELECT DO INSTEAD rules or similar things on the table, right?

Yours,
Laurenz Albe
Thanks for all the help. I checked the possibilities and found that:

1. the size of the tuple is small;
2. I checked the log manually and the query really did cost that much time; the number is not aggregated.

The value of "log_min_messages" in postgresql.conf was 'error'; I have changed it to 'warning'. So far I have not received any warnings, still waiting.

Besides that, I do see some COMMITs which are relatively slow, for example:

2012-07-08 00:00:29 CST [19367]: [131-1] LOG: duration: 375.851 ms statement: COMMIT
2012-07-08 00:00:30 CST [19367]: [132-1] LOG: duration: 327.964 ms statement: COMMIT

but there is only one "BEGIN" in the same one-day log file. Did that influence the query time too?
On 07/09/2012 05:20 PM, Yan Chunlu wrote:
> the value of "log_min_messages" in postgresql.conf is error, I have
> changed it to "warning", so far does not received any warning, still
> waiting.

When trying to track down performance issues, increasing logging to at least 'info' would seem to be sensible.

I suggest increasing your logging and enabling the auto_explain module so it logs slow queries. If you can afford the substantial performance hit you could enable its analyze mode to get details on why.

> beside I do see some COMMIT which is relatively slow for example:
> 2012-07-08 00:00:29 CST [19367]: [131-1] LOG: duration: 375.851 ms statement: COMMIT
> 2012-07-08 00:00:30 CST [19367]: [132-1] LOG: duration: 327.964 ms statement: COMMIT

That certainly is slow. Again, I suspect checkpoint activity could be at fault. You may need to tune to spread your checkpoints out and use more aggressive bgwriter settings. See the wiki for performance tuning info.

> but only one "BEGIN" in the same one day log file, did that influence
> the query time too?

Only one BEGIN in the whole day? Or do you mean "only one BEGIN slow enough to trigger slow query logging"?

Do you have a "log_min_duration_statement" directive set in your postgresql.conf?

--
Craig Ringer
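[For reference, the auto_explain module mentioned above is enabled from postgresql.conf; a sketch only, and the threshold value here is illustrative, not a recommendation:

```
# postgresql.conf -- illustrative auto_explain setup (requires a server restart)
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '500ms'   # log the plan of any statement slower than this
#auto_explain.log_analyze = on            # EXPLAIN ANALYZE detail; substantial overhead
```
]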
On 07/10/2012 10:25 AM, Yan Chunlu wrote:
> I didn't set log_min_duration_statement in the postgresql.conf, but executed
> dbapi_con.cursor().execute("SET log_min_duration_statement to 30")
> for every connection.
OK, same effect: You're only logging slow statements.
It's not at all surprising that BEGIN doesn't appear when a log_min_duration_statement is set. It's an incredibly fast operation. What's amazing is that it appears even once - that means your database must be in serious performance trouble, as BEGIN should take tenths of a millisecond on an unloaded system. For example my quick test here:
LOG: statement: BEGIN;
LOG: duration: 0.193 ms
... which is actually a lot slower than I expected, but hardly slow statement material.
The frequent appearance of slow (multi-second) COMMIT statements in your slow statement logs suggests there's enough load on your database that there's real contention for disk, and/or that checkpoints are stalling transactions.
First, you need to set log_min_messages = 'info' to allow Pg to complain about things like checkpoint frequency.
Now temporarily set log_checkpoints = on to record when checkpoints happen and how long they take. Most likely you'll find you need to tune checkpoint behaviour. Some information, albeit old, on that is here:
http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm
Basically you might want to try increasing your checkpoint_completion_target and making the bgwriter more aggressive - assuming that your performance issues are in fact checkpoint related.
It's also possible that they're just overall load, especially if you have lots and lots (hundreds) of connections to the database all trying to do work at once without any kind of admission control or pooling/queuing. In that case, introducing a connection pool like PgBouncer may help.
--
Craig Ringer
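[The checkpoint and bgwriter knobs Craig mentions live in postgresql.conf; as a rough sketch, assuming the problems really are checkpoint-related (the values are illustrative starting points, not recommendations):

```
# postgresql.conf -- illustrative starting points only
log_min_messages = info              # let Pg complain about checkpoint frequency
log_checkpoints = on                 # record when checkpoints happen and how long they take
checkpoint_completion_target = 0.9   # spread checkpoint writes over more of the interval
bgwriter_delay = 100ms               # run the bgwriter more often (default 200ms)
bgwriter_lru_maxpages = 500          # let it write more buffers per round (default 100)
```
]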
I have logged one day of data and found that checkpoints happen rather frequently (detail: https://gist.github.com/3088338). Not sure if it is normal, but the average checkpoint takes about 100~200 seconds; it seems related to my settings:

checkpoint_segments = 64
wal_keep_segments = 5000

I set checkpoint_segments to a very large value because otherwise the slave server can never keep up with the master. Should I lower that value?

Or is the slow query caused by something else? Thanks!
Yan Chunlu wrote:
> I set checkpoint_segments as a very large value which is because otherwise
> the slave server always can not follow the master, should I lower that value?

You mean, you set wal_keep_segments high for the standby, right?

wal_keep_segments has no impact on checkpoint frequency and intensity.

You are right that your checkpoint frequency is high. What is your value of checkpoint_timeout? You can increase the value of checkpoint_segments to decrease the checkpoint frequency, but then recovery will take longer.

> or the slow query is about something else? thanks!

I guess the question is how saturated the I/O system is during checkpoints. But even if it is very busy, I find it hard to believe that such a trivial statement can take extremely long.

Yours,
Laurenz Albe
Could that be because my system is really busy?

1. postgresql always has 400+ connections (dozens of python processes using a client pool);
2. the query peak is 50+/s;
3. I do have some badly performing SQL executing periodically that takes 100+ seconds to complete. Could that bad SQL influence the other queries? Because when I execute those simple statements directly they are fast, but the slow query log shows they took too much time.
On Wed, Jul 11, 2012 at 9:24 AM, Yan Chunlu <springrider@gmail.com> wrote:
> I have logged one day data and found the checkpoint is rather
> frequently(detail: https://gist.github.com/3088338).
> [...]
> or the slow query is about something else? thanks!

Some things to notice from the checkpoints log:

* All checkpoints are triggered by checkpoint_timeout, using up only a couple of log files
* Checkpoints write out around 40MB of buffers
* The write-out period is spread out nicely like it's supposed to be, but the sync phase is occasionally taking a very long time (more than 2 minutes)

This looks like something (not necessarily the checkpoint sync itself) is overloading the IO system. You might want to monitor the IO load with iostat and correlate it with the checkpoints and slow queries to find the culprit. It's also possible that something else is causing the issues.

If the cause is checkpoints, just making them less frequent might make the problem worse. I'm assuming you have 16GB+ of RAM because you have 4GB of shared_buffers. Just making checkpoint_timeout longer will accumulate a larger number of dirty buffers that will clog up the IO queues even worse. If you are on Linux, lowering dirty_expire_centisecs or dirty_background_bytes might help to spread the load out, but will make overall throughput worse.

On the other hand, if the I/O overload is from queries (more likely, because some checkpoints sync quickly) there are no easy tuning answers. Making queries less IO intensive is probably the best you can do. From the tuning side, newer Linux kernels handle I/O fairness a lot better, and you could also try tweaking the I/O scheduler to achieve better throughput to avoid congestion, or at least provide better latency for trivial queries. And of course it's always possible to throw more hardware at the problem and upgrade the I/O subsystem.

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
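[The Linux writeback knobs Ants refers to are kernel sysctls; a sketch of where they would go, with purely illustrative values (note that vm.dirty_background_bytes only exists on 2.6.29+ kernels):

```
# /etc/sysctl.conf -- illustrative values only; apply with `sysctl -p`
# Start background writeback after ~64MB of dirty pages instead of a
# percentage of RAM
vm.dirty_background_bytes = 67108864
# Consider dirty pages old (and flush them) after 15s instead of the 30s default
vm.dirty_expire_centisecs = 1500
```
]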
Huge thanks for the patient explanations! I think you are right, it is really related to the IO.

I monitored the IO using iostat -x and found the utilization frequently reaches 100%. PostgreSQL is the only service running on that machine, so I think it is either checkpoints or the queries causing the problem.

And I agree that the checkpoints may not be the problem; I guess I need to tackle those damn queries.

Currently the data dir (pgsql/data/base) uses 111GB of disk space, and some tables have tens of millions of records. Could that cause the heavy disk IO from queries? When should I split the data across other machines (aka sharding)?

And you are right that the machine has 16GB memory and a commodity 500GB disk.

kernel: Linux adams 2.6.26-2-amd64 #1 SMP Mon Jun 13 16:29:33 UTC 2011 x86_64 GNU/Linux

By "newer kernel", which version do you mean?

And about those IO-intensive queries: I can only tell the time used from the slow query log. Is there anything like "explain analyze" that shows specific information about IO usage?
On 07/11/2012 07:40 PM, Yan Chunlu wrote:
> could that because of my system is really busy?
> 1, postgresql always have 400+ connections(dozens of python process using client pool)
> 2, the query peak is 50+/s
> 3, I do have some bad performance sql executing periodically, need 100+ second to complete.

Oh, come on, these are the sorts of things you tell us when you ask your question, not days later after lots of back-and-forth discussion.

--
Craig Ringer
Really sorry for the lack of information, but I did ask whether the slow queries could affect the simple ones:

"so I wonder could this simple select is innocent and affected badly by other queries?"

I didn't mention the connection count because I don't think my app is that busy, and the large number of connections was caused by the slow queries.

I was wrong; everything is connected, and too many factors could lead to this result. I am really sorry, I will tell you everything I know the next time :)

I learnt a lot during the back and forth!
On Wed, Jul 11, 2012 at 5:47 PM, Yan Chunlu <springrider@gmail.com> wrote:
> I learnt a lot during the back and forth!

Great to hear.

> 1, postgresql always have 400+ connections(dozens of python process using client pool)

Note that Postgres does not deal well with a large number of connections[1]: consider shrinking the size of the pool.

[1]: http://wiki.postgresql.org/wiki/Number_Of_Database_Connections
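[If a connection pool in front of Postgres is the route taken, a minimal PgBouncer setup might look like the sketch below; the database name, auth file and pool sizes are placeholders, not recommendations:

```
; pgbouncer.ini -- illustrative only
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_port = 6432
auth_type = md5
auth_file = users.txt
pool_mode = transaction
max_client_conn = 500     ; the app can keep its 400+ client connections...
default_pool_size = 20    ; ...while Postgres only ever sees a handful
```
]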
On 07/12/2012 08:47 AM, Yan Chunlu wrote:
> Really sorry for the lack of information

I shouldn't have grumped like that either, sorry about that.

> I didn't mention the connections number because I don't think my app is that
> busy, and the large number connections was caused by slow queries.

Yep - assumptions are a killer like that.

Now you know to watch your system load with iostat, vmstat, top, etc. and to monitor your overall load.
--
Craig Ringer
After checking out the wiki page Maciek mentioned, it turns out that a heavy connection count also burdens the disk badly.

Looks like I am in a vicious circle:
1. slow queries cause connections to block, so the clients request more connections;
2. more connections cause high disk IO and make even the simplest query slow and blocked.

I guess I should optimize those queries first...
On 07/12/2012 01:10 PM, Yan Chunlu wrote:
> after check out the wiki page Maciek mentioned, turns out that heavy
> connection also burden the disk hardly.
> looks like I am in the vicious circle:
> 1, slow query cause connection blocked so the client request more connection.
> 2, more connection cause high disk io and make even the simplest query slow and block.
While true, you can often control this by making sure you don't completely overload your hardware, queuing queries instead of running them all at once.
You may still discover that your hardware can't cope with the workload in that your queues may just keep on getting deeper or time out. In that case, you certainly need to optimise your queries, tune your database, and/or get bigger hardware.
--
Craig Ringer
Yes, the system seems overloaded. I am dealing with a simple "INSERT", but I'm not sure whether it is normal that it took more time than the explain estimated:
explain analyze INSERT INTO vote_content ( thing1_id, thing2_id, name, date) VALUES (1,1, E'1', '2012-07-12T12:34:29.926863+00:00'::timestamptz)
QUERY PLAN
------------------------------------------------------------------------------------------
Insert (cost=0.00..0.01 rows=1 width=0) (actual time=79.610..79.610 rows=0 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.058..0.060 rows=1 loops=1)
Total runtime: 79.656 ms
It is a table with 50 million rows, so I'm not sure if that is too large... I have attached the schema below:
Column | Type | Modifiers
-----------+--------------------------+------------------------------------------------------------------------------------
rel_id | bigint | not null default nextval('vote_content_rel_id_seq'::regclass)
thing1_id | bigint | not null
thing2_id | bigint | not null
name | character varying | not null
date | timestamp with time zone | not null
Indexes:
"vote_content_pkey" PRIMARY KEY, btree (rel_id)
"vote_content_thing1_id_key" UNIQUE, btree (thing1_id, thing2_id, name)
"idx_date_vote_content" btree (date)
"idx_name_vote_content" btree (name)
"idx_thing1_id_vote_content" btree (thing1_id)
"idx_thing1_name_date_vote_content" btree (thing1_id, name, date)
"idx_thing2_id_vote_content" btree (thing2_id)
"idx_thing2_name_date_vote_content" btree (thing2_id, name, date)
Besides, it is not rush hour, so the disk IO is not the problem currently (I think):
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 44.50 9.50 21.50 76.00 264.00 21.94 0.16 5.10 12.42 1.86 4.39 13.60
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
On 07/12/2012 08:48 PM, Yan Chunlu wrote:
> explain analyze INSERT INTO vote_content ( thing1_id, thing2_id, name, date)
> VALUES (1,1, E'1', '2012-07-12T12:34:29.926863+00:00'::timestamptz)
> [...]
> Total runtime: 79.656 ms
> it is a table with 50 million rows, so not sure if it is too large...
You have eight indexes on that table according to the schema you showed. Three of them cover three columns. Those indexes are going to be expensive to update; frankly I'm amazed it's that FAST to update them when they're that big.
Use pg_size_pretty(pg_relation_size('index_name')) to get the index sizes and compare to the pg_relation_size of the table. It might be informative.
You may see some insert performance benefits with a non-100% fill factor on the indexes, but with possible performance costs to index scans.
--
Craig Ringer
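[Craig's pg_size_pretty() suggestion can be run over all of the table's indexes in one go; a sketch, assuming vote_content is visible on the current search_path:

```sql
-- Compare each index's size against the table itself
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       pg_size_pretty(pg_relation_size(relid))      AS table_size
FROM pg_stat_user_indexes
WHERE relname = 'vote_content';
```
]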
On Thu, Jul 12, 2012 at 3:48 PM, Yan Chunlu <springrider@gmail.com> wrote:
> yes the system seems overloaded, I am dealing with a simple "INSERT" but not
> sure if it is normal that it took more time than the explain estimated:

The estimated cost is in arbitrary units; its purpose is to compare different execution plans, not to estimate the time taken. So it's completely normal that it doesn't match the actual time taken.

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
On Thu, Jul 12, 2012 at 9:07 AM, Ants Aasma <ants@cybertec.at> wrote:
> The estimated cost is in arbitrary units, its purpose is to compare
> different execution plans, not estimate time taken. So it's completely
> normal that it doesn't match actual time taken.

Right. And to make explicit what you implied: when there is only one way to do something (like insert a row, or do maintenance on an index), the planner often doesn't even attempt to cost it at all, as there is no choice. So it is not just a matter of units.

Cheers,
Jeff
Got it, thanks! Without your help I really would have no idea what should be fast and what is supposed to be slower.

I also found that a "select" can involve a lot of writes. iotop shows:

2789 be/4 postgres 0.00 B 57.34 M 0.00 % 0.00 % postgres: goov conta 192.168.1.129(27300) SELECT

I knew that a select could cause writes, but not at this magnitude...
The transaction part is weird. I have filtered BEGIN and COMMIT from a one-day log using:

grep -E "BEGIN|COMMIT" /usr/local/pgsql/data/pg_log/Saturday.log > trans.txt

and pasted it to a gist (only three BEGIN and many COMMIT):

I didn't set log_min_duration_statement in the postgresql.conf, but executed

dbapi_con.cursor().execute("SET log_min_duration_statement to 30")

for every connection.

The system is in production and under relatively heavy load; I thought it would be better not to "reload" the db too frequently. With "SET log_min_duration_statement to 30" I can turn the logging off from within my application.
Great thanks for the help and explanation. I will start logging the information you mentioned and do some analysis.