Thread: response time is very long in PG9.5.5 using psql or jdbc
Hi all,
In our production database we have met a problem: when we log in to the database using psql or JDBC, the response time is very long. The SQL is simple, just getting the row count of a small table. Actually, the execution time is not long at all when we look at it with EXPLAIN ANALYZE. So where has the time gone?
[postgres@cnlf081174:eits:5521 ~/tmp]$ psql -f t.sql eits
Timing is on.
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=175.79..175.80 rows=1 width=0) (actual time=1.852..1.852 rows=1 loops=1)
Buffers: shared hit=64
-> Seq Scan on tbl1_stats_prop_rt (cost=0.00..153.43 rows=8943 width=0) (actual time=0.013..1.055 rows=8871 loops=1)
Buffers: shared hit=64
Planning time: 0.646 ms
Execution time: 1.926 ms
(6 rows)
Time: 19479.081 ms
[postgres@cnlf081174:eits:5521 ~/tmp]$
[postgres@cnlf081174:eits:5521 ~/tmp]$
[postgres@cnlf081174:eits:5521 ~/tmp]$ psql -f t.sql eits
Timing is on.
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=175.79..175.80 rows=1 width=0) (actual time=1.851..1.851 rows=1 loops=1)
Buffers: shared hit=64
-> Seq Scan on tbl1_stats_prop_rt (cost=0.00..153.43 rows=8943 width=0) (actual time=0.012..1.061 rows=8871 loops=1)
Buffers: shared hit=64
Planning time: 0.459 ms
Execution time: 1.929 ms
(6 rows)
Time: 2400.550 ms
[postgres@cnlf081174:eits:5521 ~/tmp]$ psql -f t.sql eits
Timing is on.
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=175.79..175.80 rows=1 width=0) (actual time=1.790..1.791 rows=1 loops=1)
Buffers: shared hit=64
-> Seq Scan on tbl1_stats_prop_rt (cost=0.00..153.43 rows=8943 width=0) (actual time=0.010..1.045 rows=8871 loops=1)
Buffers: shared hit=64
Planning time: 0.415 ms
Execution time: 1.858 ms
(6 rows)
Time: 1991.484 ms
[postgres@cnlf081174:eits:5521 ~/tmp]$
[postgres@cnlf081174:eits:5521 ~/tmp]$
Then the next test is below. We log in to the database once; the first query is slow, and the next ones are faster. I know this is about caching, so maybe this part is normal.
[postgres@cnlf081174:eits:5521 ~/tmp]$ psql eits
Timing is on.
psql (9.5.5)
Type "help" for help.
[postgres:5521@eits] [02-12.17:49:32]=# \i t.sql
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=175.79..175.80 rows=1 width=0) (actual time=1.899..1.899 rows=1 loops=1)
Buffers: shared hit=64
-> Seq Scan on tbl1_stats_prop_rt (cost=0.00..153.43 rows=8943 width=0) (actual time=0.013..1.084 rows=8871 loops=1)
Buffers: shared hit=64
Planning time: 0.492 ms
Execution time: 1.977 ms
(6 rows)
Time: 7803.830 ms
[postgres:5521@eits] [02-12.17:49:43]=# \i t.sql
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=175.79..175.80 rows=1 width=0) (actual time=3.194..3.194 rows=1 loops=1)
Buffers: shared hit=64
-> Seq Scan on tbl1_stats_prop_rt (cost=0.00..153.43 rows=8943 width=0) (actual time=0.014..1.658 rows=8871 loops=1)
Buffers: shared hit=64
Planning time: 0.091 ms
Execution time: 3.237 ms
(6 rows)
Time: 3.748 ms
[postgres:5521@eits] [02-12.17:49:45]=# \i t.sql
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=175.79..175.80 rows=1 width=0) (actual time=3.182..3.182 rows=1 loops=1)
Buffers: shared hit=64
-> Seq Scan on tbl1_stats_prop_rt (cost=0.00..153.43 rows=8943 width=0) (actual time=0.009..1.666 rows=8871 loops=1)
Buffers: shared hit=64
Planning time: 0.095 ms
Execution time: 3.227 ms
(6 rows)
Time: 3.765 ms
Below is the table structure:
[02-12.18:11:50]=# \d+ tbl1_stats_prop_rt
Table "public.tbl1_stats_prop_rt"
Column | Type | Modifiers | Storage | Stats target | Description
-----------+------------------------+----------------------------------------------------------------------+----------+--------------+-------------
prop_id | bigint | not null default nextval('tbl1_stats_prop_rt_prop_id_seq'::regclass) | plain | |
prop_name | character varying(50) | default NULL::character varying | extended | |
prop_val | character varying(250) | default NULL::character varying | extended | |
Indexes:
"tbl1_stats_prop_rt_pkey" PRIMARY KEY, btree (prop_id)
"tbl1_stats_prop_rt_prop_name_val" btree (prop_name, prop_val)
石勇虎 <SHIYONGHU651@pingan.com.cn> writes:
> In our production database we have met a problem: when we log in to the
> database using psql or JDBC, the response time is very long.

I'm going to guess that this has something to do with very slow population
of the backend's internal caches during the first query of a session.
That suggests bloated system catalogs.  Do you have an unusually large
number of database objects (e.g., millions of tables or functions)?  Or is
it possible that some transaction(s) transiently created a lot of objects,
and vacuuming hasn't been able to shrink the catalogs back down?

			regards, tom lane
Yes, we have more than 500 thousand objects, and the total size of the
database is almost 10 TB. Just as you said, we may need to reduce the
number of objects, or do you have any better solution?

And I also have a question: is the backend's internal catalog cache shared
with other users or sessions? Would pgbouncer be useful?

[postgres:5521@eits] [02-13.11:42:25]=# select count(1) from pg_class;
 count
--------
 537384
(1 row)
石勇虎 <SHIYONGHU651@pingan.com.cn> writes:
> Yes, we have more than 500 thousand objects, and the total size of the
> database is almost 10 TB. Just as you said, we may need to reduce the
> number of objects, or do you have any better solution?

Hmph.  I tried creating 500000 tables in a test database, and couldn't
detect any obvious performance problem in session startup.  So there's
something very odd about your results.  You might try looking at the sizes
of the system catalogs, e.g. like

    select pg_size_pretty(pg_total_relation_size('pg_attribute'));

(In my test database, pg_class is about 80MB and pg_attribute about
800MB.)

> And I also have a question: is the backend's internal catalog cache
> shared with other users or sessions? Would pgbouncer be useful?

pgbouncer or some other connection pooler would help, yes.  But I don't
think the underlying performance ought to be this bad to begin with.

			regards, tom lane
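
To survey the catalog sizes all at once instead of one relation at a time,
a query along these lines should work on 9.5 (my own sketch, not something
suggested verbatim in the thread):

    -- ten largest system catalogs, including their indexes and toast tables
    SELECT c.relname,
           pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE n.nspname = 'pg_catalog'
      AND c.relkind = 'r'
    ORDER BY pg_total_relation_size(c.oid) DESC
    LIMIT 10;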
Hi,

On 2018-02-13 13:58:00 -0500, Tom Lane wrote:
> 石勇虎 <SHIYONGHU651@pingan.com.cn> writes:
> > Yes, we have more than 500 thousand objects, and the total size of the
> > database is almost 10 TB. Just as you said, we may need to reduce the
> > number of objects, or do you have any better solution?

Might also be worth knowing where the time is spent.  Any chance you could
use a profiler like perf to figure out exactly where the time is going?

A simple thing which might already give some hints is to show us parser &
planner stats:

    SET log_parser_stats = true;
    SET log_planner_stats = true;
    SET client_min_messages = log;
    \i t.sql

That should print things like

    LOG:  00000: PARSER STATISTICS
    DETAIL:  ! system usage stats:
    !       0.000043 s user, 0.000001 s system, 0.000041 s elapsed
    !       [3.086537 s user, 0.061010 s system total]
    !       100592 kB max resident size
    !       0/0 [183368/0] filesystem blocks in/out
    !       0/0 [651/3696] page faults/reclaims, 0 [0] swaps
    !       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
    !       0/0 [818/10] voluntary/involuntary context switches
    LOCATION:  ShowUsage, postgres.c:4545

Just out of paranoia, could you share the result of:

    SHOW ignore_system_indexes;

> Hmph.  I tried creating 500000 tables in a test database, and couldn't
> detect any obvious performance problem in session startup.

Did you try 9.5 or master?  We've had a few improvements... but I think
the majority are only going to matter when a lot of relations are touched
in the same session...

Greetings,

Andres Freund
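
For what it's worth, one concrete way to apply perf here (my suggestion of
an invocation; Andres did not spell one out) is to grab the backend's pid
from the slow session and attach to it just before running the first
query:

    -- in the psql session, before running t.sql
    SELECT pg_backend_pid();

    # then, from a shell on the server, while the slow query runs
    perf record -g -p <backend pid> -- sleep 30
    perf report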
On Tue, 13 Feb 2018 13:58:00 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hmph.  I tried creating 500000 tables in a test database, and couldn't
> detect any obvious performance problem in session startup.  So there's
> something very odd about your results.  You might try looking at the
> sizes of the system catalogs, e.g. like
>     select pg_size_pretty(pg_total_relation_size('pg_attribute'));
> (In my test database, pg_class is about 80MB and pg_attribute about
> 800MB.)

I see this problem fairly frequently.  Typically the problem catalog is
pg_attribute, as it has the most rows.  However, the problem really arises
when the catalogs become bloated.  Once the total size of the catalogs
that get read at startup approaches the size of shared buffers, and
especially if several processes start at the same time, it becomes quite
noticeable.

Catalog bloat happens with large numbers of objects because autovacuum
becomes ineffective when there is:

- catalog churn leading to dead rows;
- more than about 50,000 tables in one database, which causes a large
  stats file;
- stats_temp_directory not set to point at tmpfs, so autovacuum stats
  requests flap the disks (locally we also patch autovacuum to avoid
  this);
- an inflated reltuples estimate that prevents vacuuming; see the patches
  for bugs 14863 and 15005.

Example from one of a client's production instances:

    # analyze verbose pg_attribute;
    INFO:  analyzing "pg_catalog.pg_attribute"
    INFO:  "pg_attribute": scanned 30000 of 24519424 pages, containing
    6475 live rows and 83 dead rows; 6475 rows in sample, 800983035
    estimated total rows.

The reltuples estimate gets scaled up by the bloat, and once it gets large
enough autovacuum won't look at the table anymore, as updated rows <
0.2 * reltuples.  I have a fix in the next commitfest for the reltuples
estimate which, once it is reviewed, I'd like to recommend for
backpatching.

-dg

--
David Gould                                   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
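
As a rough cross-check of whether autovacuum is keeping up with the
catalogs, the cumulative stats can be compared against reltuples; this is
my sketch, not something from David's mail:

    SELECT relname, n_live_tup, n_dead_tup,
           last_autovacuum, last_autoanalyze
    FROM pg_stat_sys_tables
    WHERE schemaname = 'pg_catalog'
    ORDER BY n_dead_tup DESC
    LIMIT 10;

And the tmpfs suggestion above corresponds to a postgresql.conf setting
along these lines (the path is just an example and must point at an
actual tmpfs mount):

    stats_temp_directory = '/run/pg_stats_tmp'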
Hi,

On 2018-02-13 13:24:47 -0800, David Gould wrote:
> I see this problem fairly frequently.  Typically the problem catalog is
> pg_attribute, as it has the most rows.  However, the problem really
> arises when the catalogs become bloated.  Once the total size of the
> catalogs that get read at startup approaches the size of shared buffers,
> and especially if several processes start at the same time, it becomes
> quite noticeable.

I can obviously see problems with a bloated catalog, but how does that
cause massive slowdowns in *individual* queries as we see here?  The OP's
problem isn't the connection establishment performance afaict, but the
first query using specific pieces of catalog data.  And that should be
ok-ish performance-wise, even if there's bloat.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> I can obviously see problems with a bloated catalog, but how does that
> cause massive slowdowns in *individual* queries as we see here?  The
> OP's problem isn't the connection establishment performance afaict, but
> the first query using specific pieces of catalog data.  And that should
> be ok-ish performance-wise, even if there's bloat.

Yeah, I'm confused about it too.  I'm wondering if there's some
as-yet-unrecognized effect like the ones discussed here:

https://www.postgresql.org/message-id/flat/4F9AC452.3050102%40pgexperts.com

and here:

https://www.postgresql.org/message-id/flat/20120524173325.GA5533%40tinybird.home

Now, those issues seemed to be associated with needing to rebuild the
relcache init file, so you wouldn't expect them to repeat on every
connection ... but maybe the OP has something going on that causes the
init file to get invalidated constantly.  That'd have to be on top of some
other massive performance problem, but at least you could see how catalog
bloat could possibly lead to this type of symptom, seeing that an init
file rebuild requires some catalog seqscans.

			regards, tom lane
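
One crude way to test the constant-invalidation theory (my suggestion, not
something Tom spelled out) is to watch the modification times of the
relcache init files while new sessions connect; if they keep changing, the
init file is being rebuilt over and over:

    ls -l --time-style=full-iso $PGDATA/global/pg_internal.init \
          $PGDATA/base/*/pg_internal.init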
On 2018-02-13 17:06:36 -0500, Tom Lane wrote:
> Now, those issues seemed to be associated with needing to rebuild the
> relcache init file, so you wouldn't expect them to repeat on every
> connection ... but maybe the OP has something going on that causes the
> init file to get invalidated constantly.  That'd have to be on top of
> some other massive performance problem, but at least you could see how
> catalog bloat could possibly lead to this type of symptom, seeing that
> an init file rebuild requires some catalog seqscans.

One of the OP's examples shows the connection establishment separate from
the slowdown, however.  Shouldn't that preclude relcache init file
rebuilds being involved?

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> One of the OP's examples shows the connection establishment separate
> from the slowdown, however.  Shouldn't that preclude relcache init file
> rebuilds being involved?

Hmm ... I was thinking that psql might be able to produce a prompt before
the connected backend was done initializing, but after a bit more poking
into the code that seems unlikely.  Still, it might be interesting to
verify whether the first command is fast if it's one that requires no
catalog access.  Maybe "set enable_hashjoin = 1" or suchlike (being
careful not to invoke tab completion to type it ;-)).

			regards, tom lane
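
Spelled out, that test might look like this in a fresh psql session (a
sketch; if the SET comes back instantly but the first SELECT takes
seconds, the time is going into catalog cache population rather than
connection startup):

    \timing on
    SET enable_hashjoin = 1;                  -- should need no catalog access
    SELECT count(*) FROM tbl1_stats_prop_rt;  -- first real catalog access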
On Tue, 13 Feb 2018 14:21:02 -0800 Andres Freund <andres@anarazel.de>
wrote:
> One of the OP's examples shows the connection establishment separate
> from the slowdown, however.  Shouldn't that preclude relcache init file
> rebuilds being involved?

I can't account for all the behaviors, but we have a job that spawns 100
connections to process a heavily sharded set of tables in parallel every
few minutes.  Normally it runs for a few minutes, but it will extend a
lot, e.g. to hours, when the catalogs get severely bloated.  I'm assuming
that the relcache init scans are chasing each other's buffers out of
memory, as the system is completely IO-bound when this happens.  Not
saying for sure that this is what is happening to the OP, but it may be
similar.

-dg

--
David Gould                                   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
On 02/13/2018 10:31 PM, Andres Freund wrote:
> On 2018-02-13 13:24:47 -0800, David Gould wrote:
>> I see this problem fairly frequently.  Typically the problem catalog
>> is pg_attribute, as it has the most rows.  However, the problem really
>> arises when the catalogs become bloated.
>
> I can obviously see problems with a bloated catalog, but how does that
> cause massive slowdowns in *individual* queries as we see here?  The
> OP's problem isn't the connection establishment performance afaict, but
> the first query using specific pieces of catalog data.  And that should
> be ok-ish performance-wise, even if there's bloat.

What if the first connection does not actually go to PostgreSQL, but to
some sort of connection pool (like pgbouncer)?  That would be consistent
with the observed behavior, I think: the first connection would likely
have to open a connection to a backend, paying all the price.

Obviously, this is just a stab in the dark ...

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
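
If a pooler is suspected, a quick way to check (my suggestion, with
assumed defaults) is to see what is actually listening on the port the
clients connect to, and, if pgbouncer is there, to look at its admin
console (this assumes pgbouncer's default admin setup on port 6432):

    ss -ltnp | grep 5521                       # which process owns the port?
    psql -p 6432 pgbouncer -c 'SHOW POOLS;'    # pgbouncer admin console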
From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com]
Sent: February 14, 2018 7:53
To: Andres Freund <andres@anarazel.de>; David Gould <daveg@sonic.net>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; 石勇虎 <SHIYONGHU651@pingan.com.cn>; pgsql-bugs@postgresql.org
Subject: Re: response time is very long in PG9.5.5 using psql or jdbc