Thread: PostgresSQL 9.5.21 very slow to connect and perform basic queries

PostgresSQL 9.5.21 very slow to connect and perform basic queries

From
bruno da silva
Date:
Hello.

I'm investigating an issue on a PostgresSql 9.5.21 installation that becomes unusable in an intermittent way. Simple queries like "select now();" could take 20s. commits take 2s. and all gets fixed after an engine restart.

I look into the pg logs and no signs of errors. and checkpoints are always timed. The machine is well provisioned, load isn't too high, and cpu io wait is under 1%.

any suggestions on what I should check more?


Thanks in advance.
--
Bruno da Silva

Re: PostgresSQL 9.5.21 very slow to connect and perform basic queries

From
Justin Pryzby
Date:
On Thu, Jul 21, 2022 at 02:37:35PM -0400, bruno da silva wrote:
> I'm investigating an issue on a PostgresSql 9.5.21 installation that
> becomes unusable in an intermittent way. Simple queries like "select
> now();" could take 20s. commits take 2s. and all gets fixed after an engine
> restart.
> 
> I look into the pg logs and no signs of errors. and checkpoints are
> always timed. The machine is well provisioned, load isn't too high, and cpu
> io wait is under 1%.
> 
> any suggestions on what I should check more?

What OS/version is it ?

What GUCs have you changed ?

Is it a new issue ?

https://wiki.postgresql.org/wiki/Slow_Query_Questions

Operating system+version
What OS / version ? At least for linux, you can get the distribution by running: tail /etc/*release 

GUC Settings
What database configuration settings have you changed? What are their values? (These are things like "shared_buffers",
"work_mem","enable_seq_scan", "effective_io_concurrency", "effective_cache_size", etc). See Server Configuration for a
usefulquery that will show all of your non-default database settings, in an easier to read format than posting pieces
ofyour postgresql.conf file. 
 

-- 
Justin



Re: PostgresSQL 9.5.21 very slow to connect and perform basic queries

From
bruno da silva
Date:
  Thanks for the quick response.

  OS/version: CentOS release 6.9 (Final)

  Hardware(non dedicated to the db, other services and app run the same server):
 
  Xeon(R) CPU E5-2690 v4 @ 2.60GHz -  56 cores - 504 GB RAM 
logicaldrive 1 (1.5 TB, RAID 1, OK)
physicaldrive 1I:3:1 (port 1I:box 3:bay 1, Solid State SAS, 1600.3 GB, OK)
physicaldrive 1I:3:2 (port 1I:box 3:bay 2, Solid State SAS, 1600.3 GB, OK)

 GUC Settings:
 auto_explain.log_analyze                     0
 auto_explain.log_min_duration             1000
 auto_explain.log_nested_statements           0
 auto_explain.log_verbose                     0
 autovacuum_analyze_scale_factor            0.1
 autovacuum_analyze_threshold                50
 autovacuum_freeze_max_age            200000000
 autovacuum_max_workers                       3
 autovacuum_multixact_freeze_max_age  400000000
 autovacuum_naptime                          60
 autovacuum_vacuum_cost_delay                 2
 autovacuum_vacuum_cost_limit               100
 autovacuum_vacuum_scale_factor             0.1
 autovacuum_vacuum_threshold                 50
 autovacuum_work_mem                         -1
 checkpoint_timeout                        2700
 effective_cache_size                   4194304
 enable_seqscan                               0
 log_autovacuum_min_duration                250
 log_checkpoints                              1
 log_connections                              1
 log_file_mode                              600
 log_lock_waits                               1
 log_min_duration_statement                1000
 log_rotation_age                          1440
 log_truncate_on_rotation                     1
 maintenance_work_mem                    262144
 max_connections                            300
 max_replication_slots                       10
 max_wal_senders                             10
 max_wal_size                              1280
 max_worker_processes                        15
 min_wal_size                                 5
 pg_stat_statements.max                   10000
 standard_conforming_strings                  1
 track_commit_timestamp                       1
 wal_receiver_timeout                         0
 wal_sender_timeout                           0
 work_mem                                  8192

On Thu, Jul 21, 2022 at 3:33 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Thu, Jul 21, 2022 at 02:37:35PM -0400, bruno da silva wrote:
> I'm investigating an issue on a PostgresSql 9.5.21 installation that
> becomes unusable in an intermittent way. Simple queries like "select
> now();" could take 20s. commits take 2s. and all gets fixed after an engine
> restart.
>
> I look into the pg logs and no signs of errors. and checkpoints are
> always timed. The machine is well provisioned, load isn't too high, and cpu
> io wait is under 1%.
>
> any suggestions on what I should check more?

What OS/version is it ?

What GUCs have you changed ?

Is it a new issue ?

https://wiki.postgresql.org/wiki/Slow_Query_Questions

Operating system+version
What OS / version ? At least for linux, you can get the distribution by running: tail /etc/*release

GUC Settings
What database configuration settings have you changed? What are their values? (These are things like "shared_buffers", "work_mem", "enable_seq_scan", "effective_io_concurrency", "effective_cache_size", etc). See Server Configuration for a useful query that will show all of your non-default database settings, in an easier to read format than posting pieces of your postgresql.conf file.

--
Justin


--
Bruno da Silva

Re: PostgresSQL 9.5.21 very slow to connect and perform basic queries

From
bruno da silva
Date:
The issue started a month ago.

On Thu, Jul 21, 2022 at 3:59 PM bruno da silva <brunogiovs@gmail.com> wrote:
  Thanks for the quick response.

  OS/version: CentOS release 6.9 (Final)

  Hardware(non dedicated to the db, other services and app run the same server):
 
  Xeon(R) CPU E5-2690 v4 @ 2.60GHz -  56 cores - 504 GB RAM 
logicaldrive 1 (1.5 TB, RAID 1, OK)
physicaldrive 1I:3:1 (port 1I:box 3:bay 1, Solid State SAS, 1600.3 GB, OK)
physicaldrive 1I:3:2 (port 1I:box 3:bay 2, Solid State SAS, 1600.3 GB, OK)

 GUC Settings:
 auto_explain.log_analyze                     0
 auto_explain.log_min_duration             1000
 auto_explain.log_nested_statements           0
 auto_explain.log_verbose                     0
 autovacuum_analyze_scale_factor            0.1
 autovacuum_analyze_threshold                50
 autovacuum_freeze_max_age            200000000
 autovacuum_max_workers                       3
 autovacuum_multixact_freeze_max_age  400000000
 autovacuum_naptime                          60
 autovacuum_vacuum_cost_delay                 2
 autovacuum_vacuum_cost_limit               100
 autovacuum_vacuum_scale_factor             0.1
 autovacuum_vacuum_threshold                 50
 autovacuum_work_mem                         -1
 checkpoint_timeout                        2700
 effective_cache_size                   4194304
 enable_seqscan                               0
 log_autovacuum_min_duration                250
 log_checkpoints                              1
 log_connections                              1
 log_file_mode                              600
 log_lock_waits                               1
 log_min_duration_statement                1000
 log_rotation_age                          1440
 log_truncate_on_rotation                     1
 maintenance_work_mem                    262144
 max_connections                            300
 max_replication_slots                       10
 max_wal_senders                             10
 max_wal_size                              1280
 max_worker_processes                        15
 min_wal_size                                 5
 pg_stat_statements.max                   10000
 standard_conforming_strings                  1
 track_commit_timestamp                       1
 wal_receiver_timeout                         0
 wal_sender_timeout                           0
 work_mem                                  8192

On Thu, Jul 21, 2022 at 3:33 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Thu, Jul 21, 2022 at 02:37:35PM -0400, bruno da silva wrote:
> I'm investigating an issue on a PostgresSql 9.5.21 installation that
> becomes unusable in an intermittent way. Simple queries like "select
> now();" could take 20s. commits take 2s. and all gets fixed after an engine
> restart.
>
> I look into the pg logs and no signs of errors. and checkpoints are
> always timed. The machine is well provisioned, load isn't too high, and cpu
> io wait is under 1%.
>
> any suggestions on what I should check more?

What OS/version is it ?

What GUCs have you changed ?

Is it a new issue ?

https://wiki.postgresql.org/wiki/Slow_Query_Questions

Operating system+version
What OS / version ? At least for linux, you can get the distribution by running: tail /etc/*release

GUC Settings
What database configuration settings have you changed? What are their values? (These are things like "shared_buffers", "work_mem", "enable_seq_scan", "effective_io_concurrency", "effective_cache_size", etc). See Server Configuration for a useful query that will show all of your non-default database settings, in an easier to read format than posting pieces of your postgresql.conf file.

--
Justin


--
Bruno da Silva


--
Bruno da Silva

Re: PostgresSQL 9.5.21 very slow to connect and perform basic queries

From
Andrew Dunstan
Date:
On 2022-07-21 Th 14:37, bruno da silva wrote:
> Hello.
>
> I'm investigating an issue on a PostgresSql 9.5.21 installation that
> becomes unusable in an intermittent way. Simple queries like "select
> now();" could take 20s. commits take 2s. and all gets fixed after an
> engine restart.
>
> I look into the pg logs and no signs of errors. and checkpoints are
> always timed. The machine is well provisioned, load isn't too high,
> and cpu io wait is under 1%.
>
> any suggestions on what I should check more?
>
>
>


9.5 has been out of support for nearly 2 years. You should be looking to
upgrade.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: PostgresSQL 9.5.21 very slow to connect and perform basic queries

From
Justin Pryzby
Date:
On Thu, Jul 21, 2022 at 03:59:30PM -0400, bruno da silva wrote:
>   OS/version: CentOS release 6.9 (Final)

How are these set ?

tail /sys/kernel/mm/ksm/run /sys/kernel/mm/transparent_hugepage/{defrag,enabled,khugepaged/defrag}
/proc/sys/vm/zone_reclaim_mode

I suspect you may be suffering from issues with transparent huge pages.

I suggest to disable KSM and THP, or upgrade to a newer OS.

I've written before about these:
https://www.postgresql.org/message-id/20170524155855.GH31097@telsasoft.com
https://www.postgresql.org/message-id/20190625162338.GF18602@telsasoft.com
https://www.postgresql.org/message-id/20170718180152.GE17566@telsasoft.com
https://www.postgresql.org/message-id/20191004060300.GA11241@telsasoft.com
https://www.postgresql.org/message-id/20200413144254.GS2228@telsasoft.com
https://www.postgresql.org/message-id/20220329182453.GA28503@telsasoft.com

On Thu, Jul 21, 2022 at 04:01:10PM -0400, bruno da silva wrote:
> The issue started a month ago.

Ok .. but how long has the DB been running under this environment ?

-- 
Justin



Re: PostgresSQL 9.5.21 very slow to connect and perform basic queries

From
bruno da silva
Date:
Thanks, I will check it out.

On Thu, Jul 21, 2022 at 4:21 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Thu, Jul 21, 2022 at 03:59:30PM -0400, bruno da silva wrote:
>   OS/version: CentOS release 6.9 (Final)

How are these set ?

tail /sys/kernel/mm/ksm/run /sys/kernel/mm/transparent_hugepage/{defrag,enabled,khugepaged/defrag} /proc/sys/vm/zone_reclaim_mode

I suspect you may be suffering from issues with transparent huge pages.

I suggest to disable KSM and THP, or upgrade to a newer OS.

I've written before about these:
https://www.postgresql.org/message-id/20170524155855.GH31097@telsasoft.com
https://www.postgresql.org/message-id/20190625162338.GF18602@telsasoft.com
https://www.postgresql.org/message-id/20170718180152.GE17566@telsasoft.com
https://www.postgresql.org/message-id/20191004060300.GA11241@telsasoft.com
https://www.postgresql.org/message-id/20200413144254.GS2228@telsasoft.com
https://www.postgresql.org/message-id/20220329182453.GA28503@telsasoft.com

On Thu, Jul 21, 2022 at 04:01:10PM -0400, bruno da silva wrote:
> The issue started a month ago.

Ok .. but how long has the DB been running under this environment ?

--
Justin


--
Bruno da Silva

Re: PostgresSQL 9.5.21 very slow to connect and perform basic queries

From
bruno da silva
Date:
It has been running at least since 2018. 

I got that from the tail

=> /sys/kernel/mm/ksm/run <==
0

==> /sys/kernel/mm/transparent_hugepage/defrag <==
[always] madvise never

==> /sys/kernel/mm/transparent_hugepage/enabled <==
always madvise [never]

==> /sys/kernel/mm/transparent_hugepage/khugepaged/defrag <==
[yes] no

==> /proc/sys/vm/zone_reclaim_mode <==
0


On Thu, Jul 21, 2022 at 4:32 PM bruno da silva <brunogiovs@gmail.com> wrote:
Thanks, I will check it out.

On Thu, Jul 21, 2022 at 4:21 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Thu, Jul 21, 2022 at 03:59:30PM -0400, bruno da silva wrote:
>   OS/version: CentOS release 6.9 (Final)

How are these set ?

tail /sys/kernel/mm/ksm/run /sys/kernel/mm/transparent_hugepage/{defrag,enabled,khugepaged/defrag} /proc/sys/vm/zone_reclaim_mode

I suspect you may be suffering from issues with transparent huge pages.

I suggest to disable KSM and THP, or upgrade to a newer OS.

I've written before about these:
https://www.postgresql.org/message-id/20170524155855.GH31097@telsasoft.com
https://www.postgresql.org/message-id/20190625162338.GF18602@telsasoft.com
https://www.postgresql.org/message-id/20170718180152.GE17566@telsasoft.com
https://www.postgresql.org/message-id/20191004060300.GA11241@telsasoft.com
https://www.postgresql.org/message-id/20200413144254.GS2228@telsasoft.com
https://www.postgresql.org/message-id/20220329182453.GA28503@telsasoft.com

On Thu, Jul 21, 2022 at 04:01:10PM -0400, bruno da silva wrote:
> The issue started a month ago.

Ok .. but how long has the DB been running under this environment ?

--
Justin


--
Bruno da Silva


--
Bruno da Silva

Re: PostgresSQL 9.5.21 very slow to connect and perform basic queries

From
bruno da silva
Date:
Hello.

After more investigation, we found that pgss_query_texts.stat of a size of 2.2GB. and this deployment has a 32bit pg.
and this errors:

postgresql-2022-07-12-20:07:15.log.gz:[2022-07-14 11:17:06.713 EDT] 207.89.58.230(46964) {62c87db0.8eb2} xxxx LOG:  out of memory
postgresql-2022-07-12-20:07:15.log.gz:[2022-07-14 11:17:06.713 EDT] 207.89.58.230(46964) {62c87db0.8eb2} xxxx DETAIL:  Could not allocate enough memory to read pg_stat_statement file "pg_stat_tmp/pgss_query_texts.stat".


So, my question is if pgss_query_texts.stat increases in size gradually due to too many distincts large sql statements could it cause an overall slowness on the engine? this slowness could cause simple statements to be super slow to return like
"select now()" taking 20s? 

Thanks in advance

Environment:

  OS/version: CentOS release 6.9 (Final)

  Hardware(non dedicated to the db, other services and app run the same server):
 
  Xeon(R) CPU E5-2690 v4 @ 2.60GHz -  56 cores - 504 GB RAM 
logicaldrive 1 (1.5 TB, RAID 1, OK)
physicaldrive 1I:3:1 (port 1I:box 3:bay 1, Solid State SAS, 1600.3 GB, OK)
physicaldrive 1I:3:2 (port 1I:box 3:bay 2, Solid State SAS, 1600.3 GB, OK)
  PostgresSQL 9.5.21 32bit

 GUC Settings:
 auto_explain.log_analyze                     0
 auto_explain.log_min_duration             1000
 auto_explain.log_nested_statements           0
 auto_explain.log_verbose                     0
 autovacuum_analyze_scale_factor            0.1
 autovacuum_analyze_threshold                50
 autovacuum_freeze_max_age            200000000
 autovacuum_max_workers                       3
 autovacuum_multixact_freeze_max_age  400000000
 autovacuum_naptime                          60
 autovacuum_vacuum_cost_delay                 2
 autovacuum_vacuum_cost_limit               100
 autovacuum_vacuum_scale_factor             0.1
 autovacuum_vacuum_threshold                 50
 autovacuum_work_mem                         -1
 checkpoint_timeout                        2700
 effective_cache_size                   4194304
 enable_seqscan                               0
 log_autovacuum_min_duration                250
 log_checkpoints                              1
 log_connections                              1
 log_file_mode                              600
 log_lock_waits                               1
 log_min_duration_statement                1000
 log_rotation_age                          1440
 log_truncate_on_rotation                     1
 maintenance_work_mem                    262144
 max_connections                            300
 max_replication_slots                       10
 max_wal_senders                             10
 max_wal_size                              1280
 max_worker_processes                        15
 min_wal_size                                 5
 pg_stat_statements.max                   10000
 standard_conforming_strings                  1
 track_commit_timestamp                       1
 wal_receiver_timeout                         0
 wal_sender_timeout                           0
 work_mem                                  8192





On Thu, Jul 21, 2022 at 2:37 PM bruno da silva <brunogiovs@gmail.com> wrote:
Hello.

I'm investigating an issue on a PostgresSql 9.5.21 installation that becomes unusable in an intermittent way. Simple queries like "select now();" could take 20s. commits take 2s. and all gets fixed after an engine restart.

I look into the pg logs and no signs of errors. and checkpoints are always timed. The machine is well provisioned, load isn't too high, and cpu io wait is under 1%.

any suggestions on what I should check more?


Thanks in advance.
--
Bruno da Silva


--
Bruno da Silva

Re: PostgresSQL 9.5.21 very slow to connect and perform basic queries

From
Tom Lane
Date:
bruno da silva <brunogiovs@gmail.com> writes:
> After more investigation, we found that pgss_query_texts.stat of a size of
> 2.2GB. and this deployment has a 32bit pg.

Hm ... we've heard one previous report of pg_stat_statements' query text
file getting unreasonably large, but it's not clear how that can come
to be.  Do you have a lot of especially long statements being tracked
in the pg_stat_statements view?  Are there any other signs of distress
in the postmaster log, like complaints about being unable to write
pgss_query_texts.stat?

            regards, tom lane



Re: PostgresSQL 9.5.21 very slow to connect and perform basic queries

From
bruno da silva
Date:
Do you have a lot of especially long statements being tracked
in the pg_stat_statements view? well, the view was showing the query column null.
 but looking on  pgss_query_texts.stat there are very large sql statements, of around ~ 400kb, multiple thousands. 

Are there any other signs of distress
in the postmaster log, like complaints about being unable to write
pgss_query_texts.stat? no, just complaints for reading it. 

Thanks

On Tue, Aug 2, 2022 at 11:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
bruno da silva <brunogiovs@gmail.com> writes:
> After more investigation, we found that pgss_query_texts.stat of a size of
> 2.2GB. and this deployment has a 32bit pg.

Hm ... we've heard one previous report of pg_stat_statements' query text
file getting unreasonably large, but it's not clear how that can come
to be.  Do you have a lot of especially long statements being tracked
in the pg_stat_statements view?  Are there any other signs of distress
in the postmaster log, like complaints about being unable to write
pgss_query_texts.stat?

                        regards, tom lane


--
Bruno da Silva

Re: PostgresSQL 9.5.21 very slow to connect and perform basic queries

From
Tom Lane
Date:
bruno da silva <brunogiovs@gmail.com> writes:
> Do you have a lot of especially long statements being tracked
> in the pg_stat_statements view?* well, the view was showing the query
> column null.*
> * but looking on  pgss_query_texts.stat there are very large sql
> statements, of around ~ 400kb, multiple thousands. *

Hm.  We try to recover from such failures by (a) resetting all the view's
query text fields to null and (b) truncating the file --- well, unlinking
it and creating it as empty.  It seems like (a) happened and (b) didn't.
It's pretty hard to explain that from the code though.  Are you quite
sure this is a 9.5.21 version of the pg_stat_statements extension?
Is it possible that the pg_stat_tmp directory has been made non-writable?

            regards, tom lane




> Are there any other signs of distress
> in the postmaster log, like complaints about being unable to write
> pgss_query_texts.stat? *no, just complaints for reading it. *

> Thanks

> On Tue, Aug 2, 2022 at 11:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

>> bruno da silva <brunogiovs@gmail.com> writes:
> After more investigation, we found that pgss_query_texts.stat of a size
>> of
> 2.2GB. and this deployment has a 32bit pg.
>> 
>> Hm ... we've heard one previous report of pg_stat_statements' query text
>> file getting unreasonably large, but it's not clear how that can come
>> to be.  Do you have a lot of especially long statements being tracked
>> in the pg_stat_statements view?  Are there any other signs of distress
>> in the postmaster log, like complaints about being unable to write
>> pgss_query_texts.stat?
>> 
>> regards, tom lane
>> 


> -- 
> Bruno da Silva




Re: PostgresSQL 9.5.21 very slow to connect and perform basic queries

From
bruno da silva
Date:
Hello.

Are you quite sure this is a 9.5.21 version of the pg_stat_statements extension? I got version 1.3 from SELECT * FROM pg_extension;
Is it possible that the pg_stat_tmp directory has been made non-writable? hard to tell if it was made non-writable during the outage. but now it is writable.

Thanks

On Tue, Aug 2, 2022 at 1:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
bruno da silva <brunogiovs@gmail.com> writes:
> Do you have a lot of especially long statements being tracked
> in the pg_stat_statements view?* well, the view was showing the query
> column null.*
> * but looking on  pgss_query_texts.stat there are very large sql
> statements, of around ~ 400kb, multiple thousands. *

Hm.  We try to recover from such failures by (a) resetting all the view's
query text fields to null and (b) truncating the file --- well, unlinking
it and creating it as empty.  It seems like (a) happened and (b) didn't.
It's pretty hard to explain that from the code though.  Are you quite
sure this is a 9.5.21 version of the pg_stat_statements extension?
Is it possible that the pg_stat_tmp directory has been made non-writable?

                        regards, tom lane




> Are there any other signs of distress
> in the postmaster log, like complaints about being unable to write
> pgss_query_texts.stat? *no, just complaints for reading it. *

> Thanks

> On Tue, Aug 2, 2022 at 11:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

>> bruno da silva <brunogiovs@gmail.com> writes:
> After more investigation, we found that pgss_query_texts.stat of a size
>> of
> 2.2GB. and this deployment has a 32bit pg.
>>
>> Hm ... we've heard one previous report of pg_stat_statements' query text
>> file getting unreasonably large, but it's not clear how that can come
>> to be.  Do you have a lot of especially long statements being tracked
>> in the pg_stat_statements view?  Are there any other signs of distress
>> in the postmaster log, like complaints about being unable to write
>> pgss_query_texts.stat?
>>
>> regards, tom lane
>>


> --
> Bruno da Silva



--
Bruno da Silva

Re: PostgresSQL 9.5.21 very slow to connect and perform basic queries

From
Tom Lane
Date:
I wrote:
> bruno da silva <brunogiovs@gmail.com> writes:
>> Do you have a lot of especially long statements being tracked
>> in the pg_stat_statements view?* well, the view was showing the query
>> column null.*
>> * but looking on  pgss_query_texts.stat there are very large sql
>> statements, of around ~ 400kb, multiple thousands. *

I see one possible piece of the puzzle here: since you're using a 32-bit
build, overflowing size_t is a reachable hazard.  Specifically, in this
test to see if we need to garbage-collect the query text file:

    if (extent < pgss->mean_query_len * pgss_max * 2)
        return false;

You said earlier that pg_stat_statements.max = 10000, so a mean_query_len
exceeding about 2^32 / 10000 / 2 = 214748.3648 would be enough to overflow
size_t and break this comparison.  Now, a mean SQL query length in excess
of 200kB sounds mighty improbable, but it's really the mean length of the
query texts in the view.  If your "normal" queries fall into just a few
patterns they might be represented by a relatively small number of view
entries.  And if the "big" queries are sufficiently not alike, they might
each get their own view entry, which could potentially drive the mean high
enough to cause trouble.  It'd be interesting to track what
"SELECT avg(length(query)) FROM pg_stat_statements" gives.

However, even if we grant that mean_query_len is that big, overflow here
would make garbage collection of the query text file more likely not less
so.  What I'm speculating is that overflow is occurring and causing all
processes to decide they need to run gc_qtexts() every time they insert
a new query entry, even though the query texts file isn't actually
bloated.  That could possibly explain your performance issues: a garbage
collection pass over a multi-gig file will take awhile, and what's worse
is that it's done under an exclusive lock, meaning that all the backends
stack up waiting their turn to perform a useless GC pass.

What this doesn't explain is why the condition doesn't clear once you
observe one of those "out of memory" complaints, because that should
lead to truncating the texts file.  Maybe it does get truncated, but
then the cycle repeats after awhile?  If you have a steady stream of
incoming new 400kB queries, you could build back up to 2.2GB of text
after five thousand or so of those.

I'm also curious whether this installation is in the habit of doing
pg_stat_statements_reset() a lot.  It looks like that fails to
reset mean_query_len, which might be intentional but perhaps it
could play into getting a silly result here later on.

            regards, tom lane



Re: PostgresSQL 9.5.21 very slow to connect and perform basic queries

From
bruno da silva
Date:
Hello Tom. Thanks for your response. 
I spent most of the time looking for evidence and checking other installations with similar patterns since your response.

this installation is in the habit of doing pg_stat_statements_reset() a lot?
 resetting is very rare. How can I get "pgss->mean_query_len" via sql?

Maybe it does get truncated, but then the cycle repeats after a while? 
it is possible as the slowness happened some days apart 3 times.

Question: Besides the gc issue that you mentioned, having a large ( 700MB or 1GB ) pgss_query_texts.stat could cause slowness in pg_stat_statement processing 
than leading to slower query responses with a 32bit PG? I'm thinking in reducing pg_stat_statements.max from 10k to 3k


Thanks 

On Tue, Aug 2, 2022 at 3:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> bruno da silva <brunogiovs@gmail.com> writes:
>> Do you have a lot of especially long statements being tracked
>> in the pg_stat_statements view?* well, the view was showing the query
>> column null.*
>> * but looking on  pgss_query_texts.stat there are very large sql
>> statements, of around ~ 400kb, multiple thousands. *

I see one possible piece of the puzzle here: since you're using a 32-bit
build, overflowing size_t is a reachable hazard.  Specifically, in this
test to see if we need to garbage-collect the query text file:

        if (extent < pgss->mean_query_len * pgss_max * 2)
                return false;

You said earlier that pg_stat_statements.max = 10000, so a mean_query_len
exceeding about 2^32 / 10000 / 2 = 214748.3648 would be enough to overflow
size_t and break this comparison.  Now, a mean SQL query length in excess
of 200kB sounds mighty improbable, but it's really the mean length of the
query texts in the view.  If your "normal" queries fall into just a few
patterns they might be represented by a relatively small number of view
entries.  And if the "big" queries are sufficiently not alike, they might
each get their own view entry, which could potentially drive the mean high
enough to cause trouble.  It'd be interesting to track what
"SELECT avg(length(query)) FROM pg_stat_statements" gives.

However, even if we grant that mean_query_len is that big, overflow here
would make garbage collection of the query text file more likely not less
so.  What I'm speculating is that overflow is occurring and causing all
processes to decide they need to run gc_qtexts() every time they insert
a new query entry, even though the query texts file isn't actually
bloated.  That could possibly explain your performance issues: a garbage
collection pass over a multi-gig file will take awhile, and what's worse
is that it's done under an exclusive lock, meaning that all the backends
stack up waiting their turn to perform a useless GC pass.

What this doesn't explain is why the condition doesn't clear once you
observe one of those "out of memory" complaints, because that should
lead to truncating the texts file.  Maybe it does get truncated, but
then the cycle repeats after awhile?  If you have a steady stream of
incoming new 400kB queries, you could build back up to 2.2GB of text
after five thousand or so of those.

I'm also curious whether this installation is in the habit of doing
pg_stat_statements_reset() a lot.  It looks like that fails to
reset mean_query_len, which might be intentional but perhaps it
could play into getting a silly result here later on.

                        regards, tom lane


--
Bruno da Silva

Re: PostgresSQL 9.5.21 very slow to connect and perform basic queries

From
Tom Lane
Date:
bruno da silva <brunogiovs@gmail.com> writes:
> *Question: *Besides the gc issue that you mentioned, having a large ( 700MB
> or 1GB ) pgss_query_texts.stat could cause slowness in pg_stat_statement
> processing
> than leading to slower query responses with a 32bit PG? I'm thinking in
> reducing pg_stat_statements.max from 10k to 3k

Whether or not we've fully identified the problem, I think cutting
pg_stat_statements.max is a good idea.  Especially as long as you're
stuck on an unsupported PG version.

            regards, tom lane



Re: PostgresSQL 9.5.21 very slow to connect and perform basic queries

From
bruno da silva
Date:
Hello Guys.

I'd like to report back on this issue as I've been monitoring on this installation that has very large distinct sqls and I noticed something that isn't probably new here but I'd like to confirm that again.

So after I reduced the pg_stat_statements.max from 10k to 3k pgss_query_texts.stat was peaking at a reasonable size of ~450MB and by monitoring the file size I was able to have a 1min window interval when the pgss_query_texts.stat gc was happening. but whenever a gc was detected a bunch of statements would get logged on the pg log as slow statements and all would report taking around 1s some statements are like "BEGIN", "COMMIT" then last week I asked for another reduction from 3k to 300  pg_stat_statements.max and those slow statement reports aren't happening anymore even if pgss_query_texts.stat gc still occurs. 

my question is: is it safe to assume that because the gc of pgss_query_texts.stat requires a global lock this is a limitation of pg_stat_statements current implementation? 

Thanks

On Wed, Aug 3, 2022 at 11:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
bruno da silva <brunogiovs@gmail.com> writes:
> *Question: *Besides the gc issue that you mentioned, having a large ( 700MB
> or 1GB ) pgss_query_texts.stat could cause slowness in pg_stat_statement
> processing
> than leading to slower query responses with a 32bit PG? I'm thinking in
> reducing pg_stat_statements.max from 10k to 3k

Whether or not we've fully identified the problem, I think cutting
pg_stat_statements.max is a good idea.  Especially as long as you're
stuck on an unsupported PG version.

                        regards, tom lane


--
Bruno da Silva