Thread: [PATCH] Increase the maximum value track_activity_query_size

[PATCH] Increase the maximum value track_activity_query_size

From
v.makarov@postgrespro.ru
Date:
Hi Hackers,

Some ORMs may generate queries larger than the maximum possible value of
track_activity_query_size (100 kB).
Are there any reasons to limit the maximum value of
track_activity_query_size to such a small value?
Increasing the maximum value to 1 MB would help partially solve this
problem.
The patch also documents the maximum value of track_activity_query_size
in postgresql.conf.sample (previously no range was specified there).

--
Vyacheslav Makarov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [PATCH] Increase the maximum value track_activity_query_size

From
Bruce Momjian
Date:
Good question.  I am in favor of allowing a larger value if no one
objects.  I don't think adding the min/max is helpful.

---------------------------------------------------------------------------

On Tue, Nov 26, 2019 at 05:59:25PM +0300, v.makarov@postgrespro.ru wrote:
> Hi Hackers,
> 
> Some ORMs may generate queries larger than the maximum possible value of
> track_activity_query_size (100 kB).
> Are there any reasons to limit the maximum value of track_activity_query_size
> to such a small value?
> Increasing the maximum value to 1 MB would help partially solve this problem.
> The patch also documents the maximum value of track_activity_query_size in
> postgresql.conf.sample (previously no range was specified there).
> 
> --
> Vyacheslav Makarov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company

> diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
> index ba4edde71a..0e64dc1dbb 100644
> --- a/src/backend/utils/misc/guc.c
> +++ b/src/backend/utils/misc/guc.c
> @@ -3200,7 +3200,7 @@ static struct config_int ConfigureNamesInt[] =
>              GUC_UNIT_BYTE
>          },
>          &pgstat_track_activity_query_size,
> -        1024, 100, 102400,
> +        1024, 100, 1048576,
>          NULL, NULL, NULL
>      },
>  
> diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
> index 46a06ffacd..55d3bfbfd0 100644
> --- a/src/backend/utils/misc/postgresql.conf.sample
> +++ b/src/backend/utils/misc/postgresql.conf.sample
> @@ -569,7 +569,7 @@
>  #track_counts = on
>  #track_io_timing = off
>  #track_functions = none            # none, pl, all
> -#track_activity_query_size = 1024    # (change requires restart)
> +#track_activity_query_size = 1024    # range 100B - 1MB (change requires restart)
>  #stats_temp_directory = 'pg_stat_tmp'
>  
>  


-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [PATCH] Increase the maximum value track_activity_query_size

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Good question.  I am in favor of allowing a larger value if no one
> objects.  I don't think adding the min/max is helpful.

I think there are pretty obvious performance and memory-consumption
penalties to very large track_activity_query_size values.  Who exactly
are we really helping if we let them set it to huge values?

(wanders away wondering if we have suitable integer-overflow checks
in relevant code paths...)

            regards, tom lane



Re: [PATCH] Increase the maximum value track_activity_query_size

From
Robert Haas
Date:
On Thu, Dec 19, 2019 at 10:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Good question.  I am in favor of allowing a larger value if no one
> > objects.  I don't think adding the min/max is helpful.
>
> I think there are pretty obvious performance and memory-consumption
> penalties to very large track_activity_query_size values.  Who exactly
> are we really helping if we let them set it to huge values?

The original poster.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Increase the maximum value track_activity_query_size

From
Alexey Kondratov
Date:
On 19.12.2019 20:52, Robert Haas wrote:
> On Thu, Dec 19, 2019 at 10:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Bruce Momjian <bruce@momjian.us> writes:
>>> Good question.  I am in favor of allowing a larger value if no one
>>> objects.  I don't think adding the min/max is helpful.
>>
>> The original poster.


And probably anyone else who debugs stuck queries from yet another crazy
ORM. Yes, one could use log_min_duration_statement, but being able to get
the query directly from pg_stat_activity without eyeballing the logs is
nice. Also, IIRC, log_min_duration_statement applies only to completed
statements.

> I think there are pretty obvious performance and memory-consumption
> penalties to very large track_activity_query_size values.  Who exactly
> are we really helping if we let them set it to huge values?
>
> (wanders away wondering if we have suitable integer-overflow checks
> in relevant code paths...)


The value of pgstat_track_activity_query_size is in bytes, so setting it
to any value below INT_MAX seems to be safe from that perspective.
However, since it is multiplied by NumBackendStatSlots, any reasonable
value should stay far below INT_MAX (~2 GB).
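
As a rough sizing sketch (assumptions: about 100 backend slots and the
proposed 1 MB maximum; NumBackendStatSlots also counts auxiliary
processes, so the real figure is a bit higher):

    SELECT pg_size_pretty(100 * 1048576::bigint);  -- roughly 100 MB of shared memory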

Honestly, it does not look to me like something badly needed, but still:
we already have hundreds of GUCs and it is easy for a user to build a
sub-optimal configuration anyway, so does this overprotection make sense?


Regards

-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




Re: [PATCH] Increase the maximum value track_activity_query_size

From
Bruce Momjian
Date:
On Fri, Dec 20, 2019 at 02:35:37PM +0300, Alexey Kondratov wrote:
> On 19.12.2019 20:52, Robert Haas wrote:
> > On Thu, Dec 19, 2019 at 10:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > Bruce Momjian <bruce@momjian.us> writes:
> > > > Good question.  I am in favor of allowing a larger value if no one
> > > > objects.  I don't think adding the min/max is helpful.
> > > 
> > > The original poster.
> 
> 
> And probably anyone else who debugs stuck queries from yet another crazy ORM.
> Yes, one could use log_min_duration_statement, but being able to get the query
> directly from pg_stat_activity without eyeballing the logs is nice.
> Also, IIRC, log_min_duration_statement applies only to completed statements.

Yes, you would need log_statement = 'all'.

> > I think there are pretty obvious performance and memory-consumption
> > penalties to very large track_activity_query_size values.  Who exactly
> > are we really helping if we let them set it to huge values?
> > 
> > (wanders away wondering if we have suitable integer-overflow checks
> > in relevant code paths...)
> 
> 
> The value of pgstat_track_activity_query_size is in bytes, so setting it to
> any value below INT_MAX seems to be safe from that perspective. However,
> since it is multiplied by NumBackendStatSlots, any reasonable value should
> stay far below INT_MAX (~2 GB).
> 
> Honestly, it does not look to me like something badly needed, but still:
> we already have hundreds of GUCs and it is easy for a user to build a
> sub-optimal configuration anyway, so does this overprotection make sense?

I can imagine using larger pgstat_track_activity_query_size values for
data warehouse queries, where they are long and there are only a few of
them.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [PATCH] Increase the maximum value track_activity_query_size

From
Michael Paquier
Date:
On Fri, Dec 20, 2019 at 08:57:04AM -0500, Bruce Momjian wrote:
> I can imagine using larger pgstat_track_activity_query_size values for
> data warehouse queries, where they are long and there are only a few of
> them.

Why are those queries that long anyway?  An overly long IN clause with an
insane number of parameters that could be replaced by an ANY clause
with an array?
--
Michael


Re: [PATCH] Increase the maximum value track_activity_query_size

From
Nikolay Samokhvalov
Date:
Here is what ORMs do:

select length('SELECT "column_name_1001", "column_name_1002", "column_name_1003", "column_name_1004", "column_name_1005", "column_name_1006", "column_name_1007", "column_name_1008", "column_name_1009", "column_name_1010", "column_name_1011", "column_name_1012", "column_name_1013", "column_name_1014", "column_name_1015", "column_name_1016", "column_name_1017", "column_name_1018", "column_name_1019", "column_name_1020", "column_name_1021", "column_name_1022", "column_name_1023", "column_name_1024", "column_name_1025", "column_name_1026", "column_name_1027", "column_name_1028", "column_name_1029", "column_name_1030", "column_name_1031", "column_name_1032", "column_name_1033", "column_name_1034", "column_name_1035", "column_name_1036", "column_name_1037", "column_name_1038", "column_name_1039", "column_name_1040", "column_name_1041", "column_name_1042", "column_name_1043", "column_name_1044", "column_name_1045", "column_name_1046", "column_name_1047", "column_name_1048", "column_name_1049", "column_name_1050" FROM "some_table";');
 length
--------
   1024
(1 row)

That's it – with default settings, you won't see WHERE clause or
anything else.

It is not only about analytical workloads. I see it for regular OLTP
workloads in literally *any* large project that uses an ORM. Ruby on Rails'
ActiveRecord does it, Java's Hibernate does, and so on.

As a result, many queries exceed track_activity_query_size, and we
end up with queries trimmed in pg_stat_activity. Why is that bad? It
makes automated analysis involving pg_stat_activity impossible, and
it complicates any manual troubleshooting based on pg_stat_activity.
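
As a rough illustration (just a sketch; length() counts characters rather
than bytes, so this is only approximate), truncated entries can be
flagged, but their text cannot be recovered:

    SELECT pid, state, length(query) AS query_len
      FROM pg_stat_activity
     WHERE length(query) >= (SELECT setting::int - 1
                               FROM pg_settings
                              WHERE name = 'track_activity_query_size');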

Changing this parameter in a mission-critical database is difficult
because it requires a restart.

+1 for changing it to 1M or at least to 100k. If the penalty is significant,
at least 10k.

What is the overhead here except the memory consumption?
Consumption of, say, 100 * 1 MiB = 100 MiB of RAM is a low price for much
better transparency here. But what about the performance penalty?
Some benchmark would be nice to answer this.
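
One simple way to get a rough number (a sketch only, assuming a standard
pgbench setup) would be to compare a read-only run with the default
setting against one with the proposed maximum:

    pgbench -i -s 50 bench
    pgbench -S -c 16 -j 4 -T 60 bench
    # then set track_activity_query_size = 1MB, restart, and run again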

On Fri, Dec 20, 2019 at 6:48 PM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Dec 20, 2019 at 08:57:04AM -0500, Bruce Momjian wrote:
> I can imagine using larger pgstat_track_activity_query_size values for
> data warehouse queries, where they are long and there are only a few of
> them.

Why are those queries that long anyway?  An overly long IN clause with an
insane number of parameters that could be replaced by an ANY clause
with an array?
--
Michael

Re: [PATCH] Increase the maximum value track_activity_query_size

From
Tom Lane
Date:
Nikolay Samokhvalov <samokhvalov@gmail.com> writes:
> Here is what ORMs do:
> select length('SELECT "column_name_1001", "column_name_1002",
> "column_name_1003", "column_name_1004", "column_name_1005",
> "column_name_1006", "column_name_1007", "column_name_1008",
> "column_name_1009", "column_name_1010", "column_name_1011",
> "column_name_1012", "column_name_1013", "column_name_1014",
> "column_name_1015", "column_name_1016", "column_name_1017",
> "column_name_1018", "column_name_1019", "column_name_1020",
> "column_name_1021", "column_name_1022", "column_name_1023",
> "column_name_1024", "column_name_1025", "column_name_1026",
> "column_name_1027", "column_name_1028", "column_name_1029",
> "column_name_1030", "column_name_1031", "column_name_1032",
> "column_name_1033", "column_name_1034", "column_name_1035",
> "column_name_1036", "column_name_1037", "column_name_1038",
> "column_name_1039", "column_name_1040", "column_name_1041",
> "column_name_1042", "column_name_1043", "column_name_1044",
> "column_name_1045", "column_name_1046", "column_name_1047",
> "column_name_1048", "column_name_1049", "column_name_1050" FROM
> "some_table";');
>  length
> --------
>    1024
> (1 row)

> That's it – with default settings, you won't see WHERE clause or
> anything else.

If that's true, it doesn't offer much of a case for upping the limit
on track_activity_query_size.  The longest such a query could reasonably
get is somewhere near NAMEDATALEN times MaxHeapAttributeNumber, which
as it happens is exactly the existing limit on track_activity_query_size.
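
(For reference: with the default NAMEDATALEN of 64 and
MaxHeapAttributeNumber of 1600, that is 64 * 1600 = 102400 bytes,
i.e. exactly the current 100 kB maximum.)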

> As a result, many queries exceed track_activity_query_size

How?  And if they are, why do you care?  Such queries sure seem
pretty content-free.

> What is the overhead here except the memory consumption?

The time to copy those strings out of shared storage, any time
you query pg_stat_activity.

            regards, tom lane



Re: [PATCH] Increase the maximum value track_activity_query_size

From
Tomas Vondra
Date:
On Sat, Dec 21, 2019 at 04:25:05PM -0500, Tom Lane wrote:
>Nikolay Samokhvalov <samokhvalov@gmail.com> writes:
>> Here is what ORMs do:
>> select length('SELECT "column_name_1001", "column_name_1002",
>> "column_name_1003", "column_name_1004", "column_name_1005",
>> "column_name_1006", "column_name_1007", "column_name_1008",
>> "column_name_1009", "column_name_1010", "column_name_1011",
>> "column_name_1012", "column_name_1013", "column_name_1014",
>> "column_name_1015", "column_name_1016", "column_name_1017",
>> "column_name_1018", "column_name_1019", "column_name_1020",
>> "column_name_1021", "column_name_1022", "column_name_1023",
>> "column_name_1024", "column_name_1025", "column_name_1026",
>> "column_name_1027", "column_name_1028", "column_name_1029",
>> "column_name_1030", "column_name_1031", "column_name_1032",
>> "column_name_1033", "column_name_1034", "column_name_1035",
>> "column_name_1036", "column_name_1037", "column_name_1038",
>> "column_name_1039", "column_name_1040", "column_name_1041",
>> "column_name_1042", "column_name_1043", "column_name_1044",
>> "column_name_1045", "column_name_1046", "column_name_1047",
>> "column_name_1048", "column_name_1049", "column_name_1050" FROM
>> "some_table";');
>>  length
>> --------
>>    1024
>> (1 row)
>
>> That's it – with default settings, you won't see WHERE clause or
>> anything else.
>
>If that's true, it doesn't offer much of a case for upping the limit
>on track_activity_query_size.  The longest such a query could reasonably
>get is somewhere near NAMEDATALEN times MaxHeapAttributeNumber, which
>as it happens is exactly the existing limit on track_activity_query_size.
>
>> As a result, many queries exceed track_activity_query_size
>
>How?  And if they are, why do you care?  Such queries sure seem
>pretty content-free.
>

I believe the example was just a very simplistic example. ORMs can of
course generate queries with joins, which can easily exceed the limit
you mentioned.

>> What is the overhead here except the memory consumption?
>
>The time to copy those strings out of shared storage, any time
>you query pg_stat_activity.
>

IMO that seems like a reasonable price to pay, if you want to see
complete queries and bump the track_activity_query_size value up.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Increase the maximum value track_activity_query_size

From
Julien Rouhaud
Date:
On Sun, Dec 22, 2019 at 1:03 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Sat, Dec 21, 2019 at 04:25:05PM -0500, Tom Lane wrote:
> >Nikolay Samokhvalov <samokhvalov@gmail.com> writes:
> >> Here is what ORMs do:
> >> select length('SELECT "column_name_1001", "column_name_1002",
> >> "column_name_1003", "column_name_1004", "column_name_1005",
> >> "column_name_1006", "column_name_1007", "column_name_1008",
> >> "column_name_1009", "column_name_1010", "column_name_1011",
> >> "column_name_1012", "column_name_1013", "column_name_1014",
> >> "column_name_1015", "column_name_1016", "column_name_1017",
> >> "column_name_1018", "column_name_1019", "column_name_1020",
> >> "column_name_1021", "column_name_1022", "column_name_1023",
> >> "column_name_1024", "column_name_1025", "column_name_1026",
> >> "column_name_1027", "column_name_1028", "column_name_1029",
> >> "column_name_1030", "column_name_1031", "column_name_1032",
> >> "column_name_1033", "column_name_1034", "column_name_1035",
> >> "column_name_1036", "column_name_1037", "column_name_1038",
> >> "column_name_1039", "column_name_1040", "column_name_1041",
> >> "column_name_1042", "column_name_1043", "column_name_1044",
> >> "column_name_1045", "column_name_1046", "column_name_1047",
> >> "column_name_1048", "column_name_1049", "column_name_1050" FROM
> >> "some_table";');
> >>  length
> >> --------
> >>    1024
> >> (1 row)
> >
> >> That's it – with default settings, you won't see WHERE clause or
> >> anything else.
> >
> >If that's true, it doesn't offer much of a case for upping the limit
> >on track_activity_query_size.  The longest such a query could reasonably
> >get is somewhere near NAMEDATALEN times MaxHeapAttributeNumber, which
> >as it happens is exactly the existing limit on track_activity_query_size.
> >
> >> As a result, many queries exceed track_activity_query_size
> >
> >How?  And if they are, why do you care?  Such queries sure seem
> >pretty content-free.
> >
>
> I believe the example was just a very simplistic example. ORMs can of
> course generate queries with joins, which can easily exceed the limit
> you mentioned.
>
> >> What is the overhead here except the memory consumption?
> >
> >The time to copy those strings out of shared storage, any time
> >you query pg_stat_activity.
> >
>
> IMO that seems like a reasonable price to pay, if you want to see
> complete queries and bump the track_activity_query_size value up.

Couldn't pg_stat_statements' (or any similar extension's) queryid
exposure in pg_stat_activity [1] also be an alternative?  You wouldn't
have the parameters but maybe the normalized query would be enough for
most analysis.  Now, maybe pg_stat_statements jumble overhead for such
large statements would be even more problematic.

[1] https://commitfest.postgresql.org/26/2069/



Re: [PATCH] Increase the maximum value track_activity_query_size

From
Tomas Vondra
Date:
On Sun, Dec 22, 2019 at 09:06:41AM +0100, Julien Rouhaud wrote:
>On Sun, Dec 22, 2019 at 1:03 AM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>>
>> On Sat, Dec 21, 2019 at 04:25:05PM -0500, Tom Lane wrote:
>> >Nikolay Samokhvalov <samokhvalov@gmail.com> writes:
>> >> Here is what ORMs do:
>> >> select length('SELECT "column_name_1001", "column_name_1002",
>> >> "column_name_1003", "column_name_1004", "column_name_1005",
>> >> "column_name_1006", "column_name_1007", "column_name_1008",
>> >> "column_name_1009", "column_name_1010", "column_name_1011",
>> >> "column_name_1012", "column_name_1013", "column_name_1014",
>> >> "column_name_1015", "column_name_1016", "column_name_1017",
>> >> "column_name_1018", "column_name_1019", "column_name_1020",
>> >> "column_name_1021", "column_name_1022", "column_name_1023",
>> >> "column_name_1024", "column_name_1025", "column_name_1026",
>> >> "column_name_1027", "column_name_1028", "column_name_1029",
>> >> "column_name_1030", "column_name_1031", "column_name_1032",
>> >> "column_name_1033", "column_name_1034", "column_name_1035",
>> >> "column_name_1036", "column_name_1037", "column_name_1038",
>> >> "column_name_1039", "column_name_1040", "column_name_1041",
>> >> "column_name_1042", "column_name_1043", "column_name_1044",
>> >> "column_name_1045", "column_name_1046", "column_name_1047",
>> >> "column_name_1048", "column_name_1049", "column_name_1050" FROM
>> >> "some_table";');
>> >>  length
>> >> --------
>> >>    1024
>> >> (1 row)
>> >
>> >> That's it – with default settings, you won't see WHERE clause or
>> >> anything else.
>> >
>> >If that's true, it doesn't offer much of a case for upping the limit
>> >on track_activity_query_size.  The longest such a query could reasonably
>> >get is somewhere near NAMEDATALEN times MaxHeapAttributeNumber, which
>> >as it happens is exactly the existing limit on track_activity_query_size.
>> >
>> >> As a result, many queries exceed track_activity_query_size
>> >
>> >How?  And if they are, why do you care?  Such queries sure seem
>> >pretty content-free.
>> >
>>
>> I believe the example was just a very simplistic example. ORMs can of
>> course generate queries with joins, which can easily exceed the limit
>> you mentioned.
>>
>> >> What is the overhead here except the memory consumption?
>> >
>> >The time to copy those strings out of shared storage, any time
>> >you query pg_stat_activity.
>> >
>>
>> IMO that seems like a reasonable price to pay, if you want to see
>> complete queries and bump the track_activity_query_size value up.
>
>Couldn't pg_stat_statements' (or any similar extension's) queryid
>exposure in pg_stat_activity [1] also be an alternative?  You wouldn't
>have the parameters but maybe the normalized query would be enough for
>most analysis.  Now, maybe pg_stat_statements jumble overhead for such
>large statements would be even more problematic.
>

But that would effectively add a dependency on pg_stat_statements, no? I
don't think we want that.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: [PATCH] Increase the maximum value track_activity_query_size

From
Julien Rouhaud
Date:
On Mon, Dec 23, 2019 at 1:10 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Sun, Dec 22, 2019 at 09:06:41AM +0100, Julien Rouhaud wrote:
> >On Sun, Dec 22, 2019 at 1:03 AM Tomas Vondra
> ><tomas.vondra@2ndquadrant.com> wrote:
> >>
> >> On Sat, Dec 21, 2019 at 04:25:05PM -0500, Tom Lane wrote:
> >> >> What is the overhead here except the memory consumption?
> >> >
> >> >The time to copy those strings out of shared storage, any time
> >> >you query pg_stat_activity.
> >> >
> >>
> >> IMO that seems like a reasonable price to pay, if you want to see
> >> complete queries and bump the track_activity_query_size value up.
> >
> >Couldn't pg_stat_statements' (or any similar extension's) queryid
> >exposure in pg_stat_activity [1] also be an alternative?  You wouldn't
> >have the parameters but maybe the normalized query would be enough for
> >most analysis.  Now, maybe pg_stat_statements jumble overhead for such
> >large statements would be even more problematic.
> >
>
> But that would effectively add a dependency on pg_stat_statements, no? I
> don't think we want that.

The queryid field is part of the core, so no dependency is added.  You
just get a somewhat useless NULL value returned until you load an
extension that computes a queryid, which may be pg_stat_statements, but
any other one will work too.



Re: [PATCH] Increase the maximum value track_activity_query_size

From
Robert Haas
Date:
On Sat, Dec 21, 2019 at 1:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > What is the overhead here except the memory consumption?
>
> The time to copy those strings out of shared storage, any time
> you query pg_stat_activity.

It seems like you're masterminding this, and I don't know why. It
seems unlikely that anyone will raise the value unless they have very
long queries, and if those people would rather pay the overhead of
copying more data than have their queries truncated, who are we to
argue?

If increasing the maximum imposed some noticeable cost on
installations that kept the default setting, that might well be a good
argument for not raising the maximum. But I don't think that's the
case. I also suspect that the overhead would be pretty darn small even
for people who *do* raise the default setting. It looks to me like
both read and write operations on st_activity_raw stop when they
hit a NUL byte, so any performance costs on short queries must come
from second-order effects (e.g. the main shared memory segment is
bigger, so the OS cache is smaller) which are likely irrelevant in
practice.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Increase the maximum value track_activity_query_size

From
Jeff Janes
Date:
On Tue, Dec 24, 2019 at 12:11 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Dec 21, 2019 at 1:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > What is the overhead here except the memory consumption?
>
> The time to copy those strings out of shared storage, any time
> you query pg_stat_activity.

It seems like you're masterminding this, and I don't know why. It
seems unlikely that anyone will raise the value unless they have very
long queries, and if those people would rather pay the overhead of
copying more data than have their queries truncated, who are we to
argue?

+1

Cheers,

Jeff

Re: [PATCH] Increase the maximum value track_activity_query_size

From
Robert Treat
Date:
On Mon, Dec 23, 2019 at 9:11 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sat, Dec 21, 2019 at 1:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > What is the overhead here except the memory consumption?
> >
> > The time to copy those strings out of shared storage, any time
> > you query pg_stat_activity.
>
> It seems like you're masterminding this, and I don't know why. It
> seems unlikely that anyone will raise the value unless they have very
> long queries, and if those people would rather pay the overhead of
> copying more data than have their queries truncated, who are we to
> argue?
>
> If increasing the maximum imposed some noticeable cost on
> installations that kept the default setting, that might well be a good
> argument for not raising the maximum. But I don't think that's the
> case. I also suspect that the overhead would be pretty darn small even
> for people who *do* raise the default setting. It looks to me like
> both read and write operations on st_activity_raw stop when they
> hit a NUL byte, so any performance costs on short queries must come
> from second-order effects (e.g. the main shared memory segment is
> bigger, so the OS cache is smaller) which are likely irrelevant in
> practice.
>

I'm generally in favor of the idea of allowing people to make
trade-offs that best work for them, but Tom's concern does give me
pause, because it isn't clear to me how people will measure the
overhead of upping this setting. If given the option people will
almost certainly start raising this limit because the benefits are
obvious ("I can see all my query now!") but so far the explanation of
the downsides has been either hand-wavy or, in the case of your
second paragraph, an argument that they are non-existent, which doesn't
seem right either; so how do we explain to people how to measure the
overhead for them?

Robert Treat
https://xzilla.net



Re: [PATCH] Increase the maximum value track_activity_query_size

From
Tomas Vondra
Date:
On Mon, Dec 30, 2019 at 12:46:40PM -0800, Robert Treat wrote:
>On Mon, Dec 23, 2019 at 9:11 PM Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Sat, Dec 21, 2019 at 1:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > > What is the overhead here except the memory consumption?
>> >
>> > The time to copy those strings out of shared storage, any time
>> > you query pg_stat_activity.
>>
>> It seems like you're masterminding this, and I don't know why. It
>> seems unlikely that anyone will raise the value unless they have very
>> long queries, and if those people would rather pay the overhead of
>> copying more data than have their queries truncated, who are we to
>> argue?
>>
>> If increasing the maximum imposed some noticeable cost on
>> installations that kept the default setting, that might well be a good
>> argument for not raising the maximum. But I don't think that's the
>> case. I also suspect that the overhead would be pretty darn small even
>> for people who *do* raise the default setting. It looks to me like
>> both read and write operations on st_activity_raw stop when they
>> hit a NUL byte, so any performance costs on short queries must come
>> from second-order effects (e.g. the main shared memory segment is
>> bigger, so the OS cache is smaller) which are likely irrelevant in
>> practice.
>>
>
>I'm generally in favor of the idea of allowing people to make
>trade-offs that best work for them, but Tom's concern does give me
>pause, because it isn't clear to me how people will measure the
>overhead of upping this setting. If given the option people will
>almost certainly start raising this limit because the benefits are
>obvious ("I can see all my query now!") but so far the explanation of
>the downsides has been either hand-wavy or, in the case of your
>second paragraph, an argument that they are non-existent, which doesn't
>seem right either; so how do we explain to people how to measure the
>overhead for them?
>

I think there are two questions that we need to answer:

1) Does allowing higher values for the GUC mean overhead for people who
don't actually increase it?

I don't think so.

2) What's the overhead for increasing the value for short/long queries?

My assumption is that for short queries, it's going to be negligible.
For longer queries it may be measurable, but I'd expect longer queries
to be more expensive in general, so maybe it's still negligible.

Of course, the easiest thing we can do is actually measuring this.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Increase the maximum value track_activity_query_size

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> 2) What's the overhead for increasing the value for short/long queries?

> My assumption is that for short queries, it's going to be negligible.
> For longer queries it may be measurable, but I'd expect longer queries
> to be more expensive in general, so maybe it's still negligible.

The thing that has been bothering me is the idea that backends reading
st_activity_raw might palloc the max possible length and/or memcpy the
whole buffer rather than just the valid part.  Having now rooted through
pgstat.c, that appears to be half true: the local allocation made by
pgstat_read_current_status() will be just as large as the shared-memory
arena, but we use strcpy() or equivalent so that each query copy should
stop upon hitting a '\0'.  So the run-time cost should be negligible, but
you might be eating a lot of memory if multiple sessions are inspecting
pg_stat_activity and you cranked the setting up imprudently high.
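
A tiny standalone sketch of that pattern (illustrative only, not the
actual pgstat.c code): the local buffer is allocated at the worst-case
size, but the copy stops at the first NUL, so short queries stay cheap
to read:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        size_t  slot_size = 1024 * 1024;        /* worst-case allocation */
        char   *shared = calloc(1, slot_size);  /* stands in for the shared slot */
        char   *local = malloc(slot_size);      /* per-reader local copy */

        strcpy(shared, "SELECT 1;");            /* a short query */
        strcpy(local, shared);                  /* copy stops at the first '\0' */
        printf("copied %zu of %zu bytes\n", strlen(local) + 1, slot_size);

        free(local);
        free(shared);
        return 0;
    }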

This doesn't seem like a reason not to allow a higher limit, like a
megabyte or so, but I'm not sure that pushing it to the moon would be
wise.

Meanwhile, I noted what seems like a pretty obvious bug in
pg_stat_get_backend_activity():

    clipped_activity = pgstat_clip_activity(activity);
    ret = cstring_to_text(activity);
    pfree(clipped_activity);

We're not actually applying the intended clip to the returned
value, so that an invalidly-encoded result is possible.
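
(Presumably the intended line is

    ret = cstring_to_text(clipped_activity);

so that the clipped copy, not the raw string, is what gets returned.)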

(Of course, since we also don't seem to be making any attempt
to translate from the source backend's encoding to our own,
there's more problems here than just that.)

            regards, tom lane



Re: [PATCH] Increase the maximum value track_activity_query_size

From
Andrew Dunstan
Date:
On Tue, Dec 31, 2019 at 9:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

>
> This doesn't seem like a reason not to allow a higher limit, like a
> megabyte or so, but I'm not sure that pushing it to the moon would be
> wise.
>


Just to get a mental handle on the size of queries we might be
allowing before truncation, I did some very rough arithmetic on what
well known texts might fit in a megabyte. By my calculations you could
fit about four Animal Farms or one Madame Bovary in about a megabyte.
So I think that seems like more than enough :-). (My mind kinda
explodes at the thought of debugging a query as long as Animal Farm.)

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Increase the maximum value track_activity_query_size

From
Andrew Dunstan
Date:
On Tue, Dec 31, 2019 at 10:16 AM Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
>
> On Tue, Dec 31, 2019 at 9:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> >
> > This doesn't seem like a reason not to allow a higher limit, like a
> > megabyte or so, but I'm not sure that pushing it to the moon would be
> > wise.
> >
>
>
> Just to get a mental handle on the size of queries we might be
> allowing before truncation, I did some very rough arithmetic on what
> well known texts might fit in a megabyte. By my calculations you could
> fit about four Animal Farms or one Madame Bovary in about a megabyte.
> So I think that seems like more than enough :-). (My mind kinda
> explodes at the thought of debugging a query as long as Animal Farm.)
>


Turns out my arithmetic was a bit off. Animal Farm is 90 kb, Madame
Bovary 678 Kb.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Increase the maximum value track_activity_query_size

From
Alvaro Herrera
Date:
On 2019-Dec-31, Andrew Dunstan wrote:

> Turns out my arithmetic was a bit off. Animal Farm is 90 kb, Madame
> Bovary 678 Kb.

Ten animal farms should be enough for everybody.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Increase the maximum value track_activity_query_size

From
Tomas Vondra
Date:
On Tue, Dec 31, 2019 at 10:25:26AM +1030, Andrew Dunstan wrote:
>On Tue, Dec 31, 2019 at 10:16 AM Andrew Dunstan
><andrew.dunstan@2ndquadrant.com> wrote:
>>
>> On Tue, Dec 31, 2019 at 9:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> >
>> > This doesn't seem like a reason not to allow a higher limit, like a
>> > megabyte or so, but I'm not sure that pushing it to the moon would be
>> > wise.
>> >
>>
>>
>> Just to get a mental handle on the size of queries we might be
>> allowing before truncation, I did some very rough arithmetic on what
>> well known texts might fit in a megabyte. By my calculations you could
>> fit about four Animal Farms or one Madame Bovary in about a megabyte.
>> So I think that seems like more than enough :-). (My mind kinda
>> explodes at the thought of debugging a query as long as Animal Farm.)
>>
>
>
>Turns out my arithmetic was a bit off. Animal Farm is 90 kb, Madame
>Bovary 678 Kb.
>

Not sure, but the Animal Farm text I found is about ~450kB (~120 pages,
with ~3kB per page) ...

Anyway, the longest queries I personally saw in production were a couple
of kB long (~32kB IIRC, but that was a couple of years ago). The queries
were generated by the application (not quite a traditional ORM, but
something like it), with identifiers (e.g. table names) pretty long due
to including a hash (so being 63 characters most of the time). Plus the
columns were always fully qualified, with multiple joins etc.

Not sure what a good limit would be. Obviously, if we pick value X, the
next day someone will come asking for X+1 ... ;-)


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Increase the maximum value track_activity_query_size

From
Andrew Dunstan
Date:
On Tue, Dec 31, 2019 at 11:16 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Tue, Dec 31, 2019 at 10:25:26AM +1030, Andrew Dunstan wrote:
> >On Tue, Dec 31, 2019 at 10:16 AM Andrew Dunstan
> ><andrew.dunstan@2ndquadrant.com> wrote:
> >>
> >> On Tue, Dec 31, 2019 at 9:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>
> >> >
> >> > This doesn't seem like a reason not to allow a higher limit, like a
> >> > megabyte or so, but I'm not sure that pushing it to the moon would be
> >> > wise.
> >> >
> >>
> >>
> >> Just to get a mental handle on the size of queries we might be
> >> allowing before truncation, I did some very rough arithmetic on what
> >> well known texts might fit in a megabyte. By my calculations you could
> >> fit about four Animal Farms or one Madame Bovary in about a megabyte.
> >> So I think that seems like more than enough :-). (My mind kinda
> >> explodes at the thought of debugging a query as long as Animal Farm.)
> >>
> >
> >
> >Turns out my arithmetic was a bit off. Animal Farm is 90 kb, Madame
> >Bovary 678 Kb.
> >
>
> Not sure, but the Animal Farm text I found is about ~450kB (~120 pages,
> with ~3kB per page) ...


My browser has led me astray.
http://gutenberg.net.au/ebooks01/0100011.txt is in fact 172618 bytes.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Increase the maximum value track_activity_query_size

From
Jeff Janes
Date:
On Mon, Dec 30, 2019 at 6:46 PM Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
On Tue, Dec 31, 2019 at 9:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

>
> This doesn't seem like a reason not to allow a higher limit, like a
> megabyte or so, but I'm not sure that pushing it to the moon would be
> wise.
>


Just to get a mental handle on the size of queries we might be
allowing before truncation, I did some very rough arithmetic on what
well known texts might fit in a megabyte. By my calculations you could
fit about four Animal Farms or one Madame Bovary in about a megabyte.
So I think that seems like more than enough :-). (My mind kinda
explodes at the thought of debugging a query as long as Animal Farm.)


I've seen some pretty big IN-lists and VALUES lists.  They aren't so hard
to debug once you tune out iterations 3 through N-3 of the list members.
Unless they are hard to debug for other reasons.  In these cases, it would
be helpful, if not just allowing bigger texts in general, to instead
"truncate" from the middle, preserving both the beginning and the end of
the query text.
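
Purely to illustrate the shape of that idea (a hypothetical sketch, not a
proposal for how the server should implement it):

    -- keep the first and last 500 characters of a long text
    SELECT CASE WHEN length(q) <= 1005 THEN q
                ELSE left(q, 500) || ' ... ' || right(q, 500)
           END AS clipped
      FROM (VALUES (repeat('x', 5000))) AS t(q);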

Cheers,

Jeff

Re: [PATCH] Increase the maximum value track_activity_query_size

From
Robert Haas
Date:
On Thu, Jan 2, 2020 at 3:27 PM Jeff Janes <jeff.janes@gmail.com> wrote:
> I've seen some pretty big IN-lists and VALUES lists.  They aren't so hard
> to debug once you tune out iterations 3 through N-3 of the list members.
> Unless they are hard to debug for other reasons.  In these cases, it would
> be helpful, if not just allowing bigger texts in general, to instead
> "truncate" from the middle, preserving both the beginning and the end of
> the query text.

I vote for not trying to make this more complicated and just accepting
the original proposal. It's about a factor of ten increase over the
limit we have right now, which doesn't seem like enough to cause any
real breakage, and it should be enough to satisfy the majority of the
people who are unhappy with the current limit, and it is very little
work. If somebody wants to do more work on this later, they can, but I
don't think the OP should be on the hook for that.

At some point, someone (I think Peter Geoghegan) suggested that
pg_stat_statements ought to normalize IN lists down to a single
element. That kind of thing might be another approach to the problem
you mention. It's a bit easier to imagine doing such a thing from a
tool like that than it is to do it for strings in pg_stat_activity
because pg_stat_statements has got a parse tree to work with, not just
a flat string. And that might work more nicely than just keeping the
beginning and end of the string, but of course it's also more
complicated, so I don't know.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Increase the maximum value track_activity_query_size

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I vote for not trying to make this more complicated and just accepting
> the original proposal. It's about a factor of ten increase over the
> limit we have right now, which doesn't seem like enough to cause any
> real breakage, and it should be enough to satisfy the majority of the
> people who are unhappy with the current limit, and it is very little
> work.

+1 ... we've surely beaten this topic to death by now.

            regards, tom lane



Re: [PATCH] Increase the maximum value track_activity_query_size

From
Michael Paquier
Date:
On Fri, Jan 03, 2020 at 01:48:56PM -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I vote for not trying to make this more complicated and just accepting
>> the original proposal. It's about a factor of ten increase over the
>> limit we have right now, which doesn't seem like enough to cause any
>> real breakage, and it should be enough to satisfy the majority of the
>> people who are unhappy with the current limit, and it is very little
>> work.
>
> +1 ... we've surely beaten this topic to death by now.

Sounds like an agreement then.  The original patch documents the range
in postgresql.conf.sample, which is fine by me as this is done for
some parameters, and skips the part about doc/, which also matches
the surrounding treatment of other parameters, so the whole looks
fine from here.  Anybody willing to commit that?
--
Michael


Re: [PATCH] Increase the maximum value track_activity_query_size

From
Robert Haas
Date:
On Sun, Jan 5, 2020 at 11:01 PM Michael Paquier <michael@paquier.xyz> wrote:
> Sounds like an agreement then.  The original patch documents the range
> in postgresql.conf.sample, which is fine by me as this is done for
> some parameters, and skips the part about doc/, which also matches
> with the surrounding effort for other parameters, so the whole looks
> fine seen from here.  Anybody willing to commit that?

Done. I didn't commit the postgresql.conf.sample change because:

(1) I think Bruce voted against it.

(2) It makes the line a little wide, and I'd rather not do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Increase the maximum value track_activity_query_size

From
Michael Paquier
Date:
On Tue, Jan 07, 2020 at 12:21:26PM -0500, Robert Haas wrote:
> Done. I didn't commit the postgresql.conf.sample change because:
>
> (1) I think Bruce voted against it.
>
> (2) It makes the line a little wide, and I'd rather not do that.

Thanks!
--
Michael
