Thread: [PATCH] Increase the maximum value of track_activity_query_size
Hi Hackers,

Some ORMs may generate queries larger than the maximum possible value of
track_activity_query_size (100 kB).
Is there any reason to limit the maximum value of track_activity_query_size
to such a small value?
Increasing the maximum value to 1 MB will help partially solve this problem.
Also, postgresql.conf.sample now notes the maximum value of
track_activity_query_size (before this patch it was not specified there).

--
Vyacheslav Makarov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
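For orientation, this is roughly how the setting under discussion is
inspected and raised (a sketch assuming superuser access; the value only
takes effect after a server restart):

    -- track_activity_query_size is a byte-valued, restart-only GUC
    SHOW track_activity_query_size;                        -- 1024 bytes by default
    ALTER SYSTEM SET track_activity_query_size = '100kB';  -- today's ceiling; the patch would allow '1MB'
    -- a reload is not enough: restart the server afterwards, e.g. pg_ctl restart -D $PGDATA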
Good question.  I am in favor of allowing a larger value if no one
objects.  I don't think adding the min/max is helpful.

---------------------------------------------------------------------------

On Tue, Nov 26, 2019 at 05:59:25PM +0300, v.makarov@postgrespro.ru wrote:
> Hi Hackers,
>
> Some ORMs may generate queries larger than the maximum possible value of
> track_activity_query_size (100 kB).
> Is there any reason to limit the maximum value of track_activity_query_size
> to such a small value?
> Increasing the maximum value to 1 MB will help partially solve this problem.
> Also, postgresql.conf.sample now notes the maximum value of
> track_activity_query_size (before this patch it was not specified there).
>
> --
> Vyacheslav Makarov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company

> diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
> index ba4edde71a..0e64dc1dbb 100644
> --- a/src/backend/utils/misc/guc.c
> +++ b/src/backend/utils/misc/guc.c
> @@ -3200,7 +3200,7 @@ static struct config_int ConfigureNamesInt[] =
>  			GUC_UNIT_BYTE
>  		},
>  		&pgstat_track_activity_query_size,
> -		1024, 100, 102400,
> +		1024, 100, 1048576,
>  		NULL, NULL, NULL
>  	},
>
> diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
> index 46a06ffacd..55d3bfbfd0 100644
> --- a/src/backend/utils/misc/postgresql.conf.sample
> +++ b/src/backend/utils/misc/postgresql.conf.sample
> @@ -569,7 +569,7 @@
>  #track_counts = on
>  #track_io_timing = off
>  #track_functions = none			# none, pl, all
> -#track_activity_query_size = 1024	# (change requires restart)
> +#track_activity_query_size = 1024	# range 100B - 1MB (change requires restart)
>  #stats_temp_directory = 'pg_stat_tmp'
>

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
Bruce Momjian <bruce@momjian.us> writes:
> Good question.  I am in favor of allowing a larger value if no one
> objects.  I don't think adding the min/max is helpful.

I think there are pretty obvious performance and memory-consumption
penalties to very large track_activity_query_size values.  Who exactly
are we really helping if we let them set it to huge values?

(wanders away wondering if we have suitable integer-overflow checks
in relevant code paths...)

			regards, tom lane
On Thu, Dec 19, 2019 at 10:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Good question.  I am in favor of allowing a larger value if no one
> > objects.  I don't think adding the min/max is helpful.
>
> I think there are pretty obvious performance and memory-consumption
> penalties to very large track_activity_query_size values.  Who exactly
> are we really helping if we let them set it to huge values?

The original poster.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 19.12.2019 20:52, Robert Haas wrote:
> On Thu, Dec 19, 2019 at 10:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> > > Good question.  I am in favor of allowing a larger value if no one
> > > objects.  I don't think adding the min/max is helpful.
> >
> > I think there are pretty obvious performance and memory-consumption
> > penalties to very large track_activity_query_size values.  Who exactly
> > are we really helping if we let them set it to huge values?
>
> The original poster.

And probably anyone else who debugs stuck queries of yet another crazy ORM.
Yes, one could use log_min_duration_statement, but having the possibility to
get the query text directly from pg_stat_activity without eyeballing the logs
is nice. Also, IIRC, log_min_duration_statement applies only to completed
statements.

> > (wanders away wondering if we have suitable integer-overflow checks
> > in relevant code paths...)

The value of pgstat_track_activity_query_size is in bytes, so setting it to
any value below INT_MAX seems to be safe from that perspective. However,
since it is multiplied by NumBackendStatSlots, any reasonable value should
be far below INT_MAX (~2 GB).

Sincerely, it does not look to me like something badly needed, but still:
we already have hundreds of GUCs and it is easy for a user to build a
sub-optimal configuration, so does this overprotection make sense?

Regards

--
Alexey Kondratov
Postgres Professional https://www.postgrespro.com
Russian Postgres Company
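To put the NumBackendStatSlots point in concrete terms, here is a
back-of-the-envelope estimate of the query-text area (a sketch; the slot
count is approximated as max_connections plus a rough allowance for
autovacuum, background-worker and auxiliary-process slots):

    -- shared memory reserved for query texts ~ slots * track_activity_query_size
    SELECT pg_size_pretty(
             (current_setting('max_connections')::bigint + 20)  -- "+ 20" is a rough worker allowance
             * 1048576)                                          -- hypothetical 1 MB per slot
           AS query_text_area;                                   -- ~120 MB at max_connections = 100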
On Fri, Dec 20, 2019 at 02:35:37PM +0300, Alexey Kondratov wrote:
> On 19.12.2019 20:52, Robert Haas wrote:
> > On Thu, Dec 19, 2019 at 10:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > Bruce Momjian <bruce@momjian.us> writes:
> > > > Good question.  I am in favor of allowing a larger value if no one
> > > > objects.  I don't think adding the min/max is helpful.
> > >
> > > I think there are pretty obvious performance and memory-consumption
> > > penalties to very large track_activity_query_size values.  Who exactly
> > > are we really helping if we let them set it to huge values?
> >
> > The original poster.
>
> And probably anyone else who debugs stuck queries of yet another crazy ORM.
> Yes, one could use log_min_duration_statement, but having the possibility to
> get the query text directly from pg_stat_activity without eyeballing the
> logs is nice. Also, IIRC, log_min_duration_statement applies only to
> completed statements.

Yes, you would need log_statement = true.

> > > (wanders away wondering if we have suitable integer-overflow checks
> > > in relevant code paths...)
>
> The value of pgstat_track_activity_query_size is in bytes, so setting it to
> any value below INT_MAX seems to be safe from that perspective. However,
> since it is multiplied by NumBackendStatSlots, any reasonable value should
> be far below INT_MAX (~2 GB).
>
> Sincerely, it does not look to me like something badly needed, but still:
> we already have hundreds of GUCs and it is easy for a user to build a
> sub-optimal configuration, so does this overprotection make sense?

I can imagine using larger pgstat_track_activity_query_size values for
data warehouse queries, where they are long and there are only a few of
them.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
On Fri, Dec 20, 2019 at 08:57:04AM -0500, Bruce Momjian wrote:
> I can imagine using larger pgstat_track_activity_query_size values for
> data warehouse queries, where they are long and there are only a few of
> them.

Why are those queries that long anyway?  A too-long IN clause with an
insane number of parameters which could be replaced by an ANY clause
with an array?

--
Michael
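The rewrite Michael is alluding to looks roughly like this (illustrative
table and column names):

    -- a generated IN list with thousands of literals ...
    SELECT * FROM orders WHERE id IN (1, 2, 3 /* , ... thousands more ... */);
    -- ... versus a single array parameter, which keeps the query text short
    SELECT * FROM orders WHERE id = ANY ('{1,2,3}'::int[]);  -- or id = ANY ($1) with a bound array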
Here is what ORMs do:
select length('SELECT "column_name_1001", "column_name_1002", "column_name_1003", "column_name_1004", "column_name_1005", "column_name_1006", "column_name_1007", "column_name_1008", "column_name_1009", "column_name_1010", "column_name_1011", "column_name_1012", "column_name_1013", "column_name_1014", "column_name_1015", "column_name_1016", "column_name_1017", "column_name_1018", "column_name_1019", "column_name_1020", "column_name_1021", "column_name_1022", "column_name_1023", "column_name_1024", "column_name_1025", "column_name_1026", "column_name_1027", "column_name_1028", "column_name_1029", "column_name_1030", "column_name_1031", "column_name_1032", "column_name_1033", "column_name_1034", "column_name_1035", "column_name_1036", "column_name_1037", "column_name_1038", "column_name_1039", "column_name_1040", "column_name_1041", "column_name_1042", "column_name_1043", "column_name_1044", "column_name_1045", "column_name_1046", "column_name_1047", "column_name_1048", "column_name_1049", "column_name_1050" FROM "some_table";');
length
--------
1024
(1 row)
That's it – with default settings, you won't see the WHERE clause or
anything else.
It is not only about analytical workloads. I see it for regular OLTP
workloads in literally *any* large project that uses an ORM. Ruby on Rails'
ActiveRecord does it, Java's Hibernate does it, and so on.
As a result, many queries exceed track_activity_query_size, and we
end up with queries trimmed in pg_stat_activity. Why this is bad: it
makes automated analysis involving pg_stat_activity impossible, and
it complicates any manual troubleshooting involving pg_stat_activity.
Changing this parameter in a mission-critical database is difficult
because it requires a restart.
+1 for changing it to 1M, or at least to 100k. If the penalty is significant,
at least 10k.
What is the overhead here except the memory consumption?
Consumption of, say, 100 * 1M = 100 MiB of RAM is a low price for much
better transparency here. But what about the performance penalty?
Some benchmark would be nice to answer this.
On Fri, Dec 20, 2019 at 6:48 PM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Dec 20, 2019 at 08:57:04AM -0500, Bruce Momjian wrote:
> I can imagine using larger pgstat_track_activity_query_size values for
> data warehouse queries, where they are long and there are only a few of
> them.
Why are those queries that long anyway? A too long IN clause with an
insane amount of parameters which could be replaced by an ANY clause
with an array?
--
Michael
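A rough way to see how often the truncation described above actually happens
on a given system (a sketch; entries at or just below the limit have almost
certainly been cut off, though multi-byte truncation can leave them slightly
shorter than the threshold used here):

    SELECT count(*) FILTER (WHERE octet_length(query) >=
                              current_setting('track_activity_query_size')::int - 1)
             AS probably_truncated,
           count(*) AS total
    FROM pg_stat_activity
    WHERE query <> '';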
Nikolay Samokhvalov <samokhvalov@gmail.com> writes: > Here is what ORMs do: > select length('SELECT "column_name_1001", "column_name_1002", > "column_name_1003", "column_name_1004", "column_name_1005", > "column_name_1006", "column_name_1007", "column_name_1008", > "column_name_1009", "column_name_1010", "column_name_1011", > "column_name_1012", "column_name_1013", "column_name_1014", > "column_name_1015", "column_name_1016", "column_name_1017", > "column_name_1018", "column_name_1019", "column_name_1020", > "column_name_1021", "column_name_1022", "column_name_1023", > "column_name_1024", "column_name_1025", "column_name_1026", > "column_name_1027", "column_name_1028", "column_name_1029", > "column_name_1030", "column_name_1031", "column_name_1032", > "column_name_1033", "column_name_1034", "column_name_1035", > "column_name_1036", "column_name_1037", "column_name_1038", > "column_name_1039", "column_name_1040", "column_name_1041", > "column_name_1042", "column_name_1043", "column_name_1044", > "column_name_1045", "column_name_1046", "column_name_1047", > "column_name_1048", "column_name_1049", "column_name_1050" FROM > "some_table";'); > length > -------- > 1024 > (1 row) > That's it – with default settings, you won't see WHERE clause or > anything else. If that's true, it doesn't offer much of a case for upping the limit on track_activity_query_size. The longest such a query could reasonably get is somewhere near NAMEDATALEN times MaxHeapAttributeNumber, which as it happens is exactly the existing limit on track_activity_query_size. > As a result, many queries exceed track_activity_query_size How? And if they are, why do you care? Such queries sure seem pretty content-free. > What is the overhead here except the memory consumption? The time to copy those strings out of shared storage, any time you query pg_stat_activity. regards, tom lane
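For reference, the product Tom mentions works out as follows in a default
build (NAMEDATALEN is 64 and MaxHeapAttributeNumber is 1600):

    SELECT 64 * 1600 AS bytes;   -- = 102400, exactly the current 100 kB ceiling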
On Sat, Dec 21, 2019 at 04:25:05PM -0500, Tom Lane wrote: >Nikolay Samokhvalov <samokhvalov@gmail.com> writes: >> Here is what ORMs do: >> select length('SELECT "column_name_1001", "column_name_1002", >> "column_name_1003", "column_name_1004", "column_name_1005", >> "column_name_1006", "column_name_1007", "column_name_1008", >> "column_name_1009", "column_name_1010", "column_name_1011", >> "column_name_1012", "column_name_1013", "column_name_1014", >> "column_name_1015", "column_name_1016", "column_name_1017", >> "column_name_1018", "column_name_1019", "column_name_1020", >> "column_name_1021", "column_name_1022", "column_name_1023", >> "column_name_1024", "column_name_1025", "column_name_1026", >> "column_name_1027", "column_name_1028", "column_name_1029", >> "column_name_1030", "column_name_1031", "column_name_1032", >> "column_name_1033", "column_name_1034", "column_name_1035", >> "column_name_1036", "column_name_1037", "column_name_1038", >> "column_name_1039", "column_name_1040", "column_name_1041", >> "column_name_1042", "column_name_1043", "column_name_1044", >> "column_name_1045", "column_name_1046", "column_name_1047", >> "column_name_1048", "column_name_1049", "column_name_1050" FROM >> "some_table";'); >> length >> -------- >> 1024 >> (1 row) > >> That's it – with default settings, you won't see WHERE clause or >> anything else. > >If that's true, it doesn't offer much of a case for upping the limit >on track_activity_query_size. The longest such a query could reasonably >get is somewhere near NAMEDATALEN times MaxHeapAttributeNumber, which >as it happens is exactly the existing limit on track_activity_query_size. > >> As a result, many queries exceed track_activity_query_size > >How? And if they are, why do you care? Such queries sure seem >pretty content-free. > I believe the example was just a very simplistic example. ORMs can of course generate queries with joins, which can easily exceed the limit you mentioned. >> What is the overhead here except the memory consumption? > >The time to copy those strings out of shared storage, any time >you query pg_stat_activity. > IMO that seems like a reasonable price to pay, if you want to see complete queries and bump the track_activity_query_size value up. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Dec 22, 2019 at 1:03 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Sat, Dec 21, 2019 at 04:25:05PM -0500, Tom Lane wrote: > >Nikolay Samokhvalov <samokhvalov@gmail.com> writes: > >> Here is what ORMs do: > >> select length('SELECT "column_name_1001", "column_name_1002", > >> "column_name_1003", "column_name_1004", "column_name_1005", > >> "column_name_1006", "column_name_1007", "column_name_1008", > >> "column_name_1009", "column_name_1010", "column_name_1011", > >> "column_name_1012", "column_name_1013", "column_name_1014", > >> "column_name_1015", "column_name_1016", "column_name_1017", > >> "column_name_1018", "column_name_1019", "column_name_1020", > >> "column_name_1021", "column_name_1022", "column_name_1023", > >> "column_name_1024", "column_name_1025", "column_name_1026", > >> "column_name_1027", "column_name_1028", "column_name_1029", > >> "column_name_1030", "column_name_1031", "column_name_1032", > >> "column_name_1033", "column_name_1034", "column_name_1035", > >> "column_name_1036", "column_name_1037", "column_name_1038", > >> "column_name_1039", "column_name_1040", "column_name_1041", > >> "column_name_1042", "column_name_1043", "column_name_1044", > >> "column_name_1045", "column_name_1046", "column_name_1047", > >> "column_name_1048", "column_name_1049", "column_name_1050" FROM > >> "some_table";'); > >> length > >> -------- > >> 1024 > >> (1 row) > > > >> That's it – with default settings, you won't see WHERE clause or > >> anything else. > > > >If that's true, it doesn't offer much of a case for upping the limit > >on track_activity_query_size. The longest such a query could reasonably > >get is somewhere near NAMEDATALEN times MaxHeapAttributeNumber, which > >as it happens is exactly the existing limit on track_activity_query_size. > > > >> As a result, many queries exceed track_activity_query_size > > > >How? And if they are, why do you care? Such queries sure seem > >pretty content-free. > > > > I believe the example was just a very simplistic example. ORMs can of > course generate queries with joins, which can easily exceed the limit > you mentioned. > > >> What is the overhead here except the memory consumption? > > > >The time to copy those strings out of shared storage, any time > >you query pg_stat_activity. > > > > IMO that seems like a reasonable price to pay, if you want to see > complete queries and bump the track_activity_query_size value up. Couldn't be pg_stat_statements (or any similar extension) queryid exposure in pg_stat_activity [1] also an alternative? You wouldn't have the parameters but maybe the normalized query would be enough for most analysis. Now, maybe pg_stat_statements jumble overhead for such large statements would be even more problematic. [1] https://commitfest.postgresql.org/26/2069/
On Sun, Dec 22, 2019 at 09:06:41AM +0100, Julien Rouhaud wrote: >On Sun, Dec 22, 2019 at 1:03 AM Tomas Vondra ><tomas.vondra@2ndquadrant.com> wrote: >> >> On Sat, Dec 21, 2019 at 04:25:05PM -0500, Tom Lane wrote: >> >Nikolay Samokhvalov <samokhvalov@gmail.com> writes: >> >> Here is what ORMs do: >> >> select length('SELECT "column_name_1001", "column_name_1002", >> >> "column_name_1003", "column_name_1004", "column_name_1005", >> >> "column_name_1006", "column_name_1007", "column_name_1008", >> >> "column_name_1009", "column_name_1010", "column_name_1011", >> >> "column_name_1012", "column_name_1013", "column_name_1014", >> >> "column_name_1015", "column_name_1016", "column_name_1017", >> >> "column_name_1018", "column_name_1019", "column_name_1020", >> >> "column_name_1021", "column_name_1022", "column_name_1023", >> >> "column_name_1024", "column_name_1025", "column_name_1026", >> >> "column_name_1027", "column_name_1028", "column_name_1029", >> >> "column_name_1030", "column_name_1031", "column_name_1032", >> >> "column_name_1033", "column_name_1034", "column_name_1035", >> >> "column_name_1036", "column_name_1037", "column_name_1038", >> >> "column_name_1039", "column_name_1040", "column_name_1041", >> >> "column_name_1042", "column_name_1043", "column_name_1044", >> >> "column_name_1045", "column_name_1046", "column_name_1047", >> >> "column_name_1048", "column_name_1049", "column_name_1050" FROM >> >> "some_table";'); >> >> length >> >> -------- >> >> 1024 >> >> (1 row) >> > >> >> That's it – with default settings, you won't see WHERE clause or >> >> anything else. >> > >> >If that's true, it doesn't offer much of a case for upping the limit >> >on track_activity_query_size. The longest such a query could reasonably >> >get is somewhere near NAMEDATALEN times MaxHeapAttributeNumber, which >> >as it happens is exactly the existing limit on track_activity_query_size. >> > >> >> As a result, many queries exceed track_activity_query_size >> > >> >How? And if they are, why do you care? Such queries sure seem >> >pretty content-free. >> > >> >> I believe the example was just a very simplistic example. ORMs can of >> course generate queries with joins, which can easily exceed the limit >> you mentioned. >> >> >> What is the overhead here except the memory consumption? >> > >> >The time to copy those strings out of shared storage, any time >> >you query pg_stat_activity. >> > >> >> IMO that seems like a reasonable price to pay, if you want to see >> complete queries and bump the track_activity_query_size value up. > >Couldn't be pg_stat_statements (or any similar extension) queryid >exposure in pg_stat_activity [1] also an alternative? You wouldn't >have the parameters but maybe the normalized query would be enough for >most analysis. Now, maybe pg_stat_statements jumble overhead for such >large statements would be even more problematic. > But that would effectively add dependency on pg_stat_statements, no? I don't think we want that. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Dec 23, 2019 at 1:10 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Sun, Dec 22, 2019 at 09:06:41AM +0100, Julien Rouhaud wrote: > >On Sun, Dec 22, 2019 at 1:03 AM Tomas Vondra > ><tomas.vondra@2ndquadrant.com> wrote: > >> > >> On Sat, Dec 21, 2019 at 04:25:05PM -0500, Tom Lane wrote: > >> >> What is the overhead here except the memory consumption? > >> > > >> >The time to copy those strings out of shared storage, any time > >> >you query pg_stat_activity. > >> > > >> > >> IMO that seems like a reasonable price to pay, if you want to see > >> complete queries and bump the track_activity_query_size value up. > > > >Couldn't be pg_stat_statements (or any similar extension) queryid > >exposure in pg_stat_activity [1] also an alternative? You wouldn't > >have the parameters but maybe the normalized query would be enough for > >most analysis. Now, maybe pg_stat_statements jumble overhead for such > >large statements would be even more problematic. > > > > But that would effectively add dependency on pg_stat_statements, no? I > don't think we want that. The queryid field is part of the core, so no dependency is added. You just get a somewhat useless NULL value returned until you load an extension that compute a queryid, which may be pg_stat_statements but any other one will work too.
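For readers unfamiliar with that proposal, the idea is roughly the following
lookup, sketched with hypothetical column names (it assumes the patch in [1]
is applied and that pg_stat_statements, or another queryid-computing
extension, is loaded):

    -- map an in-flight query to its normalized text via the shared queryid
    SELECT a.pid, s.query AS normalized_query
    FROM pg_stat_activity a
    JOIN pg_stat_statements s ON s.queryid = a.query_id;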
On Sat, Dec 21, 2019 at 1:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > What is the overhead here except the memory consumption?
>
> The time to copy those strings out of shared storage, any time
> you query pg_stat_activity.

It seems like you're masterminding this, and I don't know why. It
seems unlikely that anyone will raise the value unless they have very
long queries, and if those people would rather pay the overhead of
copying more data than have their queries truncated, who are we to
argue?

If increasing the maximum imposed some noticeable cost on
installations that kept the default setting, that might well be a good
argument for not raising the maximum. But I don't think that's the
case. I also suspect that the overhead would be pretty darn small even
for people who *do* raise the default setting. It looks to me like
both read and write operations on st_activity_raw stop when they
hit a NUL byte, so any performance costs on short queries must come
from second-order effects (e.g. the main shared memory segment is
bigger, so the OS cache is smaller) which are likely irrelevant in
practice.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Dec 24, 2019 at 12:11 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Dec 21, 2019 at 1:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > What is the overhead here except the memory consumption?
>
> The time to copy those strings out of shared storage, any time
> you query pg_stat_activity.
It seems like you're masterminding this, and I don't know why. It
seems unlikely that anyone will raise the value unless they have very
long queries, and if those people would rather pay the overhead of
copying more data than have their queries truncated, who are we to
argue?
+1
Cheers,
Jeff
On Mon, Dec 23, 2019 at 9:11 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sat, Dec 21, 2019 at 1:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > What is the overhead here except the memory consumption?
> >
> > The time to copy those strings out of shared storage, any time
> > you query pg_stat_activity.
>
> It seems like you're masterminding this, and I don't know why. It
> seems unlikely that anyone will raise the value unless they have very
> long queries, and if those people would rather pay the overhead of
> copying more data than have their queries truncated, who are we to
> argue?
>
> If increasing the maximum imposed some noticeable cost on
> installations that kept the default setting, that might well be a good
> argument for not raising the maximum. But I don't think that's the
> case. I also suspect that the overhead would be pretty darn small even
> for people who *do* raise the default setting. It looks to me like
> both read and write operations on st_activity_raw stop when they
> hit a NUL byte, so any performance costs on short queries must come
> from second-order effects (e.g. the main shared memory segment is
> bigger, so the OS cache is smaller) which are likely irrelevant in
> practice.
>

I'm generally in favor of the idea of allowing people to make
trade-offs that best work for them, but Tom's concern does give me
pause, because it isn't clear to me how people will measure the
overhead of upping this setting. If given the option, people will
almost certainly start raising this limit because the benefits are
obvious ("I can see all my query now!") but so far the explanation of
the downsides has been either hand-wavy or, in the case of your
second paragraph, an argument that they are non-existent, which doesn't
seem right either; so how do we explain to people how to measure the
overhead for them?

Robert Treat
https://xzilla.net
On Mon, Dec 30, 2019 at 12:46:40PM -0800, Robert Treat wrote: >On Mon, Dec 23, 2019 at 9:11 PM Robert Haas <robertmhaas@gmail.com> wrote: >> >> On Sat, Dec 21, 2019 at 1:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> > > What is the overhead here except the memory consumption? >> > >> > The time to copy those strings out of shared storage, any time >> > you query pg_stat_activity. >> >> It seems like you're masterminding this, and I don't know why. It >> seems unlikely that anyone will raise the value unless they have very >> long queries, and if those people would rather pay the overhead of >> copying more data than have their queries truncated, who are we to >> argue? >> >> If increasing the maximum imposed some noticeable cost on >> installations the kept the default setting, that might well be a good >> argument for not raising the maximum. But I don't think that's the >> case. I also suspect that the overhead would be pretty darn small even >> for people who *do* raise the default setting. It looks to me like >> both reading and write operations on st_activity_raw stop when they >> hit a NUL byte, so any performance costs on short queries must come >> from second-order effects (e.g. the main shared memory segment is >> bigger, so the OS cache is smaller) which are likely irrelevant in >> practice. >> > >I'm generally in favor of the idea of allowing people to make >trade-offs that best work for them, but Tom's concern does give me >pause, because it isn't clear to me how people will measure the >overhead of upping this setting. If given the option people will >almost certainly start raising this limit because the benefits are >obvious ("I can see all my query now!") but so far the explanation of >the downsides have been either hand-wavy or, in the case of your >second paragraph, an argument they are non-existent, which doesn't >seem right either; so how do we explain to people how to measure the >overhead for them? > I think there are two questions that we need to answer: 1) Does allowing higher values for the GUC mean overhead for people who don't actually increase it? I don't think so. 2) What's the overhead for increasing the value for short/long queries? My assumption is that for short queries, it's going to be negligible. For longer queries it may be measurable, but I'd expect longer queries to be more expensive in general, so maybe it's still negligible. Of course, the easiest thing we can do is actually measuring this. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
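One crude way to actually measure the read-side cost being discussed here
(a sketch: run it with many sessions holding long queries, once at the
default setting and once at a raised one, and compare the timings;
pg_stat_clear_snapshot() is needed because the view is otherwise cached for
the rest of the transaction):

    \timing on
    DO $$
    BEGIN
        FOR i IN 1..10000 LOOP
            PERFORM pg_stat_clear_snapshot();       -- force a fresh copy from shared memory
            PERFORM count(*) FROM pg_stat_activity;
        END LOOP;
    END $$;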
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> 2) What's the overhead for increasing the value for short/long queries?
> My assumption is that for short queries, it's going to be negligible.
> For longer queries it may be measurable, but I'd expect longer queries
> to be more expensive in general, so maybe it's still negligible.

The thing that has been bothering me is the idea that backends reading
st_activity_raw might palloc the max possible length and/or memcpy the
whole buffer rather than just the valid part.  Having now rooted through
pgstat.c, that appears to be half true: the local allocation made by
pgstat_read_current_status() will be just as large as the shared-memory
arena, but we use strcpy() or equivalent so that each query copy should
stop upon hitting a '\0'.

So the run-time cost should be negligible, but you might be eating a lot
of memory if multiple sessions are inspecting pg_stat_activity and you
cranked the setting up imprudently high.

This doesn't seem like a reason not to allow a higher limit, like a
megabyte or so, but I'm not sure that pushing it to the moon would be
wise.

Meanwhile, I noted what seems like a pretty obvious bug in
pg_stat_get_backend_activity():

    clipped_activity = pgstat_clip_activity(activity);
    ret = cstring_to_text(activity);
    pfree(clipped_activity);

We're not actually applying the intended clip to the returned value,
so that an invalidly-encoded result is possible.  (Of course, since we
also don't seem to be making any attempt to translate from the source
backend's encoding to our own, there's more problems here than just
that.)

			regards, tom lane
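The function with the clipping bug is reachable directly from SQL, for
anyone who wants to poke at that code path (a sketch using the long-standing
per-backend accessor functions):

    SELECT s.backendid, pg_stat_get_backend_activity(s.backendid) AS query
    FROM pg_stat_get_backend_idset() AS s(backendid);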
On Tue, Dec 31, 2019 at 9:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> This doesn't seem like a reason not to allow a higher limit, like a
> megabyte or so, but I'm not sure that pushing it to the moon would be
> wise.
>

Just to get a mental handle on the size of queries we might be
allowing before truncation, I did some very rough arithmetic on what
well known texts might fit in a megabyte. By my calculations you could
fit about four Animal Farms or one Madame Bovary in about a megabyte.
So I think that seems like more than enough :-). (My mind kinda
explodes at the thought of debugging a query as long as Animal Farm.)

cheers

andrew

--
Andrew Dunstan            https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Dec 31, 2019 at 10:16 AM Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > > On Tue, Dec 31, 2019 at 9:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > > This doesn't seem like a reason not to allow a higher limit, like a > > megabyte or so, but I'm not sure that pushing it to the moon would be > > wise. > > > > > Just to get a mental handle on the size of queries we might be > allowing before truncation, I did some very rough arithmetic on what > well known texts might fit in a megabyte. By my calculations you could > fit about four Animal Farms or one Madame Bovary in about a megabyte. > So I think that seems like more than enough :-). (My mind kinda > explores at the thought of debugging a query as long as Animal Farm.) > Turns out my arithmetic was a bit off. Animal Farm is 90 kb, Madame Bovary 678 Kb. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-Dec-31, Andrew Dunstan wrote: > Turns out my arithmetic was a bit off. Animal Farm is 90 kb, Madame > Bovary 678 Kb. Ten animal farms should be enough for everybody. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Dec 31, 2019 at 10:25:26AM +1030, Andrew Dunstan wrote: >On Tue, Dec 31, 2019 at 10:16 AM Andrew Dunstan ><andrew.dunstan@2ndquadrant.com> wrote: >> >> On Tue, Dec 31, 2019 at 9:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >> > >> > This doesn't seem like a reason not to allow a higher limit, like a >> > megabyte or so, but I'm not sure that pushing it to the moon would be >> > wise. >> > >> >> >> Just to get a mental handle on the size of queries we might be >> allowing before truncation, I did some very rough arithmetic on what >> well known texts might fit in a megabyte. By my calculations you could >> fit about four Animal Farms or one Madame Bovary in about a megabyte. >> So I think that seems like more than enough :-). (My mind kinda >> explores at the thought of debugging a query as long as Animal Farm.) >> > > >Turns out my arithmetic was a bit off. Animal Farm is 90 kb, Madame >Bovary 678 Kb. > Not sure, but the Animal Farm text I found is about ~450kB (~120 pages, with ~3kB per page) ... Anyway, the longest queries I personally saw in production were a couple of kB long (~32kB IIRC, it's been a couple years ago). The queries were generated by the application (not quite a traditional ORM, but something like it), with long identifiers (e.g. table names) pretty long due to including a hash (so being 63 characters most of the time). Plus the columns were always fully qualified, with multiple joins etc. Not sure what a good limit would be. Obviously, if we pick value X, the next day someone will come asking for X+1 ... ;-) regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Dec 31, 2019 at 11:16 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Tue, Dec 31, 2019 at 10:25:26AM +1030, Andrew Dunstan wrote: > >On Tue, Dec 31, 2019 at 10:16 AM Andrew Dunstan > ><andrew.dunstan@2ndquadrant.com> wrote: > >> > >> On Tue, Dec 31, 2019 at 9:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> > >> > > >> > This doesn't seem like a reason not to allow a higher limit, like a > >> > megabyte or so, but I'm not sure that pushing it to the moon would be > >> > wise. > >> > > >> > >> > >> Just to get a mental handle on the size of queries we might be > >> allowing before truncation, I did some very rough arithmetic on what > >> well known texts might fit in a megabyte. By my calculations you could > >> fit about four Animal Farms or one Madame Bovary in about a megabyte. > >> So I think that seems like more than enough :-). (My mind kinda > >> explores at the thought of debugging a query as long as Animal Farm.) > >> > > > > > >Turns out my arithmetic was a bit off. Animal Farm is 90 kb, Madame > >Bovary 678 Kb. > > > > Not sure, but the Animal Farm text I found is about ~450kB (~120 pages, > with ~3kB per page) ... My browser has led me astray. http://gutenberg.net.au/ebooks01/0100011.txt is in fact 172618 bytes. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Dec 30, 2019 at 6:46 PM Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
On Tue, Dec 31, 2019 at 9:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> This doesn't seem like a reason not to allow a higher limit, like a
> megabyte or so, but I'm not sure that pushing it to the moon would be
> wise.
>
Just to get a mental handle on the size of queries we might be
allowing before truncation, I did some very rough arithmetic on what
well known texts might fit in a megabyte. By my calculations you could
fit about four Animal Farms or one Madame Bovary in about a megabyte.
So I think that seems like more than enough :-). (My mind kinda
explores at the thought of debugging a query as long as Animal Farm.)
I've seen some pretty big IN-lists and VALUES lists. They aren't so hard to debug once you tune out iterations 3 through N-3 of the list members. Unless they are hard to debug for other reasons. In these cases, it would be helpful, if not just allowing bigger texts in general, to instead "truncate" from the middle, preserving both the beginning and the end of the query text.
Cheers,
Jeff
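Jeff's "truncate from the middle" idea, applied purely client-side as an
illustration (server-side support would of course look different; the
500-character halves are arbitrary):

    SELECT pid,
           CASE WHEN length(query) > 1010
                THEN left(query, 500) || ' ... ' || right(query, 500)
                ELSE query
           END AS clipped
    FROM pg_stat_activity;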
On Thu, Jan 2, 2020 at 3:27 PM Jeff Janes <jeff.janes@gmail.com> wrote:
> I've seen some pretty big IN-lists and VALUES lists. They aren't so hard
> to debug once you tune out iterations 3 through N-3 of the list members.
> Unless they are hard to debug for other reasons. In these cases, it would
> be helpful, if not just allowing bigger texts in general, to instead
> "truncate" from the middle, preserving both the beginning and the end of
> the query text.

I vote for not trying to make this more complicated and just accepting
the original proposal. It's about a factor of ten increase over the
limit we have right now, which doesn't seem like enough to cause any
real breakage, and it should be enough to satisfy the majority of the
people who are unhappy with the current limit, and it is very little
work. If somebody wants to do more work on this later, they can, but I
don't think the OP should be on the hook for that.

At some point, someone (I think Peter Geoghegan) suggested that
pg_stat_statements ought to normalize IN lists down to a single
element. That kind of thing might be another approach to the problem
you mention. It's a bit easier to imagine doing such a thing from a
tool like that than it is to do it for strings in pg_stat_activity,
because pg_stat_statements has got a parse tree to work with, not just
a flat string. And that might work more nicely than just keeping the
beginning and end of the string, but of course it's also more
complicated, so I don't know.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > I vote for not trying to make this more complicated and just accepting > the original proposal. It's about a factor of ten increase over the > limit we have right now, which doesn't seem like enough to cause any > real breakage, and it should be enough to satisfy the majority of the > people who are unhappy with the current limit, and it is very little > work. +1 ... we've surely beaten this topic to death by now. regards, tom lane
On Fri, Jan 03, 2020 at 01:48:56PM -0500, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I vote for not trying to make this more complicated and just accepting >> the original proposal. It's about a factor of ten increase over the >> limit we have right now, which doesn't seem like enough to cause any >> real breakage, and it should be enough to satisfy the majority of the >> people who are unhappy with the current limit, and it is very little >> work. > > +1 ... we've surely beaten this topic to death by now. Sounds like an agreement then. The original patch documents the range in postgresql.conf.sample, which is fine by me as this is done for some parameters, and skips the part about doc/, which also matches with the surrounding effort for other parameters, so the whole looks fine seen from here. Anybody willing to commit that? -- Michael
On Sun, Jan 5, 2020 at 11:01 PM Michael Paquier <michael@paquier.xyz> wrote: > Sounds like an agreement then. The original patch documents the range > in postgresql.conf.sample, which is fine by me as this is done for > some parameters, and skips the part about doc/, which also matches > with the surrounding effort for other parameters, so the whole looks > fine seen from here. Anybody willing to commit that? Done. I didn't commit the postgresql.conf.sample change because: (1) I think Bruce voted against it. (2) It makes the line a little wide, and I'd rather not do that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Jan 07, 2020 at 12:21:26PM -0500, Robert Haas wrote: > Done. I didn't commit the postgresql.conf.sample change because: > > (1) I think Bruce voted against it. > > (2) It makes the line a little wide, and I'd rather not do that. Thanks! -- Michael