Thread: [survey] New "Stable" QueryId based on normalized query text

[survey] New "Stable" QueryId based on normalized query text

From
legrand legrand
Date:
Hello,

There are many projects that use alternate QueryId 
distinct from the famous pg_stat_statements jumbling algorithm.

https://github.com/postgrespro/aqo
query_hash

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Optimize.ViewPlans.html
sql_hash

https://github.com/ossc-db/pg_hint_plan
queryid

Even pg_stat_statement has a normalize function, 
that would answer the current question ...

Here are some *needs* :

needs.1: stable accross different databases,
needs.2: doesn't change after database or object rebuild,
needs.3: search_path / schema independant,
needs.4: pg version independant (as long as possible),
...

and some *normalization rules*:

norm.1: case insensitive
norm.2: blank reduction 
norm.3: hash algoritm ?
norm.4: CURRENT_DATE, CURRENT_TIME, LOCALTIME, LOCALTIMESTAMP not normalized
norm.5: NULL, IS NULL not normalized ?
norm.6: booleans t, f, true, false not normalized
norm.7: order by 1,2 or group by 1,2 should not be normalized
norm.8: pl/pgsql anonymous blocks not normalized
norm.9: comments aware

Do not hesitate to add your thougths
Regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


RE: [survey] New "Stable" QueryId based on normalized query text

From
"Tsunakawa, Takayuki"
Date:
From: legrand legrand [mailto:legrand_legrand@hotmail.com]
> There are many projects that use alternate QueryId
> distinct from the famous pg_stat_statements jumbling algorithm.

I'd like to welcome the standard QueryID that DBAs and extension developers can depend on.  Are you surveying the needs
foryou to develop the QueryID that can meet as many needs as possible?
 


> needs.1: stable accross different databases,

Does this mean different database clusters, not different databases in a single database cluster?


needs.5: minimal overhead to calculate
needs.6: doesn't change across database server restarts
needs.7: same value on both the primary and standby?


> norm.9: comments aware

Is this to distinguish queries that have different comments for optimizer hints?  If yes, I agree.


Regards
Takayuki Tsunakawa




Re: [survey] New "Stable" QueryId based on normalized query text

From
Kyotaro HORIGUCHI
Date:
At Wed, 20 Mar 2019 00:23:30 +0000, "Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> wrote in
<0A3221C70F24FB45833433255569204D1FBE20A4@G01JPEXMBYT05>
> From: legrand legrand [mailto:legrand_legrand@hotmail.com]
> > There are many projects that use alternate QueryId
> > distinct from the famous pg_stat_statements jumbling algorithm.
> 
> I'd like to welcome the standard QueryID that DBAs and extension developers can depend on.  Are you surveying the
needsfor you to develop the QueryID that can meet as many needs as possible?
 
  
+1 to the necessity.

There's a similar thread about adding queryid in pg_stat_activity.

https://www.postgresql.org/message-id/CA%2B8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4%3DD4LuFA%40mail.gmail.com

> > needs.1: stable accross different databases,
> 
> Does this mean different database clusters, not different databases in a single database cluster?

Does this mean you want different QueryID for the same-looking
query for another database in the same cluster?


> needs.5: minimal overhead to calculate
> needs.6: doesn't change across database server restarts
> needs.7: same value on both the primary and standby?
> 
> 
> > norm.9: comments aware
> 
> Is this to distinguish queries that have different comments for optimizer hints?  If yes, I agree.

Or, any means to give an explict query id? I saw many instances
of query that follows a comment describing a query id.

> needs.2: doesn't change after database or object rebuild,
> needs.3: search_path / schema independant,

pg_stat_statements even ignores table/object/column names.

> needs.4: pg version independant (as long as possible),

I don't think this cannot be guaranteed.

> norm.1: case insensitive
> norm.2: blank reduction 
> norm.3: hash algoritm ?
> norm.4: CURRENT_DATE, CURRENT_TIME, LOCALTIME, LOCALTIMESTAMP not normalized
> norm.5: NULL, IS NULL not normalized ?
> norm.6: booleans t, f, true, false not normalized
> norm.7: order by 1,2 or group by 1,2 should not be normalized
> norm.8: pl/pgsql anonymous blocks not normalized

pg_stat_statements can be the base of the discussion on them.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



RE: [survey] New "Stable" QueryId based on normalized query text

From
"Tsunakawa, Takayuki"
Date:
From: Kyotaro HORIGUCHI [mailto:horiguchi.kyotaro@lab.ntt.co.jp]
> > > needs.1: stable accross different databases,
> >
> > Does this mean different database clusters, not different databases in
> a single database cluster?
> 
> Does this mean you want different QueryID for the same-looking
> query for another database in the same cluster?

(I'm afraid this question may be targeted at legland legland, not me...)
I think the same query text can have either same or different QueryID in different databases in the database cluster.
Evenif the QueryID value is the same, we can use DatabaseID to choose desired information.
 


Regards
Takayuki Tsunakawa





RE: [survey] New "Stable" QueryId based on normalized query text

From
legrand legrand
Date:
> From: "Tsunakawa, Takayuki"
>> From: legrand legrand [mailto:legrand_legrand@]
>> There are many projects that use alternate QueryId
>> distinct from the famous pg_stat_statements jumbling algorithm.

>I'd like to welcome the standard QueryID that DBAs and extension developers
can depend on.  
>Are you surveying the needs for you to develop the QueryID that can meet as
many needs as possible?

Yes, I would like first to understand what are the main needs, 
then identify how it would be possible to implement it 
in core, in a new extension or simply with a modified pg_stat_statements.
(I'm just a DBA not a C developer, so it will only be restricted to very
simple enhancements)


>> needs.1: stable accross different databases,

>Does this mean different database clusters, not different databases in a
single database cluster?

Same looking query should give same QueryId on any database (in the same
cluster or in distinct clusters). It can be differentiated with dbid.


>needs.5: minimal overhead to calculate

OK will add it


>needs.6: doesn't change across database server restarts

Really ? does this already occurs ?


>needs.7: same value on both the primary and standby?

I would say yes (I don't use standby) isn't this included into needs.1 ?


>> norm.9: comments aware

>Is this to distinguish queries that have different comments for optimizer
hints?  If yes, I agree.

Yes and others like playing with : 
set ...
select /* test 1*/ ...

set ... 
select /* test 2*/ ...
  




--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: [survey] New "Stable" QueryId based on normalized query text

From
legrand legrand
Date:
> From "Kyotaro HORIGUCHI-2"
>>At Wed, 20 Mar 2019 00:23:30 +0000, "Tsunakawa, Takayuki" 
>>> From: legrand legrand [mailto:legrand_legrand@]
>>> norm.9: comments aware
>> Is this to distinguish queries that have different comments for optimizer
>> hints?  If yes, I agree.

> Or, any means to give an explict query id? I saw many instances
> of query that follows a comment describing a query id.

Yes, in fact didn't thought about different kink of comments, but all of
them


>> needs.3: search_path / schema independant,

>pg_stat_statements even ignores table/object/column names.

there is a very interesting thread about that in "pg_stat_statements and non
default search_path"
https://www.postgresql.org/message-id/8f54c609-17c6-945b-fe13-8b07c0866420@dalibo.com

expecting distinct QueryIds when using distinct schemas ...
maybe that It should be switched to "Schema dependant"



>> needs.4: pg version independant (as long as possible),

>I don't think this cannot be guaranteed.

maybe using a QueryId versioning GUC 
 

>> norm.1: case insensitive
>> norm.2: blank reduction 
>> norm.3: hash algoritm ?
>> norm.4: CURRENT_DATE, CURRENT_TIME, LOCALTIME, LOCALTIMESTAMP not
>> normalized
>> norm.5: NULL, IS NULL not normalized ?
>> norm.6: booleans t, f, true, false not normalized
>> norm.7: order by 1,2 or group by 1,2 should not be normalized
>> norm.8: pl/pgsql anonymous blocks not normalized

> pg_stat_statements can be the base of the discussion on them.

OK

Regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: [survey] New "Stable" QueryId based on normalized query text

From
Julien Rouhaud
Date:
On Wed, Mar 20, 2019 at 8:39 PM legrand legrand
<legrand_legrand@hotmail.com> wrote:
>
> Yes, I would like first to understand what are the main needs,

I don't really see one implementation that suits every need, as
probably not everyone will agree on using relation name vs fully
qualified relation name for starter.  The idea to take into account or
normalise comments will also probably require a lot of argumentation
to reach a consensus.

Also, most of what's listed here would require catcache lookup for
every objects impacted in a query, at every execution.  That would be
*super* expensive (at least for OLTP workload).  As far as the need is
to gather statistics like pg_stat_statements and similar extensions
are doing, current queryid semantics and underlying limitations is not
enough of a problem to justify paying that price IMHO.  Or do you have
other needs and/or problems that really can't be solved with current
implementation?

In other words, my first need would be to be able to deactivate
everything that would make queryid computation significantly more
expensive than it's today, and/or to be able to replace it with
third-party implementation.

> >> needs.1: stable accross different databases,
> [...]
>
> >needs.7: same value on both the primary and standby?
>
> I would say yes (I don't use standby) isn't this included into needs.1 ?

Physical replication servers have identical oids, so identical
queryid.  That's obviously not true for logical replication.


Re: [survey] New "Stable" QueryId based on normalized query text

From
legrand legrand
Date:
Julien Rouhaud wrote
> On Wed, Mar 20, 2019 at 8:39 PM legrand legrand
> <

> legrand_legrand@

> > wrote:
>>
>> Yes, I would like first to understand what are the main needs,
> 
> I don't really see one implementation that suits every need, as
> probably not everyone will agree on using relation name vs fully
> qualified relation name for starter.  The idea to take into account or
> normalise comments will also probably require a lot of argumentation
> to reach a consensus.
> 
> Also, most of what's listed here would require catcache lookup for
> every objects impacted in a query, at every execution.  That would be
> *super* expensive (at least for OLTP workload).  As far as the need is
> to gather statistics like pg_stat_statements and similar extensions
> are doing, current queryid semantics and underlying limitations is not
> enough of a problem to justify paying that price IMHO.  Or do you have
> other needs and/or problems that really can't be solved with current
> implementation?
> 
> In other words, my first need would be to be able to deactivate
> everything that would make queryid computation significantly more
> expensive than it's today, and/or to be able to replace it with
> third-party implementation.
> 
>> >> needs.1: stable accross different databases,
>> [...]
>>
>> >needs.7: same value on both the primary and standby?
>>
>> I would say yes (I don't use standby) isn't this included into needs.1 ?
> 
> Physical replication servers have identical oids, so identical
> queryid.  That's obviously not true for logical replication.

On my personal point of view, I need to get the same Queryid between (OLAP)
environments
to be able to compare Production, Pre-production, Qualif performances 
(and I don't need Fully qualified relation names). Today to do that,
I'm using a custom extension computing the QueryId based on the normalized
Query 
text. 

This way to do, seems very popular and maybe including it in core (as a
dedicated extension) 
or proposing an alternate queryid (based on relation name) in PGSS (Guc
enabled) 
would fullfill 95% of the needs ...

I agree with you on the last point: being able to replace actual QueryId
with third-party 
implementation IS the first need.

Regards
PAscal




--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: [survey] New "Stable" QueryId based on normalized query text

From
legrand legrand
Date:
maybe this patch (with a GUC)
https://www.postgresql.org/message-id/55E51C48.1060102@uptime.jp
would be enough for thoses actually using a text normalization function.




--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: [survey] New "Stable" QueryId based on normalized query text

From
Julien Rouhaud
Date:
On Wed, Mar 20, 2019 at 10:18 PM legrand legrand
<legrand_legrand@hotmail.com> wrote:
>
> On my personal point of view, I need to get the same Queryid between (OLAP)
> environments
> to be able to compare Production, Pre-production, Qualif performances
> (and I don't need Fully qualified relation names). Today to do that,
> I'm using a custom extension computing the QueryId based on the normalized
> Query
> text.

IIUC, your need is to compare pgss (maybe other extensions) counters
from different primary servers, for queries generated by the same
application(s).  A naive workaround like exporting each environment
counters (COPY SELECT 'production', * FROM pg_stat_statements TO
'...'), importing all of them on a server and then comparing
everything using the query text (which should be the same if the
application is the same) instead of queryid wouldn't work?  Or even
using foreign tables if exporting data is too troublesome. That's
clearly not ideal, but that's an easy workaround which doesn't add any
performance hit at runtime.


Re: [survey] New "Stable" QueryId based on normalized query text

From
Julien Rouhaud
Date:
On Wed, Mar 20, 2019 at 10:30 PM legrand legrand
<legrand_legrand@hotmail.com> wrote:
>
> maybe this patch (with a GUC)
> https://www.postgresql.org/message-id/55E51C48.1060102@uptime.jp
> would be enough for thoses actually using a text normalization function.

The rest of thread raise quite a lot of concerns about the semantics,
the cost and the correctness of this patch.  After 5 minutes checking,
it wouldn't suits your need if you use custom functions, custom types,
custom operators (say using intarray extension) or if your tables
don't have columns in the same order in every environment.  And there
are probably other caveats that I didn't see;


Re: [survey] New "Stable" QueryId based on normalized query text

From
legrand legrand
Date:
Julien Rouhaud wrote
> On Wed, Mar 20, 2019 at 10:18 PM legrand legrand
> <

> legrand_legrand@

> > wrote:
>>
>> On my personal point of view, I need to get the same Queryid between
>> (OLAP)
>> environments
>> to be able to compare Production, Pre-production, Qualif performances
>> (and I don't need Fully qualified relation names). Today to do that,
>> I'm using a custom extension computing the QueryId based on the
>> normalized
>> Query
>> text.
> 
> IIUC, your need is to compare pgss (maybe other extensions) counters
> from different primary servers, for queries generated by the same
> application(s).  A naive workaround like exporting each environment
> counters (COPY SELECT 'production', * FROM pg_stat_statements TO
> '...'), importing all of them on a server and then comparing
> everything using the query text (which should be the same if the
> application is the same) instead of queryid wouldn't work?  Or even
> using foreign tables if exporting data is too troublesome. That's
> clearly not ideal, but that's an easy workaround which doesn't add any
> performance hit at runtime.

Thank you Julien for the workaround,
It is not easy to build "cross tables" in excel to join metrics per query
text ...
and I'm not ready to build a MD5(query) as many query could lead to the same
QueryId
I've been using SQL_IDs for ten years, and I have some (who say old) habits
:^)

Regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: [survey] New "Stable" QueryId based on normalized query text

From
legrand legrand
Date:
Julien Rouhaud wrote
> On Wed, Mar 20, 2019 at 10:30 PM legrand legrand
> <

> legrand_legrand@

> > wrote:
>>
>> maybe this patch (with a GUC)
>> https://www.postgresql.org/message-id/

> 55E51C48.1060102@

>> would be enough for thoses actually using a text normalization function.
> 
> The rest of thread raise quite a lot of concerns about the semantics,
> the cost and the correctness of this patch.  After 5 minutes checking,
> it wouldn't suits your need if you use custom functions, custom types,
> custom operators (say using intarray extension) or if your tables
> don't have columns in the same order in every environment.  And there
> are probably other caveats that I didn't see;

Yes I know,
It would have to be extended at less at functions, types, operators ...
names
and a guc pg_stat_statements.queryid_based= 'names' (default being 'oids')

and with a second guc ('fullyqualifed' ?)
sould include tables, functions, types, operators ... namespaces

let "users" specify their needs, we will see ;o)



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: [survey] New "Stable" QueryId based on normalized query text

From
Julien Rouhaud
Date:
On Wed, Mar 20, 2019 at 11:10 PM legrand legrand
<legrand_legrand@hotmail.com> wrote:
>
> Thank you Julien for the workaround,
> It is not easy to build "cross tables" in excel to join metrics per query
> text ...

then keep only one queryid over all environments, that's easy enough in SQL:

SELECT min(queryid) OVER (partition by query ORDER BY environment),
... FROM all_pg_stat_statements

if you have your environment named like 0_production,
1_preproduction... you'll get the queryid from production.  Once
again, that's not ideal but it's easy to deal with it when consuming
the data.

> and I'm not ready to build a MD5(query) as many query could lead to the same
> QueryId

I'd be really surprised if you see a single collision in your whole
life, whatever pg_stat_statements.max you're using.  I'm also pretty
sure that the collision risk is technically higher with an 8B queryId
field rather than a 16B md5, but maybe I'm wrong.


Re: [survey] New "Stable" QueryId based on normalized query text

From
Bruce Momjian
Date:
On Wed, Mar 20, 2019 at 03:19:58PM -0700, legrand legrand wrote:
> > The rest of thread raise quite a lot of concerns about the semantics,
> > the cost and the correctness of this patch.  After 5 minutes checking,
> > it wouldn't suits your need if you use custom functions, custom types,
> > custom operators (say using intarray extension) or if your tables
> > don't have columns in the same order in every environment.  And there
> > are probably other caveats that I didn't see;
> 
> Yes I know,
> It would have to be extended at less at functions, types, operators ...
> names
> and a guc pg_stat_statements.queryid_based= 'names' (default being 'oids')
> 
> and with a second guc ('fullyqualifed' ?)
> sould include tables, functions, types, operators ... namespaces
> 
> let "users" specify their needs, we will see ;o)

Why can't we just explose the hash computation as an SQL function and
let people call it with pg_stat_activity.query or wherever they want the
value?  We can install multiple functions if needed.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [survey] New "Stable" QueryId based on normalized query text

From
Julien Rouhaud
Date:
On Tue, Apr 9, 2019 at 11:26 PM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Wed, Mar 20, 2019 at 03:19:58PM -0700, legrand legrand wrote:
> > > The rest of thread raise quite a lot of concerns about the semantics,
> > > the cost and the correctness of this patch.  After 5 minutes checking,
> > > it wouldn't suits your need if you use custom functions, custom types,
> > > custom operators (say using intarray extension) or if your tables
> > > don't have columns in the same order in every environment.  And there
> > > are probably other caveats that I didn't see;
> >
> > Yes I know,
> > It would have to be extended at less at functions, types, operators ...
> > names
> > and a guc pg_stat_statements.queryid_based= 'names' (default being 'oids')
> >
> > and with a second guc ('fullyqualifed' ?)
> > sould include tables, functions, types, operators ... namespaces
> >
> > let "users" specify their needs, we will see ;o)
>
> Why can't we just explose the hash computation as an SQL function and
> let people call it with pg_stat_activity.query or wherever they want the
> value?  We can install multiple functions if needed.

It'd be very nice to exposing the queryid computation at SQL level.
However it would allow to compute only the top-level queryid from
pg_stat_activity.  For monitoring and performance purpose, people
would probably want to see the queryid of the underlying query
actually running I think.



Re: [survey] New "Stable" QueryId based on normalized query text

From
Jim Finnerty
Date:
I missed this thread.  I'd be happy to post the code for what we use as the
stable query identifier, but we could definitely come up with a more
efficient algorithm if we're willing to assume that the sql statements are
the same if and only if the parse tree structure is the same.

Currently what we do for the sql hash is to simply replace all the literals
and then hash the resulting SQL string, because for our use case we wanted
to be insensitive to the even the structure of the parse tree from one
release to the next.  That may be too conservative for other use cases.  If
it's ok to assume that the structure of the Query tree doesn't change, then
you could define a stable identifier for each node type, ignore literal
constants, and hash fully-qualified object names instead of OIDs.  That
should be pretty fast.

We also compute a plan hash that converts Plan tree node id's into stable
identifiers, and computes a cheap hash function over all nodes in the plan. 
This is fast and efficient.  It's also pretty straightforward to convert
node id's to stable identifiers.

A complication that we recently had to deal with was hashing and normalizing
the text of queries inside pl/pgsql functions, where variables are converted
to parameter markers.  In that case the sql text is transformed to contain
both parameter markers and literal replacement markers before computing the
sql hash.



-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: [survey] New "Stable" QueryId based on normalized query text

From
Julien Rouhaud
Date:
On Sat, Aug 10, 2019 at 3:27 AM Jim Finnerty <jfinnert@amazon.com> wrote:
>
> I missed this thread.  I'd be happy to post the code for what we use as the
> stable query identifier, but we could definitely come up with a more
> efficient algorithm if we're willing to assume that the sql statements are
> the same if and only if the parse tree structure is the same.
>
> Currently what we do for the sql hash is to simply replace all the literals
> and then hash the resulting SQL string

Isn't that what pg_store_plan is already doing?  Except that it
removes extraneous whitespaces and put identifiers in uppercase so
that you get a reasonable query identifier.

> you could define a stable identifier for each node type, ignore literal
> constants, and hash fully-qualified object names instead of OIDs.  That
> should be pretty fast.

This has been discussed already, and resolving all object names and
qualifier names will add a dramatic overhead for many workloads.



Re: [survey] New "Stable" QueryId based on normalized query text

From
legrand legrand
Date:
Hi Jim,

Its never too later, as nothing has been concluded about that survey ;o)

For information, I thought It would be possible to get a more stable
QueryId,
by hashing relation name or fully qualified names.

With the support of Julien Rouhaud, I tested with this kind of code:

     case RTE_RELATION:
            if (pgss_queryid_oid)
                {
                    APP_JUMB(rte->relid);
                }
                else
                {
                    rel = RelationIdGetRelation(rte->relid);
                    APP_JUMB_STRING(RelationGetRelationName(rel));
                    APP_JUMB_STRING(get_namespace_name(get_rel_namespace(rte->relid)));
                    RelationClose(rel);
                {
                
thinking that 3 hash options would be interesting in pgss:
1- actual OID
2- relation names only (for databases WITHOUT distinct schemas contaning
same tables)
3- fully qualified names schema.relname (for databases WITH distinct schemas
contaning same tables)

but performances where quite bad (it was a few month ago, but I remenber
about a 1-5% decrease).
I also remenber that's this was not portable between distinct pg versions
11/12
and also not sure it was stable between windows / linux ports ...

So I stopped here ... Maybe its time to test deeper this alternative 
(to get fully qualified names hashes in One call) knowing that such
transformations 
will have to be done for all objects types (not only relations) ?

I'm ready to continue testing as it seems the less impacting solution to
keep actual pgss ...

If this doesn't work, then trying with a normalized query text (associated
with search_path) would be the 
other alternative, but impacts on actual pgss would be higher ... 

Regards
PAscal





--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: [survey] New "Stable" QueryId based on normalized query text

From
Jim Finnerty
Date:
If hashing names instead of using OIDs is too expensive for some workload,
then that workload would need to be able to turn statement hashing off.  So
it needs to be optional, just like queryId is optionally computed today. 
For many cases the extra overhead of hashing object names is small compared
to optimization time plus execution time.



-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: [survey] New "Stable" QueryId based on normalized query text

From
Evgeniy Efimkin
Date:
Hi!
What about adding new column in pg_stat_statements e.g. sql_id it's hash from normalized query. Аnd add function which
getthat hash (using raw_parser, raw_expression_tree_walker) for any query
 
`
postgres=# select get_queryid('select 1');
 get_queryid    
-------------
 680388963
(1 row)
`
that function can be used on pg_stat_activity(query) for join pg_stat_statements if it need.

12.08.2019, 14:51, "legrand legrand" <legrand_legrand@hotmail.com>:
> Hi Jim,
>
> Its never too later, as nothing has been concluded about that survey ;o)
>
> For information, I thought It would be possible to get a more stable
> QueryId,
> by hashing relation name or fully qualified names.
>
> With the support of Julien Rouhaud, I tested with this kind of code:
>
>          case RTE_RELATION:
>                         if (pgss_queryid_oid)
>                                 {
>                                         APP_JUMB(rte->relid);
>                                 }
>                                 else
>                                 {
>                                         rel = RelationIdGetRelation(rte->relid);
>                                         APP_JUMB_STRING(RelationGetRelationName(rel));
>                                         APP_JUMB_STRING(get_namespace_name(get_rel_namespace(rte->relid)));
>                                         RelationClose(rel);
>                                 {
>
> thinking that 3 hash options would be interesting in pgss:
> 1- actual OID
> 2- relation names only (for databases WITHOUT distinct schemas contaning
> same tables)
> 3- fully qualified names schema.relname (for databases WITH distinct schemas
> contaning same tables)
>
> but performances where quite bad (it was a few month ago, but I remenber
> about a 1-5% decrease).
> I also remenber that's this was not portable between distinct pg versions
> 11/12
> and also not sure it was stable between windows / linux ports ...
>
> So I stopped here ... Maybe its time to test deeper this alternative
> (to get fully qualified names hashes in One call) knowing that such
> transformations
> will have to be done for all objects types (not only relations) ?
>
> I'm ready to continue testing as it seems the less impacting solution to
> keep actual pgss ...
>
> If this doesn't work, then trying with a normalized query text (associated
> with search_path) would be the
> other alternative, but impacts on actual pgss would be higher ...
>
> Regards
> PAscal
>
> --
> Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

-------- 
Efimkin Evgeny





Re: [survey] New "Stable" QueryId based on normalized query text

From
Julien Rouhaud
Date:
On Mon, Aug 12, 2019 at 2:40 PM Jim Finnerty <jfinnert@amazon.com> wrote:
>
> If hashing names instead of using OIDs is too expensive for some workload,
> then that workload would need to be able to turn statement hashing off.  So
> it needs to be optional, just like queryId is optionally computed today.
> For many cases the extra overhead of hashing object names is small compared
> to optimization time plus execution time.

Are you suggesting a fallback to oid based queryid or to entirely
disable queryid generation?

How would that work with pg_stat_statements or similar extension?



Re: [survey] New "Stable" QueryId based on normalized query text

From
Julien Rouhaud
Date:
On Mon, Aug 12, 2019 at 2:52 PM Evgeniy Efimkin <efimkin@yandex-team.ru> wrote:
>
> Hi!
> What about adding new column in pg_stat_statements e.g. sql_id it's hash from normalized query. Аnd add function
whichget that hash (using raw_parser, raw_expression_tree_walker) for any query 
> `
> postgres=# select get_queryid('select 1');
>  get_queryid
> -------------
>  680388963
> (1 row)

One problem with pg_stat_statement's normalized query is that it's not
stable, it's storing the normalized version of the first query string
passed when an entry is created.  So you could have different strings
depending on whether the query was fully qualified or relying on
search path for instance.

Exposing the queryid computation at SQL level has already been
proposed, and FWIW I'm all for it.



Re: [survey] New "Stable" QueryId based on normalized query text

From
Evgeniy Efimkin
Date:

> One problem with pg_stat_statement's normalized query is that it's not
> stable, it's storing the normalized version of the first query string
> passed when an entry is created. So you could have different strings
> depending on whether the query was fully qualified or relying on
> search path for instance.
I think normalized query stored in pg_stat_statement it's not very important.
it might look something like that
`
         query         | calls |  queryid   |  sql_id
-----------------------+-------+------------+------------
 Select * from   t     |     4 |  762359559 |  680388963
 select * from t       |     7 | 3438533065 |  680388963
 select * from test2.t |     1 |  230362373 |  680388963
`
we can cut schema name in sql normalization 
algorithm 
-------- 
Efimkin Evgeny




Re: [survey] New "Stable" QueryId based on normalized query text

From
Julien Rouhaud
Date:
On Mon, Aug 12, 2019 at 4:01 PM Evgeniy Efimkin <efimkin@yandex-team.ru> wrote:
>
> > One problem with pg_stat_statement's normalized query is that it's not
> > stable, it's storing the normalized version of the first query string
> > passed when an entry is created. So you could have different strings
> > depending on whether the query was fully qualified or relying on
> > search path for instance.
> I think normalized query stored in pg_stat_statement it's not very important.
> it might look something like that
> `
>          query         | calls |  queryid   |  sql_id
> -----------------------+-------+------------+------------
>  Select * from   t     |     4 |  762359559 |  680388963
>  select * from t       |     7 | 3438533065 |  680388963
>  select * from test2.t |     1 |  230362373 |  680388963
> `
> we can cut schema name in sql normalization
> algorithm

Not only schema name but all kind of qualification and indeed extra
whitespaces. Things get harder for other difference that aren't
meaningful (LIKE vs ~~, IN vs = ANY...).  That would also imply that
everyone wants to ignore schemas in query normalization, I'm not sure
how realistic that is.



Re: [survey] New "Stable" QueryId based on normalized query text

From
legrand legrand
Date:
my understanding is

* pg_stat_statements.track = 'none' or 'top' (default) or 'all' 
    to make queryId optionally computed

* a new GUC: pg_stat_statements.queryid_based = 'oids' (default) or 'names'
or 'fullnames'
    to choose the queryid computation algorithm

am I rigth ?




--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html