Thread: pg_stat_statements and non default search_path

pg_stat_statements and non default search_path

From
Julien Rouhaud
Date:
Hello,

I've faced multiple times a painful limitation with pg_stat_statements.
Queries are normalized based on the textual SQL statements, and the
queryid is computed using objects' oids. So for instance with different
search_path settings, we can end up with the same normalized query but
different queryids, and there's no way to know what are the actual
relations each query is using. It also means that it can be almost
impossible to use the normalized query to replay a query.

I see two ways of fixing this. First one would be to store normalized
queries while fully qualifiying the relation's names. After a quick look
this can't be done without storing at least a token location in
RangeTblEntry nodes, which sounds like a bad idea.

The other way would be to store the value of search_path with each pgss
entry (either in shared_memory or in the external file).

Is it something that we should fix, and if yes is any of this
acceptable, or does someone see another way to solve this problem?

Regards.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org



Re: pg_stat_statements and non default search_path

From
Jim Nasby
Date:
On 10/7/16 4:39 AM, Julien Rouhaud wrote:
> I see two ways of fixing this. First one would be to store normalized
> queries while fully qualifiying the relation's names. After a quick look
> this can't be done without storing at least a token location in
> RangeTblEntry nodes, which sounds like a bad idea.
>
> The other way would be to store the value of search_path with each pgss
> entry (either in shared_memory or in the external file).
>
> Is it something that we should fix, and if yes is any of this
> acceptable, or does someone see another way to solve this problem?

Would another option be to temporarily set search_path to '' when 
getting the query text? ISTM that would be the best option.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: pg_stat_statements and non default search_path

From
Julien Rouhaud
Date:
On 10/10/2016 05:00, Jim Nasby wrote:
> On 10/7/16 4:39 AM, Julien Rouhaud wrote:
>> I see two ways of fixing this. First one would be to store normalized
>> queries while fully qualifiying the relation's names. After a quick look
>> this can't be done without storing at least a token location in
>> RangeTblEntry nodes, which sounds like a bad idea.
>>
>> The other way would be to store the value of search_path with each pgss
>> entry (either in shared_memory or in the external file).
>>
>> Is it something that we should fix, and if yes is any of this
>> acceptable, or does someone see another way to solve this problem?
> 
> Would another option be to temporarily set search_path to '' when
> getting the query text? ISTM that would be the best option.

I don't think that possible.  The query text used is raw query text
provided by the post_parse_analyse hook (ParseState->p_sourcetext).

Unless you mean deparsing the Query instead of using raw source text?  I
think that would solve this issue (and also the other issue when
multiple queries are submitted at once, you get the normalized version
of all the queries multiple time), but AFAIK ruleutils.c doesn't expose
enough to do it (like get_query_def()), and exposing it isn't an option.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org



Re: pg_stat_statements and non default search_path

From
Jim Nasby
Date:
On 10/10/16 12:58 AM, Julien Rouhaud wrote:
>> > Would another option be to temporarily set search_path to '' when
>> > getting the query text? ISTM that would be the best option.
> I don't think that possible.  The query text used is raw query text
> provided by the post_parse_analyse hook (ParseState->p_sourcetext).

Oh, hrm. :/

> Unless you mean deparsing the Query instead of using raw source text?  I
> think that would solve this issue (and also the other issue when
> multiple queries are submitted at once, you get the normalized version
> of all the queries multiple time), but AFAIK ruleutils.c doesn't expose
> enough to do it (like get_query_def()), and exposing it isn't an option.

Why couldn't we expose it?

BTW, after thinking about it some more, I don't see how storing the 
search_path would help at all... it's not like you can do anything with 
it unless you have a huge chunk of the parser available.

BTW, this issue isn't limited to just tables; it affects almost every 
object identifier you can put in a query, like functions, operators, 
types, etc.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: pg_stat_statements and non default search_path

From
Julien Rouhaud
Date:
On 16/10/2016 02:38, Jim Nasby wrote:
> On 10/10/16 12:58 AM, Julien Rouhaud wrote:
>> Unless you mean deparsing the Query instead of using raw source text?  I
>> think that would solve this issue (and also the other issue when
>> multiple queries are submitted at once, you get the normalized version
>> of all the queries multiple time), but AFAIK ruleutils.c doesn't expose
>> enough to do it (like get_query_def()), and exposing it isn't an option.
> 
> Why couldn't we expose it?
> 

I'm not really sure.  The only thread I could find on this topic didn't
get any answer:
https://www.postgresql.org/message-id/flat/5E5BB0F8-B05E-47C2-8706-F85E70A6DF7D@citusdata.com

> BTW, after thinking about it some more, I don't see how storing the
> search_path would help at all... it's not like you can do anything with
> it unless you have a huge chunk of the parser available.
> 

My use case is not really to know the fully qualified name of each
identifier, but being able to optimize a problematic query found with
pg_stat_statements.  I can already "unjumble" it automatically, but the
search_path is needed to be able to get an explain or just execute the
query.

Of course, I'd prefer that pgss can generate a fully qualified
normalized query instead of storing the search_path.

> BTW, this issue isn't limited to just tables; it affects almost every
> object identifier you can put in a query, like functions, operators,
> types, etc.

Yes, indeed.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org



Re: pg_stat_statements and non default search_path

From
Lukas Fittl
Date:
On Sat, Oct 15, 2016 at 11:29 PM, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
> BTW, after thinking about it some more, I don't see how storing the
> search_path would help at all... it's not like you can do anything with
> it unless you have a huge chunk of the parser available.
>

My use case is not really to know the fully qualified name of each
identifier, but being able to optimize a problematic query found with
pg_stat_statements.  I can already "unjumble" it automatically, but the
search_path is needed to be able to get an explain or just execute the
query.

I'd also find having the search_path available to be a helpful benefit - I've run into the same problem as Julien where two identical queries couldn't be identified correctly because of the missing search_path.

In particular this situation might happen if you shard different tenants using schemas (one schema for each tenant), and use the search_path to set the current tenant.

In my own setup I combine pg_stat_statements with a slightly hacked up copy of the Postgres parser, so having the correct search_path + query text available is enough to find the objects a query references (most of the time, there are a few other edge cases).

My assumption thus far has been that adding another field like this might not be considered because of performance considerations.

Can somebody chime in if it would be feasible to store this in the out-of-band query text file, and whether a patch for this would be considered acceptable?

Best,
Lukas

--
Lukas Fittl

Skype: lfittl
Phone: +1 415 321 0630

Re: pg_stat_statements and non default search_path

From
Julien Rouhaud
Date:
On 16/10/2016 10:47, Lukas Fittl wrote:
> Can somebody chime in if it would be feasible to store this in the
> out-of-band query text file, and whether a patch for this would be
> considered acceptable?

FWIW I already have a quick and dirty patch for this, I don't think
there's any major difficulty here.  I still hope we can find a way to
store the fully qualified objects name in the normalized query instead.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org



Re: pg_stat_statements and non default search_path

From
Craig Ringer
Date:
<p dir="ltr"><p dir="ltr">On 16 Oct. 2016 14:31, "Julien Rouhaud" <<a
href="mailto:julien.rouhaud@dalibo.com">julien.rouhaud@dalibo.com</a>>wrote:<br /> ><br /> > On 16/10/2016
02:38,Jim Nasby wrote:<br /> > > On 10/10/16 12:58 AM, Julien Rouhaud wrote:<br /> > >> Unless you mean
deparsingthe Query instead of using raw source text?  I<br /> > >> think that would solve this issue (and also
theother issue when<br /> > >> multiple queries are submitted at once, you get the normalized version<br />
>>> of all the queries multiple time), but AFAIK ruleutils.c doesn't expose<br /> > >> enough to do
it(like get_query_def()), and exposing it isn't an option.<br /> > ><br /> > > Why couldn't we expose it?<p
dir="ltr">I'minterested in that too, for the purpose of passing the correct substring of a multi-statement to
ProcessUtility_hook.Perhaps by using parser position data to generate a start pointer and length within the querystring
alreadypassed.<p dir="ltr">For the problem in this thread one could always implement plan-to-query transforms
(deparse).But you'd get back something pretty far from the input query after function inlining, subquery flattering,
conditionpushdown/pullup, join elimination, etc etc.<p dir="ltr">I think pg_stat_plans attempted a middle ground here,
capturingexplain style plans rather than trying to report the SQL.<p dir="ltr">I do think there is utility to storing
search_pathand optionally specialising stats according to it. The most obvious case is multi-tenant and schema-sharded
modelswhere per-schema stats would be very handy. You'd want to use the active search path though so you eliminate
referencesto nonexistent schemas. Otherwise $user would screw things up. 

Re: pg_stat_statements and non default search_path

From
Julien Rouhaud
Date:
On 16/10/2016 11:21, Craig Ringer wrote:
> On 16 Oct. 2016 14:31, Julien Rouhaud wrote:
>>
>> On 16/10/2016 02:38, Jim Nasby wrote:
>> > On 10/10/16 12:58 AM, Julien Rouhaud wrote:
>> >> Unless you mean deparsing the Query instead of using raw source
> text?  I
>> >> think that would solve this issue (and also the other issue when
>> >> multiple queries are submitted at once, you get the normalized version
>> >> of all the queries multiple time), but AFAIK ruleutils.c doesn't expose
>> >> enough to do it (like get_query_def()), and exposing it isn't an
> option.
>> >
>> > Why couldn't we expose it?
> 
> I'm interested in that too, for the purpose of passing the correct
> substring of a multi-statement to ProcessUtility_hook.

I have another use case for this: being able to easily print what is the
query (or queries) that are actually executed (ie. what's get out of the
rewriter).  That's useful when trying to figure out / optimize nested
views hell or abuse of rules.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org