Thread: New Feature Request

New Feature Request

From
Bert Scalzo
Date:
I am reposting this from a few months back (see below). I am not trying to be a pest, just very motivated. I really think this feature has merit, and if not generally worthwhile, I'd be willing to pay someone to code it for me as I don't have strong enough C skills to modify the PostgreSQL code myself. So anyone who might have such skills that would be interested, please contact me: bertscalzo2@gmail.com.

MySQL has a really useful feature they call the query rewrite cache. The optimizer checks incoming queries to see if a known better rewrite has been placed within the query rewrite cache table. If one is found, the rewrite replaces the incoming query before sending it to the execution engine. This capability allows for one to fix poorly performing queries in 3rd party application code that cannot be modified. For example, suppose a 3rd party application contains the following inefficient query: SELECT COUNT(*) FROM table WHERE SUBSTRING(column,1,3) = 'ABC'. One can place the following rewrite in the query rewrite cache: SELECT COUNT(*) FROM table WHERE column LIKE 'ABC%'. The original query cannot use an index while the rewrite can. Since it's a 3rd party application there is really no other way to make such an improvement. The existing rewrite rules in PostgreSQL are too narrowly defined to permit such a substitution as the incoming query could involve many tables, so what's needed is a general "if input SQL string matches X then replace it with Y". This check could be placed at the beginning of the parser.c code. Suggest that the matching code should first check the string lengths and hash values before checking entire string match for efficiency.  

Re: New Feature Request

From
Bruce Momjian
Date:
On Mon, May 25, 2020 at 07:53:40PM -0500, Bert Scalzo wrote:
> I am reposting this from a few months back (see below). I am not trying to be a
> pest, just very motivated. I really think this feature has merit, and if not
> generally worthwhile, I'd be willing to pay someone to code it for me as I
> don't have strong enough C skills to modify the PostgreSQL code myself. So
> anyone who might have such skills that would be interested, please contact me:
> bertscalzo2@gmail.com.

I think your best bet is to try getting someone to write a hook
that will do the replacement so that you don't need to modify too much
of the Postgres core code.  You will need to have the hook updated for
new versions of Postgres, which adds to the complexity.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: New Feature Request

From
Michael Paquier
Date:
On Mon, May 25, 2020 at 09:21:26PM -0400, Bruce Momjian wrote:
> I think your best bet is to try getting someone to write a hook
> that will do the replacement so that you don't need to modify too much
> of the Postgres core code.  You will need to have the hook updated for
> new versions of Postgres, which adds to the complexity.

Couldn't one just use the existing planner hook for that?  The
post-parse analysis hook is run before a query rewrite, but the
planner hook could manipulate a rewrite before planning the query.
--
Michael

Attachment

Re: New Feature Request

From
Tomas Vondra
Date:
On Mon, May 25, 2020 at 09:21:26PM -0400, Bruce Momjian wrote:
>On Mon, May 25, 2020 at 07:53:40PM -0500, Bert Scalzo wrote:
>> I am reposting this from a few months back (see below). I am not trying to be a
>> pest, just very motivated. I really think this feature has merit, and if not
>> generally worthwhile, I'd be willing to pay someone to code it for me as I
>> don't have strong enough C skills to modify the PostgreSQL code myself. So
>> anyone who might have such skills that would be interested, please contact me:
>> bertscalzo2@gmail.com.
>
>I think your best bet is to try getting someone to write a hook
>that will do the replacement so that you don't need to modify too much
>of the Postgres core code.  You will need to have the hook updated for
>new versions of Postgres, which adds to the complexity.
>

I don't think we have a hook to tweak the incoming SQL, though. We only
have post_parse_analyze_hook, i.e. post-parse, at which point we can't
just rewrite the SQL directly. So I guess we'd need new hook.

I do however wonder if an earlier hook is a good idea at all - matching
the SQL directly seems like a rather naive approach that'll break easily
due to formatting, upper/lower-case, subqueries, and many other things.
 From this standpoint it seems actually better to inspect and tweak the
parse-analyze result. Not sure how to define the rules easily, though.

As for the complexity, I think hooks are fairly low-maintenance in
practice, we tend not to modify them very often, and when we do it's
usually just adding an argument etc.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: New Feature Request

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Mon, May 25, 2020 at 09:21:26PM -0400, Bruce Momjian wrote:
>> I think your best bet is to try getting someone to write a hook
>> that will do the replacement so that you don't need to modify too much
>> of the Postgres core code.  You will need to have the hook updated for
>> new versions of Postgres, which adds to the complexity.

> Couldn't one just use the existing planner hook for that?

Yeah, probably.  One could also make a case for creating a similar
hook for the rewriter so that a new parsetree could be substituted
before the rewrite step ... but it's really not clear whether it's
better to try to match the parsetree before or after rewriting.
I'd be inclined to just make use of the existing hook until there's
a pretty solid argument why we need another one.

Note to OP: the lack of response to your previous post seems to me
to indicate that there's little enthusiasm for having such a feature
in core Postgres.  Thus, everybody is focusing on what sort of hooks
would be needed in-core to let an extension implement the feature.
Getting someone to write such an extension is left as an exercise
for the reader.

            regards, tom lane



Re: New Feature Request

From
Konstantin Knizhnik
Date:

On 26.05.2020 04:47, Tomas Vondra wrote:
> On Mon, May 25, 2020 at 09:21:26PM -0400, Bruce Momjian wrote:
>> On Mon, May 25, 2020 at 07:53:40PM -0500, Bert Scalzo wrote:
>>> I am reposting this from a few months back (see below). I am not 
>>> trying to be a
>>> pest, just very motivated. I really think this feature has merit, 
>>> and if not
>>> generally worthwhile, I'd be willing to pay someone to code it for 
>>> me as I
>>> don't have strong enough C skills to modify the PostgreSQL code 
>>> myself. So
>>> anyone who might have such skills that would be interested, please 
>>> contact me:
>>> bertscalzo2@gmail.com.
>>
>> I think your best bet is to try getting someone to write a hook
>> that will do the replacement so that you don't need to modify too much
>> of the Postgres core code.  You will need to have the hook updated for
>> new versions of Postgres, which adds to the complexity.
>>
>
> I don't think we have a hook to tweak the incoming SQL, though. We only
> have post_parse_analyze_hook, i.e. post-parse, at which point we can't
> just rewrite the SQL directly. So I guess we'd need new hook.

VOPS extension performs query substitution (replace query to the 
original table with query to projection) using post_parse_analysis_hook
and SPI. So I do not understand why  some extra hook is needed.

>
> I do however wonder if an earlier hook is a good idea at all - matching
> the SQL directly seems like a rather naive approach that'll break easily
> due to formatting, upper/lower-case, subqueries, and many other things.
> From this standpoint it seems actually better to inspect and tweak the
> parse-analyze result. Not sure how to define the rules easily, though.
>

In some cases we need to know exact parameter value (as in case 
SUBSTRING(column,1,3) = 'ABC').
Sometime concrete value of parameter is not important...
Also it is not clear where such pattern-matching transformation should 
be used only for the whole query or for any its subtrees?

> As for the complexity, I think hooks are fairly low-maintenance in
> practice, we tend not to modify them very often, and when we do it's
> usually just adding an argument etc.

I am not sure if the proposed approach can really be useful in many cases.
Bad queries are used to be generated by various ORM tools.
But them rarely generate exactly the same query. So defining matching 
rules for the whole query tree will rarely work.

It seems to be more useful to have extensible SQL optimizer, which 
allows to add user defined rules (may as transformation patterns).
This is how it is done in GCC code optimizer.
Definitely writing such rules is very non-trivial task.
Very few developers will be able to add their own meaningful rules.
But in any case it significantly simplify improvement of optimizer, 
although most of problems with choosing optimal plan are
caused by wrong statistic and rue-based optimization can not help here.




Re: New Feature Request

From
Bert Scalzo
Date:
I greatly appreciate all the replies. Thanks. I also fully understand and appreciate all the points made - especially that this idea may not have general value or acceptance as worthwhile. No argument from me. Let me explain why I am looking to do this to see if that changes any opinions. I have written a product called QIKR for MySQL that leverages the MySQL query rewrite feature and places a knowledge expert of SQL rewrite rules as a preprocessor to the MySQL optimizer. I have defined an extensive set of rules based on my 30 years of doing code reviews for app developers who write terrible SQL. Right now QIKR does 100% syntactic analysis (hoping to do semantic analysis in a later version). For MySQL (which has a less mature and less robust optimizer) the performance gains are huge - in excess of 10X. So far QIKR shows about a 2.5X improvement over the PostgreSQL optimizer when fed bad SQL. I am not saying the PotsgrSQL optimizer does a poor job, but rather that QIKR was designed for "garbage in, not garbage out" - so QIKR fixes all the stupid mistakes that people make which can confuse or even cripple an optimizer. Hence why I am looking for this hook - and have come to the experts for help. I have two very large PostgreSQL partner organizations who have asked me to make QIKR work for PostgreSQL as it does for MySQL. Again, I am willing to pay for this hook since it's a special request for a special purpose and not generally worthwhile in many people's opinions - which I cannot argue with.

On Tue, May 26, 2020 at 2:17 AM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:


On 26.05.2020 04:47, Tomas Vondra wrote:
> On Mon, May 25, 2020 at 09:21:26PM -0400, Bruce Momjian wrote:
>> On Mon, May 25, 2020 at 07:53:40PM -0500, Bert Scalzo wrote:
>>> I am reposting this from a few months back (see below). I am not
>>> trying to be a
>>> pest, just very motivated. I really think this feature has merit,
>>> and if not
>>> generally worthwhile, I'd be willing to pay someone to code it for
>>> me as I
>>> don't have strong enough C skills to modify the PostgreSQL code
>>> myself. So
>>> anyone who might have such skills that would be interested, please
>>> contact me:
>>> bertscalzo2@gmail.com.
>>
>> I think your best bet is to try getting someone to write a hook
>> that will do the replacement so that you don't need to modify too much
>> of the Postgres core code.  You will need to have the hook updated for
>> new versions of Postgres, which adds to the complexity.
>>
>
> I don't think we have a hook to tweak the incoming SQL, though. We only
> have post_parse_analyze_hook, i.e. post-parse, at which point we can't
> just rewrite the SQL directly. So I guess we'd need new hook.

VOPS extension performs query substitution (replace query to the
original table with query to projection) using post_parse_analysis_hook
and SPI. So I do not understand why  some extra hook is needed.

>
> I do however wonder if an earlier hook is a good idea at all - matching
> the SQL directly seems like a rather naive approach that'll break easily
> due to formatting, upper/lower-case, subqueries, and many other things.
> From this standpoint it seems actually better to inspect and tweak the
> parse-analyze result. Not sure how to define the rules easily, though.
>

In some cases we need to know exact parameter value (as in case
SUBSTRING(column,1,3) = 'ABC').
Sometime concrete value of parameter is not important...
Also it is not clear where such pattern-matching transformation should
be used only for the whole query or for any its subtrees?

> As for the complexity, I think hooks are fairly low-maintenance in
> practice, we tend not to modify them very often, and when we do it's
> usually just adding an argument etc.

I am not sure if the proposed approach can really be useful in many cases.
Bad queries are used to be generated by various ORM tools.
But them rarely generate exactly the same query. So defining matching
rules for the whole query tree will rarely work.

It seems to be more useful to have extensible SQL optimizer, which
allows to add user defined rules (may as transformation patterns).
This is how it is done in GCC code optimizer.
Definitely writing such rules is very non-trivial task.
Very few developers will be able to add their own meaningful rules.
But in any case it significantly simplify improvement of optimizer,
although most of problems with choosing optimal plan are
caused by wrong statistic and rue-based optimization can not help here.



Re: New Feature Request

From
Peter Eisentraut
Date:
On 2020-05-26 12:10, Bert Scalzo wrote:
> So far QIKR shows about a 
> 2.5X improvement over the PostgreSQL optimizer when fed bad SQL. I am 
> not saying the PotsgrSQL optimizer does a poor job, but rather that 
> QIKR was designed for "garbage in, not garbage out" - so QIKR fixes all 
> the stupid mistakes that people make which can confuse or even cripple 
> an optimizer. Hence why I am looking for this hook - and have come to 
> the experts for help. I have two very large PostgreSQL partner 
> organizations who have asked me to make QIKR work for PostgreSQL as it 
> does for MySQL. Again, I am willing to pay for this hook since it's a 
> special request for a special purpose and not generally worthwhile in 
> many people's opinions - which I cannot argue with.

Your project seems entirely legitimate as a third-party optional plugin.

I think the post_parse_analyze_hook would work for this.  I suggest you 
start with it and see how far you can take it.

It may turn out that you need a hook after the rewriter, but that should 
be a small change and shouldn't affect your own code very much, since 
you'd get handed the same data structure in each case.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services