Re: Suppressing unused subquery output columns - Mailing list pgsql-hackers

From David Rowley
Subject Re: Suppressing unused subquery output columns
Date
Msg-id CAApHDvonS8RcCcBNei9fjvOnVhqnP1vvvuvp_wVU-Lt3MRVr2g@mail.gmail.com
In response to Suppressing unused subquery output columns  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, Jun 6, 2014 at 2:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm not entirely convinced that it's worth the extra planning cycles,
> though.  Given the small number of complaints to date, it might not
> be worth doing this.  Thoughts?


That's a difficult question for sure. Obviously it's going to depend on the query we're planning. If it's a very simple query and we don't manage to reduce the subquery's target list at all, then we pay a small planning-time penalty for no benefit... But if we do manage to prune down the target list and that later allows a left join on a 1 billion row table to be removed, then the cost of the extra few cycles spent by this patch would be completely unmeasurable and insignificant.
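Roughly the sort of thing I have in mind (names invented here, and assuming big_b.id has a unique index so the join qualifies for join removal):

    -- The subquery exposes b.x, but the outer query never references it.
    -- If the planner trims x from the subquery's target list, the join to
    -- big_b no longer produces anything the query needs, and since big_b.id
    -- is unique the LEFT JOIN can be removed entirely.
    SELECT s.a
    FROM (SELECT a.a, b.x
          FROM a
          LEFT JOIN big_b b ON b.id = a.b_id) s;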

At work we use <insert name of popular commercial relational database product here>, and on a daily basis I come across poorly written queries by other developers.

The things I can think of offhand are:

1. A DISTINCT clause whose select list already contains all of the table's primary key columns
2. Someone had written a query like: SELECT some, 1 AS type FROM tab UNION SELECT thing, 2 AS type FROM othertable

The database in question managed to remove the DISTINCT in case 1 because it's simply not needed: each group can only ever contain one row. For case 2 it saw that, because column 2 of the UNION is a constant and the constant differs on each side, the two sides cannot produce duplicate rows, so it rewrote the query as UNION ALL.
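To make those two cases a bit more concrete, here's roughly what I mean (table and column names invented):

    -- 1. id is the primary key of t and appears in the DISTINCT list, so each
    --    group can only ever contain one row and the DISTINCT can be dropped.
    SELECT DISTINCT id, payload FROM t;
    -- becomes
    SELECT id, payload FROM t;

    -- 2. The constants in column 2 differ between the branches, so one branch
    --    can never duplicate a row from the other.  Provided each branch is
    --    itself duplicate-free (say, some and thing are primary keys), the
    --    UNION can become UNION ALL and skip the duplicate-elimination step.
    SELECT some, 1 AS type FROM tab
    UNION
    SELECT thing, 2 AS type FROM othertable;
    -- becomes
    SELECT some, 1 AS type FROM tab
    UNION ALL
    SELECT thing, 2 AS type FROM othertable;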

At home I checked how PostgreSQL handled both of these cases, and saw that it failed to see through the nonsense in the poorly written queries on both counts. That's fine for anyone who posts on the performance list asking "why is this slow?"; we'd just tell them to write their query another way. But what about all the companies who consider porting their application over to PostgreSQL, get to the stage of importing all their data onto a test server, and try a few of their (poorly written) queries? If they see a 10 times slowdown they'll immediately think PostgreSQL is just not for them. It's a shame that they'd turn us down so early, but if we went ahead and added code to detect both of these situations then we'd end up increasing planning time for everyone else who writes their queries properly.

I don't really have an answer for this, but I do think it needs more discussion. The only thing I've thought of so far is a planner_strength GUC that attempts these extra optimisations once it reaches a certain level, but that just introduces a surprise factor and a whole bunch of threads on performance asking "why is this query slower on pg10.4 than on pg10.2?", to which our reply becomes "is planner_strength set to the same value on both?". It does not sound pretty. Another option would be to replan a query with the strength level turned to maximum when its estimated cost is over a certain threshold, but that seems like it could suffer from the same surprise factor.
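Just to illustrate what I mean (no such GUC exists today; the name and the level are entirely made up):

    -- Hypothetical setting, purely for illustration.
    -- Higher levels would let the planner spend extra cycles on rewrites
    -- like dropping redundant DISTINCTs or turning UNION into UNION ALL.
    SET planner_strength = 3;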

It's pretty hard to get the best of both worlds here. I know this post doesn't offer many answers, but I'm sending it anyway in case someone else comes up with a good idea that we could all work towards.

Regards

David Rowley

