Thread: Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

nikhil raj <nikhilraj474@gmail.com> writes:
> I've encountered a noticeable difference in execution time and query
> execution plan row counts between PostgreSQL 13 and PostgreSQL 16 when
> running a query on information_schema tables. Surprisingly, PostgreSQL 16
> is performing slower than PostgreSQL 13.

Yeah, it looks like that condition on "table_name" is not getting
pushed down to the scan level anymore.  I'm not sure why not,
but will look closer tomorrow.

            regards, tom lane



On Tue, 27 Aug 2024 at 13:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah, it looks like that condition on "table_name" is not getting
> pushed down to the scan level anymore.  I'm not sure why not,
> but will look closer tomorrow.

I was looking for the offending commit as at first I thought it might
be related to Memoize. It does not seem to be.

I get the following up until 2489d76c, and from then on, it's a subquery filter.

 ->  Index Scan using pg_class_relname_nsp_index on pg_class r_2
(cost=0.27..8.30 rows=1 width=8) (actual time=0.004..0.004 rows=0
loops=1)
        Index Cond: (relname = 't_c56ng1_repository'::name)
        Filter: ((relkind = ANY ('{r,p}'::"char"[])) AND
pg_has_role(relowner, 'USAGE'::text))

So looks like it was the "Make Vars be outer-join-aware." commit that
changed this.

David



David Rowley <dgrowleyml@gmail.com> writes:
> On Tue, 27 Aug 2024 at 13:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, it looks like that condition on "table_name" is not getting
>> pushed down to the scan level anymore.  I'm not sure why not,
>> but will look closer tomorrow.

> So looks like it was the "Make Vars be outer-join-aware." commit that
> changed this.

Yeah, I got that same result by bisecting.  It seems like it's
somehow related to the cast to information_schema.sql_identifier:
we are able to get rid of that normally but seem to fail to do so
in this query.

There was a smaller increase in the runtime at dfb75e478 "Add primary
keys and unique constraints to system catalogs", but that seems to
just be due to there being more rows in the relevant catalogs.
(That's from testing the query in an empty database; probably the
effect of dfb75e478 would be swamped in a production DB anyway.)

            regards, tom lane



On 2024-08-27 11:50, David Rowley wrote:
> On Tue, 27 Aug 2024 at 13:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, it looks like that condition on "table_name" is not getting
>> pushed down to the scan level anymore.  I'm not sure why not,
>> but will look closer tomorrow.
> 
> I was looking for the offending commit as at first I thought it might
> be related to Memoize. It does not seem to be.

As a general thought, seeing that this might be an actual problem
should some kind of automated testing be added that checks for
performance regressions like this?

Regards and best wishes,

Justin Clift



On Tue, 27 Aug 2024 at 18:00, Justin Clift <justin@postgresql.org> wrote:
> As a general thought, seeing that this might be an actual problem
> should some kind of automated testing be added that checks for
> performance regressions like this?

We normally try to catch these sorts of things with regression tests.
Of course, that requires having a test that would catch a particular
problem, which we don't seem to have for this particular case.  A
performance test would also require testing a particular scenario, so
I don't see why that's better.  A regression test is better suited as
there's no middle ground between pass and fail.

David



On Tue, 27 Aug 2024 at 14:03, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah, I got that same result by bisecting.  It seems like it's
> somehow related to the cast to information_schema.sql_identifier:
> we are able to get rid of that normally but seem to fail to do so
> in this query.

In case it saves you a bit of time, I stripped as much of the
unrelated stuff out as I could and got:

create table t (a name, b int);
explain select * from (select a::varchar,b from (select distinct a,b
from t) st) t right join t t2 on t.b=t2.b where t.a='test';

getting rid of the cast or swapping to INNER JOIN rather than RIGHT
JOIN means that qual_is_pushdown_safe() gets a Var rather than a
PlaceHolderVar.

David



On 2024-08-27 20:14, David Rowley wrote:
> On Tue, 27 Aug 2024 at 18:00, Justin Clift <justin@postgresql.org> 
> wrote:
>> As a general thought, seeing that this might be an actual problem
>> should some kind of automated testing be added that checks for
>> performance regressions like this?
> 
> We normally try to catch these sorts of things with regression tests.
> Of course, that requires having a test that would catch a particular
> problem, which we don't seem to have for this particular case.  A
> performance test would also require testing a particular scenario, so
> I don't see why that's better.  A regression test is better suited as
> there's no middle ground between pass and fail.

Yeah, that's the kind of thing I was thinking.

Any idea who normally does those, and if it would be reasonable to add
test(s) for the internal information tables?

Regards and best wishes,

Justin Clift



On Wed, 28 Aug 2024 at 18:59, Justin Clift <justin@postgresql.org> wrote:
> Any idea who normally does those, and if it would be reasonable to add
> test(s) for the internal information tables?

These tend to get added along with features and along with of bug
fixes.  I imagine any tests for the information_schema views would be
for the results of the views rather than for the expected plans.
However, that seems very separate from this as the bug has nothing to
do with information_schema. It just happens to be a query to an
information_schema view that helped highlight the bug.  Those views
are often quite complex and so are the resulting plans.  With tests
checking the expected EXPLAIN output, it's much better to give these a
very narrow focus otherwise the expected output could be too unstable
and the purpose of the test harder to determine for anyone working on
a new patch which results in a plan change of a preexisting test.
I've seen tests before rendered useless by people blindly accepting
the plan change without properly determining what the test is supposed
to be testing. That's much more likely to happen when the purpose of
the test is less clear due to some unwieldy and complex expected plan.
I managed to get a reproducer for this down to something quite simple.
Probably that or something similar would be a better test to make sure
this bug stays gone.

David