Thread: Slow queries from information schema

Slow queries from information schema

From
Octavio Alvarez
Date:
I'm aware you already know that information_schema is slow [1] [2], so I
just want to expose/document another case and tests I did.

I'm using the following view to check what tables depend on what other
tables.

CREATE VIEW raw_relation_tree AS
SELECT
   tc_p.table_catalog AS parent_catalog,
   tc_p.table_schema AS parent_schema,
   tc_p.table_name AS parent_table,
   tc_c.table_catalog AS child_catalog,
   tc_c.table_schema AS child_schema,
   tc_c.table_name AS child_table
FROM
   information_schema.referential_constraints AS rc
   NATURAL JOIN information_schema.table_constraints AS tc_c
   LEFT JOIN information_schema.table_constraints AS tc_p ON
      rc.unique_constraint_catalog = tc_p.constraint_catalog AND
      rc.unique_constraint_schema = tc_p.constraint_schema AND
      rc.unique_constraint_name = tc_p.constraint_name
;

test=# select count(*) from raw_relation_tree;
count
-------
    11
(1 row)

An EXPLAIN ANALYZE for a simple SELECT on each of the FROM tables give:
referential_constraints: ~9ms.
table_constraints: ~24ms.

The result, on the above view: ~80ms. Fair enough. But if I apply a
condition:

SELECT * FROM ___pgnui_relation_tree.raw_relation_tree WHERE
parent_schema <> child_schema;

it takes ~2 seconds (!) to complete.

I tried using an alternate table_constraints definition by creating my
own view and changing UNION to UNION ALL (as per [2]) The results were:

table_constraints using UNION ALL has the same number of rows as the
UNION version.

table_constraints now take about 4 ms (as expected).
VIEW raw_relation_tree is now 110 ms.
VIEW raw_relation_tree WHERE parent_schema <> child_schema: 3.3 sec.

EXPLAIN results are way too long to post here. If it is ok, I'll gladly
post them.

Using 8.3.6.

[1] http://archives.postgresql.org/pgsql-bugs/2008-12/msg00144.php
[2]
http://archives.postgresql.org/pgsql-performance/2008-05/msg00062.php



Re: Slow queries from information schema

From
Tom Lane
Date:
Octavio Alvarez <alvarezp@alvarezp.ods.org> writes:
> The result, on the above view: ~80ms. Fair enough. But if I apply a
> condition:
> SELECT * FROM ___pgnui_relation_tree.raw_relation_tree WHERE
> parent_schema <> child_schema;
> it takes ~2 seconds (!) to complete.

I'm not sure I'm seeing the exact same case as you, but what I see here
is that 8.3 puts the join condition involving _pg_keysequal() at the
top of the tree where it will be executed quite a lot of times (way
more than the planner expects, because of bad rowcount estimates below)
... and _pg_keysequal() is implemented in a depressingly inefficient way.

CVS HEAD seems to avoid this trap in the same case, but I'm not entirely
convinced whether it's getting better rowcount estimates or just got
lucky.

Anyway it seems to help a great deal if you use a less sucky definition
of the function, such as

create or replace function information_schema._pg_keysequal(smallint[], smallint[]) RETURNS boolean
LANGUAGE sql STRICT IMMUTABLE AS
'select $1 <@ $2 and $2 <@ $1';

            regards, tom lane

Re: Slow queries from information schema

From
Octavio Alvarez
Date:
On Sat, 2009-02-14 at 15:02 -0500, Tom Lane wrote:
> Octavio Alvarez <alvarezp@alvarezp.ods.org> writes:
> > The result, on the above view: ~80ms. Fair enough. But if I apply a
> > condition:
> > SELECT * FROM ___pgnui_relation_tree.raw_relation_tree WHERE
> > parent_schema <> child_schema;
> > it takes ~2 seconds (!) to complete.
>
> I'm not sure I'm seeing the exact same case as you, but what I see here
> is that 8.3 puts the join condition involving _pg_keysequal() at the
> top of the tree where it will be executed quite a lot of times (way
> more than the planner expects, because of bad rowcount estimates below)
> ... and _pg_keysequal() is implemented in a depressingly inefficient way.
>
> CVS HEAD seems to avoid this trap in the same case, but I'm not entirely
> convinced whether it's getting better rowcount estimates or just got
> lucky.
>
> Anyway it seems to help a great deal if you use a less sucky definition
> of the function, such as
>
> create or replace function information_schema._pg_keysequal(smallint[], smallint[]) RETURNS boolean
> LANGUAGE sql STRICT IMMUTABLE AS
> 'select $1 <@ $2 and $2 <@ $1';

Wow! Just tried it with the UNION (the original) version of
information_schema.table_constraints and it drastically reduced the
total runtime to 309 ms!

I also tested it with UNION ALL and it took 1.6 sec. (and yet, 50% of
the previous time with UNION ALL).