Thread: postgres slower on nested queries

From
Alexander Elgert
Date:
Hello,

I programmed a little script which iterates over all databases in a
DBMS, then over all tables, and then over all columns.
This script works for both mysql and postgres.

1. Solution overview
====================

foreach database {
    foreach table {
        foreach column {
            do something ...
        }
    }
}

1. Solution detail
==================

foreach database
----------------
mysql: show databases
postgres: SELECT d.datname as Name FROM pg_catalog.pg_database d WHERE
(d.datname != 'template0' AND d.datname != 'template1') ORDER BY 1;

foreach table
-------------
mysql: show tables
postgres: select table_name from information_schema.tables where
table_schema = 'public'

foreach column
--------------
mysql: show columns from '%s'
postgres: select column_name from information_schema.columns where
table_name = '%s'

(If there are better queries for postgres, please let me know.)

2. Solution
===========

I found the postgres version VERY slow, so I decided to fetch
    select table_name, column_name from information_schema.columns where
table_schema = 'public'

and wrote the output to a two-dimensional array to process the elements
later:
    $tableA[$row['table_name']][$row['column_name']]=True;
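The grouping step can be sketched in Python as an illustration (the original script is PHP, and the `rows` list here is a hypothetical stand-in for the result of the single information_schema query):

```python
# Group (table_name, column_name) rows into a nested dict,
# mirroring the PHP $tableA[$table][$column] = True pattern.
# `rows` is a made-up stand-in for the bulk query result.
rows = [
    {"table_name": "users", "column_name": "id"},
    {"table_name": "users", "column_name": "email"},
    {"table_name": "orders", "column_name": "id"},
]

table_a = {}
for row in rows:
    # setdefault creates the inner dict the first time a table is seen
    table_a.setdefault(row["table_name"], {})[row["column_name"]] = True

# table_a now maps each table name to a dict of its column names
```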

The second approach is much faster; this must be because there is no
nesting. ;(
Further - I think - there is no real caching done in the PHP library of
xampp 1.5.4 (php_pgsql.dll). ;(

It would be very helpful to know what causes the problem.


Another question arises from this task:
=======================================

Is there any possibility to use the shortcut \d as a query from PHP?
I used psql -E to display the query, but these queries rely on the
system catalogs, but the page:
http://www.postgresql.org/docs/8.2/interactive/information-schema.html
says they are not stable.
And - of course - it is much easier to type! ;)

Greetings,
    Alexander Elgert


Re: postgres slower on nested queries

From
Richard Huxton
Date:
Alexander Elgert wrote:
> Hello,
>
> I programmed a little script which iterates over all databases in a
> DBMS, iterating over all tables and then over all columns.
> This skript works for mysql and postgres.
>
> 1. Solution overview
> ====================
>
> foreach database {
>    foreach table {
>        foreach column {
>            do something ...
>        }
>    }
> }

OK - obviously not the way to do it if you really want all the columns
in all the databases...

> 1. Solution detail
> ==================
>
> foreach database
> ----------------
> mysql: show databases
> postgres: SELECT d.datname as Name FROM pg_catalog.pg_database d WHERE
> (d.datname != 'template0' AND d.datname != 'template1') ORDER BY 1;)

You can have template databases called other names too of course.

> foreach table
> -------------
> mysql: show tables
> postgres: select table_name from information_schema.tables where
> table_schema = 'public')
>
> foreach column
> --------------
> mysql: show columns from '%s'
> postgres: select column_name from information_schema.columns where
> table_name = '%s')
>
> (If there are better queries for postgres, please let me know.)
>
> 2. Solution
> ===========
>
> I found the postgres version VERY slow, so a decided to fetch

Define VERY - it took what, milliseconds to do this? Seconds? Hours?

>    select table_name, column_name from information_schema.columns where
> table_schema = 'public'
>
> and wrote the output to an two dimensional array to process the elements
> later:
>    $tableA[$row['table_name']][$row['column_name']]=True;

Not sure what this code is meant to show.

> The second approach ist much faster, this must be because there is no
> nesting. ;(

What nesting? Are you trying to do sub-queries of some sort?

In your first example, you were querying for individual rows, in the
second you fetch them all at once. Of course it's quicker to run one
query and return many rows than many queries each returning one row.
Otherwise your RDBMS has something seriously wrong with it.

> Further - I think - there is no real caching done in the PHP Library of
> xampp 1.5.4, php_pgsql.dll ;(

No (not that I'm sure what xampp is). Look into memcache/pg_memcache if
you want caching - that does it properly.

> It would be very helpful what causes the problem.

You haven't actually described a problem afaict. Can you describe what
it is you're trying to do? Do you want:
   1. Details of a single column
   2. To loop through all the columns doing something
If you can say which of these you're trying to do, and what the
"something" is, then I'm sure there's somebody here who can help.

> Another question rises from this task:
> ======================================
>
> Is there any possibility to use the shortcut \d as a query from PHP?
> I used psql -E to display the query, but these queries rely on the
> system catalogs, but the page:
> http://www.postgresql.org/docs/8.2/interactive/information-schema.html
> says they are not stable.
> And - of course - it is much easier to type! ;)

Nope - \d is psql only. That's because what it does changes with the
system catalogues. If you want a stable representation, use the
information_schema. If you want all the PostgreSQL-specific details
you'll have to cope with changes to the system catalogues. However, I
seem to remember some utility views that pgadmin uses that might be of use.

Oh - and you might find phppgadmin useful to look at.

--
   Richard Huxton
   Archonet Ltd

Re: postgres slower on nested queries

From
Tom Lane
Date:
Richard Huxton <dev@archonet.com> writes:
> Alexander Elgert wrote:
>> I found the postgres version VERY slow, so a decided to fetch

> Define VERY - it took what, milliseconds to do this? Seconds? Hours?

I think he's complaining that the standards-conformant view in Postgres
is slower than the specialized SHOW command in mysql.  For an
apples-to-apples comparison, select directly from the system catalogs
--- the information_schema views are definitely slower, because they
have to enforce various spec restrictions (eg that you can't see info
about tables you don't have access to).

            regards, tom lane

Re: postgres slower on nested queries

From
Richard Huxton
Date:
Tom Lane wrote:
> Richard Huxton <dev@archonet.com> writes:
>> Alexander Elgert wrote:
>>> I found the postgres version VERY slow, so a decided to fetch
>
>> Define VERY - it took what, milliseconds to do this? Seconds? Hours?
>
> I think he's complaining that the standards-conformant view in Postgres
> is slower than the specialized SHOW command in mysql.  For an
> apples-to-apples comparison, select directly from the system catalogs
> --- the information_schema views are definitely slower, because they
> have to enforce various spec restrictions (eg that you can't see info
> about tables you don't have access to).

I'm still seeing times ~ 3ms to find one table and ~18ms for one column
on my test server. Around ~ 300ms for 1169 columns. I still can't quite
see what you'd do that would need to know about individual columns in
such a hurry.

--
   Richard Huxton
   Archonet Ltd

Re: postgres slower on nested queries

From
Alexander Elgert
Date:
Richard Huxton schrieb:
> Alexander Elgert wrote:
>> Hello,
>>
>> I programmed a little script which iterates over all databases in a
>> DBMS, iterating over all tables and then over all columns.
>> This skript works for mysql and postgres.
>>
>> 1. Solution overview
>> ====================
>>
>> foreach database {
>>    foreach table {
>>        foreach column {
>>            do something ...
>>        }
>>    }
>> }
>
> OK - obviously not the way to do it if you really want all the columns
> in all the databases...
>
>> 1. Solution detail
>> ==================
>>
>> foreach database
>> ----------------
>> mysql: show databases
>> postgres: SELECT d.datname as Name FROM pg_catalog.pg_database d
>> WHERE (d.datname != 'template0' AND d.datname != 'template1') ORDER
>> BY 1;)
>
> You can have template databases called other names too of course.
>
>> foreach table
>> -------------
>> mysql: show tables
>> postgres: select table_name from information_schema.tables where
>> table_schema = 'public')
>>
>> foreach column
>> --------------
>> mysql: show columns from '%s'
>> postgres: select column_name from information_schema.columns where
>> table_name = '%s')
>>
>> (If there are better queries for postgres, please let me know.)
>>
>> 2. Solution
>> ===========
>>
>> I found the postgres version VERY slow, so a decided to fetch
>
> Define VERY - it took what, milliseconds to do this? Seconds? Hours?

I wrote two different functions to get the data out of the database: one
uses a huge number of queries, the other uses an array to hold the data.
Because in mysql I have to use "show tables/show columns from ..." I
thought I could use the same style in postgres.

Of course, it works, but the unoptimized postgres code is 80 times
slower. ;(
The overhead for a single 'show columns from ...' seems to be small in
mysql.
Please read on, I hope I am able to show you the difference.

>
>>    select table_name, column_name from information_schema.columns
>> where table_schema = 'public'
>>
>> and wrote the output to an two dimensional array to process the
>> elements later:
>>    $tableA[$row['table_name']][$row['column_name']]=True;
>
> Not sure what this code is meant to show.
foreach database {
   foreach table_name, column_name {
       $tableA[$row['table_name']][$row['column_name']]=True;
   }
   // process 2-dim $tableA array
}

This results in a structure where I can iterate over all keys in the
2-dim array.
You can see I iterate first over the databases and then over tables AND
columns!
--- mysql: ~1s (Database X)
--- postgres: ~1s (Database Y)
;)

In contrast: =======================================================

foreach database {
   foreach table {
       foreach column {
           do something ...
       }
   }
}
--- mysql: ~1s (Database X)
--- postgres: ~80s (Database Y)
;(

>
>> The second approach ist much faster, this must be because there is no
>> nesting. ;(
>
> What nesting? Are you trying to do sub-queries of some sort?
I looped over all tables and THEN called a query for each table to
get the columns (of the same table).
Yes, there are definitely more queries the DBMS has to manage.
(It is bad style, but it is intuitive. Maybe the overhead of a single
query is more time consuming than in mysql.)
>
> In your first example, you were querying for individual rows, in the
> second you fetch them all at once. Of course it's quicker to run one
> query and return many rows than many queries each returning one row.
> Otherwise your RDBMS has something seriously wrong with it.
>
>> Further - I think - there is no real caching done in the PHP Library
>> of xampp 1.5.4, php_pgsql.dll ;(
>
> No (not that I'm sure what xampp is). Look into memcache/pg_memcache
> if you want caching - that does it properly.
Thanks.
(xampp is an environment to develop web applications.)
>
>
>> It would be very helpful what causes the problem.
>
> You haven't actually described a problem afaict. Can you describe what
> it is you're trying to do? Do you want:
>   1. Details of a single column
>   2. To loop through all the columns doing something
> If you can say which of these you're trying to do, and what the
> "something" is, then I'm sure there's somebody here who can help.
I think it will be the best to provide example code.
>
>> Another question rises from this task:
>> ======================================
>>
>> Is there any possibility to use the shortcut \d as a query from PHP?
>> I used psql -E to display the query, but these queries rely on the
>> system catalogs, but the page:
>> http://www.postgresql.org/docs/8.2/interactive/information-schema.html
>> says they are not stable.
>> And - of course - it is much easier to type! ;)
>
> Nope - \d is psql only. That's because what it does changes with the
> system catalogues. If you want a stable representation, use the
> information_schema. If you want all the PostgreSQL-specific details
> you'll have to cope with changes to the system catalogues. However, I
> seem to remember some utility views that pgadmin uses that might be of
> use.
It would be nice to have the \d-query stored somewhere in the database.
>
> Oh - and you might find phppgadmin useful to look at.
>
Thanks.

Greetings,
    Alexander Elgert


Re: postgres slower on nested queries

From
Martijn van Oosterhout
Date:
On Wed, Mar 07, 2007 at 12:12:52PM +0100, Alexander Elgert wrote:
> >Nope - \d is psql only. That's because what it does changes with the
> >system catalogues. If you want a stable representation, use the
> >information_schema. If you want all the PostgreSQL-specific details
> >you'll have to cope with changes to the system catalogues. However, I
> >seem to remember some utility views that pgadmin uses that might be of
> >use.
> It would be nice to have the \d-query stored somewhere in the database.

Not really useful anyway. For the kind of trivial stuff you're doing
the queries will work for many many versions. Your real problem is that
you're using the information_schema which has to do all sorts of extra
work for standards compliance. If you don't need that you're going to
be much better off just using the system catalogs.
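As an illustration of that advice (a sketch only; `some_table` is a placeholder, and the exact query psql's \d runs differs between versions), the per-table column lookup can go straight to the catalogs:

```sql
-- Columns of one table, straight from the system catalogs,
-- skipping the standards-compliance overhead of information_schema.
SELECT a.attname
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = 'some_table'::regclass  -- placeholder table name
  AND a.attnum > 0          -- skip system columns
  AND NOT a.attisdropped    -- skip dropped columns
ORDER BY a.attnum;
```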

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.


Re: postgres slower on nested queries

From
Richard Huxton
Date:
Alexander Elgert wrote:
>
> This results in a structure where I can itereate over all keys in the
> 2-dim array.
> You can see I iterate first over the databases and then over table AND
> columns!
> --- mysql: ~1s (Database X)
> --- postgres: ~1s (Database Y)
> ;)
>
> In contrast: =======================================================
>
> foreach database {
>   foreach table {
>       foreach column {
>           do something ...
>       }
>   }
> }
> --- mysql: ~1s (Database X)
> --- postgres: ~80s (Database Y)
> ;(
>
>>> The second approach ist much faster, this must be because there is no
>>> nesting. ;(
>>
>> What nesting? Are you trying to do sub-queries of some sort?
> I did a loop over all tables and THEN calling a query for each table to
> get the columns (from the same table).
> Yes, there are definitively more queries the DBMS has to manage.
> (It is a bad style, but it is intuitive. Maybe the overhead of a single
> query is more time consuming than in mysql.)

I think I see what you're doing now. As Tom says, the information_schema
has overheads but I must say I'm surprised at it taking 80 seconds.

I can see how you might find it more intuitive. I think the other way
around myself - grab it all then process it.

--
   Richard Huxton
   Archonet Ltd

Re: postgres slower on nested queries

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> ... Your real problem is that
> you're using the information_schema which has to do all sorts of extra
> work for standards compliance. If you don't need that you're going to
> be much better off just using the system catalogs.

To put that in some perspective:

regression=> create table fooey(f1 int);
CREATE TABLE
regression=> \timing
Timing is on.
regression=> select column_name from information_schema.columns where table_name = 'fooey';
 column_name
-------------
 f1
(1 row)

Time: 84.388 ms
regression=> select attname from pg_attribute where attrelid = 'fooey'::regclass and attnum > 0 and not attisdropped;
 attname
---------
 f1
(1 row)

Time: 7.834 ms
regression=>

When you look at the plans involved (information_schema.columns is an
8-way join) the surprising thing is not that the first query is so slow,
it's that it's so fast.

            regards, tom lane

Re: postgres slower on nested queries

From
Alexander Elgert
Date:
Tom Lane schrieb:
> Martijn van Oosterhout <kleptog@svana.org> writes:
>
>> ... Your real problem is that
>> you're using the information_schema which has to do all sorts of extra
>> work for standards compliance. If you don't need that you're going to
>> be much better off just using the system catalogs.
>>
>
> To put that in some perspective:
>
> regression=> create table fooey(f1 int);
> CREATE TABLE
> regression=> \timing
> Timing is on.
> regression=> select column_name from information_schema.columns where table_name = 'fooey';
>  column_name
> -------------
>  f1
> (1 row)
>
> Time: 84.388 ms
> regression=> select attname from pg_attribute where attrelid = 'fooey'::regclass and attnum > 0 and not attisdropped;
>  attname
> ---------
>  f1
> (1 row)
>
> Time: 7.834 ms
> regression=>
>
> When you look at the plans involved (information_schema.columns is an
> 8-way join) the surprising thing is not that the first query is so slow,
> it's that it's so fast.
>
AAAH! That is the reason.

So creating a temporary table from the information schema would speed up
the whole thing by a factor of 10 (at least), I guess.
Thanks a lot!
;)
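That idea could look like this (a sketch, untested; `some_table` is a placeholder, and whether it actually beats querying the catalogs directly would need measuring):

```sql
-- Snapshot the column metadata once per database, then query the
-- plain (cheap) temp table instead of the expensive view each time.
CREATE TEMP TABLE cols AS
SELECT table_name, column_name
FROM information_schema.columns
WHERE table_schema = 'public';

SELECT column_name FROM cols WHERE table_name = 'some_table';
```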

Greetings,
    Alexander Elgert