Thread: Enumeration of tables is very slow in largish database

Enumeration of tables is very slow in largish database

From

Kirill Müller

Date:

11 January 2012, 06:07:41

Hi all,

we have a Postgres/PostGIS database with 100+ schemas and 200+ tables in
each schema, generated automatically. When adding a new PostGIS layer in
QGis, the application obviously enumerates all tables, and this takes
minutes. Even browsing the database in pgAdmin3 is horribly slow -- it
takes several seconds to e.g. open a schema (click on a schema's "+" in
the tree view).

The problems occurred only after adding that many schemas to the
database. Before, with only 10+ schemas, the performance was acceptable.

Is this a known limitation of Postgres, or perhaps a misconfiguration of
our installation? What would you suggest to improve performance here? We
currently don't have administration rights for the database or login
rights for the server machine (Linux), but I think we'll need to take
care about that.


Best regards

Kirill

Re: Enumeration of tables is very slow in largish database

From

Guillaume Lelarge

Date:

11 January 2012, 09:41:58

On Wed, 2012-01-11 at 11:07 +0100, Kirill Müller wrote:
> Hi all,
>
> we have a Postgres/PostGIS database with 100+ schemas and 200+ tables in
> each schema, generated automatically. When adding a new PostGIS layer in
> QGis, the application obviously enumerates all tables, and this takes
> minutes. Even browsing the database in pgAdmin3 is horribly slow -- it
> takes several seconds to e.g. open a schema (click on a schema's "+" in
> the tree view).
>
> The problems occurred only after adding that many schemas to the
> database. Before, with only 10+ schemas, the performance was acceptable.
>
> Is this a known limitation of Postgres, or perhaps a misconfiguration of
> our installation?

I don't think it has anything to do with PostgreSQL. It has to do with
the client. I don't know for QGis but, in the case of pgAdmin, I'm
pretty sure the issue is pgAdmin. When you click on the + sign of a
schema, pgAdmin has to get all the informations on the schema: tables,
columns, constraints, triggers, and all the other objets found in the
schema. It could take some time. Moreover, pgAdmin has to put all this
in the treeview, and my guess would be that it's probably the most time
consuming operation here. We could probably get better performance, but
I didn't have the time to look at that yet.

--
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com
PostgreSQL Sessions #3: http://www.postgresql-sessions.org

Re: Enumeration of tables is very slow in largish database

From

Andres Freund

Date:

11 January 2012, 09:44:51

On Wednesday, January 11, 2012 11:07:23 AM Kirill Müller wrote:
> Hi all,
>
> we have a Postgres/PostGIS database with 100+ schemas and 200+ tables in
> each schema, generated automatically. When adding a new PostGIS layer in
> QGis, the application obviously enumerates all tables, and this takes
> minutes. Even browsing the database in pgAdmin3 is horribly slow -- it
> takes several seconds to e.g. open a schema (click on a schema's "+" in
> the tree view).
Are you actually sure its the database and not just pgadmin thats getting
really slow?

If you connect via psql and use \dt (see \? for a list of commands) and
consorts, is it that slow as well?

Andres

Re: Enumeration of tables is very slow in largish database

From

Guillaume Lelarge

Date:

11 January 2012, 09:53:27

On Wed, 2012-01-11 at 14:44 +0100, Andres Freund wrote:
> On Wednesday, January 11, 2012 11:07:23 AM Kirill Müller wrote:
> > Hi all,
> >
> > we have a Postgres/PostGIS database with 100+ schemas and 200+ tables in
> > each schema, generated automatically. When adding a new PostGIS layer in
> > QGis, the application obviously enumerates all tables, and this takes
> > minutes. Even browsing the database in pgAdmin3 is horribly slow -- it
> > takes several seconds to e.g. open a schema (click on a schema's "+" in
> > the tree view).
> Are you actually sure its the database and not just pgadmin thats getting
> really slow?
>
> If you connect via psql and use \dt (see \? for a list of commands) and
> consorts, is it that slow as well?
>

\dt won't be as slow as pgAdmin. \dt only gets the table name, owner,
and stuff like that. Kinda quick. pgAdmin will get also all the other
informations, like columns, triggers, constraints, functions, types,
etc.


--
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com
PostgreSQL Sessions #3: http://www.postgresql-sessions.org

Re: Enumeration of tables is very slow in largish database

From

Andres Freund

Date:

11 January 2012, 10:10:53

On Wednesday, January 11, 2012 02:53:06 PM Guillaume Lelarge wrote:
> On Wed, 2012-01-11 at 14:44 +0100, Andres Freund wrote:
> > On Wednesday, January 11, 2012 11:07:23 AM Kirill Müller wrote:
> > > Hi all,
> > >
> > > we have a Postgres/PostGIS database with 100+ schemas and 200+ tables
> > > in each schema, generated automatically. When adding a new PostGIS
> > > layer in QGis, the application obviously enumerates all tables, and
> > > this takes minutes. Even browsing the database in pgAdmin3 is horribly
> > > slow -- it takes several seconds to e.g. open a schema (click on a
> > > schema's "+" in the tree view).
> >
> > Are you actually sure its the database and not just pgadmin thats getting
> > really slow?
> >
> > If you connect via psql and use \dt (see \? for a list of commands) and
> > consorts, is it that slow as well?
>
> \dt won't be as slow as pgAdmin. \dt only gets the table name, owner,
> and stuff like that. Kinda quick. pgAdmin will get also all the other
> informations, like columns, triggers, constraints, functions, types,
> etc.
Yes, sure. My guess is that the gui/pgadmin is the bottleneck and not postgres
itself. Its hard to really do all what pgadmin does at once inside psql
though.

Andres

Re: Enumeration of tables is very slow in largish database

From

Guillaume Lelarge

Date:

11 January 2012, 10:14:33

On Wed, 2012-01-11 at 15:10 +0100, Andres Freund wrote:
> On Wednesday, January 11, 2012 02:53:06 PM Guillaume Lelarge wrote:
> > On Wed, 2012-01-11 at 14:44 +0100, Andres Freund wrote:
> > > On Wednesday, January 11, 2012 11:07:23 AM Kirill Müller wrote:
> > > > Hi all,
> > > >
> > > > we have a Postgres/PostGIS database with 100+ schemas and 200+ tables
> > > > in each schema, generated automatically. When adding a new PostGIS
> > > > layer in QGis, the application obviously enumerates all tables, and
> > > > this takes minutes. Even browsing the database in pgAdmin3 is horribly
> > > > slow -- it takes several seconds to e.g. open a schema (click on a
> > > > schema's "+" in the tree view).
> > >
> > > Are you actually sure its the database and not just pgadmin thats getting
> > > really slow?
> > >
> > > If you connect via psql and use \dt (see \? for a list of commands) and
> > > consorts, is it that slow as well?
> >
> > \dt won't be as slow as pgAdmin. \dt only gets the table name, owner,
> > and stuff like that. Kinda quick. pgAdmin will get also all the other
> > informations, like columns, triggers, constraints, functions, types,
> > etc.
> Yes, sure. My guess is that the gui/pgadmin is the bottleneck and not postgres
> itself. Its hard to really do all what pgadmin does at once inside psql
> though.
>

Yeah, sure enough.


--
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com
PostgreSQL Sessions #3: http://www.postgresql-sessions.org

Re: Enumeration of tables is very slow in largish database

From

Adrian Klaver

Date:

11 January 2012, 11:32:28

On Wednesday, January 11, 2012 2:07:23 am Kirill Müller wrote:
> Hi all,
>
> we have a Postgres/PostGIS database with 100+ schemas and 200+ tables in
> each schema, generated automatically. When adding a new PostGIS layer in
> QGis, the application obviously enumerates all tables, and this takes
> minutes. Even browsing the database in pgAdmin3 is horribly slow -- it
> takes several seconds to e.g. open a schema (click on a schema's "+" in
> the tree view).

Would seem that they both have the same issue, namely pulling over the table
names and the meta data is resource intensive.

Not a QGis user but I did find this in the manual:

http://download.osgeo.org/qgis/doc/manual/qgis-1.7.0_user_guide_en.pdf

See in-line comment:

4.2.2. Loading a PostGIS Layer

Once you have one or more connections deﬁned, you can load layers from the
PostgreSQL database.
Of course this requires having data in PostgreSQL. See Section 4.2.4 for a
discussion on importing data into
the database.
To load a layer from PostGIS, perform the following steps:
– If the Add PostGIS Table(s) dialog is not already open, click on the Add
PostGIS Layer toolbar button.
– Choose the connection from the drop-down list and click Connect .
– Select or unselect Also list tables with no geometry
– Optionally use some Search Options to deﬁne which features to load from the
layer or use the
Build query icon to start the Query builder dialog.

^^^^^^^^^^^^^^^
Wonder if it would be possible to restrict the dataset(tables) by using the
above?

– Find the layer(s) you wish to add in the list of available layers.
– Select it by clicking on it. You can select multiple layers by holding down
the shift key while clicking.
See Section 4.6 for information on using the PostgreSQL Query Builder to further
deﬁne the layer.
– Click on the Add button to add the layer to the map.

>
> The problems occurred only after adding that many schemas to the
> database. Before, with only 10+ schemas, the performance was acceptable.
>
> Is this a known limitation of Postgres, or perhaps a misconfiguration of
> our installation? What would you suggest to improve performance here? We
> currently don't have administration rights for the database or login
> rights for the server machine (Linux), but I think we'll need to take
> care about that.
>
>
> Best regards
>
> Kirill

--
Adrian Klaver
adrian.klaver@gmail.com

Re: Enumeration of tables is very slow in largish database

From

Scott Marlowe

Date:

11 January 2012, 14:01:26

On Wed, Jan 11, 2012 at 3:07 AM, Kirill Müller
<kirill.mueller@ivt.baug.ethz.ch> wrote:
> Hi all,
>
> we have a Postgres/PostGIS database with 100+ schemas and 200+ tables in
> each schema, generated automatically. When adding a new PostGIS layer in
> QGis, the application obviously enumerates all tables, and this takes
> minutes. Even browsing the database in pgAdmin3 is horribly slow -- it takes
> several seconds to e.g. open a schema (click on a schema's "+" in the tree
> view).
>
> The problems occurred only after adding that many schemas to the database.
> Before, with only 10+ schemas, the performance was acceptable.
>
> Is this a known limitation of Postgres, or perhaps a misconfiguration of our
> installation? What would you suggest to improve performance here? We
> currently don't have administration rights for the database or login rights
> for the server machine (Linux), but I think we'll need to take care about
> that.

This is a problem I've run into before, but I can't find the previous
post on it.  When you run a \d command, if you run top on your server
do you see a single CPU spinning hard on that one command?  If so then
it's a pg server side problem, which is what I had on one server with
~40k objects in it.

Off the top of my head I remember something like this helping:

alter function pg_table_is_visible cost 10;

But I'm not sure that's it.  Maybe Tom Lane can pipe up on this.

Re: Enumeration of tables is very slow in largish database

From

Kirill Müller

Date:

11 January 2012, 15:45:40

Adrian,


On 01/11/2012 04:32 PM, Adrian Klaver wrote:
> Would seem that they both have the same issue, namely pulling over the table
> names and the meta data is resource intensive.
The problem is that the slow part is the "connect", just at the ***** in
the in-line comment :-)
> 4.2.2. Loading a PostGIS Layer
>
> Once you have one or more connections deﬁned, you can load layers from the
> PostgreSQL database.
> Of course this requires having data in PostgreSQL. See Section 4.2.4 for a
> discussion on importing data into
> the database.
> To load a layer from PostGIS, perform the following steps:
> – If the Add PostGIS Table(s) dialog is not already open, click on the Add
> PostGIS Layer toolbar button.
> – Choose the connection from the drop-down list and click Connect .
*******
> – Select or unselect Also list tables with no geometry
> – Optionally use some Search Options to deﬁne which features to load from the
> layer or use the
> Build query icon to start the Query builder dialog.
>
> ^^^^^^^^^^^^^^^
> Wonder if it would be possible to restrict the dataset(tables) by using the
> above?
Thanks for your feedback anyway, this helps clarifying the issue.


Cheers

Kirill

--
_________________________________________________
ETH Zürich
Institute for Transport Planning and Systems
HIL F 32.2
Wolfgang-Pauli-Str. 15
8093 Zürich

Phone:       +41 44 633 33 17
Fax:         +41 44 633 10 57
Secretariat: +41 44 633 31 05
E-Mail:      kirill.mueller@ivt.baug.ethz.ch

Re: Enumeration of tables is very slow in largish database

From

Kirill Müller

Date:

11 January 2012, 15:50:28

On 01/11/2012 02:44 PM, Andres Freund wrote:
> On Wednesday, January 11, 2012 11:07:23 AM Kirill Müller wrote:
>> Hi all,
>>
>> we have a Postgres/PostGIS database with 100+ schemas and 200+ tables in
>> each schema, generated automatically. When adding a new PostGIS layer in
>> QGis, the application obviously enumerates all tables, and this takes
>> minutes. Even browsing the database in pgAdmin3 is horribly slow -- it
>> takes several seconds to e.g. open a schema (click on a schema's "+" in
>> the tree view).
> Are you actually sure its the database and not just pgadmin thats getting
> really slow?
>
> If you connect via psql and use \dt (see \? for a list of commands) and
> consorts, is it that slow as well?
\dt is quick, I haven't tried the other commands, though. I have built
qgis from source and will try to find out this way which query stalls. I
wonder if Postgres has a profiling tool like MS SQL Server that would
allow tracing the queries and their runtime while they are executed. Or
perhaps there are logs? Could you give me some pointers, please?


Kirill

--
_________________________________________________
ETH Zürich
Institute for Transport Planning and Systems
HIL F 32.2
Wolfgang-Pauli-Str. 15
8093 Zürich

Phone:       +41 44 633 33 17
Fax:         +41 44 633 10 57
Secretariat: +41 44 633 31 05
E-Mail:      kirill.mueller@ivt.baug.ethz.ch

Re: Enumeration of tables is very slow in largish database

From

Pavel Stehule

Date:

11 January 2012, 15:59:39

2012/1/11 Kirill Müller <kirill.mueller@ivt.baug.ethz.ch>:
> On 01/11/2012 02:44 PM, Andres Freund wrote:
>>
>> On Wednesday, January 11, 2012 11:07:23 AM Kirill Müller wrote:
>>>
>>> Hi all,
>>>
>>> we have a Postgres/PostGIS database with 100+ schemas and 200+ tables in
>>> each schema, generated automatically. When adding a new PostGIS layer in
>>> QGis, the application obviously enumerates all tables, and this takes
>>> minutes. Even browsing the database in pgAdmin3 is horribly slow -- it
>>> takes several seconds to e.g. open a schema (click on a schema's "+" in
>>> the tree view).
>>
>> Are you actually sure its the database and not just pgadmin thats getting
>> really slow?
>>
>> If you connect via psql and use \dt (see \? for a list of commands) and
>> consorts, is it that slow as well?
>
> \dt is quick, I haven't tried the other commands, though. I have built qgis
> from source and will try to find out this way which query stalls. I wonder
> if Postgres has a profiling tool like MS SQL Server that would allow tracing
> the queries and their runtime while they are executed. Or perhaps there are
> logs? Could you give me some pointers, please?

log_min_duration_statement = 0 in postgresql.conf and after reload, pg
logs all query to log

Regards

Pavel

>
>
>
> Kirill
>
> --
> _________________________________________________
> ETH Zürich
> Institute for Transport Planning and Systems
> HIL F 32.2
> Wolfgang-Pauli-Str. 15
> 8093 Zürich
>
> Phone:       +41 44 633 33 17
> Fax:         +41 44 633 10 57
> Secretariat: +41 44 633 31 05
> E-Mail:      kirill.mueller@ivt.baug.ethz.ch
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

Re: Enumeration of tables is very slow in largish database

From

Adrian Klaver

Date:

11 January 2012, 16:37:12

On 01/11/2012 11:45 AM, Kirill Müller wrote:
> Adrian,
>
>
> On 01/11/2012 04:32 PM, Adrian Klaver wrote:
>> Would seem that they both have the same issue, namely pulling over the
>> table
>> names and the meta data is resource intensive.
> The problem is that the slow part is the "connect", just at the ***** in
> the in-line comment :-)

More digging:
Looks like QGIS use the geometry_columns table first to determine the
geometry aware tables, failing that it walks the tables looking for
geometry columns. Is the geometry_columns table up to date with the
actual tables?

> Cheers
>
> Kirill
>

--
Adrian Klaver
adrian.klaver@gmail.com

Re: Enumeration of tables is very slow in largish database

From

Reid Thompson

Date:

11 January 2012, 17:33:32

On Wed, 2012-01-11 at 20:50 +0100, Kirill Müller wrote:
> that would 
> allow tracing the queries and their runtime while they are executed \

http://www.postgresql.org/docs/8.4/static/auto-explain.html

Re: Enumeration of tables is very slow in largish database

From

Kirill Müller

Date:

11 January 2012, 17:36:35

On 01/11/2012 09:36 PM, Adrian Klaver wrote:
> On 01/11/2012 11:45 AM, Kirill Müller wrote:
>> On 01/11/2012 04:32 PM, Adrian Klaver wrote:
>>> Would seem that they both have the same issue, namely pulling over the
>>> table
>>> names and the meta data is resource intensive.
>> The problem is that the slow part is the "connect"...
> More digging:
> Looks like QGIS use the geometry_columns table first to determine the
> geometry aware tables, failing that it walks the tables looking for
> geometry columns. Is the geometry_columns table up to date with the
> actual tables?
Thank you for the tip. In the current version of QGIS, I can select
"only look in the geometry_columns table" as an option. For 40000+
entries in this table, it took just some seconds to enumerate all
tables. This workaround solves the most urgent problems (I'm currently
rebuilding the geometry_columns table).

Kirill

--
_________________________________________________
ETH Zürich
Institute for Transport Planning and Systems
HIL F 32.2
Wolfgang-Pauli-Str. 15
8093 Zürich

Phone:       +41 44 633 33 17
Fax:         +41 44 633 10 57
Secretariat: +41 44 633 31 05
E-Mail:      kirill.mueller@ivt.baug.ethz.ch

Re: Enumeration of tables is very slow in largish database

From

Kirill Müller

Date:

11 January 2012, 19:28:16

On 01/11/2012 07:00 PM, Scott Marlowe wrote:
> This is a problem I've run into before, but I can't find the previous
> post on it.  When you run a \d command, if you run top on your server
> do you see a single CPU spinning hard on that one command?  If so then
> it's a pg server side problem, which is what I had on one server with
> ~40k objects in it.
>
> Off the top of my head I remember something like this helping:
>
> alter function pg_table_is_visible cost 10;
Thanks for the feedback. I found the relevant parts in the qgis source
code and have been able to trace the problem. It's just a sub-optimal
query issued by qgis:

SELECT
pg_class.relname,pg_namespace.nspname,pg_attribute.attname,pg_class.relkind
FROM pg_attribute,pg_class,pg_namespace
WHERE pg_namespace.oid=pg_class.relnamespace AND pg_attribute.attrelid =
pg_class.oid
AND ( EXISTS (SELECT * FROM pg_type WHERE
pg_type.oid=pg_attribute.atttypid AND pg_type.typname IN
('geometry','geography','topogeometry')) OR pg_attribute.atttypid IN
(SELECT oid FROM pg_type a WHERE EXISTS (SELECT * FROM pg_type b WHERE
a.typbasetype=b.oid AND b.typname IN
('geometry','geography','topogeometry'))))
AND has_schema_privilege( pg_namespace.nspname, 'usage' )
AND has_table_privilege( '"' || pg_namespace.nspname || '"."' ||
pg_class.relname || '"', 'select' )
AND NOT EXISTS (SELECT * FROM geometry_columns WHERE
pg_namespace.nspname=f_table_schema AND pg_class.relname=f_table_name)
AND pg_class.relkind IN ('v','r');

When leaving out the last two "AND NOT EXISTS..." parts, the query
finishes in no time. I have attached the output of EXPLAIN ANALYZE -- if
I understand the execution tree correctly, the time is burnt in repeated
sequential scans of the geometry_columns table (line 38).? Rewriting the
"AND NOT EXISTS" part using WITH solves the performance issues here, but
works only from Postgres 8.4. Any idea how to speed up this query for
older versions? (Creating a temporary table or an index should be avoided.)

Kirill

Attachment

qgis_list_geometry_columns.txt

Re: Enumeration of tables is very slow in largish database

From

"David Johnston"

Date:

11 January 2012, 19:44:15

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Kirill Müller
Sent: Wednesday, January 11, 2012 6:28 PM
To: pgsql-general@postgresql.org
Cc: Scott Marlowe
Subject: Re: [GENERAL] Enumeration of tables is very slow in largish
database

On 01/11/2012 07:00 PM, Scott Marlowe wrote:
> This is a problem I've run into before, but I can't find the previous
> post on it.  When you run a \d command, if you run top on your server
> do you see a single CPU spinning hard on that one command?  If so then
> it's a pg server side problem, which is what I had on one server with
> ~40k objects in it.
>
> Off the top of my head I remember something like this helping:
>
> alter function pg_table_is_visible cost 10;
Thanks for the feedback. I found the relevant parts in the qgis source code
and have been able to trace the problem. It's just a sub-optimal query
issued by qgis:

SELECT
pg_class.relname,pg_namespace.nspname,pg_attribute.attname,pg_class.relkind
FROM pg_attribute,pg_class,pg_namespace
WHERE pg_namespace.oid=pg_class.relnamespace AND pg_attribute.attrelid =
pg_class.oid AND ( EXISTS (SELECT * FROM pg_type WHERE
pg_type.oid=pg_attribute.atttypid AND pg_type.typname IN
('geometry','geography','topogeometry')) OR pg_attribute.atttypid IN (SELECT
oid FROM pg_type a WHERE EXISTS (SELECT * FROM pg_type b WHERE
a.typbasetype=b.oid AND b.typname IN
('geometry','geography','topogeometry'))))
AND has_schema_privilege( pg_namespace.nspname, 'usage' ) AND
has_table_privilege( '"' || pg_namespace.nspname || '"."' ||
pg_class.relname || '"', 'select' ) AND NOT EXISTS (SELECT * FROM
geometry_columns WHERE pg_namespace.nspname=f_table_schema AND
pg_class.relname=f_table_name) AND pg_class.relkind IN ('v','r');

When leaving out the last two "AND NOT EXISTS..." parts, the query finishes
in no time. I have attached the output of EXPLAIN ANALYZE -- if I understand
the execution tree correctly, the time is burnt in repeated sequential scans
of the geometry_columns table (line 38).? Rewriting the "AND NOT EXISTS"
part using WITH solves the performance issues here, but works only from
Postgres 8.4. Any idea how to speed up this query for older versions?
(Creating a temporary table or an index should be avoided.)

Kirill

----------------------------------------------------------------------------
---------------------

I only see one (1) "AND NOT EXISTS" in the provided query.

Syntax may be a little off but:

... AND (f_table_schema, f_table_name) NOT IN (  SELECT (nspname, relname)
FROM geometry_columns  )

Should work since it is no longer a correlated sub-query; whether the size
of geometry_columns makes this better or worse performing is impossible to
tell without testing but it isn't that much different than using a WITH/CTE.

David J.

Re: Enumeration of tables is very slow in largish database

From

Kirill Müller

Date:

11 January 2012, 20:11:47

I am impressed. Execution time dropped to less than one second. Thanks a
lot!

On 01/12/2012 12:43 AM, David Johnston wrote:
> I only see one (1) "AND NOT EXISTS" in the provided query.
Sorry, there used to be two "AND NOT EXISTS", but I edited the query
without updating the text.
> Syntax may be a little off but:
>
> ... AND (f_table_schema, f_table_name) NOT IN (  SELECT (nspname, relname)
> FROM geometry_columns  )
Just for the record:

...AND (nspname, relname) NOT IN (SELECT f_table_schema, f_table_name
FROM geometry_columns)
> Should work since it is no longer a correlated sub-query; whether the size
> of geometry_columns makes this better or worse performing is impossible to
> tell without testing but it isn't that much different than using a WITH/CTE.
The execution plan looks much nicer (attached). I'd guess that the
altered query might lose a bit if geometry_columns has only very few
entries.? Apparently it gains a lot if the table is populated.

Thanks again!


Kirill

Attachment

qgis_list_geometry_columns_improved.txt

Re: Enumeration of tables is very slow in largish database

From

"David Johnston"

Date:

11 January 2012, 20:19:24

-----Original Message-----
From: Kirill Müller [mailto:kirill.mueller@ivt.baug.ethz.ch]
Sent: Wednesday, January 11, 2012 7:11 PM
To: David Johnston
Cc: pgsql-general@postgresql.org; 'Scott Marlowe'
Subject: Re: [GENERAL] Enumeration of tables is very slow in largish
database

I am impressed. Execution time dropped to less than one second. Thanks a
lot!

On 01/12/2012 12:43 AM, David Johnston wrote:
> I only see one (1) "AND NOT EXISTS" in the provided query.
Sorry, there used to be two "AND NOT EXISTS", but I edited the query without
updating the text.
> Syntax may be a little off but:
>
> ... AND (f_table_schema, f_table_name) NOT IN (  SELECT (nspname,
> relname) FROM geometry_columns  )
Just for the record:

...AND (nspname, relname) NOT IN (SELECT f_table_schema, f_table_name FROM
geometry_columns)
> Should work since it is no longer a correlated sub-query; whether the
> size of geometry_columns makes this better or worse performing is
> impossible to tell without testing but it isn't that much different than
using a WITH/CTE.
The execution plan looks much nicer (attached). I'd guess that the altered
query might lose a bit if geometry_columns has only very few entries.?
Apparently it gains a lot if the table is populated.

Thanks again!

Kirill

--------------------------------------------------------------------

Performance for IN should increase as the results from geometry_columns
decrease since the IN target becomes smaller - thus fewer entries to compare
against.  EXISTS works better than IN if the IN target is large AS LONG AS
the query that exists is using can use an Index.  Since your query was
performing a sequential scan pretty much any size IN target will be better
performing.  For small IN targets and index-using EXISTS it likely matters
very little which one you use.

David J.

Re: Enumeration of tables is very slow in largish database

From

Tom Lane

Date:

11 January 2012, 20:34:56

=?ISO-8859-1?Q?Kirill_M=FCller?= <kirill.mueller@ivt.baug.ethz.ch> writes:
> Thanks for the feedback. I found the relevant parts in the qgis source
> code and have been able to trace the problem. It's just a sub-optimal
> query issued by qgis:

> SELECT
> pg_class.relname,pg_namespace.nspname,pg_attribute.attname,pg_class.relkind
> FROM pg_attribute,pg_class,pg_namespace
> WHERE pg_namespace.oid=pg_class.relnamespace AND pg_attribute.attrelid =
> pg_class.oid
> AND ( EXISTS (SELECT * FROM pg_type WHERE
> pg_type.oid=pg_attribute.atttypid AND pg_type.typname IN
> ('geometry','geography','topogeometry')) OR pg_attribute.atttypid IN
> (SELECT oid FROM pg_type a WHERE EXISTS (SELECT * FROM pg_type b WHERE
> a.typbasetype=b.oid AND b.typname IN
> ('geometry','geography','topogeometry'))))
> AND has_schema_privilege( pg_namespace.nspname, 'usage' )
> AND has_table_privilege( '"' || pg_namespace.nspname || '"."' ||
> pg_class.relname || '"', 'select' )
> AND NOT EXISTS (SELECT * FROM geometry_columns WHERE
> pg_namespace.nspname=f_table_schema AND pg_class.relname=f_table_name)
> AND pg_class.relkind IN ('v','r');

> When leaving out the last two "AND NOT EXISTS..." parts, the query
> finishes in no time. I have attached the output of EXPLAIN ANALYZE -- if
> I understand the execution tree correctly, the time is burnt in repeated
> sequential scans of the geometry_columns table (line 38).

Yeah.  It wouldn't choose that plan if it weren't for the horrid rowcount
misestimate here:

>          ->  Hash Anti Join  (cost=30586.95..37075.41 rows=1 width=133) (actual time=945.911..1760.307 rows=17836
loops=1)

This is probably an indication of eqjoinsel_semi doing the wrong thing;
we've whacked that estimator around a few times now, so it's hard to
know whether this represents an already-fixed bug or not.  What PG
version are you using exactly?

> Rewriting the
> "AND NOT EXISTS" part using WITH solves the performance issues here, but
> works only from Postgres 8.4. Any idea how to speed up this query for
> older versions? (Creating a temporary table or an index should be avoided.)

Maybe use EXCEPT instead of a WHERE condition to get rid of the
already-present entries?

            regards, tom lane

Re: Enumeration of tables is very slow in largish database

From

Kirill Müller

Date:

11 January 2012, 20:46:16

On 01/12/2012 01:34 AM, Tom Lane wrote:
> =?ISO-8859-1?Q?Kirill_M=FCller?=<kirill.mueller@ivt.baug.ethz.ch>  writes:
>> When leaving out the last two "AND NOT EXISTS..." parts, the query
>> finishes in no time. I have attached the output of EXPLAIN ANALYZE -- if
>> I understand the execution tree correctly, the time is burnt in repeated
>> sequential scans of the geometry_columns table (line 38).
> Yeah.  It wouldn't choose that plan if it weren't for the horrid rowcount
> misestimate here:
>
>>           ->   Hash Anti Join  (cost=30586.95..37075.41 rows=1 width=133) (actual time=945.911..1760.307 rows=17836
loops=1)

I have VACUUM ANALYZE-d the table just before testing. Seems that this
didn't help here.
>
> This is probably an indication of eqjoinsel_semi doing the wrong thing;
> we've whacked that estimator around a few times now, so it's hard to
> know whether this represents an already-fixed bug or not.  What PG
> version are you using exactly?
muelleki@xxx:~$ psql
psql (8.4.8)
>
> Maybe use EXCEPT instead of a WHERE condition to get rid of the
> already-present entries?
Thank you for the hint, I was not aware of the rather elegant EXCEPT.
Using WHERE (..., ...) NOT IN (SELECT ..., ... FROM ...) as suggested by
David Johnston shows excellent performance (and better fits the code
that is generating the SQL), but I'll keep the EXCEPT option in mind.


Regards

Kirill

Re: Enumeration of tables is very slow in largish database

From

Tom Lane

Date:

11 January 2012, 21:02:29

=?ISO-8859-1?Q?Kirill_M=FCller?= <kirill.mueller@ivt.baug.ethz.ch> writes:
> On 01/12/2012 01:34 AM, Tom Lane wrote:
>> This is probably an indication of eqjoinsel_semi doing the wrong thing;
>> we've whacked that estimator around a few times now, so it's hard to
>> know whether this represents an already-fixed bug or not.  What PG
>> version are you using exactly?

> muelleki@xxx:~$ psql
> psql (8.4.8)

There were fixes for that in 8.4.9, so I'd be interested to know if you
get a better estimate in a more up-to-date version.

            regards, tom lane

Re: Enumeration of tables is very slow in largish database

From

Kirill Müller

Date:

11 January 2012, 21:08:49

On 01/12/2012 02:02 AM, Tom Lane wrote:
> There were fixes for that in 8.4.9, so I'd be interested to know if you
> get a better estimate in a more up-to-date version.
>
I'll ask our administrator to update the server, but this might take a
while. I'll get back to you when I know more.


Regards

Kirill

Re: Enumeration of tables is very slow in largish database

From

Kirill Müller

Date:

12 January 2012, 17:07:47

On 01/12/2012 02:02 AM, Tom Lane wrote:
>
> There were fixes for that in 8.4.9, so I'd be interested to know if you
> get a better estimate in a more up-to-date version.

Something weird happened today. The problem vanished into thin air. Plus
we got our server upgrade to 8.4.9, but definitely after the problem
disappeared. Unfortunately, I didn't have the time to capture an
execution plan before the upgrade. The database has changed during the
day, some hundred more tables with geometry columns were added.

Anyway, I have attached the new execution plan, as produced by server
8.4.9 (squeeze/updates/main postgresql-8.4 amd64 8.4.9-0squeeze1+b1).

Cheers

Kirill

Attachment

qgis_list_geometry_columns_8.4.9.txt

Re: Enumeration of tables is very slow in largish database

From

Tom Lane

Date:

12 January 2012, 18:10:37

=?ISO-8859-1?Q?Kirill_M=FCller?= <kirill.mueller@ivt.baug.ethz.ch> writes:
> On 01/12/2012 02:02 AM, Tom Lane wrote:
>> There were fixes for that in 8.4.9, so I'd be interested to know if you
>> get a better estimate in a more up-to-date version.

> Something weird happened today. The problem vanished into thin air. Plus
> we got our server upgrade to 8.4.9, but definitely after the problem
> disappeared. Unfortunately, I didn't have the time to capture an
> execution plan before the upgrade. The database has changed during the
> day, some hundred more tables with geometry columns were added.

> Anyway, I have attached the new execution plan, as produced by server
> 8.4.9 (squeeze/updates/main postgresql-8.4 amd64 8.4.9-0squeeze1+b1).

This doesn't seem to be for the same query --- I only see one antijoin
in this plan, whereas the plan you posted at
http://archives.postgresql.org/pgsql-general/2012-01/msg00303.php
has two (and therefore doesn't match the query given there, either).

            regards, tom lane