Thread: Enumeration of tables is very slow in largish database
Hi all, we have a Postgres/PostGIS database with 100+ schemas and 200+ tables in each schema, generated automatically. When adding a new PostGIS layer in QGis, the application obviously enumerates all tables, and this takes minutes. Even browsing the database in pgAdmin3 is horribly slow -- it takes several seconds to e.g. open a schema (click on a schema's "+" in the tree view). The problems occurred only after adding that many schemas to the database. Before, with only 10+ schemas, the performance was acceptable. Is this a known limitation of Postgres, or perhaps a misconfiguration of our installation? What would you suggest to improve performance here? We currently don't have administration rights for the database or login rights for the server machine (Linux), but I think we'll need to take care about that. Best regards Kirill
On Wed, 2012-01-11 at 11:07 +0100, Kirill Müller wrote: > Hi all, > > we have a Postgres/PostGIS database with 100+ schemas and 200+ tables in > each schema, generated automatically. When adding a new PostGIS layer in > QGis, the application obviously enumerates all tables, and this takes > minutes. Even browsing the database in pgAdmin3 is horribly slow -- it > takes several seconds to e.g. open a schema (click on a schema's "+" in > the tree view). > > The problems occurred only after adding that many schemas to the > database. Before, with only 10+ schemas, the performance was acceptable. > > Is this a known limitation of Postgres, or perhaps a misconfiguration of > our installation? I don't think it has anything to do with PostgreSQL. It has to do with the client. I don't know for QGis but, in the case of pgAdmin, I'm pretty sure the issue is pgAdmin. When you click on the + sign of a schema, pgAdmin has to get all the informations on the schema: tables, columns, constraints, triggers, and all the other objets found in the schema. It could take some time. Moreover, pgAdmin has to put all this in the treeview, and my guess would be that it's probably the most time consuming operation here. We could probably get better performance, but I didn't have the time to look at that yet. -- Guillaume http://blog.guillaume.lelarge.info http://www.dalibo.com PostgreSQL Sessions #3: http://www.postgresql-sessions.org
On Wednesday, January 11, 2012 11:07:23 AM Kirill Müller wrote: > Hi all, > > we have a Postgres/PostGIS database with 100+ schemas and 200+ tables in > each schema, generated automatically. When adding a new PostGIS layer in > QGis, the application obviously enumerates all tables, and this takes > minutes. Even browsing the database in pgAdmin3 is horribly slow -- it > takes several seconds to e.g. open a schema (click on a schema's "+" in > the tree view). Are you actually sure its the database and not just pgadmin thats getting really slow? If you connect via psql and use \dt (see \? for a list of commands) and consorts, is it that slow as well? Andres
On Wed, 2012-01-11 at 14:44 +0100, Andres Freund wrote: > On Wednesday, January 11, 2012 11:07:23 AM Kirill Müller wrote: > > Hi all, > > > > we have a Postgres/PostGIS database with 100+ schemas and 200+ tables in > > each schema, generated automatically. When adding a new PostGIS layer in > > QGis, the application obviously enumerates all tables, and this takes > > minutes. Even browsing the database in pgAdmin3 is horribly slow -- it > > takes several seconds to e.g. open a schema (click on a schema's "+" in > > the tree view). > Are you actually sure its the database and not just pgadmin thats getting > really slow? > > If you connect via psql and use \dt (see \? for a list of commands) and > consorts, is it that slow as well? > \dt won't be as slow as pgAdmin. \dt only gets the table name, owner, and stuff like that. Kinda quick. pgAdmin will get also all the other informations, like columns, triggers, constraints, functions, types, etc. -- Guillaume http://blog.guillaume.lelarge.info http://www.dalibo.com PostgreSQL Sessions #3: http://www.postgresql-sessions.org
On Wednesday, January 11, 2012 02:53:06 PM Guillaume Lelarge wrote: > On Wed, 2012-01-11 at 14:44 +0100, Andres Freund wrote: > > On Wednesday, January 11, 2012 11:07:23 AM Kirill Müller wrote: > > > Hi all, > > > > > > we have a Postgres/PostGIS database with 100+ schemas and 200+ tables > > > in each schema, generated automatically. When adding a new PostGIS > > > layer in QGis, the application obviously enumerates all tables, and > > > this takes minutes. Even browsing the database in pgAdmin3 is horribly > > > slow -- it takes several seconds to e.g. open a schema (click on a > > > schema's "+" in the tree view). > > > > Are you actually sure its the database and not just pgadmin thats getting > > really slow? > > > > If you connect via psql and use \dt (see \? for a list of commands) and > > consorts, is it that slow as well? > > \dt won't be as slow as pgAdmin. \dt only gets the table name, owner, > and stuff like that. Kinda quick. pgAdmin will get also all the other > informations, like columns, triggers, constraints, functions, types, > etc. Yes, sure. My guess is that the gui/pgadmin is the bottleneck and not postgres itself. Its hard to really do all what pgadmin does at once inside psql though. Andres
On Wed, 2012-01-11 at 15:10 +0100, Andres Freund wrote: > On Wednesday, January 11, 2012 02:53:06 PM Guillaume Lelarge wrote: > > On Wed, 2012-01-11 at 14:44 +0100, Andres Freund wrote: > > > On Wednesday, January 11, 2012 11:07:23 AM Kirill Müller wrote: > > > > Hi all, > > > > > > > > we have a Postgres/PostGIS database with 100+ schemas and 200+ tables > > > > in each schema, generated automatically. When adding a new PostGIS > > > > layer in QGis, the application obviously enumerates all tables, and > > > > this takes minutes. Even browsing the database in pgAdmin3 is horribly > > > > slow -- it takes several seconds to e.g. open a schema (click on a > > > > schema's "+" in the tree view). > > > > > > Are you actually sure its the database and not just pgadmin thats getting > > > really slow? > > > > > > If you connect via psql and use \dt (see \? for a list of commands) and > > > consorts, is it that slow as well? > > > > \dt won't be as slow as pgAdmin. \dt only gets the table name, owner, > > and stuff like that. Kinda quick. pgAdmin will get also all the other > > informations, like columns, triggers, constraints, functions, types, > > etc. > Yes, sure. My guess is that the gui/pgadmin is the bottleneck and not postgres > itself. Its hard to really do all what pgadmin does at once inside psql > though. > Yeah, sure enough. -- Guillaume http://blog.guillaume.lelarge.info http://www.dalibo.com PostgreSQL Sessions #3: http://www.postgresql-sessions.org
On Wednesday, January 11, 2012 2:07:23 am Kirill Müller wrote: > Hi all, > > we have a Postgres/PostGIS database with 100+ schemas and 200+ tables in > each schema, generated automatically. When adding a new PostGIS layer in > QGis, the application obviously enumerates all tables, and this takes > minutes. Even browsing the database in pgAdmin3 is horribly slow -- it > takes several seconds to e.g. open a schema (click on a schema's "+" in > the tree view). Would seem that they both have the same issue, namely pulling over the table names and the meta data is resource intensive. Not a QGis user but I did find this in the manual: http://download.osgeo.org/qgis/doc/manual/qgis-1.7.0_user_guide_en.pdf See in-line comment: 4.2.2. Loading a PostGIS Layer Once you have one or more connections defined, you can load layers from the PostgreSQL database. Of course this requires having data in PostgreSQL. See Section 4.2.4 for a discussion on importing data into the database. To load a layer from PostGIS, perform the following steps: – If the Add PostGIS Table(s) dialog is not already open, click on the Add PostGIS Layer toolbar button. – Choose the connection from the drop-down list and click Connect . – Select or unselect Also list tables with no geometry – Optionally use some Search Options to define which features to load from the layer or use the Build query icon to start the Query builder dialog. ^^^^^^^^^^^^^^^ Wonder if it would be possible to restrict the dataset(tables) by using the above? – Find the layer(s) you wish to add in the list of available layers. – Select it by clicking on it. You can select multiple layers by holding down the shift key while clicking. See Section 4.6 for information on using the PostgreSQL Query Builder to further define the layer. – Click on the Add button to add the layer to the map. > > The problems occurred only after adding that many schemas to the > database. Before, with only 10+ schemas, the performance was acceptable. > > Is this a known limitation of Postgres, or perhaps a misconfiguration of > our installation? What would you suggest to improve performance here? We > currently don't have administration rights for the database or login > rights for the server machine (Linux), but I think we'll need to take > care about that. > > > Best regards > > Kirill -- Adrian Klaver adrian.klaver@gmail.com
On Wed, Jan 11, 2012 at 3:07 AM, Kirill Müller <kirill.mueller@ivt.baug.ethz.ch> wrote: > Hi all, > > we have a Postgres/PostGIS database with 100+ schemas and 200+ tables in > each schema, generated automatically. When adding a new PostGIS layer in > QGis, the application obviously enumerates all tables, and this takes > minutes. Even browsing the database in pgAdmin3 is horribly slow -- it takes > several seconds to e.g. open a schema (click on a schema's "+" in the tree > view). > > The problems occurred only after adding that many schemas to the database. > Before, with only 10+ schemas, the performance was acceptable. > > Is this a known limitation of Postgres, or perhaps a misconfiguration of our > installation? What would you suggest to improve performance here? We > currently don't have administration rights for the database or login rights > for the server machine (Linux), but I think we'll need to take care about > that. This is a problem I've run into before, but I can't find the previous post on it. When you run a \d command, if you run top on your server do you see a single CPU spinning hard on that one command? If so then it's a pg server side problem, which is what I had on one server with ~40k objects in it. Off the top of my head I remember something like this helping: alter function pg_table_is_visible cost 10; But I'm not sure that's it. Maybe Tom Lane can pipe up on this.
Adrian, On 01/11/2012 04:32 PM, Adrian Klaver wrote: > Would seem that they both have the same issue, namely pulling over the table > names and the meta data is resource intensive. The problem is that the slow part is the "connect", just at the ***** in the in-line comment :-) > 4.2.2. Loading a PostGIS Layer > > Once you have one or more connections defined, you can load layers from the > PostgreSQL database. > Of course this requires having data in PostgreSQL. See Section 4.2.4 for a > discussion on importing data into > the database. > To load a layer from PostGIS, perform the following steps: > – If the Add PostGIS Table(s) dialog is not already open, click on the Add > PostGIS Layer toolbar button. > – Choose the connection from the drop-down list and click Connect . ******* > – Select or unselect Also list tables with no geometry > – Optionally use some Search Options to define which features to load from the > layer or use the > Build query icon to start the Query builder dialog. > > ^^^^^^^^^^^^^^^ > Wonder if it would be possible to restrict the dataset(tables) by using the > above? Thanks for your feedback anyway, this helps clarifying the issue. Cheers Kirill -- _________________________________________________ ETH Zürich Institute for Transport Planning and Systems HIL F 32.2 Wolfgang-Pauli-Str. 15 8093 Zürich Phone: +41 44 633 33 17 Fax: +41 44 633 10 57 Secretariat: +41 44 633 31 05 E-Mail: kirill.mueller@ivt.baug.ethz.ch
On 01/11/2012 02:44 PM, Andres Freund wrote: > On Wednesday, January 11, 2012 11:07:23 AM Kirill Müller wrote: >> Hi all, >> >> we have a Postgres/PostGIS database with 100+ schemas and 200+ tables in >> each schema, generated automatically. When adding a new PostGIS layer in >> QGis, the application obviously enumerates all tables, and this takes >> minutes. Even browsing the database in pgAdmin3 is horribly slow -- it >> takes several seconds to e.g. open a schema (click on a schema's "+" in >> the tree view). > Are you actually sure its the database and not just pgadmin thats getting > really slow? > > If you connect via psql and use \dt (see \? for a list of commands) and > consorts, is it that slow as well? \dt is quick, I haven't tried the other commands, though. I have built qgis from source and will try to find out this way which query stalls. I wonder if Postgres has a profiling tool like MS SQL Server that would allow tracing the queries and their runtime while they are executed. Or perhaps there are logs? Could you give me some pointers, please? Kirill -- _________________________________________________ ETH Zürich Institute for Transport Planning and Systems HIL F 32.2 Wolfgang-Pauli-Str. 15 8093 Zürich Phone: +41 44 633 33 17 Fax: +41 44 633 10 57 Secretariat: +41 44 633 31 05 E-Mail: kirill.mueller@ivt.baug.ethz.ch
2012/1/11 Kirill Müller <kirill.mueller@ivt.baug.ethz.ch>: > On 01/11/2012 02:44 PM, Andres Freund wrote: >> >> On Wednesday, January 11, 2012 11:07:23 AM Kirill Müller wrote: >>> >>> Hi all, >>> >>> we have a Postgres/PostGIS database with 100+ schemas and 200+ tables in >>> each schema, generated automatically. When adding a new PostGIS layer in >>> QGis, the application obviously enumerates all tables, and this takes >>> minutes. Even browsing the database in pgAdmin3 is horribly slow -- it >>> takes several seconds to e.g. open a schema (click on a schema's "+" in >>> the tree view). >> >> Are you actually sure its the database and not just pgadmin thats getting >> really slow? >> >> If you connect via psql and use \dt (see \? for a list of commands) and >> consorts, is it that slow as well? > > \dt is quick, I haven't tried the other commands, though. I have built qgis > from source and will try to find out this way which query stalls. I wonder > if Postgres has a profiling tool like MS SQL Server that would allow tracing > the queries and their runtime while they are executed. Or perhaps there are > logs? Could you give me some pointers, please? log_min_duration_statement = 0 in postgresql.conf and after reload, pg logs all query to log Regards Pavel > > > > Kirill > > -- > _________________________________________________ > ETH Zürich > Institute for Transport Planning and Systems > HIL F 32.2 > Wolfgang-Pauli-Str. 15 > 8093 Zürich > > Phone: +41 44 633 33 17 > Fax: +41 44 633 10 57 > Secretariat: +41 44 633 31 05 > E-Mail: kirill.mueller@ivt.baug.ethz.ch > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general
On 01/11/2012 11:45 AM, Kirill Müller wrote: > Adrian, > > > On 01/11/2012 04:32 PM, Adrian Klaver wrote: >> Would seem that they both have the same issue, namely pulling over the >> table >> names and the meta data is resource intensive. > The problem is that the slow part is the "connect", just at the ***** in > the in-line comment :-) More digging: Looks like QGIS use the geometry_columns table first to determine the geometry aware tables, failing that it walks the tables looking for geometry columns. Is the geometry_columns table up to date with the actual tables? > Cheers > > Kirill > -- Adrian Klaver adrian.klaver@gmail.com
On Wed, 2012-01-11 at 20:50 +0100, Kirill Müller wrote: > that would > allow tracing the queries and their runtime while they are executed \ http://www.postgresql.org/docs/8.4/static/auto-explain.html
On 01/11/2012 09:36 PM, Adrian Klaver wrote: > On 01/11/2012 11:45 AM, Kirill Müller wrote: >> On 01/11/2012 04:32 PM, Adrian Klaver wrote: >>> Would seem that they both have the same issue, namely pulling over the >>> table >>> names and the meta data is resource intensive. >> The problem is that the slow part is the "connect"... > More digging: > Looks like QGIS use the geometry_columns table first to determine the > geometry aware tables, failing that it walks the tables looking for > geometry columns. Is the geometry_columns table up to date with the > actual tables? Thank you for the tip. In the current version of QGIS, I can select "only look in the geometry_columns table" as an option. For 40000+ entries in this table, it took just some seconds to enumerate all tables. This workaround solves the most urgent problems (I'm currently rebuilding the geometry_columns table). Kirill -- _________________________________________________ ETH Zürich Institute for Transport Planning and Systems HIL F 32.2 Wolfgang-Pauli-Str. 15 8093 Zürich Phone: +41 44 633 33 17 Fax: +41 44 633 10 57 Secretariat: +41 44 633 31 05 E-Mail: kirill.mueller@ivt.baug.ethz.ch
On 01/11/2012 07:00 PM, Scott Marlowe wrote: > This is a problem I've run into before, but I can't find the previous > post on it. When you run a \d command, if you run top on your server > do you see a single CPU spinning hard on that one command? If so then > it's a pg server side problem, which is what I had on one server with > ~40k objects in it. > > Off the top of my head I remember something like this helping: > > alter function pg_table_is_visible cost 10; Thanks for the feedback. I found the relevant parts in the qgis source code and have been able to trace the problem. It's just a sub-optimal query issued by qgis: SELECT pg_class.relname,pg_namespace.nspname,pg_attribute.attname,pg_class.relkind FROM pg_attribute,pg_class,pg_namespace WHERE pg_namespace.oid=pg_class.relnamespace AND pg_attribute.attrelid = pg_class.oid AND ( EXISTS (SELECT * FROM pg_type WHERE pg_type.oid=pg_attribute.atttypid AND pg_type.typname IN ('geometry','geography','topogeometry')) OR pg_attribute.atttypid IN (SELECT oid FROM pg_type a WHERE EXISTS (SELECT * FROM pg_type b WHERE a.typbasetype=b.oid AND b.typname IN ('geometry','geography','topogeometry')))) AND has_schema_privilege( pg_namespace.nspname, 'usage' ) AND has_table_privilege( '"' || pg_namespace.nspname || '"."' || pg_class.relname || '"', 'select' ) AND NOT EXISTS (SELECT * FROM geometry_columns WHERE pg_namespace.nspname=f_table_schema AND pg_class.relname=f_table_name) AND pg_class.relkind IN ('v','r'); When leaving out the last two "AND NOT EXISTS..." parts, the query finishes in no time. I have attached the output of EXPLAIN ANALYZE -- if I understand the execution tree correctly, the time is burnt in repeated sequential scans of the geometry_columns table (line 38).? Rewriting the "AND NOT EXISTS" part using WITH solves the performance issues here, but works only from Postgres 8.4. Any idea how to speed up this query for older versions? (Creating a temporary table or an index should be avoided.) Kirill
Attachment
-----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Kirill Müller Sent: Wednesday, January 11, 2012 6:28 PM To: pgsql-general@postgresql.org Cc: Scott Marlowe Subject: Re: [GENERAL] Enumeration of tables is very slow in largish database On 01/11/2012 07:00 PM, Scott Marlowe wrote: > This is a problem I've run into before, but I can't find the previous > post on it. When you run a \d command, if you run top on your server > do you see a single CPU spinning hard on that one command? If so then > it's a pg server side problem, which is what I had on one server with > ~40k objects in it. > > Off the top of my head I remember something like this helping: > > alter function pg_table_is_visible cost 10; Thanks for the feedback. I found the relevant parts in the qgis source code and have been able to trace the problem. It's just a sub-optimal query issued by qgis: SELECT pg_class.relname,pg_namespace.nspname,pg_attribute.attname,pg_class.relkind FROM pg_attribute,pg_class,pg_namespace WHERE pg_namespace.oid=pg_class.relnamespace AND pg_attribute.attrelid = pg_class.oid AND ( EXISTS (SELECT * FROM pg_type WHERE pg_type.oid=pg_attribute.atttypid AND pg_type.typname IN ('geometry','geography','topogeometry')) OR pg_attribute.atttypid IN (SELECT oid FROM pg_type a WHERE EXISTS (SELECT * FROM pg_type b WHERE a.typbasetype=b.oid AND b.typname IN ('geometry','geography','topogeometry')))) AND has_schema_privilege( pg_namespace.nspname, 'usage' ) AND has_table_privilege( '"' || pg_namespace.nspname || '"."' || pg_class.relname || '"', 'select' ) AND NOT EXISTS (SELECT * FROM geometry_columns WHERE pg_namespace.nspname=f_table_schema AND pg_class.relname=f_table_name) AND pg_class.relkind IN ('v','r'); When leaving out the last two "AND NOT EXISTS..." parts, the query finishes in no time. I have attached the output of EXPLAIN ANALYZE -- if I understand the execution tree correctly, the time is burnt in repeated sequential scans of the geometry_columns table (line 38).? Rewriting the "AND NOT EXISTS" part using WITH solves the performance issues here, but works only from Postgres 8.4. Any idea how to speed up this query for older versions? (Creating a temporary table or an index should be avoided.) Kirill ---------------------------------------------------------------------------- --------------------- I only see one (1) "AND NOT EXISTS" in the provided query. Syntax may be a little off but: ... AND (f_table_schema, f_table_name) NOT IN ( SELECT (nspname, relname) FROM geometry_columns ) Should work since it is no longer a correlated sub-query; whether the size of geometry_columns makes this better or worse performing is impossible to tell without testing but it isn't that much different than using a WITH/CTE. David J.
I am impressed. Execution time dropped to less than one second. Thanks a lot! On 01/12/2012 12:43 AM, David Johnston wrote: > I only see one (1) "AND NOT EXISTS" in the provided query. Sorry, there used to be two "AND NOT EXISTS", but I edited the query without updating the text. > Syntax may be a little off but: > > ... AND (f_table_schema, f_table_name) NOT IN ( SELECT (nspname, relname) > FROM geometry_columns ) Just for the record: ...AND (nspname, relname) NOT IN (SELECT f_table_schema, f_table_name FROM geometry_columns) > Should work since it is no longer a correlated sub-query; whether the size > of geometry_columns makes this better or worse performing is impossible to > tell without testing but it isn't that much different than using a WITH/CTE. The execution plan looks much nicer (attached). I'd guess that the altered query might lose a bit if geometry_columns has only very few entries.? Apparently it gains a lot if the table is populated. Thanks again! Kirill
Attachment
-----Original Message----- From: Kirill Müller [mailto:kirill.mueller@ivt.baug.ethz.ch] Sent: Wednesday, January 11, 2012 7:11 PM To: David Johnston Cc: pgsql-general@postgresql.org; 'Scott Marlowe' Subject: Re: [GENERAL] Enumeration of tables is very slow in largish database I am impressed. Execution time dropped to less than one second. Thanks a lot! On 01/12/2012 12:43 AM, David Johnston wrote: > I only see one (1) "AND NOT EXISTS" in the provided query. Sorry, there used to be two "AND NOT EXISTS", but I edited the query without updating the text. > Syntax may be a little off but: > > ... AND (f_table_schema, f_table_name) NOT IN ( SELECT (nspname, > relname) FROM geometry_columns ) Just for the record: ...AND (nspname, relname) NOT IN (SELECT f_table_schema, f_table_name FROM geometry_columns) > Should work since it is no longer a correlated sub-query; whether the > size of geometry_columns makes this better or worse performing is > impossible to tell without testing but it isn't that much different than using a WITH/CTE. The execution plan looks much nicer (attached). I'd guess that the altered query might lose a bit if geometry_columns has only very few entries.? Apparently it gains a lot if the table is populated. Thanks again! Kirill -------------------------------------------------------------------- Performance for IN should increase as the results from geometry_columns decrease since the IN target becomes smaller - thus fewer entries to compare against. EXISTS works better than IN if the IN target is large AS LONG AS the query that exists is using can use an Index. Since your query was performing a sequential scan pretty much any size IN target will be better performing. For small IN targets and index-using EXISTS it likely matters very little which one you use. David J.
=?ISO-8859-1?Q?Kirill_M=FCller?= <kirill.mueller@ivt.baug.ethz.ch> writes: > Thanks for the feedback. I found the relevant parts in the qgis source > code and have been able to trace the problem. It's just a sub-optimal > query issued by qgis: > SELECT > pg_class.relname,pg_namespace.nspname,pg_attribute.attname,pg_class.relkind > FROM pg_attribute,pg_class,pg_namespace > WHERE pg_namespace.oid=pg_class.relnamespace AND pg_attribute.attrelid = > pg_class.oid > AND ( EXISTS (SELECT * FROM pg_type WHERE > pg_type.oid=pg_attribute.atttypid AND pg_type.typname IN > ('geometry','geography','topogeometry')) OR pg_attribute.atttypid IN > (SELECT oid FROM pg_type a WHERE EXISTS (SELECT * FROM pg_type b WHERE > a.typbasetype=b.oid AND b.typname IN > ('geometry','geography','topogeometry')))) > AND has_schema_privilege( pg_namespace.nspname, 'usage' ) > AND has_table_privilege( '"' || pg_namespace.nspname || '"."' || > pg_class.relname || '"', 'select' ) > AND NOT EXISTS (SELECT * FROM geometry_columns WHERE > pg_namespace.nspname=f_table_schema AND pg_class.relname=f_table_name) > AND pg_class.relkind IN ('v','r'); > When leaving out the last two "AND NOT EXISTS..." parts, the query > finishes in no time. I have attached the output of EXPLAIN ANALYZE -- if > I understand the execution tree correctly, the time is burnt in repeated > sequential scans of the geometry_columns table (line 38). Yeah. It wouldn't choose that plan if it weren't for the horrid rowcount misestimate here: > -> Hash Anti Join (cost=30586.95..37075.41 rows=1 width=133) (actual time=945.911..1760.307 rows=17836 loops=1) This is probably an indication of eqjoinsel_semi doing the wrong thing; we've whacked that estimator around a few times now, so it's hard to know whether this represents an already-fixed bug or not. What PG version are you using exactly? > Rewriting the > "AND NOT EXISTS" part using WITH solves the performance issues here, but > works only from Postgres 8.4. Any idea how to speed up this query for > older versions? (Creating a temporary table or an index should be avoided.) Maybe use EXCEPT instead of a WHERE condition to get rid of the already-present entries? regards, tom lane
On 01/12/2012 01:34 AM, Tom Lane wrote: > =?ISO-8859-1?Q?Kirill_M=FCller?=<kirill.mueller@ivt.baug.ethz.ch> writes: >> When leaving out the last two "AND NOT EXISTS..." parts, the query >> finishes in no time. I have attached the output of EXPLAIN ANALYZE -- if >> I understand the execution tree correctly, the time is burnt in repeated >> sequential scans of the geometry_columns table (line 38). > Yeah. It wouldn't choose that plan if it weren't for the horrid rowcount > misestimate here: > >> -> Hash Anti Join (cost=30586.95..37075.41 rows=1 width=133) (actual time=945.911..1760.307 rows=17836 loops=1) I have VACUUM ANALYZE-d the table just before testing. Seems that this didn't help here. > > This is probably an indication of eqjoinsel_semi doing the wrong thing; > we've whacked that estimator around a few times now, so it's hard to > know whether this represents an already-fixed bug or not. What PG > version are you using exactly? muelleki@xxx:~$ psql psql (8.4.8) > > Maybe use EXCEPT instead of a WHERE condition to get rid of the > already-present entries? Thank you for the hint, I was not aware of the rather elegant EXCEPT. Using WHERE (..., ...) NOT IN (SELECT ..., ... FROM ...) as suggested by David Johnston shows excellent performance (and better fits the code that is generating the SQL), but I'll keep the EXCEPT option in mind. Regards Kirill
=?ISO-8859-1?Q?Kirill_M=FCller?= <kirill.mueller@ivt.baug.ethz.ch> writes: > On 01/12/2012 01:34 AM, Tom Lane wrote: >> This is probably an indication of eqjoinsel_semi doing the wrong thing; >> we've whacked that estimator around a few times now, so it's hard to >> know whether this represents an already-fixed bug or not. What PG >> version are you using exactly? > muelleki@xxx:~$ psql > psql (8.4.8) There were fixes for that in 8.4.9, so I'd be interested to know if you get a better estimate in a more up-to-date version. regards, tom lane
On 01/12/2012 02:02 AM, Tom Lane wrote: > There were fixes for that in 8.4.9, so I'd be interested to know if you > get a better estimate in a more up-to-date version. > I'll ask our administrator to update the server, but this might take a while. I'll get back to you when I know more. Regards Kirill
On 01/12/2012 02:02 AM, Tom Lane wrote: > > There were fixes for that in 8.4.9, so I'd be interested to know if you > get a better estimate in a more up-to-date version. Something weird happened today. The problem vanished into thin air. Plus we got our server upgrade to 8.4.9, but definitely after the problem disappeared. Unfortunately, I didn't have the time to capture an execution plan before the upgrade. The database has changed during the day, some hundred more tables with geometry columns were added. Anyway, I have attached the new execution plan, as produced by server 8.4.9 (squeeze/updates/main postgresql-8.4 amd64 8.4.9-0squeeze1+b1). Cheers Kirill
Attachment
=?ISO-8859-1?Q?Kirill_M=FCller?= <kirill.mueller@ivt.baug.ethz.ch> writes: > On 01/12/2012 02:02 AM, Tom Lane wrote: >> There were fixes for that in 8.4.9, so I'd be interested to know if you >> get a better estimate in a more up-to-date version. > Something weird happened today. The problem vanished into thin air. Plus > we got our server upgrade to 8.4.9, but definitely after the problem > disappeared. Unfortunately, I didn't have the time to capture an > execution plan before the upgrade. The database has changed during the > day, some hundred more tables with geometry columns were added. > Anyway, I have attached the new execution plan, as produced by server > 8.4.9 (squeeze/updates/main postgresql-8.4 amd64 8.4.9-0squeeze1+b1). This doesn't seem to be for the same query --- I only see one antijoin in this plan, whereas the plan you posted at http://archives.postgresql.org/pgsql-general/2012-01/msg00303.php has two (and therefore doesn't match the query given there, either). regards, tom lane