Thread: \d on database with a lot of tables is slow
I have a client with a database that contains 4000 relations according
to vacuum verbose, and \d in psql is painfully slow. In particular...

->  Seq Scan on pg_class c  (cost=0.00..2343.09 rows=6124 width=73) (actual time=0.325..22100.840 rows=16856 loops=1)
      Filter: (((relkind = 'r'::"char") OR (relkind = 'v'::"char") OR (relkind = 'S'::"char") OR (relkind = ''::"char")) AND pg_table_is_visible(oid))

That's off my laptop, but they're seeing similar issues on an 8-way
Opteron as well...

I've messed around with adding indexes to a copy of pg_class to no
avail. Any ideas on how to improve the performance?

Also, not sure if this matters, but they're occasionally getting errors
like 'Did not find relation named "table that exists"' (where 'table
that exists' is the name of some table that is in the catalog) from \d.
Does anyone know what causes that?
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software        http://pervasive.com        work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
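[Editor's note: the plan fragment above comes from the query psql issues behind \d. A hedged way to reproduce that slow piece directly for profiling; start psql with -E (--echo-hidden) to see the exact query your version generates, since the query text varies by psql release:

```sql
-- Reproduce the slow part of psql's \d catalog query.  The relkind list
-- and the pg_table_is_visible() filter mirror the plan quoted above.
EXPLAIN ANALYZE
SELECT c.relname
FROM pg_class c
WHERE c.relkind IN ('r', 'v', 'S', '')
  AND pg_table_is_visible(c.oid);
```
]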
On Sat, 2005-09-24 at 18:59 -0500, Jim C. Nasby wrote:
> I have a client with a database that contains 4000 relations according
> to vacuum verbose, and \d in psql is painfully slow. In particular...
>
> ->  Seq Scan on pg_class c  (cost=0.00..2343.09 rows=6124 width=73) (actual time=0.325..22100.840 rows=16856 loops=1)
>       Filter: (((relkind = 'r'::"char") OR (relkind = 'v'::"char") OR (relkind = 'S'::"char") OR (relkind = ''::"char")) AND pg_table_is_visible(oid))
>
> That's off my laptop, but they're seeing similar issues on an 8-way
> Opteron as well...
>
> I've messed around with adding indexes to a copy of pg_class to no
> avail. Any ideas on how to improve the performance?

It is probably the visibility checks. Is a \d fast if you include the
full name (schema.table)?

I brought this up a while ago, and Tom has since rearranged some of the
psql queries to move the visibility check after the other where clause
segments.

It would be nice if the cost of the function could be added somehow --
even if it was just a low, medium, or high setting. This would allow the
planner to shuffle the where clause execution ordering around in a
reasonable manner.
--
On Sat, Sep 24, 2005 at 08:20:47PM -0400, Rod Taylor wrote:
> It is probably the visibility checks. Is a \d fast if you include the
> full name (schema.table)?
>
> I brought this up a while ago, and Tom has since rearranged some of the
> psql queries to move the visibility check after the other where clause
> segments.
>
> It would be nice if the cost of the function could be added somehow --
> even if it was just a low, medium, or high setting. This would allow the
> planner to shuffle the where clause execution ordering around in a
> reasonable manner.

\d tablename is fast, yes. Maybe instead of re-arranging the query it
would make more sense to roll the visibility check into the query itself
(probably using a new system view).
--
Jim C. Nasby
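[Editor's note: a sketch of the kind of system view suggested here. The view name is hypothetical, and resolving the search path via current_schemas() does not exactly replicate pg_table_is_visible() semantics -- in particular it ignores shadowing when two schemas on the path contain same-named tables:

```sql
-- Hypothetical "visible relations" view: resolve the search path once
-- with a join, rather than calling pg_table_is_visible() per pg_class row.
CREATE VIEW pg_visible_relations AS
SELECT c.oid, n.nspname, c.relname, c.relkind
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = ANY (current_schemas(true));
```
]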
On Sat, 2005-09-24 at 18:59 -0500, Jim C. Nasby wrote:
> I have a client with a database that contains 4000 relations according
> to vacuum verbose, and \d in psql is painfully slow. In particular...
>
> That's off my laptop, but they're seeing similar issues on an 8-way
> Opteron as well...

I expect the issue on the 8-way Opteron to be more one of high load than
of a slow scan. It seems that sometimes a database with lots of activity
slows down considerably. I suspect some locking issues, but I'm not sure
this is the case.

Also, if a lot of temp tables are used, then pg_class and pg_attribute
(at least) get bloated quickly and need vacuuming.

> I've messed around with adding indexes to a copy of pg_class to no
> avail. Any ideas on how to improve the performance?
>
> Also, not sure if this matters, but they're occasionally getting errors
> like 'Did not find relation named "table that exists"' (where 'table
> that exists' is the name of some table that is in the catalog) from \d.
> Does anyone know what causes that?

Mostly this happens on temp tables from other connections, which have
managed to disappear by the time their detailed info is requested, and
which would actually not show up in \d output due to visibility checks.
--
Hannu Krosing <hannu@skype.net>
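[Editor's note: a hedged illustration of the catalog-bloat check described above; the catalogs named are standard, but the "bloated" threshold is workload-dependent and vacuuming system catalogs typically requires superuser privileges:

```sql
-- Gauge bloat in the catalogs that temp-table churn inflates: a large
-- relpages relative to reltuples suggests dead space worth vacuuming.
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname IN ('pg_class', 'pg_attribute');

VACUUM ANALYZE pg_class;
VACUUM ANALYZE pg_attribute;
```
]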
On Sun, Sep 25, 2005 at 10:10:22AM +0300, Hannu Krosing wrote:
> > Also, not sure if this matters, but they're occasionally getting errors
> > like 'Did not find relation named "table that exists"' (where 'table
> > that exists' is the name of some table that is in the catalog) from \d.
> > Does anyone know what causes that?
>
> Mostly this happens on temp tables from other connections, which have
> managed to disappear by the time their detailed info is requested, and
> which would actually not show up in \d output due to visibility checks.

They are seeing this doing \d on regular tables that won't be getting
deleted. Maybe there's some locking that should be happening but isn't?
Or maybe \d should be doing things in one statement instead of a bunch?
--
Jim C. Nasby
On Sun, Sep 25, 2005 at 10:10:22AM +0300, Hannu Krosing wrote:
> I expect the issue on the 8-way Opteron to be more one of high load than
> of a slow scan. It seems that sometimes a database with lots of activity
> slows down considerably. I suspect some locking issues, but I'm not sure
> this is the case.

No temp tables in use. The issue didn't appear to be load-dependent,
either.
--
Jim C. Nasby
"Jim C. Nasby" <jnasby@pervasive.com> writes: >>> I have a client with a database that contains 4000 relations according >>> to vacuum verbose, and \d in psql is painfully slow. In particular... It's presumably mostly in the pg_table_is_visible() calls. Not sure if we can do much to speed those up, but: how many schemas in your search path? What's the distribution of pg_class entries among the schemas? regards, tom lane
On Mon, Sep 26, 2005 at 03:00:44PM -0400, Tom Lane wrote:
> It's presumably mostly in the pg_table_is_visible() calls. Not sure if
> we can do much to speed those up, but: how many schemas in your search
> path? What's the distribution of pg_class entries among the schemas?

db=# show search_path;
 decibel, pg_sysviews, information_schema, rrs, public

db=# select schema_name, count(*) from pg_user_tables group by schema_name;
 public         |  764
 ledger         |    6
 state          |    2
 _track_replica |   10
 repair         |    3
 summarized     |  586
 orders         |  512
 snapshot       | 1012
 acl            |   10

Also, do you have any idea on the 'Did not find relation named
"table-thats-there"' error? table-thats-there isn't a temp table, and I
don't believe they're even using temp tables, so I don't think that's
the issue.
--
Jim C. Nasby
"Jim C. Nasby" <jnasby@pervasive.com> writes: > On Mon, Sep 26, 2005 at 03:00:44PM -0400, Tom Lane wrote: >> It's presumably mostly in the pg_table_is_visible() calls. Not sure if >> we can do much to speed those up, but: how many schemas in your search >> path? What's the distribution of pg_class entries among the schemas? > db=# show search_path; > decibel, pg_sysviews, information_schema, rrs, public > db=# select schema_name, count(*) from pg_user_tables group by schema_name; > public | 764 > ledger | 6 > state | 2 > _track_replica | 10 > repair | 3 > summarized | 586 > orders | 512 > snapshot | 1012 > acl | 10 Hmm, so lots and lots of tables that aren't visible at all ... that's definitely the slowest case for pg_table_is_visible. I'll think about whether we can improve it. > Also, do you have any idea on the 'Did not find relation named > "table-thats-there"' error? table-thats-there isn't a temp table, and I don't > believe they're even using temp tables, so I don't think that's the issue. Uh, what's the exact error message again? (If it's a backend message, the verbose form would be useful.) regards, tom lane
On Tue, Sep 27, 2005 at 06:53:13PM -0400, Tom Lane wrote:
> > Also, do you have any idea on the 'Did not find relation named
> > "table-thats-there"' error?
>
> Uh, what's the exact error message again? (If it's a backend message,
> the verbose form would be useful.)

Sorry, remembered it wrong. It's 'Did not find any relation named',
which appears to be in bin/psql/describe.c. It does occur when trying to
do a \d on a specific table.
--
Jim C. Nasby
"Jim C. Nasby" <jnasby@pervasive.com> writes: > Sorry, remembered it wrong. It's 'Did not find any relation named', > which appears to be in bin/psql/describe.c. It does occur when trying to > do a \d on a specific table. Hmm, no immediate ideas. You haven't provided a lot of context about this --- when it happens, is it repeatable? Are they giving an exact table name or a pattern to \d? Is a schema name included in what they give to \d? What PG version are they running exactly? regards, tom lane
On Wed, Sep 28, 2005 at 11:35:31AM -0400, Tom Lane wrote:
> Hmm, no immediate ideas. You haven't provided a lot of context about
> this --- when it happens, is it repeatable? Are they giving an exact
> table name or a pattern to \d? Is a schema name included in what they
> give to \d? What PG version are they running exactly?

Sorry, had the error message wrong:

ERROR: cache lookup failed for relation 1906465919

It is on an exact table name. When we retry the describe after a
failure, sometimes it works and sometimes it fails again. When it fails
again, the relation number is different. Also, \d schema.table always
returns quickly and never errors; \d table is slow and produces the
error fairly often.

They're using 8.0.x; I'm pretty certain it's 8.0.3.
--
Jim C. Nasby
On Fri, 30 Sep 2005, Jim C. Nasby wrote:
> ERROR: cache lookup failed for relation 1906465919
>
> It is on an exact table name. When we retry the describe after a
> failure, sometimes it works and sometimes it fails again. When it fails
> again, the relation number is different. Also, \d schema.table always
> returns quickly and never errors; \d table is slow and produces the
> error fairly often.
>
> They're using 8.0.x; I'm pretty certain it's 8.0.3.

We have the exact same problem on a 7.4.5 database. Some basic info on
the database: psql's \d returns 424 rows, and the on-disk size of the
database is about 11 GB. A standalone \dt throws the same occasional
error.

I've suspected that it may be caused by ongoing periodic (at least once
an hour) rebuilding of certain generated tables with TRUNCATE and then
INSERT INTO the_table SELECT ... inside a transaction. But I don't have
any proof of that; it's just the most obvious different thing going on
compared to other databases we have.

It does seem like I've encountered the error less often since increasing
the max_fsm_pages setting, and thus having more effective VACUUM and
less pg_class bloat. But OTOH I trained myself not to do \d there very
often either, since it was so slow and failed so often, so that may be
coincidence. :)

Jon
--
Jon Jensen
End Point Corporation
http://www.endpoint.com/
Software development with Interchange, Perl, PostgreSQL, Apache, Linux, ...
I wrote:
> It's presumably mostly in the pg_table_is_visible() calls.

I did some profiling on a test case with 10000 tables, and noticed that
a big part of the problem is that the catalog caches become entirely
useless: almost every catcache lookup ends up going to the underlying
tables. This is because MAXCCTUPLES in catcache.c is fixed at 5000,
and that's not an adequate working set for this many tables. If you
are willing to throw memory at the problem, you could try increasing
MAXCCTUPLES (to say 50K or 100K) and see if that helps.

			regards, tom lane
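[Editor's note: a hedged sketch of the compile-time change described above. The constant lives in src/backend/utils/cache/catcache.c in 8.0-era sources; the surrounding comment is paraphrased, and changing the value requires rebuilding the server:

```c
/* src/backend/utils/cache/catcache.c (approximate, 8.0-era source) */

/*
 * Upper bound on the total number of tuples kept across all catalog
 * caches.  The stock value of 5000 is too small a working set for a
 * database with ~10000 tables; raising it trades per-backend memory
 * for far fewer lookups that fall through to the catalog tables.
 */
#define MAXCCTUPLES		100000	/* was 5000 */
```
]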
On Sat, Oct 01, 2005 at 02:00:12PM -0400, Tom Lane wrote:
> This is because MAXCCTUPLES in catcache.c is fixed at 5000,
> and that's not an adequate working set for this many tables. If you
> are willing to throw memory at the problem, you could try increasing
> MAXCCTUPLES (to say 50K or 100K) and see if that helps.

Out of curiosity... does catcache cache all pg_* tables? Also, at what
point would it be good to up NCCBUCKETS?
--
Jim C. Nasby