Thread: Index on System Table
I've got a SaaS situation where I'm using 1000+ schemas in a single database (each schema contains the same tables, just different data per tenant). I used schemas so that the shared app servers could share a connection to the single database for all schemas. Things are working fine. However, when using psql, doing \d or trying to use tab complete takes FOREVER, because it's doing a sequence scan against pg_class (which has over a million rows), and relying on pg_table_is_visible to do search_path filtering. I've figured out that if I add "nspname = ANY(current_schemas(true))" to the query psql is using, and an index to pg_class on relnamespace, the query optimizer is able to do an index scan, and the queries return in milliseconds instead of minutes. However, I can't actually create an index on pg_class because it is a system table (I was able to test by copying it to a temporary table and adding the index there). My question is if there is a way to create the index on the system table somehow for just my database, and if not how would the developer community react to the suggestion of adding an index to a system table in the default postgres distro. Thanks, Cody Cutrer
Cody Cutrer <cody@instructure.com> writes: > I've got a SaaS situation where I'm using 1000+ schemas in a single > database (each schema contains the same tables, just different data > per tenant). ... > if I add "nspname = ANY(current_schemas(true))" to the query psql is > using, and an index to pg_class on relnamespace, the query optimizer > is able to do an index scan, and the queries return in milliseconds > instead of minutes. However, I can't actually create an index on > pg_class because it is a system table (I was able to test by copying > it to a temporary table and adding the index there). My question is if > there is a way to create the index on the system table somehow for > just my database, There's not really support for adding indexes to system catalogs on-the-fly. I think it would work (barring concurrency issues) for most catalogs, but pg_class has special limitations due to the "relmapping" infrastructure. It's not something I'd particularly care to try on a production database. > and if not how would the developer community react > to the suggestion of adding an index to a system table in the default > postgres distro. In many (probably most) databases, an index on pg_class.relnamespace wouldn't be selective enough to justify its update costs. I'd want to see a lot more than one request for this before considering it. If you're correct that the main costs come from the pg_table_is_visible tests, it should be possible to dodge that without an extra index. I'd suggest making a function similar to current_schemas() except it returns an OID array instead of names (this should be cheaper anyway) and just putting the relnamespace = ANY(current_schema_oids()) condition in front of the visibility test. Or maybe you could dispense with the visibility test altogether, depending on your usage patterns. (BTW, I think that "\d schemaname.*" doesn't involve any visibility tests, in case that helps.) regards, tom lane
Thanks for the tips. I spent some more time investigating. It's definitely pg_table_is_visible that's causing the problem. A \dt <schema>.* is fairly fast (like you said, it doesn't apply pg_table_is_visible at all). I tried adjusting the query in several ways. Adding either nspname=ANY(current_schemas(true)) or relnamespace=ANY(<precalculated list of the oids>) didn't help, because the query planner still applied pg_table_is_visible to every row in pg_class. Doing either of those and *removing* the pg_table_is_visible query gave the best results. That may be a good solution, since for tab complete you don't really care which schema and object is coming from, just that there is an object. I'm not sure about for /dt, though. Anyhow, I've found a workaround with acceptable (still not "snappy") performance for all the queries, though. By running "alter function pg_table_is_visible (rel oid) cost 50;" the query planner is now avoiding that function, and doing other filtering first. The queries are all a few seconds now, but not multiple minutes. Cody Cutrer On Tue, Mar 20, 2012 at 6:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Cody Cutrer <cody@instructure.com> writes: >> I've got a SaaS situation where I'm using 1000+ schemas in a single >> database (each schema contains the same tables, just different data >> per tenant). ... >> if I add "nspname = ANY(current_schemas(true))" to the query psql is >> using, and an index to pg_class on relnamespace, the query optimizer >> is able to do an index scan, and the queries return in milliseconds >> instead of minutes. However, I can't actually create an index on >> pg_class because it is a system table (I was able to test by copying >> it to a temporary table and adding the index there). My question is if >> there is a way to create the index on the system table somehow for >> just my database, > > There's not really support for adding indexes to system catalogs > on-the-fly. I think it would work (barring concurrency issues) > for most catalogs, but pg_class has special limitations due to > the "relmapping" infrastructure. It's not something I'd particularly > care to try on a production database. > >> and if not how would the developer community react >> to the suggestion of adding an index to a system table in the default >> postgres distro. > > In many (probably most) databases, an index on pg_class.relnamespace > wouldn't be selective enough to justify its update costs. I'd want > to see a lot more than one request for this before considering it. > > If you're correct that the main costs come from the pg_table_is_visible > tests, it should be possible to dodge that without an extra index. > I'd suggest making a function similar to current_schemas() except it > returns an OID array instead of names (this should be cheaper anyway) > and just putting the relnamespace = ANY(current_schema_oids()) condition > in front of the visibility test. Or maybe you could dispense with the > visibility test altogether, depending on your usage patterns. > > (BTW, I think that "\d schemaname.*" doesn't involve any visibility > tests, in case that helps.) > > regards, tom lane
Cody Cutrer <cody@instructure.com> writes: > On Tue, Mar 20, 2012 at 6:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> There's not really support for adding indexes to system catalogs >> on-the-fly. I think it would work (barring concurrency issues) >> for most catalogs, but pg_class has special limitations due to >> the "relmapping" infrastructure. It's not something I'd particularly >> care to try on a production database. BTW, I experimented with that a little bit and found that the relmapper is not really the stumbling block, at least not after applying this one-line patch: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f70f095c9096d5e2689e8d79172b37b57a84e51b It seemed to work for me to do this: stop postmaster start a standalone backend with -c allow_system_table_mods=1 create index pg_class_relnamespace_index on pg_class(relnamespace); stop standalone backend, restart postmaster There are a lot of gotchas here, notably that the session in which you create the index won't know it's there (so in this case, a reindex on pg_class would likely be advisable afterwards). I still think you'd be nuts to try it on a production database, but ... regards, tom lane
That's awesome, thanks! Yeah, I doubt I'll do that to our production database, but maybe I'll try it on a copy sometime down the line. Adjusting the cost for pg_table_is_visible is working well enough so far. Cody Cutrer On Wed, Mar 21, 2012 at 12:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Cody Cutrer <cody@instructure.com> writes: >> On Tue, Mar 20, 2012 at 6:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> There's not really support for adding indexes to system catalogs >>> on-the-fly. I think it would work (barring concurrency issues) >>> for most catalogs, but pg_class has special limitations due to >>> the "relmapping" infrastructure. It's not something I'd particularly >>> care to try on a production database. > > BTW, I experimented with that a little bit and found that the relmapper > is not really the stumbling block, at least not after applying this > one-line patch: > http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f70f095c9096d5e2689e8d79172b37b57a84e51b > > It seemed to work for me to do this: > > stop postmaster > start a standalone backend with -c allow_system_table_mods=1 > create index pg_class_relnamespace_index on pg_class(relnamespace); > stop standalone backend, restart postmaster > > There are a lot of gotchas here, notably that the session in which you > create the index won't know it's there (so in this case, a reindex on > pg_class would likely be advisable afterwards). I still think you'd be > nuts to try it on a production database, but ... > > regards, tom lane
2012/3/21 Tom Lane <tgl@sss.pgh.pa.us>
BTW, I experimented with that a little bit and found that the relmapper
is not really the stumbling block, at least not after applying this
one-line patch:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f70f095c9096d5e2689e8d79172b37b57a84e51b
Can we back-patch it on previous versions like 9.0 and 9.1? Or exist some reason to don't do that?
Regards,
-- Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
=?ISO-8859-1?Q?Fabr=EDzio_de_Royes_Mello?= <fabriziomello@gmail.com> writes: > 2012/3/21 Tom Lane <tgl@sss.pgh.pa.us> >> BTW, I experimented with that a little bit and found that the relmapper >> is not really the stumbling block, at least not after applying this >> one-line patch: >> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f70f095c9096d5e2689e8d79172b37b57a84e51b > Can we back-patch it on previous versions like 9.0 and 9.1? Or exist some > reason to don't do that? Given that the whole thing is entirely unsupported anyway, I see no need to back-patch. regards, tom lane