Thread: Index on System Table

Index on System Table

From
Cody Cutrer
Date:
I've got a SaaS situation where I'm using 1000+ schemas in a single
database (each schema contains the same tables, just different data
per tenant).  I used schemas so that the shared app servers could
share a connection to the single database for all schemas.  Things are
working fine. However, when using psql, doing \d or trying to use tab
complete takes FOREVER, because it's doing a sequence scan against
pg_class (which has over a million rows), and relying on
pg_table_is_visible to do search_path filtering. I've figured out that
if I add "nspname = ANY(current_schemas(true))" to the query psql is
using, and an index to pg_class on relnamespace, the query optimizer
is able to do an index scan, and the queries return in milliseconds
instead of minutes.  However, I can't actually create an index on
pg_class because it is a system table (I was able to test by copying
it to a temporary table and adding the index there). My question is if
there is a way to create the index on the system table somehow for
just my database, and if not how would the developer community react
to the suggestion of adding an index to a system table in the default
postgres distro.

Thanks,

Cody Cutrer

Re: Index on System Table

From
Tom Lane
Date:
Cody Cutrer <cody@instructure.com> writes:
> I've got a SaaS situation where I'm using 1000+ schemas in a single
> database (each schema contains the same tables, just different data
> per tenant). ...
> if I add "nspname = ANY(current_schemas(true))" to the query psql is
> using, and an index to pg_class on relnamespace, the query optimizer
> is able to do an index scan, and the queries return in milliseconds
> instead of minutes.  However, I can't actually create an index on
> pg_class because it is a system table (I was able to test by copying
> it to a temporary table and adding the index there). My question is if
> there is a way to create the index on the system table somehow for
> just my database,

There's not really support for adding indexes to system catalogs
on-the-fly.  I think it would work (barring concurrency issues)
for most catalogs, but pg_class has special limitations due to
the "relmapping" infrastructure.  It's not something I'd particularly
care to try on a production database.

> and if not how would the developer community react
> to the suggestion of adding an index to a system table in the default
> postgres distro.

In many (probably most) databases, an index on pg_class.relnamespace
wouldn't be selective enough to justify its update costs.  I'd want
to see a lot more than one request for this before considering it.

If you're correct that the main costs come from the pg_table_is_visible
tests, it should be possible to dodge that without an extra index.
I'd suggest making a function similar to current_schemas() except it
returns an OID array instead of names (this should be cheaper anyway)
and just putting the relnamespace = ANY(current_schema_oids()) condition
in front of the visibility test.  Or maybe you could dispense with the
visibility test altogether, depending on your usage patterns.

(BTW, I think that "\d schemaname.*" doesn't involve any visibility
tests, in case that helps.)

            regards, tom lane

Re: Index on System Table

From
Cody Cutrer
Date:
Thanks for the tips.  I spent some more time investigating.  It's
definitely pg_table_is_visible that's causing the problem.  A \dt
<schema>.* is fairly fast (like you said, it doesn't apply
pg_table_is_visible at all).  I tried adjusting the query in several
ways.  Adding either nspname=ANY(current_schemas(true)) or
relnamespace=ANY(<precalculated list of the oids>) didn't help,
because the query planner still applied pg_table_is_visible to every
row in pg_class.  Doing either of those and *removing* the
pg_table_is_visible query gave the best results.  That may be a good
solution, since for tab complete you don't really care which schema
and object is coming from, just that there is an object.  I'm not sure
about for /dt, though.

Anyhow, I've found a workaround with acceptable (still not "snappy")
performance for all the queries, though.  By running "alter function
pg_table_is_visible (rel oid) cost 50;" the query planner is now
avoiding that function, and doing other filtering first.  The queries
are all a few seconds now, but not multiple minutes.

Cody Cutrer

On Tue, Mar 20, 2012 at 6:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Cody Cutrer <cody@instructure.com> writes:
>> I've got a SaaS situation where I'm using 1000+ schemas in a single
>> database (each schema contains the same tables, just different data
>> per tenant). ...
>> if I add "nspname = ANY(current_schemas(true))" to the query psql is
>> using, and an index to pg_class on relnamespace, the query optimizer
>> is able to do an index scan, and the queries return in milliseconds
>> instead of minutes.  However, I can't actually create an index on
>> pg_class because it is a system table (I was able to test by copying
>> it to a temporary table and adding the index there). My question is if
>> there is a way to create the index on the system table somehow for
>> just my database,
>
> There's not really support for adding indexes to system catalogs
> on-the-fly.  I think it would work (barring concurrency issues)
> for most catalogs, but pg_class has special limitations due to
> the "relmapping" infrastructure.  It's not something I'd particularly
> care to try on a production database.
>
>> and if not how would the developer community react
>> to the suggestion of adding an index to a system table in the default
>> postgres distro.
>
> In many (probably most) databases, an index on pg_class.relnamespace
> wouldn't be selective enough to justify its update costs.  I'd want
> to see a lot more than one request for this before considering it.
>
> If you're correct that the main costs come from the pg_table_is_visible
> tests, it should be possible to dodge that without an extra index.
> I'd suggest making a function similar to current_schemas() except it
> returns an OID array instead of names (this should be cheaper anyway)
> and just putting the relnamespace = ANY(current_schema_oids()) condition
> in front of the visibility test.  Or maybe you could dispense with the
> visibility test altogether, depending on your usage patterns.
>
> (BTW, I think that "\d schemaname.*" doesn't involve any visibility
> tests, in case that helps.)
>
>                        regards, tom lane

Re: Index on System Table

From
Tom Lane
Date:
Cody Cutrer <cody@instructure.com> writes:
> On Tue, Mar 20, 2012 at 6:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There's not really support for adding indexes to system catalogs
>> on-the-fly.  I think it would work (barring concurrency issues)
>> for most catalogs, but pg_class has special limitations due to
>> the "relmapping" infrastructure.  It's not something I'd particularly
>> care to try on a production database.

BTW, I experimented with that a little bit and found that the relmapper
is not really the stumbling block, at least not after applying this
one-line patch:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f70f095c9096d5e2689e8d79172b37b57a84e51b

It seemed to work for me to do this:

    stop postmaster
    start a standalone backend with -c allow_system_table_mods=1
    create index pg_class_relnamespace_index on pg_class(relnamespace);
    stop standalone backend, restart postmaster

There are a lot of gotchas here, notably that the session in which you
create the index won't know it's there (so in this case, a reindex on
pg_class would likely be advisable afterwards).  I still think you'd be
nuts to try it on a production database, but ...

            regards, tom lane

Re: Index on System Table

From
Cody Cutrer
Date:
That's awesome, thanks! Yeah, I doubt I'll do that to our production
database, but maybe I'll try it on a copy sometime down the line.
Adjusting the cost for pg_table_is_visible is working well enough so
far.

Cody Cutrer

On Wed, Mar 21, 2012 at 12:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Cody Cutrer <cody@instructure.com> writes:
>> On Tue, Mar 20, 2012 at 6:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> There's not really support for adding indexes to system catalogs
>>> on-the-fly.  I think it would work (barring concurrency issues)
>>> for most catalogs, but pg_class has special limitations due to
>>> the "relmapping" infrastructure.  It's not something I'd particularly
>>> care to try on a production database.
>
> BTW, I experimented with that a little bit and found that the relmapper
> is not really the stumbling block, at least not after applying this
> one-line patch:
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f70f095c9096d5e2689e8d79172b37b57a84e51b
>
> It seemed to work for me to do this:
>
>        stop postmaster
>        start a standalone backend with -c allow_system_table_mods=1
>        create index pg_class_relnamespace_index on pg_class(relnamespace);
>        stop standalone backend, restart postmaster
>
> There are a lot of gotchas here, notably that the session in which you
> create the index won't know it's there (so in this case, a reindex on
> pg_class would likely be advisable afterwards).  I still think you'd be
> nuts to try it on a production database, but ...
>
>                        regards, tom lane

Re: Index on System Table

From
Fabrízio de Royes Mello
Date:

2012/3/21 Tom Lane <tgl@sss.pgh.pa.us>

BTW, I experimented with that a little bit and found that the relmapper
is not really the stumbling block, at least not after applying this
one-line patch:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f70f095c9096d5e2689e8d79172b37b57a84e51b


Can we back-patch it on previous versions like 9.0 and 9.1? Or exist some reason to don't do that?

Regards,
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Blog sobre TI: http://fabriziomello.blogspot.com

Re: Index on System Table

From
Tom Lane
Date:
=?ISO-8859-1?Q?Fabr=EDzio_de_Royes_Mello?= <fabriziomello@gmail.com> writes:
> 2012/3/21 Tom Lane <tgl@sss.pgh.pa.us>
>> BTW, I experimented with that a little bit and found that the relmapper
>> is not really the stumbling block, at least not after applying this
>> one-line patch:

>> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f70f095c9096d5e2689e8d79172b37b57a84e51b

> Can we back-patch it on previous versions like 9.0 and 9.1? Or exist some
> reason to don't do that?

Given that the whole thing is entirely unsupported anyway, I see no
need to back-patch.

            regards, tom lane