Re: [PERFORM] Performance of information_schema with many schemataand tables - Mailing list pgsql-performance

From Pritam Baral
Subject Re: [PERFORM] Performance of information_schema with many schemataand tables
Date
Msg-id ce8eaea2-3008-8cc1-fa39-129e9e82eaa2@pritambaral.com
Whole thread Raw
In response to [PERFORM] Performance of information_schema with many schemata and tables  (Ulf Lohbrügge <ulf.lohbruegge@gmail.com>)
Responses Re: [PERFORM] Performance of information_schema with many schemataand tables
List pgsql-performance
On Wednesday 28 June 2017 05:27 AM, Ulf Lohbrügge wrote:
> Hi all,
>
> we use schemata to separate our customers in a multi-tenant setup (9.5.7, Debian stable). Each tenant is managed in
hisown schema with all the tables that only he can access. All tables in all schemata are the same in terms of their
DDL:Every tenant uses e.g. his own table 'address'. We currently manage around 1200 schemata (i.e. tenants) on one
cluster.Every schema consists currently of ~200 tables - so we end up with ~240000 tables plus constraints, indexes,
sequenceset al. 
>
> Our current approach is quite nice in terms of data privacy because every tenant is isolated from all other tenants.
Atenant uses his own user that gives him only access to the corresponding schema. Performance is great for us - we
didn'texpect Postgres to scale so well! 
>
> But performance is pretty bad when we query things in the information_schema:
>
> SELECT
>   *
> FROM information_schema.tables
> WHERE table_schema = 'foo'
> AND table_name = 'bar';``
>
> Above query results in a large sequence scan with a filter that removes 1305161 rows:
>
>
                                                     QUERY PLAN 
>
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Nested Loop Left Join  (cost=0.70..101170.18 rows=3 width=265) (actual time=383.505..383.505 rows=0 loops=1)
>    ->  Nested Loop  (cost=0.00..101144.65 rows=3 width=141) (actual time=383.504..383.504 rows=0 loops=1)
>          Join Filter: (nc.oid = c.relnamespace)
>          ->  Seq Scan on pg_class c  (cost=0.00..101023.01 rows=867 width=77) (actual time=383.502..383.502 rows=0
loops=1)
>                Filter: ((relkind = ANY ('{r,v,f}'::"char"[])) AND
(((relname)::information_schema.sql_identifier)::text= 'bar'::text) AND (pg_has_role(relowner, 'USAGE'::text) OR
has_table_privilege(oid,'SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR
has_any_column_privilege(oid,'SELECT, INSERT, UPDATE, REFERENCES'::text))) 
>                Rows Removed by Filter: 1305161
>          ->  Materialize  (cost=0.00..56.62 rows=5 width=68) (never executed)
>                ->  Seq Scan on pg_namespace nc  (cost=0.00..56.60 rows=5 width=68) (never executed)
>                      Filter: ((NOT pg_is_other_temp_schema(oid)) AND
(((nspname)::information_schema.sql_identifier)::text= 'foo'::text)) 
>    ->  Nested Loop  (cost=0.70..8.43 rows=1 width=132) (never executed)
>          ->  Index Scan using pg_type_oid_index on pg_type t  (cost=0.42..8.12 rows=1 width=72) (never executed)
>                Index Cond: (c.reloftype = oid)
>          ->  Index Scan using pg_namespace_oid_index on pg_namespace nt  (cost=0.28..0.30 rows=1 width=68) (never
executed)
>                Index Cond: (oid = t.typnamespace)
>  Planning time: 0.624 ms
>  Execution time: 383.784 ms
> (16 rows)
>
> We noticed the degraded performance first when using the psql cli. Pressing tab after beginning a WHERE clause
resultsin a query against the information_schema which is pretty slow and ends in "lag" when trying to enter queries. 
>
> We also use Flyway (https://flywaydb.org/) to handle our database migrations. Unfortunately Flyway is querying the
information_schemato check if specific tables exist (I guess this is one of the reasons information_schema exists) and
thereforevastly slows down the migration of our tenants. Our last migration run on all tenants (schemata) almost took
2hbecause the above query is executed multiple times per tenant. The migration run consisted of multiple sql files to
beexecuted and triggered more than 10 queries on information_schema per tenant. 
>
> I don't think that Flyway is to blame because querying the information_schema should be a fast operation (and was
fastfor us when we had less schemata). I tried to speedup querying pg_class by adding indexes (after enabling
allow_system_table_mods)but didn't succeed. The function call 'pg_has_role' is probably not easy to optimize. 
>
> Postgres is really doing a great job to handle those many schemata and tables but doesn't scale well when querying
information_schema.I actually don't want to change my current multi-tenant setup (one schema per tenant) as it is
workinggreat but the slow information_schema is killing our deployments. 
>
> Are there any other options besides switching from one-schema-per-tenant-approach? Any help is greatly appreciated!

Have you tried a `REINDEX SYSTEM <dbname>`?

>
> Regards,
> Ulf

--
#!/usr/bin/env regards
Chhatoi Pritam Baral



pgsql-performance by date:

Previous
From: Ulf Lohbrügge
Date:
Subject: [PERFORM] Performance of information_schema with many schemata and tables
Next
From: Yevhenii Kurtov
Date:
Subject: [PERFORM]