Statistics and Multi-Column indexes - Mailing list pgsql-performance

From lars
Subject Statistics and Multi-Column indexes
Date
Msg-id 4E1A1696.5010307@yahoo.com
Whole thread Raw
Responses Re: Statistics and Multi-Column indexes
Re: Statistics and Multi-Column indexes
List pgsql-performance
I know this has been discussed various times...

We are maintaining a large multi tenant database where *all* tables have
a tenant-id and all indexes and PKs lead with the tenant-id.
Statistics and counts for the all other columns are only really
meaningful within the context of the tenant they belong to.

There appear to be five options for me:
1. Using single column indexes on all interesting columns and rely on
PostgreSQLs bitmap indexes to combine them (which are pretty cool).
2. Use multi column indexes and accept that sometimes Postgres pick the
wrong index (because a non-tenant-id
column might seem highly selective over the table, but it is not for a
particular tenant - or vice versa).
3. Use a functional index that combines multiple columns and only query
via these, that causes statistics
gathering for the expression.
I.e. create index i on t((tenantid||column1)) and SELECT ... FROM t
WHERE tenantid||column1 = '...'
4. Play with n_distinct and/or set the statistics for the inner columns
to some fixed values that lead to the plans that we want.
5. Have a completely different schema and maybe a database per tenant.

Currently we use Oracle and query hinting, but I do not like that
practice at all (and Postgres does not have hints anyway).
Are there any other options?

#1 would be the simplest, but I am concerned about the overhead, both
maintaining two indexes and building the bitmap during queries - for
every query.

I don't think #2 is actually an option. We have some tenants with many
(sometimes 100s) millions of rows per table,
and picking the wrong index would be disastrous.

Could something like #3 be generally added to Postgres? I.e. if there is
a multi column index keep combined statistics for
the involved columns. Of course in that case is it no longer possible to
query the index by prefix.
#3 also seems expensive as the expression needs to be evaluated for each
changed row.

Still trying #4. I guess it involves setting the stat target for the
inner columns to 0 and then inserting my own records into
pg_statistic. Probably only setting n_distinct, i.e. set it "low" if the
inner column is not selective within the context of a tenant and "high"
otherwise.

For various reasons #5 is also not an option.

And of course the same set of questions comes up with joins.

Thanks.

-- Lars


pgsql-performance by date:

Previous
From: lars
Date:
Subject: Re: UPDATEDs slowing SELECTs in a fully cached database
Next
From: Samuel Gendler
Date:
Subject: Re: Statistics and Multi-Column indexes