Thread: Multitenancy optimization

Re: Multitenancy optimization

From

Hadi Moshayedi

Date:

29 March 2019, 08:06:31

On Thu, Mar 28, 2019 at 5:40 AM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

Certainly it is possible to create multicolumn statistics to notify
Postgres about columns correlation.
But unfortunately it is not good and working solution.

First of all we have to create multicolumn statistic for all possible
combinations of table's attributes including "tenant_id".
It is very inconvenient and inefficient.

On the inconvenient part: doesn't postgres itself automatically create functional dependencies on combinations? i.e. it seems to me if we create statistics on (a, b, c), then we don't need to create statistics on (a, b) or (a, c) or (b, c), because the pg_statistic_ext entry for (a, b, c) already includes enough information.

On the inefficient part, I think there's some areas of improvement here. For example, if (product_id) -> seller_id correlation is 1.0, then (product_id, product_name) -> seller_id correlation is definitely 1.0 and we don't need to store it. So we can reduce the amount of information stored in pg_statistic_ext -> stxdependencies, without losing any data points.

More generally, if (a) -> b correlation is X, then (a, c) -> b correlation is >= X. Maybe we can have a threshold to reduce number of entries in pg_statistic_ext -> stxdependencies.

-- Hadi

Re: Multitenancy optimization

From

Konstantin Knizhnik

Date:

29 March 2019, 08:42:55

On 29.03.2019 11:06, Hadi Moshayedi wrote:

On Thu, Mar 28, 2019 at 5:40 AM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
Certainly it is possible to create multicolumn statistics to notify
Postgres about columns correlation.
But unfortunately it is not good and working solution.

First of all we have to create multicolumn statistic for all possible
combinations of table's attributes including "tenant_id".
It is very inconvenient and inefficient.

On the inconvenient part: doesn't postgres itself automatically create functional dependencies on combinations? i.e. it seems to me if we create statistics on (a, b, c), then we don't need to create statistics on (a, b) or (a, c) or (b, c), because the pg_statistic_ext entry for (a, b, c) already includes enough information.

On the inefficient part, I think there's some areas of improvement here. For example, if (product_id) -> seller_id correlation is 1.0, then (product_id, product_name) -> seller_id correlation is definitely 1.0 and we don't need to store it. So we can reduce the amount of information stored in pg_statistic_ext -> stxdependencies, without losing any data points.

More generally, if (a) -> b correlation is X, then (a, c) -> b correlation is >= X. Maybe we can have a threshold to reduce number of entries in pg_statistic_ext -> stxdependencies.

-- Hadi

Yes, Postgres automatically creates functional dependencies on combinations.
But actually you do not need ALL combinations. Table can contain hundreds of attributes: number of combination in this case will not fit in bigint.
This is why Postgres doesn't allow to create muticolumn statistic for more than 8 columns.
So if you have table with hundred attributes and tenant_id, you with have to manually create statistic for each <tenant_id,att-i> pair.
And it is very inconvenient (and as I already mentioned doesn't completely solve the problem with join selectivity estimation).

May be there are some other ways of addressing this problem (although I do not them).
But I think that in any case, if number of distinction values is explicitly specified for the attribute, then this value should be used by optimizer instead of dummy DEFAULT_NUM_DISTINCT.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Thread: Multitenancy optimization

Multitenancy optimization

Attachment

Re: Multitenancy optimization

Re: Multitenancy optimization