Re: Partitioning and postgres_fdw optimisations for multi-tenancy - Mailing list pgsql-hackers
From | Alexey Kondratov |
---|---|
Subject | Re: Partitioning and postgres_fdw optimisations for multi-tenancy |
Date | |
Msg-id | 55860ece837dd9cb7661bac46ee727af@postgrespro.ru Whole thread Raw |
In response to | Re: Partitioning and postgres_fdw optimisations for multi-tenancy (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>) |
Responses |
Re: Partitioning and postgres_fdw optimisations for multi-tenancy
|
List | pgsql-hackers |
On 2020-07-14 15:27, Ashutosh Bapat wrote: > On Tue, Jul 14, 2020 at 12:48 AM Alexey Kondratov > <a.kondratov@postgrespro.ru> wrote: >> I built a simple two node multi-tenant schema for tests, which can be >> easily set up with attached scripts. It creates three tables >> (companies, >> users, documents) distributed over two nodes. Everything can be found >> in >> this Gist [2] as well. >> >> Some real-life test queries show, that all single-node queries aren't >> pushed-down to the required node. For example: >> >> SELECT >> * >> FROM >> documents >> INNER JOIN users ON documents.user_id = users.id >> WHERE >> documents.company_id = 5 >> AND users.company_id = 5; > > There are a couple of things happening here > 1. the clauses on company_id in WHERE clause are causing partition > pruning. Partition-wise join is disabled with partition pruning before > PG13. In PG13 we have added advanced partition matching algorithm > which will allow partition-wise join with partition pruning. > I forgot to mention that I use a recent master (991c444e7a) for tests with enable_partitionwise_join = 'on' enable_partitionwise_aggregate = 'on' of course. I've also tried postgres_fdw.use_remote_estimate = true followed by ANALYSE on both nodes (it is still used in setup.sh script). BTW, can you, please, share a link to commit / thread about allowing partition-wise join and partition pruning to work together in PG13? > > 2. the query has no equality condition on the partition key of the > tables being joined. Partitionwise join is possible only when there's > an equality condition on the partition keys (company_id) of the > joining tables. PostgreSQL's optimizer is not smart enough to convert > the equality conditions in WHERE clause into equality conditions on > partition keys. So having those conditions just in WHERE clause does > not help. Instead please add equality conditions on partition keys in > JOIN .. ON clause or WHERE clause (only for INNER join). > With adding documents.company_id = users.company_id SELECT * FROM documents INNER JOIN users ON (documents.company_id = users.company_id AND documents.user_id = users.id) WHERE documents.company_id = 5 AND users.company_id = 5; query plan remains the same. >> >> executed as following >> >> QUERY PLAN >> ------------------------------------------------------- >> Nested Loop >> Join Filter: (documents.user_id = users.id) >> -> Foreign Scan on users_node2 users >> -> Materialize >> -> Foreign Scan on documents_node2 documents >> >> i.e. it uses two foreign scans and does the final join locally. >> However, >> once I specify target partitions explicitly, then the entire query is >> pushed down to the foreign node: >> >> QUERY PLAN >> --------------------------------------------------------- >> Foreign Scan >> Relations: (documents_node2) INNER JOIN (users_node2) >> >> Execution time is dropped significantly as well — by more than 3 times >> even for this small test database. Situation for simple queries with >> aggregates or joins and aggregates followed by the sharding key filter >> is the same. Something similar was briefly discussed in this thread >> [3]. >> >> IIUC, it means that push-down of queries through the postgres_fdw >> works >> perfectly well, the problem is with partition-wise operation detection >> at the planning time. Currently, partition-wise aggregate routines, >> e.g., looks for a GROUP BY and checks whether sharding key exists >> there >> or not. After that PARTITIONWISE_AGGREGATE_* flag is set. However, it >> doesn't look for a content of WHERE clause, so frankly speaking it >> isn't >> a problem, this functionality is not yet implemented. >> >> Actually, sometimes I was able to push down queries with aggregate >> simply by adding an additional GROUP BY with sharding key, like this: >> >> SELECT >> count(*) >> FROM >> documents >> WHERE >> company_id = 5 >> GROUP BY company_id; > > This gets pushed down since GROUP BY clause is on the partition key. > Sure, but it only works *sometimes*, I've never seen most of such simple queries with aggregates to be pushed down, e.g.: SELECT sum(id) FROM documents_node2 WHERE company_id = 5 GROUP BY company_id; whether 'GROUP BY company_id' is used or not. >> >> Although it seems that it will be easier to start with aggregates, >> probably we should initially plan a more general solution? For >> example, >> check that all involved tables are filtered by partitioning key and >> push >> down the entire query if all of them target the same foreign server. >> >> Any thoughts? > > I think adding just equality conditions on the partition key will be > enough. No need for any code change. So, it hasn't helped. Maybe I could modify some costs to verify that push-down of such joins is ever possible? Anyway, what about aggregates? Partition-wise aggregates work fine for queries like SELECT count(*) FROM documents GROUP BY company_id; but once I narrow it to a single partition with 'WHERE company_id = 5', then it is being executed in a very inefficient way — takes all rows from remote partition / node and performs aggregate locally. It doesn't seem like a problem with query itself. In my experience, both partition-wise joins and aggregates work well with simple GROUP or JOIN by the partitioning key, which corresponds to massive multi-partition OLAP queries. However, both stop working for a single-partition queries with WHERE, when postgres_fdw and partitioning are used. I'd be glad if you share any new guesses of how to make them working without code modification. Thanks -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company
pgsql-hackers by date: