Re: Horizontal scalability/sharding - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: Horizontal scalability/sharding
Msg-id: CA+TgmoZ+HDbrUOWQRcH-jog2QXGqoJ=kmo263Ga3nWJGMZ9gEg@mail.gmail.com
In response to: Re: Horizontal scalability/sharding (Mason S <masonlists@gmail.com>)
List: pgsql-hackers
On Tue, Sep 1, 2015 at 7:01 AM, Mason S <masonlists@gmail.com> wrote:
> For efficient internode joins with row shipping, FDWs may also not be easy
> to do. Maybe it is possible if we optionally pass in lists of other nodes
> and information about how they are partitioned so data knows where to get
> shipped.
>
> A challenge for planning with arbitrary copies of different shards is that
> sometimes you may be able to push down joins, sometimes not. Planning and
> execution get ugly. Maybe this can be simplified by parent-child tables
> following the same partitioning scheme.

This gets at a problem which Ozgun also mentioned in his Google document, and which I also discussed with Etsuro Fujita at PGCon: good query planning requires good metadata, and we don't really have that today. I think that a big part of the goal of the declarative partitioning work that Amit Langote has recently been undertaking is to provide a catalog representation of the partitioning structure that is easy to work with, as opposed to just having a bunch of CHECK constraints that you have to try to reason about. That's one part of the solution.

You also might need to know more about the remote table than is captured by the column and data type list. Most particularly, you might want to know what indexes exist on the remote side, but currently, to figure that out, you'd need to query the remote side for that information every time you do planning, or maybe you could contrive a session-lifespan cache. That's pretty annoying. One idea for solving this problem is to allow CREATE INDEX on foreign tables, but I don't like that much. There's no guarantee that the remote side is a PostgreSQL instance, and if it isn't, the relevant details about the indexes that exist may not be convenient to represent in our catalogs. Heck, that can be true even if it is a PostgreSQL instance, if the remote side relies on an AM or a function that doesn't exist locally.
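[Editor's sketch of the contrast described above, using hypothetical table names: the inheritance-plus-CHECK style available at the time, versus the declarative syntax that eventually landed in PostgreSQL 10 as a result of this work.]

```sql
-- Inheritance-style partitioning: the planner must reason about
-- arbitrary CHECK constraints to prove which children can be pruned.
CREATE TABLE measurements (ts timestamptz, val numeric);
CREATE TABLE measurements_2015 (
    CHECK (ts >= '2015-01-01' AND ts < '2016-01-01')
) INHERITS (measurements);

-- Declarative partitioning (syntax as later committed in PostgreSQL 10):
-- the partitioning scheme is recorded explicitly in the catalogs,
-- so the planner can consult it directly.
CREATE TABLE measurements_decl (ts timestamptz, val numeric)
    PARTITION BY RANGE (ts);
CREATE TABLE measurements_decl_2015 PARTITION OF measurements_decl
    FOR VALUES FROM ('2015-01-01') TO ('2016-01-01');
```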
But even if both sides are PostgreSQL instances using only btree indexes on raw columns, now you've put the burden on the DBA to make sure that the index definitions on the local and remote sides match, and that's a pain in the neck.

What seems better to me is to give ANALYZE of a foreign table a place to record an arbitrary blob of metadata about the remote side that it can then get access to during planning. Then, you can record details about indexes, or statistics that don't fit into the mold of pg_statistic, or really, anything else you're going to need to figure out the best plan, and if the DBA changes the configuration on the remote side, they don't need to update the local configuration to match; a re-ANALYZE will do the trick. I'm open to other ideas as well.

Repartitioning will also arise for an all-local parallel join, and it would be nice if the planner smarts could be shared between that case and the remote-table case. Therefore, while I suspect that some of the logic here will end up inside one or more FDWs, I'm pretty confident that a significant chunk of it needs to go into the core optimizer.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
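[Editor's note: to illustrate the per-planning-cycle round trip the message describes, this is the kind of catalog query the local side would have to ship to a remote PostgreSQL server just to learn what indexes exist, absent any locally recorded metadata. The schema and table names are hypothetical.]

```sql
-- Shipped to the remote server at plan time (or cached per session),
-- because nothing in the local catalogs records remote indexes:
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
  AND tablename  = 'orders';   -- hypothetical remote table
```

And, as the message notes, this only works when the remote side really is PostgreSQL; a non-PostgreSQL remote has no pg_indexes to query, which is part of why a generic metadata blob recorded at ANALYZE time is attractive.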