Re: Horizontal scalability/sharding - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Horizontal scalability/sharding
Date
Msg-id CA+TgmoZ+HDbrUOWQRcH-jog2QXGqoJ=kmo263Ga3nWJGMZ9gEg@mail.gmail.com
Whole thread Raw
In response to Re: Horizontal scalability/sharding  (Mason S <masonlists@gmail.com>)
List pgsql-hackers
On Tue, Sep 1, 2015 at 7:01 AM, Mason S <masonlists@gmail.com> wrote:
> For efficient internodes joins with row shipping, FDWs may also not be easy
> to do. Maybe it is possible if we optionally pass in lists of other nodes
> and information about how they are partitioned so data knows where to get
> shipped.
>
> A challenge for planning with arbitrary copies of different shards is that
> sometimes you may be able to push down joins, sometimes not. Planning and
> execution get ugly. Maybe this can be simplified by parent-child tables
> following the same partitioning scheme.

This gets at a problem which Ozgun also mentioned in his Google
document, and which I also discussed with Etsuro Fujita at PGCon: good
query planning requires good metadata, and we don't really have that
today.

I think that a big part of the goal of the declarative partitioning
work that Amit Langote has recently been undertaking is to provide a
catalog representation of the partitioning structure that is easy to
work with, as opposed to just having a bunch of CHECK constraints that
you have to try to reason about.  That's one part of the solution.

You also might need to know more about the remote table than is
captured by the column and data type list.  Most particularly, you
might want to know what indexes exist on the remote side, but
currently, to figure out that out, you'd need to send queries to
retrieve that information to the remote side every time you do
planning, or maybe you could contrive a session-lifespan cache.
That's pretty annoying.

One idea for solving this problem is to allow CREATE INDEX on foreign
tables, but I don't like that much.  There's no guarantee that the
remote side is a PostgreSQL instance, and if it isn't, the relevant
details about the indexes that exist may not be convenient to
represent in our catalogs.  Heck, that can be true even if it is a
PostgreSQL instance, if the remote side relies on an AM or a function
that doesn't exist locally.  But even if both sides are PostgreSQL
instances using only btree indexes on raw columns, now you've put the
burden on the DBA to make sure that the index definitions on the local
and remote sides match, and that's a pain in the neck.

What seems better to me is to allow ANALYZE of a foreign table a place
to record an arbitrary blob of metadata about the remote side that it
can then get access to during planning.  Then, you can record details
about indexes, or statistics that don't fit into the mold of
pg_statistic, or really, anything else you're going to need to figure
out the best plan, and if the DBA changes the configuration on the
remote side, they don't need to update the local configuration to
match; a re-ANALYZE will do the trick.

I'm open to other ideas as well.  Repartitioning will also arise for
an all-local parallel join, and it would be nice if the planner smarts
could be shared between that case and the remote-table case.
Therefore, while I suspect that some of the logic here will end up
inside one or more FDWs, I'm pretty confident that a significant chunk
of it needs to go into the core optimizer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Horizontal scalability/sharding
Next
From: Pavel Stehule
Date:
Subject: Re: On-demand running query plans using auto_explain and signals