Re: Pglogical questions and problems - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Pglogical questions and problems
Msg-id CA+TgmoYAi_bY=Wzg5vtddQZwZSziXgnv5QS3K7h+f5c+3fn=bA@mail.gmail.com
In response to Re: Pglogical questions and problems  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Pglogical questions and problems  (Petr Jelinek <petr@2ndquadrant.com>)
List pgsql-hackers
On Thu, Apr 14, 2016 at 11:26 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> 1) "more deeply into core"
> I'm open to doing that for some parts of the code, if there is benefit. At
> present, an extension has exactly the same attributes as an in-core
> solution, so I don't currently see any benefit in doing so. Could you
> explain what you see?
>
> 2) "SQL syntax"
> I'm not sure what SQL syntax would give us. I know what we would lose, which
> is the ability to implement new and interesting features as extensions
> before putting them into core. That doesn't strike me as a benefit, so
> please explain.

Lots of things start out as extensions but then we decide that they
are important enough that they should be part of the core product.
For example, text search started out in contrib, but then we moved it
to core.  When things are in core, they can have their own DDL, which
I think is an ease-of-use benefit.  Also, they become accessible as
infrastructure for other code that gets written later.  If there were
no benefits of putting features in core, we wouldn't put anything in
core, but of course there are such benefits.

It is absolutely wrong to say that you would "lose the ability to
implement new and interesting features as extensions before putting
them into core".  To the contrary, as we add things to core, it
becomes possible to write more and more interesting extensions.  For
example, the availability of background workers has opened up all
kinds of interesting possibilities for extensions that didn't exist
before; in fact, that's why Alvaro created the feature.  Similarly, a
lot of the code that I and others wrote for parallel query has been
used by other people to do interesting things - and it was one of the
goals of the project to make that sort of thing possible.  I believe
logical replication is a fundamental database technology that should
be considered just as much within the scope of the core product as
physical replication, parallel query, or UPSERT.  I held and publicly
expressed that belief on my blog before anyone at 2ndQuadrant began
working in this area, and I still hold it today.

> At present, I don't understand why we would do sharding via FDWs, i.e. an
> out-of-core solution and yet replication as an in-core solution. Sharding
> desires/requires a single system image, so tight coupling is sensible (for
> example, defining a distribution key column on a distributed table). For
> replication between disparate loosely coupled systems, tight coupling is
> exactly what you do not want. So doing it that way round would give an
> out-of-core solution for something that is best done in-core and an in-core
> solution for something best done out-of-core.

First, I think that replication can be either loosely-coupled or
tightly-coupled.  There are interesting cases with intermittently
connected networks where you really don't want too much coupling, and
then there are cases where you are doing load-balancing across a
cluster and tight coupling is fine, even desirable.  Similarly,
although I agree that a sharding solution intrinsically requires
fairly tight coupling, I think that one of the strengths of FDWs is
that they do not.  I'm not very interested in seeing a sharding
solution in PostgreSQL that limits what you can do to a particular
network topology and enforces tight coupling whether you want it or
not.  I'm more interested in seeing how we can build something that
*permits* a tightly-coupled system but also lets people build other
kinds of systems if they wish.

Second, I don't think that whether the system is tightly-coupled or
loosely-coupled has much to do with whether the code lives in
src/backend or contrib and, to be clear, I don't care all that much
about that, either.  If we end up with a great logical replication
solution and it so happens that it loads pglogical.so under the hood,
fine.  However, I do care about ease of use.  In terms of ease of use,
again, I think DDL would be a better interface than one based on
functions.  SQL is clunky at times, but being able to say CREATE TABLE
blah (a int, b text) instead of SELECT pg_create_table('blah',
ARRAY['a', 'b'], ARRAY['int'::regtype, 'text'::regtype]) surely has
something to recommend it.  Of course, there is a lot to ease of use
other than DDL, and if we end up with a design that relies strictly on
contrib, perhaps that is OK.  But it needs to be just as easy to set
up a replicated PostgreSQL cluster as it is to do the equivalent task
in some competing product, or we are missing the boat.

>> I think this would be a good topic to discuss at PGCon.
>
> I'll be at PgCon to discuss this.

Great.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


