Re: pglogical_output - a general purpose logical decoding output plugin - Mailing list pgsql-hackers

From Tomasz Rybak
Subject Re: pglogical_output - a general purpose logical decoding output plugin
Date
Msg-id 1453153626.2811.48.camel@post.pl
In response to Re: pglogical_output - a general purpose logical decoding output plugin  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
On Thu, 07.01.2016 at 15:50 +0800, Craig Ringer wrote:
> On 7 January 2016 at 01:17, Peter Eisentraut <peter_e@gmx.net> wrote:
> > On 12/22/15 4:55 AM, Craig Ringer wrote:
> > > I'm a touch frustrated by that, as a large part of the point of
> > > submitting the output plugin separately and in advance of the
> > > downstream was to get attention for it separately, as its own
> > > entity. A lot of effort has been put into making this usable for
> > > more than just a data source for pglogical's replication tools.
> >

Maybe the chosen name was not the best one - I assumed from the very
beginning that it's a replication solution and not something separate.

> > I can't imagine that there is a lot of interest in a replication
> > tool where you only get one side of it, no matter how well-designed
> > or general it is.
> Well, the other part was posted most of a week ago.
>
> http://www.postgresql.org/message-id/5685BB86.5010901@2ndquadrant.com
>
> ... but this isn't just about replication. At least, not just to
> another PostgreSQL instance. This plugin is designed to be general
> enough to use for replication to other DBMSes (via appropriate
> receivers), to replace trigger-based data collection in existing
> replication systems, for use in audit data collection, etc.
>
> Want to get a stream of data out of PostgreSQL in a consistent,
> simple way, without having to add triggers or otherwise interfere
> with the origin database? That's the purpose of this plugin, and it
> doesn't care in the slightest what the receiver wants to do with that
> data. It's been designed to be usable separately from pglogical
> downstream and - before the Python tests were rejected in discussions
> on this list - was tested using a test suite completely separate to
> the pglogical downstream using psycopg2 to make sure no unintended
> interdependencies got introduced.
>
> You can do way more than that with the output plugin but you have to
> write your own downstream/receiver for the desired purpose, since
> using a downstream based on bgworkers and SPI won't make any sense
> outside PostgreSQL.
>

Put those 3 paragraphs into README.md - and this is not a joke.
This is a very good rationale for this plugin; for now the README
starts with a link to documentation describing logical decoding,
and the second paragraph talks about replication.
So when replication (and only replication) is in the README, it should
be no wonder that people think only - or mostly - about replication.

Maybe we should think about changing the name to something like
logical_decoder or logical_streamer, to divorce this plugin
from pglogical? Currently even the name suggests tight coupling - and
in the opposite direction from the real one: pglogical depends on this
plugin, not the other way around.

> If you just want a canned product to use, see the pglogical post
> above for the downstream code.
>
>  
> > Ultimately, what people will want to do with this is
> > replicate things, not muse about its design aspects. So if we're
> > going to ship a replication solution in PostgreSQL core, we should
> > ship all the pieces that make the whole system work.
> I don't buy that argument. Doesn't that mean logical decoding
> shouldn't have been accepted? Or the initial patches for parallel
> query? Or any number of other things that're part of incremental
> development solutions?
>
> (This also seems to contradict what you then argue below, that the
> proposed feature is too broad and does too much.)
>
> I'd be happy to see both parts go in, but I'm frustrated that
> nobody's willing to see beyond "replicate from one Pg to another Pg"
> and see all the other things you can do. Want to replicate to Oracle
> / MS-SQL / etc? This will help a lot and solve a significant part of
> the problem for you. Want to stream data to append-only audit logs?
> Ditto. But nope, it's all about PostgreSQL to PostgreSQL.
>
> Please try to look further into what client applications can do with
> this directly. I already know it meets the needs of the pglogical
> downstream. What I was hoping to achieve with posting the output
> plugin earlier was to get some thought going about what *else* it'd
> be good for.
>
> Again: pglogical is posted now (it just took longer than expected to
> get ready) and I'll be happy to see both it and the output plugin
> included. I just urge people to look at the output plugin as more
> than a tightly coupled component of pglogical.
>
> Maybe some quality name bikeshedding for the output plugin would help
> ;)
>
> > Also, I think there are two kinds of general systems: common core,
> > and all possible features.  A common core approach could probably be
> > made acceptable with the argument that anyone will probably want to
> > do things this way, so we might as well implement it once and give
> > it to people.
> That's what we're going for here. Extensible, something people can
> build on and use.
>  
> >  In a way, the logical decoding interface is the common core, as we
> > currently understand it.  But this submission clearly has a lot of
> > features beyond just the basics
> Really? What would you cut? What's beyond the basics here? What
> basics are you thinking of, i.e what set of requirements are you
> working towards / needs are you seeking to meet?
>
> We cut this to the bone to produce a minimum viable logical
> replication solution. Especially the output plugin.
>
> Cut the hook interfaces for row and xact filtering? You lose the
> ability to use replication origins, crippling functionality, and for
> no real gain in simplicity.
>
> Remove JSON support? That's what most people are actually likely to
> want to use when using the output plugin directly, and it's important
> for debugging/tracing/diagnostics. It's a separate feature, to be
> sure, but it's also a pretty trivial addition.
>  
> > and we could probably go through them
> > one by one and ask, why do we need this bit?  So that kind of
> > system will be very hard to review as a standalone submission.
> >
> >
> Again, I disagree. I think you're looking at this way too narrowly.
>
> I find it quite funny, actually. Here we go and produce something
> that's a nice re-usable component that other people can use in their
> products and solutions ... and all anyone does is complain that the
> other part required to use it as a canned product isn't posted yet
> (though it is now). But with BDR all anyone ever does is complain
> that it's too tightly coupled to the needs of a single product and
> the features extracted from it, like replication origins, should be
> more generic and general purpose so other people can use them in
> their products too. Which is it going to be?
>
> It would be helpful if you could take a step back and describe what
> *you* think logical replication for PostgreSQL should look like. You
> clearly have a picture in mind of what it should be, what
> requirements it satisfies, etc. If you're going to argue based on
> that it'd be very helpful to describe it. I might've missed some
> important points you've seen and you might've overlooked issues I've
> seen. 
>

This is rather long, but I do not want to cut too much, because
it shows a slight problem with the workflow in the PostgreSQL community.
I'm writing as someone trying to increase my involvement:
not a complete outsider, but not yet feeling that I fully belong.

I'll try to explain what I mean, taking this patch as an example.
It started as part of pglogical replication, but you split it out
to ease review. But this origin shows - in the name, in comments,
in the README. It's a good example of scratching an itch - but without
connection to others, or maybe without the wider picture.
Don't get me wrong - having code to discuss is much better
than just bikeshedding what we'd like to have.
And the changes in v5 (like caching and passing tuples to hooks)
give hope for some work.
OTOH, looking at parallel queries, maybe once it lands
in the repository it'll get more attention?

I can feel your frustration. Coming to the community without my own
itch to scratch is also a bit frustrating - I do not know where
to start or what needs the most attention. I can see that
commitfests are in dire need of reviewers, so I started with
them. But at the same time I can only check whether the code looks
correct, applies cleanly, whether it compiles, and whether
tests pass.

I do not see the bigger picture - and I also cannot find emails
discussing long- or mid-term direction or vision.
That makes it harder to feel that a review matters and to decide
which patch to look at.
Both communities I feel attached to (Debian and PostgreSQL)
differ from many highly visible FLOSS projects in that they
have neither a single backing company nor a benevolent dictator.
That gives them freedom to pursue different goals without
the risk of disrupting the power structure, but at the same time
it makes it harder to connect the dots and see how the project
is doing.

OK, we went quite far away from the review. I do not have closing
remarks - only that I hope to provide a better review by the weekend.

And let's discuss the name - I do not fully like pglogical_decoding.

-- 
Tomasz Rybak  GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A  488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak


