Re: Stream consistent snapshot via a logical decoding plugin as a series of INSERTs - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: Stream consistent snapshot via a logical decoding plugin as a series of INSERTs
Msg-id CAMsr+YEiHXnyAhYJTphES2u2hvSpbeGpQ2+NCVffSQjMnMbmcw@mail.gmail.com
In response to Re: Stream consistent snapshot via a logical decoding plugin as a series of INSERTs  ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
Responses Re: Stream consistent snapshot via a logical decoding plugin as a series of INSERTs
List pgsql-hackers
On 20 January 2016 at 15:50, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote:
 
> All of this implies another *postgres* database on the receiving side, which is not necessarily the case for my research.

Good point. It might not be a database at all, i.e. it might not understand INSERTs, and you may want the data in some arbitrary format, like JSON.

The same is true for other intended uses of pglogical_output. It's a pain to have to query the desired initial state out of the database via normal libpq, so I like your idea to let the output plugin deal with that.

 
> This is what I'm going to benchmark.  With the generic function I can just create two slots: one for pglogical and another one for BottledWater/Avro and see which one performs better when forced to stream some TB worth of INSERTs through the change callback.

Makes sense.
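The shape of that comparison can be sketched abstractly. This is a minimal, hypothetical Python harness (not PostgreSQL code; `pglogical_style_cb` and `avro_style_cb` are stand-ins I made up for the two plugins' change callbacks) that just shows the structure: drive a large number of synthetic change events through each callback and time them.

```python
import time

# Hypothetical stand-ins for the two output plugins' change callbacks.
# In the real test these would be pglogical_output and the
# BottledWater/Avro plugin, invoked by the logical decoding machinery.
def pglogical_style_cb(change):
    # Text-protocol-ish: format each column as key=value.
    return "|".join(f"{k}={v}" for k, v in change.items())

def avro_style_cb(change):
    # Crude placeholder for binary-encoding work.
    return repr(sorted(change.items())).encode()

def benchmark(callback, n_changes=100_000):
    """Stream n_changes synthetic INSERT events through one callback."""
    change = {"id": 0, "payload": "x" * 64}
    start = time.perf_counter()
    for i in range(n_changes):
        change["id"] = i
        callback(change)
    return time.perf_counter() - start

t1 = benchmark(pglogical_style_cb)
t2 = benchmark(avro_style_cb)
print(f"pglogical-style: {t1:.3f}s, avro-style: {t2:.3f}s")
```

The point is only that both plugins see an identical stream of changes, so any timing difference is attributable to the callback itself.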
 
> What do you say?

Interesting idea. As outlined I think it sounds pretty fragile though; I really, really don't like the idea of lying to the insert callback by passing it a fake insert with (presumably) fake reorder buffer txn, etc.

> Fair enough.  However, for performance testing it might not be that bad, even if none of that lands in the actual API.

I agree. It's fine for performance testing.
 
It should be relatively simple to add to pglogical_output, though you might have to hack out a few things if the faked-up state doesn't fully stand up to scrutiny.
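To make the "fake insert" idea concrete, here is an illustrative Python model (not the PostgreSQL C API; `make_fake_txn`, `make_fake_insert`, and `stream_relation_as_inserts` are invented names) of what has to be dummied up: a transaction-like object and a change-like object per row, fed straight to a change callback without going through a reorder buffer.

```python
from types import SimpleNamespace

# Hypothetical mock of the state a change callback sees. Real code
# would fake up a ReorderBufferTXN and change record in C; this only
# models the shape to show which fields must be fabricated.
def make_fake_txn(xid=1, commit_lsn=0):
    return SimpleNamespace(xid=xid, final_lsn=commit_lsn,
                           has_catalog_changes=False)

def make_fake_insert(relation, new_tuple):
    return SimpleNamespace(action="INSERT", relation=relation,
                           newtuple=new_tuple)

def stream_relation_as_inserts(rows, relation, change_cb):
    """Feed each existing row to the callback as if it were a fresh
    INSERT in one already-committed transaction."""
    txn = make_fake_txn()
    return [change_cb(txn, make_fake_insert(relation, row))
            for row in rows]

results = stream_relation_as_inserts(
    [{"id": 1}, {"id": 2}],
    "public.t",
    lambda txn, ch: f"INSERT {ch.relation} {ch.newtuple}",
)
```

This is exactly where the fragility discussed above lives: any field the plugin inspects on the faked transaction or change has to hold up to scrutiny, or be hacked around.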


>> You're presumably not forming a reorder buffer then decoding it since it could create a massive tempfile on disk, so are you just dummying this info up?

> In my experience, it doesn't.  We know it's going to be a "committed xact", so we don't really need to queue the changes up before we see a "commit" record.

OK.

That's probably going to confuse pglogical_output a bit because it looks at the tx start/end records. But it might not look closely enough to care, to be honest, and may cope OK with the bogus data. It can probably be hacked around for testing purposes.
 
 
> Another consideration is that we might introduce modes for acquiring the slot: Exclusive and Shared access (can be implemented with LWLocks?), so that peek_changes() and stream_relation() could acquire the slot in Shared access mode, thus allowing parallel queries, while START_REPLICATION and get_changes() would require Exclusive access.

That'd be nice, but probably not totally necessary for streaming relations. It doesn't really need the slot at all. Or shouldn't, I think. Though it might be easiest to allow it to acquire the slot just for convenience and shared code.
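The proposed acquisition modes amount to a classic reader-writer discipline. A toy Python sketch (illustrative only; the `SlotAccess` class is invented, and real PostgreSQL code would build this on LWLocks, not Python threading):

```python
import threading

class SlotAccess:
    """Toy reader-writer discipline for a replication slot: many
    Shared holders (peek_changes / stream_relation) OR exactly one
    Exclusive holder (START_REPLICATION / get_changes)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._shared = 0          # count of Shared holders
        self._exclusive = False   # is an Exclusive holder active?

    def acquire_shared(self):
        with self._cond:
            while self._exclusive:
                self._cond.wait()
            self._shared += 1

    def release_shared(self):
        with self._cond:
            self._shared -= 1
            if self._shared == 0:
                self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            while self._exclusive or self._shared:
                self._cond.wait()
            self._exclusive = True

    def release_exclusive(self):
        with self._cond:
            self._exclusive = False
            self._cond.notify_all()
```

Several parallel readers can stream or peek concurrently, while a consumer advancing the slot's position waits until it has the slot to itself.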
 

> Thanks for the thoughtful reply!  I'm going to experiment with my toy code a bit more, while keeping in mind what a more workable approach could look like.

Great.

I'll be really interested in your results.

If you have trouble making pglogical_output cooperate with your tests and measurements, feel free to mail me directly and I'll see if I can help find what's going wrong in the test harness or in pglogical_output.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
