Re: [HACKERS] Detecting schema changes during logical replication - Mailing list pgsql-hackers

From Daniele Varrazzo
Subject Re: [HACKERS] Detecting schema changes during logical replication
Date
Msg-id CA+mi_8bJ_uPr67j-6mbin537DVvfk=bOhmWneyBRfbZu89q0tw@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Detecting schema changes during logical replication  (Andres Freund <andres@anarazel.de>)
Responses Re: [HACKERS] Detecting schema changes during logical replication  (Craig Ringer <craig.ringer@2ndquadrant.com>)
List pgsql-hackers
On Sun, May 7, 2017 at 8:04 PM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2017-05-07 19:27:08 +0100, Daniele Varrazzo wrote:
>> I'm putting together a replication system based on logical
>> replication.
>
> Interesting.  If you very briefly could recap what it's about... ;)

I need to replicate some tables from a central database into the
database that should run a secondary system. For a similar use case we
have used Londiste in the past, which has served us good, but its
usage has not been problem-free. Logical decoding seems much less
invasive on the source database than a trigger-based replication
solution, and has less moving part to care about and maintain.

For the moment I'm hacking into a fork of Euler project for wal
decoding into json (https://github.com/dvarrazzo/wal2json), mostly
adding configurability, so that we may be able to replicate only the
tables we need, skip certain fields etc. I'm also taking a look at
minimising the amount of information produced: sending over and over
the column names and types for every record seems a waste, hence my
question.

>> I would like to send table information only the first
>> time a table is seen by the 'change_cb' callback, but of course there
>> could be some schema change after replication started. So I wonder: is
>> there any information I can find in the 'Relation' structure of the
>> change callback, which may suggest that there could have been a change
>> in the table schema, hence a new schema should be sent to the client?
>
> The best way I can think of - which is also what is implemented in the
> in-core replication framework - is to have a small cache on-top of the
> relcache.  That cache is kept coherent using
> CacheRegisterRelcacheCallback().  Then whenever there's a change you
> look up that change in that cache, and send the schema information if
> it's been invalidated since you last sent something.  That's also how
> the new stuff in v10 essentially works:
> src/backend/replication/pgoutput/pgoutput.c
>
> pgoutput_change(), does a lookup for its own metadata using get_rel_sync_entry()
> which then checks relentry->schema_sent.  Invalidation unsets
> schema_sent in rel_sync_cache_relation_cb.

Thank you very much, it seems exactly what I need. I'll try hacking
around this callback.

-- Daniele



pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: [HACKERS] proposal psql \gdesc
Next
From: Jeff Janes
Date:
Subject: [HACKERS] logical replication deranged sender