Re: [GENERAL] Logcal replication in large scale - Mailing list pgsql-general

From Nicklas Avén
Subject Re: [GENERAL] Logcal replication in large scale
Date
Msg-id 1505894560.2239.1.camel@jordogskog.no
Whole thread Raw
In response to Re: [GENERAL] Logcal replication in large scale  (Magnus Hagander <magnus@hagander.net>)
Responses Re: [GENERAL] Logcal replication in large scale  (Magnus Hagander <magnus@hagander.net>)
List pgsql-general
On Wed, 2017-09-20 at 09:44 +0200, Magnus Hagander wrote:
On Wed, Sep 20, 2017 at 8:53 AM, Nicklas Avén <nicklas.aven@jordogskog.no> wrote:
Hallo all

I am thrilled about logical replication in PostgreSQL 10. My head have
started spinning about use cases.

Would it be possible to use logical replication as a distribution
method of data?

As an answer to the generic  question: yes :)

 
I think about map data from national mapping authorities. The problem
is to get the updates of their data sets. Especially the open data sets
are now distributed as files (shape files) in Sweden and as pg_dump in
Norway.

I guess it is beyond what logical replication is designed for, so I ask
what problems that might arise for a scenario like:

The distributor has a publication database with logical replication
publications of the tables. All users, probably thousands or more,
would subscribe to that publication to get an updated copy of the data
set.

How would the publication server react? I guess the WAL-files will only
be written once anyway?

Yes. But they will  need to be kept around until *all* subscribers have pulled down their changes. So even one subscriber falling behind will mean your WAL will never get cleaned up.

Of course, you can keep some sort of watcher process that kills old replication slots.

I am also not sure how well PostgreSQL will react to having thousands of replication slots. It's not what the system was designed for I believe :)


Ok, I have to read me up on how this works. I thought about it as a bucket of WAL-files that the subscribers just "grab".
But of course there is some bookkeeping to make things work.

I guess there is also mechanisms so a new subscriber can get a complete table after the publcation have been active for a long time? I mean the "cleaning up" of Wal-files will not leave new subscribers missing what is written to the table long ago?


You might be better of using logical decoding (which goes back to 9.4) to stream the data out, but not having each subscriber be a postgresql subscriber. Either using it to generate some sort of "diff files" that can then be consumed downstream, or by distributing it via some kind of dedicated queuing system designed to handle that many downstreams.

The thing is that I am not involved in any of those organisations, just a user that want to give them some good reasons to consider deistributing through PostgreSQL :-)


 
My guess is that it will be a lower payload than today anyway when the
whole data set have to be fetched to get updates.

As always that's going to depend on the number of updates. If the same thing is updated 1000 times, then if you use logical replication it gets transferred 1000 times. So there are definitely cases when there will be *more* traffic with logical replication, but in cases like yours I would guess it will be less.

//Magnus

pgsql-general by date:

Previous
From: Thomas Güttler
Date:
Subject: Re: [GENERAL] Final pg_dumpall should happen in Single-User-Mode
Next
From: Thomas Güttler
Date:
Subject: [GENERAL] VM-Ware Backup of VM safe?