Re: Selective Replication help - Mailing list pgsql-novice

From Payal Singh
Subject Re: Selective Replication help
Date
Msg-id CANUg7LD2_4W-ehM=pbMDEq86scziq7gzOM1gW7avr5nXpqUPfA@mail.gmail.com
In response to Selective Replication help  (Hailey Eckstrand <haileyeckstrand@gmail.com>)
List pgsql-novice
Option 2) I've also been looking into master-slave replication software called Bucardo, but it works table to table, so if I did that I think I would have to create local versions of the slave table (which would store duplicated data), update them with triggers (when new data is inserted into the 5 tables by the web scraping scripts), and then set up replication on those tables to the slave databases. However, replication comes with a steep learning curve, and I'm not sure whether what I'm trying to do is complicated enough to warrant it.

If you can create a local copy of the final table, you don't need any tool other than normal Postgres replication (WAL shipping or streaming). You can probably use your current SQL function to generate the table; only the location of the table will be different, and you won't be using the FDW (since the table will be in the same database). But yes, you won't be able to replicate just a single table; the cluster as a whole will be replicated (this won't be any more complicated than what you're currently doing, perhaps even simpler).
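
A minimal sketch of that setup, assuming the existing 5-table view is called scrape_batch and the new local table is called outgoing_rows (both names are hypothetical; the existing SQL function would simply write into outgoing_rows instead of the foreign table):

    -- One-time setup on the master: a plain local table holding the final result.
    -- Because it is an ordinary table in the cluster, WAL-based or streaming
    -- replication will carry it to the standbys along with everything else.
    CREATE TABLE outgoing_rows AS
        SELECT * FROM scrape_batch;
    ALTER TABLE outgoing_rows ADD PRIMARY KEY (id);   -- assumed key column "id"

    -- Roughly what streaming replication needs on the master (postgresql.conf):
    --   wal_level = hot_standby
    --   max_wal_senders = 4
    -- and on each standby, a recovery.conf along the lines of:
    --   standby_mode = 'on'
    --   primary_conninfo = 'host=master.example.com user=replicator'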

Payal Singh,
Database Administrator,
OmniTI Computer Consulting Inc.
Phone: 240.646.0770 x 253

On Thu, May 28, 2015 at 6:13 PM, Hailey Eckstrand <haileyeckstrand@gmail.com> wrote:
I have one local master database and 4 remote slave databases. Every hour I send around 1000 rows (4 columns) from the master to each of the slaves. For context, my master database is where I do web scraping. After the data is scraped, I send it out to the slave databases.

Of the 1000 rows, some have primary keys that already exist in the slave databases, so part of the push involves deleting the overlapping rows and then inserting all of the data.
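
For illustration, a set-based version of that overlap handling might look like this (remote_target and scrape_batch are placeholder names for the slave-side table and the hourly batch):

    -- Remove rows on the target whose primary key is about to be re-sent...
    DELETE FROM remote_target
     WHERE id IN (SELECT id FROM scrape_batch);
    -- ...then insert the full batch.
    INSERT INTO remote_target
        SELECT * FROM scrape_batch;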

On the master database, the data I send to the slaves is compiled from a view that is a subset of 5 tables. The reason I am explaining this is that it is not simple master-table-to-slave-table replication: the data I send out comes from a view over multiple tables on the master and is inserted into a single table on each slave.

Currently, the way I push the data from local to remote is this:
I've created an SQL function that operates on a slave database table (via a foreign data wrapper); it checks whether each row exists, and if it does, it updates the values, otherwise it inserts the new row.
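
Purely as a sketch, a function of that shape might look like the following, assuming a foreign table slave_target with an integer key id and three data columns (all names invented):

    -- Per-row upsert against the foreign (slave) table: try an UPDATE first,
    -- and INSERT only if no row with that key was found.
    CREATE OR REPLACE FUNCTION push_row(p_id integer, p_a text, p_b text, p_c text)
    RETURNS void AS $$
    BEGIN
        UPDATE slave_target
           SET col_a = p_a, col_b = p_b, col_c = p_c
         WHERE id = p_id;
        IF NOT FOUND THEN
            INSERT INTO slave_target (id, col_a, col_b, col_c)
            VALUES (p_id, p_a, p_b, p_c);
        END IF;
    END;
    $$ LANGUAGE plpgsql;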

I find this very slow and, I'm guessing, inefficient, but I'm not very good at testing that. The number of rows I'm sending out is about to increase to 10,000, so I'd like to look into other solutions.

Option 1) In my research, I've found the function 'dblink_build_sql_insert', whose documentation states it 'can be useful in doing selective replication of a local table to a remote database', which sounds like what I'm doing, but I've not found any examples online.
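
For reference, the dblink documentation shows it being used roughly like this (foo and the key values are just placeholders); it returns the text of an INSERT statement built from a local row, which you would then run on the remote side, for example via dblink_exec:

    -- Build an INSERT for the row of local table "foo" whose single primary key
    -- column (attribute number 1) has the value '1', rewriting the key to '2'
    -- in the generated statement.
    SELECT dblink_build_sql_insert('foo', '1', 1, '{"1"}', '{"2"}');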

Option 2) I've also been looking into master-slave replication software called Bucardo, but it works table to table, so if I did that I think I would have to create local versions of the slave table (which would store duplicated data), update them with triggers (when new data is inserted into the 5 tables by the web scraping scripts), and then set up replication on those tables to the slave databases. However, replication comes with a steep learning curve, and I'm not sure whether what I'm trying to do is complicated enough to warrant it.
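
As a rough illustration of that trigger idea (every name below is invented): each scraped table would get an AFTER INSERT trigger that keeps a local staging table current, and that staging table is what Bucardo, or any table-to-table tool, would replicate.

    -- Local staging table mirroring the slave table's layout.
    CREATE TABLE staging_rows (id integer PRIMARY KEY, col_a text, col_b text, col_c text);

    -- Trigger function: copy the relevant columns of each newly scraped row
    -- into the staging table (a real version would derive the row the same
    -- way the existing 5-table view does).
    CREATE OR REPLACE FUNCTION copy_to_staging() RETURNS trigger AS $$
    BEGIN
        INSERT INTO staging_rows (id, col_a, col_b, col_c)
        VALUES (NEW.id, NEW.col_a, NEW.col_b, NEW.col_c);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    -- One such trigger per scraped table (scraped_table_1 is a made-up name).
    CREATE TRIGGER scraped_to_staging
        AFTER INSERT ON scraped_table_1
        FOR EACH ROW EXECUTE PROCEDURE copy_to_staging();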

Can anyone suggest other options that would be fast and perhaps more efficient, or let me know whether option 1 or 2 is a good idea?
I know enough Python and PHP to get by and can figure out triggers. I am not amazing at PL/pgSQL but could figure it out.

Thank you,
Hailey
