Re: Moving delta data faster - Mailing list pgsql-general
From: Adrian Klaver
Subject: Re: Moving delta data faster
Msg-id: a1abe41c-94d8-4292-899f-ea2f256d76ae@aklaver.com
In response to: Re: Moving delta data faster (yudhi s <learnerdatabase99@gmail.com>)
Responses: Re: Moving delta data faster
List: pgsql-general
On 4/3/24 22:24, yudhi s wrote:
> On Thu, Apr 4, 2024 at 10:16 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> > On 4/3/24 20:54, yudhi s wrote:
> > > On Thu, Apr 4, 2024 at 2:41 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> > >
> > > Thank you Adrian.
> > >
> > > And one thing I forgot to mention: this target PostgreSQL database
> > > would be on AWS RDS, whereas the source Oracle databases are on
> > > premise. I think we don't have the FDW extension currently in place,
> > > but we can get that. I am just not able to understand clearly though:
> > > do you mean export the data from the source using CSV, then truncate
> > > and import on the target? And as these data will be moved through the
> > > network, won't that cause slowness?
> > >
> > > The source database here is an Oracle database. Correct me if wrong,
> > > but it looks like a foreign data wrapper is like a DB link. Or do you
> > > mean writing a query on the target database (which can be UPSERT or
> > > MERGE) that joins the table from the source database through the
> > > DB link/FDW? But my question was whether we should use UPSERT or
> > > MERGE for comparing and loading the delta records to the target
> > > PostgreSQL database. I want to understand which is more performant,
> > > as I have seen MERGE having performance issues in the past, but I am
> > > not very sure about that.
> >
> > My motivation was to get some basic information about your setup and
> > what you are trying to achieve.
> >
> > If I understand correctly you have:
> >
> > 1) An Oracle database with tables that you want to copy the complete
> > data from to a Postgres database.
> > For this sort of thing COPY
> > (https://www.postgresql.org/docs/current/sql-copy.html) on the
> > Postgres end, using CSV data generated from the source, is probably
> > the quickest bulk load method.
> >
> > 2) After the initial load you want to do follow-up INSERT/UPDATEs
> > based on a delta of the source tables relative to the initial load.
> > This is still a bit of a mystery to me. How are you determining the
> > delta: a) on the source end entirely, or b) target relative to source?
> > Also, what is the anticipated size of the delta per transfer?
> >
> > Additional information needed:
> >
> > 1) Network distance between source and target?
> >
> > 2) Network capacity?
> >
> > 3) Expected load on both source and target servers from other
> > operations?
>
> Thank you. Actually I was trying to understand how to cater to the
> delta load after the one-time load is done. The delta change in records
> is planned to be found based on the primary keys of the tables. If it
> finds the key it will update the record; if it does not find the key it
> will insert the row.
>
> Basically, the select query from the source database will fetch the
> data within a certain time interval (based on the latest update
> timestamp, or the create timestamp if they are available, or else a
> full dump), put it on S3, and then from S3 it will be picked up and
> merged into the target Postgres database. As UPSERT and MERGE both look
> similar, I was wondering which we should use here for loading the delta
> records?

S3 is not a database. You will need to be more specific about '... then
from the S3 it will be picked and gets merged to the target postgres
database.'

--
Adrian Klaver
adrian.klaver@aklaver.com
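[Editor's note] For concreteness, the CSV-to-staging step discussed in the thread could look like the sketch below. All table, column, bucket, and region names are placeholders, not from the thread. On RDS the server filesystem is not reachable, so either psql's client-side \copy or the aws_s3 extension (an RDS-provided feature for importing directly from S3) would be used:

```sql
-- Hypothetical staging load; names are placeholders.
CREATE TABLE IF NOT EXISTS staging_table (LIKE target_table INCLUDING ALL);
TRUNCATE staging_table;

-- Client-side load with psql (works against RDS, where server-side
-- COPY FROM 'file' is unavailable):
-- \copy staging_table FROM 'delta.csv' WITH (FORMAT csv, HEADER true)

-- Or import straight from S3 on RDS via the aws_s3 extension:
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;
SELECT aws_s3.table_import_from_s3(
    'staging_table',              -- destination table
    '',                           -- column list ('' = all columns)
    '(FORMAT csv, HEADER true)',  -- COPY options
    aws_commons.create_s3_uri('my-bucket', 'delta/extract.csv', 'us-east-1')
);
```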
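[Editor's note] On the UPSERT-vs-MERGE question itself, both forms can apply a staged delta keyed on the primary key. A minimal sketch, assuming a staging_table already loaded from S3 and a primary-key column id (all names are placeholders):

```sql
-- UPSERT (INSERT ... ON CONFLICT), available since PostgreSQL 9.5:
INSERT INTO target_table AS t (id, payload, updated_at)
SELECT id, payload, updated_at
FROM staging_table
ON CONFLICT (id) DO UPDATE
SET payload    = EXCLUDED.payload,
    updated_at = EXCLUDED.updated_at;

-- MERGE, available since PostgreSQL 15:
MERGE INTO target_table t
USING staging_table s ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET payload = s.payload, updated_at = s.updated_at
WHEN NOT MATCHED THEN
    INSERT (id, payload, updated_at)
    VALUES (s.id, s.payload, s.updated_at);
```

One practical difference: ON CONFLICT requires a unique index on the conflict target and handles concurrent inserters atomically, whereas MERGE evaluates its join once and can raise unique-violation errors under concurrent modification. Which is faster for a given delta size is workload-dependent and worth benchmarking on the actual RDS instance.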