Re: Moving delta data faster - Mailing list pgsql-general

From Adrian Klaver
Subject Re: Moving delta data faster
Date
Msg-id a1abe41c-94d8-4292-899f-ea2f256d76ae@aklaver.com
Whole thread Raw
In response to Re: Moving delta data faster  (yudhi s <learnerdatabase99@gmail.com>)
Responses Re: Moving delta data faster
List pgsql-general
On 4/3/24 22:24, yudhi s wrote:
> 
> On Thu, Apr 4, 2024 at 10:16 AM Adrian Klaver <adrian.klaver@aklaver.com 
> <mailto:adrian.klaver@aklaver.com>> wrote:
> 
>     On 4/3/24 20:54, yudhi s wrote:
>      > On Thu, Apr 4, 2024 at 2:41 AM Adrian Klaver
>     <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>      > <mailto:adrian.klaver@aklaver.com
>     <mailto:adrian.klaver@aklaver.com>>> wrote:
>      >
>      >    > Thank you Adrian.
>      >
>      > And one thing i forgot to mention this target postgresql database
>     would
>      > be on AWS RDS whereas the source Oracle databases is on premise.
>     I think
>      > we don't have the FDW extension currently in place but we can get
>     that.
>      > I am just not able to understand clearly  though, but do you mean
>     export
>      > the data from source using CSV and do truncate and import on
>     target. And
>      > as these data will be moved through the network won't that cause
>     slowness?
>      >
>      > The source database here is Oracle database. Correct me if wrong, it
>      > looks like foreign data wrapper is like a DB link. Or do you mean
>      > writing a query on the target database (which can be UPSERT or
>     MERGE)
>      > but will be joining the table from the source database through the
>      > DBlink/DDW? But my question was whether we should use UPSERT or
>     MERGE
>      > for comparing and loading the delta records to the target postgresql
>      > database. Want to understand which is more performant , as I see
>     in the
>      > past Merge having performance issues in the past, but not very sure
>      > about that.
> 
>     My motivation was to get some basic information about your setup and
>     what you are trying to achieve.
> 
>     If I understand correctly you have:
> 
>     1) An Oracle database with tables that you want to copy the complete
>     data from to a Postgres database. For this sort of thing
>     COPY(https://www.postgresql.org/docs/current/sql-copy.html
>     <https://www.postgresql.org/docs/current/sql-copy.html>) on the
>     Postgres end using CSV data generated from the source is probably the
>     quickest bulk load method.
> 
>     2) After the initial load you want to do follow up INSERT/UPDATEs based
>     on a delta of the source tables relative to the initial load. This is
>     still a bit of mystery to me. How are determining the delta: a) On the
>     source end entirely or b) Target relative to source? Also what is the
>     anticipated size of the delta per transfer?
> 
>     Additional information needed:
> 
>     1) Network distance between source and target?
> 
>     2) Network capacity?
> 
>     3) Expected load on both source and target servers from other
>     operations?
> 
> 
> Thank you. Actually I was trying to understand how to cater the delta 
> load after the one time load is done . The delta change in records is 
> planned to be found based on the primary keys on the tables. If it found 
> the key it will update the records if it does not find the keys it will 
> insert the rows.
> 
> Basically the select query from the source database will fetch the data 
> with a certain time interval(based on the latest update timestamp or 
> create timestamp if they are available or else full dump) and put it on 
> S3 and then from the S3 it will be picked and gets merged to the target 
> postgres database. As upsert and merge both were looking similar , so 
> was wondering what we should use here for loading the delta records?

S3 is not a database. You will need to be more specific about '... then 
from the S3 it will be picked and gets merged to the target postgres 
database.'

-- 
Adrian Klaver
adrian.klaver@aklaver.com




pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: Failure of postgres_fdw because of TimeZone setting
Next
From: Carl L
Date:
Subject: [MASSMAIL] Multiple COPY statements for one table vs one for ~half a billion records