Thread: alter table xxx set unlogged take long time

alter table xxx set unlogged take long time

From
"James Pang (chaolpan)"
Date:

Hi ,

We have PG v13.6 in RHEL8.4, we try to set table unlogged before load data.  There are a lot of existing data in this table, when ‘alter table xxx  set unlogged’, we found it take long time and spend time on IO datafileread.  Is it expected?

 

Thanks,

 

James

Re: alter table xxx set unlogged take long time

From
Jim Mlodgenski
Date:


On Tue, Jul 26, 2022 at 4:53 AM James Pang (chaolpan) <chaolpan@cisco.com> wrote:

Hi ,

We have PG v13.6 in RHEL8.4, we try to set table unlogged before load data.  There are a lot of existing data in this table, when ‘alter table xxx  set unlogged’, we found it take long time and spend time on IO datafileread.  Is it expected?

 

Yes, the whole table needs to be written to WAL so this could take a long time for a large table

RE: alter table xxx set unlogged take long time

From
"James Pang (chaolpan)"
Date:

How to make it fast ?  These are our steps about copy large data from Oracle to Postgres

  1. Create table in Postgres  2. Extract data from Oracle to CSV  3. Alter table set xxx unlogged,   4. Run copy command into Postgres db  5. Alter table set xxx logged  6. Create index …

Step 5 took long time ,especially for large tables.

 

Thank,

 

James

 

From: Jim Mlodgenski <jimmy76@gmail.com>
Sent: Tuesday, July 26, 2022 8:21 PM
To: James Pang (chaolpan) <chaolpan@cisco.com>
Cc: pgsql-performance@lists.postgresql.org
Subject: Re: alter table xxx set unlogged take long time

 

 

 

On Tue, Jul 26, 2022 at 4:53 AM James Pang (chaolpan) <chaolpan@cisco.com> wrote:

Hi ,

We have PG v13.6 in RHEL8.4, we try to set table unlogged before load data.  There are a lot of existing data in this table, when ‘alter table xxx  set unlogged’, we found it take long time and spend time on IO datafileread.  Is it expected?

 

Yes, the whole table needs to be written to WAL so this could take a long time for a large table

Re: alter table xxx set unlogged take long time

From
Tom Lane
Date:
"James Pang (chaolpan)" <chaolpan@cisco.com> writes:
> How to make it fast ?  These are our steps about copy large data from Oracle to Postgres
>   1.  Create table in Postgres  2. Extract data from Oracle to CSV  3. Alter table set xxx unlogged,   4. Run copy
commandinto Postgres db  5. Alter table set xxx logged  6. Create index … 

The easy answer is to skip steps 3 and 5.

            regards, tom lane



RE: alter table xxx set unlogged take long time

From
"James Pang (chaolpan)"
Date:
Without step 3 ,  copy data take long time.   Use wal_level=minimal can help make COPY load data without logging ?

Thanks,

James

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us> 
Sent: Tuesday, July 26, 2022 8:43 PM
To: James Pang (chaolpan) <chaolpan@cisco.com>
Cc: Jim Mlodgenski <jimmy76@gmail.com>; pgsql-performance@lists.postgresql.org
Subject: Re: alter table xxx set unlogged take long time

"James Pang (chaolpan)" <chaolpan@cisco.com> writes:
> How to make it fast ?  These are our steps about copy large data from Oracle to Postgres
>   1.  Create table in Postgres  2. Extract data from Oracle to CSV  3. Alter table set xxx unlogged,   4. Run copy
commandinto Postgres db  5. Alter table set xxx logged  6. Create index …
 

The easy answer is to skip steps 3 and 5.

            regards, tom lane

Re: alter table xxx set unlogged take long time

From
Jim Mlodgenski
Date:


On Tue, Jul 26, 2022 at 8:45 AM James Pang (chaolpan) <chaolpan@cisco.com> wrote:
Without step 3 ,  copy data take long time.   Use wal_level=minimal can help make COPY load data without logging ?


I assume that you're most concerned with the total time of moving the data from the source database into the final table so you might get a big win by not moving the data twice and directly load the table through a Foregin Data Wrapper and avoid the csv export/import. Something like the oracle_fdw might help here:

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Tuesday, July 26, 2022 8:43 PM
To: James Pang (chaolpan) <chaolpan@cisco.com>
Cc: Jim Mlodgenski <jimmy76@gmail.com>; pgsql-performance@lists.postgresql.org
Subject: Re: alter table xxx set unlogged take long time

"James Pang (chaolpan)" <chaolpan@cisco.com> writes:
> How to make it fast ?  These are our steps about copy large data from Oracle to Postgres
>   1.  Create table in Postgres  2. Extract data from Oracle to CSV  3. Alter table set xxx unlogged,   4. Run copy command into Postgres db  5. Alter table set xxx logged  6. Create index …

The easy answer is to skip steps 3 and 5.

                        regards, tom lane

RE: alter table xxx set unlogged take long time

From
"James Pang (chaolpan)"
Date:

   We use JDBC to export data into csv ,then copy that to Postgres. Multiple sessions working on multiple tables.  If not set unlogged , how to make COPY run fast ?   possible to start a transaction include all of these  “truncate table xxx; copy table xxxx; create index on tables….”  With wal_level=minimal, is it ok to make copy and create index  without logging ?

 

James

 

From: Jim Mlodgenski <jimmy76@gmail.com>
Sent: Tuesday, July 26, 2022 8:53 PM
To: James Pang (chaolpan) <chaolpan@cisco.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-performance@lists.postgresql.org
Subject: Re: alter table xxx set unlogged take long time

 

 

 

On Tue, Jul 26, 2022 at 8:45 AM James Pang (chaolpan) <chaolpan@cisco.com> wrote:

Without step 3 ,  copy data take long time.   Use wal_level=minimal can help make COPY load data without logging ?

 

I assume that you're most concerned with the total time of moving the data from the source database into the final table so you might get a big win by not moving the data twice and directly load the table through a Foregin Data Wrapper and avoid the csv export/import. Something like the oracle_fdw might help here:

 

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Tuesday, July 26, 2022 8:43 PM
To: James Pang (chaolpan) <chaolpan@cisco.com>
Cc: Jim Mlodgenski <jimmy76@gmail.com>; pgsql-performance@lists.postgresql.org
Subject: Re: alter table xxx set unlogged take long time

"James Pang (chaolpan)" <chaolpan@cisco.com> writes:
> How to make it fast ?  These are our steps about copy large data from Oracle to Postgres
>   1.  Create table in Postgres  2. Extract data from Oracle to CSV  3. Alter table set xxx unlogged,   4. Run copy command into Postgres db  5. Alter table set xxx logged  6. Create index …

The easy answer is to skip steps 3 and 5.

                        regards, tom lane

Re: alter table xxx set unlogged take long time

From
"David G. Johnston"
Date:
On Tue, Jul 26, 2022 at 5:45 AM James Pang (chaolpan) <chaolpan@cisco.com> wrote:
Without step 3 ,  copy data take long time.   Use wal_level=minimal can help make COPY load data without logging ?


I believe you are referring to:


Since the final state of your table will be "logged" relying on the above optimization is the correct path, if you enable "logged" at the end, even with wal_level=minimal, you do not benefit from the optimization and thus your data ends up being written to WAL.

Otherwise, it is overall time that matters, it's no use boasting the COPY is fast if you end up spending hours waiting for ALTER TABLE at the end.

David J.

Re: alter table xxx set unlogged take long time

From
Joe Conway
Date:
On 7/26/22 08:59, James Pang (chaolpan) wrote:
>     We use JDBC to export data into csv ,then copy that to Postgres. 
> Multiple sessions working on multiple tables.  If not set unlogged , how 
> to make COPY run fast ?   possible to start a transaction include all of 
> these  “truncate table xxx; copy table xxxx; create index on tables….”  
> With wal_level=minimal, is it ok to make copy and create index  without 
> logging ?

Not sure if it would work for you, but perhaps a usable strategy would 
be to partition the existing large table on something (e.g. a new column 
like batch number?).

Then (completely untested) I *think* you could create the "partition" 
initially as a free standing unlogged table, load it, index it, switch 
to logged, and then attach it to the partitioned table.

Perhaps you could also have a background job that periodically 
aggregates the batch partitions into larger buckets to minimize the 
overall number of partitions.


-- 
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: alter table xxx set unlogged take long time

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> Then (completely untested) I *think* you could create the "partition" 
> initially as a free standing unlogged table, load it, index it, switch 
> to logged, and then attach it to the partitioned table.

I'm still of the opinion that this plan to load the data unlogged
and switch to logged later is a loser.  Sooner or later you have
got to write the data to WAL, and this approach doesn't eliminate
that cost.  What it does do is create one whole extra cycle of
writing the data to disk and reading it back.  I don't think
it's an oversight that no such thing is suggested in our standard
tips for bulk-loading data:

https://www.postgresql.org/docs/current/populate.html

What perhaps *is* an oversight is that we don't suggest
use of COPY FREEZE there.  AFAIK that doesn't reduce the initial
data loading cost directly, but it would save overhead later.

            regards, tom lane



Re: alter table xxx set unlogged take long time

From
Joe Conway
Date:
On 7/27/22 10:46, Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> Then (completely untested) I *think* you could create the "partition" 
>> initially as a free standing unlogged table, load it, index it, switch 
>> to logged, and then attach it to the partitioned table.
> 
> I'm still of the opinion that this plan to load the data unlogged
> and switch to logged later is a loser.  Sooner or later you have
> got to write the data to WAL, and this approach doesn't eliminate
> that cost.  What it does do is create one whole extra cycle of
> writing the data to disk and reading it back.  I don't think
> it's an oversight that no such thing is suggested in our standard
> tips for bulk-loading data:

Yeah, agreed. I was mostly responding to the OP desire to use unlogged 
and not taking a stance on that.

> https://www.postgresql.org/docs/current/populate.html
> 
> What perhaps *is* an oversight is that we don't suggest
> use of COPY FREEZE there.  AFAIK that doesn't reduce the initial
> data loading cost directly, but it would save overhead later.

Oh, yes, very good point.


-- 
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: alter table xxx set unlogged take long time

From
Kyotaro Horiguchi
Date:
At Tue, 26 Jul 2022 12:41:07 +0000, "James Pang (chaolpan)" <chaolpan@cisco.com> wrote in 
> How to make it fast ?  These are our steps about copy large data from Oracle to Postgres
> 
>   1.  Create table in Postgres  2. Extract data from Oracle to CSV  3. Alter table set xxx unlogged,   4. Run copy
commandinto Postgres db  5. Alter table set xxx logged  6. Create index …
 
> Step 5 took long time ,especially for large tables.

As others pointed, the step5 inevitably requires WAL emittion. On the
other hand, there is a proposed patch [1]. It lets ALTER TABLE SET
LOGGED/UNLOGGED evade duping the whole target table and could reduce
the amount of WAL to be emitted (caused by the difference of
tuple-based WAL and per-page WAL) (in major cases).

Could you try it and see if it works for you in any extent?

regards.

[1] https://commitfest.postgresql.org/38/3461/

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



RE: alter table xxx set unlogged take long time

From
"James Pang (chaolpan)"
Date:
Does  "wal_level=minimal" help reducing  wal emitting a lot for COPY and CREATE INDEX?   We plan to remove  "set
unlogged/log", instead , just set "wal_level=minimal" ,then COPY data in parallel, then create index.
 

   Thanks,

   James 
-----Original Message-----
From: Joe Conway <mail@joeconway.com> 
Sent: Wednesday, July 27, 2022 11:02 PM
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: James Pang (chaolpan) <chaolpan@cisco.com>; Jim Mlodgenski <jimmy76@gmail.com>;
pgsql-performance@lists.postgresql.org
Subject: Re: alter table xxx set unlogged take long time

On 7/27/22 10:46, Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> Then (completely untested) I *think* you could create the "partition" 
>> initially as a free standing unlogged table, load it, index it, 
>> switch to logged, and then attach it to the partitioned table.
> 
> I'm still of the opinion that this plan to load the data unlogged and 
> switch to logged later is a loser.  Sooner or later you have got to 
> write the data to WAL, and this approach doesn't eliminate that cost.  
> What it does do is create one whole extra cycle of writing the data to 
> disk and reading it back.  I don't think it's an oversight that no 
> such thing is suggested in our standard tips for bulk-loading data:

Yeah, agreed. I was mostly responding to the OP desire to use unlogged and not taking a stance on that.

> https://www.postgresql.org/docs/current/populate.html
> 
> What perhaps *is* an oversight is that we don't suggest use of COPY 
> FREEZE there.  AFAIK that doesn't reduce the initial data loading cost 
> directly, but it would save overhead later.

Oh, yes, very good point.


--
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: alter table xxx set unlogged take long time

From
Joe Conway
Date:
On 7/28/22 03:47, James Pang (chaolpan) wrote:
> Does  "wal_level=minimal" help reducing  wal emitting a lot for COPY
> and CREATE INDEX?   We plan to remove  "set unlogged/log" , instead ,
> just set "wal_level=minimal" ,then COPY data in parallel, then create
> index.

(Note - please don't top post on these lists)

Yes, wal_level = minimal is a big help in my experience if you can 
tolerate it.

Similarly synchronous_commit = off might help as long as you are 
prepared to reload some data in the event of a crash (which generally is 
true when bulk loading). As noted in the docs:

   This parameter can be changed at any time; the
   behavior for any one transaction is determined by
   the setting in effect when it commits. It is
   therefore possible, and useful, to have some
   transactions commit synchronously and others
   asynchronously.


-- 
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com