Thread: GSoC 2015 proposal: Improve the performance of “ALTER TABLE .. SET LOGGED / UNLOGGED” statement

Hi all,

Below I written my proposal idea to this GSoC.


*** Improve the performance of “ALTER TABLE .. SET LOGGED / UNLOGGED” statement ***

Last year during the GSoC2014 I implemented the feature to allow an unlogged table to be changed to logged [1], but the desing chosen was to rewrite the entire heap in a new relfilenode with a new relpersistence because some troubles pointed here [2].

The project was successfully finished and got committed [3] into PostgreSQL to be released this year in the 9.5 version.

However this design lead us to performance problems with large relations because we need to rewrite the entire content of the relation twice, one into a new heap and other into the WAL, so this project will change the current desing of the mechanism of change an unlogged table to logged without the need to rewrite the entire heap, but just by removing the init forks and if the wal_level != minimal we’ll write the contents to the WAL too.


** Benefits to the PostgreSQL Community **

The  “unlogged” tables feature was introduced by 9.1 version, and provide better write performance than regular tables (logged), but are not crash-safe. Their contents are automatically discarded (cleared) if the server crashes. Also, their contents do not propagate to standby servers.

We already have a way to change an unlogged table to logged using “ALTER TABLE name SET LOGGED” developed last year during the GSoC2014, now during the GSoC2015 we’ll redesing the internals to improve the I/O performance of this feature removing the need of rewrite the entire heap into a new relfilenode.

The are a good idea about the desing here [4], but I’ll discuss the design with my mentor to implement this improvement.


** Additional Goals **

The main goal of this project is improve the performance of the ALTER TABLE name SET {LOGGED|UNLOGGED}, but we can expand this propostal to more related goals.

* Allow unlogged materialized views
  . ALTER MATERIALIZED VIEW name SET { UNLOGGED | LOGGED }
* Allow unlogged indexes on logged tables
  . ALTER INDEX name SET { UNLOGGED | LOGGED }


** Deliverables **

This project has just one deliverable at the end. The deliverable will be the improvement of the routines that transform an “unlogged” table to “logged” and “logged” to “unlogged”, without the need to create a new “relfilenode” with a different “relpersistence”.


** Project Schedule **

until May 25:
* create a website to the project (wiki.postgresql.org)
* create a public repository to the project (github.com/fabriziomello)
* read what has already been discussed by the community about the project [4]
* learn about some PostgreSQL internals:
  . control data (src/include/catalog/pg_control.h)
  . storage (src/backend/storage/*)
* discuss the additional goals with community

May 26 - June 21
* implementation of the first prototype:
  . implement the change of unlogged table to logged without rewrite the entire heap when “wal_level = minimal”
  . at this point when “wal_level != minimal” we use the current implementation
* write documentation and the test cases
* submit this first prototype to the commitfest 2015/06 (https://commitfest.postgresql.org/5/)

June 22 - June 26
* mentor review the work in progress

June 27 - August 17
* do the adjustments based on the community feedback during the commitfest 2015/06
* implementation of the second prototype:
  . when “wal_level != minimal” we’ll remove the init fork (first prototype) and write relation pages to the WAL.
  . implement “ALTER MATERIALIZED VIEW .. SET LOGGED / UNLOGGED”
* submit to the commitfest 2015/09 for final evaluation and maybe will be committed to 9.6 version (webpage don’t created yet)

August 18 - August 21
* do the adjustments based on the community feedback during the commitfest 2015/09
* final mentor review


** About the proponent **

Fabrízio de Royes Mello

e-mail: fabriziomello@gmail.com
twitter: @fabriziomello
github: http://github.com/fabriziomello
linkedin: http://linkedin.com/in/fabriziomello

Currently I help people and teams to take the full potential of relational databases, especially PostgreSQL, helping teams to design the structure of the database (modeling), build physical architecture (database schema), programming (procedural languages), SQL (usage, tuning, best practices), optimization and orchestration of instances in production too. I perform a volunteer work for Brazilian Community of PostgreSQL (www.postgresql.org.br), supporting mailing lists, organizing events (pgbr.postgresql.org.br) and some admin tasks. And also I help a little the PostgreSQL Global Development Group (PGDG) in the implementation of some features and review of patches (git.postgresql.org).


Links

[1] https://wiki.postgresql.org/wiki/Allow_an_unlogged_table_to_be_changed_to_logged_GSoC_2014
[2] http://www.postgresql.org/message-id/CA+Tgmob44LNwwU73N1aJsGQyzQ61SdhKJRC_89wCm0+aLg=x2Q@mail.gmail.com
[3] http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f41872d0c1239d36ab03393c39ec0b70e9ee2a3c
[4] http://www.postgresql.org/message-id/CA+TgmoZM+-0R7h0eDPzZjbokVVQ+gAVKChmno4fypVEccW-EqA@mail.gmail.com

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
On Wed, Mar 25, 2015 at 9:46 PM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
> http://www.postgresql.org/message-id/CA+TgmoZM+-0R7h0eDPzZjbokVVQ+gAVKChmno4fypVEccW-EqA@mail.gmail.com

I like the idea of the feature a lot, but the proposal to which you
refer here mentions some problems, which I'm curious how you think you
might solve.  (I don't have any good ideas myself, beyond what I
mentioned there.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




On Thu, Apr 2, 2015 at 3:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Mar 25, 2015 at 9:46 PM, Fabrízio de Royes Mello
> <fabriziomello@gmail.com> wrote:
> > http://www.postgresql.org/message-id/CA+TgmoZM+-0R7h0eDPzZjbokVVQ+gAVKChmno4fypVEccW-EqA@mail.gmail.com
>
> I like the idea of the feature a lot, but the proposal to which you
> refer here mentions some problems, which I'm curious how you think you
> might solve.  (I don't have any good ideas myself, beyond what I
> mentioned there.)
>

How the cleanup will happens?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello

On Thu, Apr 2, 2015 at 3:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Mar 25, 2015 at 9:46 PM, Fabrízio de Royes Mello
> <fabriziomello@gmail.com> wrote:
> > http://www.postgresql.org/message-id/CA+TgmoZM+-0R7h0eDPzZjbokVVQ+gAVKChmno4fypVEccW-EqA@mail.gmail.com
>
> I like the idea of the feature a lot, but the proposal to which you
> refer here mentions some problems, which I'm curious how you think you
> might solve.  (I don't have any good ideas myself, beyond what I
> mentioned there.)
>

You're right, and we have another design (steps 1 and 2 was implemented last year):


*** ALTER TABLE changes

1) ATController
    1.1) Acquire an AccessExclusiveLock (src/backend/commands/tablecmds.c - AlterTableGetLockLevel:3023)


2) Prepare to change relpersistence (src/backend/commands/tablecmds.c - ATPrepCmd:3249-3270)
• check temp table (src/backend/commands/tablecmds.c - ATPrepChangePersistence:11074)
• check foreign key constraints (src/backend/commands/tablecmds.c - ATPrepChangePersistence:11102)


3) FlushRelationBuffers, DropRelFileNodeBuffers and smgrimmedsync (MAIN_FORKNUM, FSM_FORKNUM, VISIBILITYMAP_FORKNUM and INIT_FORKNUM if exists)


4) Create a new fork called  "TRANSIENT INIT FORK":
 
• from Unlogged to Logged  (create _initl fork) (INIT_TO_LOGGED_FORKNUM)
  ∘ new forkName (src/common/relpath.c) called "_initl"
  ∘ insert XLog record to drop it if transaction abort

• from Logged to Unlogged (create _initu fork) (INIT_TO_UNLOGGED_FORKUM)
  ∘ new forkName (src/common/relpath.c) called "_initu"
  ∘ insert XLog record to drop it if transaction abort


5) Change the relpersistence in catalog (pg_class->relpersistence) (heap, toast, indexes)

 
6) Remove/Create INIT_FORK

• from Unlogged to Logged
  ∘ remove the INIT_FORK and INIT_TO_LOGGED_FORK adding to the pendingDeletes queue
• from Logged to Unlogged
  ∘ remove the INIT_TO_UNLOGGED_FORK adding to the pendingDeletes queue
  ∘ create the INIT_FORK using "heap_create_init_fork"
  ∘ insert XLog record to drop init fork if the transaction abort



*** CRASH RECOVERY changes

1) During crash recovery (src/backend/access/transam/xlog.c:6507:ResetUnloggedRelations)

• if the transient fork "_initl" exists then
  ∘ drop the transient fork "_initl"
  ∘ if the init fork doesn't exist then create it
  ∘ reset relation
• if the transient fork "_initu" exists then
  ∘ drop the transient fork "_initl"
  ∘ if the init fork exists then drop it
  ∘ don't reset the relation


Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


On Fri, Jul 3, 2015 at 4:48 PM, Fabrízio de Royes Mello <fabriziomello@gmail.com> wrote:

On Thu, Apr 2, 2015 at 3:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Mar 25, 2015 at 9:46 PM, Fabrízio de Royes Mello
> <fabriziomello@gmail.com> wrote:
> > http://www.postgresql.org/message-id/CA+TgmoZM+-0R7h0eDPzZjbokVVQ+gAVKChmno4fypVEccW-EqA@mail.gmail.com
>
> I like the idea of the feature a lot, but the proposal to which you
> refer here mentions some problems, which I'm curious how you think you
> might solve.  (I don't have any good ideas myself, beyond what I
> mentioned there.)
>

You're right, and we have another design (steps 1 and 2 was implemented last year):


*** ALTER TABLE changes

1) ATController
    1.1) Acquire an AccessExclusiveLock (src/backend/commands/tablecmds.c - AlterTableGetLockLevel:3023)


2) Prepare to change relpersistence (src/backend/commands/tablecmds.c - ATPrepCmd:3249-3270)
• check temp table (src/backend/commands/tablecmds.c - ATPrepChangePersistence:11074)
• check foreign key constraints (src/backend/commands/tablecmds.c - ATPrepChangePersistence:11102)


3) FlushRelationBuffers, DropRelFileNodeBuffers and smgrimmedsync (MAIN_FORKNUM, FSM_FORKNUM, VISIBILITYMAP_FORKNUM and INIT_FORKNUM if exists)


4) Create a new fork called  "TRANSIENT INIT FORK":
 
• from Unlogged to Logged  (create _initl fork) (INIT_TO_LOGGED_FORKNUM)
  ∘ new forkName (src/common/relpath.c) called "_initl"
  ∘ insert XLog record to drop it if transaction abort

• from Logged to Unlogged (create _initu fork) (INIT_TO_UNLOGGED_FORKUM)
  ∘ new forkName (src/common/relpath.c) called "_initu"
  ∘ insert XLog record to drop it if transaction abort

AFAIU, while reading WAL, the server doesn't know whether the transaction that produced that WAL record aborted or committed. It's only when it sees a COMMIT/ABORT record down the line, it can confirm whether the transaction committed or aborted. So, one can only "redo" the things that WAL tells have been done. We can not "undo" things based on the WAL. We might record this fact "somewhere" while reading the WAL and act on it once we know the status of the transaction, but I do not see that as part of this idea. This comment applies to all the steps inserting WALs for undoing things.
 


5) Change the relpersistence in catalog (pg_class->relpersistence) (heap, toast, indexes)

 
6) Remove/Create INIT_FORK

• from Unlogged to Logged
  ∘ remove the INIT_FORK and INIT_TO_LOGGED_FORK adding to the pendingDeletes queue
• from Logged to Unlogged
  ∘ remove the INIT_TO_UNLOGGED_FORK adding to the pendingDeletes queue
  ∘ create the INIT_FORK using "heap_create_init_fork"
  ∘ insert XLog record to drop init fork if the transaction abort



*** CRASH RECOVERY changes

1) During crash recovery (src/backend/access/transam/xlog.c:6507:ResetUnloggedRelations)


This operation is carried out in two phases: one before replaying WAL records and second after that. Are you sure that the WALs generated for the unlogged or logged forks, as described above, have been taken care of?
 
• if the transient fork "_initl" exists then
  ∘ drop the transient fork "_initl"
  ∘ if the init fork doesn't exist then create it
  ∘ reset relation
• if the transient fork "_initu" exists then
  ∘ drop the transient fork "_initl"
  ∘ if the init fork exists then drop it
  ∘ don't reset the relation


Consider case of converting unlogged to logged. The server crashes after 6th step when both the forks are removed. During crash recovery, it will not see any of the fork and won't reset the unlogged table. Remember the table is still unlogged since the transaction was aborted due to the crash. So, it has to have an init fork to be reset on crash recovery.

Similarly while converting from logged to unlogged. The server crashes in the 6th step after creating the INIT_FORK, during crash recovery the init fork will be seen and a supposedly logged table will be trashed.

The ideas in 1 and 2 might be better than having yet another init fork.


1. http://www.postgresql.org/message-id/533D457A.4030007@vmware.com

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello



--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

On Fri, Jul 3, 2015 at 9:29 AM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
>
>
> On Fri, Jul 3, 2015 at 4:48 PM, Fabrízio de Royes Mello <fabriziomello@gmail.com> wrote:
>>
>>
>> On Thu, Apr 2, 2015 at 3:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> >
>> > On Wed, Mar 25, 2015 at 9:46 PM, Fabrízio de Royes Mello
>> > <fabriziomello@gmail.com> wrote:
>> > > http://www.postgresql.org/message-id/CA+TgmoZM+-0R7h0eDPzZjbokVVQ+gAVKChmno4fypVEccW-EqA@mail.gmail.com
>> >
>> > I like the idea of the feature a lot, but the proposal to which you
>> > refer here mentions some problems, which I'm curious how you think you
>> > might solve.  (I don't have any good ideas myself, beyond what I
>> > mentioned there.)
>> >
>>
>> You're right, and we have another design (steps 1 and 2 was implemented last year):
>>
>>
>> *** ALTER TABLE changes
>>
>> 1) ATController
>>     1.1) Acquire an AccessExclusiveLock (src/backend/commands/tablecmds.c - AlterTableGetLockLevel:3023)
>>
>>
>> 2) Prepare to change relpersistence (src/backend/commands/tablecmds.c - ATPrepCmd:3249-3270)
>> • check temp table (src/backend/commands/tablecmds.c - ATPrepChangePersistence:11074)
>> • check foreign key constraints (src/backend/commands/tablecmds.c - ATPrepChangePersistence:11102)
>>
>>
>> 3) FlushRelationBuffers, DropRelFileNodeBuffers and smgrimmedsync (MAIN_FORKNUM, FSM_FORKNUM, VISIBILITYMAP_FORKNUM and INIT_FORKNUM if exists)
>>
>>
>> 4) Create a new fork called  "TRANSIENT INIT FORK":
>>  
>> • from Unlogged to Logged  (create _initl fork) (INIT_TO_LOGGED_FORKNUM)
>>   ∘ new forkName (src/common/relpath.c) called "_initl"
>>   ∘ insert XLog record to drop it if transaction abort
>>
>> • from Logged to Unlogged (create _initu fork) (INIT_TO_UNLOGGED_FORKUM)
>>   ∘ new forkName (src/common/relpath.c) called "_initu"
>>   ∘ insert XLog record to drop it if transaction abort
>
>
> AFAIU, while reading WAL, the server doesn't know whether the transaction that produced that WAL record aborted or committed. It's only when it sees a COMMIT/ABORT record down the line, it can confirm whether the transaction committed or aborted. So, one can only "redo" the things that WAL tells have been done. We can not "undo" things based on the WAL. We might record this fact "somewhere" while reading the WAL and act on it once we know the status of the transaction, but I do not see that as part of this idea. This comment applies to all the steps inserting WALs for undoing things.
>  
>

Even if I "xlog" the create/drop of the transient fork?



>> 5) Change the relpersistence in catalog (pg_class->relpersistence) (heap, toast, indexes)
>>
>>  
>> 6) Remove/Create INIT_FORK
>>
>> • from Unlogged to Logged
>>   ∘ remove the INIT_FORK and INIT_TO_LOGGED_FORK adding to the pendingDeletes queue
>> • from Logged to Unlogged
>>   ∘ remove the INIT_TO_UNLOGGED_FORK adding to the pendingDeletes queue
>>   ∘ create the INIT_FORK using "heap_create_init_fork"
>>   ∘ insert XLog record to drop init fork if the transaction abort
>>
>>
>>
>> *** CRASH RECOVERY changes
>>
>> 1) During crash recovery (src/backend/access/transam/xlog.c:6507:ResetUnloggedRelations)
>>
>
> This operation is carried out in two phases: one before replaying WAL records and second after that. Are you sure that the WALs generated for the unlogged or logged forks, as described above, have been taken care of?
>  

Hummm... actually no...



>>
>> • if the transient fork "_initl" exists then
>>   ∘ drop the transient fork "_initl"
>>   ∘ if the init fork doesn't exist then create it
>>   ∘ reset relation
>> • if the transient fork "_initu" exists then
>>   ∘ drop the transient fork "_initl"
>>   ∘ if the init fork exists then drop it
>>   ∘ don't reset the relation
>>
>
> Consider case of converting unlogged to logged. The server crashes after 6th step when both the forks are removed. During crash recovery, it will not see any of the fork and won't reset the unlogged table. Remember the table is still unlogged since the transaction was aborted due to the crash. So, it has to have an init fork to be reset on crash recovery.
>
> Similarly while converting from logged to unlogged. The server crashes in the 6th step after creating the INIT_FORK, during crash recovery the init fork will be seen and a supposedly logged table will be trashed.
>


My intention for the 6th step is all recorded to wal, so if a crash occurs the recovery process clean the mess.


> The ideas in 1 and 2 might be better than having yet another init fork.
>

The link for idea 2 is missing...

Unfortunately I completely missed this idea, but is also interesting. But I'll need to "stamp" in both ways (from unlogged to logged and vice-versa)

During the recovery to check the status of a transaction can I use src/backend/access/transam/clog.c:TransactionIdGetStatus ? I'm asking because I don't know this part of code to much.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


On Wed, Jul 8, 2015 at 1:27 AM, Fabrízio de Royes Mello <fabriziomello@gmail.com> wrote:

On Fri, Jul 3, 2015 at 9:29 AM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
>
>
> On Fri, Jul 3, 2015 at 4:48 PM, Fabrízio de Royes Mello <fabriziomello@gmail.com> wrote:
>>
>>
>> On Thu, Apr 2, 2015 at 3:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> >
>> > On Wed, Mar 25, 2015 at 9:46 PM, Fabrízio de Royes Mello
>> > <fabriziomello@gmail.com> wrote:
>> > > http://www.postgresql.org/message-id/CA+TgmoZM+-0R7h0eDPzZjbokVVQ+gAVKChmno4fypVEccW-EqA@mail.gmail.com
>> >
>> > I like the idea of the feature a lot, but the proposal to which you
>> > refer here mentions some problems, which I'm curious how you think you
>> > might solve.  (I don't have any good ideas myself, beyond what I
>> > mentioned there.)
>> >
>>
>> You're right, and we have another design (steps 1 and 2 was implemented last year):
>>
>>
>> *** ALTER TABLE changes
>>
>> 1) ATController
>>     1.1) Acquire an AccessExclusiveLock (src/backend/commands/tablecmds.c - AlterTableGetLockLevel:3023)
>>
>>
>> 2) Prepare to change relpersistence (src/backend/commands/tablecmds.c - ATPrepCmd:3249-3270)
>> • check temp table (src/backend/commands/tablecmds.c - ATPrepChangePersistence:11074)
>> • check foreign key constraints (src/backend/commands/tablecmds.c - ATPrepChangePersistence:11102)
>>
>>
>> 3) FlushRelationBuffers, DropRelFileNodeBuffers and smgrimmedsync (MAIN_FORKNUM, FSM_FORKNUM, VISIBILITYMAP_FORKNUM and INIT_FORKNUM if exists)
>>
>>
>> 4) Create a new fork called  "TRANSIENT INIT FORK":
>>  
>> • from Unlogged to Logged  (create _initl fork) (INIT_TO_LOGGED_FORKNUM)
>>   ∘ new forkName (src/common/relpath.c) called "_initl"
>>   ∘ insert XLog record to drop it if transaction abort
>>
>> • from Logged to Unlogged (create _initu fork) (INIT_TO_UNLOGGED_FORKUM)
>>   ∘ new forkName (src/common/relpath.c) called "_initu"
>>   ∘ insert XLog record to drop it if transaction abort
>
>
> AFAIU, while reading WAL, the server doesn't know whether the transaction that produced that WAL record aborted or committed. It's only when it sees a COMMIT/ABORT record down the line, it can confirm whether the transaction committed or aborted. So, one can only "redo" the things that WAL tells have been done. We can not "undo" things based on the WAL. We might record this fact "somewhere" while reading the WAL and act on it once we know the status of the transaction, but I do not see that as part of this idea. This comment applies to all the steps inserting WALs for undoing things.
>  
>

Even if I "xlog" the create/drop of the transient fork?


Yes.
 


>> 5) Change the relpersistence in catalog (pg_class->relpersistence) (heap, toast, indexes)
>>
>>  
>> 6) Remove/Create INIT_FORK
>>
>> • from Unlogged to Logged
>>   ∘ remove the INIT_FORK and INIT_TO_LOGGED_FORK adding to the pendingDeletes queue
>> • from Logged to Unlogged
>>   ∘ remove the INIT_TO_UNLOGGED_FORK adding to the pendingDeletes queue
>>   ∘ create the INIT_FORK using "heap_create_init_fork"
>>   ∘ insert XLog record to drop init fork if the transaction abort
>>
>>
>>
>> *** CRASH RECOVERY changes
>>
>> 1) During crash recovery (src/backend/access/transam/xlog.c:6507:ResetUnloggedRelations)
>>
>
> This operation is carried out in two phases: one before replaying WAL records and second after that. Are you sure that the WALs generated for the unlogged or logged forks, as described above, have been taken care of?
>  

Hummm... actually no...



>>
>> • if the transient fork "_initl" exists then
>>   ∘ drop the transient fork "_initl"
>>   ∘ if the init fork doesn't exist then create it
>>   ∘ reset relation
>> • if the transient fork "_initu" exists then
>>   ∘ drop the transient fork "_initl"
>>   ∘ if the init fork exists then drop it
>>   ∘ don't reset the relation
>>
>
> Consider case of converting unlogged to logged. The server crashes after 6th step when both the forks are removed. During crash recovery, it will not see any of the fork and won't reset the unlogged table. Remember the table is still unlogged since the transaction was aborted due to the crash. So, it has to have an init fork to be reset on crash recovery.
>
> Similarly while converting from logged to unlogged. The server crashes in the 6th step after creating the INIT_FORK, during crash recovery the init fork will be seen and a supposedly logged table will be trashed.
>


My intention for the 6th step is all recorded to wal, so if a crash occurs the recovery process clean the mess.


AFAIU, PostgreSQL recovery is based on "redo"ing WAL. What you described earlier, "undo"ing based on the WAL does not fit in the current framework.
 

> The ideas in 1 and 2 might be better than having yet another init fork.
>

The link for idea 2 is missing...

Sorry for that. 2nd idea is what Robert described using control file.

Unfortunately I completely missed this idea, but is also interesting. But I'll need to "stamp" in both ways (from unlogged to logged and vice-versa)

During the recovery to check the status of a transaction can I use src/backend/access/transam/clog.c:TransactionIdGetStatus ? I'm asking because I don't know this part of code to much.

AFAIU, not unless all WALs (or atleast WALs till the commit/abort record of the transaction in question) are replayed.
 
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

On Wed, Jul 8, 2015 at 3:20 AM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
>
>>
>> My intention for the 6th step is all recorded to wal, so if a crash occurs the recovery process clean the mess.
>>
>
> AFAIU, PostgreSQL recovery is based on "redo"ing WAL. What you described earlier, "undo"ing based on the WAL does not fit in the current framework.


Understood!


>>
>> During the recovery to check the status of a transaction can I use src/backend/access/transam/clog.c:TransactionIdGetStatus ? I'm asking because I don't know this part of code to much.
>
>
> AFAIU, not unless all WALs (or atleast WALs till the commit/abort record of the transaction in question) are replayed.
>  

So we'll need to execute this procedure after replay, then the "ResetUnloggedTables" should be executed in two phases:

1) Before the replay checking just the datafiles with the init fork (_init)

2) After the replay checking the datafiles with the stamped init fork ( _init_XID,  or something else)

Is this reasonable?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
On 2015-07-03 08:18:09 -0300, Fabrízio de Royes Mello wrote:
> *** ALTER TABLE changes
> ...

Without going into the specifics of this change: Are we actually sure
this feature warrants the complexity it'll introduce? I'm really rather
doubtful.




On Wed, Jul 8, 2015 at 10:53 AM, Andres Freund <andres@anarazel.de> wrote:
>
> On 2015-07-03 08:18:09 -0300, Fabrízio de Royes Mello wrote:
> > *** ALTER TABLE changes
> > ...
>
> Without going into the specifics of this change: Are we actually sure
> this feature warrants the complexity it'll introduce? I'm really rather
> doubtful.

Think in an ETL job that can be use an unlogged table to improve the load performance, but this job create a "large table" and to guarantee the data consistency you need to transform it into a regular table, and with the current implementation rewrite the entire heap, toast and indexes.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
On 2015-07-08 10:58:51 -0300, Fabrízio de Royes Mello wrote:
> Think in an ETL job that can be use an unlogged table to improve the load
> performance, but this job create a "large table" and to guarantee the data
> consistency you need to transform it into a regular table, and with the
> current implementation rewrite the entire heap, toast and indexes.

Don't buy that. The final target relation will usually already have
content. Also everything but wal_level=minimal will force you to WAL log
the contents anyway.




On Wed, Jul 8, 2015 at 11:37 AM, Andres Freund <andres@anarazel.de> wrote:
>
> On 2015-07-08 10:58:51 -0300, Fabrízio de Royes Mello wrote:
> > Think in an ETL job that can be use an unlogged table to improve the load
> > performance, but this job create a "large table" and to guarantee the data
> > consistency you need to transform it into a regular table, and with the
> > current implementation rewrite the entire heap, toast and indexes.
>
> Don't buy that. The final target relation will usually already have
> content. Also everything but wal_level=minimal will force you to WAL log
> the contents anyway.

If the "wal_level=minimal" we don't need to force the wal log of the contents. If the "wal_level != minimal" we need just to xlog all the pages, but in both cases we don't need the extra job to create a new datafiles and copy the contents between them. So we'll improve performance, or am I missing something?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
On 2015-07-09 10:39:35 -0300, Fabrízio de Royes Mello wrote:
> If the "wal_level=minimal" we don't need to force the wal log of the
> contents. If the "wal_level != minimal" we need just to xlog all the pages,
> but in both cases we don't need the extra job to create a new datafiles and
> copy the contents between them. So we'll improve performance, or am I
> missing something?

Sure. It'll be a bit faster. I just don't see the peformance increase in
not that common situations being worth the price we'll pay in
development, code review, debugging and then maintaining some nontrivial
code. If this were likely to be a 15 line patch, I'd think differently.




On Thu, Jul 9, 2015 at 10:56 AM, Andres Freund <andres@anarazel.de> wrote:
>
> On 2015-07-09 10:39:35 -0300, Fabrízio de Royes Mello wrote:
> > If the "wal_level=minimal" we don't need to force the wal log of the
> > contents. If the "wal_level != minimal" we need just to xlog all the pages,
> > but in both cases we don't need the extra job to create a new datafiles and
> > copy the contents between them. So we'll improve performance, or am I
> > missing something?
>
> Sure. It'll be a bit faster. I just don't see the peformance increase in
> not that common situations being worth the price we'll pay in
> development, code review, debugging and then maintaining some nontrivial
> code. If this were likely to be a 15 line patch, I'd think differently.

Well, this will not be a 15 line patch, but I'll try to simplify as much as possible.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
Andres Freund <andres@anarazel.de> writes:
> On 2015-07-09 10:39:35 -0300, Fabr�zio de Royes Mello wrote:
>> If the "wal_level=minimal" we don't need to force the wal log of the
>> contents. If the "wal_level != minimal" we need just to xlog all the pages,
>> but in both cases we don't need the extra job to create a new datafiles and
>> copy the contents between them. So we'll improve performance, or am I
>> missing something?

> Sure. It'll be a bit faster. I just don't see the peformance increase in
> not that common situations being worth the price we'll pay in
> development, code review, debugging and then maintaining some nontrivial
> code. If this were likely to be a 15 line patch, I'd think differently.

I'm even more worried about the possible reliability problems (ie
introduction of bugs, which are likely to be of the data-eating variety)
that such changes are likely to incur.  So I tend to agree with Andres
that this is probably not a good idea.

            regards, tom lane