Re: WIP: Pg_upgrade - page layout converter (PLC) hook - Mailing list pgsql-hackers

From Zdenek Kotala
Subject Re: WIP: Pg_upgrade - page layout converter (PLC) hook
Date
Msg-id 480503CB.10407@sun.com
Whole thread Raw
In response to Re: WIP: Pg_upgrade - page layout converter (PLC) hook  (Heikki Linnakangas <heikki@enterprisedb.com>)
Responses Re: WIP: Pg_upgrade - page layout converter (PLC) hook  (Heikki Linnakangas <heikki@enterprisedb.com>)
List pgsql-hackers
Thanks for your comment. See my comments inline:

Heikki Linnakangas napsal(a):
> Zdenek Kotala wrote:
>> I attached patch which implemented page layout converter (PLC) hook. It
>> is base stone for in-place upgrade.
> 
> I agree it's an important piece of the puzzle, but not the most 
> complicated one by far. How will you deal with catalog changes for 
> example? I'm going to assume that you'll use pg_migrator for that, and 
> this page layout conversion is just the final step of the migration and 
> all the other things like the catalog, clog, config file etc. have 
> already been converted.

I'm sorry I forgot to link to my original proposal. 
http://archives.postgresql.org/pgsql-hackers/2007-07/msg00057.php

I agree is that it is easiest part, but it is base for my work and upgrade in 
place itself. Final state should have everything inside postgres include catalog 
conversion. New version should startup on old cluster and converted data "on the 
fly".

For testing I'm currently use my own ksh script which is similar to pg_migrator, 
because I need database with upgraded catalog. And you cannot upgrade catalog 
internally until you are not able to convert page layout. egg chicken problem

> I would suggest starting with putting some serious effort into 
> pg_migrator. This page layout conversion is actually pretty trivial 
> compared to all that, and even more so if you can get the page layout 
> conversion working in pg_migrator first.

By my opinion pg_migrator is good workaround but it is not suitable for real 
production. For example TOAST relid protection is dirty hack and you have 
problem with tablespaces and so on...

> Which versions do you plan to support, initially?

Currently I'm able to upgrade from 8.1, 8.2 to 8.3, 8.4. It means conversion 
between layout version 3 and 4. I'm verifying PLC now and when this part will be 
ready it is very easy to implement others as well.

>> How it works:
>>
>> When PLC module is loaded, then for each page which does not have 
>> native page version conversion routine is called. Buffer is mark as a 
>> dirty and upgraded page is inserted into WAL.
> 
> I would suggest delegating the WAL logging to the PLC module. In some 
> cases it's going to be just a matter of changing

I rejected this idea. PLC function should be used separately. Put WAL logging 
inside makes stronger dependency by my opinion.

>> Performance:
>>
>> I executed "select count(*) from table" on 2,2GB table (4671039 rows) 
>> (without any tunning) and with conversion 2033s (34min) and after 
>> conversion and server restart 31s (0,5min).
> 
> Wow, that's a big slowdown. Where's the time spent? Is it the extra I/O 
> of rewriting the table?

Not idea yet. I going to test it. But I guess that it is I/O problem. Each page 
is written twice.

>> Request for comments:
>>
>> 1) I not sure if calling log_newpage is correct.
>>
>>   a) Calling from storage something in access method seems to me as 
>> bad think. I'm thinking to move log_newpage to storage, but it invokes 
>> more question about placement, RM ...
> 
> Yeah, it probably should be moved somewhere else. We already use it not 
> only for heapam, but for indexes as well.

But question where is good place? Because it is related to whole page I suggest  to put it in smgr.c and under SGMR
RM.

>>   b) log_newpage is used for new page logging, but I use it for 
>> storing converted page. It seems to me that it safe and 
>> heap_xlog_newpage correctly works for new and converted page. I have 
>> only doubt about assert macro mdextend/mdwrite which checks extend 
>> vs.write.
> 
> That should be fine. In WAL replay, we don't distinguish between write 
> and extend.

But in this case the asserts macros in mdexted/mdwrite are odd and the should be 
removed.

>> 3) data structures version tracking
>>
>> For PLC I need to have old version of data structures like page 
>> header, tuple header and so on. It is also useful for external tools 
>> to handle more version of postgresql easily (e.g. pg_control should 
>> show data from all supported postgresql versions).
>>
>> My idea is to have for each structure version keep own header e.g. 
>> bufpage_03.h, bufpage_04.h ... which will contain typedef struct 
>> PageHeaderData_03 ... and generic bufpage.h  with following content:
>>
>> ...
>> #include "bufpage_04.h"
>> ...
>> typedef PageHeaderData_04 PageHeaderData;
>>
>> #define PageGetPageSize(page) PageGetPageSize_04(page)
>> ...
> 
> That's not enough, in general. There might be changes in other header 
> files that affect the page layout. Like the packed varlen change, which 
> affected c.h.

Yes, I know, tuple headers and so on. It is only example.

>> + /* Hook for page layout convertor plugin */
>> + typedef void (*plc_hook_type)(char *buffer);
>> + extern PGDLLIMPORT plc_hook_type plc_hook;
> 
> That's not flexible enough. We've changed the internal representations 
> of data types in the past, and will likely do that in the future. The 
> hook therefore needs to have at least the tuple descriptor, to know how 
> to convert the tuples. I would suggest passing Relation, we have that 
> available at the call site, and that should contain all the necessary 
> information.
> 

Good point, I thought about it. I currently don't use it but in the future it 
could be useful. I will extend it.
    thank Zdenek



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Re: [COMMITTERS] pgsql: Add pg_terminate_backend() to allow terminating only a single
Next
From: Decibel!
Date:
Subject: Re: Commit fest status