Thread: pg_upgrade: convert on read is dead end

pg_upgrade: convert on read is dead end

From
Zdenek Kotala
Date:
We discussed many times what on-line storage upgrade approach is best. Two of 
them were still in the game. One is "read all formats, write new only" and 
second is "convert on read". The first one is general and complex solution which 
needs lot of changes in the code. Second one is not so intrusive, but it has 
another problems.

The main problem with second solution until yesterday was that data could 
overlap a new page size. Second much bigger problem which I found yesterday is 
TOAST table and TOASTed arrays and composite data types.

PostgreSQL stores all table's external data in one TOAST table and stored data 
does not contains any clue about datatype. When postgreSQL reads TOSTEed value 
then there is not possible detect what type it is and perform varlena conversion 
on composite datatypes or arrays.

It could be converted in detoast_datum function but it means that datum have to 
be retoasted and store back on new pages. The old page MUST keep in old format 
because any page conversion lost information about version and different 
datatypes can be store on one page PosgreSQL.

By my opinion, this issue completely kill convert on read solution and only 
"read all formats..." solution is right one.
Comments, ideas?
    Zdenek

-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql



Re: pg_upgrade: convert on read is dead end

From
Heikki Linnakangas
Date:
Zdenek Kotala wrote:
> PostgreSQL stores all table's external data in one TOAST table and 
> stored data does not contains any clue about datatype. When postgreSQL 
> reads TOSTEed value then there is not possible detect what type it is 
> and perform varlena conversion on composite datatypes or arrays.
> 
> It could be converted in detoast_datum function but it means that datum 
> have to be retoasted and store back on new pages. The old page MUST keep 
> in old format because any page conversion lost information about version 
> and different datatypes can be store on one page PosgreSQL.

Hmm. There's two things here:
1. The format of the toast page, and the toast chunk tuples.
2. The format of the toasted tuple, stored within the chunk tuples.

The format of the toast page and the varlena headers of the byteas in 
the toast tuples could be converted when the page is read in, as usual. 
The format of the toasted tuple within the chunks is indeed trickier. 
Retoasting all tuples on the "master" heap page when the page is read in 
is probably simplest, as you say. But they don't necessarily need to be 
stored on new pages, any toast pages will do, because we keep track of 
1. separately.

> By my opinion, this issue completely kill convert on read solution and 
> only "read all formats..." solution is right one.

It is quite heavy-weight, I agree, but doesn't "completely kill" the 
idea in my opinion.

Besides, the "read all formats" approach wouldn't really avoid it 
either. If you passed a toasted datum to a function, when the function 
needs to detoast it, detoast_datum still wouldn't know whether the datum 
is in old or new format. You'd still need to detoast all values in the 
tuple somewhere before they could be passed around.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: pg_upgrade: convert on read is dead end

From
Zdenek Kotala
Date:
Heikki Linnakangas napsal(a):
> Zdenek Kotala wrote:
>> PostgreSQL stores all table's external data in one TOAST table and 
>> stored data does not contains any clue about datatype. When postgreSQL 
>> reads TOSTEed value then there is not possible detect what type it is 
>> and perform varlena conversion on composite datatypes or arrays.
>>
>> It could be converted in detoast_datum function but it means that 
>> datum have to be retoasted and store back on new pages. The old page 
>> MUST keep in old format because any page conversion lost information 
>> about version and different datatypes can be store on one page PosgreSQL.
> 
> Hmm. There's two things here:
> 1. The format of the toast page, and the toast chunk tuples.
> 2. The format of the toasted tuple, stored within the chunk tuples.
> 
> The format of the toast page and the varlena headers of the byteas in 
> the toast tuples could be converted when the page is read in, as usual. 

Convert page and chunk varlena header is not problem. It works. You can get into 
trouble when new max chunk size is smaller, but it is not current case.

> The format of the toasted tuple within the chunks is indeed trickier. 
> Retoasting all tuples on the "master" heap page when the page is read in 
> is probably simplest, as you say. But they don't necessarily need to be 
> stored on new pages, any toast pages will do, because we keep track of 
> 1. separately.

You can store it on any page in the new format with enough free space, but in 
reality it will be mostly new page. You cannot convert old page, because it 
could contain chunk from different tuple. You could check it and converted it 
only if all chunk are related to one datum, but in general you need to have 
mechanism how solve problem with multi datum chunks.


>> By my opinion, this issue completely kill convert on read solution and 
>> only "read all formats..." solution is right one.
> 
> It is quite heavy-weight, I agree, but doesn't "completely kill" the 
> idea in my opinion.
> 
> Besides, the "read all formats" approach wouldn't really avoid it 
> either. If you passed a toasted datum to a function, when the function 
> needs to detoast it, detoast_datum still wouldn't know whether the datum 
> is in old or new format. You'd still need to detoast all values in the 
> tuple somewhere before they could be passed around.

If you look into pg_upgrade prototype patch I added page version information 
into HeapTupleData structure. It keeps information about format. All chunked 
data are stored on a pages with same page versions. I think these two things are 
enough to have all necessary information.
    Zdenek


-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql



Re: pg_upgrade: convert on read is dead end

From
Heikki Linnakangas
Date:
Zdenek Kotala wrote:
> Heikki Linnakangas napsal(a):
>> Besides, the "read all formats" approach wouldn't really avoid it 
>> either. If you passed a toasted datum to a function, when the function 
>> needs to detoast it, detoast_datum still wouldn't know whether the 
>> datum is in old or new format. You'd still need to detoast all values 
>> in the tuple somewhere before they could be passed around.
> 
> If you look into pg_upgrade prototype patch I added page version 
> information into HeapTupleData structure. It keeps information about 
> format. All chunked data are stored on a pages with same page versions. 
> I think these two things are enough to have all necessary information.

Functions are passed just a Datum, not HeapTupleData.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: pg_upgrade: convert on read is dead end

From
Zdenek Kotala
Date:
Heikki Linnakangas napsal(a):
> Zdenek Kotala wrote:
>> Heikki Linnakangas napsal(a):
>>> Besides, the "read all formats" approach wouldn't really avoid it 
>>> either. If you passed a toasted datum to a function, when the 
>>> function needs to detoast it, detoast_datum still wouldn't know 
>>> whether the datum is in old or new format. You'd still need to 
>>> detoast all values in the tuple somewhere before they could be passed 
>>> around.
>>
>> If you look into pg_upgrade prototype patch I added page version 
>> information into HeapTupleData structure. It keeps information about 
>> format. All chunked data are stored on a pages with same page 
>> versions. I think these two things are enough to have all necessary 
>> information.
> 
> Functions are passed just a Datum, not HeapTupleData.
> 

But executor works with HeapTupleData structure and it should convert it to the 
new format before it passes it to function. My idea is to convert tuple in 
ExecTupleStore or invent new node special for tuple conversion.

I expect that function cannot get toast pointer. It can get only detoasted 
attributes. Correct me if I'm wrong.        Zdenek

-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql



Re: pg_upgrade: convert on read is dead end

From
Heikki Linnakangas
Date:
Zdenek Kotala wrote:
> I expect that function cannot get toast pointer. It can get only 
> detoasted attributes. Correct me if I'm wrong.

A function is passed a raw datum, which can be toasted. Now, practically 
all functions call one of the PG_GETARG_* macros that detoast the datum 
before doing anything else, but there is also PG_GETARG_RAW_VARLENA_P 
macro that returns the toast pointer. Although, grepping through the 
source code suggests that it's not actually used anywhere.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com