Thread: WAL, xl_heap_insert and tuple oid mystry

WAL, xl_heap_insert and tuple oid mystry

From
jagan
Date:
Hi,
Suppose I create a table as follows:

CREATE TABLE test2 (name TEXT, age INTEGER) WITH oids;

Now, for every tuple in this table is associated with a unique oid, which I can retrieve by:

SELECT oid, name, age FROM test2;

which works great. So far so good.

Now, if look at the corresponding WAL entry for any insert into this relation, it creates the following WAL entry which
Ican decode. 

XLogRecord ---> Followed by --> xl_heap_insert struct -->followed by --> xl_heap_header --> Followed by --> tuple data

If I use the macro  HeapTupleHeaderGetOid([xl_heap_header struct]) or equivalently if I test ([xl_heap_header
struct]->t_infomask& HEAP_HASOID), it tells me that the tuple oid is not stored with this record. 

Where is the OID of tuple stored in a WAL record of a tuple? If not with xl_heap_header, where is it stored? Is it
storedat all? 

Thanks for any responses.  
Jagan



Re: WAL, xl_heap_insert and tuple oid mystry

From
Heikki Linnakangas
Date:
On 11.04.2011 23:35, jagan wrote:
> Hi,
> Suppose I create a table as follows:
>
> CREATE TABLE test2 (name TEXT, age INTEGER) WITH oids;
>
> Now, for every tuple in this table is associated with a unique oid, which I can retrieve by:
>
> SELECT oid, name, age FROM test2;
>
> which works great. So far so good.
>
> Now, if look at the corresponding WAL entry for any insert into this relation, it creates the following WAL entry
whichI can decode.
 
>
> XLogRecord --->  Followed by -->  xl_heap_insert struct -->followed by -->  xl_heap_header -->  Followed by -->
tupledata
 
>
> If I use the macro  HeapTupleHeaderGetOid([xl_heap_header struct])

Yeah, that won't work because xl_heap_header is not a HeapTupleHeader.

> or equivalently if I test ([xl_heap_header struct]->t_infomask&  HEAP_HASOID), it tells me that the tuple oid is not
storedwith this record.
 

That's strange, the flag should be set.

> Where is the OID of tuple stored in a WAL record of a tuple? If not with xl_heap_header, where is it stored? Is it
storedat all?
 

It's stored in the "tuple data" portion. See the code that writes the 
WAL record, here:


http://git.postgresql.org/gitweb?p=postgresql.git;a=blob;f=src/backend/access/heap/heapam.c;h=1fbd8b39b4a73f6aee337fccf2dcce548d0850bb;hb=HEAD#l1987

Maybe the WAL record you're looking at is a full-page image? A record 
with a full-page image includes a verbatim copy of the page, and the 
individual tuple is omitted in that case.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: WAL, xl_heap_insert and tuple oid mystry

From
jagan
Date:
Hi,

>>  Where is the OID of tuple stored in a WAL record of a
>>  tuple? If not with xl_heap_header, where is it stored? Is it
>>  stored at all?
   > It's stored in the "tuple data" portion. 

Is the OID also recorded with xl_heap_delete record as well or just the xl_heaptid? From my reading of the code it is
notrecorded. 
 

In general, why is OID of a tuple relegated as just another tuple data, when it can replace xl_heaptid as a much more
stabletuple identifier. I understand that by recording tuples in terms of xl_heaptid (i.e., page/offset) means that the
dbcan recover quickly but isn't there some value to storing a much "higher level" information in the WAL logs? For
example,tuple manipulations, if recorded in terms of manipulation on tuple oids, can support applications such as
these:

http://blogs.enterprisedb.com/2011/03/03/more-musings-on-logical-replication/

Just a thought.

Thanks,
Jagan


Re: WAL, xl_heap_insert and tuple oid mystry

From
Heikki Linnakangas
Date:
On 12.04.2011 19:42, jagan wrote:
>>>   Where is the OID of tuple stored in a WAL record of a
>>>   tuple? If not with xl_heap_header, where is it stored? Is it
>>>   stored at all?
>
>      >  It's stored in the "tuple data" portion.
>
> Is the OID also recorded with xl_heap_delete record as well or just the xl_heaptid? From my reading of the code it is
notrecorded.
 

Correct, it's not.

> In general, why is OID of a tuple relegated as just another tuple data, when it can replace xl_heaptid as a much more
stabletuple identifier. I understand that by recording tuples in terms of xl_heaptid (i.e., page/offset) means that the
dbcan recover quickly but isn't there some value to storing a much "higher level" information in the WAL logs? For
example,tuple manipulations, if recorded in terms of manipulation on tuple oids, can support applications such as
these:
>
> http://blogs.enterprisedb.com/2011/03/03/more-musings-on-logical-replication/

Possibly. On the other hand, another common complaint is that PostgreSQL 
writes a lot of WAL, much more than other DBMSs. Adding more 
higher-level information to WAL records would make that issue even 
worse. At the moment we only include the minimum amount of information 
needed for recovery.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: WAL, xl_heap_insert and tuple oid mystry

From
Greg Stark
Date:
On Tue, Apr 12, 2011 at 7:55 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Maybe the WAL record you're looking at is a full-page image? A record with a
> full-page image includes a verbatim copy of the page, and the individual
> tuple is omitted in that case.
>

It is? I thought full page images were only in addition to, not
instead of any data in the record. Doesn't pg_lesslog depend on this?

-- 
greg


Re: WAL, xl_heap_insert and tuple oid mystry

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> On Tue, Apr 12, 2011 at 7:55 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Maybe the WAL record you're looking at is a full-page image? A record with a
>> full-page image includes a verbatim copy of the page, and the individual
>> tuple is omitted in that case.

> It is? I thought full page images were only in addition to, not
> instead of any data in the record.

No, that's not the case.

> Doesn't pg_lesslog depend on this?

One hopes not.
        regards, tom lane


Re: WAL, xl_heap_insert and tuple oid mystry

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> On 12.04.2011 19:42, jagan wrote:
>> In general, why is OID of a tuple relegated as just another tuple
>> data, when it can replace xl_heaptid as a much more stable tuple
>> identifier.

> Possibly. On the other hand, another common complaint is that PostgreSQL 
> writes a lot of WAL, much more than other DBMSs. Adding more 
> higher-level information to WAL records would make that issue even 
> worse.

Actually, the real problem with that proposal is that it assumes that
the OID (a) exists and (b) is unique, neither of which is guaranteed.
Especially not when you consider that WAL log entries have to correctly
distinguish which MVCC version of a tuple is being modified.
        regards, tom lane


Re: WAL, xl_heap_insert and tuple oid mystry

From
Shigeru Hanada
Date:
(2011/04/13 8:34), Tom Lane wrote:
> Greg Stark<gsstark@mit.edu>  writes:
>> Doesn't pg_lesslog depend on this?
> 
> One hopes not.

AFAIK it's safe because pg_lesslog removes full-page image only when it
has enough information for substitute incremental log.

For example of XLOG_HEAP_INSERT, pg_lesslog generates incremental logs
such as xl_heap_header and tuple image from the inserted tuple in the
full-page image, and replaces full-page image with them.

Regards,
-- 
Shigeru Hanada


Re: WAL, xl_heap_insert and tuple oid mystry

From
jagan
Date:
Hi,

> 
> > Where is the OID of tuple stored in a WAL record of a
> tuple? If not with xl_heap_header, where is it stored? Is it
> stored at all?
> 
> It's stored in the "tuple data" portion. 

I see it now. I was having alignment issues, which I resolved. Thanks for your help. I am still not sure if I
understandhow attributes are packed in to a buffer. In particular, according to the inline comments in code, this is
howa xl_heap_insert stores data:
 

Oid + [padding bits] + [data]

Now, to unpack the data, we would have to know what type of data is stored, which is obtained from pg_type. Different
typesrequire either a "no (char) align", "short align", "int aligned" or "double align". I went over the varattrib code
andI am also able to unpack varchar, text and other variable length types. 
 

The code I wrote works fine but runs into alignment issues with some of the records. So, I am still missing something
eventhough I have spent weeks reading postgres code.
 

I notice the following:

1. Oid starts at MAXALIGNed, which is fine.

2. The subsequent attribute starts right away. There never seems to be padding even if the subsequent tuple needs to be
doublealigned. Hence, I skip aligning the first attribute by default, which is bizarre but works.
 

3. When unpacking goes out of alignment, I print the buffer see if this is because they are externally stored. That is
notthe case as I can see text fields are actually stored in the buffer.
 

4. The alignment goes out of whack even for entries with no varattrib for certain tuples. 

In general, what is the best way to unpack buffers containing tuples? I came across "heap_form_tuple" but not sure if
thatis most optimal way to go about. Is there some documentation on how tuples are internally stored? 
 

Thanks in advance for any pointers.
Jagan



Re: WAL, xl_heap_insert and tuple oid mystry

From
Alvaro Herrera
Date:
Excerpts from jagan's message of jue abr 14 17:30:17 -0300 2011:

> In general, what is the best way to unpack buffers containing tuples?
> I came across "heap_form_tuple" but not sure if that is most optimal
> way to go about. Is there some documentation on how tuples are
> internally stored? 

Unpack buffers containing tuples?  You don't do that.  If you have a
page that comes from a table, you grab the tuple offset from the line
pointer, and start decoding there (probably heap_deform_tuple is what
you need; note you need a tuple descriptor).  If you have something that
comes from a WAL record, I don't think that's considered unpackable
directly -- you have to copy it elsewhere first.  I might be wrong about
this though.

What is the use case for this tool you're writing?

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: WAL, xl_heap_insert and tuple oid mystry

From
jagan
Date:
Hi,

> > In general, what is the best way to unpack buffers
> > containing tuples?
> Unpack buffers containing tuples?  You don't do
> that.  If you have a page that comes from a table, you grab the tuple offset
> from the line pointer, and start decoding there (probably
> heap_deform_tuple is what you need; note you need a tuple descriptor). 


Thanks for the pointer. I will try that route of recreating tuple
descriptors and using heap_deform_tuple, although I suspect that "heap_deform_tuple"
cannot be used by a stand-alone application (as it is not externally visible).
So, I will have to basically rewrite parts of it.
I am hoping to build something that is robust that makes use of as much
of the existing unpacking code as possible. Any additional
pointers you can give me would be much appreciated.


> What is the use case for this tool you're writing?

I am trying to build a WAL dumper and hoping to resurrect xlogdump along the way.

Jagan


Re: WAL, xl_heap_insert and tuple oid mystry

From
Greg Stark
Date:
On Fri, Apr 15, 2011 at 6:29 PM, jagan <jaganrvce@yahoo.com> wrote:
> Thanks for the pointer. I will try that route of recreating tuple
> descriptors and using heap_deform_tuple, although I suspect that "heap_deform_tuple"
> cannot be used by a stand-alone application (as it is not externally visible).
> So, I will have to basically rewrite parts of it.
> I am hoping to build something that is robust that makes use of as much
> of the existing unpacking code as possible. Any additional
> pointers you can give me would be much appreciated.
>

heap_deform_tuple isn't very complex itself, and doesn't have any
external dependencies aside from the macros in postgres.h. However it
*does* depend on having a tuple descriptor. This is the data structure
which says what columns to expect and what types they are. You
actually only need to know their sizes and whether they're variable
size, but that in the server that comes from their type.

The difficulty doing this from wal is that there's nowhere to get this
information. If you get it from the live database the structure may
not match the structure that was in place at the time the WAL was
written.

-- 
greg