Thread: WAL, xl_heap_insert and tuple oid mystry
Hi, Suppose I create a table as follows: CREATE TABLE test2 (name TEXT, age INTEGER) WITH oids; Now, for every tuple in this table is associated with a unique oid, which I can retrieve by: SELECT oid, name, age FROM test2; which works great. So far so good. Now, if look at the corresponding WAL entry for any insert into this relation, it creates the following WAL entry which Ican decode. XLogRecord ---> Followed by --> xl_heap_insert struct -->followed by --> xl_heap_header --> Followed by --> tuple data If I use the macro HeapTupleHeaderGetOid([xl_heap_header struct]) or equivalently if I test ([xl_heap_header struct]->t_infomask& HEAP_HASOID), it tells me that the tuple oid is not stored with this record. Where is the OID of tuple stored in a WAL record of a tuple? If not with xl_heap_header, where is it stored? Is it storedat all? Thanks for any responses. Jagan
On 11.04.2011 23:35, jagan wrote: > Hi, > Suppose I create a table as follows: > > CREATE TABLE test2 (name TEXT, age INTEGER) WITH oids; > > Now, for every tuple in this table is associated with a unique oid, which I can retrieve by: > > SELECT oid, name, age FROM test2; > > which works great. So far so good. > > Now, if look at the corresponding WAL entry for any insert into this relation, it creates the following WAL entry whichI can decode. > > XLogRecord ---> Followed by --> xl_heap_insert struct -->followed by --> xl_heap_header --> Followed by --> tupledata > > If I use the macro HeapTupleHeaderGetOid([xl_heap_header struct]) Yeah, that won't work because xl_heap_header is not a HeapTupleHeader. > or equivalently if I test ([xl_heap_header struct]->t_infomask& HEAP_HASOID), it tells me that the tuple oid is not storedwith this record. That's strange, the flag should be set. > Where is the OID of tuple stored in a WAL record of a tuple? If not with xl_heap_header, where is it stored? Is it storedat all? It's stored in the "tuple data" portion. See the code that writes the WAL record, here: http://git.postgresql.org/gitweb?p=postgresql.git;a=blob;f=src/backend/access/heap/heapam.c;h=1fbd8b39b4a73f6aee337fccf2dcce548d0850bb;hb=HEAD#l1987 Maybe the WAL record you're looking at is a full-page image? A record with a full-page image includes a verbatim copy of the page, and the individual tuple is omitted in that case. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Hi, >> Where is the OID of tuple stored in a WAL record of a >> tuple? If not with xl_heap_header, where is it stored? Is it >> stored at all? > It's stored in the "tuple data" portion. Is the OID also recorded with xl_heap_delete record as well or just the xl_heaptid? From my reading of the code it is notrecorded. In general, why is OID of a tuple relegated as just another tuple data, when it can replace xl_heaptid as a much more stabletuple identifier. I understand that by recording tuples in terms of xl_heaptid (i.e., page/offset) means that the dbcan recover quickly but isn't there some value to storing a much "higher level" information in the WAL logs? For example,tuple manipulations, if recorded in terms of manipulation on tuple oids, can support applications such as these: http://blogs.enterprisedb.com/2011/03/03/more-musings-on-logical-replication/ Just a thought. Thanks, Jagan
On 12.04.2011 19:42, jagan wrote: >>> Where is the OID of tuple stored in a WAL record of a >>> tuple? If not with xl_heap_header, where is it stored? Is it >>> stored at all? > > > It's stored in the "tuple data" portion. > > Is the OID also recorded with xl_heap_delete record as well or just the xl_heaptid? From my reading of the code it is notrecorded. Correct, it's not. > In general, why is OID of a tuple relegated as just another tuple data, when it can replace xl_heaptid as a much more stabletuple identifier. I understand that by recording tuples in terms of xl_heaptid (i.e., page/offset) means that the dbcan recover quickly but isn't there some value to storing a much "higher level" information in the WAL logs? For example,tuple manipulations, if recorded in terms of manipulation on tuple oids, can support applications such as these: > > http://blogs.enterprisedb.com/2011/03/03/more-musings-on-logical-replication/ Possibly. On the other hand, another common complaint is that PostgreSQL writes a lot of WAL, much more than other DBMSs. Adding more higher-level information to WAL records would make that issue even worse. At the moment we only include the minimum amount of information needed for recovery. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, Apr 12, 2011 at 7:55 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Maybe the WAL record you're looking at is a full-page image? A record with a > full-page image includes a verbatim copy of the page, and the individual > tuple is omitted in that case. > It is? I thought full page images were only in addition to, not instead of any data in the record. Doesn't pg_lesslog depend on this? -- greg
Greg Stark <gsstark@mit.edu> writes: > On Tue, Apr 12, 2011 at 7:55 AM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> Maybe the WAL record you're looking at is a full-page image? A record with a >> full-page image includes a verbatim copy of the page, and the individual >> tuple is omitted in that case. > It is? I thought full page images were only in addition to, not > instead of any data in the record. No, that's not the case. > Doesn't pg_lesslog depend on this? One hopes not. regards, tom lane
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > On 12.04.2011 19:42, jagan wrote: >> In general, why is OID of a tuple relegated as just another tuple >> data, when it can replace xl_heaptid as a much more stable tuple >> identifier. > Possibly. On the other hand, another common complaint is that PostgreSQL > writes a lot of WAL, much more than other DBMSs. Adding more > higher-level information to WAL records would make that issue even > worse. Actually, the real problem with that proposal is that it assumes that the OID (a) exists and (b) is unique, neither of which is guaranteed. Especially not when you consider that WAL log entries have to correctly distinguish which MVCC version of a tuple is being modified. regards, tom lane
(2011/04/13 8:34), Tom Lane wrote: > Greg Stark<gsstark@mit.edu> writes: >> Doesn't pg_lesslog depend on this? > > One hopes not. AFAIK it's safe because pg_lesslog removes full-page image only when it has enough information for substitute incremental log. For example of XLOG_HEAP_INSERT, pg_lesslog generates incremental logs such as xl_heap_header and tuple image from the inserted tuple in the full-page image, and replaces full-page image with them. Regards, -- Shigeru Hanada
Hi, > > > Where is the OID of tuple stored in a WAL record of a > tuple? If not with xl_heap_header, where is it stored? Is it > stored at all? > > It's stored in the "tuple data" portion. I see it now. I was having alignment issues, which I resolved. Thanks for your help. I am still not sure if I understandhow attributes are packed in to a buffer. In particular, according to the inline comments in code, this is howa xl_heap_insert stores data: Oid + [padding bits] + [data] Now, to unpack the data, we would have to know what type of data is stored, which is obtained from pg_type. Different typesrequire either a "no (char) align", "short align", "int aligned" or "double align". I went over the varattrib code andI am also able to unpack varchar, text and other variable length types. The code I wrote works fine but runs into alignment issues with some of the records. So, I am still missing something eventhough I have spent weeks reading postgres code. I notice the following: 1. Oid starts at MAXALIGNed, which is fine. 2. The subsequent attribute starts right away. There never seems to be padding even if the subsequent tuple needs to be doublealigned. Hence, I skip aligning the first attribute by default, which is bizarre but works. 3. When unpacking goes out of alignment, I print the buffer see if this is because they are externally stored. That is notthe case as I can see text fields are actually stored in the buffer. 4. The alignment goes out of whack even for entries with no varattrib for certain tuples. In general, what is the best way to unpack buffers containing tuples? I came across "heap_form_tuple" but not sure if thatis most optimal way to go about. Is there some documentation on how tuples are internally stored? Thanks in advance for any pointers. Jagan
Excerpts from jagan's message of jue abr 14 17:30:17 -0300 2011: > In general, what is the best way to unpack buffers containing tuples? > I came across "heap_form_tuple" but not sure if that is most optimal > way to go about. Is there some documentation on how tuples are > internally stored? Unpack buffers containing tuples? You don't do that. If you have a page that comes from a table, you grab the tuple offset from the line pointer, and start decoding there (probably heap_deform_tuple is what you need; note you need a tuple descriptor). If you have something that comes from a WAL record, I don't think that's considered unpackable directly -- you have to copy it elsewhere first. I might be wrong about this though. What is the use case for this tool you're writing? -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Hi, > > In general, what is the best way to unpack buffers > > containing tuples? > Unpack buffers containing tuples? You don't do > that. If you have a page that comes from a table, you grab the tuple offset > from the line pointer, and start decoding there (probably > heap_deform_tuple is what you need; note you need a tuple descriptor). Thanks for the pointer. I will try that route of recreating tuple descriptors and using heap_deform_tuple, although I suspect that "heap_deform_tuple" cannot be used by a stand-alone application (as it is not externally visible). So, I will have to basically rewrite parts of it. I am hoping to build something that is robust that makes use of as much of the existing unpacking code as possible. Any additional pointers you can give me would be much appreciated. > What is the use case for this tool you're writing? I am trying to build a WAL dumper and hoping to resurrect xlogdump along the way. Jagan
On Fri, Apr 15, 2011 at 6:29 PM, jagan <jaganrvce@yahoo.com> wrote: > Thanks for the pointer. I will try that route of recreating tuple > descriptors and using heap_deform_tuple, although I suspect that "heap_deform_tuple" > cannot be used by a stand-alone application (as it is not externally visible). > So, I will have to basically rewrite parts of it. > I am hoping to build something that is robust that makes use of as much > of the existing unpacking code as possible. Any additional > pointers you can give me would be much appreciated. > heap_deform_tuple isn't very complex itself, and doesn't have any external dependencies aside from the macros in postgres.h. However it *does* depend on having a tuple descriptor. This is the data structure which says what columns to expect and what types they are. You actually only need to know their sizes and whether they're variable size, but that in the server that comes from their type. The difficulty doing this from wal is that there's nowhere to get this information. If you get it from the live database the structure may not match the structure that was in place at the time the WAL was written. -- greg