Re: [HACKERS] Last thoughts about LONG - Mailing list pgsql-hackers
From | wieck@debis.com (Jan Wieck) |
---|---|
Subject | Re: [HACKERS] Last thoughts about LONG |
Date | |
Msg-id | m11wrHL-0003kGC@orion.SAPserv.Hamburg.dsh.de Whole thread Raw |
In response to | Re: [HACKERS] Last thoughts about LONG (Peter Eisentraut <e99re41@DoCS.UU.SE>) |
List | pgsql-hackers |
Peter Eisentraut wrote: > Before I start the nagging, please be aware that I'm not as smart as I > think I am. Long datatypes of some sort are clearly necessary -- more > power to you. So be it. It forces me to think it over again and points to sections, I might have forgotten so far. Also, it happend more than one time to me, that writing a totally OBVIOUS answer triggerd a better solution in my brain (dunno what's wrong with that brain, but sometimes it needs to be shaken well before use). Thus, any of your notes can help, and that counts! > > > - A new field "rellongrelid" type Oid is added to pg_class. > > It contains the Oid of the long-value relation or the > > invalid Oid for those who have no LONG attributes. > > I have a mixed feeling about all these "sparse" fields everywhere. Doing > it completely formally, this seems to be a one-to-many relation, so you > should put the referencing field into the pg_long table or whatever > structure you use, pointing the other way around. This is probably slower, > but it's cleaner. As I mentioned earlier, this whole arrangement will > (hopefully) not be needed for all too long, and then we wouldn't want to > be stuck with it. It's 4 bytes per RELATION in pg_class. As a side effect, the information will be available at NO COST immediately after heap_open() and in every place, where a relation is accessed. So it is the best place to put it. > > > - At CREATE TABLE, a long value relation named > > "_LONG<tablename>" is created for those tables who need it. > > Please don't forget, this would require changes to pg_dump and psql. Also, > the COPY command might not be able to get away without changes, either. Oh yes, thanks. That was a point I forgot! Psql must not list tables that begin with "_LONG" on the \d request. Anything else should IMHO be transparent. Pg_dump either uses a SELECT to build a script that INSERT's the data via SQL, or uses COPY. In the SELECT/INSERT case, my implementation would again be totally transparent and not noticed by pg_dump, only that it must IGNORE "_LONG*" relations and be aware that really big tuples can be sent, but that's more a libpq question I think (what I already checked because the view/rule/PL combo I created to demonstrate a >128K tuple was done through psql). AFAIK, pg_dump doesn't use a binary COPY, and looking at the code tells me that this is transparent too (due to use of type specific input/output function there). All pg_dump would have to do is to ignore "_LONG*" relations too. The real problem is COPY. In the case of a COPY BINARY it outputs the data portion of the fetched tuples directly. But these will only contain the LongData headers, not the data itself. So at that point, COPY has to do the reverse process of heap_insert(). Rebuild a faked tuple where all the not NULL LONG values are placed in the representation, they would have after type input. Not a big deal, must only be done with the same care as the changes in heapam not to leave unfreed, leaked memory around. > In general, it wouldn't surprise me if some sections of the code would go > nuts about the news of tuples longer than BLCKSZ coming along. (Where > "nuts" is either 'truncation' or 'segfault'.) The place, where the size of a heap tuple only is checked (and where the "tuple size too big" message is coming from) is in hio.c, right before it is copied into the block. Up to then, a tuple is NOT explicitly limited to any size. So I would be glad to see crashes coming up from this change (not after release - during BETA of course). It would help us to get another existing bug out of the code. > I guess what I'm really saying is that I'd be totally in awe of you if you > could get all of this (and RI) done by Feb 1st. Good luck. Thank's for the flowers, but "awe" is far too much - sorry. During the years I had my hands on nearly every part of the code involved in this. So I'm not a newbe in creating data types, utility commands or doing syscat changes. The LONG type I described will be the work of two or three nights. I already intended to tackle the long tuples next. Missing was the idea how to AVOID it simply. And I had this idea just while answering a question about storing big text files in the database in the [SQL] list - that woke me up. In contrast to the RI stuff, this time I don't expect any bugs, because there are absolutely no side effects I noticed so far. On the RI stuff, we discussed for weeks (if not months) about tuple visibility during concurrent transactions and I finally ran into exactly these problems anyway. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) #
pgsql-hackers by date: