Re: [HACKERS] Last thoughts about LONG - Mailing list pgsql-hackers

From wieck@debis.com (Jan Wieck)
Subject Re: [HACKERS] Last thoughts about LONG
Date
Msg-id m11wrHL-0003kGC@orion.SAPserv.Hamburg.dsh.de
In response to Re: [HACKERS] Last thoughts about LONG  (Peter Eisentraut <e99re41@DoCS.UU.SE>)
List pgsql-hackers
Peter Eisentraut wrote:

> Before I start the nagging, please be aware that I'm not as smart as I
> think I am. Long datatypes of some sort are clearly necessary -- more
> power to you.

    So be it. It forces me to think it over again and points to
    sections I might have forgotten so far. Also, it has happened
    to me more than once that writing a totally OBVIOUS answer
    triggered a better solution in my brain (dunno what's wrong
    with that brain, but sometimes it needs to be shaken well
    before use). Thus, any of your notes can help, and that
    counts!

>
> >     - A  new  field "rellongrelid" type Oid is added to pg_class.
> >       It contains the Oid  of  the  long-value  relation  or  the
> >       invalid Oid for those who have no LONG attributes.
>
> I have a mixed feeling about all these "sparse" fields everywhere. Doing
> it completely formally, this seems to be a one-to-many relation, so you
> should put the referencing field into the pg_long table or whatever
> structure you use, pointing the other way around. This is probably slower,
> but it's cleaner. As I mentioned earlier, this whole arrangement will
> (hopefully) not be needed for all too long, and then we wouldn't want to
> be stuck with it.

    It's 4 bytes per RELATION in pg_class. As a side effect, the
    information is available at NO COST immediately after
    heap_open(), in every place where a relation is accessed. So
    it is the best place to put it.

>
> >     - At    CREATE   TABLE,   a   long   value   relation   named
> >       "_LONG<tablename>" is created for those tables who need it.
>
> Please don't forget, this would require changes to pg_dump and psql. Also,
> the COPY command might not be able to get away without changes, either.

    Oh yes, thanks. That was a point I forgot!

    Psql must not list tables that begin with "_LONG" on  the  \d
    request.  Anything else should IMHO be transparent.

    Pg_dump either uses a SELECT to build a script that INSERTs
    the data via SQL, or uses COPY. In the SELECT/INSERT case, my
    implementation would again be totally transparent and not
    noticed by pg_dump; it must only IGNORE "_LONG*" relations
    and be aware that really big tuples can be sent, but that's
    more a libpq question I think (which I already checked,
    because the view/rule/PL combo I created to demonstrate a
    >128K tuple was done through psql). AFAIK, pg_dump doesn't
    use a binary COPY, and looking at the code tells me that this
    is transparent too (due to the use of type-specific
    input/output functions there).

    All  pg_dump would have to do is to ignore "_LONG*" relations
    too.

    The real problem is COPY. In the case of  a  COPY  BINARY  it
    outputs  the data portion of the fetched tuples directly. But
    these will only contain the LongData headers,  not  the  data
    itself.

    So at that point, COPY has to do the reverse process of
    heap_insert(): rebuild a faked tuple where all the non-NULL
    LONG values are placed in the representation they would have
    after type input. Not a big deal; it must only be done with
    the same care as the changes in heapam, so as not to leave
    unfreed, leaked memory around.

> In general, it wouldn't surprise me if some sections of the code would go
> nuts about the news of tuples longer than BLCKSZ coming along. (Where
> "nuts" is either 'truncation' or 'segfault'.)

    The place where the size of a heap tuple is checked (and
    where the "tuple size too big" message comes from) is in
    hio.c, right before the tuple is copied into the block. Up to
    that point, a tuple is NOT explicitly limited to any size.

    So I would be glad to see crashes coming up from this change
    (during BETA of course - not after release). It would help us
    get another existing bug out of the code.

> I guess what I'm really saying is that I'd be totally in awe of you if you
> could get all of this (and RI) done by Feb 1st. Good luck.

    Thanks for the flowers, but "awe" is far too much - sorry.

    Over the years I have had my hands on nearly every part of
    the code involved in this, so I'm not a newbie at creating
    data types, utility commands or doing syscat changes. The
    LONG type I described will be the work of two or three
    nights.

    I already intended to tackle the long tuples next. What was
    missing was the idea how to AVOID them simply. And I had this
    idea just while answering a question about storing big text
    files in the database on the [SQL] list - that woke me up.

    In contrast to the RI stuff, this time I don't expect any
    bugs, because there are absolutely no side effects I have
    noticed so far. On the RI stuff, we discussed for weeks (if
    not months) about tuple visibility during concurrent
    transactions, and I finally ran into exactly these problems
    anyway.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
