Re: [HACKERS] LONG - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [HACKERS] LONG
Date
Msg-id 199912112325.SAA13157@candle.pha.pa.us
In response to Re: [HACKERS] LONG  (wieck@debis.com (Jan Wieck))
List pgsql-hackers
> Bruce Momjian wrote:
> 
> > > While this is great and all, what will happen when long tuples finally get
> > > done? Will you remove this, or keep it, or just make LONG and TEXT
> > > equivalent? I fear that elaborate structures will be put in place here
> > > that might perhaps only be of use for one release cycle.
> >
> > I think the idea is that Jan's idea is better than chaining tuples.
> 
>     Just as Tom already pointed out, it cannot completely replace
>     tuple chaining because of the atomicity assumption of a single
>     fsync(2) operation in the current code. Because of this, we cannot
>     get around the cases LONG will leave open by simply raising
>     BLKSIZE; we still need to tackle that anyway.

Actually, in looking at the fsync() system call, it does flush
everything for the file descriptor before the transaction is marked as
complete, so there is no hard reason not to raise the block size.  But
because the OS has to do two reads to get 16k, I think we are better off
keeping 8k as our base block size.

Jan's idea is not to chain tuples, but to keep tuples at 8k, and instead
chain out individual fields into 8k tuple chunks, as needed.  This seems
like it makes much more sense.  It uses the database to recreate the
chains.
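
To make the chunking concrete, here is a rough sketch in C of how one
long field value could be cut into 8k pieces.  The names
(SKETCH_CHUNK_SIZE, sketch_chunk_count, sketch_chunk_range) are made up
for illustration, and it assumes each chunk can hold a full 8192 bytes
of payload, ignoring any tuple header overhead.  It is not Jan's code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: every chunk tuple carries a full 8k of field data. */
#define SKETCH_CHUNK_SIZE ((size_t) 8192)

/* Number of chunks needed to hold a field of field_len bytes. */
static size_t sketch_chunk_count(size_t field_len)
{
    return (field_len + SKETCH_CHUNK_SIZE - 1) / SKETCH_CHUNK_SIZE;
}

/* Byte range covered by chunk number seq (0-based) of the field. */
static void sketch_chunk_range(size_t field_len, size_t seq,
                               size_t *offset, size_t *length)
{
    size_t remaining;

    assert(seq < sketch_chunk_count(field_len));
    *offset = seq * SKETCH_CHUNK_SIZE;
    remaining = field_len - *offset;
    /* Every chunk is full except possibly the last one. */
    *length = remaining < SKETCH_CHUNK_SIZE ? remaining : SKETCH_CHUNK_SIZE;
}
```

Each (seq, offset, length) triple would become one row in the long
relation, so the database itself can put the pieces back in order.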

Let me mention a few things.  First, I would like to avoid a LONG data
type if possible.  Seems a new data type is just going to make things
more confusing for users.

My idea is a much more limited one than Jan's.  It is to have a special
-1 varlena length when the data is chained on the long relation.  I
would do:

-1|oid|attno

in 12 bytes.  That way, you can pass this around as long as you want,
and just expand it in the varlena textout and compare routines when you
need the value.  That prevents the tuples from changing size while being
processed.  As far as I remember, there is no need to see the data in
the tuple except in the type comparison/output routines.
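
As a rough sketch only, the 12-byte -1|oid|attno reference could look
like this in C.  The struct and function names are hypothetical, and the
field widths (three 4-byte fields) are my assumption to make the total
come out to 12 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The special length value that marks "data is in the long relation". */
#define SKETCH_LONG_MARKER ((int32_t) -1)

/* Hypothetical layout of the -1|oid|attno reference. */
typedef struct SketchLongRef
{
    int32_t  vl_len;   /* always SKETCH_LONG_MARKER */
    uint32_t oid;      /* the oid part of the reference */
    int32_t  attno;    /* attribute number of the chained field */
} SketchLongRef;

_Static_assert(sizeof(SketchLongRef) == 12, "reference must fit in 12 bytes");

/* Build a reference that can be passed around in place of the value. */
static SketchLongRef sketch_long_ref_make(uint32_t oid, int32_t attno)
{
    SketchLongRef ref;

    ref.vl_len = SKETCH_LONG_MARKER;
    ref.oid = oid;
    ref.attno = attno;
    return ref;
}

/* Check the leading length word of a datum for the -1 marker. */
static int sketch_is_long_ref(const void *datum)
{
    int32_t len;

    assert(datum != NULL);
    memcpy(&len, datum, sizeof len);
    return len == SKETCH_LONG_MARKER;
}
```

Only the output and comparison routines would call something like
sketch_is_long_ref() and then fetch the real value; everything else just
copies the fixed 12 bytes.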

Now it would be nice if we could set the varlena length to 12, its
actual length, and then just somehow know that the varlena of 12 was a
long data entry.  Our current varlena has a maximum length of 64k.  I
wonder if we should grab a high bit of that to trigger long.  I think we
may be able to do that, and just do an AND mask to remove the bit to see
the length.  We don't need the high bit because our varlenas can't be
over 32k.  We can modify VARSIZE to strip it off, and make another
macro like ISLONG to check for that high bit.
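
A minimal sketch of the high-bit trick, assuming a 16-bit length word to
match the 64k figure above.  The SKETCH_ names are made up; they only
mirror what a modified VARSIZE and a new ISLONG might do and are not the
real macros from the source tree.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the length word is 16 bits wide (64k maximum). */
typedef uint16_t sketch_len_t;

#define SKETCH_LONG_FLAG ((sketch_len_t) 0x8000)  /* the high bit */
#define SKETCH_LEN_MASK  ((sketch_len_t) 0x7FFF)  /* lengths below 32k */

/* Like a modified VARSIZE: strip the flag bit to get the length. */
#define SKETCH_VARSIZE(len) ((sketch_len_t) ((len) & SKETCH_LEN_MASK))

/* Like ISLONG: true when the high bit marks a long data entry. */
#define SKETCH_ISLONG(len)  (((len) & SKETCH_LONG_FLAG) != 0)

/* Set the flag on a length word whose real size fits below 32k. */
static sketch_len_t sketch_mark_long(sketch_len_t size)
{
    assert(size <= SKETCH_LEN_MASK);
    return (sketch_len_t) (size | SKETCH_LONG_FLAG);
}
```

With this, a long reference keeps its true length of 12 in the low bits,
and code that only asks for the size never notices the flag.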

Seems this could be done with little code.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 

