Re: [HACKERS] LONG - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: [HACKERS] LONG |
Date | |
Msg-id | 199912112325.SAA13157@candle.pha.pa.us |
In response to | Re: [HACKERS] LONG (wieck@debis.com (Jan Wieck)) |
List | pgsql-hackers |
> Bruce Momjian wrote:
>
> > > While this is great and all, what will happen when long tuples finally get
> > > done? Will you remove this, or keep it, or just make LONG and TEXT
> > > equivalent? I fear that elaborate structures will be put in place here
> > > that might perhaps only be of use for one release cycle.
> >
> > I think the idea is that Jan's idea is better than chaining tuples.
>
> Just as Tom already pointed out, it cannot completely replace
> tuple chaining because of the atomicy assumption of single
> fsync(2) operation in current code. Due to this, we cannot
> get around the cases LONG will leave open by simply raising
> BLKSIZE, we instead need to tackle that anyways.

Actually, looking at the fsync() system call, it does write out the
entire file descriptor before the transaction is marked complete, so
there is no hard reason not to raise BLKSIZE. But because the OS has to
do two reads to get 16k, I think we are better off keeping 8k as our
base block size.

Jan's idea is not to chain tuples, but to keep tuples at 8k, and instead
chain out individual fields into 8k tuple chunks, as needed. This seems
like it makes much more sense. It uses the database to recreate the
chains.

Let me mention a few things. First, I would like to avoid a LONG data
type if possible. Seems a new data type is just going to make things
more confusing for users.

My idea is a much more limited one than Jan's. It is to have a special
-1 varlena length when the data is chained on the long relation. I
would do:

	-1|oid|attno

in 12 bytes. That way, you can pass this around as long as you want, and
just expand it in the varlena textout and compare routines when you need
the value. That prevents the tuples from changing size while being
processed. As far as I remember, there is no need to see the data in
the tuple except in the type comparison/output routines.
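A minimal sketch of the -1|oid|attno reference described above, in C. The struct name LongRef, the field widths, and the helper names are assumptions made for illustration; they are not PostgreSQL's actual definitions. The sketch only shows that a fixed 12-byte value can stand in for the long datum and be recognized by its -1 length word.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical marker: a length word of -1 means "stored in the long relation". */
#define LONG_MARKER (-1)

/*
 * Hypothetical 12-byte reference: three 4-byte fields, which pack without
 * padding on common platforms.
 */
typedef struct LongRef
{
    int32_t  len;    /* always LONG_MARKER for a chained value */
    uint32_t oid;    /* identifies the row that owns the value */
    int32_t  attno;  /* attribute number within that row */
} LongRef;

/* Build a reference; the tuple carries this instead of the data itself. */
static LongRef
make_long_ref(uint32_t oid, int32_t attno)
{
    LongRef ref;

    ref.len = LONG_MARKER;
    ref.oid = oid;
    ref.attno = attno;
    return ref;
}

/* Inspect only the leading length word to decide whether a datum is chained. */
static int
is_long_ref(const void *datum)
{
    int32_t len;

    memcpy(&len, datum, sizeof(len));
    return len == LONG_MARKER;
}
```

Under these assumptions, only the output and comparison routines would call something like is_long_ref() and fetch the chunks; everything else passes the 12 bytes around unchanged, so the tuple never changes size.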
Now it would be nice if we could set the varlena length to 12, its
actual length, and then just somehow know that the varlena of 12 was a
long data entry. Our current varlena has a maximum length of 64k. I
wonder if we should grab a high bit of that to trigger long. I think we
may be able to do that, and just do an AND mask to remove the bit to see
the length. We don't need the high bit because our varlenas can't be
over 32k. We can modify VARSIZE to strip it off, and make another macro
like ISLONG to check for that high bit.

Seems this could be done with little code.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
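A minimal sketch of the high-bit idea from the message above, in C. It assumes the length fits in a 16-bit word, as the 64k and 32k figures in the message suggest; the real header layout may differ. The SKETCH_ macro names are hypothetical stand-ins for a modified VARSIZE and a new ISLONG, not existing PostgreSQL macros.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: bit 15 flags a long entry, bits 0-14 hold the length. */
#define SKETCH_LONG_FLAG  0x8000u
#define SKETCH_LEN_MASK   0x7FFFu

/* Stand-in for a modified VARSIZE: strip the flag with an AND mask. */
#define SKETCH_VARSIZE(w)  ((uint16_t) ((w) & SKETCH_LEN_MASK))

/* Stand-in for ISLONG: test whether the high bit is set. */
#define SKETCH_ISLONG(w)   (((w) & SKETCH_LONG_FLAG) != 0)

/* Mark a length word as referring to a long entry. */
#define SKETCH_SETLONG(len) ((uint16_t) ((len) | SKETCH_LONG_FLAG))
```

With this layout a long reference can carry its true length of 12 while still being distinguishable from an ordinary 12-byte value: SKETCH_SETLONG(12) yields a word for which SKETCH_ISLONG is true and SKETCH_VARSIZE is 12.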