Re: On-disk Tuple Size - Mailing list pgsql-hackers

From Tom Lane
Subject Re: On-disk Tuple Size
Msg-id 14279.1019416241@sss.pgh.pa.us
In response to Re: On-disk Tuple Size  (Curt Sampson <cjs@cynic.net>)
Responses Re: On-disk Tuple Size  (Curt Sampson <cjs@cynic.net>)
List pgsql-hackers
Curt Sampson <cjs@cynic.net> writes:
> Yes, this uses a bit more CPU, but I think it's going to be a pretty
> trivial amount. It's a short list, and since you're touching the data
> anyway, it's going to be in the CPU cache. The real cost you'll pay is
> in the time to access the area of memory where you're storing the sorted
> list of line pointers. But the potential saving here is up to 5% in I/O
> costs (due to using less disk space).

At this point you're essentially arguing that it's faster to recompute
the list of item sizes than it is to read it off disk.  Given that the
recomputation would require sorting the list of item locations (with
up to a couple hundred entries --- more than that if blocksize > 8K)
I'm not convinced of that.

Another difficulty is that we'd lose the ability to record item sizes
to the exact byte.  What we'd reconstruct from the item locations are
sizes rounded up to the next MAXALIGN boundary.  I am not sure that
this is a problem, but I'm not sure it's not either.

The part of this idea that I actually like is overlapping the status
bits with the low order part of the item location, using the assumption
that MAXALIGN is at least 4.  That would allow us to support BLCKSZ up
to 64K, and probably save a cycle or two in fetching/storing the item
fields as well.  The larger BLCKSZ limit isn't nearly as desirable
as it used to be, because of TOAST, and in fact it could be a net loser
because of increased WAL traffic.  But it'd be interesting to try it
and see.
        regards, tom lane

