Re: Reducing the overhead of NUMERIC data - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Reducing the overhead of NUMERIC data
Msg-id 1131025786.8300.1911.camel@localhost.localdomain
In response to Re: Reducing the overhead of NUMERIC data  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Reducing the overhead of NUMERIC data  (Martijn van Oosterhout <kleptog@svana.org>)
Re: Reducing the overhead of NUMERIC data  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-hackers
On Thu, 2005-11-03 at 08:27 +0000, Simon Riggs wrote:
> On Wed, 2005-11-02 at 19:12 -0500, Tom Lane wrote:
> > If we were willing to invent the "varlena2" datum format then we could
> > save four bytes per numeric, plus reduce numeric's alignment requirement
> > from int to short which would probably save another byte per value on
> > average.  I'm not sure that that's worth doing if numeric and inet are
> > the only beneficiaries, but it might be.
> 
> That and variations can be the next discussion. They sound good.

Kicking off the discussion on that...

The varlena2 datum format sounds interesting. If we did that, I'd also like
to apply the same idea to VARCHAR/CHAR(32000) and below, i.e. lengths small
enough for a 2-byte header. (The benefit of varlena2 is a saving of 2 bytes
plus ~1 byte of alignment, yes? The other two bytes come from the other
numeric savings discussed.)
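A quick way to sanity-check that arithmetic is a small Python sketch. It
assumes a value's starting offset is equally likely to fall at any position
modulo 4, and it uses a made-up 6-byte payload; both assumptions are mine,
not figures from the thread, and the helper names are hypothetical.

```python
def align_up(off: int, align: int) -> int:
    """Round off up to the next multiple of align (a power of two)."""
    return (off + align - 1) & ~(align - 1)


def stored_bytes(start: int, hdr: int, align: int, payload: int) -> int:
    """Bytes used by one value placed at offset start: padding + header + payload."""
    return align_up(start, align) - start + hdr + payload


def average_cost(hdr: int, align: int, payload: int) -> float:
    """Average cost over the four possible starting offsets modulo 4 (assumed equally likely)."""
    return sum(stored_bytes(s, hdr, align, payload) for s in range(4)) / 4


PAYLOAD = 6  # hypothetical numeric payload size in bytes

varlena = average_cost(hdr=4, align=4, payload=PAYLOAD)   # 4-byte header, int alignment
varlena2 = average_cost(hdr=2, align=2, payload=PAYLOAD)  # 2-byte header, short alignment
print(f"varlena {varlena:.2f}, varlena2 {varlena2:.2f}, saving {varlena - varlena2:.2f}")
```

Under those assumptions, padding averages 1.5 bytes with int alignment and
0.5 bytes with short alignment, so the shorter header plus the relaxed
alignment save about 3 bytes per value (2 + ~1).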

Alternatively, what I'd been thinking about was altering the self-contained
nature of PostgreSQL datatypes.

In other databases, CHAR(12) and NUMERIC(12) are fixed-length datatypes.
In PostgreSQL, they are variable-length datatypes.

What actually happens in many other systems is that the datatype stays the
same, but additional metadata is recorded for each particular attribute.
So CHAR(12) is the datatype CHAR plus a metadata item called length, which
is set to 12 for that attribute.

In PostgreSQL, CHAR(12) is a bpchar datatype, and every instance of that
datatype carries a 4-byte varlena header. In this example, every instance
has its header set to the same length (12 data bytes plus the 4 header
bytes themselves, assuming single-byte characters), so the 4-byte header
is essentially wasted.
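To make the redundancy concrete, here is a minimal Python sketch. It assumes
single-byte characters, ignores alignment, and models the classic 4-byte
varlena length word, which counts its own 4 bytes; the function names and
sample values are illustrative only.

```python
VARHDRSZ = 4  # the classic varlena length word is 4 bytes and counts itself


def length_word(value: bytes) -> int:
    """Length a 4-byte varlena header would record for this value."""
    return len(value) + VARHDRSZ


def column_bytes(values: list, per_row_header: bool) -> int:
    """Total bytes for a column's values, ignoring alignment padding."""
    return sum(len(v) + (VARHDRSZ if per_row_header else 0) for v in values)


# Blank-padded CHAR(12) values; single-byte characters assumed.
values = [b"ABCDEFGHIJKL", b"hello".ljust(12), b"x".ljust(12)] * 1000

print({length_word(v) for v in values})            # every row records the same length
print(column_bytes(values, per_row_header=True))   # 3000 rows * 16 bytes
print(column_bytes(values, per_row_header=False))  # 3000 rows * 12 bytes
```

Here the header adds a third on top of the 12 data bytes per row, and under
the single-byte assumption it records nothing the column definition does not
already imply.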

It would be interesting to allow the attribute metadata to be stored in
the TupleDesc, so that we store it once rather than once per row.
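One way to picture the idea is a toy row format in Python where a
hypothetical descriptor (standing in for the TupleDesc) records a fixed
length for CHAR(n) attributes, so only variable-length attributes carry a
per-row length word. The `Attr` class, the little-endian header, and the
absence of alignment are all simplifying assumptions; this is not
PostgreSQL's on-disk format.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attr:
    """Hypothetical per-attribute metadata, kept once in the descriptor."""
    name: str
    fixed_len: Optional[int]  # e.g. 12 for CHAR(12); None for variable length


def form_row(desc: list, values: list) -> bytes:
    out = bytearray()
    for attr, val in zip(desc, values):
        if attr.fixed_len is None:
            # Variable length: 4-byte length word that includes itself, varlena-style.
            out += struct.pack("<I", len(val) + 4) + val
        else:
            # Fixed length: blank-pad to n; the length lives only in the descriptor.
            if len(val) > attr.fixed_len:
                raise ValueError(f"value too long for {attr.name}")
            out += val.ljust(attr.fixed_len, b" ")
    return bytes(out)


def deform_row(desc: list, row: bytes) -> list:
    vals, off = [], 0
    for attr in desc:
        if attr.fixed_len is None:
            (total,) = struct.unpack_from("<I", row, off)
            vals.append(row[off + 4 : off + total])
            off += total
        else:
            vals.append(row[off : off + attr.fixed_len])
            off += attr.fixed_len
    return vals


desc = [Attr("code", 12), Attr("note", None)]  # CHAR(12), VARCHAR
row = form_row(desc, [b"ABC", b"hi"])
print(len(row))  # 12 data bytes + (4-byte header + 2 data bytes)
print(deform_row(desc, row))
```

The CHAR(12) value costs exactly 12 bytes per row, while the VARCHAR still
pays for its own length word.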

If we did this, we would need two datatypes where currently we need only
one. We would still need the variable-length char datatype VARCHAR, and we
would be inventing a new fixed-length char datatype, CHAR(n), whose length
n is held as metadata.

This would give us two things:
- a 4-byte reduction in the length of many attributes
- considerably faster attribute access for queries, sorts, etc., since
  more of the tuple offsets would be constant
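The offset argument can be sketched in Python: walking a hypothetical list of
attribute widths, an offset is knowable from the descriptor alone only until
the first variable-length attribute. Alignment is ignored, and the column
lists are made-up examples.

```python
from typing import Optional


def descriptor_offsets(lengths: list) -> list:
    """Offsets known from the descriptor alone; None means it must be computed per row.

    lengths holds each attribute's fixed width in bytes, or None if variable-length.
    Alignment padding is ignored in this sketch.
    """
    offsets = []
    off: Optional[int] = 0
    for n in lengths:
        offsets.append(off)
        if off is not None:
            off = None if n is None else off + n
    return offsets


# Columns: int4, CHAR(12), int8, VARCHAR, int4
today = [4, None, 8, None, 4]   # CHAR(12) stored as a varlena
proposed = [4, 12, 8, None, 4]  # CHAR(12) with its length in the descriptor

print(descriptor_offsets(today))     # [0, 4, None, None, None]
print(descriptor_offsets(proposed))  # [0, 4, 16, 24, None]
```

In this example, the number of attributes at a constant offset rises from two
to four, which is the effect the second point relies on.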

Anyway, I accept that many will say I clearly don't understand Object
Relational. It seems like this could be done without actually breaking
anything. The question is, how much work would it be?

Best Regards, Simon Riggs


