Re: Reducing the overhead of NUMERIC data - Mailing list pgsql-hackers

From	Simon Riggs
Subject	Re: Reducing the overhead of NUMERIC data
Date	November 3, 2005 09:52:12
Msg-id	1131025786.8300.1911.camel@localhost.localdomain
In response to	Re: Reducing the overhead of NUMERIC data (Simon Riggs <simon@2ndquadrant.com>)
Responses	Re: Reducing the overhead of NUMERIC data Re: Reducing the overhead of NUMERIC data
List	pgsql-hackers

On Thu, 2005-11-03 at 08:27 +0000, Simon Riggs wrote:
> On Wed, 2005-11-02 at 19:12 -0500, Tom Lane wrote:
> > If we were willing to invent the "varlena2" datum format then we could
> > save four bytes per numeric, plus reduce numeric's alignment requirement
> > from int to short which would probably save another byte per value on
> > average.  I'm not sure that that's worth doing if numeric and inet are
> > the only beneficiaries, but it might be.
> 
> That and variations can be the next discussion. They sound good.

Kicking off the discussion on that...

The varlena2 datum format sounds interesting. If we did that, I'd also like
to apply the same idea to VARCHAR/CHAR(32000) and below.
(The benefit of varlena2 is a saving of 2 bytes plus ~1 byte of alignment,
yes? The other two bytes come from the other numeric savings discussed.)
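
As a rough illustration of that arithmetic, here is a minimal sketch in C.
It assumes a value is stored as alignment padding, then a length header,
then the payload, and that the value may start at any byte offset within
an int-sized cycle. The function names and parameters are hypothetical and
are not PostgreSQL's own macros.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of align (a power of two). */
static size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/*
 * Bytes consumed by one variable-length value starting at `offset`:
 * alignment padding + length header + payload.
 * hdr_size and align are illustrative parameters (assumptions).
 */
static size_t datum_footprint(size_t offset, size_t hdr_size,
                              size_t align, size_t payload)
{
    size_t start = align_up(offset, align);

    return (start - offset) + hdr_size + payload;
}

/*
 * Sum of footprints over starting offsets 0..3. Dividing the result
 * by 4 gives the average footprint for a uniformly random offset.
 */
static size_t footprint_sum_over_offsets(size_t hdr_size, size_t align,
                                         size_t payload)
{
    size_t total = 0;

    for (size_t off = 0; off < 4; off++)
        total += datum_footprint(off, hdr_size, align, payload);
    return total;
}
```

Under these assumptions, a 4-byte int-aligned header and a 2-byte
short-aligned header differ by 3 bytes per value on average: 2 from the
header and 1 from padding.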

Alternatively, what I'd been thinking about was altering the self-
contained nature of PostgreSQL datatypes.

In other databases, CHAR(12) and NUMERIC(12) are fixed length datatypes.
In PostgreSQL, they are dynamically varying datatypes.

What actually happens is that in many other systems the datatype is the
same, but additional metadata is provided for that particular attribute.
So CHAR(12) is a datatype of CHAR with a metadata item called length
which is set to 12 for that attribute.

In PostgreSQL, CHAR(12) is a bpchar datatype, and every instance of that
datatype carries a 4-byte varlena header. In this example, every instance
records the same length in its varlena header, so the 4-byte header is
essentially wasted.

It seems like it would be an interesting move to allow the attribute
metadata to be stored in the TupleDesc, so we can store it once, rather
than once per row.
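
To make the "store it once" idea concrete, here is a minimal sketch in C.
It compares the bytes used by a column when each value carries its own
length header with the bytes used when the declared length is held once in
per-attribute metadata. FixedCharAttr, ROW_HDR_BYTES and both functions are
hypothetical names for illustration; this is not PostgreSQL's TupleDesc,
and it ignores alignment padding and multibyte encodings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-attribute metadata, stored once rather than per row. */
typedef struct FixedCharAttr
{
    int declared_len;           /* n in CHAR(n), in bytes (assumption) */
} FixedCharAttr;

#define ROW_HDR_BYTES 4         /* per-value length header, as in the email */

/* Column bytes when every value carries its own length header. */
static size_t bytes_with_row_headers(size_t nrows, size_t payload)
{
    return nrows * (ROW_HDR_BYTES + payload);
}

/* Column bytes when the length is held once in the attribute metadata. */
static size_t bytes_with_attr_metadata(const FixedCharAttr *attr,
                                       size_t nrows)
{
    assert(attr != NULL && attr->declared_len >= 0);
    return sizeof(*attr) + nrows * (size_t) attr->declared_len;
}
```

For CHAR(12) the per-row layout costs 16 bytes per value in this model,
while the metadata layout costs 12 bytes per value plus a one-off
descriptor entry.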

If we did this we would need two datatypes where currently we need only
one. We would still need the variable-length char datatype, VARCHAR, and we
would be inventing a new fixed-length char datatype, CHAR(n), whose length
is held as attribute metadata.

This would give us two things:
- reduce many attributes by 4 bytes in length
- allow attribute access to increase considerably in speed for queries,
sorts etc since more of the tuple offsets will be constant
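
The second point can be sketched as follows, assuming a simplified tuple
layout with no alignment padding and no NULLs. SketchAttr and attr_offset
are hypothetical names, and the 4-byte header here counts the payload
only, which is a simplification for the example. When every preceding
attribute has a fixed width, the offset is found without reading the tuple
at all; each variable-length attribute forces a read of its length header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical attribute descriptor: len >= 0 is a fixed width in bytes,
 * len == -1 means the value is prefixed by a 4-byte length header.
 */
typedef struct SketchAttr
{
    int len;
} SketchAttr;

/*
 * Return the byte offset of attribute `target` inside `tuple`.
 * *headers_read reports how many length headers had to be inspected.
 */
static size_t attr_offset(const SketchAttr *attrs, int target,
                          const unsigned char *tuple, int *headers_read)
{
    size_t off = 0;

    assert(target >= 0);
    *headers_read = 0;
    for (int i = 0; i < target; i++)
    {
        if (attrs[i].len >= 0)
        {
            /* Fixed width: known from metadata alone. */
            off += (size_t) attrs[i].len;
        }
        else
        {
            /* Variable width: must read this row's header. */
            uint32_t payload;

            memcpy(&payload, tuple + off, sizeof payload);
            off += sizeof payload + payload;
            (*headers_read)++;
        }
    }
    return off;
}
```

With all-fixed attributes the loop depends only on the descriptor, so the
result could be computed once and reused for every row.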

Anyway, I accept that many will say I clearly don't understand Object
Relational. It seems like this could be done without actually breaking
anything. The question is, how much work would it be?

Best Regards, Simon Riggs
