Making oidvector and int2vector variable-length - Mailing list pgsql-hackers

From Tom Lane
Subject Making oidvector and int2vector variable-length
Date
Msg-id 6039.1111945481@sss.pgh.pa.us
Responses Re: Making oidvector and int2vector variable-length  (Alvaro Herrera <alvherre@dcc.uchile.cl>)
List pgsql-hackers
I've been toying with the idea of converting the oidvector and
int2vector datatypes from fixed-width arrays to variable-length;
that is, stick a varlena length word on the front and store only
pronargs or indnatts entries instead of a fixed number.
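
To make the proposed layout change concrete, here is a rough sketch in C. It is not the actual backend code: the struct and function names are hypothetical, the plain int32 field only stands in for the varlena length word, and FUNC_MAX_ARGS is assumed to be 32.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FUNC_MAX_ARGS 32            /* assumed value of the compile-time limit */

typedef uint32_t Oid;

/* Fixed-width layout: always FUNC_MAX_ARGS slots, unused ones wasted. */
typedef struct
{
    Oid         values[FUNC_MAX_ARGS];
} FixedOidVector;

/* Hypothetical variable-length layout: a length word, then only the
 * entries actually in use (pronargs of them for pg_proc.proargtypes). */
typedef struct
{
    int32_t     total_size;         /* stand-in for the varlena length word, in bytes */
    Oid         values[];           /* flexible array member */
} VarOidVector;

/* Bytes needed to store n entries in the variable-length form. */
size_t
var_oidvector_size(int n)
{
    return sizeof(VarOidVector) + (size_t) n * sizeof(Oid);
}

/* Build a variable-length vector from n OIDs; the caller frees the result. */
VarOidVector *
var_oidvector_build(const Oid *oids, int n)
{
    size_t      size;
    VarOidVector *v;

    assert(n >= 0 && n <= FUNC_MAX_ARGS);   /* the limit itself still applies */
    size = var_oidvector_size(n);
    v = malloc(size);
    if (v == NULL)
        return NULL;
    v->total_size = (int32_t) size;
    if (n > 0)
        memcpy(v->values, oids, (size_t) n * sizeof(Oid));
    return v;
}

/* Recover the entry count from the length word alone. */
int
var_oidvector_count(const VarOidVector *v)
{
    return (int) (((size_t) v->total_size - sizeof(VarOidVector)) / sizeof(Oid));
}
```

Under these assumptions a two-argument function stores 12 bytes instead of 128, and the entry count can be derived from the length word rather than from a fixed array size.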

This would not immediately allow us to eliminate the fixed FUNC_MAX_ARGS
and INDEX_MAX_KEYS limits, but it would have some positive effects:

* The two limits could be set to different values, rather than being
constrained to be the same.

* AFAICS, it'd be possible to change FUNC_MAX_ARGS (though not
INDEX_MAX_KEYS) with just a recompile, no initdb needed.

* There would be significant space savings in pg_proc and its index
pg_proc_proname_args_nsp_index.  Currently, in a freshly initdb'd
database, pg_proc is 576KB and its index is 1176KB (!!).  (This is on my
machine; your numbers might vary a little due to alignment padding.)
I calculate that converting proargtypes to variable width would
eliminate 200KB of data.  This would presumably translate directly into
savings in pg_proc, and one might hope that the index would shrink
even more.
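
The per-row arithmetic behind a savings estimate like this can be sketched as follows. The constants are assumptions: FUNC_MAX_ARGS of 32, 4-byte OIDs, and a bare 4-byte length word as the only per-value overhead. Alignment padding is ignored, and the function names are hypothetical.

```c
#include <stddef.h>

#define FUNC_MAX_ARGS        32     /* assumed compile-time limit */
#define OID_SIZE             4      /* bytes per OID */
#define VARLENA_HEADER_SIZE  4      /* assumed overhead: just the length word */

/* Size of proargtypes today, regardless of how many arguments are used. */
size_t
fixed_proargtypes_bytes(void)
{
    return (size_t) FUNC_MAX_ARGS * OID_SIZE;
}

/* Size of proargtypes in the variable-length form for nargs arguments. */
size_t
variable_proargtypes_bytes(int nargs)
{
    return VARLENA_HEADER_SIZE + (size_t) nargs * OID_SIZE;
}

/* Bytes saved for one pg_proc row; negative when the vector is full,
 * because the length word is then pure overhead. */
long
bytes_saved_per_row(int nargs)
{
    return (long) fixed_proargtypes_bytes() - (long) variable_proargtypes_bytes(nargs);
}
```

With these numbers a zero-argument function saves 124 bytes per row and a two-argument function saves 116, while a function using all 32 slots costs 4 bytes more than before. Summing bytes_saved_per_row over the pronargs values in a real catalog would give the total; that total is not reproduced here.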

You might think we should remove the separate datatypes altogether
and use the normal oid[] and int2[] datatypes.  I am disinclined to do
that however, for two reasons:
1. It would change the I/O behavior of these columns, almost certainly
breaking clients that look at pg_proc for instance.
2. I don't think we can make lookups in pg_proc depend on array
comparison; given the way the array code works, that's circular.

If you're wondering about pushing forward to eliminate the hardwired
limits altogether ... for an hour or so today I was thinking that might
be within reach, but I found one showstopper in each case:

1. The layout of the FunctionCallInfoData struct depends on
FUNC_MAX_ARGS.  This is a sufficiently fundamental and widely-used
struct that I don't think we can easily afford to make it more
complicated.

2. The layout of index tuple headers depends on INDEX_MAX_KEYS;
specifically, they have a fixed-width null bitmap independent of the
number of attributes actually in the index.  Since an index tuple header
hasn't got room to store the number of attributes, it seems difficult
to change the fixed-width policy.  (Considering that the bitmap width
has to be MAXALIGN'd, there'd be no space savings anyway ... and it's
not like anyone has a reason to want indexes with more than 32 columns.)
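
The "no space savings" remark in point 2 can be checked with a little arithmetic. This is only a sketch: it assumes an 8-byte index tuple header ahead of the null bitmap and assumes that header plus bitmap are rounded up together to the MAXALIGN boundary. The function names are hypothetical.

```c
#include <stddef.h>

#define INDEX_MAX_KEYS           32 /* the limit discussed above */
#define INDEX_TUPLE_HEADER_SIZE  8  /* assumed size of the fixed header */

/* Round len up to the next multiple of alignment (a power of two). */
size_t
max_align(size_t len, size_t alignment)
{
    return (len + alignment - 1) & ~(alignment - 1);
}

/* Bytes needed for a null bitmap covering nkeys attributes. */
size_t
null_bitmap_bytes(int nkeys)
{
    return ((size_t) nkeys + 7) / 8;
}

/* Offset at which index tuple data would start when the bitmap covers
 * bitmap_keys attributes and the platform alignment is as given. */
size_t
index_data_offset(int bitmap_keys, size_t alignment)
{
    return max_align(INDEX_TUPLE_HEADER_SIZE + null_bitmap_bytes(bitmap_keys),
                     alignment);
}
```

Under these assumptions, a bitmap sized for INDEX_MAX_KEYS and one sized for a single attribute both put the data at offset 16 with 8-byte alignment, and both at offset 12 with 4-byte alignment, so shrinking the bitmap buys nothing.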

Comments?
        regards, tom lane

