Thread: Making oidvector and int2vector variable-length
I've been toying with the idea of converting the oidvector and
int2vector datatypes from fixed-width arrays to variable-length; that
is, stick a varlena length word on the front and store only pronargs or
indnatts entries instead of a fixed number. This would not immediately
allow us to eliminate the fixed FUNC_MAX_ARGS and INDEX_MAX_KEYS limits,
but it would have some positive effects:

* The two limits could be set to different values, rather than being
  constrained to be the same.

* AFAICS, it'd be possible to change FUNC_MAX_ARGS (though not
  INDEX_MAX_KEYS) with just a recompile, no initdb needed.

* There would be significant space savings in pg_proc and its index
  pg_proc_proname_args_nsp_index. Currently, in a freshly initdb'd
  database, pg_proc is 576KB and its index is 1176KB (!!). (This is on
  my machine; your numbers might vary a little due to alignment
  padding.) I calculate that converting proargtypes to variable width
  would eliminate 200K of data. This would presumably translate directly
  into savings in pg_proc, and one might hope that the index would
  shrink even more.

You might think we should remove the separate datatypes altogether and
use the normal oid[] and int2[] datatypes. I am disinclined to do that,
however, for two reasons:

1. It would change the I/O behavior of these columns, almost certainly
   breaking clients that look at pg_proc, for instance.

2. I don't think we can make lookups in pg_proc depend on array
   comparison; given the way the array code works, that's circular.

If you're wondering about pushing forward to eliminate the hardwired
limits altogether ... for an hour or so today I was thinking that might
be within reach, but I found one showstopper reason why not in each
case:

1. The layout of the FunctionCallInfoData struct depends on
   FUNC_MAX_ARGS. This is a sufficiently fundamental and widely-used
   struct that I don't think we can easily afford to make it more
   complicated.

2. The layout of index tuple headers depends on INDEX_MAX_KEYS;
   specifically, they have a fixed-width null bitmap independent of the
   number of attributes actually in the index. Since an index tuple
   header hasn't got room to store the number of attributes, it seems
   difficult to change the fixed-width policy. (Considering that the
   bitmap width has to be MAXALIGN'd, there'd be no space savings anyway
   ... and it's not like anyone has a reason to want indexes with more
   than 32 columns.)

Comments?

			regards, tom lane
On Sun, Mar 27, 2005 at 12:44:41PM -0500, Tom Lane wrote:
> I've been toying with the idea of converting the oidvector and
> int2vector datatypes from fixed-width arrays to variable-length;
> that is, stick a varlena length word on the front and store only
> pronargs or indnatts entries instead of a fixed number.

This means that it would be possible to set FUNC_MAX_ARGS much higher
without performance loss, right? That alone sounds like a big win to me.

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Linux transformed my computer from a 'machine for doing things' into a
genuinely entertaining device, about which I learn something new every
day" (Jaime Salinas)
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> On Sun, Mar 27, 2005 at 12:44:41PM -0500, Tom Lane wrote:
>> I've been toying with the idea of converting the oidvector and
>> int2vector datatypes from fixed-width arrays to variable-length;
>> that is, stick a varlena length word on the front and store only
>> pronargs or indnatts entries instead of a fixed number.

> This means that it would be possible to set FUNC_MAX_ARGS much higher
> without performance loss, right? That alone sounds like a big win to
> me.

It wouldn't cost anything in terms of disk space. I'm not sure about
performance implications --- there are a lot of MemSets floating around
that zero FUNC_MAX_ARGS worth of space, and I'm unsure whether any of
them are in performance-critical paths. Atsushi Ogawa got the most
critical ones recently, though.

Very likely we could kick it up to 100 or so without feeling any pain;
how high were you thinking? (Since I wasn't thinking of making oidvector
TOAST-capable, it wouldn't be possible to set FUNC_MAX_ARGS higher than
~600 anyway without exceeding the maximum index tuple size in pg_proc's
index.)

			regards, tom lane
On Sun, Mar 27, 2005 at 03:41:08PM -0500, Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > On Sun, Mar 27, 2005 at 12:44:41PM -0500, Tom Lane wrote:
> >> I've been toying with the idea of converting the oidvector and
> >> int2vector datatypes from fixed-width arrays to variable-length;
> >> that is, stick a varlena length word on the front and store only
> >> pronargs or indnatts entries instead of a fixed number.
> 
> > This means that it would be possible to set FUNC_MAX_ARGS much higher
> > without performance loss, right? That alone sounds like a big win to
> > me.
> 
> It wouldn't cost anything in terms of disk space. I'm not sure about
> performance implications --- there are a lot of MemSets floating around
> that zero FUNC_MAX_ARGS worth of space, and I'm unsure whether any of
> them are in performance-critical paths. Atsushi Ogawa got the most
> critical ones recently, though.
> 
> Very likely we could kick it up to 100 or so without feeling any pain;
> how high were you thinking?

I used to see people asking to raise it to 64 or so. Not sure if it
would be useful to go higher than that ... much less now that we have
full-fledged support for row types.

I see 93 appearances of FUNC_MAX_ARGS in the code. There are 20 MemSets;
the only one of them that appears to be used regularly seems to be the
one in ParseFuncOrColumn; the rest are in FooCreate() or routines
thereof. There are 3 memcpys, two of which are used regularly
(inline_function, get_func_signature). The other one is fetch_fp_info,
which is used only in dealing with PQfn().

(This info brought to you courtesy of cscope)

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"I love the Postgres community. It's all about doing things _properly_.
:-)" (David Garamond)
Alvaro, Tom,

> > Very likely we could kick it up to 100 or so without feeling any pain;
> > how high were you thinking?
> 
> I used to see people asking to raise it to 64 or so. Not sure if it
> would be useful to go higher than that ... much less now that we have
> full-fledged support for row types.

Actually, I'm right now working on a function with 88 parameters, which
have to be passed through a temp table in 7.4. I've seen Oracle PL/SQL
procedures with 140 parameters. While it's possible to work around this
using row types, it's cumbersome, and extremely painful if you're
porting an application.

So if we can make this variable-length, I'd say set the limit at 400 to
avoid TOASTing, but otherwise allow users as many parameters as they
want.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco
I wrote:
> I've been toying with the idea of converting the oidvector and
> int2vector datatypes from fixed-width arrays to variable-length;
> that is, stick a varlena length word on the front and store only
> pronargs or indnatts entries instead of a fixed number.

I have a prototype patch that does this; it gets through the regression
tests but needs cleanup, and I think contrib is still broken, so I won't
post it yet.

What I ended up doing is sticking a full ArrayType header onto these
types, so that for example the representation of oidvector is the same
as oid[] except it doesn't support toasting. This was forced by the fact
that we need to support subscripting, and the array subscripting code
expects anything that's not a fixed-length type to be a normal array.
I think this is not entirely a bad thing though, because it means that
all the array functionality Joe has been adding over the past couple of
releases now works with pg_proc.proargtypes:

regression=# select oid::regprocedure from pg_proc
regression-# where 'point'::regtype = any (proargtypes);
                oid
-----------------------------------
 point_send(point)
 circle_contain_pt(circle,point)
 pt_contained_circle(point,circle)
 line(point,point)
 point_add(point,point)
 point_sub(point,point)
 point_mul(point,point)
 point_div(point,point)
 circle(point,double precision)
 ...

The space savings is still pretty good, so I'm not unhappy about the few
extra bytes needed.

The main thing that is bothering me at the moment is that oidvector and
int2vector are now a little TOO much like real array types: they're
tending to capture operations they shouldn't. A couple of hacks that
I've had to put in:

* dissuade format_type() from displaying oidvector as "oid[]";

* prevent find_coercion_pathway() from deciding that int[] can be
  implicitly cast to oidvector, as this causes at least one regression
  test failure.

I suspect there may be a few more places like this that will need ugly
hacks to prevent them from treating oidvector and int2vector as real
array types. Still, it seems like a net step forward, especially
considering the prospect that we can raise FUNC_MAX_ARGS.

Comments?

			regards, tom lane