Thread: Making oidvector and int2vector variable-length
I've been toying with the idea of converting the oidvector and
int2vector datatypes from fixed-width arrays to variable-length; that
is, stick a varlena length word on the front and store only pronargs or
indnatts entries instead of a fixed number. This would not immediately
allow us to eliminate the fixed FUNC_MAX_ARGS and INDEX_MAX_KEYS limits,
but it would have some positive effects:

* The two limits could be set to different values, rather than being
  constrained to be the same.

* AFAICS, it'd be possible to change FUNC_MAX_ARGS (though not
  INDEX_MAX_KEYS) with just a recompile, no initdb needed.

* There would be significant space savings in pg_proc and its index
  pg_proc_proname_args_nsp_index. Currently, in a freshly initdb'd
  database, pg_proc is 576KB and its index is 1176KB (!!). (This is on
  my machine; your numbers might vary a little due to alignment
  padding.) I calculate that converting proargtypes to variable width
  would eliminate 200K of data. This would presumably translate directly
  into savings in pg_proc, and one might hope that the index would
  shrink even more.

You might think we should remove the separate datatypes altogether and
use the normal oid[] and int2[] datatypes. I am disinclined to do that,
however, for two reasons:

1. It would change the I/O behavior of these columns, almost certainly
   breaking clients that look at pg_proc, for instance.

2. I don't think we can make lookups in pg_proc depend on array
   comparison; given the way the array code works, that's circular.

If you're wondering about pushing forward to eliminate the hardwired
limits altogether ... for an hour or so today I was thinking that might
be within reach, but I found one showstopper reason why not in each
case:

1. The layout of the FunctionCallInfoData struct depends on
   FUNC_MAX_ARGS. This is a sufficiently fundamental and widely-used
   struct that I don't think we can easily afford to make it more
   complicated.

2. The layout of index tuple headers depends on INDEX_MAX_KEYS;
   specifically, they have a fixed-width null bitmap independent of the
   number of attributes actually in the index. Since an index tuple
   header hasn't got room to store the number of attributes, it seems
   difficult to change the fixed-width policy. (Considering that the
   bitmap width has to be MAXALIGN'd, there'd be no space savings anyway
   ... and it's not like anyone has a reason to want indexes with more
   than 32 columns.)

Comments?

			regards, tom lane
On Sun, Mar 27, 2005 at 12:44:41PM -0500, Tom Lane wrote:
> I've been toying with the idea of converting the oidvector and
> int2vector datatypes from fixed-width arrays to variable-length;
> that is, stick a varlena length word on the front and store only
> pronargs or indnatts entries instead of a fixed number.

This means that it would be possible to set FUNC_MAX_ARGS much higher
without performance loss, right? That alone sounds like a big win to me.

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Linux transformed my computer from a 'machine for doing things' into a
genuinely entertaining device, about which I learn something new every
day" (Jaime Salinas)
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> On Sun, Mar 27, 2005 at 12:44:41PM -0500, Tom Lane wrote:
>> I've been toying with the idea of converting the oidvector and
>> int2vector datatypes from fixed-width arrays to variable-length;
>> that is, stick a varlena length word on the front and store only
>> pronargs or indnatts entries instead of a fixed number.

> This means that it would be possible to set FUNC_MAX_ARGS much higher
> without performance loss, right? That alone sounds like a big win to
> me.

It wouldn't cost anything in terms of disk space. I'm not sure about
performance implications --- there are a lot of MemSets floating around
that zero FUNC_MAX_ARGS worth of space, and I'm unsure whether any of
them are in performance-critical paths. Atsushi Ogawa got the most
critical ones recently, though.

Very likely we could kick it up to 100 or so without feeling any pain;
how high were you thinking? (Since I wasn't thinking of making oidvector
TOAST-capable, it wouldn't be possible to set FUNC_MAX_ARGS higher than
~600 anyway without exceeding the maximum index tuple size in pg_proc's
index.)

			regards, tom lane
On Sun, Mar 27, 2005 at 03:41:08PM -0500, Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > On Sun, Mar 27, 2005 at 12:44:41PM -0500, Tom Lane wrote:
> >> I've been toying with the idea of converting the oidvector and
> >> int2vector datatypes from fixed-width arrays to variable-length;
> >> that is, stick a varlena length word on the front and store only
> >> pronargs or indnatts entries instead of a fixed number.
> 
> > This means that it would be possible to set FUNC_MAX_ARGS much higher
> > without performance loss, right? That alone sounds like a big win to
> > me.
> 
> It wouldn't cost anything in terms of disk space. I'm not sure about
> performance implications --- there are a lot of MemSets floating around
> that zero FUNC_MAX_ARGS worth of space, and I'm unsure whether any of
> them are in performance-critical paths. Atsushi Ogawa got the most
> critical ones recently, though.
> 
> Very likely we could kick it up to 100 or so without feeling any pain;
> how high were you thinking?

I used to see people asking to raise it to 64 or so. Not sure if it
would be useful to go higher than that ... much less now that we have
full-fledged support for row types.

I see 93 appearances of FUNC_MAX_ARGS in the code. There are 20 MemSets;
the only one of them that appears to be used regularly seems to be the
one in ParseFuncOrColumn; the rest are in FooCreate() or routines
thereof. There are 3 memcpys, two of which are used regularly
(inline_function, get_func_signature). The other one is fetch_fp_info,
which is used only in dealing with PQfn().

(This info brought to you courtesy of cscope)

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"I love the Postgres community. It's all about doing things _properly_.
:-)" (David Garamond)
Alvaro, Tom,

> > Very likely we could kick it up to 100 or so without feeling any pain;
> > how high were you thinking?
> 
> I used to see people asking to raise it to 64 or so. Not sure if it
> would be useful to go higher than that ... much less now that we have
> full-fledged support for row types.

Actually, I'm right now working on a function with 88 parameters, which
have to be passed through a temp table in 7.4. I've seen Oracle PL/SQL
procedures with 140 parameters. While it's possible to work around this
using row types, it's cumbersome, and extremely painful if you're
porting an application.

So if we can make this variable-length, I'd say set the limit at 400 to
avoid TOASTing, but otherwise allow users as many parameters as they
want.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco
I wrote:
> I've been toying with the idea of converting the oidvector and
> int2vector datatypes from fixed-width arrays to variable-length;
> that is, stick a varlena length word on the front and store only
> pronargs or indnatts entries instead of a fixed number.

I have a prototype patch that does this; it gets through the regression
tests but needs cleanup, and I think contrib is still broken, so I won't
post it yet.

What I ended up doing is sticking a full ArrayType header onto these
types, so that for example the representation of oidvector is the same
as oid[] except it doesn't support toasting. This was forced by the fact
that we need to support subscripting, and the array subscripting code
expects anything that's not a fixed-length type to be a normal array.
I think this is not entirely a bad thing though, because it means that
all the array functionality Joe has been adding over the past couple of
releases now works with pg_proc.proargtypes:

regression=# select oid::regprocedure from pg_proc
regression-# where 'point'::regtype = any (proargtypes);
                oid
-----------------------------------
 point_send(point)
 circle_contain_pt(circle,point)
 pt_contained_circle(point,circle)
 line(point,point)
 point_add(point,point)
 point_sub(point,point)
 point_mul(point,point)
 point_div(point,point)
 circle(point,double precision)
 ...

The space savings is still pretty good, so I'm not unhappy about the few
extra bytes needed.

The main thing that is bothering me at the moment is that oidvector and
int2vector are now a little TOO much like real array types: they're
tending to capture operations they shouldn't. A couple of hacks that
I've had to put in:

* dissuade format_type() from displaying oidvector as "oid[]";

* prevent find_coercion_pathway() from deciding that int[] can be
  implicitly cast to oidvector, as this causes at least one regression
  test failure.

I suspect there may be a few more places like this that will need ugly
hacks to prevent them from treating oidvector and int2vector as real
array types. Still, it seems like a net step forward, especially
considering the prospect that we can raise FUNC_MAX_ARGS.

Comments?

			regards, tom lane