WIP: RangeTypes - Mailing list pgsql-hackers

Ok, I have made some progress. This is still a proof-of-concept patch,
but the important pieces are working together.

Synopsis:

  CREATE TYPE numrange AS RANGE (SUBTYPE=numeric,
    SUBTYPE_CMP=numeric_cmp);

  SELECT range_eq('[1,2.2)'::numrange,'[1,2.2]');
  SELECT range_lbound('(3.7,9]'::numrange);
  SELECT range(6.7);
  SELECT '-'::numrange; -- empty
  SELECT '[1, NULL]'::numrange; -- ] will become )
  SELECT '(INF, 3)'::numrange;

I haven't completed many of the other generic functions, because I'd
like to make sure I'm on the right track first. The important thing
about the functions above is that they show ANYRANGE working in
conjunction with ANYELEMENT in various combinations, which was a
significant part of this patch.

Here are the open items:

1. Generic functions -- most of which are fairly obvious. However, I
want to make sure I'm on the right track first.

2. GiST -- I'll need a mechanism to implement the "penalty" function,
and perhaps I'll also need additional support for the picksplit
function. For the "penalty" function, I think I'll need to require a
function to convert the subtype into a float, and I can use that to find
a distance (which can be the penalty). That should also satisfy anything
that picksplit might need.

3. Typmod -- There is still one annoyance about typmod remaining. I need
to treat it like an array in find_typmod_coercion_function(), and then
create a coercion expression. Is it worth it? Would typmod on a range be
confusing, or should I just finish this item up?

4. Docs

5. Tests

6. pg_dump -- should be pretty easy; I just want to settle some of the
other stuff first.

7. Right now the parse function is quite dumb. Is there some example
code I should follow to make sure I get this right?

8. In order to properly support the various combinations of ANYRANGE and
ANYELEMENT in a function definition (which are all important), we need
to be able to determine the range type given a subtype. That means that
each subtype can only have one associated range, which sounds somewhat
limiting, but it can be worked around by using domains. I don't think
this is a major limitation. Comments?

9. Representation -- right now I store the OID of the range type in the
range itself, much like arrays, in order to call the find the functions
to operate on the subtype. Robert has some justifiable concerns about
that 4-byte overhead. Possible ideas:

  * Forget about ANYRANGE altogether, and generate new catalog entries
for the generic functions for each new range type defined. I don't
particularly like this approach because it makes it very difficult to
define new generic functions.

  * Somehow fix the type system so that we know the specific types of
arguments in all situations. I don't know if this is feasible.

  * Store a 8- or 16-bit unique number in pg_range, and store that
number in the representation. That would be pretty ugly, and limit the
total possible range types defined at once, but it saves a couple bytes
per value.

  * Try to somehow mimic what records do. Records use a global array and
use the typmod as an index into that array. It looks like a hack to me,
but might be worth borrowing anyway.

Also related to representation:

  * Right now I always align the subtypes within the range according to
typalign. I could avoid that by packing the bytes tightly, and then
copying them around later. Suggestions? And what should the overall
alignment of the range type be?

  * If it's a fixed-length type, we can save the varlena header byte on
the overall range; but we lose the ability to save space when one of the
boundaries of the range is missing (NULL or INF), and it would
complicate the code a little. Thoughts?

Regards,
    Jeff Davis

Attachment

pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: [COMMITTERS] pgsql: Set process title to indicate base backup is running
Next
From: Shigeru HANADA
Date:
Subject: Re: SQL/MED - file_fdw