Re: type design guidance needed - Mailing list pgsql-hackers

From Evgeni E. Selkov
Subject Re: type design guidance needed
Date
Msg-id 200009230441.XAA09037@juju.mcs.anl.gov
In response to type design guidance needed  (Brook Milligan <brook@biology.nmsu.edu>)
List pgsql-hackers
Brook,

I have been contemplating such a data type for years. I believe I have
assembled the most important parts, but I have not had time to
complete the whole thing.

The idea is that the units of measurement can be treated as arithmetic
expressions. One can assign each of the few existing base units a
fixed position in a bit vector, parse the expression, then evaluate it
to obtain three things: scale factor, numerator and denominator, the
latter two being bit vectors.

So, if you assign the base units as
 'm'    => 1, 'kg'   => 2, 's'    => 4, 'K'    => 8, 'mol'  => 16, 'A'    => 32, 'cd'   => 64,

the unit, umol/min/mg, will be represented as 

(0.01667, 00010000, 00000110).
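
To make the example above concrete, here is a minimal C sketch of the
(scale, numerator, denominator) triple. The struct layout and the names
unit_t, unit_div and example_umol_per_min_per_mg are hypothetical; only
the bit assignments come from the table above. Division here simply ORs
the masks and does not cancel units shared by numerator and denominator.

```c
#include <assert.h>
#include <stdint.h>

/* Base-unit bits, as assigned in the table above. */
enum { U_M = 1, U_KG = 2, U_S = 4, U_K = 8, U_MOL = 16, U_A = 32, U_CD = 64 };

/* Hypothetical layout: scale factor to base units plus two bit vectors. */
typedef struct {
    double  scale;  /* multiply a value by this to get base units */
    uint8_t num;    /* base units present in the numerator */
    uint8_t den;    /* base units present in the denominator */
} unit_t;

/* a / b: b's numerator lands in the denominator and vice versa.
 * No cancellation is attempted (an assumption of this sketch). */
unit_t unit_div(unit_t a, unit_t b)
{
    unit_t r;
    assert(b.scale != 0.0);
    r.scale = a.scale / b.scale;
    r.num = (uint8_t)(a.num | b.den);
    r.den = (uint8_t)(a.den | b.num);
    return r;
}

/* Builds umol/min/mg from its three parts. */
unit_t example_umol_per_min_per_mg(void)
{
    unit_t umol = { 1e-6, U_MOL, 0 };  /* 1 umol = 1e-6 mol */
    unit_t min  = { 60.0, U_S,   0 };  /* 1 min  = 60 s     */
    unit_t mg   = { 1e-6, U_KG,  0 };  /* 1 mg   = 1e-6 kg  */
    return unit_div(unit_div(umol, min), mg);
}
```

The result has a scale of about 1/60, a numerator of 00010000 (mol) and
a denominator of 00000110 (s and kg), matching the triple above.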

Such structure is compact enough to be stashed into an atomic type.
In fact, one needs more than just a plain bit vector to represent
exponents:

umol/min/ml => (0.01667, '00010000', '00000103') (because ml is a volume, m^3)

Here I use a whole character per base unit for clarity, but one does
not need more than two or three bits per unit -- you normally don't
have kg^4 or m^7 in your units.
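
A packed-exponent field of this kind could be sketched in C as follows.
The names uexp_get and uexp_set and the index enum are hypothetical, and
the sketch assumes four bits per base unit rather than two or three, only
so that the value printed in hexadecimal reads the same as the
character-per-unit strings above.

```c
#include <assert.h>
#include <stdint.h>

/* Position of each base unit, in the same order as the bit table. */
enum { IDX_M = 0, IDX_KG, IDX_S, IDX_K, IDX_MOL, IDX_A, IDX_CD };

#define EXP_BITS 4u    /* assumption: one nibble per unit, for hex readability */
#define EXP_MASK 0xFu

/* Read the exponent stored for base unit idx. */
unsigned uexp_get(uint32_t v, unsigned idx)
{
    return (v >> (idx * EXP_BITS)) & EXP_MASK;
}

/* Return v with the exponent for base unit idx replaced by e. */
uint32_t uexp_set(uint32_t v, unsigned idx, unsigned e)
{
    unsigned shift = idx * EXP_BITS;
    assert(e <= EXP_MASK);
    return (v & ~((uint32_t)EXP_MASK << shift)) | ((uint32_t)e << shift);
}
```

With this layout the denominator of umol/min/ml, s^1 * m^3, is
uexp_set(uexp_set(0, IDX_S, 1), IDX_M, 3), which is 0x00000103, and the
numerator, mol^1, is 0x00010000. Seven units at four bits each still fit
in one 32-bit word.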

I considered other alternatives, but none seemed as good as an atomic
type. I can bet you will see performance problems and an indexing
nightmare with non-atomic solutions well before you hit the space
constraints with the atomic type. You are even likely to see the space
problems with the non-atomic storage: pointers can easily cost more
than compacted units.

There are numerous benefits to the atomic type. The units can be
re-assembled on the output, the operators can be written to work on
non-normalized units and discard the incompatible ones, and the
chances that you screw up the unit integrity are none.
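
As a rough illustration of an operator that discards incompatible units,
here is a C sketch of addition over such a type. The names quantity_t,
unit_compatible and quantity_add are hypothetical, and the sketch assumes
both units are already in canceled form, so two units are compatible
exactly when their numerator and denominator fields are equal.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: scale factor plus packed exponent fields. */
typedef struct {
    double   scale;  /* multiply a value by this to get base units */
    uint32_t num;    /* packed numerator exponents */
    uint32_t den;    /* packed denominator exponents */
} unit_t;

typedef struct {
    double value;
    unit_t unit;
} quantity_t;

/* Same dimension if the exponent fields match; scale may differ. */
int unit_compatible(unit_t a, unit_t b)
{
    return a.num == b.num && a.den == b.den;
}

/* Adds b to a, expressing the result in a's unit.
 * Returns 1 on success, 0 if the units are incompatible. */
int quantity_add(quantity_t a, quantity_t b, quantity_t *out)
{
    assert(out != 0);
    if (!unit_compatible(a.unit, b.unit))
        return 0;
    out->value = a.value + b.value * (b.unit.scale / a.unit.scale);
    out->unit = a.unit;
    return 1;
}
```

For example, adding 500 g to 1 kg gives 1.5 in kg, while adding seconds
to kilograms is rejected.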

So, if that makes sense, I will be willing to funnel more energy into
this project, and I would appreciate any co-operation.

In the meantime, you might want to check out what I have done so far.

1. A perl parser for the units of measurement that computes units as
   algebraic expressions. I have done it in perl for the ease of
   prototyping, but it is flex- and bison-generated and can be ported
   to C and included into the data type.

   Get it from http://wit.mcs.anl.gov/~selkovjr/Unit.tgz

   This is a regular perl extension; do a

       perl Makefile.PL; make; make install

   type of thing, but first you need to build and install my version
   of bison, http://wit.mcs.anl.gov/~selkovjr/camel-1.24.tar.gz

   There is a demo script that you can run as follows:

       perl browse.pl units

2. The postgres extension, seg, to which I was planning to add the
   units of measurement. It has its own use already, and it
   exemplifies the use of the yacc parser in an extension.

   Please see the README in
   http://wit.mcs.anl.gov/~selkovjr/pg_extensions/
   as well as a brief description in
   http://wit.mcs.anl.gov/EMP/seg-type.html
   and a running demo in
   http://wit.mcs.anl.gov/EMP/indexing.html (search for seg)

Food for thought.

--Gene

