Thread: Bit strings

Bit strings

From
Adriaan Joubert
Date:
Hi Bruce,
   I think there has been some confusion about the bit/bit varying
stuff I did. I initially
sent in a bit type that was non-SQL compliant, with the suggestion that
this may be
useful to stick into contrib.

Then several people remarked that it would be better to have an SQL
compliant type, so
I wrote the code to do all the manipulations, roughly based on

src/backend/utils/adt/varchar.c

This can evidently not be used as a standalone routine, but needs to be
properly
integrated into the database as a type. Unfortunately I don't really
know an awful
lot about the internals of postgres, and got a bit stuck trying to
figure out how to
integrate a new type correctly.

Now I see that this second set of routines has ended up in the contrib
directory,
where they are quite useless. In my opinion it would be best to have no
bit-type
in the contrib directory and have the proper SQL compliant one properly
integrated
into postgres, but I'm going to need some hints to do that correctly.

So my question is how we should proceed on this. Am I right in assuming
that there
should be no major objections to adding an SQL type to postgres, or
should I go
through another iteration on the mailing list?

Thanks for any suggestions,

Adriaan





Re: Bit strings

From
Bruce Momjian
Date:
Can we get the BIT type working now that 7.1 is branched?

> Hi Bruce,
> 
>     I think there has been some confusion about the bit/bit varying
> stuff I did. I initially
> sent in a bit type that was non-SQL compliant, with the suggestion that
> this may be
> useful to stick into contrib.
> 
> Then several people remarked that it would be better to have an SQL
> compliant type, so
> I wrote the code to do all the manipulations, roughly based on
> 
> src/backend/utils/adt/varchar.c
> 
> This can evidently not be used as a standalone routine, but needs to be
> properly
> integrated into the database as a type. Unfortunately I don't really
> know an awful
> lot about the internals of postgres, and got a bit stuck trying to
> figure out how to
> integrate a new type correctly.
> 
> Now I see that this second set of routines has ended up in the contrib
> directory,
> where they are quite useless. In my opinion it would be best to have no
> bit-type
> in the contrib directory and have the proper SQL compliant one properly
> integrated
> into postgres, but I'm going to need some hints to do that correctly.
> 
> So my question is how we should proceed on this. Am I right in assuming
> that there
> should be no major objections to adding an SQL type to postgres, or
> should I go
> through another iteration on the mailing list?
> 
> Thanks for any suggestions,
> 
> Adriaan
> 
> 
> 
> 
> ************
> 


--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Bit strings

From
Adriaan Joubert
Date:
Bruce Momjian wrote:

> Can we get the BIT type working now that 7.1 is branched?
>

Yep. We never quite finished the discussions before the 7.0 release. The
outstanding issues were

1) should bit operations be allowed between bit strings of different sizes?
They are at the moment, but the longest string is truncated, as you
otherwise get illogical behaviour (imagine padding with zeros and doing an
XOR). Truncation can be somewhat unlogical as well. My current feeling is
that perhaps it would be better to just disallow bit operations on
bitstrings that haven't got the same length completely. Makes the code
easier anyway ;-)

2) A second point brought up by Peter is the conversion of bit-strings to
integers and vice-versa. Postgres does not have unsigned ints, so it can
lead to somewhat surprising behaviour. Implicit conversion is probably out
of the question. So what type of conversion functions would people like to
see, if any?

3) With TOAST bit strings could be used to store large amounts of binary
data. Don mentioned MIME-encoded strings for I/O. Is this useful? I guess
and advantage is that BIT is an SQL type, but I'm not sure that any such
application would be particularly portable. It doesn't look like a lot of
extra work though, (provided somebody has got some MIME en-/decoding code
lying about) so we could add it. Am i right that with TOAST and no
restrictions on the query-buffer lengths in psql one could then upload mime
objects straight into postgres?

So, what do people think?

Adriaan



Re: Bit strings

From
Peter Eisentraut
Date:
Adriaan Joubert writes:

> My current feeling is that perhaps it would be better to just disallow
> bit operations on bitstrings that haven't got the same length
> completely.

I feel with you. :)

> 3) With TOAST bit strings could be used to store large amounts of binary
> data.

This is not really the proper application for bit strings. There are BLOBs
for that. Which leads to the question, what are bit strings really useful
for other than genuine bit masks?


-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden




Re: Bit strings

From
Bruce Momjian
Date:
Can someone comment on this?

> Bruce Momjian wrote:
> 
> > Can we get the BIT type working now that 7.1 is branched?
> >
> 
> Yep. We never quite finished the discussions before the 7.0 release. The
> outstanding issues were
> 
> 1) should bit operations be allowed between bit strings of different sizes?
> They are at the moment, but the longest string is truncated, as you
> otherwise get illogical behaviour (imagine padding with zeros and doing an
> XOR). Truncation can be somewhat unlogical as well. My current feeling is
> that perhaps it would be better to just disallow bit operations on
> bitstrings that haven't got the same length completely. Makes the code
> easier anyway ;-)
> 
> 2) A second point brought up by Peter is the conversion of bit-strings to
> integers and vice-versa. Postgres does not have unsigned ints, so it can
> lead to somewhat surprising behaviour. Implicit conversion is probably out
> of the question. So what type of conversion functions would people like to
> see, if any?
> 
> 3) With TOAST bit strings could be used to store large amounts of binary
> data. Don mentioned MIME-encoded strings for I/O. Is this useful? I guess
> and advantage is that BIT is an SQL type, but I'm not sure that any such
> application would be particularly portable. It doesn't look like a lot of
> extra work though, (provided somebody has got some MIME en-/decoding code
> lying about) so we could add it. Am i right that with TOAST and no
> restrictions on the query-buffer lengths in psql one could then upload mime
> objects straight into postgres?
> 
> So, what do people think?
> 
> Adriaan
> 
> 


--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Bit strings

From
Tom Lane
Date:
>>>> Can we get the BIT type working now that 7.1 is branched?

I did some work on the BIT types a couple months ago.  According to
my notes, the following issues are still outstanding before they
can be said to work at all:

Bit and hexstring literals are not handled in a reasonable fashion;
the scanner converts them to integer constants which is bogus.
Probably they need to be converted to some generic 'UNKNOWNBITSTRING'
pseudo-type that can later be coerced to a specific bitstring type.
I didn't touch this because it seems to open up the Pandora's box
of unknown-constant handling, for which we do not have a good
general solution.

SQL92 sez we need a position() function for bitstrings.

Need a regression test for bit types.

scalarltsel() and friends need to cope with bit/varbit types in
order to make good use of indexes on bitstrings.

pg_dump does not handle BIT/VARBIT lengths properly (pjw may have
fixed this by now).
        regards, tom lane


Re: Bit strings

From
Bruce Momjian
Date:
Any idea where we are on this?

> >>>> Can we get the BIT type working now that 7.1 is branched?
> 
> I did some work on the BIT types a couple months ago.  According to
> my notes, the following issues are still outstanding before they
> can be said to work at all:
> 
> Bit and hexstring literals are not handled in a reasonable fashion;
> the scanner converts them to integer constants which is bogus.
> Probably they need to be converted to some generic 'UNKNOWNBITSTRING'
> pseudo-type that can later be coerced to a specific bitstring type.
> I didn't touch this because it seems to open up the Pandora's box
> of unknown-constant handling, for which we do not have a good
> general solution.
> 
> SQL92 sez we need a position() function for bitstrings.
> 
> Need a regression test for bit types.
> 
> scalarltsel() and friends need to cope with bit/varbit types in
> order to make good use of indexes on bitstrings.
> 
> pg_dump does not handle BIT/VARBIT lengths properly (pjw may have
> fixed this by now).
> 
>             regards, tom lane
> 


--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Bit strings

From
Peter Eisentraut
Date:
Bruce Momjian writes:

> Any idea where we are on this?

> > Bit and hexstring literals are not handled in a reasonable fashion;

Bit string literals are handled correctly, although it occurred to me that
perhaps

#define IsA_Value(t) \        (IsA(t, Integer) || IsA(t, Float) || IsA(t, String))

should be augmented with BitString.  Can someone advise?

Hex literals are still not handled correctly.

> > SQL92 sez we need a position() function for bitstrings.

We have one now.

> > Need a regression test for bit types.

We have one now.

> > scalarltsel() and friends need to cope with bit/varbit types in
> > order to make good use of indexes on bitstrings.

Not done.

> > pg_dump does not handle BIT/VARBIT lengths properly (pjw may have
> > fixed this by now).

Works fine for me.

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: Bit strings

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> perhaps
> #define IsA_Value(t) \
>          (IsA(t, Integer) || IsA(t, Float) || IsA(t, String))
> should be augmented with BitString.  Can someone advise?

Yes, definitely --- see the sole use of that macro, in outfuncs.c.
This is a bug.
        regards, tom lane


Re: Bit strings

From
Bruce Momjian
Date:
OK, I will delete my copy of the email. If there is something to be
added to the TODO list, let me know.

[ People are welcome to modify it themselves too.]

> Bruce Momjian writes:
> 
> > Any idea where we are on this?
> 
> > > Bit and hexstring literals are not handled in a reasonable fashion;
> 
> Bit string literals are handled correctly, although it occurred to me that
> perhaps
> 
> #define IsA_Value(t) \
>          (IsA(t, Integer) || IsA(t, Float) || IsA(t, String))
> 
> should be augmented with BitString.  Can someone advise?
> 
> Hex literals are still not handled correctly.
> 
> > > SQL92 sez we need a position() function for bitstrings.
> 
> We have one now.
> 
> > > Need a regression test for bit types.
> 
> We have one now.
> 
> > > scalarltsel() and friends need to cope with bit/varbit types in
> > > order to make good use of indexes on bitstrings.
> 
> Not done.
> 
> > > pg_dump does not handle BIT/VARBIT lengths properly (pjw may have
> > > fixed this by now).
> 
> Works fine for me.
> 
> -- 
> Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/
> 
> 


--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Bit strings

From
Peter Eisentraut
Date:
Tom Lane writes:

> Peter Eisentraut <peter_e@gmx.net> writes:
> > perhaps
> > #define IsA_Value(t) \
> >          (IsA(t, Integer) || IsA(t, Float) || IsA(t, String))
> > should be augmented with BitString.  Can someone advise?
>
> Yes, definitely --- see the sole use of that macro, in outfuncs.c.

Maybe this macro could be removed and expanded in place.  That would make
things more transparent.  Or is there any other potential use?

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: Bit strings

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
>> Yes, definitely --- see the sole use of that macro, in outfuncs.c.

> Maybe this macro could be removed and expanded in place.  That would make
> things more transparent.  Or is there any other potential use?

No objection from me.  The macro clearly failed to make things better in
that usage, and offhand I think it would be just as likely to be a time
bomb in any other usage.
        regards, tom lane