Re: bytea_ops - Mailing list pgsql-patches

From	Tom Lane
Subject	Re: bytea_ops
Date	August 12, 2001 21:02:23
Msg-id	1135.997653733@sss.pgh.pa.us
In response to	Re: bytea_ops ("Joe Conway" <joseph.conway@home.com>)
List	pgsql-patches

"Joe Conway" <joseph.conway@home.com> writes:
> But in any case, for this type of data assuming a 0..255 range for any
> particular byte is completely appropriate. And in testing, I found that the
> current calculation is reasonably accurate.

Well, it *is* accurate, if and only if that assumption is correct.
Obviously it's correct for random-byte data.

> I think there are other scenarios in which you might want to use bytea where
> the distribution is less random (I'm thinking in terms of any compressible
> binary data like executables or some image types), but I can't think of any
> offhand where ordering of the data is all that meaningful.

Yeah.  Compressed data would show a pretty even byte-value distribution
as well, but the real question is what sort of data might be found in a
bytea column for which "x < y" is an interesting comparison.  I'm not
sure either.

> On the other hand, I suppose if you wanted to use bytea to store some sort
> of bitmapped data it might be highly skewed, and interesting to select
> distinct ranges from. Given that, it might make sense to leave the
> range estimate as-is.

I don't like the estimate as-is.  For textual data it makes some sense
to classify characters into a small number of categories (letters,
digits, other), and with so few categories it's not completely
ridiculous to suppose that the three available strings might tell you
which categories are present in a column.  For bytea data, there are
no natural categories and thus no justification for extrapolating
byte-value distribution from the info available to scalarltsel.  So
I think there's no defensible argument for using anything but 0..255.
(I suppose we could consider adding more info to pg_statistic for these
types of columns, but I'm not eager to do that right at the moment.)

            regards, tom lane
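
To make the 0..255 argument concrete, here is a rough sketch of how a range estimator can map byte strings onto a numeric scale and interpolate within a histogram bucket. It is not the actual scalarltsel code: the function names (bytes_to_scalar, fraction_below), the max_bytes cutoff, and the 0.5 fallback are all assumptions made for illustration. The point is only that the assumed per-byte range sets the base of the conversion, so a wrong guess about that range skews the estimate.

```python
def bytes_to_scalar(value: bytes, range_lo: int = 0, range_hi: int = 255,
                    max_bytes: int = 12) -> float:
    """Map a byte string to a float in [0, 1), reading it as a fraction
    in base (range_hi - range_lo + 1). Hypothetical helper, not PostgreSQL code."""
    base = range_hi - range_lo + 1
    num = 0.0
    denom = float(base)
    for b in value[:max_bytes]:            # later bytes add negligible precision
        b = min(max(b, range_lo), range_hi)  # clamp bytes outside the assumed range
        num += (b - range_lo) / denom
        denom *= base
    return num


def fraction_below(value: bytes, lo: bytes, hi: bytes,
                   range_lo: int = 0, range_hi: int = 255) -> float:
    """Estimate what fraction of the bucket [lo, hi] sorts below value
    by linear interpolation on the scalar scale."""
    v = bytes_to_scalar(value, range_lo, range_hi)
    l = bytes_to_scalar(lo, range_lo, range_hi)
    h = bytes_to_scalar(hi, range_lo, range_hi)
    if h <= l:
        return 0.5                          # degenerate bucket: arbitrary fallback
    return min(max((v - l) / (h - l), 0.0), 1.0)


if __name__ == "__main__":
    # With the full 0..255 range, 0x40 sits halfway between 0x00 and 0x80.
    print(fraction_below(b"\x40", b"\x00", b"\x80"))
```

With the full 0..255 range the conversion is exact for uniformly distributed bytes, which matches the point above that the estimate is accurate if and only if the range assumption holds. Passing a narrower range_lo and range_hi shows how an extrapolated range changes the result for the same three strings.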
