Thread: bytea_ops

bytea_ops

From: "Joe Conway"
Not sure if this needs discussion on hackers or not, but I am sure someone
will redirect me if it does ;)

I found that I had a need to create and use an index on bytea columns for a
project I'm currently working on. In the archives, I found this message by
Tom -- http://fts.postgresql.org/db/mw/msg.html?mid=93112 -- in which he
indicated that operators for bytea were needed but no one had gotten to them
yet. So here's a patch against CVS HEAD. It seems to work correctly in my
limited testing, and it passes all regression tests. I would appreciate it if
someone more experienced could give it a look.

Thanks,

-- Joe




Attachment

Re: bytea_ops

From: Tom Lane
"Joe Conway" <joseph.conway@home.com> writes:
> I found that I had a need to create and use an index on bytea columns for a
> project I'm currently working on.

Looks pretty good.  I have only three gripes, two trivial.  In order of
decreasing trivialness:

+bytealt(PG_FUNCTION_ARGS)
+{
+    bytea        *arg1 = PG_GETARG_BYTEA_P(0);
+    VarChar        *arg2 = PG_GETARG_BYTEA_P(1);

Looks like you missed one VarChar -> bytea.

+    if ((cmp < 0) || ((cmp == 0) && (len1 < len2)))
+        PG_RETURN_BOOL(TRUE);
+    else
+        PG_RETURN_BOOL(FALSE);

I think it's better style to code stuff like this as

    PG_RETURN_BOOL((cmp < 0) || ((cmp == 0) && (len1 < len2)));

BoolGetDatum already does the work of ensuring that the returned Datum
is exactly TRUE or FALSE, so why do it over again?
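
(For illustration only, pulling both of the above together, a cleaned-up
bytealt might look about like this -- an untested sketch in the usual
backend style, assuming the standard postgres.h/fmgr.h environment and
the Min() macro, not the patch itself:)

Datum
bytealt(PG_FUNCTION_ARGS)
{
    bytea      *arg1 = PG_GETARG_BYTEA_P(0);
    bytea      *arg2 = PG_GETARG_BYTEA_P(1);
    int         len1 = VARSIZE(arg1) - VARHDRSZ;
    int         len2 = VARSIZE(arg2) - VARHDRSZ;
    int         cmp;

    /* compare the shared prefix byte-for-byte, then break ties on length */
    cmp = memcmp(VARDATA(arg1), VARDATA(arg2), Min(len1, len2));

    PG_RETURN_BOOL((cmp < 0) || ((cmp == 0) && (len1 < len2)));
}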

The biggie is that you missed adding support for bytea to scalarltsel,
which puts a severe crimp on the optimizer's ability to make any
intelligent decisions about using your index.  See
src/backend/utils/adt/selfuncs.c, about line 580.  You need to add a
case to the convert_to_scalar() function in that file.  I'd say that
bytea should be a separate type category --- don't try to cram it into
convert_to_scalar's string category, because the string-handling code
assumes null-termination is safe.  But the implementation should be
modeled on the handling of strings: you want to strip any common prefix,
then use the next few bytes as base-256 digits to form numeric values.
The comments for convert_string_to_scalar should give you an idea.
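
Roughly, the per-value conversion might look like this (sketch only, with
made-up names; the real thing belongs next to convert_one_string_to_scalar
in selfuncs.c and should mirror its structure):

/*
 * Caller is assumed to have already stripped the prefix that the value
 * shares with the histogram bounds; treat the remaining bytes as
 * base-256 "digits" and accumulate a fraction in [0,1).
 */
static double
convert_one_bytea_to_scalar(unsigned char *value, int valuelen)
{
    double      num = 0.0;
    double      denom = 256.0;

    while (valuelen-- > 0 && denom < 1.0e12)
    {
        num += ((double) *value++) / denom;
        denom *= 256.0;
    }
    return num;
}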

            regards, tom lane

Re: bytea_ops

From: "Joe Conway"
> Looks pretty good.  I have only three gripes, two trivial.  In order of
> decreasing trivialness:
>
> +bytealt(PG_FUNCTION_ARGS)
> +{
> + bytea *arg1 = PG_GETARG_BYTEA_P(0);
> + VarChar *arg2 = PG_GETARG_BYTEA_P(1);
>
> Looks like you missed one VarChar -> bytea.

Ouch! You are of course correct.

>
> + if ((cmp < 0) || ((cmp == 0) && (len1 < len2)))
> + PG_RETURN_BOOL(TRUE);
> + else
> + PG_RETURN_BOOL(FALSE);
>
> I think it's better style to code stuff like this as
>
> PG_RETURN_BOOL((cmp < 0) || ((cmp == 0) && (len1 < len2)));

OK

>
> BoolGetDatum already does the work of ensuring that the returned Datum
> is exactly TRUE or FALSE, so why do it over again?
>
> The biggie is that you missed adding support for bytea to scalarltsel,
> which puts a severe crimp on the optimizer's ability to make any
> intelligent decisions about using your index.  See
> src/backend/utils/adt/selfuncs.c, about line 580.  You need to add a
> case to the convert_to_scalar() function in that file.  I'd say that
> bytea should be a separate type category --- don't try to cram it into
> convert_to_scalar's string category, because the string-handling code
> assumes null-termination is safe.  But the implementation should be
> modeled on the handling of strings: you want to strip any common prefix,
> then use the next few bytes as base-256 digits to form numeric values.
> The comments for convert_string_to_scalar should give you an idea.

I thought I must have missed something here. I noticed that with ~40000 rows
inserted (and the table vacuumed), EXPLAIN on a < or > query showed that the
optimizer tended not to use the index unless the range was severely restricted, e.g.
    select * from foobar where f1 < '\\000\\015';
would do a table scan while
    select * from foobar where f1 < '\\000\\011';
would use the index.

Hopefully fixing the above will improve this. Thank you for looking at this
so quickly :-)

I'll take another pass and send in a new patch soon.

-- Joe


Re: bytea_ops

From: "Joe Conway"
Here's a new patch.

> Looks like you missed one VarChar -> bytea.

Fixed.

> I think it's better style to code stuff like this as
>
> PG_RETURN_BOOL((cmp < 0) || ((cmp == 0) && (len1 < len2)));

Done.

> The biggie is that you missed adding support for bytea to scalarltsel,
> which puts a severe crimp on the optimizer's ability to make any
> intelligent decisions about using your index.

Hopefully done correctly ;-)


-- Joe


Attachment

Re: bytea_ops

From: Tom Lane
"Joe Conway" <joseph.conway@home.com> writes:
>> The biggie is that you missed adding support for bytea to scalarltsel,
>> which puts a severe crimp on the optimizer's ability to make any
>> intelligent decisions about using your index.

> Hopefully done correctly ;-)

I'm inclined to think that convert_bytea_to_scalar should just always
assume that the appropriate base is 256, ie, all possible byte values
can appear in the data.  The logic that convert_string_to_scalar uses
to guess at a suitable base depends strongly on the assumption that
ASCII text is much more likely than any other kind of data.  That
doesn't seem like the right assumption to make for bytea, I'd think.
You've removed the more blatant ASCII dependencies from the code,
but I still wonder whether it makes any sense to assume that the byte
values seen in the given strings should be used as predictors of the
overall distribution of byte values in the column.

On the other hand, the reason that convert_string_to_scalar makes all
those difficult-to-justify assumptions is that using a large base leads
to overly optimistic selectivity estimates.  (Example: suppose the
histogram bounds are 'aa' and 'zz', and we are trying to estimate the
selectivity of the range 'bb' to 'cc'.  If we assume the data range is
'a'..'z' then we get scalar equivalents of aa = 0, zz = 0.9985, bb =
0.03994, cc = 0.079881 leading to a selectivity estimate of 0.03994.
If we use a data range of 0..255 then we get aa = 0.380386, bb =
0.384307, cc = 0.388229, zz = 0.478424 leading to selectivity = 0.00392,
more than a factor of 10 smaller.)  Depending on how you are using
bytea, 0..255 might be too large for its data range too.  Thoughts?
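
(If anyone wants to check those numbers, a trivial standalone program --
not backend code, just the same base-N digit conversion done by hand --
reproduces them; the "selectivity" here is simply the difference between
the two endpoints' scalar values, as in the example:)

#include <stdio.h>

/* convert a string to a scalar using its characters as base-N digits,
 * N = rangehi - rangelo + 1 */
static double
to_scalar(const char *s, int rangelo, int rangehi)
{
    double      num = 0.0;
    double      denom = (double) (rangehi - rangelo + 1);

    while (*s)
    {
        num += ((double) (*s++ - rangelo)) / denom;
        denom *= (double) (rangehi - rangelo + 1);
    }
    return num;
}

int
main(void)
{
    /* 'a'..'z': aa = 0, bb = 0.03994, cc = 0.079881, zz = 0.9985 */
    printf("base 26:  cc - bb = %f\n",
           to_scalar("cc", 'a', 'z') - to_scalar("bb", 'a', 'z'));
    /* 0..255:   aa = 0.380386, bb = 0.384307, cc = 0.388229, zz = 0.478424 */
    printf("base 256: cc - bb = %f\n",
           to_scalar("cc", 0, 255) - to_scalar("bb", 0, 255));
    return 0;
}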

BTW, I think that convert_bytea_datum is probably unnecessary, and
definitely it's a waste of cycles to palloc a copy of the input values.
convert_string_datum exists to (a) unify the representations of the
different datatypes that we consider strings, and (b) apply strxfrm
if necessary.  Neither of those motivations will ever apply to bytea
AFAICS.  So you could just as easily pass the given Datums directly to
convert_bytea_to_scalar and let it work directly on them.
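
Something along these lines, perhaps (untested sketch only; base 256 is
assumed here just for concreteness, and convert_one_bytea_to_scalar is the
per-value helper sketched earlier):

static void
convert_bytea_to_scalar(Datum value, double *scaledvalue,
                        Datum lobound, double *scaledlobound,
                        Datum hibound, double *scaledhibound)
{
    bytea      *val = DatumGetByteaP(value);
    bytea      *lov = DatumGetByteaP(lobound);
    bytea      *hiv = DatumGetByteaP(hibound);
    int         valuelen = VARSIZE(val) - VARHDRSZ;
    int         loboundlen = VARSIZE(lov) - VARHDRSZ;
    int         hiboundlen = VARSIZE(hiv) - VARHDRSZ;
    unsigned char *valstr = (unsigned char *) VARDATA(val);
    unsigned char *lostr = (unsigned char *) VARDATA(lov);
    unsigned char *histr = (unsigned char *) VARDATA(hiv);
    int         i,
                minlen;

    /* work directly on the given values; no separate copying step needed */

    /* strip any prefix common to all three strings */
    minlen = Min(Min(valuelen, loboundlen), hiboundlen);
    for (i = 0; i < minlen; i++)
    {
        if (*lostr != *histr || *lostr != *valstr)
            break;
        lostr++, histr++, valstr++;
        valuelen--, loboundlen--, hiboundlen--;
    }

    /* convert what's left as base-256 digit strings */
    *scaledvalue = convert_one_bytea_to_scalar(valstr, valuelen);
    *scaledlobound = convert_one_bytea_to_scalar(lostr, loboundlen);
    *scaledhibound = convert_one_bytea_to_scalar(histr, hiboundlen);
}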

            regards, tom lane

Re: bytea_ops

From: "Joe Conway"
> selectivity of the range 'bb' to 'cc'.  If we assume the data range is
> 'a'..'z' then we get scalar equivalents of aa = 0, zz = 0.9985, bb =
> 0.03994, cc = 0.079881 leading to a selectivity estimate of 0.03994.
> If we use a data range of 0..255 then we get aa = 0.380386, bb =
> 0.384307, cc = 0.388229, zz = 0.478424 leading to selectivity = 0.00392,
> more than a factor of 10 smaller.)  Depending on how you are using
> bytea, 0..255 might be too large for its data range too.  Thoughts?

The specific case that I am using bytea for is storage of completely random
binary data generated for use as cryptographic keys and account identifiers.
For this application, you generally are only interested in the selectivity of an
exact match, i.e. "where vl_guid = 'bytea-data'".

But in any case, for this type of data assuming a 0..255 range for any
particular byte is completely appropriate. And in testing, I found that the
current calculation is reasonably accurate. For example, with some randomly
generated data:

vsreg_192=# select count(*) from vs_lkup;
 count
-------
 24950
(1 row)

vsreg_192=# explain select vl_guid from vs_lkup where vl_guid < '\\300\\300';
NOTICE:  QUERY PLAN:

Seq Scan on vs_lkup  (cost=0.00..727.88 rows=319 width=24)

EXPLAIN
vsreg_192=# select count(*) from vs_lkup where vl_guid < '\\300\\300';
 count
-------
   287
(1 row)

vsreg_192=# explain select vl_guid from vs_lkup where vl_guid < '\\300\\377';
NOTICE:  QUERY PLAN:

Seq Scan on vs_lkup  (cost=0.00..727.88 rows=425 width=24)

EXPLAIN
vsreg_192=# select count(*) from vs_lkup where vl_guid < '\\300\\377';
 count
-------
   390
(1 row)

vsreg_192=# explain select vl_guid from vs_lkup where vl_guid < '\\300\\010';
NOTICE:  QUERY PLAN:

Index Scan using vs_lkup_vl_guid_key on vs_lkup  (cost=0.00..43.80 rows=11 width=24)

EXPLAIN
vsreg_192=# select count(*) from vs_lkup where vl_guid < '\\300\\010';
 count
-------
    14
(1 row)

(**Note that this database *only* has data >= \300 in vl_guid**)

I think there are other scenarios in which you might want to use bytea where
the distribution is less random (I'm thinking in terms of any compressible
binary data like executables or some image types), but I can't think of any
offhand where ordering of the data is all that meaningful.

On the other hand, I suppose if you wanted to use bytea to store some sort
of bitmapped data it might be highly skewed, and interesting to select
distinct ranges from. Given that, it might make sense to leave the
range estimate as-is.


> BTW, I think that convert_bytea_datum is probably unnecessary, and
> definitely it's a waste of cycles to palloc a copy of the input values.
> convert_string_datum exists to (a) unify the representations of the
> different datatypes that we consider strings, and (b) apply strxfrm
> if necessary.  Neither of those motivations will ever apply to bytea
> AFAICS.  So you could just as easily pass the given Datums directly to
> convert_bytea_to_scalar and let it work directly on them.

OK -- I almost did just that, but I was unsure of the reason for working
with copies. I'll fix it on the next round.

-- Joe






Re: bytea_ops

From: Tom Lane
"Joe Conway" <joseph.conway@home.com> writes:
> But in any case, for this type of data assuming a 0..255 range for any
> particular byte is completely appropriate. And in testing, I found that the
> current calculation is reasonably accurate.

Well, it *is* accurate, if and only if that assumption is correct.
Obviously it's correct for random-byte data.

> I think there are other scenarios in which you might want to use bytea where
> the distribution is less random (I'm thinking in terms of any compressible
> binary data like executables or some image types), but I can't think of any
> offhand where ordering of the data is all that meaningful.

Yeah.  Compressed data would show a pretty even byte-value distribution
as well, but the real question is what sort of data might be found in a
bytea column for which "x < y" is an interesting comparison.  I'm not
sure either.

> On the other hand, I suppose if you wanted to use bytea to store some sort
> of bitmapped data it might be highly skewed, and interesting to select
> distinct ranges from. Given that, it might make sense to leave the
> range estimate as-is.

I don't like the estimate as-is.  For textual data it makes some sense
to classify characters into a small number of categories (letters,
digits, other), and with so few categories it's not completely
ridiculous to suppose that the three available strings might tell you
which categories are present in a column.  For bytea data, there are
no natural categories and thus no justification for extrapolating
byte-value distribution from the info available to scalarltsel.  So
I think there's no defensible argument for using anything but 0..255.
(I suppose we could consider adding more info to pg_statistic for these
types of columns, but I'm not eager to do that right at the moment.)

            regards, tom lane

Re: bytea_ops

From: "Joe Conway"
> which categories are present in a column.  For bytea data, there are
> no natural categories and thus no justification for extrapolating
> byte-value distribution from the info available to scalarltsel.  So
> I think there's no defensible argument for using anything but 0..255.
> (I suppose we could consider adding more info to pg_statistic for these
> types of columns, but I'm not eager to do that right at the moment.)

Changes in this rev:

- convert_bytea_to_scalar uses the 0..255 range always
- eliminated convert_bytea_datum, just pass and use the Datums directly

Results still look good on my random data, and the patch passes all regression tests.

-- Joe


Attachment

Re: bytea_ops

From: Tom Lane
"Joe Conway" <joseph.conway@home.com> writes:
> [ bytea comparison & indexing patch, rev 2 ]

Checked and applied.  I renumbered the OID assignments to avoid creating
gaps in the set of used OIDs, but otherwise it looked great.  Many
thanks!

            regards, tom lane

Re: bytea_ops

From: Bruce Momjian
Tom has applied your patch.

> > which categories are present in a column.  For bytea data, there are
> > no natural categories and thus no justification for extrapolating
> > byte-value distribution from the info available to scalarltsel.  So
> > I think there's no defensible argument for using anything but 0..255.
> > (I suppose we could consider adding more info to pg_statistic for these
> > types of columns, but I'm not eager to do that right at the moment.)
>
> Changes in this rev:
>
> - convert_bytea_to_scalar uses the 0..255 range always
> - eliminated convert_bytea_datum, just pass and use the Datums directly
>
> Results still look good on my random data, and the patch passes all regression tests.
>
> -- Joe
>

[ Attachment, skipping... ]


--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026