Re: A bug when use get_bit() function for a long bytea string - Mailing list pgsql-hackers

From Daniel Verite
Subject Re: A bug when use get_bit() function for a long bytea string
Date
Msg-id 84525289-3891-4368-9157-5452f53a6908@manitou-mail.org
Whole thread Raw
In response to Re: A bug when use get_bit() function for a long bytea string  ("movead.li@highgo.ca" <movead.li@highgo.ca>)
Responses Re: A bug when use get_bit() function for a long bytea string  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
    movead.li@highgo.ca wrote:

> A little update for the patch, and patches for all stable avilable.

Some comments about the set_bit/get_bit parts.
I'm reading long_bytea_string_bug_fix_ver6_pg10.patch, but that
applies probably to the other files meant for the existing releases
(I think you could get away with only one patch for backpatching
and one patch for v13, and committers will sort out how
to deal with release branches).

 byteaSetBit(PG_FUNCTION_ARGS)
 {
    bytea       *res = PG_GETARG_BYTEA_P_COPY(0);
-    int32        n = PG_GETARG_INT32(1);
+    int64        n = PG_GETARG_INT64(1);
    int32        newBit = PG_GETARG_INT32(2);

The 2nd argument is 32-bit, not 64. PG_GETARG_INT32(1) must be used.

+    errmsg("index "INT64_FORMAT" out of valid range, 0.."INT64_FORMAT,
+        n, (int64)len * 8 - 1)));

The cast to int64 avoids the overflow, but it may also produce a
result that does not reflect the actual range, which is limited to
2^31-1, again because the bit number is a signed 32-bit value.

I believe the formula for the upper limit in the 32-bit case is:
  (len <= PG_INT32_MAX / 8) ? (len*8 - 1) : PG_INT32_MAX;

These functions could benefit from a comment mentioning that
they cannot reach the full extent of a bytea, because of the size limit
on the bit number.

--- a/src/test/regress/expected/bit.out
+++ b/src/test/regress/expected/bit.out
@@ -656,6 +656,40 @@ SELECT set_bit(B'0101011000100100', 15, 1);

 SELECT set_bit(B'0101011000100100', 16, 1);    -- fail
 ERROR:  bit index 16 out of valid range (0..15)+SELECT get_bit(
+    set_bit((repeat('Postgres', 512 * 1024 * 1024 / 8))::bytea, 0, 0)
+    ,0);
+ get_bit
+---------
+    0
+(1 row)
+
+SELECT get_bit(
+    set_bit((repeat('Postgres', 512 * 1024 * 1024 / 8))::bytea, 0, 1)
+    ,0);
+ get_bit
+---------
+    1
+(1 row)
+

These 2 tests need to allocate big chunks of contiguous memory, so they
might fail for lack of memory on tiny machines, and even when not failing,
they're pretty slow to run. Are they worth the trouble?

+select get_bit(
+     set_bit((repeat('Postgres', 512 * 1024 * 1024 / 8))::bytea,
+         512::bigint * 1024 * 1024 * 8 - 1, 0)
+     ,512::bigint * 1024 * 1024 * 8 - 1);
+ get_bit
+---------
+    0
+(1 row)
+
+select get_bit(
+     set_bit((repeat('Postgres', 512 * 1024 * 1024 / 8))::bytea,
+         512::bigint * 1024 * 1024 * 8 - 1, 1)
+    ,512::bigint * 1024 * 1024 * 8 - 1);
+ get_bit
+---------
+    1
+(1 row)

These 2 tests are supposed to fail in existing releases because set_bit()
and get_bit() don't take a bigint as the 2nd argument.
Also, the same comment as above on how much they allocate.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: snapshot too old issues, first around wraparound and then more.
Next
From: Alvaro Herrera
Date:
Subject: Re: control max length of parameter values logged