Re: A bug when use get_bit() function for a long bytea string - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: A bug when use get_bit() function for a long bytea string
Date
Msg-id CAG-ACPUbQwpCHhn_KkLAGEc08A91yZhARazj2BGjXvK+eFCVog@mail.gmail.com
Whole thread Raw
In response to Re: A bug when use get_bit() function for a long bytea string  ("movead.li@highgo.ca" <movead.li@highgo.ca>)
Responses Re: A bug when use get_bit() function for a long bytea string  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: A bug when use get_bit() function for a long bytea string  ("Daniel Verite" <daniel@manitou-mail.org>)
List pgsql-hackers


On Wed, 18 Mar 2020 at 08:18, movead.li@highgo.ca <movead.li@highgo.ca> wrote:

Hello thanks for the detailed review,

>I think the second argument indicates the bit position, which would be max bytea length * 8. If max bytea length covers whole int32, the second argument >needs to be wider i.e. int64.
Yes, it makes sence and followed.

I think we need a similar change in byteaGetBit() and byteaSetBit() as well.
 

> Some more comments on the patch
> struct pg_encoding
> {
>- unsigned (*encode_len) (const char *data, unsigned dlen);
>+ int64 (*encode_len) (const char *data, unsigned dlen);
 unsigned (*decode_len) (const char *data, unsigned dlen);
 unsigned (*encode) (const char *data, unsigned dlen, char *res);
>  unsigned (*decode) (const char *data, unsigned dlen, char *res);
> Why not use return type of int64 for rest of the functions here as well?
>  res = enc->encode(VARDATA_ANY(data), datalen, VARDATA(result));
>  /* Make this FATAL 'cause we've trodden on memory ... */
>- if (res > resultlen)
>+ if ((int64)res > resultlen)
>
>if we change return type of all those functions to int64, we won't need this cast.
I change the 'encode' function, it needs an int64 return type, but keep other 
functions in 'pg_encoding', because I think it of no necessary reason.

Ok, let's leave it for a committer to decide. 


>Right now we are using int64 because bytea can be 1GB, but what if we increase
>that limit tomorrow, will int64 be sufficient? That may be unlikely in the near
>future, but nevertheless a possibility. Should we then define a new datatype
>which resolves to int64 right now but really depends upon the bytea length. I
>am not suggesting that we have to do it right now, but may be something to
>think about.
I decide to use int64 because if we really want to increase the limit,  it should be
the same change with 'text', 'varchar' which have the same limit. So it may need
a more general struct. Hence I give up the idea.

Hmm, Let's see what a committer says.

Some more review comments.
+   int64       res,resultlen;

We need those on separate lines, possibly.

+   byteNo = (int32)(n / BITS_PER_BYTE);
Does it hurt to have byteNo as int64 so as to avoid a cast. Otherwise, please
add a comment explaining the reason for the cast. The comment applies at other
places where this change appears.

-       int         len;
+       int64       len;
Why do we need this change?
        int         i;
 


>It might help to add a test where we could pass the second argument something
>greater than 1G. But it may be difficult to write such a test case.
Add two test cases.
 
+
+select get_bit(
+        set_bit((repeat('Postgres', 512 * 1024 * 1024 / 8))::bytea, 1024 * 1024 * 1024 + 1, 0)
+       ,1024 * 1024 * 1024 + 1);

This bit position is still within int4.
postgres=# select pg_column_size(1024 * 1024 * 1024 + 1);
 pg_column_size
----------------
              4  
(1 row)

You want something like
postgres=# select pg_column_size(512::bigint * 1024 * 1024 * 8);
 pg_column_size
----------------
              8  
(1 row)

--
Best Wishes,
Ashutosh

pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: Planning counters in pg_stat_statements (using pgss_store)
Next
From: Fujii Masao
Date:
Subject: Re: Patch: to pass query string to pg_plan_query()