Thread: PSQL error: total cell count of XXX exceeded

PSQL error: total cell count of XXX exceeded

From
Hongxu Ma
Date:
Hi Hackers,

When I tried to select a big amount of rows, psql complains a error "Cannot add cell to table content: total cell count of 905032704 exceeded."

Here are the reproduce steps:
```
interma=# select version();
                                         version
-----------------------------------------------------------------------------------------
 PostgreSQL 12.13 on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
(1 row)

interma=# create table t26(a int,b int,c int,d int,e int,f int,g int,h int,i int,j int,k int,l int,m int,n int,o int,p int,q int,r int,s int,t int ,u int,v int,w int,x int,y int,z int);
CREATE TABLE
interma=# insert into t26 select generate_series(1,200000000);
INSERT 0 200000000
interma=# select * from t26;
Cannot add cell to table content: total cell count of 905032704 exceeded.
```

I checked the related code, and root cause is clear:
```
// in printTableAddCell()
if (content->cellsadded >= content->ncolumns * content->nrows)
      report this error and exit

// cellsadded is long type, but ncolumns and nrows are int
// so, it's possible overflow the int value here.

// using a test program to verify:
int rows = 200000000;
int cols = 26;
printf("%d*%d = %d\n", rows,cols, rows*cols);

output:
      2,0000,0000*26 = 9,0503,2704 // overflow and be truncated into int value here
```

Based on it, I think it's a bug. We should use long for ncolumns and nrows and give a more obvious error message here.

My version is 12.13, and I think the latest code also exists this issue: issue: https://github.com/postgres/postgres/blob/1a4fd77db85abac63e178506335aee74625f6499/src/fe_utils/print.c#L3259

Any thoughts? or some other hidden reasons?
Thanks.


Re: PSQL error: total cell count of XXX exceeded

From
"David G. Johnston"
Date:
On Friday, August 25, 2023, Hongxu Ma <interma@outlook.com> wrote:

When I tried to select a big amount of rows, psql complains a error "Cannot add cell to table content: total cell count of 905032704 exceeded."

 We should use long for ncolumns and nrows and give a more obvious error message here.

Any thoughts? or some other hidden reasons?

9 millions cells seems more than realistic a limit for a psql query result output.  In any case it isn’t a bug, the code demonstrates that fact by producing an explicit error.

I wouldn’t be adverse to an improved error message, and possibly documenting said limit.

David J.

Re: PSQL error: total cell count of XXX exceeded

From
Hongxu Ma
Date:
Thank you David.

From the code logic, I don't think this check is meant to check the limit:
If it enters the double-loop (cont.nrows * cont.ncolumns) in printQuery(), the check should be always false (except overflow happened). So, if want to check the limit, we could have done this check before the double-loop: just checking PGresult and reports error earlier.

I wouldn’t be adverse to an improved error message, and possibly documenting said limit.

Agreed with you, current error message may even report a negative value, it's very confusing for user. It's better to introduce a limit here. Or using a bigger integer type (e.g. long) for them, but it's also have the theoretical upbound.

Thanks.


From: David G. Johnston <david.g.johnston@gmail.com>
Sent: Saturday, August 26, 2023 12:09
To: Hongxu Ma <interma@outlook.com>
Cc: PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Subject: Re: PSQL error: total cell count of XXX exceeded
 
On Friday, August 25, 2023, Hongxu Ma <interma@outlook.com> wrote:

When I tried to select a big amount of rows, psql complains a error "Cannot add cell to table content: total cell count of 905032704 exceeded."

 We should use long for ncolumns and nrows and give a more obvious error message here.

Any thoughts? or some other hidden reasons?

9 millions cells seems more than realistic a limit for a psql query result output.  In any case it isn’t a bug, the code demonstrates that fact by producing an explicit error.

I wouldn’t be adverse to an improved error message, and possibly documenting said limit.

David J.

Re: PSQL error: total cell count of XXX exceeded

From
Hongxu Ma
Date:
I created a patch to fix it.
Really appreciate to anyone can help to review it.
Thanks.


From: Hongxu Ma <interma@outlook.com>
Sent: Saturday, August 26, 2023 19:19
To: David G. Johnston <david.g.johnston@gmail.com>
Cc: PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Subject: Re: PSQL error: total cell count of XXX exceeded
 
Thank you David.

From the code logic, I don't think this check is meant to check the limit:
If it enters the double-loop (cont.nrows * cont.ncolumns) in printQuery(), the check should be always false (except overflow happened). So, if want to check the limit, we could have done this check before the double-loop: just checking PGresult and reports error earlier.

I wouldn’t be adverse to an improved error message, and possibly documenting said limit.

Agreed with you, current error message may even report a negative value, it's very confusing for user. It's better to introduce a limit here. Or using a bigger integer type (e.g. long) for them, but it's also have the theoretical upbound.

Thanks.


From: David G. Johnston <david.g.johnston@gmail.com>
Sent: Saturday, August 26, 2023 12:09
To: Hongxu Ma <interma@outlook.com>
Cc: PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Subject: Re: PSQL error: total cell count of XXX exceeded
 
On Friday, August 25, 2023, Hongxu Ma <interma@outlook.com> wrote:

When I tried to select a big amount of rows, psql complains a error "Cannot add cell to table content: total cell count of 905032704 exceeded."

 We should use long for ncolumns and nrows and give a more obvious error message here.

Any thoughts? or some other hidden reasons?

9 millions cells seems more than realistic a limit for a psql query result output.  In any case it isn’t a bug, the code demonstrates that fact by producing an explicit error.

I wouldn’t be adverse to an improved error message, and possibly documenting said limit.

David J.

Attachment

Re: PSQL error: total cell count of XXX exceeded

From
Jelte Fennema
Date:
On Mon, 11 Sept 2023 at 08:51, Hongxu Ma <interma@outlook.com> wrote:
>
> I created a patch to fix it.
> Really appreciate to anyone can help to review it.
> Thanks.

I think "product" as a variable name isn't very descriptive. Let's
call it total_cells (or something similar instead).

Other than that I think it's a good change. content->cellsadded is
also a long, So I agree that I don't think the limit of int cells was
intended here.



Re: PSQL error: total cell count of XXX exceeded

From
Hongxu Ma
Date:
Thank you for your advice, Jelte.
I have refactored my code, please see the attached patch. (and I put it into https://commitfest.postgresql.org/45/ for trace)

Thanks.


From: Jelte Fennema <postgres@jeltef.nl>
Sent: Monday, September 11, 2023 15:04
To: Hongxu Ma <interma@outlook.com>
Cc: David G. Johnston <david.g.johnston@gmail.com>; PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Subject: Re: PSQL error: total cell count of XXX exceeded
 
On Mon, 11 Sept 2023 at 08:51, Hongxu Ma <interma@outlook.com> wrote:
>
> I created a patch to fix it.
> Really appreciate to anyone can help to review it.
> Thanks.

I think "product" as a variable name isn't very descriptive. Let's
call it total_cells (or something similar instead).

Other than that I think it's a good change. content->cellsadded is
also a long, So I agree that I don't think the limit of int cells was
intended here.
Attachment

Re: PSQL error: total cell count of XXX exceeded

From
Michael Paquier
Date:
On Tue, Sep 12, 2023 at 02:39:55AM +0000, Hongxu Ma wrote:
> Thank you for your advice, Jelte.
> I have refactored my code, please see the attached patch. (and I put
> it into https://commitfest.postgresql.org/45/ for trace)

 {
+       long total_cells;

long is 4 bytes on Windows, and 8 bytes basically elsewhere.  So you
would still have the same problem on Windows, no?
--
Michael

Attachment

Re: PSQL error: total cell count of XXX exceeded

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Tue, Sep 12, 2023 at 02:39:55AM +0000, Hongxu Ma wrote:
> +       long total_cells;

> long is 4 bytes on Windows, and 8 bytes basically elsewhere.  So you
> would still have the same problem on Windows, no?

More to the point: what about the multiplication in printTableInit?
The cat's been out of the bag for quite some time before we get to
printTableAddCell.

I'm more than a bit skeptical about trying to do something about this,
simply because this range of query result sizes is far past what is
practical.  The OP clearly hasn't tested his patch on actually
overflowing query results, and I don't care to either.

            regards, tom lane



Re: PSQL error: total cell count of XXX exceeded

From
Hongxu Ma
Date:
Thanks for pointing that, I did miss some other "ncols * nrows" places. Uploaded v3 patch to fix them.

As for the Windows, I didn't test it before but I think it should also have the issue (and happens more possible since ​`cellsadded` is also a long type).
My fix idea is simple: define a common long64 type for it.
I referred MSDN: only `LONGLONG` and `LONG64` are 64 bytes. And I assume Postgres should already have a similar type, but only found `typedef long int int64` in src/include/c.h, looks it's not a proper choose.
@Michael Paquier, could you help to give some advices here (which type should be used? or should define a new one?). Thank you very much.


From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Tuesday, September 12, 2023 12:19
To: Michael Paquier <michael@paquier.xyz>
Cc: Hongxu Ma <interma@outlook.com>; Jelte Fennema <postgres@jeltef.nl>; David G. Johnston <david.g.johnston@gmail.com>; PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Subject: Re: PSQL error: total cell count of XXX exceeded
 
Michael Paquier <michael@paquier.xyz> writes:
> On Tue, Sep 12, 2023 at 02:39:55AM +0000, Hongxu Ma wrote:
> +       long total_cells;

> long is 4 bytes on Windows, and 8 bytes basically elsewhere.  So you
> would still have the same problem on Windows, no?

More to the point: what about the multiplication in printTableInit?
The cat's been out of the bag for quite some time before we get to
printTableAddCell.

I'm more than a bit skeptical about trying to do something about this,
simply because this range of query result sizes is far past what is
practical.  The OP clearly hasn't tested his patch on actually
overflowing query results, and I don't care to either.

                        regards, tom lane
Attachment

Re: PSQL error: total cell count of XXX exceeded

From
Hongxu Ma
Date:
After double check, looks `int64` of src/include/c.h is the proper type for it.
Uploaded the v4 patch to fix it.
Thanks.


From: Hongxu Ma <interma@outlook.com>
Sent: Wednesday, September 13, 2023 10:22
To: Tom Lane <tgl@sss.pgh.pa.us>; Michael Paquier <michael@paquier.xyz>
Cc: PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Subject: Re: PSQL error: total cell count of XXX exceeded
 
Thanks for pointing that, I did miss some other "ncols * nrows" places. Uploaded v3 patch to fix them.

As for the Windows, I didn't test it before but I think it should also have the issue (and happens more possible since ​`cellsadded` is also a long type).
My fix idea is simple: define a common long64 type for it.
I referred MSDN: only `LONGLONG` and `LONG64` are 64 bytes. And I assume Postgres should already have a similar type, but only found `typedef long int int64` in src/include/c.h, looks it's not a proper choose.
@Michael Paquier, could you help to give some advices here (which type should be used? or should define a new one?). Thank you very much.


From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Tuesday, September 12, 2023 12:19
To: Michael Paquier <michael@paquier.xyz>
Cc: Hongxu Ma <interma@outlook.com>; Jelte Fennema <postgres@jeltef.nl>; David G. Johnston <david.g.johnston@gmail.com>; PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Subject: Re: PSQL error: total cell count of XXX exceeded
 
Michael Paquier <michael@paquier.xyz> writes:
> On Tue, Sep 12, 2023 at 02:39:55AM +0000, Hongxu Ma wrote:
> +       long total_cells;

> long is 4 bytes on Windows, and 8 bytes basically elsewhere.  So you
> would still have the same problem on Windows, no?

More to the point: what about the multiplication in printTableInit?
The cat's been out of the bag for quite some time before we get to
printTableAddCell.

I'm more than a bit skeptical about trying to do something about this,
simply because this range of query result sizes is far past what is
practical.  The OP clearly hasn't tested his patch on actually
overflowing query results, and I don't care to either.

                        regards, tom lane
Attachment

Re: PSQL error: total cell count of XXX exceeded

From
Alvaro Herrera
Date:
On 2023-Sep-12, Tom Lane wrote:

> I'm more than a bit skeptical about trying to do something about this,
> simply because this range of query result sizes is far past what is
> practical.  The OP clearly hasn't tested his patch on actually
> overflowing query results, and I don't care to either.

I think we're bound to hit this limit at some point in the future, and
it seems easy enough to solve.  I propose the attached, which is pretty
much what Hongxu last submitted, with some minor changes.

Having this make a difference requires some 128GB of RAM, so it's not a
piece of cake, but it's an amount that can be reasonably expected to be
physically installed in real machines nowadays.

(I first thought we could just use pg_mul_s32_overflow during
printTableInit and raise an error if that returns true, but that just
postpones the problem.)

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
Subversion to GIT: the shortest path to happiness I've ever heard of
                                                (Alexey Klyukin)

Attachment

Re: PSQL error: total cell count of XXX exceeded

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> I think we're bound to hit this limit at some point in the future, and
> it seems easy enough to solve.  I propose the attached, which is pretty
> much what Hongxu last submitted, with some minor changes.

This bit needs more work:

-    content->cells = pg_malloc0((ncolumns * nrows + 1) * sizeof(*content->cells));
+    total_cells = (int64) ncolumns * nrows;
+    content->cells = pg_malloc0((total_cells + 1) * sizeof(*content->cells));

You've made the computation of total_cells reliable, but there's
nothing stopping the subsequent computation of the malloc argument
from overflowing (especially on 32-bit machines).  I think we need
an explicit test along the lines of

    if (total_cells >= SIZE_MAX / sizeof(*content->cells))
        throw error;

(">=" allows not needing to add +1.)

Also, maybe total_cells should be uint64?  We don't want
negative values to pass this test.  Alternatively, add a separate
check that total_cells >= 0.

It should be sufficient to be paranoid about this in printTableInit,
since after that we know the product of ncolumns * nrows isn't
too big.

The rest of this passes an eyeball check.

            regards, tom lane



Re: PSQL error: total cell count of XXX exceeded

From
Alvaro Herrera
Date:
On 2023-Sep-13, Hongxu Ma wrote:

> After double check, looks `int64` of src/include/c.h is the proper type for it.
> Uploaded the v4 patch to fix it.

Right.  I made a few more adjustments, including the additional overflow
check in printTableInit that Tom Lane suggested, and pushed this.

It's a bit annoying that the error recovery decision of this code is to
exit the process with an error.  If somebody were to be interested in a
fun improvement exercise, it may be worth redoing the print.c API so
that it returns errors that psql can report and recover from, instead of
just closing the process.

TBH though, I've never hit that code in real usage.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



Re: PSQL error: total cell count of XXX exceeded

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Right.  I made a few more adjustments, including the additional overflow
> check in printTableInit that Tom Lane suggested, and pushed this.

Committed patch LGTM.

> It's a bit annoying that the error recovery decision of this code is to
> exit the process with an error.  If somebody were to be interested in a
> fun improvement exercise, it may be worth redoing the print.c API so
> that it returns errors that psql can report and recover from, instead of
> just closing the process.
> TBH though, I've never hit that code in real usage.

Yeah, I think the reason it's stayed like that for 25 years is that
nobody's hit the case in practice.  Changing the API would be a bit
troublesome, too, because we don't know if anybody besides psql
uses it.

            regards, tom lane



Re: PSQL error: total cell count of XXX exceeded

From
Alvaro Herrera
Date:
On 2023-Nov-21, Tom Lane wrote:

> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > Right.  I made a few more adjustments, including the additional overflow
> > check in printTableInit that Tom Lane suggested, and pushed this.
> 
> Committed patch LGTM.

Thanks for looking!

> > It's a bit annoying that the error recovery decision of this code is to
> > exit the process with an error.  [...]
> > TBH though, I've never hit that code in real usage.
> 
> Yeah, I think the reason it's stayed like that for 25 years is that
> nobody's hit the case in practice.  Changing the API would be a bit
> troublesome, too, because we don't know if anybody besides psql
> uses it.

True.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



Re: PSQL error: total cell count of XXX exceeded

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2023-Nov-21, Tom Lane wrote:
>> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>>> It's a bit annoying that the error recovery decision of this code is to
>>> exit the process with an error.  [...]
>>> TBH though, I've never hit that code in real usage.

>> Yeah, I think the reason it's stayed like that for 25 years is that
>> nobody's hit the case in practice.  Changing the API would be a bit
>> troublesome, too, because we don't know if anybody besides psql
>> uses it.

> True.

It strikes me that perhaps a workable compromise behavior could be
"report the error to wherever we would have printed the table, and
return normally".  I'm still not excited about doing anything about
it, but ...

            regards, tom lane