Thread: PSQL error: total cell count of XXX exceeded
Hi Hackers,
When I tried to select a big amount of rows, psql complains a error "Cannot add cell to table content: total cell count of 905032704 exceeded."
Here are the reproduce steps:
```
interma=# select version();
version
-----------------------------------------------------------------------------------------
PostgreSQL 12.13 on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
(1 row)
interma=# create table t26(a int,b int,c int,d int,e int,f int,g int,h int,i int,j int,k int,l int,m int,n int,o int,p int,q int,r int,s int,t int ,u int,v int,w int,x int,y int,z int);
CREATE TABLE
interma=# insert into t26 select generate_series(1,200000000);
INSERT 0 200000000
interma=# select * from t26;
Cannot add cell to table content: total cell count of 905032704 exceeded. ```
I checked the related code, and root cause is clear:
```
// in printTableAddCell()
if (content->cellsadded >= content->ncolumns * content->nrows)
report this error and exit
// cellsadded is long type, but ncolumns and nrows are int
// so, it's possible overflow the int value here.
// using a test program to verify:
int rows = 200000000;
int cols = 26;
printf("%d*%d = %d\n", rows,cols, rows*cols);
output:
2,0000,0000*26 = 9,0503,2704 // overflow and be truncated into int value here ```
Based on it, I think it's a bug. We should use long for ncolumns and nrows and give a more obvious error message here.
My version is 12.13, and I think the latest code also exists this issue: issue: https://github.com/postgres/postgres/blob/1a4fd77db85abac63e178506335aee74625f6499/src/fe_utils/print.c#L3259
Any thoughts? or some other hidden reasons?
Thanks.
On Friday, August 25, 2023, Hongxu Ma <interma@outlook.com> wrote:
When I tried to select a big amount of rows, psql complains a error "Cannot add cell to table content: total cell count of 905032704 exceeded."We should use long for ncolumns and nrows and give a more obvious error message here.Any thoughts? or some other hidden reasons?
9 millions cells seems more than realistic a limit for a psql query result output. In any case it isn’t a bug, the code demonstrates that fact by producing an explicit error.
I wouldn’t be adverse to an improved error message, and possibly documenting said limit.
David J.
Thank you David.
From the code logic, I don't think this check is meant to check the limit:
If it enters the double-loop (cont.nrows * cont.ncolumns) in printQuery(), the check should be always false (except overflow happened). So, if want to check the limit, we could have done this check before the double-loop: just checking PGresult and reports error earlier.
> I wouldn’t be adverse to an improved error message, and possibly documenting said limit.
Agreed with you, current error message may even report a negative value, it's very confusing for user. It's better to introduce a limit here. Or using a bigger integer type (e.g. long) for them, but it's also have the theoretical upbound.
Thanks.
From: David G. Johnston <david.g.johnston@gmail.com>
Sent: Saturday, August 26, 2023 12:09
To: Hongxu Ma <interma@outlook.com>
Cc: PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Subject: Re: PSQL error: total cell count of XXX exceeded
Sent: Saturday, August 26, 2023 12:09
To: Hongxu Ma <interma@outlook.com>
Cc: PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Subject: Re: PSQL error: total cell count of XXX exceeded
On Friday, August 25, 2023, Hongxu Ma <interma@outlook.com> wrote:
When I tried to select a big amount of rows, psql complains a error "Cannot add cell to table content: total cell count of 905032704 exceeded."We should use long for ncolumns and nrows and give a more obvious error message here.Any thoughts? or some other hidden reasons?
9 millions cells seems more than realistic a limit for a psql query result output. In any case it isn’t a bug, the code demonstrates that fact by producing an explicit error.
I wouldn’t be adverse to an improved error message, and possibly documenting said limit.
David J.
I created a patch to fix it.
Really appreciate to anyone can help to review it.
Thanks.
From: Hongxu Ma <interma@outlook.com>
Sent: Saturday, August 26, 2023 19:19
To: David G. Johnston <david.g.johnston@gmail.com>
Cc: PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Subject: Re: PSQL error: total cell count of XXX exceeded
Sent: Saturday, August 26, 2023 19:19
To: David G. Johnston <david.g.johnston@gmail.com>
Cc: PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Subject: Re: PSQL error: total cell count of XXX exceeded
Thank you David.
From the code logic, I don't think this check is meant to check the limit:
If it enters the double-loop (cont.nrows * cont.ncolumns) in printQuery(), the check should be always false (except overflow happened). So, if want to check the limit, we could have done this check before the double-loop: just checking PGresult and reports error earlier.
> I wouldn’t be adverse to an improved error message, and possibly documenting said limit.
Agreed with you, current error message may even report a negative value, it's very confusing for user. It's better to introduce a limit here. Or using a bigger integer type (e.g. long) for them, but it's also have the theoretical upbound.
Thanks.
From: David G. Johnston <david.g.johnston@gmail.com>
Sent: Saturday, August 26, 2023 12:09
To: Hongxu Ma <interma@outlook.com>
Cc: PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Subject: Re: PSQL error: total cell count of XXX exceeded
Sent: Saturday, August 26, 2023 12:09
To: Hongxu Ma <interma@outlook.com>
Cc: PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Subject: Re: PSQL error: total cell count of XXX exceeded
On Friday, August 25, 2023, Hongxu Ma <interma@outlook.com> wrote:
When I tried to select a big amount of rows, psql complains a error "Cannot add cell to table content: total cell count of 905032704 exceeded."We should use long for ncolumns and nrows and give a more obvious error message here.Any thoughts? or some other hidden reasons?
9 millions cells seems more than realistic a limit for a psql query result output. In any case it isn’t a bug, the code demonstrates that fact by producing an explicit error.
I wouldn’t be adverse to an improved error message, and possibly documenting said limit.
David J.
Attachment
On Mon, 11 Sept 2023 at 08:51, Hongxu Ma <interma@outlook.com> wrote: > > I created a patch to fix it. > Really appreciate to anyone can help to review it. > Thanks. I think "product" as a variable name isn't very descriptive. Let's call it total_cells (or something similar instead). Other than that I think it's a good change. content->cellsadded is also a long, So I agree that I don't think the limit of int cells was intended here.
Thank you for your advice, Jelte.
I have refactored my code, please see the attached patch. (and I put it into https://commitfest.postgresql.org/45/ for trace)
Thanks.
From: Jelte Fennema <postgres@jeltef.nl>
Sent: Monday, September 11, 2023 15:04
To: Hongxu Ma <interma@outlook.com>
Cc: David G. Johnston <david.g.johnston@gmail.com>; PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Subject: Re: PSQL error: total cell count of XXX exceeded
Sent: Monday, September 11, 2023 15:04
To: Hongxu Ma <interma@outlook.com>
Cc: David G. Johnston <david.g.johnston@gmail.com>; PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Subject: Re: PSQL error: total cell count of XXX exceeded
On Mon, 11 Sept 2023 at 08:51, Hongxu Ma <interma@outlook.com> wrote:
>
> I created a patch to fix it.
> Really appreciate to anyone can help to review it.
> Thanks.
I think "product" as a variable name isn't very descriptive. Let's
call it total_cells (or something similar instead).
Other than that I think it's a good change. content->cellsadded is
also a long, So I agree that I don't think the limit of int cells was
intended here.
>
> I created a patch to fix it.
> Really appreciate to anyone can help to review it.
> Thanks.
I think "product" as a variable name isn't very descriptive. Let's
call it total_cells (or something similar instead).
Other than that I think it's a good change. content->cellsadded is
also a long, So I agree that I don't think the limit of int cells was
intended here.
Attachment
On Tue, Sep 12, 2023 at 02:39:55AM +0000, Hongxu Ma wrote: > Thank you for your advice, Jelte. > I have refactored my code, please see the attached patch. (and I put > it into https://commitfest.postgresql.org/45/ for trace) { + long total_cells; long is 4 bytes on Windows, and 8 bytes basically elsewhere. So you would still have the same problem on Windows, no? -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Tue, Sep 12, 2023 at 02:39:55AM +0000, Hongxu Ma wrote: > + long total_cells; > long is 4 bytes on Windows, and 8 bytes basically elsewhere. So you > would still have the same problem on Windows, no? More to the point: what about the multiplication in printTableInit? The cat's been out of the bag for quite some time before we get to printTableAddCell. I'm more than a bit skeptical about trying to do something about this, simply because this range of query result sizes is far past what is practical. The OP clearly hasn't tested his patch on actually overflowing query results, and I don't care to either. regards, tom lane
Thanks for pointing that, I did miss some other "ncols * nrows" places. Uploaded v3 patch to fix them.
As for the Windows, I didn't test it before but I think it should also have the issue (and happens more possible since
`cellsadded` is also a long type). My fix idea is simple: define a common long64 type for it.
I referred MSDN: only `LONGLONG` and `LONG64` are 64 bytes. And I assume Postgres should already have a similar type, but only found `typedef long int int64` in src/include/c.h, looks it's not a proper choose.
@Michael Paquier, could you help to give some advices here (which type should be used? or should define a new one?). Thank you very much.
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Tuesday, September 12, 2023 12:19
To: Michael Paquier <michael@paquier.xyz>
Cc: Hongxu Ma <interma@outlook.com>; Jelte Fennema <postgres@jeltef.nl>; David G. Johnston <david.g.johnston@gmail.com>; PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Subject: Re: PSQL error: total cell count of XXX exceeded
Sent: Tuesday, September 12, 2023 12:19
To: Michael Paquier <michael@paquier.xyz>
Cc: Hongxu Ma <interma@outlook.com>; Jelte Fennema <postgres@jeltef.nl>; David G. Johnston <david.g.johnston@gmail.com>; PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Subject: Re: PSQL error: total cell count of XXX exceeded
Michael Paquier <michael@paquier.xyz> writes:
> On Tue, Sep 12, 2023 at 02:39:55AM +0000, Hongxu Ma wrote:
> + long total_cells;
> long is 4 bytes on Windows, and 8 bytes basically elsewhere. So you
> would still have the same problem on Windows, no?
More to the point: what about the multiplication in printTableInit?
The cat's been out of the bag for quite some time before we get to
printTableAddCell.
I'm more than a bit skeptical about trying to do something about this,
simply because this range of query result sizes is far past what is
practical. The OP clearly hasn't tested his patch on actually
overflowing query results, and I don't care to either.
regards, tom lane
> On Tue, Sep 12, 2023 at 02:39:55AM +0000, Hongxu Ma wrote:
> + long total_cells;
> long is 4 bytes on Windows, and 8 bytes basically elsewhere. So you
> would still have the same problem on Windows, no?
More to the point: what about the multiplication in printTableInit?
The cat's been out of the bag for quite some time before we get to
printTableAddCell.
I'm more than a bit skeptical about trying to do something about this,
simply because this range of query result sizes is far past what is
practical. The OP clearly hasn't tested his patch on actually
overflowing query results, and I don't care to either.
regards, tom lane
Attachment
After double check, looks `int64` of src/include/c.h is the proper type for it.
Uploaded the v4 patch to fix it.
Thanks.
From: Hongxu Ma <interma@outlook.com>
Sent: Wednesday, September 13, 2023 10:22
To: Tom Lane <tgl@sss.pgh.pa.us>; Michael Paquier <michael@paquier.xyz>
Cc: PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Subject: Re: PSQL error: total cell count of XXX exceeded
Sent: Wednesday, September 13, 2023 10:22
To: Tom Lane <tgl@sss.pgh.pa.us>; Michael Paquier <michael@paquier.xyz>
Cc: PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Subject: Re: PSQL error: total cell count of XXX exceeded
Thanks for pointing that, I did miss some other "ncols * nrows" places. Uploaded v3 patch to fix them.
As for the Windows, I didn't test it before but I think it should also have the issue (and happens more possible since
`cellsadded` is also a long type). My fix idea is simple: define a common long64 type for it.
I referred MSDN: only `LONGLONG` and `LONG64` are 64 bytes. And I assume Postgres should already have a similar type, but only found `typedef long int int64` in src/include/c.h, looks it's not a proper choose.
@Michael Paquier, could you help to give some advices here (which type should be used? or should define a new one?). Thank you very much.
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Tuesday, September 12, 2023 12:19
To: Michael Paquier <michael@paquier.xyz>
Cc: Hongxu Ma <interma@outlook.com>; Jelte Fennema <postgres@jeltef.nl>; David G. Johnston <david.g.johnston@gmail.com>; PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Subject: Re: PSQL error: total cell count of XXX exceeded
Sent: Tuesday, September 12, 2023 12:19
To: Michael Paquier <michael@paquier.xyz>
Cc: Hongxu Ma <interma@outlook.com>; Jelte Fennema <postgres@jeltef.nl>; David G. Johnston <david.g.johnston@gmail.com>; PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Subject: Re: PSQL error: total cell count of XXX exceeded
Michael Paquier <michael@paquier.xyz> writes:
> On Tue, Sep 12, 2023 at 02:39:55AM +0000, Hongxu Ma wrote:
> + long total_cells;
> long is 4 bytes on Windows, and 8 bytes basically elsewhere. So you
> would still have the same problem on Windows, no?
More to the point: what about the multiplication in printTableInit?
The cat's been out of the bag for quite some time before we get to
printTableAddCell.
I'm more than a bit skeptical about trying to do something about this,
simply because this range of query result sizes is far past what is
practical. The OP clearly hasn't tested his patch on actually
overflowing query results, and I don't care to either.
regards, tom lane
> On Tue, Sep 12, 2023 at 02:39:55AM +0000, Hongxu Ma wrote:
> + long total_cells;
> long is 4 bytes on Windows, and 8 bytes basically elsewhere. So you
> would still have the same problem on Windows, no?
More to the point: what about the multiplication in printTableInit?
The cat's been out of the bag for quite some time before we get to
printTableAddCell.
I'm more than a bit skeptical about trying to do something about this,
simply because this range of query result sizes is far past what is
practical. The OP clearly hasn't tested his patch on actually
overflowing query results, and I don't care to either.
regards, tom lane
Attachment
On 2023-Sep-12, Tom Lane wrote: > I'm more than a bit skeptical about trying to do something about this, > simply because this range of query result sizes is far past what is > practical. The OP clearly hasn't tested his patch on actually > overflowing query results, and I don't care to either. I think we're bound to hit this limit at some point in the future, and it seems easy enough to solve. I propose the attached, which is pretty much what Hongxu last submitted, with some minor changes. Having this make a difference requires some 128GB of RAM, so it's not a piece of cake, but it's an amount that can be reasonably expected to be physically installed in real machines nowadays. (I first thought we could just use pg_mul_s32_overflow during printTableInit and raise an error if that returns true, but that just postpones the problem.) -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ Subversion to GIT: the shortest path to happiness I've ever heard of (Alexey Klyukin)
Attachment
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > I think we're bound to hit this limit at some point in the future, and > it seems easy enough to solve. I propose the attached, which is pretty > much what Hongxu last submitted, with some minor changes. This bit needs more work: - content->cells = pg_malloc0((ncolumns * nrows + 1) * sizeof(*content->cells)); + total_cells = (int64) ncolumns * nrows; + content->cells = pg_malloc0((total_cells + 1) * sizeof(*content->cells)); You've made the computation of total_cells reliable, but there's nothing stopping the subsequent computation of the malloc argument from overflowing (especially on 32-bit machines). I think we need an explicit test along the lines of if (total_cells >= SIZE_MAX / sizeof(*content->cells)) throw error; (">=" allows not needing to add +1.) Also, maybe total_cells should be uint64? We don't want negative values to pass this test. Alternatively, add a separate check that total_cells >= 0. It should be sufficient to be paranoid about this in printTableInit, since after that we know the product of ncolumns * nrows isn't too big. The rest of this passes an eyeball check. regards, tom lane
On 2023-Sep-13, Hongxu Ma wrote: > After double check, looks `int64` of src/include/c.h is the proper type for it. > Uploaded the v4 patch to fix it. Right. I made a few more adjustments, including the additional overflow check in printTableInit that Tom Lane suggested, and pushed this. It's a bit annoying that the error recovery decision of this code is to exit the process with an error. If somebody were to be interested in a fun improvement exercise, it may be worth redoing the print.c API so that it returns errors that psql can report and recover from, instead of just closing the process. TBH though, I've never hit that code in real usage. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Right. I made a few more adjustments, including the additional overflow > check in printTableInit that Tom Lane suggested, and pushed this. Committed patch LGTM. > It's a bit annoying that the error recovery decision of this code is to > exit the process with an error. If somebody were to be interested in a > fun improvement exercise, it may be worth redoing the print.c API so > that it returns errors that psql can report and recover from, instead of > just closing the process. > TBH though, I've never hit that code in real usage. Yeah, I think the reason it's stayed like that for 25 years is that nobody's hit the case in practice. Changing the API would be a bit troublesome, too, because we don't know if anybody besides psql uses it. regards, tom lane
On 2023-Nov-21, Tom Lane wrote: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > Right. I made a few more adjustments, including the additional overflow > > check in printTableInit that Tom Lane suggested, and pushed this. > > Committed patch LGTM. Thanks for looking! > > It's a bit annoying that the error recovery decision of this code is to > > exit the process with an error. [...] > > TBH though, I've never hit that code in real usage. > > Yeah, I think the reason it's stayed like that for 25 years is that > nobody's hit the case in practice. Changing the API would be a bit > troublesome, too, because we don't know if anybody besides psql > uses it. True. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2023-Nov-21, Tom Lane wrote: >> Alvaro Herrera <alvherre@alvh.no-ip.org> writes: >>> It's a bit annoying that the error recovery decision of this code is to >>> exit the process with an error. [...] >>> TBH though, I've never hit that code in real usage. >> Yeah, I think the reason it's stayed like that for 25 years is that >> nobody's hit the case in practice. Changing the API would be a bit >> troublesome, too, because we don't know if anybody besides psql >> uses it. > True. It strikes me that perhaps a workable compromise behavior could be "report the error to wherever we would have printed the table, and return normally". I'm still not excited about doing anything about it, but ... regards, tom lane