Thread: Crash: invalid DSA memory alloc request
Hello,

I'm running a couple of large tests, and in this particular test I have a
few million tables. At some point it fails, and I gathered the following
trace:

2024-12-12 22:22:55.307 CET [1496210] ERROR:  invalid DSA memory alloc request size 1073741824
2024-12-12 22:22:55.307 CET [1496210] BACKTRACE:
        postgres: ads tabletest [local] CREATE TABLE(+0x15e570) [0x6309c379c570]
        postgres: ads tabletest [local] CREATE TABLE(dshash_find_or_insert+0x1a4) [0x6309c39882d4]
        postgres: ads tabletest [local] CREATE TABLE(pgstat_get_entry_ref+0x440) [0x6309c3b0a530]
        postgres: ads tabletest [local] CREATE TABLE(pgstat_prep_pending_entry+0x3a) [0x6309c3b0676a]
        postgres: ads tabletest [local] CREATE TABLE(pgstat_assoc_relation+0x32) [0x6309c3b086c2]
        postgres: ads tabletest [local] CREATE TABLE(StartReadBuffer+0x3c0) [0x6309c3ab9870]
        postgres: ads tabletest [local] CREATE TABLE(ReadBufferExtended+0xa1) [0x6309c3abb271]
        postgres: ads tabletest [local] CREATE TABLE(+0x2c6caa) [0x6309c3904caa]
        postgres: ads tabletest [local] CREATE TABLE(AlterSequence+0xc0) [0x6309c3905860]
        postgres: ads tabletest [local] CREATE TABLE(+0x4b6336) [0x6309c3af4336]
        postgres: ads tabletest [local] CREATE TABLE(standard_ProcessUtility+0x259) [0x6309c3af33f9]
        postgres: ads tabletest [local] CREATE TABLE(+0x4b6e64) [0x6309c3af4e64]
        postgres: ads tabletest [local] CREATE TABLE(standard_ProcessUtility+0x259) [0x6309c3af33f9]
        postgres: ads tabletest [local] CREATE TABLE(+0x4b3d2f) [0x6309c3af1d2f]
        postgres: ads tabletest [local] CREATE TABLE(+0x4b3e4b) [0x6309c3af1e4b]
        postgres: ads tabletest [local] CREATE TABLE(PortalRun+0x16f) [0x6309c3af226f]
        postgres: ads tabletest [local] CREATE TABLE(+0x4b06cc) [0x6309c3aee6cc]
        postgres: ads tabletest [local] CREATE TABLE(PostgresMain+0xf67) [0x6309c3aefa87]
        postgres: ads tabletest [local] CREATE TABLE(+0x4accc5) [0x6309c3aeacc5]
        postgres: ads tabletest [local] CREATE TABLE(postmaster_child_launch+0x8f) [0x6309c3a5b95f]
        postgres: ads tabletest [local] CREATE TABLE(+0x421479) [0x6309c3a5f479]
        postgres: ads tabletest [local] CREATE TABLE(PostmasterMain+0xd71) [0x6309c3a61251]
        postgres: ads tabletest [local] CREATE TABLE(main+0x207) [0x6309c379efc7]
        /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca) [0x710c33a2a1ca]
        /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b) [0x710c33a2a28b]
        postgres: ads tabletest [local] CREATE TABLE(_start+0x25) [0x6309c379f595]
2024-12-12 22:22:55.307 CET [1496210] STATEMENT:  CREATE TABLE IF NOT EXISTS test_16718629 (id SERIAL PRIMARY KEY, d VARCHAR(200), e VARCHAR(200), f VARCHAR(200), i INTEGER, j INTEGER);

PostgreSQL version is 17.2, compiled with debug symbols.

tabletest=# select version();
                                              version
--------------------------------------------------------------------------------------------------
 PostgreSQL 17.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0, 64-bit
(1 row)

I'm not able to reproduce this with every DDL statement, but grouping
together about 50 of them, it fails at some point.

Regards,

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project
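For context: 1073741824 bytes is exactly 1 GiB, one byte over PostgreSQL's
MaxAllocSize cap of 1 GB - 1. The error is raised by the allocation-size
sanity check in dsa_allocate_extended() in src/backend/utils/mmgr/dsa.c,
which looks roughly like this (paraphrased, not copied verbatim from the
sources):

    /* Sanity check on huge individual allocation size. */
    if (((flags & DSA_ALLOC_HUGE) != 0 && !AllocHugeSizeIsValid(size)) ||
        ((flags & DSA_ALLOC_HUGE) == 0 && !AllocSizeIsValid(size)))
        elog(ERROR, "invalid DSA memory alloc request size %zu", size);

Callers that pass DSA_ALLOC_HUGE are checked against the much larger "huge"
cap; everyone else is limited to MaxAllocSize.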
On 12/12/2024 22:49, Matthias van de Meent wrote:
> On Thu, 12 Dec 2024 at 22:28, Andreas 'ads' Scherbaum <ads@pgug.de> wrote:
>>
>> Hello,
>>
>> I'm running a couple of large tests, and in this particular test I have
>> a few million tables.
>>
>> At some point it fails, and I gathered the following trace:
>>
>> 2024-12-12 22:22:55.307 CET [1496210] ERROR: invalid DSA memory alloc
>> request size 1073741824
>> 2024-12-12 22:22:55.307 CET [1496210] BACKTRACE:
>>     postgres: ads tabletest [local] CREATE TABLE(+0x15e570) [0x6309c379c570]
>>     postgres: ads tabletest [local] CREATE TABLE(dshash_find_or_insert+0x1a4) [0x6309c39882d4]
>>     postgres: ads tabletest [local] CREATE TABLE(pgstat_get_entry_ref+0x440) [0x6309c3b0a530]
>
> It looks like the dshash table used in the pgstats system uses
> resize(), which only specifies DSA_ALLOC_ZERO, not DSA_ALLOC_HUGE,
> causing issues when the table grows larger than 1 GB.
>
> I expect that error to disappear when you replace the
> dsa_allocate0(...) call in dshash.c's resize function with
> dsa_allocate_extended(..., DSA_ALLOC_HUGE | DSA_ALLOC_ZERO) as
> attached, but haven't tested it due to a lack of database with
> millions of relations.

IIUC the table is doubled in size when filled over 75%, so the bucket array
went from 512 MB to 1 GB here, doubling the number of available buckets.
That is probably fine up to a point, but the size limit is exceeded here by
only one byte, and 1 GB worth of bucket pointers is hopefully more than
enough. Is it worth revisiting the logic so the table grows less
aggressively once it is already large (say, over 512 MB)? (If that is at
all possible, given how buckets and partitions are managed.)

There is this comment in 8c0d7bafad3, which introduced this "dshash":

    There is a wide range of potential users for such a hash table, though
    it's very likely the interface will need to evolve as we come to
    understand the needs of different kinds of users. E.g support for
    iterators and incremental resizing is planned for later commits and
    the details of the callback signatures are likely to change.

I'm unsure whether iterators and incremental resizing ever made it in?

---
Cédric Villemain
+33 6 20 30 22 52
https://www.Data-Bene.io
PostgreSQL Support, Expertise, Training, R&D
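To make the doubling arithmetic above concrete, here is a small standalone
sketch; it assumes 8-byte dsa_pointer values, and the 2^27 bucket count is
inferred from the failed request size, not read out of the server:

    #include <stdio.h>
    #include <stddef.h>
    #include <stdint.h>

    /* PostgreSQL's MaxAllocSize: one byte short of 1 GB. */
    #define MAX_ALLOC_SIZE ((size_t) 0x3fffffff)

    int
    main(void)
    {
        /* Each dshash bucket is a dsa_pointer, 8 bytes on 64-bit builds. */
        size_t  bucket_size = sizeof(uint64_t);
        size_t  nbuckets = (size_t) 1 << 27;      /* after one more doubling */
        size_t  request = nbuckets * bucket_size; /* 2^27 * 8 = 2^30 bytes */

        printf("request: %zu bytes\n", request);  /* 1073741824, as in the log */
        printf("over MaxAllocSize by: %zu byte(s)\n",
               request - MAX_ALLOC_SIZE);         /* exactly 1 */
        return 0;
    }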
Hi Matthias,
On Thu, Dec 12, 2024 at 10:49 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> On Thu, 12 Dec 2024 at 22:28, Andreas 'ads' Scherbaum <ads@pgug.de> wrote:
>>
>> Hello,
>>
>> I'm running a couple of large tests, and in this particular test I have
>> a few million tables.
>>
>> At some point it fails, and I gathered the following trace:
>>
>> 2024-12-12 22:22:55.307 CET [1496210] ERROR: invalid DSA memory alloc
>> request size 1073741824
>> 2024-12-12 22:22:55.307 CET [1496210] BACKTRACE:
>>     postgres: ads tabletest [local] CREATE TABLE(+0x15e570) [0x6309c379c570]
>>     postgres: ads tabletest [local] CREATE TABLE(dshash_find_or_insert+0x1a4) [0x6309c39882d4]
>>     postgres: ads tabletest [local] CREATE TABLE(pgstat_get_entry_ref+0x440) [0x6309c3b0a530]
>
> It looks like the dshash table used in the pgstats system uses
> resize(), which only specifies DSA_ALLOC_ZERO, not DSA_ALLOC_HUGE,
> causing issues when the table grows larger than 1 GB.
>
> I expect that error to disappear when you replace the
> dsa_allocate0(...) call in dshash.c's resize function with
> dsa_allocate_extended(..., DSA_ALLOC_HUGE | DSA_ALLOC_ZERO) as
> attached, but haven't tested it due to a lack of database with
> millions of relations.
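For the archives: the proposed change boils down to this substitution in
resize() in src/backend/lib/dshash.c (variable names approximate,
reconstructed from the description above rather than copied from the
attachment):

-       new_buckets_shared = dsa_allocate0(hash_table->area,
-                                          sizeof(dsa_pointer) * new_size);
+       new_buckets_shared = dsa_allocate_extended(hash_table->area,
+                                                  sizeof(dsa_pointer) * new_size,
+                                                  DSA_ALLOC_HUGE | DSA_ALLOC_ZERO);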
Can confirm that the crash no longer happens when applying your patch.
Was able to both continue the old, crashed test and run a new test:
tabletest=# select count(*) from information_schema.tables;
  count
----------
 20000211
(1 row)
Thanks,

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project
On Mon, Dec 16, 2024 at 08:00:00AM +0100, Andreas 'ads' Scherbaum wrote:
> Can confirm that the crash no longer happens when applying your patch.

The patch looks reasonable to me. I'll commit it soon unless someone
objects. I was surprised to learn that the DSA_ALLOC_HUGE flag is only
intended to catch faulty allocation requests [0].

> Was able to both continue the old, crashed test and run a new test:
>
> tabletest=# select count(*) from information_schema.tables;
>   count
> ----------
>  20000211
> (1 row)

That's a lot of tables...

[0] https://postgr.es/m/28062.1487456862%40sss.pgh.pa.us

--
nathan
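For context on [0]: DSA_ALLOC_HUGE mirrors MCXT_ALLOC_HUGE for plain palloc
and only switches which sanity cap applies to the request. The two limits
are defined in src/include/utils/memutils.h roughly as follows
(paraphrased):

    #define MaxAllocSize        ((Size) 0x3fffffff)   /* 1 gigabyte - 1 */
    #define AllocSizeIsValid(size)  ((Size) (size) <= MaxAllocSize)

    #define MaxAllocHugeSize    (SIZE_MAX / 2)
    #define AllocHugeSizeIsValid(size)  ((Size) (size) <= MaxAllocHugeSize)

So the flag does not change the allocator's behavior; it only widens the
range of request sizes considered sane.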
Hello,
On Mon, Dec 16, 2024 at 11:18 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Mon, Dec 16, 2024 at 08:00:00AM +0100, Andreas 'ads' Scherbaum wrote:
>> Can confirm that the crash no longer happens when applying your patch.
>
> The patch looks reasonable to me. I'll commit it soon unless someone
> objects. I was surprised to learn that the DSA_ALLOC_HUGE flag is only
> intended to catch faulty allocation requests [0].
Is there a way to test it, except by creating so many tables?
There might be more such problems.
I did run a few basic queries in the database, but that's far from a full test.
>> Was able to both continue the old, crashed test and run a new test:
>>
>> tabletest=# select count(*) from information_schema.tables;
>>   count
>> ----------
>>  20000211
>> (1 row)
>
> That's a lot of tables...

This started as a discussion that got me curious, and it's only about an
order of magnitude off from what I've seen in production. Not unrealistic,
then, to find out when and where it breaks.
Thanks,

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project
Hi,

On 2024-12-17 16:50:45 +0900, Michael Paquier wrote:
> On Mon, Dec 16, 2024 at 04:18:26PM -0600, Nathan Bossart wrote:
>> On Mon, Dec 16, 2024 at 08:00:00AM +0100, Andreas 'ads' Scherbaum wrote:
>>> Can confirm that the crash no longer happens when applying your patch.
>>
>> The patch looks reasonable to me. I'll commit it soon unless someone
>> objects. I was surprised to learn that the DSA_ALLOC_HUGE flag is only
>> intended to catch faulty allocation requests [0].
>
> No objections.
>
> Most likely this issue gets easier to reach, by a large degree, now that
> custom pgstats kinds can be plugged into the backend. If pg_stat_statements
> or an equivalent implementation uses pgstats, I don't think that we'll be
> able to live without lifting this limit (500k query entries are common, and
> at 2kB each that would be enough to blow past it), so using DSA_ALLOC_HUGE
> sounds good to me. I don't see a huge point in backpatching, FWIW.

I don't see why we wouldn't want to backpatch? The number of objects here
isn't entirely unrealistic to reach with relations alone, and if you enable
e.g. function execution stats it can reasonably reach higher numbers more
quickly. And using DSA_ALLOC_HUGE in that place feels like a rather
low-risk change?

Greetings,

Andres Freund
On Tue, Dec 17, 2024 at 10:53:07AM -0500, Andres Freund wrote:
> On 2024-12-17 16:50:45 +0900, Michael Paquier wrote:
>> I don't see a huge point in backpatching, FWIW.
>
> I don't see why we wouldn't want to backpatch? The number of objects here
> isn't entirely unrealistic to reach with relations alone, and if you enable
> e.g. function execution stats it can reasonably reach higher numbers more
> quickly. And using DSA_ALLOC_HUGE in that place feels like a rather
> low-risk change?

Agreed, this feels low-risk enough to back-patch to at least v15, where
statistics were moved to shared memory. But I don't see a strong reason to
avoid back-patching it to all supported versions, either.

--
nathan
Committed.

--
nathan
On 2024-12-17 15:32:06 -0600, Nathan Bossart wrote:
> Committed.

Thanks!
On 17/12/2024 22:32, Nathan Bossart wrote:
> Committed.

Thanks, I see you backpatched it all the way to 13. I'll see how far back I
can test this; it will take a while.

Regards,

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project