Limit GRAPH_TABLE path combinations to prevent memory exhaustion - Mailing list pgsql-hackers

From SATYANARAYANA NARLAPURAM
Subject Limit GRAPH_TABLE path combinations to prevent memory exhaustion
Date
Msg-id CAHg+QDfDwcM4=DSiAV6Ly89YQ5EcMhzO1-9x=mGG1WJzODcAig@mail.gmail.com
Whole thread
Responses Re: Limit GRAPH_TABLE path combinations to prevent memory exhaustion
List pgsql-hackers
Hi hackers,

generate_queries_for_path_pattern_recurse() enumerates all path
combinations by recursing over the Cartesian product of matching elements
per pattern position.  Without IS label filters, each position matches
ALL tables of that kind, leading to N^K combinations (N tables, K
pattern positions).  Each combination allocates a Query node via palloc
causing unbounded memory growth.

A 8-table graph with a -element pattern reaches 81.3 GB RES in a few seconds
before I cancel the query. Tests in the patch (those were failed) can reproduce the problem
without the fix included in the patch.

top - 15:04:19 up 43 days, 19:18,  5 users,  load average: 0.43, 0.19, 0.08
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.9 us,  0.8 sy,  0.0 ni, 98.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 515766.2 total, 248412.7 free, 234847.7 used,  48014.7 buff/cache    
MiB Swap:      0.0 total,      0.0 free,      0.0 used. 280918.6 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                  
 649642 azureus+  20   0  212.2g  81.3g  33948 R 100.0  16.1   0:41.20 postgres  


As a POC I added a pre-computation check that calculates the total number 
of path combinations before entering the generate_queries_for_path_pattern_recurse.
If the product exceeds MAX_GRAPH_TABLE_PATH_COMBINATIONS (set to 10,000), 
the rewriter reports ERRCODE_PROGRAM_LIMIT_EXCEEDED with a hint suggesting 
IS label filters to reduce the search space. The limit of 10,000 is somewhat arbitrary
but conservative. It caps memory at roughly 5 MB of Query nodes.
Patterns that would exceed the limit without labels can always be made to succeed
by adding IS expressions to pin specific positions to fewer tables.
Alternatively, we can consider adding a GUC to control the limit but appears
to be an overkill. Thoughts?

Thanks,
Satya
Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Bogus recovery conflicts due to outdated XLogReceiptTime
Next
From: Ayush Tiwari
Date:
Subject: Re: [PATCH] Reject ENCODING option for COPY TO FORMAT JSON