
From: Bowen Shi
Subject: Re: Does RelCache/SysCache shrink except when relations are deleted?
Msg-id: CAM_vCufVCeDfP3T4uccK-ssY-GkgxAE8Cy8A6sDer2-fm-jF6w@mail.gmail.com
In response to: Does RelCache/SysCache shrink except when relations are deleted? ("MauMau" <maumau307@gmail.com>)
List: pgsql-hackers

Hello,

It has been over a decade since this discussion took place. I would like to know whether any proposals have been made regarding eviction of relcache/syscache entries.

We recently encountered a similar memory problem: the relcache/syscache keeps growing because connections are held open for a long time (an hour or more) and the database contains more than 100,000 tables.
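
If you are diagnosing this on a current release (PostgreSQL 14 or later), the pg_backend_memory_contexts view shows the growth directly from inside the long-lived session. A minimal sketch:

-- Run inside the long-lived connection; the view is per-backend.
-- Relcache/syscache memory shows up under CacheMemoryContext
-- and its child contexts.
SELECT name, total_bytes, used_bytes
FROM pg_backend_memory_contexts
ORDER BY total_bytes DESC
LIMIT 10;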

Regards
Bowen Shi

On Wed, Dec 4, 2024 at 11:23 AM MauMau <maumau307@gmail.com> wrote:
Hello,

Please let me ask you some questions about RelCache/SysCache/CatCache
design. I know I should post this to pgsql-general, but I decided to post
here because the content includes design questions.

<<Background>>
My customer is facing an "out of memory" problem during a batch job. I'd like
to know the cause and possible solutions. The PostgreSQL version is 8.2.7
(32-bit on Linux).

The batch job consists of two steps in a single psql session:

1. call some PL/pgSQL function (say "somefunc" here)
2. VACUUM tables (at this time, maintenance_work_mem=256MB)

Step 2 emitted the following messages in syslog. The request size, 268435452
bytes, is just under 256 MB, i.e., presumably the maintenance_work_mem-sized
allocation VACUUM makes up front for its dead-tuple array:

ERROR:  out of memory
DETAIL:  Failed on request of size 268435452.
STATEMENT:  VACUUM some_table_name

somefunc copies rows from a single table into 100,000 tables (table_1 through
table_100000), roughly as follows:

[somefunc]
FOR id IN 1 .. 100000 LOOP
    check if the table "table_${id}" exists by searching pg_class
    if the table exists
        INSERT INTO table_${id} SELECT * FROM some_table
            WHERE pk = id;
    else /* the table does not exist */
        CREATE TABLE table_${id} AS SELECT * FROM some_table
            WHERE pk = id;
END LOOP;
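
For concreteness, a runnable PL/pgSQL version of that sketch might look like the following (a sketch only: the table and column names are taken from the pseudocode above, and the pg_class check is simplified to ignore schemas):

CREATE OR REPLACE FUNCTION somefunc() RETURNS void AS $$
BEGIN
    FOR id IN 1 .. 100000 LOOP
        -- Check pg_class for the target table (simplified: ignores schemas).
        IF EXISTS (SELECT 1 FROM pg_class
                   WHERE relname = 'table_' || id AND relkind = 'r') THEN
            -- Table exists: copy the matching rows into it.
            EXECUTE 'INSERT INTO table_' || id ||
                    ' SELECT * FROM some_table WHERE pk = ' || id;
        ELSE
            -- Table is missing: create it from the matching rows.
            EXECUTE 'CREATE TABLE table_' || id ||
                    ' AS SELECT * FROM some_table WHERE pk = ' || id;
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

Dynamic SQL (EXECUTE) is needed because the table names are computed at run time; each of the 100,000 tables touched this way leaves relcache/syscache entries behind in the backend.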

Before somefunc starts, the virtual memory of the backend postgres process is
1.6 GB, as reported in the VIRT column of top. When somefunc completes, it has
grown to 2.6 GB. Since a 32-bit process on Linux has only about 3 GB of user
address space, VACUUM's request for a further 256 MB fails: the address space
is effectively exhausted.

This is all the information I have now. I have asked the customer to collect
the PostgreSQL server log so that memory context statistics will be available
when the "out of memory" error recurs, along with the output of "SHOW ALL" and
a minimal procedure to reproduce the problem. However, I'd like to ask for your
opinions rather than wait for the problem to happen again.
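
As an aside for anyone hitting this on a modern release: since PostgreSQL 14, a superuser can ask a backend to dump its memory context statistics to the server log on demand, without waiting for an OOM error. A sketch, where 12345 is a placeholder for the target backend's PID:

-- Writes the memory context statistics of backend 12345
-- (a placeholder PID; find the real one via pg_stat_activity)
-- to the server log.
SELECT pg_log_backend_memory_contexts(12345);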


<<Question>>
I'm guessing that CacheMemoryContext might be consuming much of the memory,
because somefunc accesses as many as 100,000 tables. But I don't understand
the RelCache/SysCache implementation yet.

Q1: When are RelCache/SysCache entries removed from CacheMemoryContext?
Are they removed only when the corresponding relations are deleted? If so,
is a schema with many tables and indexes unfriendly to current PostgreSQL?

Q2: somefunc increased virtual memory by 1 GB after accessing 100,000 tables,
which works out to about 10 KB of local memory per table.
Is it common for RelCache/SysCache and other per-relation control information
to use this much memory?
Does the number of attributes in a table significantly affect local memory usage?

Q3: I think one solution is to run VACUUM in a separate psql session.
Are there any other solutions you can think of?
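
For illustration, the split might look like this, with somefunc and the table names carried over from the example above; the second connection starts with empty caches, so the VACUUM has the full address space available:

-- Session 1: populate the tables, then disconnect so the backend
-- (and its relcache/syscache memory) goes away.
SELECT somefunc();
\q

-- Session 2: a fresh connection just for the VACUUMs.
VACUUM table_1;
VACUUM table_2;
-- ... and so on through table_100000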

Q4: The customer reports one strange thing: if the 100,000 tables already exist
before somefunc starts (i.e., somefunc only copies records), the virtual memory
of the postgres process does not increase.
Is there any explanation for this observation?

Regards
MauMau


