[BUG] Excessive memory usage with update on STORED generated columns. - Mailing list pgsql-hackers

From Anton A. Melnikov
Subject [BUG] Excessive memory usage with update on STORED generated columns.
Date
Msg-id ddc34dbd-1efc-4710-824c-e101e7eb63e6@postgrespro.ru
Whole thread Raw
Responses Re: [BUG] Excessive memory usage with update on STORED generated columns.
Re: [BUG] Excessive memory usage with update on STORED generated columns.
List pgsql-hackers
Hi!


My colleagues found that a queries like that:

\timing

DROP TABLE IF EXISTS t;

CREATE TABLE t (
     id int,
     a int,
     b int,
     g text GENERATED ALWAYS AS (
         lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
         lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
         lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
         lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
         lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
         lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
         lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
         lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
         lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
         lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0')
     ) STORED
);

INSERT INTO t
SELECT 1, 100, 0
FROM generate_series(1, 1000000);

UPDATE t SET id = 2; -- < problem query

lead to excessive memory consumption up to 10Gb in this example and
query execution time up to ~1,5min.

Bisect shows that the problem appeared after commit 83ea6c540
(Virtual generated columns).

Before this commit the update query took only ~8s and the memory
consumption did not exceed 150Mb for this backend.
MemoryContextStats reports only a small amount of memory usage, while
malloc_stats() confirms large allocations outside PostgreSQL memory
contexts.

With help of massif tool i found repeated allocations originating from:

     ExecInitGenerated
       → build_column_default
         → stringToNode

This indicates that generated expressions are reparsed multiple times,
once per row to be updated instead of being reused.

There is a problem call stack during UPDATE t SET id = 2;
execution: see attached bt.txt, please.

Before the above-mentioned commit, ExecInitGenerated() was effectively
invoked once per ResultRelInfo, so this behavior was not observable.

I would like to propose a fix that add a caching of the the parsed
expression trees (Node *) in ResultRelInfo, so that build_column_default()
and stringToNode() are executed at most once per attribute per query.

With this fix, the query execution time
and memory consumption return to normal:

postgres=# UPDATE t SET id = 2;
UPDATE 1000000
Time: 11522,621 ms (00:11,523)


A patch for this approach for current master is attached here.


Would be glad for any feedback.

Best regards,

-- 
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment

pgsql-hackers by date:

Previous
From: Daniil Davydov
Date:
Subject: Re: Get rid of redundant StringInfo accumulation
Next
From: Heikki Linnakangas
Date:
Subject: Re: Thread-safe getopt()