Thread: Hash join in SELECT target list expression keeps consuming memory

Hash join in SELECT target list expression keeps consuming memory

From: Amit Khandekar
Hi,

If the SELECT target list expression is a join subquery, and if the
subquery does a hash join, then the query keeps on consuming more and
more memory. Below is such a query :

SELECT
   (SELECT id
    FROM unnest((pg_catalog.acldefault('L', l.col1))) WITH ORDINALITY AS perm(acl, id),
         generate_series(1, l.col1::int % 2) AS gen2(gen2_id)
    WHERE id = gen2_id)
FROM largetable l;

where the table largetable is created using:

create table largetable as
  select * from generate_series(1, 32000000) as t(col1);

Here is the plan:
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Seq Scan on largetable l  (cost=0.00..672781971.36 rows=32000018 width=8)
   SubPlan 1
     ->  Hash Join  (cost=2.26..21.01 rows=500 width=8)
           Hash Cond: (gen2.gen2_id = perm.id)
           ->  Function Scan on generate_series gen2  (cost=0.01..10.01 rows=1000 width=4)
           ->  Hash  (cost=1.00..1.00 rows=100 width=8)
                 ->  Function Scan on unnest perm  (cost=0.01..1.00 rows=100 width=8)

Now, if we disable hash join (e.g. with SET enable_hashjoin TO off), the
planner chooses a merge join, and that does not keep consuming memory:

                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Seq Scan on largetable l  (cost=0.00..2275308358.33 rows=32000018 width=8)
   SubPlan 1
     ->  Merge Join  (cost=59.84..71.09 rows=500 width=8)
           Merge Cond: (perm.id = gen2.gen2_id)
           ->  Function Scan on unnest perm  (cost=0.01..1.00 rows=100 width=8)
           ->  Sort  (cost=59.83..62.33 rows=1000 width=4)
                 Sort Key: gen2.gen2_id
                  ->  Function Scan on generate_series gen2  (cost=0.01..10.01 rows=1000 width=4)

With either a merge join or a nested loop join, the postgres process
memory stays constant the whole time. (The planner chooses a hash join
because of the incorrect row estimates for the unnest() and
generate_series() functions.)

I have yet to analyze the root cause of this behaviour, but meanwhile I
am posting it here, in case this turns out to be a known
behaviour/issue for hash joins, or there is already some work being
done on it.

I suspected that the set-returning functions might be leaking memory,
but in that case we would have seen the same behaviour with other join
types as well.

----------

Actually, I ran into this issue when I tried to run pg_dump on a
database with 32M blobs. The corresponding backend process kept
consuming memory until it was killed by the OOM killer.

Steps to reproduce the issue with pg_dump:
1. create table lo_table (id numeric, lo oid);
2. -- Create 32M rows
   insert into lo_table
   select a.i, lo_from_bytea(0, E'\\xffffff0000000000')
   from generate_series(1, 32775000) as a(i);
3. -- Then run pg_dump.

The pg_dump backend gets killed, and pg_dump aborts with this message,
which includes the query that consumed the memory:

pg_dump: [archiver (db)] query failed: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_dump: [archiver (db)] query was:

SELECT l.oid,

  (SELECT rolname FROM pg_catalog.pg_roles WHERE oid = l.lomowner) AS rolname,

  (SELECT pg_catalog.array_agg(acl ORDER BY row_n)
   FROM (SELECT acl, row_n
         FROM pg_catalog.unnest(coalesce(l.lomacl, pg_catalog.acldefault('L', l.lomowner)))
              WITH ORDINALITY AS perm(acl, row_n)
         WHERE NOT EXISTS (SELECT 1
                           FROM pg_catalog.unnest(coalesce(pip.initprivs, pg_catalog.acldefault('L', l.lomowner)))
                                AS init(init_acl)
                           WHERE acl = init_acl)) as foo) AS lomacl,

  (SELECT pg_catalog.array_agg(acl ORDER BY row_n)
   FROM (SELECT acl, row_n
         FROM pg_catalog.unnest(coalesce(pip.initprivs, pg_catalog.acldefault('L', l.lomowner)))
              WITH ORDINALITY AS initp(acl, row_n)
         WHERE NOT EXISTS (SELECT 1
                           FROM pg_catalog.unnest(coalesce(l.lomacl, pg_catalog.acldefault('L', l.lomowner)))
                                AS permp(orig_acl)
                           WHERE acl = orig_acl)) as foo) AS rlomacl,

  NULL AS initlomacl,
  NULL AS initrlomacl

FROM pg_largeobject_metadata l
     LEFT JOIN pg_init_privs pip
            ON (l.oid = pip.objoid
                AND pip.classoid = 'pg_largeobject'::regclass
                AND pip.objsubid = 0);

As you can see, the subplan expressions use the pg_init_privs table to
collect any changed extension ACLs for the large objects, so this won't
reproduce before commit 23f34fa4ba358671adab16773e79c17c92cbc870.

Note: the pg_dump client process itself also consumes more and more
memory, although to a lesser extent. But that is a separate issue, and
is already known/expected behaviour:
https://www.postgresql.org/message-id/29613.1476969807%40sss.pgh.pa.us


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


Re: Hash join in SELECT target list expression keeps consuming memory

From: Tom Lane
Amit Khandekar <amitdkhan.pg@gmail.com> writes:
> If the SELECT target list expression is a join subquery, and if the
> subquery does a hash join, then the query keeps on consuming more and
> more memory. Below is such a query :

Thanks for the report!

I dug into this with valgrind, and found that the problem is that
ExecHashTableCreate allocates some memory that isn't freed by
ExecHashTableDestroy, specifically the per-hash-key function
information.  This is just dumb.  We can keep that stuff in the
hashtable's hashCxt instead, where it will get freed at the right time.
The attached patch seems to fix it just by reordering the code.
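
For anyone wondering why just reordering is sufficient:
ExecHashTableDestroy() releases the hash table's working storage by
deleting hashtable->hashCxt, so only allocations made while that context
is current go away with the hash table.  Anything palloc'd in the
per-query context beforehand sticks around until the query ends, and the
subplan here rebuilds its hash table for every outer row.  A simplified
sketch of the ownership pattern (not the actual nodeHash.c code, just an
illustration):

#include "postgres.h"
#include "fmgr.h"
#include "utils/memutils.h"

static void
hash_context_sketch(int nkeys)
{
    MemoryContext hashCxt;
    MemoryContext oldcxt;
    FmgrInfo   *hashfunctions;

    /* child context that owns all of the hash table's working storage */
    hashCxt = AllocSetContextCreate(CurrentMemoryContext,
                                    "HashTableContext",
                                    ALLOCSET_DEFAULT_SIZES);

    /* per-hash-key info must be allocated inside hashCxt, not per-query */
    oldcxt = MemoryContextSwitchTo(hashCxt);
    hashfunctions = (FmgrInfo *) palloc(nkeys * sizeof(FmgrInfo));
    MemoryContextSwitchTo(oldcxt);

    /* ... build and probe the hash table ... */
    (void) hashfunctions;

    /* one delete releases everything that was allocated in hashCxt */
    MemoryContextDelete(hashCxt);
}

Moving the per-hash-key allocations below the
MemoryContextSwitchTo(hashtable->hashCxt) call therefore stops the
per-rescan accumulation.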

I'm surprised nobody's noticed this before; maybe the problem is
of relatively recent vintage?  Haven't checked the back branches yet.

            regards, tom lane

diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 06bb44b..4f069d1 100644
*** a/src/backend/executor/nodeHash.c
--- b/src/backend/executor/nodeHash.c
*************** ExecHashTableCreate(HashState *state, Li
*** 472,478 ****
       * Initialize the hash table control block.
       *
       * The hashtable control block is just palloc'd from the executor's
!      * per-query memory context.
       */
      hashtable = (HashJoinTable) palloc(sizeof(HashJoinTableData));
      hashtable->nbuckets = nbuckets;
--- 472,479 ----
       * Initialize the hash table control block.
       *
       * The hashtable control block is just palloc'd from the executor's
!      * per-query memory context.  Everything else should be kept inside the
!      * subsidiary hashCxt or batchCxt.
       */
      hashtable = (HashJoinTable) palloc(sizeof(HashJoinTableData));
      hashtable->nbuckets = nbuckets;
*************** ExecHashTableCreate(HashState *state, Li
*** 515,520 ****
--- 516,537 ----
  #endif

      /*
+      * Create temporary memory contexts in which to keep the hashtable working
+      * storage.  See notes in executor/hashjoin.h.
+      */
+     hashtable->hashCxt = AllocSetContextCreate(CurrentMemoryContext,
+                                                "HashTableContext",
+                                                ALLOCSET_DEFAULT_SIZES);
+
+     hashtable->batchCxt = AllocSetContextCreate(hashtable->hashCxt,
+                                                 "HashBatchContext",
+                                                 ALLOCSET_DEFAULT_SIZES);
+
+     /* Allocate data that will live for the life of the hashjoin */
+
+     oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
+
+     /*
       * Get info about the hash functions to be used for each hash key. Also
       * remember whether the join operators are strict.
       */
*************** ExecHashTableCreate(HashState *state, Li
*** 540,561 ****
          i++;
      }

-     /*
-      * Create temporary memory contexts in which to keep the hashtable working
-      * storage.  See notes in executor/hashjoin.h.
-      */
-     hashtable->hashCxt = AllocSetContextCreate(CurrentMemoryContext,
-                                                "HashTableContext",
-                                                ALLOCSET_DEFAULT_SIZES);
-
-     hashtable->batchCxt = AllocSetContextCreate(hashtable->hashCxt,
-                                                 "HashBatchContext",
-                                                 ALLOCSET_DEFAULT_SIZES);
-
-     /* Allocate data that will live for the life of the hashjoin */
-
-     oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
-
      if (nbatch > 1 && hashtable->parallel_state == NULL)
      {
          /*
--- 557,562 ----

Re: Hash join in SELECT target list expression keeps consuming memory

From: Amit Khandekar
On 17 March 2018 at 00:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Khandekar <amitdkhan.pg@gmail.com> writes:
>> If the SELECT target list expression is a join subquery, and if the
>> subquery does a hash join, then the query keeps on consuming more and
>> more memory. Below is such a query :
>
> Thanks for the report!
>
> I dug into this with valgrind, and found that the problem is that
> ExecHashTableCreate allocates some memory that isn't freed by
> ExecHashTableDestroy, specifically the per-hash-key function
> information.  This is just dumb.  We can keep that stuff in the
> hashtable's hashCxt instead, where it will get freed at the right time.
> The attached patch seems to fix it just by reordering the code.

I saw that you have now committed the fix and also backported it to
all supported branches.

Thanks!

-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


Re: Hash join in SELECT target list expression keeps consuming memory

From: Jaime Soler
Hi,

We still get an out-of-memory error during pg_dump execution:

bin$ ./initdb -D /tmp/test 
The files belonging to this database system will be owned by user "jsoler".
This user must also own the server process.

The database cluster will be initialized with locale "es_ES.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "spanish".

Data page checksums are disabled.

creating directory /tmp/test ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    ./pg_ctl -D /tmp/test -l logfile start

bin$ ./pg_ctl -D /tmp/test -l logfile start
waiting for server to start.... done
server started
bin$ ./psql -p 10000 -d postgres
psql (11devel)
Type "help" for help.

postgres=# select version();
                                              version                                               
----------------------------------------------------------------------------------------------------
 PostgreSQL 11devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.2.0-8ubuntu3.2) 7.2.0, 64-bit
(1 row)


postgres=# create table lo_table (id numeric, lo oid) ;
CREATE TABLE
postgres=# insert into lo_table select
a.i,lo_from_bytea(0,E'\\xffffff0000000000') from
generate_series(1,32775000) as a(i);
INSERT 0 32775000
postgres=# \q

/bin$ ./pg_dump -v --port=10000 -Fc -d postgres -f /tmp/dump
pg_dump: last built-in OID is 16383
pg_dump: reading extensions
pg_dump: identifying extension members
pg_dump: reading schemas
pg_dump: reading user-defined tables
pg_dump: reading user-defined functions
pg_dump: reading user-defined types
pg_dump: reading procedural languages
pg_dump: reading user-defined aggregate functions
pg_dump: reading user-defined operators
pg_dump: reading user-defined access methods
pg_dump: reading user-defined operator classes
pg_dump: reading user-defined operator families
pg_dump: reading user-defined text search parsers
pg_dump: reading user-defined text search templates
pg_dump: reading user-defined text search dictionaries
pg_dump: reading user-defined text search configurations
pg_dump: reading user-defined foreign-data wrappers
pg_dump: reading user-defined foreign servers
pg_dump: reading default privileges
pg_dump: reading user-defined collations
pg_dump: reading user-defined conversions
pg_dump: reading type casts
pg_dump: reading transforms
pg_dump: reading table inheritance information
pg_dump: reading event triggers
pg_dump: finding extension tables
pg_dump: finding inheritance relationships
pg_dump: reading column info for interesting tables
pg_dump: finding the columns and types of table "public.lo_table"
pg_dump: flagging inherited columns in subtables
pg_dump: reading indexes
pg_dump: flagging indexes in partitioned tables
pg_dump: reading extended statistics
pg_dump: reading constraints
pg_dump: reading triggers
pg_dump: reading rewrite rules
pg_dump: reading policies
pg_dump: reading row security enabled for table "public.lo_table"
pg_dump: reading policies for table "public.lo_table"
pg_dump: reading publications
pg_dump: reading publication membership
pg_dump: reading publication membership for table "public.lo_table"
pg_dump: reading subscriptions
pg_dump: reading large objects
out of memory





Re: Hash join in SELECT target list expression keeps consuming memory

From: Tomas Vondra
On 03/21/2018 02:18 PM, Jaime Soler wrote:
> Hi,
> 
> We still get out of memory error during pg_dump execution
> ...
> pg_dump: reading row security enabled for table "public.lo_table"
> pg_dump: reading policies for table "public.lo_table"
> pg_dump: reading publications
> pg_dump: reading publication membership
> pg_dump: reading publication membership for table "public.lo_table"
> pg_dump: reading subscriptions
> pg_dump: reading large objects
> out of memory
> 

Hmmmm ... that likely happens because of this for loop copying a lot of
data:

https://github.com/postgres/postgres/blob/master/src/bin/pg_dump/pg_dump.c#L3258

But I'm having trouble verifying that, because the query fetching the
list of objects is rather expensive with this number of large objects.
How long does it take for you? I wonder if there's a way to make the
query faster.
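
For a sense of the shape of the problem: that loop keeps one in-memory
entry per large object, so the client's memory grows linearly with the
number of LOs; with 32M of them that is one large PGresult plus tens of
millions of small allocations, all of which stay live until the dump
finishes.  A hand-wavy sketch of that kind of loop (struct, fields and
query are simplified and hypothetical, not the actual pg_dump.c code):

#include <stdlib.h>
#include <string.h>
#include <libpq-fe.h>

/* simplified stand-in for pg_dump's per-object metadata entry */
typedef struct LoEntry
{
    char   *oid;
    char   *owner;
    char   *acl;
} LoEntry;

/*
 * Hypothetical sketch: fetch metadata for every large object and keep one
 * entry per row.  With ~32M large objects this is one big PGresult plus
 * roughly 100M small strdup'd strings, none of which are freed until the
 * dump is finished writing.
 */
static LoEntry *
load_lo_entries(PGconn *conn, int *nentries)
{
    PGresult   *res;
    LoEntry    *entries;
    int         ntups;

    res = PQexec(conn,
                 "SELECT oid, lomowner, lomacl FROM pg_largeobject_metadata");
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        PQclear(res);
        *nentries = 0;
        return NULL;
    }

    ntups = PQntuples(res);
    entries = (LoEntry *) malloc(ntups * sizeof(LoEntry));

    for (int i = 0; i < ntups; i++)
    {
        /* each copy is tiny, but there are ntups of them */
        entries[i].oid = strdup(PQgetvalue(res, i, 0));
        entries[i].owner = strdup(PQgetvalue(res, i, 1));
        entries[i].acl = strdup(PQgetvalue(res, i, 2));
    }

    PQclear(res);
    *nentries = ntups;
    return entries;
}

(The query pg_dump actually sends is the big ACL-collecting one quoted
upthread.)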

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Hash join in SELECT target list expression keeps consuming memory

From: Tom Lane
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 03/21/2018 02:18 PM, Jaime Soler wrote:
>> We still get out of memory error during pg_dump execution
>> pg_dump: reading large objects
>> out of memory

> Hmmmm ... that likely happens because of this for loop copying a lot of
> data:
> https://github.com/postgres/postgres/blob/master/src/bin/pg_dump/pg_dump.c#L3258

The long and the short of it is that too many large objects *will*
choke pg_dump; this has been obvious since we decided to let it treat
large objects as heavyweight objects.  See eg

https://www.postgresql.org/message-id/29613.1476969807@sss.pgh.pa.us

I don't think there's any simple fix available.  We discussed some
possible solutions in

https://www.postgresql.org/message-id/flat/5539483B.3040401%40commandprompt.com

but none of them looked easy.  The best short-term answer is "run
pg_dump in a less memory-constrained system".

            regards, tom lane


Re: Hash join in SELECT target list expression keeps consuming memory

From: Jaime Soler
Right now we are purging old LO objects because our production system ran out of memory:

Mem: 41154296k total, 40797560k used, 356736k free, 15748k buffers
Swap: 16777208k total, 1333260k used, 15443948k free, 35304844k cached

SELECT count(*) FROM pg_largeobject;
count
----------
52614842
(1 row)

SELECT pg_size_pretty(pg_table_size('pg_largeobject'));
pg_size_pretty
----------------
15 GB
(1 row)

Regards

