Thread: Hash join in SELECT target list expression keeps consuming memory

Hash join in SELECT target list expression keeps consuming memory

From: Amit Khandekar
Hi,

If the SELECT target list expression is a join subquery, and if the
subquery does a hash join, then the query keeps on consuming more and
more memory. Below is such a query :

SELECT
   (SELECT id
    FROM unnest((pg_catalog.acldefault('L', l.col1))) WITH ORDINALITY AS perm(acl, id),
         generate_series(1, l.col1::int % 2) AS gen2(gen2_id)
    WHERE id = gen2_id)
FROM largetable l;

where the table largetable is created using:

create table largetable as
  select * from generate_series(1, 32000000) as t(col1);

Here is the plan:
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Seq Scan on largetable l  (cost=0.00..672781971.36 rows=32000018 width=8)
   SubPlan 1
     ->  Hash Join  (cost=2.26..21.01 rows=500 width=8)
           Hash Cond: (gen2.gen2_id = perm.id)
           ->  Function Scan on generate_series gen2  (cost=0.01..10.01 rows=1000 width=4)
           ->  Hash  (cost=1.00..1.00 rows=100 width=8)
                 ->  Function Scan on unnest perm  (cost=0.01..1.00 rows=100 width=8)

Now, if we disable hash join (e.g. with SET enable_hashjoin TO off), the
planner chooses a merge join, and that does not keep consuming memory:

                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Seq Scan on largetable l  (cost=0.00..2275308358.33 rows=32000018 width=8)
   SubPlan 1
     ->  Merge Join  (cost=59.84..71.09 rows=500 width=8)
           Merge Cond: (perm.id = gen2.gen2_id)
           ->  Function Scan on unnest perm  (cost=0.01..1.00 rows=100 width=8)
           ->  Sort  (cost=59.83..62.33 rows=1000 width=4)
                 Sort Key: gen2.gen2_id
                  ->  Function Scan on generate_series gen2  (cost=0.01..10.01 rows=1000 width=4)

With either a merge join or a nested loop join, the postgres process
memory stays constant the whole time. (The planner chooses a hash join
because of the incorrect row estimates for the unnest() and
generate_series() functions.)

I have yet to analyze the root cause of this behaviour, but meanwhile I
am posting it here, in case this turns out to be a known
behaviour/issue for hash joins, or there is already some work being
done on it.

I suspected that the set-returning functions might be leaking memory,
but in that case we would have seen the same behaviour with other join
types as well.

----------

Actually, I ran into this issue when I tried to run pg_dump on a
database with 32M blobs. The corresponding backend process kept
consuming memory until it was killed by the OOM killer.

Steps to reproduce the issue with pg_dump:
1. create table lo_table (id numeric, lo oid);
2. -- Create 32M rows
   insert into lo_table
   select a.i, lo_from_bytea(0, E'\\xffffff0000000000')
   from generate_series(1, 32775000) as a(i);
3. -- Then run pg_dump.

The pg_dump backend gets killed, and pg_dump aborts with this message,
which includes the query that consumed the memory:

pg_dump: [archiver (db)] query failed: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_dump: [archiver (db)] query was:

SELECT l.oid,

  (SELECT rolname FROM pg_catalog.pg_roles WHERE oid = l.lomowner) AS rolname,

  (SELECT pg_catalog.array_agg(acl ORDER BY row_n)
   FROM (SELECT acl, row_n
         FROM pg_catalog.unnest(coalesce(l.lomacl, pg_catalog.acldefault('L', l.lomowner)))
              WITH ORDINALITY AS perm(acl, row_n)
         WHERE NOT EXISTS (SELECT 1
                           FROM pg_catalog.unnest(coalesce(pip.initprivs, pg_catalog.acldefault('L', l.lomowner)))
                                AS init(init_acl)
                           WHERE acl = init_acl)) as foo) AS lomacl,

  (SELECT pg_catalog.array_agg(acl ORDER BY row_n)
   FROM (SELECT acl, row_n
         FROM pg_catalog.unnest(coalesce(pip.initprivs, pg_catalog.acldefault('L', l.lomowner)))
              WITH ORDINALITY AS initp(acl, row_n)
         WHERE NOT EXISTS (SELECT 1
                           FROM pg_catalog.unnest(coalesce(l.lomacl, pg_catalog.acldefault('L', l.lomowner)))
                                AS permp(orig_acl)
                           WHERE acl = orig_acl)) as foo) AS rlomacl,

  NULL AS initlomacl,
  NULL AS initrlomacl

FROM pg_largeobject_metadata l
     LEFT JOIN pg_init_privs pip
            ON (l.oid = pip.objoid
                AND pip.classoid = 'pg_largeobject'::regclass
                AND pip.objsubid = 0);

As you can see, the subplan expressions use the pg_init_privs table to
collect any changed extension ACLs for the large objects, so this won't
reproduce before commit 23f34fa4ba358671adab16773e79c17c92cbc870.

Note: the pg_dump client process itself also consumes more and more
memory, although to a lesser extent. But that is a separate issue, and
is already known/expected behaviour:
https://www.postgresql.org/message-id/29613.1476969807%40sss.pgh.pa.us


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


Re: Hash join in SELECT target list expression keeps consuming memory

From: Tom Lane
Amit Khandekar <amitdkhan.pg@gmail.com> writes:
> If the SELECT target list expression is a join subquery, and if the
> subquery does a hash join, then the query keeps on consuming more and
> more memory. Below is such a query :

Thanks for the report!

I dug into this with valgrind, and found that the problem is that
ExecHashTableCreate allocates some memory that isn't freed by
ExecHashTableDestroy, specifically the per-hash-key function
information.  This is just dumb.  We can keep that stuff in the
hashtable's hashCxt instead, where it will get freed at the right time.
The attached patch seems to fix it just by reordering the code.
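
For anyone wondering why just reordering is sufficient:
ExecHashTableDestroy() releases the hash table's working storage by
deleting hashtable->hashCxt, so only allocations made while that context
is current go away with the hash table.  Anything palloc'd in the
per-query context beforehand sticks around until the query ends, and the
subplan here rebuilds its hash table for every outer row.  A simplified
sketch of the ownership pattern (not the actual nodeHash.c code, just an
illustration):

#include "postgres.h"
#include "fmgr.h"
#include "utils/memutils.h"

static void
hash_context_sketch(int nkeys)
{
    MemoryContext hashCxt;
    MemoryContext oldcxt;
    FmgrInfo   *hashfunctions;

    /* child context that owns all of the hash table's working storage */
    hashCxt = AllocSetContextCreate(CurrentMemoryContext,
                                    "HashTableContext",
                                    ALLOCSET_DEFAULT_SIZES);

    /* per-hash-key info must be allocated inside hashCxt, not per-query */
    oldcxt = MemoryContextSwitchTo(hashCxt);
    hashfunctions = (FmgrInfo *) palloc(nkeys * sizeof(FmgrInfo));
    MemoryContextSwitchTo(oldcxt);

    /* ... build and probe the hash table ... */
    (void) hashfunctions;

    /* one delete releases everything that was allocated in hashCxt */
    MemoryContextDelete(hashCxt);
}

Moving the per-hash-key allocations below the
MemoryContextSwitchTo(hashtable->hashCxt) call therefore stops the
per-rescan accumulation.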

I'm surprised nobody's noticed this before; maybe the problem is
of relatively recent vintage?  Haven't checked the back branches yet.

            regards, tom lane

diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 06bb44b..4f069d1 100644
*** a/src/backend/executor/nodeHash.c
--- b/src/backend/executor/nodeHash.c
*************** ExecHashTableCreate(HashState *state, Li
*** 472,478 ****
       * Initialize the hash table control block.
       *
       * The hashtable control block is just palloc'd from the executor's
!      * per-query memory context.
       */
      hashtable = (HashJoinTable) palloc(sizeof(HashJoinTableData));
      hashtable->nbuckets = nbuckets;
--- 472,479 ----
       * Initialize the hash table control block.
       *
       * The hashtable control block is just palloc'd from the executor's
!      * per-query memory context.  Everything else should be kept inside the
!      * subsidiary hashCxt or batchCxt.
       */
      hashtable = (HashJoinTable) palloc(sizeof(HashJoinTableData));
      hashtable->nbuckets = nbuckets;
*************** ExecHashTableCreate(HashState *state, Li
*** 515,520 ****
--- 516,537 ----
  #endif

      /*
+      * Create temporary memory contexts in which to keep the hashtable working
+      * storage.  See notes in executor/hashjoin.h.
+      */
+     hashtable->hashCxt = AllocSetContextCreate(CurrentMemoryContext,
+                                                "HashTableContext",
+                                                ALLOCSET_DEFAULT_SIZES);
+
+     hashtable->batchCxt = AllocSetContextCreate(hashtable->hashCxt,
+                                                 "HashBatchContext",
+                                                 ALLOCSET_DEFAULT_SIZES);
+
+     /* Allocate data that will live for the life of the hashjoin */
+
+     oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
+
+     /*
       * Get info about the hash functions to be used for each hash key. Also
       * remember whether the join operators are strict.
       */
*************** ExecHashTableCreate(HashState *state, Li
*** 540,561 ****
          i++;
      }

-     /*
-      * Create temporary memory contexts in which to keep the hashtable working
-      * storage.  See notes in executor/hashjoin.h.
-      */
-     hashtable->hashCxt = AllocSetContextCreate(CurrentMemoryContext,
-                                                "HashTableContext",
-                                                ALLOCSET_DEFAULT_SIZES);
-
-     hashtable->batchCxt = AllocSetContextCreate(hashtable->hashCxt,
-                                                 "HashBatchContext",
-                                                 ALLOCSET_DEFAULT_SIZES);
-
-     /* Allocate data that will live for the life of the hashjoin */
-
-     oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
-
      if (nbatch > 1 && hashtable->parallel_state == NULL)
      {
          /*
--- 557,562 ----

Re: Hash join in SELECT target list expression keeps consuming memory

From: Amit Khandekar
On 17 March 2018 at 00:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Khandekar <amitdkhan.pg@gmail.com> writes:
>> If the SELECT target list expression is a join subquery, and if the
>> subquery does a hash join, then the query keeps on consuming more and
>> more memory. Below is such a query :
>
> Thanks for the report!
>
> I dug into this with valgrind, and found that the problem is that
> ExecHashTableCreate allocates some memory that isn't freed by
> ExecHashTableDestroy, specifically the per-hash-key function
> information.  This is just dumb.  We can keep that stuff in the
> hashtable's hashCxt instead, where it will get freed at the right time.
> The attached patch seems to fix it just by reordering the code.

I saw that you have now committed the fix and also backported it to
all supported branches.

Thanks!

-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


Re: Hash join in SELECT target list expression keeps consuming memory

From: Jaime Soler
Hi,

We still get an out-of-memory error during pg_dump execution:

bin$ ./initdb -D /tmp/test 
The files belonging to this database system will be owned by user "jsoler".
This user must also own the server process.

The database cluster will be initialized with locale "es_ES.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "spanish".

Data page checksums are disabled.

creating directory /tmp/test ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    ./pg_ctl -D /tmp/test -l logfile start

bin$ ./pg_ctl -D /tmp/test -l logfile start
waiting for server to start.... done
server started
bin$ ./psql -p 10000 -d postgres
psql (11devel)
Type "help" for help.

postgres=# select version();
                                              version                                               
----------------------------------------------------------------------------------------------------
 PostgreSQL 11devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.2.0-8ubuntu3.2) 7.2.0, 64-bit
(1 row)


postgres=# create table lo_table (id numeric, lo oid) ;
CREATE TABLE
postgres=# insert into lo_table select
a.i,lo_from_bytea(0,E'\\xffffff0000000000') from
generate_series(1,32775000) as a(i);
INSERT 0 32775000
postgres=# \q

/bin$ ./pg_dump -v --port=10000 -Fc -d postgres -f /tmp/dump
pg_dump: last built-in OID is 16383
pg_dump: reading extensions
pg_dump: identifying extension members
pg_dump: reading schemas
pg_dump: reading user-defined tables
pg_dump: reading user-defined functions
pg_dump: reading user-defined types
pg_dump: reading procedural languages
pg_dump: reading user-defined aggregate functions
pg_dump: reading user-defined operators
pg_dump: reading user-defined access methods
pg_dump: reading user-defined operator classes
pg_dump: reading user-defined operator families
pg_dump: reading user-defined text search parsers
pg_dump: reading user-defined text search templates
pg_dump: reading user-defined text search dictionaries
pg_dump: reading user-defined text search configurations
pg_dump: reading user-defined foreign-data wrappers
pg_dump: reading user-defined foreign servers
pg_dump: reading default privileges
pg_dump: reading user-defined collations
pg_dump: reading user-defined conversions
pg_dump: reading type casts
pg_dump: reading transforms
pg_dump: reading table inheritance information
pg_dump: reading event triggers
pg_dump: finding extension tables
pg_dump: finding inheritance relationships
pg_dump: reading column info for interesting tables
pg_dump: finding the columns and types of table "public.lo_table"
pg_dump: flagging inherited columns in subtables
pg_dump: reading indexes
pg_dump: flagging indexes in partitioned tables
pg_dump: reading extended statistics
pg_dump: reading constraints
pg_dump: reading triggers
pg_dump: reading rewrite rules
pg_dump: reading policies
pg_dump: reading row security enabled for table "public.lo_table"
pg_dump: reading policies for table "public.lo_table"
pg_dump: reading publications
pg_dump: reading publication membership
pg_dump: reading publication membership for table "public.lo_table"
pg_dump: reading subscriptions
pg_dump: reading large objects
out of memory





Re: Hash join in SELECT target list expression keeps consuming memory

From: Tomas Vondra
On 03/21/2018 02:18 PM, Jaime Soler wrote:
> Hi,
> 
> We still get out of memory error during pg_dump execution
> ...
> pg_dump: reading row security enabled for table "public.lo_table"
> pg_dump: reading policies for table "public.lo_table"
> pg_dump: reading publications
> pg_dump: reading publication membership
> pg_dump: reading publication membership for table "public.lo_table"
> pg_dump: reading subscriptions
> pg_dump: reading large objects
> out of memory
> 

Hmmmm ... that likely happens because of this for loop copying a lot of
data:

https://github.com/postgres/postgres/blob/master/src/bin/pg_dump/pg_dump.c#L3258

But I'm having trouble verifying that, because the query fetching the
list of objects is rather expensive with this number of large objects.
How long does it take for you? I wonder if there's a way to make the
query faster.
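
For a sense of the shape of the problem: that loop keeps one in-memory
entry per large object, so the client's memory grows linearly with the
number of LOs; with 32M of them that is one large PGresult plus tens of
millions of small allocations, all of which stay live until the dump
finishes.  A hand-wavy sketch of that kind of loop (struct, fields and
query are simplified and hypothetical, not the actual pg_dump.c code):

#include <stdlib.h>
#include <string.h>
#include <libpq-fe.h>

/* simplified stand-in for pg_dump's per-object metadata entry */
typedef struct LoEntry
{
    char   *oid;
    char   *owner;
    char   *acl;
} LoEntry;

/*
 * Hypothetical sketch: fetch metadata for every large object and keep one
 * entry per row.  With ~32M large objects this is one big PGresult plus
 * roughly 100M small strdup'd strings, none of which are freed until the
 * dump is finished writing.
 */
static LoEntry *
load_lo_entries(PGconn *conn, int *nentries)
{
    PGresult   *res;
    LoEntry    *entries;
    int         ntups;

    res = PQexec(conn,
                 "SELECT oid, lomowner, lomacl FROM pg_largeobject_metadata");
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        PQclear(res);
        *nentries = 0;
        return NULL;
    }

    ntups = PQntuples(res);
    entries = (LoEntry *) malloc(ntups * sizeof(LoEntry));

    for (int i = 0; i < ntups; i++)
    {
        /* each copy is tiny, but there are ntups of them */
        entries[i].oid = strdup(PQgetvalue(res, i, 0));
        entries[i].owner = strdup(PQgetvalue(res, i, 1));
        entries[i].acl = strdup(PQgetvalue(res, i, 2));
    }

    PQclear(res);
    *nentries = ntups;
    return entries;
}

(The query pg_dump actually sends is the big ACL-collecting one quoted
upthread.)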

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Hash join in SELECT target list expression keeps consuming memory

From: Tom Lane
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 03/21/2018 02:18 PM, Jaime Soler wrote:
>> We still get out of memory error during pg_dump execution
>> pg_dump: reading large objects
>> out of memory

> Hmmmm ... that likely happens because of this for loop copying a lot of
> data:
> https://github.com/postgres/postgres/blob/master/src/bin/pg_dump/pg_dump.c#L3258

The long and the short of it is that too many large objects *will*
choke pg_dump; this has been obvious since we decided to let it treat
large objects as heavyweight objects.  See eg

https://www.postgresql.org/message-id/29613.1476969807@sss.pgh.pa.us

I don't think there's any simple fix available.  We discussed some
possible solutions in

https://www.postgresql.org/message-id/flat/5539483B.3040401%40commandprompt.com

but none of them looked easy.  The best short-term answer is "run
pg_dump in a less memory-constrained system".

            regards, tom lane


Re: Hash join in SELECT target list expression keeps consuming memory

From: Jaime Soler
Right now we are purging old LO objects because our production system ran out of memory:

Mem: 41154296k total, 40797560k used, 356736k free, 15748k buffers
Swap: 16777208k total, 1333260k used, 15443948k free, 35304844k cached

SELECT count(*) FROM pg_largeobject;
count
----------
52614842
(1 row)

SELECT pg_size_pretty(pg_table_size('pg_largeobject'));
pg_size_pretty
----------------
15 GB
(1 row)

Regards

