Thread: Running out of memory while making a join

Running out of memory while making a join

From
Carlos Henrique Reimer
Date:
Hi,

The following SQL join command runs the PostgreSQL server out of memory. The server runs on a box with Red Hat Enterprise Linux Server release 6.3 (Santiago) and PostgreSQL 8.3.21.

select wm_nfsp from "5611_isarq".wm_nfsp
left join "5611_nfarq".nfe on
wm_nfsp.tpdoc = 7 where 1 = 1 and
wm_nfsp.codpre = 2866 and
wm_nfsp.compet = '10/2012';  

Explain result:
 Nested Loop Left Join  (cost=7356.61..48466.46 rows=346312 width=32)
   Join Filter: (wm_nfsp.tpdoc = 7)
   ->  Bitmap Heap Scan on wm_nfsp  (cost=11.65..1162.37 rows=11 width=34)
         Recheck Cond: (codpre = 2866)
         Filter: ((compet)::text = '10/2012'::text)
         ->  Bitmap Index Scan on idx_wm_nfsp_codpre  (cost=0.00..11.64 rows=714 width=0)
               Index Cond: (codpre = 2866)
   ->  Materialize  (cost=7344.96..8959.47 rows=161451 width=0)
         ->  Seq Scan on nfe  (cost=0.00..7183.51 rows=161451 width=0)
(9 rows)

Once the query starts the top command starts showing an increase of memory use and minutes later vmstat shows the server performing a lot swapping and almost stops everything until the PID is killed.

top command output while join running on PID 29787:
top - 11:26:41 up 10 days,  6:30,  3 users,  load average: 131.05, 74.55, 32.33
Tasks: 499 total,   3 running, 496 sleeping,   0 stopped,   0 zombie
Cpu(s): 10.8%us, 12.7%sy,  0.0%ni,  0.0%id, 75.2%wa,  0.0%hi,  1.3%si,  0.0%st
Mem:  32876756k total, 32677176k used,   199580k free,     4804k buffers
Swap: 16777208k total,  6248980k used, 10528228k free,   485188k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
29787 postgres  20   0 35.5g  29g  98m D 30.8 92.9  15:07.23 postgres: ipm Fisca
   98 root      20   0     0    0    0 D 13.7  0.0  23:46.72 [kswapd0]
31496 postgres  20   0  546m  20m  18m D  4.9  0.1   0:00.29 postgres: ipm Fisca
29024 postgres  20   0  547m  52m  50m D  4.0  0.2   0:03.95 postgres: ipm Fisca


vmstat output showing the server is performing a lot of swapping:
Tue Nov  6 11:27:06 BRST 2012
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1 143 6653476 199076   6368 476356    0    1   694   646    2    0  8  3 88  1  0
 1 79 6700576 204104   6212 453808   64 9438 99833  9503 14213 9477 10  9  0 80  0
 5 74 6813252 199144   5196 488872    7 22540 102592 22704 13770 8762  9 10  2 80  0
 2 58 6855596 199332   4456 462592   70 8474 133870  8509 13527 9242  8 10  0 82  0
 3 90 6907264 199096   5544 472112  102 10403 102617 11136 12764 8497  7  9  0 84  0

Running the same command on a PostgreSQL 9.0.0 server results in an OUT OF MEMORY message stopping the backend but preventing the entire server to stop.

The development team is going to change the SQL join command but my concern is with other similar SQL commands not yet identified.

Is there a way to make PostgreSQL 8.3.21 server stop memory bound backends as PostgreSQL 9.0.0 does?

--
Reimer
47-3347-1724 47-9183-0547 msn: carlos.reimer@opendb.com.br

Re: Running out of memory while making a join

From
Craig Ringer
Date:
On 11/08/2012 06:20 PM, Carlos Henrique Reimer wrote:
> Is there a way to make PostgreSQL 8.3.21 server stop memory bound
> backends as PostgreSQL 9.0.0 does?

Are there any triggers on the table?

What's the setting for work_mem?

--
Craig Ringer


Re: Running out of memory while making a join

From
Carlos Henrique Reimer
Date:
Hi Craig,

work_mem is defined with 10MB and yes, there are triggers defined on both tables:

FiscalWeb=# \d "5611_isarq".wm_nfsp
          Table "5611_isarq.wm_nfsp"
  Column   |         Type          | Modifiers
-----------+-----------------------+-----------
 tpdoc     | smallint              | not null
 numero    | character varying(18) | not null
 codpre    | bigint                | not null
 compet    | character varying(7)  |
 dtemis    | date                  |
 vlrtot    | numeric(15,2)         |
 situnf    | character(1)          |
 parcela   | smallint              |
 obs       | text                  |
 sequencia | bigint                | not null
 cnpjtom   | numeric(14,0)         |
Indexes:
    "wm_nfsp_pkey" PRIMARY KEY, btree (tpdoc, numero, codpre, sequencia)
    "idx_wm_nfsp_codpre" btree (codpre)
    "idx_wm_nfsp_dtemis" btree (dtemis)
Triggers:
    tg_wm_nfsp BEFORE INSERT OR DELETE OR UPDATE ON "5611_isarq".wm_nfsp FOR EACH ROW EXECUTE PROCEDURE fun_isarq.wm_nfsp('5611', 'isarq')

FiscalWeb=# \d "5611_nfarq".nfe
                     Table "5611_nfarq.nfe"
      Column       |         Type          |     Modifiers
-------------------+-----------------------+--------------------
 cadastro          | bigint                | not null
 serie             | smallint              | not null
 numero            | integer               | not null
 codauten          | character varying(14) | not null
 dataemis          | date                  | not null
 horaemis          | character varying(8)  | not null
 vlrtot            | numeric(15,2)         | not null
 situacao          | smallint              | not null
 observ            | text                  |
 datacanc          | date                  |
 obscanc           | text                  |
 tipotom           | character varying(30) | not null
 nf_html           | text                  |
 nf_assinada       | text                  |
 nf_protocolada    | text                  |
 vlrtot_desc       | numeric(15,2)         |
 vlrtot_ir         | numeric(15,2)         |
 vlrtot_inss       | numeric(15,2)         |
 vlrtot_contsocial | numeric(15,2)         |
 tipo              | smallint              | not null default 1
 vlrtot_rps        | numeric(15,2)         |
 vlrtot_pis        | numeric(15,2)         |
 vlrtot_cofins     | numeric(15,2)         |
 lote_rps          | integer               | default 0
 numero_rps        | integer               | default 0
 cnpjtom           | bigint                |
 cpfcnpjcanc       | bigint                |
 tomadorsemdoc     | integer               |
Indexes:
    "pk_nfe" PRIMARY KEY, btree (cadastro, serie, numero)
    "uni_codauten" UNIQUE, btree (codauten)
Foreign-key constraints:
    "nfe_serie_fkey" FOREIGN KEY (serie) REFERENCES "5611_nfarq".serie(codigo) ON UPDATE RESTRICT ON DELETE RESTRICT
Disabled triggers:
    tg_nfe BEFORE INSERT OR DELETE OR UPDATE ON "5611_nfarq".nfe FOR EACH ROW EXECUTE PROCEDURE fun_nfarq.nfe('5611', 'NFARQ')

FiscalWeb=#


On Thu, Nov 8, 2012 at 10:50 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
On 11/08/2012 06:20 PM, Carlos Henrique Reimer wrote:
> Is there a way to make PostgreSQL 8.3.21 server stop memory bound
> backends as PostgreSQL 9.0.0 does?

Are there any triggers on the table?

What's the setting for work_mem?

--
Craig Ringer



--
Reimer
47-3347-1724 47-9183-0547 msn: carlos.reimer@opendb.com.br

Re: Running out of memory while making a join

From
Craig Ringer
Date:
On 11/08/2012 11:35 PM, Carlos Henrique Reimer wrote:
> Hi Craig,
>
> work_mem is defined with 10MB and yes, there are triggers defined on
> both tables

Come to think of it, the triggers don't make any difference to memory
use for a SELECT anyway.

Your work_mem is perfectly reasonable.

The plan looks pretty harsh, with that big nested loop, but I'm not
aware of anything that'd cause that to run out of memory.

Personally I'd be attaching a debugger to it and seeing what it was
doing while it ran. I'm sure there are smarter ways to trace where the
memory use is going, though.

--
Craig Ringer


Re: Running out of memory while making a join

From
Carlos Henrique Reimer
Date:
Hi,

How is the best way to attach a debugger to the SELECT and identify why is it exhausting server storage.

Thank you in advance!

On Fri, Nov 9, 2012 at 4:10 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
On 11/08/2012 11:35 PM, Carlos Henrique Reimer wrote:
> Hi Craig,
>
> work_mem is defined with 10MB and yes, there are triggers defined on
> both tables

Come to think of it, the triggers don't make any difference to memory
use for a SELECT anyway.

Your work_mem is perfectly reasonable.

The plan looks pretty harsh, with that big nested loop, but I'm not
aware of anything that'd cause that to run out of memory.

Personally I'd be attaching a debugger to it and seeing what it was
doing while it ran. I'm sure there are smarter ways to trace where the
memory use is going, though.

--
Craig Ringer



--
Reimer
47-3347-1724 47-9183-0547 msn: carlos.reimer@opendb.com.br

Re: Running out of memory while making a join

From
Tom Lane
Date:
Carlos Henrique Reimer <carlos.reimer@opendb.com.br> writes:
> How is the best way to attach a debugger to the SELECT and identify why is
> it exhausting server storage.

In gdb,
    call MemoryContextStats(TopMemoryContext)
should produce some useful information on the process's stderr file.

            regards, tom lane


Re: Running out of memory while making a join

From
Craig Ringer
Date:
On 11/11/2012 12:50 AM, Carlos Henrique Reimer wrote:
Hi,

How is the best way to attach a debugger to the SELECT and identify why is it exhausting server storage.

This page is more focused on getting a stack trace after a crash, but provides some information about how to identify the backend to connect to, how to connect with gdb, etc:

http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD

Short version is:

Get the pid of the backend that's growing with

    SELECT pg_backend_pid()

before running the test, from `top`, or from pg_stat_activity.

    gdb -p pid_of_backend

when you get a (gdb) prompt:

    cont<enter>

Let the memory use grow until you think it's fairly big. Hit control-C in the gdb terminal to interrupt execution of the backend. You'll get another (gdb) prompt.

Now follow Tom's advice:
In gdb,call MemoryContextStats(TopMemoryContext)
should produce some useful information on the process's stderr file.

It may also be helpful to have a stack trace or two or three to see what Pg is doing while it's running and growing, so if you want, also run:

    bt

then "cont" again, control C, bt again, cont, control c, bt, and provide the copied and pasted backtraces in case they provide additional information about what's going on.

--
Craig Ringer

Re: Running out of memory while making a join

From
Craig Ringer
Date:
On 11/11/2012 08:54 AM, Craig Ringer wrote:

Now follow Tom's advice:
In gdb,call MemoryContextStats(TopMemoryContext)
should produce some useful information on the process's stderr file.

Oh, I forgot to explain how to actually get the output.

stderr goes to the PostgreSQL log files, so you need to look in the Pg logs. These are in /var/lib/pgsql/data/pg_log or /var/lib/pgsql/[version]/data/pg_log on Red Hat / Fedora systems, and in /var/log/postgresql/ on Debian/Ubuntu systems.

You're looking for something like:

  MessageContext: 8192 total in 1 blocks; 7120 free (1 chunks); 1072 used
  Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  smgr relation table: 24576 total in 2 blocks; 13920 free (4 chunks); 10656 used
  TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
  Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  PortalMemory: 8192 total in 1 blocks; 8160 free (1 chunks); 32 used
  Relcache by OID: 24576 total in 2 blocks; 13872 free (3 chunks); 10704 used
  CacheMemoryContext: 817840 total in 20 blocks; 140840 free (3 chunks); 677000 used
    pg_db_role_setting_databaseid_rol_index: 2048 total in 1 blocks; 528 free (0 chunks); 1520 used
    pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1272 free (2 chunks); 1800 used
    ... lots more ...
        pg_database_oid_index: 3072 total in 2 blocks; 1656 free (3 chunks); 1416 used
    pg_authid_rolname_index: 3072 total in 2 blocks; 1656 free (3 chunks); 1416 used
  MdSmgr: 8192 total in 1 blocks; 8128 free (0 chunks); 64 used
  tokenize file cxt: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  hba parser context: 7168 total in 3 blocks; 3760 free (4 chunks); 3408 used
  LOCALLOCK hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  Timezones: 83472 total in 2 blocks; 3744 free (0 chunks); 79728 used
  ErrorContext: 8192 total in 1 blocks; 8160 free (3 chunks); 32 used

Copy and paste the entire block (including the bits I omitted from mine because it's just an example to show you what it looks like).

--
Craig Ringer

Re: Running out of memory while making a join

From
Carlos Henrique Reimer
Date:
Hi,

That is what I got from gdb:

TopMemoryContext: 88992 total in 10 blocks; 10336 free (7 chunks); 78656 used
  Type information cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
  Operator lookup cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
  Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  TopTransactionContext: 8192 total in 1 blocks; 7648 free (1 chunks); 544 used
  MessageContext: 73728 total in 4 blocks; 14752 free (3 chunks); 58976 used
  smgr relation table: 24576 total in 2 blocks; 13904 free (4 chunks); 10672 used
  TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
  Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used
    PortalHeapMemory: 1024 total in 1 blocks; 744 free (0 chunks); 280 used
      ExecutorState: 11586756656 total in 1391 blocks; 4938408 free (6 chunks); 11581818248 used
        TIDBitmap: 57344 total in 3 blocks; 23936 free (8 chunks); 33408 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 8192 total in 1 blocks; 8016 free (0 chunks); 176 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  Relcache by OID: 24576 total in 2 blocks; 14912 free (3 chunks); 9664 used
  CacheMemoryContext: 817392 total in 20 blocks; 145728 free (0 chunks); 671664 used
    uni_codauten: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pk_nfe: 2048 total in 1 blocks; 632 free (0 chunks); 1416 used
    idx_wm_nfsp_dtemis: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    idx_wm_nfsp_codpre: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    wm_nfsp_pkey: 2048 total in 1 blocks; 440 free (0 chunks); 1608 used
    pg_index_indrelid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    pg_attrdef_adrelid_adnum_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
    pg_ts_dict_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_aggregate_fnoid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_language_name_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_statistic_relid_att_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_ts_dict_dictname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_namespace_nspname_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_opfamily_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_opclass_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_ts_parser_prsname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_amop_fam_strat_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used
    pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used
    pg_trigger_tgrelid_tgname_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_cast_source_target_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_auth_members_role_member_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_attribute_relid_attnum_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_ts_config_cfgname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_authid_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_ts_config_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_conversion_default_index: 3072 total in 2 blocks; 1432 free (3 chunks); 1640 used
    pg_language_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_enum_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used
    pg_ts_parser_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_database_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_conversion_name_nsp_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_class_relname_nsp_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_attribute_relid_attnam_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_class_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used
    pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used
    pg_index_indexrelid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_type_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_rewrite_rel_rulename_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_authid_rolname_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_auth_members_member_role_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_enum_typid_label_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_constraint_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_conversion_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_ts_template_tmplname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_ts_config_map_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used
    pg_namespace_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_type_typname_nsp_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_operator_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_amop_opr_fam_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_proc_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_opfamily_am_name_nsp_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used
    pg_ts_template_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
  MdSmgr: 8192 total in 1 blocks; 7520 free (0 chunks); 672 used
  LOCALLOCK hash: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
  Timezones: 53584 total in 2 blocks; 3744 free (0 chunks); 49840 used
  ErrorContext: 8192 total in 1 blocks; 8160 free (3 chunks); 32 used


There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 31446
Reading symbols from /usr/local/pgsql/bin/postgres...(no debugging symbols found)...done.
Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/libfreebl3.so...(no debugging symbols found)...done.
Loaded symbols for /lib64/libfreebl3.so
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
0x000000395d6e83e2 in recv () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.el6.x86_64 nss-softokn-freebl-3.12.9-11.el6.x86_64
(gdb) cont
Continuing.
bt
^C
Program received signal SIGINT, Interrupt.
0x00000000005ff177 in record_out ()
(gdb) bt
#0  0x00000000005ff177 in record_out ()
#1  0x000000000065b73f in FunctionCall1 ()
#2  0x000000000065c61b in OutputFunctionCall ()
#3  0x00000000004495cc in printtup ()
#4  0x000000000051e6a3 in ExecutorRun ()
#5  0x00000000005b658a in PortalRunSelect ()
#6  0x00000000005b7af9 in PortalRun ()
#7  0x00000000005b2ddf in exec_simple_query ()
#8  0x00000000005b4423 in PostgresMain ()
#9  0x00000000005881e4 in ServerLoop ()
#10 0x000000000058a1e5 in PostmasterMain ()
#11 0x0000000000540cf8 in main ()
(gdb) ^C(gdb) Quit
(gdb) bt
#0  0x00000000005ff177 in record_out ()
#1  0x000000000065b73f in FunctionCall1 ()
#2  0x000000000065c61b in OutputFunctionCall ()
#3  0x00000000004495cc in printtup ()
#4  0x000000000051e6a3 in ExecutorRun ()
#5  0x00000000005b658a in PortalRunSelect ()
#6  0x00000000005b7af9 in PortalRun ()
#7  0x00000000005b2ddf in exec_simple_query ()
#8  0x00000000005b4423 in PostgresMain ()
#9  0x00000000005881e4 in ServerLoop ()
#10 0x000000000058a1e5 in PostmasterMain ()
#11 0x0000000000540cf8 in main ()
(gdb) ^C(gdb) Quit
(gdb) bt
#0  0x00000000005ff177 in record_out ()
#1  0x000000000065b73f in FunctionCall1 ()
#2  0x000000000065c61b in OutputFunctionCall ()
#3  0x00000000004495cc in printtup ()
#4  0x000000000051e6a3 in ExecutorRun ()
#5  0x00000000005b658a in PortalRunSelect ()
#6  0x00000000005b7af9 in PortalRun ()
#7  0x00000000005b2ddf in exec_simple_query ()
#8  0x00000000005b4423 in PostgresMain ()
#9  0x00000000005881e4 in ServerLoop ()
#10 0x000000000058a1e5 in PostmasterMain ()
#11 0x0000000000540cf8 in main ()
(gdb) cont
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x000000395d688ae9 in memcpy () from /lib64/libc.so.6
(gdb) bt
#0  0x000000395d688ae9 in memcpy () from /lib64/libc.so.6
#1  0x000000000046314f in heap_tuple_untoast_attr ()
#2  0x00000000005fb2ef in numeric_out ()
#3  0x000000000065b73f in FunctionCall1 ()
#4  0x000000000065c61b in OutputFunctionCall ()
#5  0x00000000005ff121 in record_out ()
#6  0x000000000065b73f in FunctionCall1 ()
#7  0x000000000065c61b in OutputFunctionCall ()
#8  0x00000000004495cc in printtup ()
#9  0x000000000051e6a3 in ExecutorRun ()
#10 0x00000000005b658a in PortalRunSelect ()
#11 0x00000000005b7af9 in PortalRun ()
#12 0x00000000005b2ddf in exec_simple_query ()
#13 0x00000000005b4423 in PostgresMain ()
#14 0x00000000005881e4 in ServerLoop ()
#15 0x000000000058a1e5 in PostmasterMain ()
#16 0x0000000000540cf8 in main ()
(gdb) cont
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00000000005395d0 in appendStringInfoChar ()
(gdb) bt
#0  0x00000000005395d0 in appendStringInfoChar ()
#1  0x00000000005ff1e6 in record_out ()
#2  0x000000000065b73f in FunctionCall1 ()
#3  0x000000000065c61b in OutputFunctionCall ()
#4  0x00000000004495cc in printtup ()
#5  0x000000000051e6a3 in ExecutorRun ()
#6  0x00000000005b658a in PortalRunSelect ()
#7  0x00000000005b7af9 in PortalRun ()
#8  0x00000000005b2ddf in exec_simple_query ()
#9  0x00000000005b4423 in PostgresMain ()
#10 0x00000000005881e4 in ServerLoop ()
#11 0x000000000058a1e5 in PostmasterMain ()
#12 0x0000000000540cf8 in main ()
(gdb) cont
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00000000005ed175 in int2out ()
(gdb) bt
#0  0x00000000005ed175 in int2out ()
#1  0x000000000065b73f in FunctionCall1 ()
#2  0x000000000065c61b in OutputFunctionCall ()
#3  0x00000000005ff121 in record_out ()
#4  0x000000000065b73f in FunctionCall1 ()
#5  0x000000000065c61b in OutputFunctionCall ()
#6  0x00000000004495cc in printtup ()
#7  0x000000000051e6a3 in ExecutorRun ()
#8  0x00000000005b658a in PortalRunSelect ()
#9  0x00000000005b7af9 in PortalRun ()
#10 0x00000000005b2ddf in exec_simple_query ()
#11 0x00000000005b4423 in PostgresMain ()
#12 0x00000000005881e4 in ServerLoop ()
#13 0x000000000058a1e5 in PostmasterMain ()
#14 0x0000000000540cf8 in main ()
(gdb) cont
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00000000005f5ab0 in get_str_from_var ()
(gdb) bt
#0  0x00000000005f5ab0 in get_str_from_var ()
#1  0x00000000005fb33b in numeric_out ()
#2  0x000000000065b73f in FunctionCall1 ()
#3  0x000000000065c61b in OutputFunctionCall ()
#4  0x00000000005ff121 in record_out ()
#5  0x000000000065b73f in FunctionCall1 ()
#6  0x000000000065c61b in OutputFunctionCall ()
#7  0x00000000004495cc in printtup ()
#8  0x000000000051e6a3 in ExecutorRun ()
#9  0x00000000005b658a in PortalRunSelect ()
#10 0x00000000005b7af9 in PortalRun ()
#11 0x00000000005b2ddf in exec_simple_query ()
#12 0x00000000005b4423 in PostgresMain ()
#13 0x00000000005881e4 in ServerLoop ()
#14 0x000000000058a1e5 in PostmasterMain ()
#15 0x0000000000540cf8 in main ()
(gdb) cont
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x0000000000671990 in AllocSetAlloc ()
(gdb) bt
#0  0x0000000000671990 in AllocSetAlloc ()
#1  0x00000000005f5136 in alloc_var ()
#2  0x00000000005f5192 in set_var_from_num ()
#3  0x00000000005fb32f in numeric_out ()
#4  0x000000000065b73f in FunctionCall1 ()
#5  0x000000000065c61b in OutputFunctionCall ()
#6  0x00000000005ff121 in record_out ()
#7  0x000000000065b73f in FunctionCall1 ()
#8  0x000000000065c61b in OutputFunctionCall ()
#9  0x00000000004495cc in printtup ()
#10 0x000000000051e6a3 in ExecutorRun ()
#11 0x00000000005b658a in PortalRunSelect ()
#12 0x00000000005b7af9 in PortalRun ()
#13 0x00000000005b2ddf in exec_simple_query ()
#14 0x00000000005b4423 in PostgresMain ()
#15 0x00000000005881e4 in ServerLoop ()
#16 0x000000000058a1e5 in PostmasterMain ()
#17 0x0000000000540cf8 in main ()
(gdb) cont
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00000000006719a2 in AllocSetAlloc ()
(gdb) bt
#0  0x00000000006719a2 in AllocSetAlloc ()
#1  0x0000000000463136 in heap_tuple_untoast_attr ()
#2  0x00000000005fb2ef in numeric_out ()
#3  0x000000000065b73f in FunctionCall1 ()
#4  0x000000000065c61b in OutputFunctionCall ()
#5  0x00000000005ff121 in record_out ()
#6  0x000000000065b73f in FunctionCall1 ()
#7  0x000000000065c61b in OutputFunctionCall ()
#8  0x00000000004495cc in printtup ()
#9  0x000000000051e6a3 in ExecutorRun ()
#10 0x00000000005b658a in PortalRunSelect ()
#11 0x00000000005b7af9 in PortalRun ()
#12 0x00000000005b2ddf in exec_simple_query ()
#13 0x00000000005b4423 in PostgresMain ()
#14 0x00000000005881e4 in ServerLoop ()
#15 0x000000000058a1e5 in PostmasterMain ()
#16 0x0000000000540cf8 in main ()
(gdb) cont
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x000000395d688ae9 in memcpy () from /lib64/libc.so.6
(gdb) bt
#0  0x000000395d688ae9 in memcpy () from /lib64/libc.so.6
#1  0x00000000006724b1 in MemoryContextStrdup ()
#2  0x00000000005d7069 in date_out ()
#3  0x000000000065b73f in FunctionCall1 ()
#4  0x000000000065c61b in OutputFunctionCall ()
#5  0x00000000005ff121 in record_out ()
#6  0x000000000065b73f in FunctionCall1 ()
#7  0x000000000065c61b in OutputFunctionCall ()
#8  0x00000000004495cc in printtup ()
#9  0x000000000051e6a3 in ExecutorRun ()
#10 0x00000000005b658a in PortalRunSelect ()
#11 0x00000000005b7af9 in PortalRun ()
#12 0x00000000005b2ddf in exec_simple_query ()
#13 0x00000000005b4423 in PostgresMain ()
#14 0x00000000005881e4 in ServerLoop ()
#15 0x000000000058a1e5 in PostmasterMain ()
#16 0x0000000000540cf8 in main ()
(gdb) cont
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00000000005395e6 in appendStringInfoChar ()
(gdb) call MemoryContextStats(TopMemoryContext)
$1 = 68
(gdb) cont
Continuing.

Program received signal SIGINT, Interrupt.
0x00000000005ff168 in record_out ()
(gdb) cont
Continuing.

Program received signal SIGINT, Interrupt.
0x00000000005ff168 in record_out ()
(gdb) ^C(gdb) Quit
(gdb) ^C(gdb) Quit
(gdb) quit
A debugging session is active.

        Inferior 1 [process 31446] will be detached.

Quit anyway? (y or n) y
Detaching from program: /usr/local/pgsql/bin/postgres, process 31446
[root@00002-xxx-SGDB ~]#




On Sat, Nov 10, 2012 at 11:01 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
On 11/11/2012 08:54 AM, Craig Ringer wrote:

Now follow Tom's advice:
In gdb,call MemoryContextStats(TopMemoryContext)
should produce some useful information on the process's stderr file.

Oh, I forgot to explain how to actually get the output.

stderr goes to the PostgreSQL log files, so you need to look in the Pg logs. These are in /var/lib/pgsql/data/pg_log or /var/lib/pgsql/[version]/data/pg_log on Red Hat / Fedora systems, and in /var/log/postgresql/ on Debian/Ubuntu systems.

You're looking for something like:

  MessageContext: 8192 total in 1 blocks; 7120 free (1 chunks); 1072 used
  Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  smgr relation table: 24576 total in 2 blocks; 13920 free (4 chunks); 10656 used
  TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
  Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  PortalMemory: 8192 total in 1 blocks; 8160 free (1 chunks); 32 used
  Relcache by OID: 24576 total in 2 blocks; 13872 free (3 chunks); 10704 used
  CacheMemoryContext: 817840 total in 20 blocks; 140840 free (3 chunks); 677000 used
    pg_db_role_setting_databaseid_rol_index: 2048 total in 1 blocks; 528 free (0 chunks); 1520 used
    pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1272 free (2 chunks); 1800 used
    ... lots more ...
        pg_database_oid_index: 3072 total in 2 blocks; 1656 free (3 chunks); 1416 used
    pg_authid_rolname_index: 3072 total in 2 blocks; 1656 free (3 chunks); 1416 used
  MdSmgr: 8192 total in 1 blocks; 8128 free (0 chunks); 64 used
  tokenize file cxt: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  hba parser context: 7168 total in 3 blocks; 3760 free (4 chunks); 3408 used
  LOCALLOCK hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  Timezones: 83472 total in 2 blocks; 3744 free (0 chunks); 79728 used
  ErrorContext: 8192 total in 1 blocks; 8160 free (3 chunks); 32 used

Copy and paste the entire block (including the bits I omitted from mine because it's just an example to show you what it looks like).

--
Craig Ringer



--
Reimer
47-3347-1724 47-9183-0547 msn: carlos.reimer@opendb.com.br

Re: Running out of memory while making a join

From
Tom Lane
Date:
Carlos Henrique Reimer <carlos.reimer@opendb.com.br> writes:
> That is what I got from gdb:

>       ExecutorState: 11586756656 total in 1391 blocks; 4938408 free (6
> chunks); 11581818248 used

So, query-lifespan memory leak.  After poking at this for a bit, I think
the problem has nothing to do with joins; more likely it's because you
are returning a composite column:

    select wm_nfsp from "5611_isarq".wm_nfsp ...

I found out that record_out() leaks sizable amounts of memory, which
won't be recovered till end of query.  You could work around that by
returning "select wm_nfsp.*" instead, but if you really want the result
in composite-column notation, I'd suggest applying this patch:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c027d84c81d5e07e58cd25ea38805d6f1ae4dfcd

            regards, tom lane


Re: Running out of memory while making a join

From
Carlos Henrique Reimer
Date:
Hi Tom,

Thank you for the analyzes!

No problem, there is no problem to use "select wm_nfsp.*" but as my concern is to prevent this in the future I think I should apply the fix or is there a config parameter to abend the backend if it reaches some kind of storage limit?

Thank you!

Reimer


On Tue, Nov 13, 2012 at 5:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Carlos Henrique Reimer <carlos.reimer@opendb.com.br> writes:
> That is what I got from gdb:

>       ExecutorState: 11586756656 total in 1391 blocks; 4938408 free (6
> chunks); 11581818248 used

So, query-lifespan memory leak.  After poking at this for a bit, I think
the problem has nothing to do with joins; more likely it's because you
are returning a composite column:

        select wm_nfsp from "5611_isarq".wm_nfsp ...

I found out that record_out() leaks sizable amounts of memory, which
won't be recovered till end of query.  You could work around that by
returning "select wm_nfsp.*" instead, but if you really want the result
in composite-column notation, I'd suggest applying this patch:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c027d84c81d5e07e58cd25ea38805d6f1ae4dfcd

                        regards, tom lane



--
Reimer
47-3347-1724 47-9183-0547 msn: carlos.reimer@opendb.com.br

Re: Running out of memory while making a join

From
Tom Lane
Date:
Carlos Henrique Reimer <carlos.reimer@opendb.com.br> writes:
> No problem, there is no problem to use "select wm_nfsp.*" but as my concern
> is to prevent this in the future I think I should apply the fix or is there
> a config parameter to abend the backend if it reaches some kind of storage
> limit?

You could start the postmaster under a smaller ulimit setting ...

            regards, tom lane