Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem - Mailing list pgsql-hackers

From Anastasia Lubennikova
Subject Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
Date
Msg-id d7649878-8d73-20eb-dc60-c26ac4e495d1@postgrespro.ru
In response to Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem  (Claudio Freire <klaussfreire@gmail.com>)
Responses Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem  (Claudio Freire <klaussfreire@gmail.com>)
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
On 23.12.2016 22:54, Claudio Freire wrote:
On Fri, Dec 23, 2016 at 1:39 PM, Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:
I found the reason. I configure postgres with CFLAGS="-O0", and that causes a
segfault during initdb.
It works fine and passes tests with the default configure flags, but I'm pretty
sure that we should fix the segfault before testing the feature.
If you need it, I'll send a core dump.
I just ran it with CFLAGS="-O0" and it passes all checks too:

CFLAGS='-O0' ./configure --enable-debug --enable-cassert
make clean && make -j8 && make check-world

A stacktrace and a thorough description of your build environment
would be helpful to understand why it breaks on your system.

I ran configure using the following set of flags:
 ./configure --enable-tap-tests --enable-cassert --enable-debug --enable-depend CFLAGS="-O0 -g3 -fno-omit-frame-pointer"
Then I ran make check. Here is the stack trace:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000000006941e7 in lazy_vacuum_heap (onerel=0x1ec2360, vacrelstats=0x1ef6e00) at vacuumlazy.c:1417
1417                tblk = ItemPointerGetBlockNumber(&seg->dead_tuples[tupindex]);
(gdb) bt
#0  0x00000000006941e7 in lazy_vacuum_heap (onerel=0x1ec2360, vacrelstats=0x1ef6e00) at vacuumlazy.c:1417
#1  0x0000000000693dfe in lazy_scan_heap (onerel=0x1ec2360, options=9, vacrelstats=0x1ef6e00, Irel=0x1ef7168, nindexes=2, aggressive=1 '\001')
    at vacuumlazy.c:1337
#2  0x0000000000691e66 in lazy_vacuum_rel (onerel=0x1ec2360, options=9, params=0x7ffe0f866310, bstrategy=0x1f1c4a8) at vacuumlazy.c:290
#3  0x000000000069191f in vacuum_rel (relid=1247, relation=0x0, options=9, params=0x7ffe0f866310) at vacuum.c:1418
#4  0x0000000000690122 in vacuum (options=9, relation=0x0, relid=0, params=0x7ffe0f866310, va_cols=0x0, bstrategy=0x1f1c4a8,
    isTopLevel=1 '\001') at vacuum.c:320
#5  0x000000000068fd0b in vacuum (options=-1652367447, relation=0x0, relid=3324614038, params=0x1f11bf0, va_cols=0xb59f63,
    bstrategy=0x1f1c620, isTopLevel=0 '\000') at vacuum.c:150
#6  0x0000000000852993 in standard_ProcessUtility (parsetree=0x1f07e60, queryString=0x1f07468 "VACUUM FREEZE;\n",
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0xea5cc0 <debugtupDR>, completionTag=0x7ffe0f866750 "") at utility.c:669
#7  0x00000000008520da in standard_ProcessUtility (parsetree=0x401ef6cd8, queryString=0x18 <error: Cannot access memory at address 0x18>,
    context=PROCESS_UTILITY_TOPLEVEL, params=0x68, dest=0x9e5d62 <AllocSetFree+60>, completionTag=0x7ffe0f8663f0 "`~\360\001")
    at utility.c:360
#8  0x0000000000851161 in PortalRunMulti (portal=0x7ffe0f866750, isTopLevel=0 '\000', setHoldSnapshot=-39 '\331',
    dest=0x851161 <PortalRunMulti+19>, altdest=0x7ffe0f8664f0, completionTag=0x1f07e60 "\341\002") at pquery.c:1219
#9  0x0000000000851374 in PortalRunMulti (portal=0x1f0a488, isTopLevel=1 '\001', setHoldSnapshot=0 '\000', dest=0xea5cc0 <debugtupDR>,
    altdest=0xea5cc0 <debugtupDR>, completionTag=0x7ffe0f866750 "") at pquery.c:1345
#10 0x0000000000850889 in PortalRun (portal=0x1f0a488, count=9223372036854775807, isTopLevel=1 '\001', dest=0xea5cc0 <debugtupDR>,
    altdest=0xea5cc0 <debugtupDR>, completionTag=0x7ffe0f866750 "") at pquery.c:824
#11 0x000000000084a4dc in exec_simple_query (query_string=0x1f07468 "VACUUM FREEZE;\n") at postgres.c:1113
#12 0x000000000084e960 in PostgresMain (argc=10, argv=0x1e60a50, dbname=0x1e823b0 "template1", username=0x1e672a0 "anastasia")
    at postgres.c:4091
#13 0x00000000006f967e in init_locale (categoryname=0x100000000000000 <error: Cannot access memory at address 0x100000000000000>,
    category=32766, locale=0xa004692f0 <error: Cannot access memory at address 0xa004692f0>) at main.c:310
#14 0x00007f1e5f463830 in __libc_start_main (main=0x6f93e1 <main+85>, argc=10, argv=0x7ffe0f866a78, init=<optimized out>,
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe0f866a68) at ../csu/libc-start.c:291
#15 0x0000000000469319 in _start ()

The core file is quite big, so I didn't attach it to the mail. You can download it here: core dump file.

Here are some notes about the first patch:

1. prefetchBlkno = blkno & ~0x1f;
    prefetchBlkno = (prefetchBlkno > 32) ? prefetchBlkno - 32 : 0;

I don't understand why we need these tricks. How does it differ from:
prefetchBlkno = (blkno > 32) ? blkno - 32 : 0;

2. Why do we decrement prefetchBlkno twice?

Here:
+    prefetchBlkno = (prefetchBlkno > 32) ? prefetchBlkno - 32 : 0;
And here:
if (prefetchBlkno >= 32)
+                prefetchBlkno -= 32;
   

I'll inspect second patch in a few days and write questions about it.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
