Thread: Re: [COMMITTERS] pgsql: Reduce WAL activity for page splits: > Currently, an index split

Re: [COMMITTERS] pgsql: Reduce WAL activity for page splits: > Currently, an index split

From
Stefan Kaltenbrunner
Date:
Bruce Momjian wrote:
> Log Message:
> -----------
> Reduce WAL activity for page splits:
> 
>> Currently, an index split writes all the data on the split page to
>> WAL. That's a lot of WAL traffic. The tuples that are copied to the
>> right page need to be WAL logged, but the tuples that stay on the
>> original page don't.


this patch killed both of my OpenBSD buildfarm members

a backtrace of the crash shows:

(gdb) bt
#0  0x1c055504 in _bt_split (rel=0x835fcab0, buf=53, firstright=50, 
newitemoff=54, newitemsz=76,    newitem=0x8210a364, newitemonleft=0 '\0') at nbtinsert.c:956
#1  0x1c05468b in _bt_insertonpg (rel=0x835fcab0, buf=53, 
stack=0x8210a078, keysz=2, scankey=0x8210a3f0,    itup=0x8210a364, afteritem=0, split_only_page=0 '\0') at 
nbtinsert.c:538
#2  0x1c053fae in _bt_doinsert (rel=0x835fcab0, itup=0x8210a364, 
index_is_unique=1 '\001', heapRel=0x7ec0ee7c)    at nbtinsert.c:141
#3  0x1c05a1c4 in btinsert (fcinfo=0x2) at nbtree.c:224
#4  0x1c2028fc in FunctionCall6 (flinfo=0x8bc6d1f0, arg1=2, arg2=2, 
arg3=2, arg4=2, arg5=2, arg6=2) at fmgr.c:1267
#5  0x1c052d20 in index_insert (indexRelation=0x835fcab0, 
values=0xcf7defd0, isnull=0xcf7df050 "",    heap_t_ctid=0x7df53e50, heapRelation=0x7ec0ee7c, check_uniqueness=1 
'\001') at indexam.c:196
#6  0x1c08f87e in CatalogIndexInsert (indstate=0x2, 
heapTuple=0x8210a224) at indexing.c:124
#7  0x1c08f902 in CatalogUpdateIndexes (heapRel=0x7ec0ee7c, 
heapTuple=0x7df53e4c) at indexing.c:149
#8  0x1c17255c in SetRelationRuleStatus (relationId=10951, relHasRules=1 
'\001', relIsBecomingView=0 '\0')    at rewriteSupport.c:80
#9  0x1c16f074 in DefineQueryRewrite (stmt=0x82ca7e78) at 
rewriteDefine.c:432
#10 0x1c18c822 in PortalRunUtility (portal=0x82cb201c, query=0x81897484, 
dest=0x3c0c8ff4,    completionTag=0xcf7df2a0 "") at pquery.c:1063
#11 0x1c18cb12 in PortalRunMulti (portal=0x82cb201c, dest=0x3c0c8ff4, 
altdest=0x3c0c8ff4,    completionTag=0xcf7df2a0 "") at pquery.c:1131
#12 0x1c18c28a in PortalRun (portal=0x82cb201c, count=2147483647, 
dest=0x3c0c8ff4, altdest=0x3c0c8ff4,    completionTag=0xcf7df2a0 "") at pquery.c:700
#13 0x1c187d13 in exec_simple_query (    query_string=0x84a2501c "/*\n * PostgreSQL System Views\n *\n * 
Copyright (c) 1996-2007, PostgreSQL Global Development Group\n *\n * 
$PostgreSQL: pgsql/src/backend/catalog/system_views.sql,v 1.35 
2007/01/05 22:19:25 momjian Exp $\n"...) at postgres.c:939
#14 0x1c18abb2 in PostgresMain (argc=1, argv=0xcf7df480, 
username=0x80ba55b0 "mastermind") at postgres.c:3423
#15 0x1c122102 in main (argc=10, argv=0xcf7df47c) at main.c:186


Stefan


Stefan Kaltenbrunner wrote:
> Bruce Momjian wrote:
>> Log Message:
>> -----------
>> Reduce WAL activity for page splits:
>>
>>> Currently, an index split writes all the data on the split page to
>>> WAL. That's a lot of WAL traffic. The tuples that are copied to the
>>> right page need to be WAL logged, but the tuples that stay on the
>>> original page don't.
> 
> 
> this patch killed both of my OpenBSD buildfarm members

Sorry about that.. I'll take a look at it.

> 
> a backtrace of the crash shows:
> 
> (gdb) bt
> #0  0x1c055504 in _bt_split (rel=0x835fcab0, buf=53, firstright=50, 
> newitemoff=54, newitemsz=76,
>     newitem=0x8210a364, newitemonleft=0 '\0') at nbtinsert.c:956
> #1  0x1c05468b in _bt_insertonpg (rel=0x835fcab0, buf=53, 
> stack=0x8210a078, keysz=2, scankey=0x8210a3f0,
>     itup=0x8210a364, afteritem=0, split_only_page=0 '\0') at 
> nbtinsert.c:538
> #2  0x1c053fae in _bt_doinsert (rel=0x835fcab0, itup=0x8210a364, 
> index_is_unique=1 '\001', heapRel=0x7ec0ee7c)
>     at nbtinsert.c:141
> #3  0x1c05a1c4 in btinsert (fcinfo=0x2) at nbtree.c:224
> #4  0x1c2028fc in FunctionCall6 (flinfo=0x8bc6d1f0, arg1=2, arg2=2, 
> arg3=2, arg4=2, arg5=2, arg6=2) at fmgr.c:1267
> #5  0x1c052d20 in index_insert (indexRelation=0x835fcab0, 
> values=0xcf7defd0, isnull=0xcf7df050 "",
>     heap_t_ctid=0x7df53e50, heapRelation=0x7ec0ee7c, check_uniqueness=1 
> '\001') at indexam.c:196
> #6  0x1c08f87e in CatalogIndexInsert (indstate=0x2, 
> heapTuple=0x8210a224) at indexing.c:124
> #7  0x1c08f902 in CatalogUpdateIndexes (heapRel=0x7ec0ee7c, 
> heapTuple=0x7df53e4c) at indexing.c:149
> #8  0x1c17255c in SetRelationRuleStatus (relationId=10951, relHasRules=1 
> '\001', relIsBecomingView=0 '\0')
>     at rewriteSupport.c:80
> #9  0x1c16f074 in DefineQueryRewrite (stmt=0x82ca7e78) at 
> rewriteDefine.c:432
> #10 0x1c18c822 in PortalRunUtility (portal=0x82cb201c, query=0x81897484, 
> dest=0x3c0c8ff4,
>     completionTag=0xcf7df2a0 "") at pquery.c:1063
> #11 0x1c18cb12 in PortalRunMulti (portal=0x82cb201c, dest=0x3c0c8ff4, 
> altdest=0x3c0c8ff4,
>     completionTag=0xcf7df2a0 "") at pquery.c:1131
> #12 0x1c18c28a in PortalRun (portal=0x82cb201c, count=2147483647, 
> dest=0x3c0c8ff4, altdest=0x3c0c8ff4,
>     completionTag=0xcf7df2a0 "") at pquery.c:700
> #13 0x1c187d13 in exec_simple_query (
>     query_string=0x84a2501c "/*\n * PostgreSQL System Views\n *\n * 
> Copyright (c) 1996-2007, PostgreSQL Global Development Group\n *\n * 
> $PostgreSQL: pgsql/src/backend/catalog/system_views.sql,v 1.35 
> 2007/01/05 22:19:25 momjian Exp $\n"...) at postgres.c:939
> #14 0x1c18abb2 in PostgresMain (argc=1, argv=0xcf7df480, 
> username=0x80ba55b0 "mastermind") at postgres.c:3423
> #15 0x1c122102 in main (argc=10, argv=0xcf7df47c) at main.c:186
> 
> 
> Stefan
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>       choose an index scan if your joining column's datatypes do not
>       match


--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Stefan Kaltenbrunner wrote:
> Bruce Momjian wrote:
>> Log Message:
>> -----------
>> Reduce WAL activity for page splits:
>>
>>> Currently, an index split writes all the data on the split page to
>>> WAL. That's a lot of WAL traffic. The tuples that are copied to the
>>> right page need to be WAL logged, but the tuples that stay on the
>>> original page don't.
>
>
> this patch killed both of my OpenBSD buildfarm members

lopaque was referenced after pfreeing the temp page it pointed to. Also
later in the function the LSN of the left page was set, but again using
a pointer to the pfreed temp copy instead of the real shared memory buffer.

Here's a fix.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/access/nbtree/nbtinsert.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtinsert.c,v
retrieving revision 1.150
diff -c -r1.150 nbtinsert.c
*** src/backend/access/nbtree/nbtinsert.c    8 Feb 2007 05:05:53 -0000    1.150
--- src/backend/access/nbtree/nbtinsert.c    8 Feb 2007 09:36:05 -0000
***************
*** 953,959 ****
          xlrec.rightsib = BufferGetBlockNumber(rbuf);
          xlrec.firstright = firstright;
          xlrec.rnext = ropaque->btpo_next;
!         xlrec.level = lopaque->btpo.level;

          rdata[0].data = (char *) &xlrec;
          rdata[0].len = SizeOfBtreeSplit;
--- 953,959 ----
          xlrec.rightsib = BufferGetBlockNumber(rbuf);
          xlrec.firstright = firstright;
          xlrec.rnext = ropaque->btpo_next;
!         xlrec.level = ropaque->btpo.level;

          rdata[0].data = (char *) &xlrec;
          rdata[0].len = SizeOfBtreeSplit;
***************
*** 962,968 ****
          lastrdata = &rdata[0];

          /* Log downlink on non-leaf pages. */
!         if (lopaque->btpo.level > 0)
          {
              lastrdata->next = lastrdata + 1;
              lastrdata++;
--- 962,968 ----
          lastrdata = &rdata[0];

          /* Log downlink on non-leaf pages. */
!         if (ropaque->btpo.level > 0)
          {
              lastrdata->next = lastrdata + 1;
              lastrdata++;
***************
*** 1040,1047 ****

          recptr = XLogInsert(RM_BTREE_ID, xlinfo, rdata);

!         PageSetLSN(leftpage, recptr);
!         PageSetTLI(leftpage, ThisTimeLineID);
          PageSetLSN(rightpage, recptr);
          PageSetTLI(rightpage, ThisTimeLineID);
          if (!P_RIGHTMOST(ropaque))
--- 1040,1047 ----

          recptr = XLogInsert(RM_BTREE_ID, xlinfo, rdata);

!         PageSetLSN(origpage, recptr);
!         PageSetTLI(origpage, ThisTimeLineID);
          PageSetLSN(rightpage, recptr);
          PageSetTLI(rightpage, ThisTimeLineID);
          if (!P_RIGHTMOST(ropaque))

Re: Re: [COMMITTERS] pgsql: Reduce WAL activity for page splits: > Currently, an index split

From
Stefan Kaltenbrunner
Date:
Heikki Linnakangas wrote:
> Stefan Kaltenbrunner wrote:
>> Bruce Momjian wrote:
>>> Log Message:
>>> -----------
>>> Reduce WAL activity for page splits:
>>>
>>>> Currently, an index split writes all the data on the split page to
>>>> WAL. That's a lot of WAL traffic. The tuples that are copied to the
>>>> right page need to be WAL logged, but the tuples that stay on the
>>>> original page don't.
>>
>>
>> this patch killed both of my OpenBSD buildfarm members
> 
> lopaque was referenced after pfreeing the temp page it pointed to. Also 
> later in the function the LSN of the left page was set, but again using 
> a pointer to the pfreed temp copy instead of the real shared memory buffer.
> 
> Here's a fix.

confirmed - with that patch -HEAD passes a full regression test run at 
least on emu.


Stefan


Heikki Linnakangas wrote:
> Stefan Kaltenbrunner wrote:
> >Bruce Momjian wrote:
> >>Log Message:
> >>-----------
> >>Reduce WAL activity for page splits:
> >>
> >>>Currently, an index split writes all the data on the split page to
> >>>WAL. That's a lot of WAL traffic. The tuples that are copied to the
> >>>right page need to be WAL logged, but the tuples that stay on the
> >>>original page don't.
> >
> >
> >this patch killed both of my OpenBSD buildfarm members

Please note that zebra died for a different reason the second time (not
enough shared memory).

> lopaque was referenced after pfreeing the temp page it pointed to. Also 
> later in the function the LSN of the left page was set, but again using 
> a pointer to the pfreed temp copy instead of the real shared memory buffer.

Applied, thanks.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Re: [COMMITTERS] pgsql: Reduce WAL activity for page splits: > Currently, an index split

From
Stefan Kaltenbrunner
Date:
Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
>> Stefan Kaltenbrunner wrote:
>>> Bruce Momjian wrote:
>>>> Log Message:
>>>> -----------
>>>> Reduce WAL activity for page splits:
>>>>
>>>>> Currently, an index split writes all the data on the split page to
>>>>> WAL. That's a lot of WAL traffic. The tuples that are copied to the
>>>>> right page need to be WAL logged, but the tuples that stay on the
>>>>> original page don't.
>>>
>>> this patch killed both of my OpenBSD buildfarm members
> 
> Please note that zebra died for a different reason the second time (not
> enough shared memory).

yeah that's because initdb is leaving the shm-segments there on a failure.
This resulted in the out-of-shm failure on the following run which I 
have manually cleared now ...

Stefan