Thread: pg_autovacuum w/ dbt2

pg_autovacuum w/ dbt2

From

Mark Wong

Date:

21 December 2004, 17:41:47

After all this time I finally got around to vacuuming the database
with dbt2 with pg_autovacuum. :)http://www.osdl.org/projects/dbt2dev/results/dev4-010/215/

Doesn't look so good though, probably because I'm not using optimal
settings with pg_autovacuum.  So far I have only tried the default
settings (running without any arguments, except -D).

The only thing that's peculiar is a number of unexpected rollbacks
across all of the transactions.  I suspect it was something to do with
these messages coming from pg_autovacuum:

[2004-12-20 15:48:18 PST] ERROR:   Can not refresh statistics information from the database dbt2.
[2004-12-20 15:48:18 PST]          The error is [ERROR:  failed to re-find parent key in "pk_district"
]

This is with 8.0rc1.  I can get rc2 installed since it just came out.
So let me know what I can try and what not.

Mark

Re: pg_autovacuum w/ dbt2

From

Tom Lane

Date:

21 December 2004, 19:23:50

Mark Wong <markw@osdl.org> writes:
> [2004-12-20 15:48:18 PST]          The error is [ERROR:  failed to re-find parent key in "pk_district"
> ]

Yikes.  Is this reproducible?
        regards, tom lane

Re: pg_autovacuum w/ dbt2

From

"Matthew T. O'Connor"

Date:

21 December 2004, 19:34:18

Mark Wong wrote:

>After all this time I finally got around to vacuuming the database
>with dbt2 with pg_autovacuum. :)
>    http://www.osdl.org/projects/dbt2dev/results/dev4-010/215/
>  
>
Thanks!

>Doesn't look so good though, probably because I'm not using optimal
>settings with pg_autovacuum.  So far I have only tried the default
>settings (running without any arguments, except -D).
>  
>
I don't know what you mean by "Not Good" since I don't have graphs from 
a similar test without pg_autovacuum handy.  Do you have a link to such 
a test?

As for better pg_autovacuum settings, It appears that the little 
performance dips are happening about once every 5 minutes, which if I 
remember correctly is the default sleep time.  You might try playing 
with the lazy vacuum settings to see if that smooths out the curve.  
Beyond that all you can do is play with the thresholds to see if there 
is a better sweet spot than the defaults (which by the way I have no 
confidence in, they were just conservative guesses)

>The only thing that's peculiar is a number of unexpected rollbacks
>across all of the transactions.  I suspect it was something to do with
>these messages coming from pg_autovacuum:
>
>[2004-12-20 15:48:18 PST] ERROR:   Can not refresh statistics information from the database dbt2.
>[2004-12-20 15:48:18 PST]          The error is [ERROR:  failed to re-find parent key in "pk_district"
>]
>  
>
Not sure what this is all about, but if you turn up the debug level to 4 
or greater (pg_autovacuum -d4), pg_autovacuum will log the query that is 
causing the problems, that would be helpful output to have.

>This is with 8.0rc1.  I can get rc2 installed since it just came out.
>So let me know what I can try and what not.
>  
>
I don't think anything has changed for pg_autovacuum between rc1 and rc2.


thanks again for the good work!!!

Re: pg_autovacuum w/ dbt2

From

Mark Wong

Date:

21 December 2004, 22:44:18

On Tue, Dec 21, 2004 at 02:23:41PM -0500, Tom Lane wrote:
> Mark Wong <markw@osdl.org> writes:
> > [2004-12-20 15:48:18 PST]          The error is [ERROR:  failed to re-find parent key in "pk_district"
> > ]
> 
> Yikes.  Is this reproducible?
> 
>             regards, tom lane

Yes, and I think there is one for each of the rollbacks that are
occuring in the workload.  Except for the 1% that's supposed to happen
for the new-order transaction.

Mark

Re: pg_autovacuum w/ dbt2

From

Mark Wong

Date:

21 December 2004, 22:47:27

The overall throughput is better for a run like this:http://www.osdl.org/projects/dbt2dev/results/dev4-010/207/

A drop from 3865 to 2679 (31%) by just adding pg_autovacuum.  That's
what I meant by "not good". :)

I'll start with the additional debug messages, with 8.0rc2, before
I start changing the other settings, if that sounds good.

Mark

On Tue, Dec 21, 2004 at 02:33:57PM -0500, Matthew T. O'Connor wrote:
> Mark Wong wrote:
> 
> >After all this time I finally got around to vacuuming the database
> >with dbt2 with pg_autovacuum. :)
> >    http://www.osdl.org/projects/dbt2dev/results/dev4-010/215/
> >  
> >
> Thanks!
> 
> >Doesn't look so good though, probably because I'm not using optimal
> >settings with pg_autovacuum.  So far I have only tried the default
> >settings (running without any arguments, except -D).
> >  
> >
> I don't know what you mean by "Not Good" since I don't have graphs from 
> a similar test without pg_autovacuum handy.  Do you have a link to such 
> a test?
>
> As for better pg_autovacuum settings, It appears that the little 
> performance dips are happening about once every 5 minutes, which if I 
> remember correctly is the default sleep time.  You might try playing 
> with the lazy vacuum settings to see if that smooths out the curve.  
> Beyond that all you can do is play with the thresholds to see if there 
> is a better sweet spot than the defaults (which by the way I have no 
> confidence in, they were just conservative guesses)
> 
> >The only thing that's peculiar is a number of unexpected rollbacks
> >across all of the transactions.  I suspect it was something to do with
> >these messages coming from pg_autovacuum:
> >
> >[2004-12-20 15:48:18 PST] ERROR:   Can not refresh statistics information from the database dbt2.
> >[2004-12-20 15:48:18 PST]          The error is [ERROR:  failed to re-find parent key in "pk_district"
> >]
> >  
> >
> Not sure what this is all about, but if you turn up the debug level to 4 
> or greater (pg_autovacuum -d4), pg_autovacuum will log the query that is 
> causing the problems, that would be helpful output to have.
> 
> >This is with 8.0rc1.  I can get rc2 installed since it just came out.
> >So let me know what I can try and what not.
> >  
> >
> I don't think anything has changed for pg_autovacuum between rc1 and rc2.
> 
> 
> thanks again for the good work!!!

Re: pg_autovacuum w/ dbt2

From

Tom Lane

Date:

21 December 2004, 22:56:59

Mark Wong <markw@osdl.org> writes:
> On Tue, Dec 21, 2004 at 02:23:41PM -0500, Tom Lane wrote:
>> Mark Wong <markw@osdl.org> writes:
>>> [2004-12-20 15:48:18 PST]          The error is [ERROR:  failed to re-find parent key in "pk_district"
>> 
>> Yikes.  Is this reproducible?

> Yes, and I think there is one for each of the rollbacks that are
> occuring in the workload.  Except for the 1% that's supposed to happen
> for the new-order transaction.

Well, we need to find out what's causing that.  There are two possible
sources of that error (one elog in src/backend/access/nbtree/nbtinsert.c,
and one in src/backend/access/nbtree/nbtpage.c) and neither of them
should ever fire.

If you want to track it yourself, please change those elog(ERROR)s to
elog(PANIC) so that they'll generate core dumps, then build with
--enable-debug if you didn't already (--enable-cassert would be good too)
and get a debugger stack trace from the core dump.

Otherwise, can you extract a test case that causes this without needing
vast resources to run?
        regards, tom lane

Re: pg_autovacuum w/ dbt2

From

Mark Wong

Date:

21 December 2004, 23:16:17

On Tue, Dec 21, 2004 at 05:56:47PM -0500, Tom Lane wrote:
> Mark Wong <markw@osdl.org> writes:
> > On Tue, Dec 21, 2004 at 02:23:41PM -0500, Tom Lane wrote:
> >> Mark Wong <markw@osdl.org> writes:
> >>> [2004-12-20 15:48:18 PST]          The error is [ERROR:  failed to re-find parent key in "pk_district"
> >> 
> >> Yikes.  Is this reproducible?
> 
> > Yes, and I think there is one for each of the rollbacks that are
> > occuring in the workload.  Except for the 1% that's supposed to happen
> > for the new-order transaction.
> 
> Well, we need to find out what's causing that.  There are two possible
> sources of that error (one elog in src/backend/access/nbtree/nbtinsert.c,
> and one in src/backend/access/nbtree/nbtpage.c) and neither of them
> should ever fire.
> 
> If you want to track it yourself, please change those elog(ERROR)s to
> elog(PANIC) so that they'll generate core dumps, then build with
> --enable-debug if you didn't already (--enable-cassert would be good too)
> and get a debugger stack trace from the core dump.
> 
> Otherwise, can you extract a test case that causes this without needing
> vast resources to run?
> 
>             regards, tom lane

I was going to try Matthew's suggestion of turning up the debug on
pg_autovacuum, unless you don't that'll help find the cause.  I'm not
sure if I can more easily reproduce the problem but i can try.

I'll go ahead and make the elog() changes you recommended and do a run
overnight either way.

Mark

Re: pg_autovacuum w/ dbt2

From

Tom Lane

Date:

21 December 2004, 23:21:22

Mark Wong <markw@osdl.org> writes:
> I was going to try Matthew's suggestion of turning up the debug on
> pg_autovacuum, unless you don't that'll help find the cause.

It won't help --- this is a backend-internal bug of some kind.
        regards, tom lane

Re: pg_autovacuum w/ dbt2

From

"Matthew T. O'Connor"

Date:

21 December 2004, 23:51:32

Mark Wong wrote:

>The overall throughput is better for a run like this:
>    http://www.osdl.org/projects/dbt2dev/results/dev4-010/207/
>
>A drop from 3865 to 2679 (31%) by just adding pg_autovacuum.  That's
>what I meant by "not good". :)
>  
>
I would agree that is "not good" :-)  It sounds like pg_autovacuum is 
being to aggressive for this type of load, that is vacuuming more often 
than needed, however the lazy vacuum options were added so as to reduce 
the performance impact of running a vacuum while doing other things, so, 
I would recommend both higher autovacuum thresholds and trying out some 
of the lazy vacuum settings. 

>I'll start with the additional debug messages, with 8.0rc2, before
>I start changing the other settings, if that sounds good.
>

Sounds fine.  From Tom Lane's response, we have a backend bug that needs 
to be resolved and I think that is the priority.

Re: [Testperf-general] pg_autovacuum w/ dbt2

From

Mark Wong

Date:

07 January 2005, 00:43:57

I apologize for the significant delay, here's a link to results to a
test with 8.0rc3:http://www.osdl.org/projects/dbt2dev/results/dev4-010/236/

These are the same parameters with as run 215, listed below with the
but with --enable-debug --enable-cassert.  I also ran pg_autovacuum
with -d4, where the output can be seen here:http://www.osdl.org/projects/dbt2dev/results/dev4-010/236/dbt2.out

I, uh, wasn't able to reproduce the previous errors after repairing my
filesystems after a power outage.  So I figure that might be good news.
The performance is up from run 215 with the errors, so I'll continue
with trying to tune some of the pg_autovacuum values.

Mark

On Tue, Dec 21, 2004 at 09:41:31AM -0800, Mark Wong wrote:
> After all this time I finally got around to vacuuming the database
> with dbt2 with pg_autovacuum. :)
>     http://www.osdl.org/projects/dbt2dev/results/dev4-010/215/
> 
> Doesn't look so good though, probably because I'm not using optimal
> settings with pg_autovacuum.  So far I have only tried the default
> settings (running without any arguments, except -D).
> 
> The only thing that's peculiar is a number of unexpected rollbacks
> across all of the transactions.  I suspect it was something to do with
> these messages coming from pg_autovacuum:
> 
> [2004-12-20 15:48:18 PST] ERROR:   Can not refresh statistics information from the database dbt2.
> [2004-12-20 15:48:18 PST]          The error is [ERROR:  failed to re-find parent key in "pk_district"
> ]
> 
> This is with 8.0rc1.  I can get rc2 installed since it just came out.
> So let me know what I can try and what not.
> 
> Mark

Re: [Testperf-general] pg_autovacuum w/ dbt2

From

"Matthew T. O'Connor"

Date:

07 January 2005, 13:51:17

I'm curious, the original run you posted with 3825 NOTPM is still 17% 
faster than the latest pg_autovacuum run which shows 3280 NOTPM.  Is 
this on the same hardware?  Also, did the original non-pg_autovacuum run 
any manual vacuum commands?  Also, does the non-pg_autovacuum run start 
slowing down after a while? The graphs look like there is a slight 
decline in performance as time goes on, what happens if you double the 
length of the test?

Thanks for doing the testing!

Matthew


Mark Wong wrote:

>I apologize for the significant delay, here's a link to results to a
>test with 8.0rc3:
>    http://www.osdl.org/projects/dbt2dev/results/dev4-010/236/
>
>These are the same parameters with as run 215, listed below with the
>but with --enable-debug --enable-cassert.  I also ran pg_autovacuum
>with -d4, where the output can be seen here:
>    http://www.osdl.org/projects/dbt2dev/results/dev4-010/236/dbt2.out
>
>I, uh, wasn't able to reproduce the previous errors after repairing my
>filesystems after a power outage.  So I figure that might be good news.
>The performance is up from run 215 with the errors, so I'll continue
>with trying to tune some of the pg_autovacuum values.
>
>Mark
>
>
>On Tue, Dec 21, 2004 at 09:41:31AM -0800, Mark Wong wrote:
>  
>
>>After all this time I finally got around to vacuuming the database
>>with dbt2 with pg_autovacuum. :)
>>    http://www.osdl.org/projects/dbt2dev/results/dev4-010/215/
>>
>>Doesn't look so good though, probably because I'm not using optimal
>>settings with pg_autovacuum.  So far I have only tried the default
>>settings (running without any arguments, except -D).
>>
>>The only thing that's peculiar is a number of unexpected rollbacks
>>across all of the transactions.  I suspect it was something to do with
>>these messages coming from pg_autovacuum:
>>
>>[2004-12-20 15:48:18 PST] ERROR:   Can not refresh statistics information from the database dbt2.
>>[2004-12-20 15:48:18 PST]          The error is [ERROR:  failed to re-find parent key in "pk_district"
>>]
>>
>>This is with 8.0rc1.  I can get rc2 installed since it just came out.
>>So let me know what I can try and what not.
>>
>>Mark
>>    
>>
>
>  
>

Re: [Testperf-general] pg_autovacuum w/ dbt2

From

Mark Wong

Date:

07 January 2005, 16:48:56

Yeah, same hardware and database configuration.

No manual vacuum commands before.  The decline in performance has been
pretty consistent in all my previous tests and people have told me on
many occasions that the decline in performance was probably because I
was never using vacuum...

Mark

On Fri, Jan 07, 2005 at 08:48:52AM -0500, Matthew T. O'Connor wrote:
> I'm curious, the original run you posted with 3825 NOTPM is still 17% 
> faster than the latest pg_autovacuum run which shows 3280 NOTPM.  Is 
> this on the same hardware?  Also, did the original non-pg_autovacuum run 
> any manual vacuum commands?  Also, does the non-pg_autovacuum run start 
> slowing down after a while? The graphs look like there is a slight 
> decline in performance as time goes on, what happens if you double the 
> length of the test?
> 
> Thanks for doing the testing!
> 
> Matthew
> 
> 
> Mark Wong wrote:
> 
> >I apologize for the significant delay, here's a link to results to a
> >test with 8.0rc3:
> >    http://www.osdl.org/projects/dbt2dev/results/dev4-010/236/
> >
> >These are the same parameters with as run 215, listed below with the
> >but with --enable-debug --enable-cassert.  I also ran pg_autovacuum
> >with -d4, where the output can be seen here:
> >    http://www.osdl.org/projects/dbt2dev/results/dev4-010/236/dbt2.out
> >
> >I, uh, wasn't able to reproduce the previous errors after repairing my
> >filesystems after a power outage.  So I figure that might be good news.
> >The performance is up from run 215 with the errors, so I'll continue
> >with trying to tune some of the pg_autovacuum values.
> >
> >Mark
> >
> >
> >On Tue, Dec 21, 2004 at 09:41:31AM -0800, Mark Wong wrote:
> >  
> >
> >>After all this time I finally got around to vacuuming the database
> >>with dbt2 with pg_autovacuum. :)
> >>    http://www.osdl.org/projects/dbt2dev/results/dev4-010/215/
> >>
> >>Doesn't look so good though, probably because I'm not using optimal
> >>settings with pg_autovacuum.  So far I have only tried the default
> >>settings (running without any arguments, except -D).
> >>
> >>The only thing that's peculiar is a number of unexpected rollbacks
> >>across all of the transactions.  I suspect it was something to do with
> >>these messages coming from pg_autovacuum:
> >>
> >>[2004-12-20 15:48:18 PST] ERROR:   Can not refresh statistics information from the database dbt2.
> >>[2004-12-20 15:48:18 PST]          The error is [ERROR:  failed to re-find parent key in "pk_district"
> >>]
> >>
> >>This is with 8.0rc1.  I can get rc2 installed since it just came out.
> >>So let me know what I can try and what not.
> >>
> >>Mark
> >>    
> >>
> >
> >  
> >

-- 
Mark Wong - - markw@osdl.org
Open Source Development Lab Inc - A non-profit corporation
12725 SW Millikan Way - Suite 400 - Beaverton, OR 97005
(503) 906-1932 (office)
(503) 626-2436 (fax)
http://developer.osdl.org/markw/

Re: [Testperf-general] pg_autovacuum w/ dbt2

From

"Matthew T. O'Connor"

Date:

07 January 2005, 17:29:35

Ok, but what I'm curious to do is see if you run the non-pg_autovacuum 
test for a "long time" (4 hours? more?) when does it get slower that 
running with pg_autovacuum.  And, can you demonstrate that running the 
tests with pg_autovacuum for a long time (say 4 hours) that the 
performance stays steady.

Also, I would very much like to see this test run with pg_autovacuum and 
it's vacuum delay settings enabled.

Matthew

ps: I know time is limited and these tests take a lot of time to run, so 
please take my requests with a grain of salt, all I'm saying is that I 
think these would be interesting results to see.

Mark Wong wrote:

>Yeah, same hardware and database configuration.
>
>No manual vacuum commands before.  The decline in performance has been
>pretty consistent in all my previous tests and people have told me on
>many occasions that the decline in performance was probably because I
>was never using vacuum...
>
>Mark
>
>On Fri, Jan 07, 2005 at 08:48:52AM -0500, Matthew T. O'Connor wrote:
>  
>
>>I'm curious, the original run you posted with 3825 NOTPM is still 17% 
>>faster than the latest pg_autovacuum run which shows 3280 NOTPM.  Is 
>>this on the same hardware?  Also, did the original non-pg_autovacuum run 
>>any manual vacuum commands?  Also, does the non-pg_autovacuum run start 
>>slowing down after a while? The graphs look like there is a slight 
>>decline in performance as time goes on, what happens if you double the 
>>length of the test?
>>
>>Thanks for doing the testing!
>>
>>Matthew
>>    
>>

Re: [Testperf-general] pg_autovacuum w/ dbt2

From

Josh Berkus

Date:

07 January 2005, 17:59:27

Mark,

> No manual vacuum commands before.  The decline in performance has been
> pretty consistent in all my previous tests and people have told me on
> many occasions that the decline in performance was probably because I
> was never using vacuum...

Hmmm ... what autovacuum params are you using?  From the look of the graphs, 
VACUUM is running every 6 minutes or so, and having a huge impact on 
performance when it does run.  Give me a bit of time to crunch the database 
log, and we'll see which table vacuums were taking the most time.

Can you re-run this test with a fairly generous vacuum_delay setting? Given 
that the performance drops seem to correspond to wait times, I think this 
could mean a big difference.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

Re: [Testperf-general] pg_autovacuum w/ dbt2

From

Mark Wong

Date:

07 January 2005, 18:02:04

On Fri, Jan 07, 2005 at 09:58:47AM -0800, Josh Berkus wrote:
> Mark,
> 
> > No manual vacuum commands before.  The decline in performance has been
> > pretty consistent in all my previous tests and people have told me on
> > many occasions that the decline in performance was probably because I
> > was never using vacuum...
> 
> Hmmm ... what autovacuum params are you using?  From the look of the graphs, 
> VACUUM is running every 6 minutes or so, and having a huge impact on 
> performance when it does run.  Give me a bit of time to crunch the database 
> log, and we'll see which table vacuums were taking the most time.
> 
> Can you re-run this test with a fairly generous vacuum_delay setting? Given 
> that the performance drops seem to correspond to wait times, I think this 
> could mean a big difference.
> 

All default parameters.  Matthew also recommended using the
vacuum_delay setting so I was about to try that.

Mark

Re: [Testperf-general] pg_autovacuum w/ dbt2

From

Josh Berkus

Date:

07 January 2005, 18:14:30

Mark,

> All default parameters.  Matthew also recommended using the
> vacuum_delay setting so I was about to try that.

Interesting ... the default parameters are quite conservative, running only 
when the table has doubled in new rows.   So if those spikes are vacuums, 
then the DBT2 test is updating or adding 100% of the rows in the orders table 
every 6-8 minutes?    Does that seem right?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

Re: [Testperf-general] pg_autovacuum w/ dbt2

From

Mark Wong

Date:

07 January 2005, 18:30:34

On Fri, Jan 07, 2005 at 10:13:52AM -0800, Josh Berkus wrote:
> Mark,
> 
> > All default parameters.  Matthew also recommended using the
> > vacuum_delay setting so I was about to try that.
> 
> Interesting ... the default parameters are quite conservative, running only 
> when the table has doubled in new rows.   So if those spikes are vacuums, 
> then the DBT2 test is updating or adding 100% of the rows in the orders table 
> every 6-8 minutes?    Does that seem right?
> 

Some hand-wavey math here:

We start with 18,000,000 rows in the orders table.

Based one 3281 transactions per minute for just the new-order transaction:288 transactions per minute for delivery

New-order inserts 3281 rows per minute (not counting the 1% of rollbacks.)
Delivery updates 288 rows per minute in orders.

So no, we're not change that many rows per minute...

Mark

Re: [Testperf-general] pg_autovacuum w/ dbt2

From

"Matthew T. O'Connor"

Date:

07 January 2005, 22:51:20

Mark Wong wrote:

>On Fri, Jan 07, 2005 at 10:13:52AM -0800, Josh Berkus wrote:
>  
>
>>Mark,
>>    
>>
>>>All default parameters.  Matthew also recommended using the
>>>vacuum_delay setting so I was about to try that.
>>>      
>>>
>>Interesting ... the default parameters are quite conservative, running only 
>>when the table has doubled in new rows.   So if those spikes are vacuums, 
>>then the DBT2 test is updating or adding 100% of the rows in the orders table 
>>every 6-8 minutes?    Does that seem right?
>>    
>>
>
>Some hand-wavey math here:
>
>We start with 18,000,000 rows in the orders table.
>
>Based one 3281 transactions per minute for just the new-order transaction:
>    288 transactions per minute for delivery
>
>New-order inserts 3281 rows per minute (not counting the 1% of rollbacks.)
>Delivery updates 288 rows per minute in orders.
>
>So no, we're not change that many rows per minute...
>  
>

Except the thing pg_autovacuum is actually doing is Analyzing the 
district table (4800 rows, takes ~15 sec) and vacuuming the warehouse 
table (66,000, takes ~50sec) which takes about 50 seconds.  Since the 
default vacuum sleep time is 5 minutes, that would explain the dip every 
6 minutes.

Matthew

Re: pg_autovacuum w/ dbt2

From

Mark Wong

Date:

12 January 2005, 23:45:10

On Tue, Dec 21, 2004 at 05:56:47PM -0500, Tom Lane wrote:
> If you want to track it yourself, please change those elog(ERROR)s to
> elog(PANIC) so that they'll generate core dumps, then build with
> --enable-debug if you didn't already (--enable-cassert would be good too)
> and get a debugger stack trace from the core dump.

Ok, well I got a core dump with 8.0rc4, but I'm not sure if it's
exactly the same problem.  I have the postgres binary and the core
here:http://developer.osdl.org/markw/pgsql/core/2files.tar.bz2

But it's for ia64, if you got one.  Otherwise, this is what gdb is
telling me with a bt:

(gdb) bt
#0  FunctionCall2 (flinfo=0x6000000000187850, arg1=16, arg2=1043)   at fmgr.c:1141
#1  0x400000000007a320 in _bt_checkkeys (scan=0x60000000000c2d80,    tuple=0x200000000101f660,
dir=ForwardScanDirection,   continuescan=0x60000fffffff8ae0 "\001") at nbtutils.c:542
 
#2  0x4000000000078eb0 in _bt_endpoint (scan=0x6000000000187690,    dir=ForwardScanDirection) at nbtsearch.c:1309
#3  0x40000000000771e0 in _bt_first (scan=0x6000000000187690,    dir=ForwardScanDirection) at nbtsearch.c:482
#4  0x4000000000074350 in btgettuple (fcinfo=0x1) at nbtree.c:265
#5  0x40000000003bd430 in FunctionCall2 (flinfo=0x6000000000187700,    arg1=6917529027642685072, arg2=1) at
fmgr.c:1141
#6  0x400000000006b3a0 in index_getnext (scan=0x6000000000187690,    direction=ForwardScanDirection) at indexam.c:429
#7  0x400000000006a1e0 in systable_getnext (sysscan=0x6000000000187668)   at genam.c:253
#8  0x400000000039c970 in SearchCatCache (cache=0x200000001f1e0140, v1=0,    v2=6917529027641871376, v3=4294966252,
v4=6917546619827097184)  at catcache.c:1217
 
#9  0x40000000003a9ee0 in SearchSysCache (cacheId=33, key1=1043, key2=0,    key3=0, key4=0) at syscache.c:524
#10 0x4000000000049110 in TupleDescInitEntry (desc=0x60000000001872c8,    attributeNumber=4,
attributeName=0x6000000000187614"\023\004",    oidtypeid=1043, typmod=28, attdim=0) at tupdesc.c:444
 
#11 0x40000000001b5fc0 in ExecTypeFromTLInternal (   targetList=0x6000000000135d40, hasoid=-64 'À', skipjunk=1 '\001')
at execTuples.c:570
 
#12 0x40000000001a4a20 in ExecInitJunkFilter (targetList=0x6000000000135b38,    hasoid=-64 'À',
slot=0x60000000001258a0)at execJunk.c:76
 
#13 0x40000000001a6890 in InitPlan (queryDesc=0x6000000000177ed0,    explainOnly=0 '\0') at execMain.c:456
#14 0x40000000001a5800 in ExecutorStart (queryDesc=0x6000000000177ed0,    explainOnly=0 '\0') at execMain.c:160
#15 0x40000000001d6ab0 in _SPI_pquery (queryDesc=0x6000000000177ed0, tcount=0)   at spi.c:1521
#16 0x40000000001d6390 in _SPI_execute_plan (plan=0x60000fffffff9380,    Values=0x0, Nulls=0x0, snapshot=0x0,
crosscheck_snapshot=0x0,   read_only=0 '\0', tcount=0) at spi.c:1452

Re: pg_autovacuum w/ dbt2

From

Tom Lane

Date:

13 January 2005, 02:18:12

Mark Wong <markw@osdl.org> writes:
> Ok, well I got a core dump with 8.0rc4, but I'm not sure if it's
> exactly the same problem.  I have the postgres binary and the core
> here:
>     http://developer.osdl.org/markw/pgsql/core/2files.tar.bz2
> But it's for ia64, if you got one.

Poking around with gdb, it seems that the scankey structure being used
by SearchCatCache got clobbered; which is a bit surprising because
that's just a local variable in that function, and hence isn't really
very exposed.  The contents of cache->cc_skey are okay, but cur_skey[0]
and cur_skey[1] don't match, which implies the clobber happened
somewhere between lines 1110 and 1217 of catcache.c.

(gdb) f 8
#8  0x400000000039c970 in SearchCatCache (cache=0x200000001f1e0140, v1=0,   v2=6917529027641871376, v3=4294966252,
v4=6917546619827097184)  at catcache.c:1217
 
1217    in catcache.c
(gdb) p cache->cc_skey
$7 = {{sk_flags = 0, sk_attno = -2, sk_strategy = 3, sk_subtype = 0,   sk_func = {fn_addr = 0x200000000003a9c8, fn_oid
=184, fn_nargs = 2,     fn_strict = 1 '\001', fn_retset = 0 '\0', fn_extra = 0x0,     fn_mcxt = 0x600000000009e550,
fn_expr= 0x0}, sk_argument = 0}, {   sk_flags = 0, sk_attno = 0, sk_strategy = 0, sk_subtype = 0, sk_func = {
fn_addr= 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0',     fn_retset = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0,
fn_expr= 0x0},   sk_argument = 0}, {sk_flags = 0, sk_attno = 0, sk_strategy = 0,   sk_subtype = 0, sk_func = {fn_addr =
0,fn_oid = 0, fn_nargs = 0,     fn_strict = 0 '\0', fn_retset = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0,     fn_expr =
0x0},sk_argument = 0}, {sk_flags = 0, sk_attno = 0,   sk_strategy = 0, sk_subtype = 0, sk_func = {fn_addr = 0, fn_oid =
0,    fn_nargs = 0, fn_strict = 0 '\0', fn_retset = 0 '\0', fn_extra = 0x0,     fn_mcxt = 0x0, fn_expr = 0x0},
sk_argument= 0}}
 
(gdb) p cur_skey
$8 = {{sk_flags = 0, sk_attno = 1, sk_strategy = 24932, sk_subtype = 24948,   sk_func = {fn_addr = 0, fn_oid = 0,
fn_nargs= 0, fn_strict = 0 '\0',     fn_retset = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0},   sk_argument =
1043},{sk_flags = 0, sk_attno = 1043, sk_strategy = 0,   sk_subtype = 4294967295, sk_func = {fn_addr = 0, fn_oid = 0,
 fn_nargs = 0, fn_strict = 0 '\0', fn_retset = 0 '\0', fn_extra = 0x0,     fn_mcxt = 0x0, fn_expr = 0x0}, sk_argument =
0},{sk_flags = 0,   sk_attno = 0, sk_strategy = 0, sk_subtype = 0, sk_func = {fn_addr = 0,     fn_oid = 0, fn_nargs =
0,fn_strict = 0 '\0', fn_retset = 0 '\0',     fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0}, sk_argument = 0}, {
sk_flags= 0, sk_attno = 0, sk_strategy = 0, sk_subtype = 0, sk_func = {     fn_addr = 0, fn_oid = 0, fn_nargs = 0,
fn_strict= 0 '\0',     fn_retset = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0},   sk_argument = 0}}
 

The core dump happens because we eventually try to jump through the
zeroed-out fn_addr function pointer.

Not sure what to make of this.  That's extremely heavily used,
well-debugged code; it's hard to believe that there are any intermittent
bugs in it.

I notice that the backend seems to have been using some nonstandard C
code:

Error while reading shared library symbols:
/home/markw/dbt2/storedproc/pgsql/c/../../../storedproc/pgsql/c/funcs.so: No such file or directory.

What is that, and how much confidence have you got in it?
        regards, tom lane

Re: pg_autovacuum w/ dbt2

From

Mark Wong

Date:

13 January 2005, 07:49:18

On Wed, Jan 12, 2005 at 09:17:33PM -0500, Tom Lane wrote:
> I notice that the backend seems to have been using some nonstandard C
> code:
>
> Error while reading shared library symbols:
> /home/markw/dbt2/storedproc/pgsql/c/../../../storedproc/pgsql/c/funcs.so: No such file or directory.
>
> What is that, and how much confidence have you got in it?

That's my C stored function library.  I'll attached it if anyone wants
to take a persusal.  Well, it was my first attempt with C stored
functions and SPI calls, so it wouldn't surprise me if it was flawed.
Would supplying the .so help the debugging?

Mark

Attachment

funcs.c