Thread: What does this tell me?

What does this tell me?

From
"Josh Berkus"
Date:
Folks,

I'm still having trouble with my massive data transformation procedures
taking forever to finish.   Particularly, many of them will get about
1/2 way through, and then I will start seeing this in the log:

2002-10-08 20:37:13 DEBUG:  recycled transaction log file
0000000A000000E4
2002-10-08 20:37:13 DEBUG:  recycled transaction log file
0000000A000000E5
2002-10-08 20:37:13 DEBUG:  recycled transaction log file
0000000A000000E6
2002-10-08 20:37:13 DEBUG:  recycled transaction log file
0000000A000000E7
2002-10-08 20:37:13 DEBUG:  recycled transaction log file
0000000A000000E8
2002-10-08 20:37:13 DEBUG:  recycled transaction log file
0000000A000000E9
2002-10-08 20:37:13 DEBUG:  recycled transaction log file
0000000A000000EA
2002-10-08 20:37:13 DEBUG:  recycled transaction log file
0000000A000000EB
2002-10-08 20:37:13 DEBUG:  recycled transaction log file
0000000A000000EC
2002-10-08 20:37:13 DEBUG:  recycled transaction log file
0000000A000000ED
2002-10-08 20:37:13 DEBUG:  recycled transaction log file
0000000A000000EE
2002-10-08 20:37:13 DEBUG:  recycled transaction log file
0000000A000000EF
2002-10-08 20:37:13 DEBUG:  reaping dead processes
2002-10-08 20:37:13 DEBUG:  child process (pid 15270) exited with exit
code 0

... repeat ad nauseum.   The problem is, each "recycle transaction log
... reaping dead child process" cycle takes about 4-7 minutes ...
meaning that the procedure can take up to 1/2 hour to finish, and
sometimes not finish at all.

Obviously, the system is telling me that it is running out of resources
somehow.  But I'm at my wit's end to figure out what resources,
exactly.   Suggestions?

-Josh Berkus

Re: What does this tell me?

From
Bruce Momjian
Date:
I think all it means is that is doesn't need some of the pg_clog files
and is reusing them, basically meaning you are pushing through lots of
transactions.  I don't see it as a problem.

---------------------------------------------------------------------------

Josh Berkus wrote:
> Folks,
>
> I'm still having trouble with my massive data transformation procedures
> taking forever to finish.   Particularly, many of them will get about
> 1/2 way through, and then I will start seeing this in the log:
>
> 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> 0000000A000000E4
> 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> 0000000A000000E5
> 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> 0000000A000000E6
> 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> 0000000A000000E7
> 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> 0000000A000000E8
> 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> 0000000A000000E9
> 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> 0000000A000000EA
> 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> 0000000A000000EB
> 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> 0000000A000000EC
> 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> 0000000A000000ED
> 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> 0000000A000000EE
> 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> 0000000A000000EF
> 2002-10-08 20:37:13 DEBUG:  reaping dead processes
> 2002-10-08 20:37:13 DEBUG:  child process (pid 15270) exited with exit
> code 0
>
> ... repeat ad nauseum.   The problem is, each "recycle transaction log
> ... reaping dead child process" cycle takes about 4-7 minutes ...
> meaning that the procedure can take up to 1/2 hour to finish, and
> sometimes not finish at all.
>
> Obviously, the system is telling me that it is running out of resources
> somehow.  But I'm at my wit's end to figure out what resources,
> exactly.   Suggestions?
>
> -Josh Berkus
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: What does this tell me?

From
Sean Chittenden
Date:
> 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> 0000000A000000E4
> 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> 0000000A000000E5
> 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> 0000000A000000E6
> 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> 0000000A000000E7
> 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> 0000000A000000E8
> 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> 0000000A000000E9
> 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> 0000000A000000EA
> 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> 0000000A000000EB
> 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> 0000000A000000EC
> 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> 0000000A000000ED
> 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> 0000000A000000EE
> 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> 0000000A000000EF
> 2002-10-08 20:37:13 DEBUG:  reaping dead processes
> 2002-10-08 20:37:13 DEBUG:  child process (pid 15270) exited with exit
> code 0
>
> ... repeat ad nauseum.  The problem is, each "recycle transaction
> log ... reaping dead child process" cycle takes about 4-7 minutes
> ...  meaning that the procedure can take up to 1/2 hour to finish,
> and sometimes not finish at all.
>
> Obviously, the system is telling me that it is running out of
> resources somehow.  But I'm at my wit's end to figure out what
> resources, exactly.  Suggestions?

You're running out of WAL log space, iirc.  Increase the number of WAL
logs available and you should be okay.  If you're experiencing this
halfway through, I'd increase the size by 50%, say maybe 60-70% for
good measure.  -sc

--
Sean Chittenden

Re: What does this tell me?

From
Bruce Momjian
Date:
Sean Chittenden wrote:
> > 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> > 0000000A000000E4
> > 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> > 0000000A000000E5
> > 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> > 0000000A000000E6
> > 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> > 0000000A000000E7
> > 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> > 0000000A000000E8
> > 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> > 0000000A000000E9
> > 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> > 0000000A000000EA
> > 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> > 0000000A000000EB
> > 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> > 0000000A000000EC
> > 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> > 0000000A000000ED
> > 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> > 0000000A000000EE
> > 2002-10-08 20:37:13 DEBUG:  recycled transaction log file
> > 0000000A000000EF
> > 2002-10-08 20:37:13 DEBUG:  reaping dead processes
> > 2002-10-08 20:37:13 DEBUG:  child process (pid 15270) exited with exit
> > code 0
> >
> > ... repeat ad nauseum.  The problem is, each "recycle transaction
> > log ... reaping dead child process" cycle takes about 4-7 minutes
> > ...  meaning that the procedure can take up to 1/2 hour to finish,
> > and sometimes not finish at all.
> >
> > Obviously, the system is telling me that it is running out of
> > resources somehow.  But I'm at my wit's end to figure out what
> > resources, exactly.  Suggestions?
>
> You're running out of WAL log space, iirc.  Increase the number of WAL
> logs available and you should be okay.  If you're experiencing this
> halfway through, I'd increase the size by 50%, say maybe 60-70% for
> good measure.  -sc

Oh, yes, you are right.  My hardware tuning guide mentions it.  Strange
it is called the transaction log file:

    http://www.ca.postgresql.org/docs/momjian/hw_performance/

Unless you are seeing this more freqently than every minute, it should
be fine.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: What does this tell me?

From
"Josh Berkus"
Date:
Bruce, Sean,

> Oh, yes, you are right.  My hardware tuning guide mentions it.
>  Strange
> it is called the transaction log file:
>
>  http://www.ca.postgresql.org/docs/momjian/hw_performance/
>
> Unless you are seeing this more freqently than every minute, it
> should
> be fine.

Actually, it's apparently a real problem, because the function never
completes.   Each cycle of "recycling transaction logs" takes longer
and longer, and eventually locks up completely.

What the function is doing is a succession of data cleanup procedures,
updating the same table about 50 times.  I will be very thankful for
the day when I can commit within a procedure.

Unfortunately, I am already at the maximum number of WAL files (64).
 What do I do now?

-Josh Berkus


Re: What does this tell me?

From
Bruce Momjian
Date:
Josh Berkus wrote:
> Bruce, Sean,
>
> > Oh, yes, you are right.  My hardware tuning guide mentions it.
> >  Strange
> > it is called the transaction log file:
> >
> >  http://www.ca.postgresql.org/docs/momjian/hw_performance/
> >
> > Unless you are seeing this more freqently than every minute, it
> > should
> > be fine.
>
> Actually, it's apparently a real problem, because the function never
> completes.   Each cycle of "recycling transaction logs" takes longer
> and longer, and eventually locks up completely.
>
> What the function is doing is a succession of data cleanup procedures,
> updating the same table about 50 times.  I will be very thankful for
> the day when I can commit within a procedure.
>
> Unfortunately, I am already at the maximum number of WAL files (64).
>  What do I do now?

Wow, that is interesting.  I thought one big transaction wouldn't lock
up the WAL records.  I figured there would be a CHECKPOINT, and then the
WAL records could be recycled, even though the transaction is still
open.

Where do you see 64 as the maximum number of WAL segments.  What is your
checkpoint_segments value?  The actual number of files shouldn't be much
more than twice that value.  What PostgreSQL version are you using?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: What does this tell me?

From
Sean Chittenden
Date:
> > Oh, yes, you are right.  My hardware tuning guide mentions it.
> > Strange it is called the transaction log file:
> >
> >  http://www.ca.postgresql.org/docs/momjian/hw_performance/
> >
> > Unless you are seeing this more freqently than every minute, it
> > should
> > be fine.
>
> Actually, it's apparently a real problem, because the function never
> completes.  Each cycle of "recycling transaction logs" takes longer
> and longer, and eventually locks up completely.
>
> What the function is doing is a succession of data cleanup
> procedures, updating the same table about 50 times.  I will be very
> thankful for the day when I can commit within a procedure.
>
> Unfortunately, I am already at the maximum number of WAL files (64).
> What do I do now?

Isn't it possible to increase the size of your wal logs?  I seem to
remember a tunable existing, but I can't find it in the default
config.  Someone else know how off the top of their head?  -sc

--
Sean Chittenden

Re: What does this tell me?

From
"Josh Berkus"
Date:
Bruce,

> > First, an interesting wierdness from the VACUUM FULL ANALYZE:
> > Analyzing elbs_clidesc
> > 2002-10-08 21:08:08 DEBUG:  SIInsertDataEntry: table is 70% full,
> > signaling postmaster
> >
> > Huh?
>
> Well, you are dealing with elbs.  That is the problem.  ;-)

<grin>  As you probably guessed, the purpose of these procedures is to
take a large amount (about 60mb) of not-normalized data from ELBS and
normalize it for our web-based case management system.

What's really frustrating about it is that we're only going to be doing
this for 2-3 months before we jettison ELBS for reasons that should be
obvious to you.  But for those 2-3 months, the data transfer needs to
work well, and right now it doesn't even finish.

> You shoulnd't need that and it shouldn't lock up when it gets to 64.
>  It
> should checkpoint and move on. The only problem with it being lower
> is
> that it will checkpoint more often.

Well, I'll try 128 and see if that helps any.

>
> > Rest of postgresql.conf params after my signature.   All
> suggestions
> > are welcome.  This server has been acting "sick" since I started
> with
> > it, under-performing my workstation and MS SQL Server.  Either I've
> set
> > something wrong, or there's a hardware problem I need to track
> down.
> >
> > BTW, is there any problem for postgres in turning the fill access
> time
> > recorder in the host filesystem off?  This is often good for a
> minor
> > performance gain.
>
> No problem.
>
> You might want to try pgbench and see if that works.

Yeah.  I was planning on that -- as well as the postgresql.conf tuner
-- as soon as I can get through one data transfer so that I have a
little working time.

-Josh Berkus

Re: What does this tell me?

From
Bruce Momjian
Date:
Josh Berkus wrote:
> > > Rest of postgresql.conf params after my signature.   All
> > suggestions
> > > are welcome.  This server has been acting "sick" since I started
> > with
> > > it, under-performing my workstation and MS SQL Server.  Either I've
> > set
> > > something wrong, or there's a hardware problem I need to track
> > down.
> > >
> > > BTW, is there any problem for postgres in turning the fill access
> > time
> > > recorder in the host filesystem off?  This is often good for a
> > minor
> > > performance gain.
> >
> > No problem.
> >
> > You might want to try pgbench and see if that works.
>
> Yeah.  I was planning on that -- as well as the postgresql.conf tuner
> -- as soon as I can get through one data transfer so that I have a
> little working time.

I was suggesting pgbench because the system should never lock up on you.
Maybe something is very wrong.

What happens if you issue the CHECKPOINT command?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: What does this tell me?

From
Joe Conway
Date:
Josh Berkus wrote:
> What the function is doing is a succession of data cleanup procedures,
> updating the same table about 50 times.  I will be very thankful for
> the day when I can commit within a procedure.

If that's the case, can you split the work up into multiple functions, and
execute them all from a shell script? Or perhaps even offload some of the data
massaging to perl or something? (It would be easier to recommend alternate
approaches with more details.)

Joe



Re: What does this tell me?

From
Tom Lane
Date:
"Josh Berkus" <josh@agliodbs.com> writes:
> Actually, it's apparently a real problem, because the function never
> completes.   Each cycle of "recycling transaction logs" takes longer
> and longer, and eventually locks up completely.
> What the function is doing is a succession of data cleanup procedures,
> updating the same table about 50 times.  I will be very thankful for
> the day when I can commit within a procedure.

I think you are barking up the wrong tree.

The messages you show are perfectly normal operation, and prove nothing
much except that you pumped a lot of database updates through the
system.  I think there's something wrong with your data transformation
application logic; or perhaps you are pumping so many updates through
your tables that you need some intermediate VACUUMs to get rid of
dead tuples.  But messing with the WAL log parameters isn't going to
do a darn thing for you ... IMHO anyway.

            regards, tom lane

Re: What does this tell me?

From
"Josh Berkus"
Date:
Joe,

> If that's the case, can you split the work up into multiple
> functions, and execute them all from a shell script? Or perhaps even
> offload some of the data massaging to perl or something? (It would be
> easier to recommend alternate approaches with more details.)

I've already split it up into 11 functions, which are being managed
through Perl with ANALYZE statements between.   Breaking it down
further would be really unmanageable.

Not to be mean or anything (after all, I just joined pgsql-advocacy),
I'm getting *much* worse performance on large data transformations from
PostgreSQL 7.2.1, than I get from SQL Server 7.0 on inferior hardware
(at least, except where SQL Server 7.0 crashes). I really am determined
to prove that it's because I've misconfigured it, and I thank all of
you for your help in doing so.

PGBench Results:
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 100
number of transactions per client: 10
number of transactions actually processed: 1000/1000
tps = 93.206356(including connections establishing)
tps = 103.237007(excluding connections establishing)

Of course, I don't have much to compare these to, so I don't know if
that's good or bad.

-Josh Berkus



Re: What does this tell me?

From
Robert Treat
Date:
On Wed, 2002-10-09 at 01:22, Josh Berkus wrote:
> Joe,
>
> > If that's the case, can you split the work up into multiple
> > functions, and execute them all from a shell script? Or perhaps even
> > offload some of the data massaging to perl or something? (It would be
> > easier to recommend alternate approaches with more details.)
>
> I've already split it up into 11 functions, which are being managed
> through Perl with ANALYZE statements between.   Breaking it down
> further would be really unmanageable.
>

If I read Tom's suggestion correctly, you should probably change these
to vacuum analyze instead of analyze.

> Not to be mean or anything (after all, I just joined pgsql-advocacy),
> I'm getting *much* worse performance on large data transformations from
> PostgreSQL 7.2.1, than I get from SQL Server 7.0 on inferior hardware
> (at least, except where SQL Server 7.0 crashes).

what?? that's blasphamy!! revoke this mans advocacy membership right
now!!  ;-)


I really am determined
> to prove that it's because I've misconfigured it, and I thank all of
> you for your help in doing so.
>

FWIW I just ran into a similar situation where I was doing 6
simultaneous pg_restores of our production database on my local
workstation.  Apparently this pumps a lot of data through the wal logs.
I did kick up the number of wal files, but I also ended up kicking up
the number of wal_buffers as well and that seemed to help.

Robert Treat