Thread: Curious about dead rows.

From
Jean-David Beyer
Date:
I am doing lots of INSERTs on a table that starts out empty (I did a
TRUNCATE on it). I am not, AFAIK, doing DELETEs or UPDATEs. Autovacuum is
on. I moved logging up to debug2 level to see what was going on, and I get
things like this:

 "vl_as": scanned 3000 of 5296 pages, containing 232944 live rows and 1033
dead rows; 3000 rows in sample, 411224 estimated total rows

A little later, it says:

"vl_as": scanned 3000 of 6916 pages, containing 233507 live rows and 493
dead rows; 3000 rows in sample, 538311 estimated total rows

(I suppose that means autovacuum is working.) Is this normal, or have I got
something wrong? Why so many dead rows when just doing inserts? It is not
that I think the number is too high, considering the number of rows in the
table at the point where I copied this line. It is just that I do not
understand why there are any.
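As a sanity check (my own back-of-the-envelope arithmetic, not anything from the
server), the "estimated total rows" figure appears to be just the live rows in
the sample scaled up to the table's full page count:

```python
# Back-of-the-envelope check: the "estimated total rows" figure looks like
# live rows per sampled page, scaled up to the table's total page count.
def estimate_total_rows(live_rows_in_sample, pages_scanned, total_pages):
    return round(live_rows_in_sample / pages_scanned * total_pages)

# First log line: 232944 live rows in 3000 of 5296 pages
print(estimate_total_rows(232944, 3000, 5296))  # -> 411224, matching the log

# Second log line: 233507 live rows in 3000 of 6916 pages
print(estimate_total_rows(233507, 3000, 6916))  # -> 538311, matching the log
```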

I could easily understand it if I were doing UPDATEs.

postgresql-8.1.9-1.el5

--
  .~.  Jean-David Beyer          Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 11:15:01 up 18 days, 4:33, 4 users, load average: 6.18, 5.76, 5.26

Re: Curious about dead rows.

From
Tom Lane
Date:
Jean-David Beyer <jeandavid8@verizon.net> writes:
> I am doing lots of INSERTs on a table that starts out empty (I did a
> TRUNCATE on it). I am not, AFAIK, doing DELETEs or UPDATEs. Autovacuum is
> on. I moved logging up to debug2 level to see what was going on, and I get
> things like this:

>  "vl_as": scanned 3000 of 5296 pages, containing 232944 live rows and 1033
> dead rows; 3000 rows in sample, 411224 estimated total rows

> A little later, it says:

> "vl_as": scanned 3000 of 6916 pages, containing 233507 live rows and 493
> dead rows; 3000 rows in sample, 538311 estimated total rows

Well, *something* is doing deletes or updates in that table.  Better
look a bit harder at your application ...

            regards, tom lane

Re: Curious about dead rows.

From
Jean-David Beyer
Date:
Tom Lane wrote:
> Jean-David Beyer <jeandavid8@verizon.net> writes:
>> I am doing lots of INSERTs on a table that starts out empty (I did a
>> TRUNCATE on it). I am not, AFAIK, doing DELETEs or UPDATEs. Autovacuum is
>> on. I moved logging up to debug2 level to see what was going on, and I get
>> things like this:
>
>>  "vl_as": scanned 3000 of 5296 pages, containing 232944 live rows and 1033
>> dead rows; 3000 rows in sample, 411224 estimated total rows
>
>> A little later, it says:
>
>> "vl_as": scanned 3000 of 6916 pages, containing 233507 live rows and 493
>> dead rows; 3000 rows in sample, 538311 estimated total rows
>
> Well, *something* is doing deletes or updates in that table.  Better
> look a bit harder at your application ...
>
OK, you agree that if I am doing only INSERTs, there should not be any
dead rows. Therefore, I _must_ be doing deletes or updates.

But the program is pretty simple, and I see no UPDATEs or DELETEs. I
searched all the program source files (that contain none of them) and all
the libraries I have written, and they have none either. Right now the
programs are not to the state where UPDATEs or DELETEs are required (though
they will be later). I am still developing them and it is easier to just
restore from backup or start over from the beginning since most of the
changes are data laundering from an ever-increasing number of spreadsheets.

Am I right that TRUNCATE deletes all the rows of a table? They may then still
be there, but would not autovacuum clean out the dead rows? Or maybe it has
not gotten to them yet? I could do an explicit one earlier.


Re: Curious about dead rows.

From
"Joshua D. Drake"
Date:
On Sat, 10 Nov 2007 13:38:23 -0500
Jean-David Beyer <jeandavid8@verizon.net> wrote:

> Tom Lane wrote:
> > Jean-David Beyer <jeandavid8@verizon.net> writes:
> >> I am doing lots of INSERTs on a table that starts out empty (I did
> >> a TRUNCATE on it). I am not, AFAIK, doing DELETEs or UPDATEs.
> >> Autovacuum is on. I moved logging up to debug2 level to see what
> >> was going on, and I get things like this:
> > 
> >>  "vl_as": scanned 3000 of 5296 pages, containing 232944 live rows
> >> and 1033 dead rows; 3000 rows in sample, 411224 estimated total
> >> rows
> > 
> >> A little later, it says:
> > 
> >> "vl_as": scanned 3000 of 6916 pages, containing 233507 live rows
> >> and 493 dead rows; 3000 rows in sample, 538311 estimated total rows
> > 
> > Well, *something* is doing deletes or updates in that table.  Better
> > look a bit harder at your application ...
> > 
> OK, you agree that if I am doing only INSERTs, that there should not
> be any dead rows. Therefore, I _must_ be doing deletes or updates.
> 
> But the program is pretty simple, and I see no UPDATEs or DELETEs. I
> searched all the program source files (that contain none of them) and
> all the libraries I have written, and they have none either. Right
> now the programs are not to the state where UPDATEs or DELETEs are
> required (though they will be later). I am still developing them and
> it is easier to just restore from backup or start over from the
> beginning since most of the changes are data laundering from an
> ever-increasing number of spreadsheets.
> 
> Am I right that TRUNCATE deletes all the rows of a table. They may
> then be still there, but would not autovacuum clean out the dead
> rows? Or maybe it has not gotten to them yet? I could do an explicit
> one earlier.

Truncate will not create dead rows. However, ROLLBACK will. Are you
getting any duplicate key errors or anything like that when you insert?
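To sketch the mechanics (a toy model of MVCC visibility for illustration, not
PostgreSQL's actual internals): an INSERT writes its tuple immediately, stamped
with the inserting transaction's id, and a ROLLBACK just marks that transaction
aborted. The tuple stays on disk as a dead row, with no DELETE or UPDATE ever
issued:

```python
# Toy model of MVCC visibility (illustration only, not PostgreSQL internals):
# an INSERT writes its tuple immediately, stamped with the inserting
# transaction id (xmin). ROLLBACK merely marks the transaction aborted;
# the tuple remains on disk and is now a dead row.
aborted_xacts = {7}  # transaction 7 issued a ROLLBACK (e.g. duplicate key)

heap = [
    {"xmin": 5, "row": "inserted by xact 5, committed"},
    {"xmin": 7, "row": "inserted by xact 7, rolled back"},
]

live = [t for t in heap if t["xmin"] not in aborted_xacts]
dead = [t for t in heap if t["xmin"] in aborted_xacts]
print(len(live), len(dead))  # 1 live row, 1 dead row -- no DELETE or UPDATE
```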

Sincerely,

Joshua D. Drake

 


--
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
            UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


Re: Curious about dead rows.

From
Jean-David Beyer
Date:
Joshua D. Drake wrote:
> On Sat, 10 Nov 2007 13:38:23 -0500 Jean-David Beyer
> <jeandavid8@verizon.net> wrote:
>
>>> Tom Lane wrote:
>>>> Jean-David Beyer <jeandavid8@verizon.net> writes:
>>>>> I am doing lots of INSERTs on a table that starts out empty (I
>>>>> did a TRUNCATE on it). I am not, AFAIK, doing DELETEs or UPDATEs.
>>>>>  Autovacuum is on. I moved logging up to debug2 level to see what
>>>>>  was going on, and I get things like this: "vl_as": scanned 3000
>>>>> of 5296 pages, containing 232944 live rows and 1033 dead rows;
>>>>> 3000 rows in sample, 411224 estimated total rows A little later,
>>>>> it says: "vl_as": scanned 3000 of 6916 pages, containing 233507
>>>>> live rows and 493 dead rows; 3000 rows in sample, 538311
>>>>> estimated total rows
>>>> Well, *something* is doing deletes or updates in that table.
>>>> Better look a bit harder at your application ...
>>>>
>>> OK, you agree that if I am doing only INSERTs, that there should not
>>> be any dead rows. Therefore, I _must_ be doing deletes or updates.
>>>
>>> But the program is pretty simple, and I see no UPDATEs or DELETEs. I
>>> searched all the program source files (that contain none of them) and
>>>  all the libraries I have written, and they have none either. Right
>>> now the programs are not to the state where UPDATEs or DELETEs are
>>> required (though they will be later). I am still developing them and
>>> it is easier to just restore from backup or start over from the
>>> beginning since most of the changes are data laundering from an
>>> ever-increasing number of spreadsheets.
>>>
>>> Am I right that TRUNCATE deletes all the rows of a table. They may
>>> then be still there, but would not autovacuum clean out the dead
>>> rows? Or maybe it has not gotten to them yet? I could do an explicit
>>> one earlier.
>
> Truncate will not create dead rows. However ROLLBACK will. Are you
> getting any duplicate key errors or anything like that when you insert?
>
On the mistaken assumption that TRUNCATE left dead rows, I did a
VACUUM FULL ANALYZE before running the program full of INSERTs. This did not
make any difference.

As far as ROLLBACKs are concerned, every one is immediately preceded by a
message output to the standard error file, and no such messages are produced.


Re: Curious about dead rows.

From
"Scott Marlowe"
Date:
On Nov 10, 2007 1:57 PM, Jean-David Beyer <jeandavid8@verizon.net> wrote:
>
> Joshua D. Drake wrote:
> >
> > Truncate will not create dead rows. However ROLLBACK will. Are you
> > getting any duplicate key errors or anything like that when you insert?
> >
> On the mistaken assumption that TRUNCATE left dead rows, I did a
> VACUUM FULL ANALYZE before running the program full of INSERTs. This did not
> make any difference.
>
> As far as ROLLBACK are concerned, every one is immediately preceded by a
> message output to the standard error file, and no such messages are produced.

So, there are NO failed inserts, and no updates?  Cause that's what
I'd expect to create the dead rows.

Re: Curious about dead rows.

From
Jean-David Beyer
Date:
Scott Marlowe wrote:
> On Nov 10, 2007 1:57 PM, Jean-David Beyer <jeandavid8@verizon.net> wrote:
>> Joshua D. Drake wrote:
>>> Truncate will not create dead rows. However ROLLBACK will. Are you
>>> getting any duplicate key errors or anything like that when you insert?
>>>
>> On the mistaken assumption that TRUNCATE left dead rows, I did a
>> VACUUM FULL ANALYZE before running the program full of INSERTs. This did not
>> make any difference.
>>
>> As far as ROLLBACK are concerned, every one is immediately preceded by a
>> message output to the standard error file, and no such messages are produced.
>
> So, there are NO failed inserts, and no updates?  Cause that's what
> I'd expect to create the dead rows.
>
So would I. Hence the original question.


Re: Curious about dead rows.

From
Andrew Sullivan
Date:
On Sat, Nov 10, 2007 at 09:22:58PM -0500, Jean-David Beyer wrote:
> >
> > So, there are NO failed inserts, and no updates?  Cause that's what
> > I'd expect to create the dead rows.
> >
> So would I. Hence the original question.

Foreign keys with cascading deletes or updates?

A

--
Andrew Sullivan
Old sigs will return after re-constitution of blue smoke

Re: Curious about dead rows.

From
Andrew Sullivan
Date:
Please don't drop the list, as someone else may see something.

On Tue, Nov 13, 2007 at 10:06:13AM -0500, Jean-David Beyer wrote:
> OK. I turned logging from "none" to "mod" and got a gawdawful lot of stuff.

Yes.

> Then I ran it and got all the inserts. Using
> grep -i delete file
> grep -i update file
> grep -i rollback file

How about ERROR?

> 2007-11-13 08:11:20 EST DEBUG:  "vl_ranks": scanned 540 of 540 pages,
> containing 67945 live rows and 554 dead rows; 3000 rows in sample, 67945
> estimated total rows

If there are dead rows, something is producing them.  Either INSERT is
firing a trigger that is doing something there (you won't see an UPDATE in
that case), or else something else is causing INSERTs to fail.

A


Re: Curious about dead rows.

From
Jean-David Beyer
Date:
Andrew Sullivan wrote:
> Please don't drop the list, as someone else may see something.
>
> On Tue, Nov 13, 2007 at 10:06:13AM -0500, Jean-David Beyer wrote:
>> OK. I turned logging from "none" to "mod" and got a gawdawful lot of stuff.
>
> Yes.
>
>> Then I ran it and got all the inserts. Using
>> grep -i delete file
>> grep -i update file
>> grep -i rollback file
>
> How about ERROR?

$ grep -i error Tue.log
$
>
>> 2007-11-13 08:11:20 EST DEBUG:  "vl_ranks": scanned 540 of 540 pages,
>> containing 67945 live rows and 554 dead rows; 3000 rows in sample, 67945
>> estimated total rows
>
> If there are dead rows, something is producing them.  Either INSERT is
> firing a trigger that is doing something there (you won't see an UPDATE in
> that case), or else something else is causing INSERTs to fail.

I have no triggers in that database. I do have two sequences.

                  List of relations
 Schema |          Name          |   Type   |  Owner
--------+------------------------+----------+---------
 public | company_company_id_seq | sequence | jdbeyer
 public | source_source_id_seq   | sequence | jdbeyer

stock=> \d company_company_id_seq
Sequence "public.company_company_id_seq"
    Column     |  Type
---------------+---------
 sequence_name | name
 last_value    | bigint
 increment_by  | bigint
 max_value     | bigint
 min_value     | bigint
 cache_value   | bigint
 log_cnt       | bigint
 is_cycled     | boolean
 is_called     | boolean

stock=> \d source_source_id_seq
Sequence "public.source_source_id_seq"
    Column     |  Type
---------------+---------
 sequence_name | name
 last_value    | bigint
 increment_by  | bigint
 max_value     | bigint
 min_value     | bigint
 cache_value   | bigint
 log_cnt       | bigint
 is_cycled     | boolean
 is_called     | boolean

but they are not used after the last VACUUM FULL ANALYZE.



Re: Curious about dead rows.

From
Andrew Sullivan
Date:
On Tue, Nov 13, 2007 at 02:50:59PM -0500, Jean-David Beyer wrote:
> > How about ERROR?
>
> $ grep -i error Tue.log
> $

Well, without actually logging into the machine and looking at the
application, I confess I am stumped.  Oh, wait.  You do have the log level
high enough that you should see errors in the log, right?  That's not
controlled by the statement parameter.

> I have no triggers in that database. I do have two sequences.

Sequences should not produce any dead rows on the table, unless they're used
as keys and you're attempting inserts that conflict with used sequence
values.  That should cause errors that you'd get in the log, presuming that
you have the log level set correctly.

A



Re: Curious about dead rows.

From
Andrew Sullivan
Date:
I'm not a private support organisation; please send your replies to the
list, not me.

On Tue, Nov 13, 2007 at 04:57:23PM -0500, Jean-David Beyer wrote:
> What is it controlled by? The following are the non-default values in
> postgresql.conf:
>
> redirect_stderr = on
> log_directory = '/srv/dbms/dataB/pgsql/pg_log'
> log_filename = 'postgresql-%a.log'
> log_truncate_on_rotation = on
> log_rotation_age = 1440
> log_rotation_size = 0
> log_min_messages = debug2

This will certainly include error messages, then.  Or it ought to.  You do
see errors in the log when you create one, right?  (Try causing an error in
psql to make sure.)

> log_line_prefix = '%t '
> log_statement = 'none'   (this was 'mod', but it uses too much
>                           disk to leave it turned on -- only
>                           4 GBytes in that partition)
>
> >
> They are; they are the primary keys of two tables. But those are all done
> before the last VACUUM FULL ANALYZE runs, so the dead rows should have been
> eliminated. And the output of the sequence is the only way of generating a
> primary key, so it should be impossible anyhow.

I thought you were doing INSERTs?  It's not true that the output of the
sequence is the only way -- if you insert directly, it will happily insert
into that column.  But it should cause an error to show in the log, which is
what's puzzling me.

A


Re: Curious about dead rows.

From
Jean-David Beyer
Date:
Andrew Sullivan wrote:
> I'm not a private support organisation; please send your replies to the
> list, not me.

Sorry. Most of the lists I send to have Reply-To set, but a few do not.
And then I forget.
>
> On Tue, Nov 13, 2007 at 04:57:23PM -0500, Jean-David Beyer wrote:
>> What is it controlled by? The following are the non-default values in
>> postgresql.conf:
>>
>> redirect_stderr = on
>> log_directory = '/srv/dbms/dataB/pgsql/pg_log'
>> log_filename = 'postgresql-%a.log'
>> log_truncate_on_rotation = on
>> log_rotation_age = 1440
>> log_rotation_size = 0
>> log_min_messages = debug2
>
> This will certainly include error messages, then.  Or it ought to.  You do
> see errors in the log when you create one, right?  (Try causing an error in
> psql to make sure.)
Right: I do see an error message when I try to insert a duplicate entry. It
happens to violate the (company_name, company_permno) uniqueness constraint.

2007-11-13 17:58:30 EST ERROR:  duplicate key violates unique constraint
"company_name_x"

(I tried to insert a duplicate entry in the company_name field of relation
_company_. company_name_x is defined as:
"company_name_x" UNIQUE, btree (company_name, company_permno), tablespace
"stockd" )
>
>> log_line_prefix = '%t '
>> log_statement = 'none'   (this was 'mod', but it uses too much
>>                           disk to leave it turned on -- only
>>                           4 GBytes in that partition)
>>
>> They are; they are the primary keys of two tables. But those are all done
>> before the last VACUUM FULL ANALYZE runs, so the dead rows should have been
>> eliminated. And the output of the sequence is the only way of generating a
>> primary key, so it should be impossible anyhow.
>
> I thought you were doing INSERTs?

Yes.

> It's not true that the output of the
> sequence is the only way -- if you insert directly, it will happily insert
> into that column.

Yes, but I get those keys from a sequence only. I never enter them manually
or from a data file.

> But it should cause an error to show in the log, which is
> what's puzzling me.
>
Me too.



Re: Curious about dead rows.

From
Alvaro Herrera
Date:
Jean-David Beyer wrote:
> Andrew Sullivan wrote:
> > I'm not a private support organisation; please send your replies to the
> > list, not me.
>
> Sorry. Most of the lists I send to have ReplyTo set, but a few do not.
> And then I forget.

If you use "reply to all", it works wonderfully in both cases.
(Actually it works even when you're not using mailing lists at all).


--
Alvaro Herrera                 http://www.amazon.com/gp/registry/CTMLCN8V17R4
"If it wasn't for my companion, I believe I'd be having
the time of my life"  (John Dunbar)

Re: Curious about dead rows.

From
"Trevor Talbot"
Date:
On 11/13/07, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Jean-David Beyer wrote:
> > Andrew Sullivan wrote:
> > > I'm not a private support organisation; please send your replies to the
> > > list, not me.
> >
> > Sorry. Most of the lists I send to have ReplyTo set, but a few do not.
> > And then I forget.
>
> If you use "reply to all", it works wonderfully in both cases.

Then it upsets the people who don't want to get private copies, only
list copies, on most of the Reply-To lists.

There's no winning :(

Re: Curious about dead rows.

From
Alvaro Herrera
Date:
Trevor Talbot wrote:
> On 11/13/07, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > Jean-David Beyer wrote:
> > > Andrew Sullivan wrote:
> > > > I'm not a private support organisation; please send your replies to the
> > > > list, not me.
> > >
> > > Sorry. Most of the lists I send to have ReplyTo set, but a few do not.
> > > And then I forget.
> >
> > If you use "reply to all", it works wonderfully in both cases.
>
> Then it upsets the people who don't want to get private copies, only
> list copies, on most of the Reply-To lists.
>
> There's no winning :(

I am on a couple of mailing lists with Reply-To set, and what my MUA
does is put only the list on the To:, so there is no extra private copy.
I use "reply-to-group" all the time and it works perfectly well.

--
Alvaro Herrera                         http://www.flickr.com/photos/alvherre/
"PHP is what I call the "Dumb Monkey" language. [A]ny dumb monkey can code
something in PHP. Python takes actual thought to produce something useful."
                                                               (J. Drake)

Re: Curious about dead rows.

From
Ansgar -59cobalt- Wiechers
Date:
On 2007-11-13 Trevor Talbot wrote:
> On 11/13/07, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> Jean-David Beyer wrote:
>>> Sorry. Most of the lists I send to have ReplyTo set, but a few do
>>> not. And then I forget.
>>
>> If you use "reply to all", it works wonderfully in both cases.
>
> Then it upsets the people who don't want to get private copies, only
> list copies, on most of the Reply-To lists.
>
> There's no winning :(

Unless you use a mailer that supports Reply, Group-Reply, *and*
List-Reply. ;)

Regards
Ansgar Wiechers
--
"The Mac OS X kernel should never panic because, when it does, it
seriously inconveniences the user."
--http://developer.apple.com/technotes/tn2004/tn2118.html

Re: Curious about dead rows.

From
"Merlin Moncure"
Date:
On Nov 10, 2007 1:38 PM, Jean-David Beyer <jeandavid8@verizon.net> wrote:
> Tom Lane wrote:
> > Jean-David Beyer <jeandavid8@verizon.net> writes:
> >> I am doing lots of INSERTs on a table that starts out empty (I did a
> >> TRUNCATE on it). I am not, AFAIK, doing DELETEs or UPDATEs. Autovacuum is
> >> on. I moved logging up to debug2 level to see what was going on, and I get
> >> things like this:
> >
> >>  "vl_as": scanned 3000 of 5296 pages, containing 232944 live rows and 1033
> >> dead rows; 3000 rows in sample, 411224 estimated total rows
> >
> >> A little later, it says:
> >
> >> "vl_as": scanned 3000 of 6916 pages, containing 233507 live rows and 493
> >> dead rows; 3000 rows in sample, 538311 estimated total rows
> >
> > Well, *something* is doing deletes or updates in that table.  Better
> > look a bit harder at your application ...
> >
> OK, you agree that if I am doing only INSERTs, that there should not be any
> dead rows. Therefore, I _must_ be doing deletes or updates.
>
> But the program is pretty simple, and I see no UPDATEs or DELETEs. I
> searched all the program source files (that contain none of them) and all
> the libraries I have written, and they have none either. Right now the
> programs are not to the state where UPDATEs or DELETEs are required (though
> they will be later). I am still developing them and it is easier to just
> restore from backup or start over from the beginning since most of the
> changes are data laundering from an ever-increasing number of spreadsheets.
>
> Am I right that TRUNCATE deletes all the rows of a table. They may then be
> still there, but would not autovacuum clean out the dead rows? Or maybe it
> has not gotten to them yet? I could do an explicit one earlier.

what does pg_stat_all_tables say (assuming row level stats are on)?

merlin

Re: Curious about dead rows.

From
Jean-David Beyer
Date:
Merlin Moncure wrote:
> On Nov 10, 2007 1:38 PM, Jean-David Beyer <jeandavid8@verizon.net> wrote:
>> Tom Lane wrote:
>>> Jean-David Beyer <jeandavid8@verizon.net> writes:
>>>> I am doing lots of INSERTs on a table that starts out empty (I did a
>>>> TRUNCATE on it). I am not, AFAIK, doing DELETEs or UPDATEs. Autovacuum is
>>>> on. I moved logging up to debug2 level to see what was going on, and I get
>>>> things like this:
>>>>  "vl_as": scanned 3000 of 5296 pages, containing 232944 live rows and 1033
>>>> dead rows; 3000 rows in sample, 411224 estimated total rows
>>>> A little later, it says:
>>>> "vl_as": scanned 3000 of 6916 pages, containing 233507 live rows and 493
>>>> dead rows; 3000 rows in sample, 538311 estimated total rows
>>> Well, *something* is doing deletes or updates in that table.  Better
>>> look a bit harder at your application ...
>>>
>> OK, you agree that if I am doing only INSERTs, that there should not be any
>> dead rows. Therefore, I _must_ be doing deletes or updates.
>>
>> But the program is pretty simple, and I see no UPDATEs or DELETEs. I
>> searched all the program source files (that contain none of them) and all
>> the libraries I have written, and they have none either. Right now the
>> programs are not to the state where UPDATEs or DELETEs are required (though
>> they will be later). I am still developing them and it is easier to just
>> restore from backup or start over from the beginning since most of the
>> changes are data laundering from an ever-increasing number of spreadsheets.
>>
>> Am I right that TRUNCATE deletes all the rows of a table. They may then be
>> still there, but would not autovacuum clean out the dead rows? Or maybe it
>> has not gotten to them yet? I could do an explicit one earlier.
>
> what does pg_stat_all_tables say (assuming row level stats are on)?

# - Query/Index Statistics Collector -

#stats_start_collector = on
stats_start_collector = on

#stats_command_string = off
#stats_block_level = off

#stats_row_level = off
stats_row_level = on

#stats_reset_on_server_start = off

>
It says stuff like this:

 relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
---------+----------+--------------+----------+---------------+-----------+-----------+-----------
 ibd     |       75 |      9503850 |       11 |       2350555 |   2416845 |         0 |         0
 vl_cf   |      139 |     38722575 |       22 |       5392609 |   5692814 |         0 |         0
 vl_li   |      139 |     39992838 |       22 |       5569855 |   5885516 |         0 |         0

I removed the relid and schemaname and squeezed the other columns so it
would not be quite so wide. Is this what you might like to know?


Re: Curious about dead rows.

From
"Merlin Moncure"
Date:
On Nov 13, 2007 9:26 PM, Jean-David Beyer <jeandavid8@verizon.net> wrote:
> Merlin Moncure wrote:
> > what does pg_stat_all_tables say (assuming row level stats are on)?
> It says stuff like this:
>
>  relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
> ---------+----------+--------------+----------+---------------+-----------+-----------+-----------
>  ibd     |       75 |      9503850 |       11 |       2350555 |   2416845 |         0 |         0
>  vl_cf   |      139 |     38722575 |       22 |       5392609 |   5692814 |         0 |         0
>  vl_li   |      139 |     39992838 |       22 |       5569855 |   5885516 |         0 |         0
>
> I removed the relid and schemaname and squeezed the other columns so it
> would not be quite so wide. Is this what you might like to know?

it tells me that you aren't crazy, and that rollbacks are likely the
cause, although you appear to be watching the logs pretty carefully.
you can check pg_stat_database to confirm if your rollbacks are in line
with your expectations.  or, you might be seeing some corner-case
conditions...are any fields in the table foreign keyed to another table
(cascading update/delete)?  do you have any functions with handled
exceptions or savepoints? (I'm guessing no to the latter).

merlin

Re: Curious about dead rows.

From
Jean-David Beyer
Date:
Merlin Moncure wrote:
> On Nov 13, 2007 9:26 PM, Jean-David Beyer <jeandavid8@verizon.net> wrote:
>> Merlin Moncure wrote:
>>> what does pg_stat_all_tables say (assuming row level stats are on)?
>> It says stuff like this:
>>
>>  relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
>> ---------+----------+--------------+----------+---------------+-----------+-----------+-----------
>>  ibd     |       75 |      9503850 |       11 |       2350555 |   2416845 |         0 |         0
>>  vl_cf   |      139 |     38722575 |       22 |       5392609 |   5692814 |         0 |         0
>>  vl_li   |      139 |     39992838 |       22 |       5569855 |   5885516 |         0 |         0
>>
>> I removed the relid and schemaname and squeezed the other columns so it
>> would not be quite so wide. Is this what you might like to know?
>
> it tells me that you aren't crazy, and that rollbacks are the likely
> the cause, although you appear to be watching the logs pretty
> carefully.  you can check pg_stat_database to confirm if your
> rollbacks are in line with your expectations.  or, you might by seeing
> some corner case conditions...are any fields in the table foreign
> keyed to another table (cascading update/delete)?  do you have any
> functions with handled exceptions or savepoints? (I'm guessing no to
> the latter).
>
How do I reset the counters in pg_stat_database and pg_stat_all_tables?
I tried just restarting postgres, but it seems to be saved in the database,
not just in the RAM of the server.

Right now I am getting:

stock=> SELECT * FROM pg_stat_database;
 datid | datname | numbackends | xact_commit | xact_rollback | blks_read | blks_hit
-------+---------+-------------+-------------+---------------+-----------+----------
 16402 | stock   |           1 |   261428429 |       3079861 |         0 |        0
(4 rows)

I just watched these as the loading program runs, and I can account for all
the new rollbacks, which come after the dead rows are found.

I suppose that blks_read and blks_hit are zero because there are 8 GBytes of
RAM on this machine and I give 2 GBytes to shared_buffers (253000 pages), so
it all sits in RAM.
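(Checking that arithmetic, assuming the default 8 kB block size:)

```python
# shared_buffers = 253000 is measured in buffer pages; assuming the default
# 8 kB (8192-byte) block size, that works out to roughly 2 GBytes.
shared_buffers_pages = 253000
page_size = 8192  # bytes, the default build-time block size

bytes_total = shared_buffers_pages * page_size
print(bytes_total / 2**30)  # roughly 1.93 GiB
```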

I know there have been rollbacks but I do a REINDEX, CLUSTER, and VACUUM
ANALYZE before starting the inserts in question. Do I need to do a VACUUM
FULL ANALYZE instead?

When there are errors in the input data, the program just rolls back the
transaction and gives up on that input file. (The program processes hundreds
of input files and I get an additional input file each week. I then correct
the error in the input file and start over. I do not do updates because the
input file needs to be corrected anyhow, and the easiest way to check it is
to load it into the database and let the loader programs check it.)

Keeping things in perspective, the autovacuum gets these eventually, and I
do not think it is really hurting performance all that much. But I would
like to understand what is going on.

--
  .~.  Jean-David Beyer          Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 06:25:01 up 21 days, 23:43, 0 users, load average: 4.02, 4.01, 4.00

Re: Curious about dead rows.

From
Alvaro Herrera
Date:
Jean-David Beyer wrote:

> How do I reset the counters in pg_stat_database and pg_stat_all_tables?
> I tried just restarting postgres, but it seems to be saved in the database,
> not just in the RAM of the server.

There is a function called pg_stat_reset() or some such.
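
For the 8.1 series it is pg_stat_reset(), run as a superuser; a minimal
sketch (check the docs for your exact version):

    -- reset the statistics collector's counters for the current database
    SELECT pg_stat_reset();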

> I suppose that blks_read and blks_hit are zero because there are 8 GBytes
> RAM on this machine and I give 2GBytes to shared_buffers = 253000 so that
> all sits in RAM.

Perhaps you have stats_block_level set to off?

> I know there have been rollbacks but I do a REINDEX, CLUSTER, and VACUUM
> ANALYZE before starting the inserts in question.

You do all three on the same tables?  That seems pretty pointless.  A
sole CLUSTER has the same effect.

> Do I need to do a VACUUM FULL ANALYZE instead?

No.

--
Alvaro Herrera       Valdivia, Chile   ICBM: S 39º 49' 18.1", W 73º 13' 56.4"
"There was no reply" (Kernel Traffic)

Re: Curious about dead rows.

From
Jean-David Beyer
Date:
Alvaro Herrera wrote:
> Jean-David Beyer wrote:
>
>> How do I reset the counters in pg_stat_database and pg_stat_all_tables?
>> I tried just restarting postgres, but it seems to be saved in the database,
>> not just in the RAM of the server.
>
> There is a function called pg_stat_reset() or some such.

I'll find it.
>
>> I suppose that blks_read and blks_hit are zero because there are 8 GBytes
>> RAM on this machine and I give 2GBytes to shared_buffers = 253000 so that
>> all sits in RAM.
>
> Perhaps you have stats_block_level set to off?

True, I will turn them on.
>
>> I know there have been rollbacks but I do a REINDEX, CLUSTER, and VACUUM
>> ANALYZE before starting the inserts in question.
>
> You do all three on the same tables?  That seems pretty pointless.  A
> sole CLUSTER has the same effect.

I was only doing this to be sure to clean everything up for this data
gathering process and wanted to know I did not miss anything.
>
>> Do I need to do a VACUUM FULL ANALYZE instead?
>
> No.
>


--
  .~.  Jean-David Beyer          Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 09:50:01 up 22 days, 3:08, 4 users, load average: 4.29, 4.17, 4.11

Re: Curious about dead rows.

From
Andrew Sullivan
Date:
On Wed, Nov 14, 2007 at 07:12:45AM -0500, Jean-David Beyer wrote:
>
> I know there have been rollbacks but I do a REINDEX, CLUSTER, and VACUUM
> ANALYZE before starting the inserts in question. Do I need to do a VACUUM
> FULL ANALYZE instead?

I had another idea.  As Alvaro says, CLUSTER will do everything you need.
But are you sure there are _no other_ transactions open when you do that?
This could cause problems, and CLUSTER's behaviour with other open
transactions is not, um, friendly prior to the current beta.

A


--
Andrew Sullivan
Old sigs will return after re-constitution of blue smoke

Re: Curious about dead rows.

From
Jean-David Beyer
Date:
Andrew Sullivan wrote:
> On Wed, Nov 14, 2007 at 07:12:45AM -0500, Jean-David Beyer wrote:
>> I know there have been rollbacks but I do a REINDEX, CLUSTER, and VACUUM
>> ANALYZE before starting the inserts in question. Do I need to do a VACUUM
>> FULL ANALYZE instead?
>
> I had another idea.  As Alvaro says, CLUSTER will do everything you need.
> But are you sure there are _no other_ transactions open when you do that?

I am sure. I have a single-threaded program, so unless the postgres server
processes begin and end transactions on their own initiative, the only
things that would initiate transactions would be one of my applications,
which I run only one at a time, or leaving psql running. But as I understand
it, psql does not bother with transactions, and besides, I normally just do
SELECTs with that. (I also do INSERTs and UPDATEs with it in shell scripts,
but I do not run those when I am running the application either.)

> This could cause problems, and CLUSTER's behaviour with other open
> transactions is not, um, friendly prior to the current beta.
>
I suppose it might.

Right now I put

    // Reset statistics counters.
    EXEC SQL BEGIN WORK;
    EXEC SQL
    SELECT pg_stat_reset();
    EXEC SQL COMMIT WORK;

into my application so that the statistics counters will not count previous
UPDATEs and ROLLBACKs when the main program that I intend and believe to do
only INSERTs is running. It will make those statistics easier to read than
having to subtract previous values to get the changes.

Well, it will not work because I must be superuser (i.e., postgres) to
execute that, and if I am, I cannot read the input files. I will do it
manually with psql but that means I have to watch it run to do it at the
right time.


--
  .~.  Jean-David Beyer          Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 11:20:01 up 22 days, 4:38, 4 users, load average: 6.16, 5.98, 5.62

Re: Curious about dead rows.

From
Andrew Sullivan
Date:
On Wed, Nov 14, 2007 at 11:53:17AM -0500, Jean-David Beyer wrote:
> that I run only one at a time, or leaving psql running. But as I understand
> it, psql does not bother with transactions, and besides, I normally just do

No, every statement in psql is a transaction.  Even SELECT.  Every statement
under PostgreSQL runs in a transaction.  When you type "SELECT (1)", the
server implicitly adds the BEGIN; and END; around it.
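
Roughly speaking, typing the first form in psql behaves like the second:

    SELECT 1;

    BEGIN;
    SELECT 1;
    COMMIT;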

> into my application so that the statistics counters will not count previous
> UPDATEs and ROLLBACKs when the main program that I intend and believe to do
> only INSERTs is running. It will make those statistics easier to read than
> having to subtract previous values to get the changes.

Yes.

> Well, it will not work because I must be superuser (i.e., postgres) to
> execute that, and if I am, I cannot read the input files. I will do it

You could grant superuser status to your user (or just connect as postgres
user) for the time being, while debugging this.
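
Something along these lines should work (the role name here is just a
placeholder; remember to revoke the privilege when you are done):

    -- as the postgres superuser:
    ALTER ROLE myuser SUPERUSER;
    -- ... run the debugging session ...
    ALTER ROLE myuser NOSUPERUSER;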

A

--
Andrew Sullivan
Old sigs will return after re-constitution of blue smoke

Re: Curious about dead rows.

From
Andrew Sullivan
Date:
On Wed, Nov 14, 2007 at 11:58:23AM -0500, Andrew Sullivan wrote:
> No, every statement in psql is a transaction.  Even SELECT.  Every statement

Err, to be clearer, "Every statement in psql is _somehow_ part of a
transaction; if you don't start one explicitly, the statement runs on its
own as a transaction."

A

--
Andrew Sullivan
Old sigs will return after re-constitution of blue smoke

Re: Curious about dead rows.

From
Jean-David Beyer
Date:
Andrew Sullivan wrote:
> On Wed, Nov 14, 2007 at 07:12:45AM -0500, Jean-David Beyer wrote:
>> I know there have been rollbacks but I do a REINDEX, CLUSTER, and
>> VACUUM ANALYZE before starting the inserts in question. Do I need to do
>> a VACUUM FULL ANALYZE instead?
>
> I had another idea.  As Alvaro says, CLUSTER will do everything you need.
>  But are you sure there are _no other_ transactions open when you do
> that? This could cause problems, and CLUSTER's behaviour with other open
> transactions is not, um, friendly prior to the current beta.
>
These were not done at exactly the same time, but as close as I can.

REINDEX
CLUSTER;
CLUSTER
 (part of a shell script that runs the other stuff)

File `/homeB/jdbeyer/stocks/DATA/valueLine/19860103.tsv' OK
File `/homeB/jdbeyer/stocks/DATA/valueLine/19860131.tsv' OK
File `/homeB/jdbeyer/stocks/DATA/valueLine/19860228.tsv' OK
File `/homeB/jdbeyer/stocks/DATA/valueLine/19860328.tsv' OK
File `/homeB/jdbeyer/stocks/DATA/valueLine/19860502.tsv' OK
File `/homeB/jdbeyer/stocks/DATA/valueLine/19860530.tsv' OK
File `/homeB/jdbeyer/stocks/DATA/valueLine/19860627.tsv' OK
(this is showing the program being run on different data).

stock=# SELECT * FROM pg_stat_database WHERE datname = 'stock';
 datid | datname | numbackends | xact_commit | xact_rollback | blks_read | blks_hit
-------+---------+-------------+-------------+---------------+-----------+----------
 16402 | stock   |           2 |         152 |             0 |     18048 | 15444563
(1 row)

stock=# SELECT * FROM pg_stat_all_tables WHERE schemaname = 'public' ORDER BY relname;
 relid | schemaname | relname  | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
-------+------------+----------+----------+--------------+----------+---------------+-----------+-----------+-----------
 89000 | public     | co_name  |        0 |            0 |        0 |             0 |         0 |         0 |         0
 89004 | public     | company  |        0 |            0 |   938764 |        938764 |         0 |         0 |         0
 89029 | public     | tick     |        0 |            0 |   189737 |        279580 |         0 |         0 |         0
 89034 | public     | vl_as    |        0 |            0 |        0 |             0 |    140840 |         0 |         0
 89036 | public     | vl_cf    |        0 |            0 |        0 |             0 |    140840 |         0 |         0
 89038 | public     | vl_in    |        0 |            0 |        0 |             0 |    185667 |         0 |         0
 89040 | public     | vl_li    |        0 |            0 |        0 |             0 |    140840 |         0 |         0
 89042 | public     | vl_mi    |        0 |            0 |        0 |             0 |    140840 |         0 |         0
 89044 | public     | vl_ranks |        0 |            0 |        0 |             0 |    189737 |         0 |         0
(18 rows)

2007-11-14 12:00:31 EST DEBUG:  analyzing "public.vl_in"
2007-11-14 12:00:31 EST DEBUG:  "vl_in": scanned 2001 of 2001 pages,
containing 183983 live rows and 52 dead rows; 3000 rows in sample, 183983
estimated total rows
2007-11-14 12:00:31 EST DEBUG:  analyzing "public.vl_cf"
2007-11-14 12:00:31 EST DEBUG:  "vl_cf": scanned 1064 of 1064 pages,
containing 134952 live rows and 89 dead rows; 3000 rows in sample, 134952
estimated total rows
2007-11-14 12:00:31 EST DEBUG:  analyzing "public.vl_as"
2007-11-14 12:00:31 EST DEBUG:  "vl_as": scanned 1732 of 1732 pages,
containing 134952 live rows and 120 dead rows; 3000 rows in sample, 134952
estimated total rows
2007-11-14 12:00:31 EST DEBUG:  analyzing "public.vl_ranks"
2007-11-14 12:00:31 EST DEBUG:  "vl_ranks": scanned 1485 of 1485 pages,
containing 188415 live rows and 162 dead rows; 3000 rows in sample, 188415
estimated total rows
2007-11-14 12:00:31 EST DEBUG:  analyzing "public.vl_mi"
2007-11-14 12:00:31 EST DEBUG:  "vl_mi": scanned 1325 of 1325 pages,
containing 134952 live rows and 191 dead rows; 3000 rows in sample, 134952
estimated total rows
2007-11-14 12:00:31 EST DEBUG:  analyzing "public.vl_li"
2007-11-14 12:00:31 EST DEBUG:  "vl_li": scanned 1326 of 1326 pages,
containing 134952 live rows and 218 dead rows; 3000 rows in sample, 134952
estimated total rows



--
  .~.  Jean-David Beyer          Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 11:55:01 up 22 days, 5:13, 3 users, load average: 5.13, 4.71, 4.74

Re: Curious about dead rows.

From
Mario Weilguni
Date:
Jean-David Beyer schrieb:
> I am doing lots of INSERTs on a table that starts out empty (I did a
> TRUNCATE on it). I am not, AFAIK, doing DELETEs or UPDATEs. Autovacuum is
> on. I moved logging up to debug2 level to see what was going on, and I get
> things like this:
>
>  "vl_as": scanned 3000 of 5296 pages, containing 232944 live rows and 1033
> dead rows; 3000 rows in sample, 411224 estimated total rows
>
> A little later, it says:
>
> "vl_as": scanned 3000 of 6916 pages, containing 233507 live rows and 493
> dead rows; 3000 rows in sample, 538311 estimated total rows
>
> (I suppose that means autovacuum is working.) Is this normal, or have I got
> something wrong? Why so many dead rows when just doing inserts? It is not
> that I think the number is too high, considering the number of rows in the
> table at the point where I copied this line. It is just that I do not
> understand why there are any.
>
>
Did you rollback some transactions? It will generate dead rows too - at
least I think so.

Re: Curious about dead rows.

From
Jean-David Beyer
Date:
Mario Weilguni wrote:
> Jean-David Beyer schrieb:
>> I am doing lots of INSERTs on a table that starts out empty (I did a
>> TRUNCATE on it). I am not, AFAIK, doing DELETEs or UPDATEs. Autovacuum is
>> on. I moved logging up to debug2 level to see what was going on, and I
>> get
>> things like this:
>>
>>  "vl_as": scanned 3000 of 5296 pages, containing 232944 live rows and
>> 1033
>> dead rows; 3000 rows in sample, 411224 estimated total rows
>>
>> A little later, it says:
>>
>> "vl_as": scanned 3000 of 6916 pages, containing 233507 live rows and 493
>> dead rows; 3000 rows in sample, 538311 estimated total rows
>>
>> (I suppose that means autovacuum is working.) Is this normal, or have
>> I got
>> something wrong? Why so many dead rows when just doing inserts? It is not
>> that I think the number is too high, considering the number of rows in
>> the
>> table at the point where I copied this line. It is just that I do not
>> understand why there are any.
>>
>>
> Did you rollback some transactions? It will generate dead rows too - at
> least I think so.
>
No, and the statistics confirm this.

stock=> SELECT * FROM pg_stat_database WHERE datname = 'stock';
 datid | datname | numbackends | xact_commit | xact_rollback | blks_read | blks_hit
-------+---------+-------------+-------------+---------------+-----------+-----------
 16402 | stock   |           1 |        1267 |             0 |    232234 | 146426135
(1 row)

stock=> SELECT * FROM pg_stat_all_tables WHERE schemaname = 'public' ORDER BY relname;
 relid | schemaname | relname  | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
-------+------------+----------+----------+--------------+----------+---------------+-----------+-----------+-----------
 89000 | public     | co_name  |        7 |       215873 |        1 |         30839 |         0 |         0 |         0
 89004 | public     | company  |        9 |       219519 |  5624483 |       5648873 |         0 |         0 |         0
 89008 | public     | div      |        7 |            0 |        1 |             0 |         0 |         0 |         0
 89010 | public     | djia     |        4 |         2044 |        0 |             0 |         0 |         0 |         0
 89012 | public     | earn     |        2 |            0 |        0 |             0 |         0 |         0 |         0
 89014 | public     | ibd      |        5 |            0 |        1 |             0 |         0 |         0 |         0
 89016 | public     | merg     |        2 |            0 |        0 |             0 |         0 |         0 |         0
 89018 | public     | price    |        9 |            0 |        1 |             0 |         0 |         0 |         0
 89022 | public     | source   |        3 |           27 |        0 |             0 |         0 |         0 |         0
 89025 | public     | sp_500   |        2 |            0 |        0 |             0 |         0 |         0 |         0
 89027 | public     | split    |        3 |            0 |        1 |             0 |         0 |         0 |         0
 89029 | public     | tick     |       13 |       400946 |   980983 |       1510922 |         0 |         0 |         0
 89034 | public     | vl_as    |        7 |      6524595 |        1 |        932085 |    932085 |         0 |         0
 89036 | public     | vl_cf    |        7 |      6317808 |        1 |        902544 |    902544 |         0 |         0
 89038 | public     | vl_in    |        7 |      6798351 |        1 |        971193 |    966989 |         0 |         0
 89040 | public     | vl_li    |        7 |      6524595 |        1 |        932085 |    932085 |         0 |         0
 89042 | public     | vl_mi    |        7 |      6368579 |        1 |        909797 |    909797 |         0 |         0
 89044 | public     | vl_ranks |        8 |      7624818 |        1 |        985548 |    980982 |         0 |         0


--
  .~.  Jean-David Beyer          Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 16:05:01 up 22 days, 9:23, 0 users, load average: 4.45, 4.11, 4.03

Re: Curious about dead rows.

From
Alvaro Herrera
Date:
Jean-David Beyer wrote:
> Mario Weilguni wrote:

> > Did you rollback some transactions? It will generate dead rows too - at
> > least I think so.
> >
> No, and the statistics confirm this.

To recap:

- your app only does inserts
- there has been no rollback lately
- there are no updates
- there are no deletes

The only other source of dead rows I can think of is triggers ... do you
have any?  (Not necessarily on this table -- perhaps triggers on other
tables can cause updates on this one).

Oh, rolled back COPY can cause dead rows too.

--
Alvaro Herrera                               http://www.PlanetPostgreSQL.org/
"Before you were born your parents weren't as boring as they are now. They
got that way paying your bills, cleaning up your room and listening to you
tell them how idealistic you are."  -- Charles J. Sykes' advice to teenagers

Re: Curious about dead rows.

From
Russell Smith
Date:
Jean-David Beyer wrote:
> [snip]
> 2007-11-14 12:00:31 EST DEBUG:  analyzing "public.vl_in"
> 2007-11-14 12:00:31 EST DEBUG:  "vl_in": scanned 2001 of 2001 pages,
> containing 183983 live rows and 52 dead rows; 3000 rows in sample, 183983
> estimated total rows
> 2007-11-14 12:00:31 EST DEBUG:  analyzing "public.vl_cf"
> 2007-11-14 12:00:31 EST DEBUG:  "vl_cf": scanned 1064 of 1064 pages,
> containing 134952 live rows and 89 dead rows; 3000 rows in sample, 134952
> estimated total rows
> 2007-11-14 12:00:31 EST DEBUG:  analyzing "public.vl_as"
> 2007-11-14 12:00:31 EST DEBUG:  "vl_as": scanned 1732 of 1732 pages,
> containing 134952 live rows and 120 dead rows; 3000 rows in sample, 134952
> estimated total rows
> 2007-11-14 12:00:31 EST DEBUG:  analyzing "public.vl_ranks"
> 2007-11-14 12:00:31 EST DEBUG:  "vl_ranks": scanned 1485 of 1485 pages,
> containing 188415 live rows and 162 dead rows; 3000 rows in sample, 188415
> estimated total rows
> 2007-11-14 12:00:31 EST DEBUG:  analyzing "public.vl_mi"
> 2007-11-14 12:00:31 EST DEBUG:  "vl_mi": scanned 1325 of 1325 pages,
> containing 134952 live rows and 191 dead rows; 3000 rows in sample, 134952
> estimated total rows
> 2007-11-14 12:00:31 EST DEBUG:  analyzing "public.vl_li"
> 2007-11-14 12:00:31 EST DEBUG:  "vl_li": scanned 1326 of 1326 pages,
> containing 134952 live rows and 218 dead rows; 3000 rows in sample, 134952
> estimated total rows
>
What does vacuum verbose have to say about this situation?
It is possible that analyze is not getting the number of dead rows right?
Does analyze, followed by vacuum verbose give the same dead row counts?
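
For instance, for one of the affected tables from your earlier output,
something like:

    ANALYZE VERBOSE vl_as;  -- reports live/dead rows from its page sample
    VACUUM VERBOSE vl_as;   -- reports removable and nonremovable row versions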

Sorry for lots of questions, I'm just throwing ideas into the mix.

Russell.
>


Re: Curious about dead rows.

From
Jean-David Beyer
Date:
Alvaro Herrera wrote:
> Jean-David Beyer wrote:
>> Mario Weilguni wrote:
>
>>> Did you rollback some transactions? It will generate dead rows too - at
>>> least I think so.
>>>
>> No, and the statistics confirm this.
>
> To recap:
>
> - your app only does inserts

True.

> - there has been no rollback lately

True.

> - there are no updates

True

> - there are no deletes

True.
>
> The only other source of dead rows I can think is triggers ... do you
> have any?

No triggers at all. I have sequences that were not in the IBM DB2 version of
this stuff. But they are all done earlier, before the CLUSTER of the entire
database. Furthermore, they are only for two tables, not the ones that
attracted my notice in the first place.

> (Not necessarily on this table -- perhaps triggers on other
> tables can cause updates on this one).
>
> Oh, rolled back COPY can cause dead rows too.
>
The only copies I ever do are those inside dbdump -- dbrestore, and they
come after all this stuff. And they do not roll back -- though I suppose
they could in principle.

--
  .~.  Jean-David Beyer          Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 16:45:01 up 22 days, 10:03, 1 user, load average: 4.20, 4.22, 4.17

Re: Curious about dead rows.

From
Tom Lane
Date:
Russell Smith <mr-russ@pws.com.au> writes:
> It is possible that analyze is not getting the number of dead rows right?

Hah, I think you are on to something.  ANALYZE is telling the truth
about how many "dead" rows it saw, but its notion of "dead" is "not good
according to SnapshotNow".  Thus, rows inserted by a not-yet-committed
transaction would be counted as dead.  So if these are background
auto-analyzes being done in parallel with inserting transactions that
run for awhile, seeing a few not-yet-committed rows would be
unsurprising.
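
A quick way to see this (toy single-column table, two concurrent sessions):

    -- session 1: a load that is still in progress
    BEGIN;
    INSERT INTO t SELECT generate_series(1, 100000);
    -- (transaction intentionally left open)

    -- session 2: meanwhile
    ANALYZE VERBOSE t;
    -- the uncommitted rows from session 1 show up in the "dead rows" count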

I wonder if that is worth fixing?  I'm not especially concerned about
the cosmetic aspect of it, but if we mistakenly launch an autovacuum
on the strength of an inflated estimate of dead rows, that could be
costly.

            regards, tom lane

Re: Curious about dead rows.

From
Craig James
Date:
Alvaro Herrera wrote:
> To recap:
>
> - your app only does inserts
> - there has been no rollback lately
> - there are no updates
> - there are no deletes
>
> The only other source of dead rows I can think is triggers ... do you
> have any?  (Not necessarily on this table -- perhaps triggers on other
> tables can cause updates on this one).
>
> Oh, rolled back COPY can cause dead rows too.


What about an unreliable network that causes lots of disconnects?  Wouldn't the server process do a rollback?

Craig



Re: Curious about dead rows.

From
Jean-David Beyer
Date:
Craig James wrote:
> Alvaro Herrera wrote:
>> To recap:
>>
>> - your app only does inserts
>> - there has been no rollback lately
>> - there are no updates
>> - there are no deletes
>>
>> The only other source of dead rows I can think is triggers ... do you
>> have any?  (Not necessarily on this table -- perhaps triggers on other
>> tables can cause updates on this one).
>>
>> Oh, rolled back COPY can cause dead rows too.
>
>
> What about an unreliable network that causes lot of disconnects?
> Wouldn't the server process do a rollback?
>
Perhaps in theory, but in practice my client and the PostgreSQL server are
on the same machine, and 127.0.0.1 is pretty reliable:

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:30097919 errors:0 dropped:0 overruns:0 frame:0
          TX packets:30097919 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:931924602 (888.7 MiB)  TX bytes:931924602 (888.7 MiB)



--
  .~.  Jean-David Beyer          Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 22:10:01 up 22 days, 15:28, 0 users, load average: 4.25, 4.21, 4.12

Re: Curious about dead rows.

From
Jean-David Beyer
Date:
Tom Lane wrote:
> Russell Smith <mr-russ@pws.com.au> writes:
>> It is possible that analyze is not getting the number of dead rows
>> right?
>
> Hah, I think you are on to something.  ANALYZE is telling the truth about
> how many "dead" rows it saw, but its notion of "dead" is "not good
> according to SnapshotNow".  Thus, rows inserted by a not-yet-committed
> transaction would be counted as dead.  So if these are background
> auto-analyzes being done in parallel with inserting transactions that run
> for awhile,

They are.

> seeing a few not-yet-committed rows would be unsurprising.

That is a very interesting possibility. I can see that it is certainly a
possible explanation, since my insert transactions take between 0.04 and 0.1
minutes (sorry, decimal stopwatch) of real time, typically putting 1700 rows
into about half a dozen tables. And the ANALYZE is whatever autovacuum
chooses to do. So if new not-yet-committed rows are considered dead, that
would be a sufficient explanation.

So I am, retroactively, unsurprised.

> I wonder if that is worth fixing?  I'm not especially concerned about the
> cosmetic aspect of it, but if we mistakenly launch an autovacuum on the
> strength of an inflated estimate of dead rows, that could be costly.
>
Well, since I was more interested in the explanation than in the fixing, in
that sense I do not care if it is fixed or not. While it may cause a slight
slowdown (if it is an error), the applications run "fast enough."

I would not even get the fix until Red Hat gets around to putting it in (I
run the postgresql-8.1.9-1.el5 that is in their RHEL5 distribution); that
probably will not be until RHEL6 at the soonest, and I will probably skip
that one and wait until RHEL7 comes out in about 3 years.

But somewhere perhaps a reminder of this should be placed where someone like
me would find it, so we would not have to go through this again for someone
else.

--
  .~.  Jean-David Beyer          Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 22:05:01 up 22 days, 15:23, 0 users, load average: 4.16, 4.22, 4.10

Re: Curious about dead rows.

From
Erik Jones
Date:
On Nov 14, 2007, at 4:46 PM, Tom Lane wrote:

> Russell Smith <mr-russ@pws.com.au> writes:
>> It is possible that analyze is not getting the number of dead rows
>> right?
>
> Hah, I think you are on to something.  ANALYZE is telling the truth
> about how many "dead" rows it saw, but its notion of "dead" is "not
> good
> according to SnapshotNow".  Thus, rows inserted by a not-yet-committed
> transaction would be counted as dead.  So if these are background
> auto-analyzes being done in parallel with inserting transactions that
> run for awhile, seeing a few not-yet-committed rows would be
> unsurprising.

Wouldn't this result in a variable number of dead rows being reported
on separate runs, including zero when no pending inserts are
happening?  This may be a good way to verify that this is what is
happening, if he can quiet down his app long enough to run an ANALYZE
in isolation.  Perhaps, if the ANALYZE runs fast enough, he can just
lock the table for the run.
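
Something like this should rule concurrent inserts in or out (ANALYZE,
unlike VACUUM, is allowed inside a transaction block):

    BEGIN;
    LOCK TABLE vl_as IN SHARE MODE;  -- blocks writers for the duration
    ANALYZE VERBOSE vl_as;           -- dead rows should now be 0 if
                                     -- uncommitted inserts were the cause
    COMMIT;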

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



Re: Curious about dead rows.

From
Brad Nicholson
Date:
On Wed, 2007-11-14 at 17:46 -0500, Tom Lane wrote:
> Russell Smith <mr-russ@pws.com.au> writes:
> > It is possible that analyze is not getting the number of dead rows right?
>
> Hah, I think you are on to something.  ANALYZE is telling the truth
> about how many "dead" rows it saw, but its notion of "dead" is "not good
> according to SnapshotNow".  Thus, rows inserted by a not-yet-committed
> transaction would be counted as dead.  So if these are background
> auto-analyzes being done in parallel with inserting transactions that
> run for awhile, seeing a few not-yet-committed rows would be
> unsurprising.
>
> I wonder if that is worth fixing?  I'm not especially concerned about
> the cosmetic aspect of it, but if we mistakenly launch an autovacuum
> on the strength of an inflated estimate of dead rows, that could be
> costly.

Sounds to me like that could result in autovacuum kicking off while
doing large data loads.  This sounds suspiciously like the problem someone
on -novice was having: tripping over a Windows autovac bug while doing
a data load.

http://archives.postgresql.org/pgsql-novice/2007-11/msg00025.php

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.



Re: Curious about dead rows.

From
"Merlin Moncure"
Date:
On Nov 16, 2007 10:56 AM, Brad Nicholson <bnichols@ca.afilias.info> wrote:
> On Wed, 2007-11-14 at 17:46 -0500, Tom Lane wrote:
> > Russell Smith <mr-russ@pws.com.au> writes:
> > > It is possible that analyze is not getting the number of dead rows right?
> >
> > Hah, I think you are on to something.  ANALYZE is telling the truth
> > about how many "dead" rows it saw, but its notion of "dead" is "not good
> > according to SnapshotNow".  Thus, rows inserted by a not-yet-committed
> > transaction would be counted as dead.  So if these are background
> > auto-analyzes being done in parallel with inserting transactions that
> > run for awhile, seeing a few not-yet-committed rows would be
> > unsurprising.
> >
> > I wonder if that is worth fixing?  I'm not especially concerned about
> > the cosmetic aspect of it, but if we mistakenly launch an autovacuum
> > on the strength of an inflated estimate of dead rows, that could be
> > costly.
>
> Sounds to me like that could result in autovacuum kicking off while
> doing large data loads.  This sounds suspiciously like problem someone
> on -novice was having - tripping over a windows autovac bug while doing
> a data load
>
> http://archives.postgresql.org/pgsql-novice/2007-11/msg00025.php

I am almost 100% sure I've seen this behavior in the field...

merlin

Re: Curious about dead rows.

From
"Kevin Grittner"
Date:
>>> On Fri, Nov 16, 2007 at  4:01 PM, in message
<b42b73150711161401p13e93e4dn19bc8388a2da9208@mail.gmail.com>, "Merlin Moncure"
<mmoncure@gmail.com> wrote:
> On Nov 16, 2007 10:56 AM, Brad Nicholson <bnichols@ca.afilias.info> wrote:
>> On Wed, 2007-11-14 at 17:46 -0500, Tom Lane wrote:
>> > Russell Smith <mr-russ@pws.com.au> writes:
>> > > It is possible that analyze is not getting the number of dead rows right?
>> >
>> > Hah, I think you are on to something.  ANALYZE is telling the truth
>> > about how many "dead" rows it saw, but its notion of "dead" is "not good
>> > according to SnapshotNow".  Thus, rows inserted by a not-yet-committed
>> > transaction would be counted as dead.  So if these are background
>> > auto-analyzes being done in parallel with inserting transactions that
>> > run for awhile, seeing a few not-yet-committed rows would be
>> > unsurprising.
>> >
>> > I wonder if that is worth fixing?  I'm not especially concerned about
>> > the cosmetic aspect of it, but if we mistakenly launch an autovacuum
>> > on the strength of an inflated estimate of dead rows, that could be
>> > costly.
>>
>> Sounds to me like that could result in autovacuum kicking off while
>> doing large data loads.  This sounds suspiciously like problem someone
>> on -novice was having - tripping over a windows autovac bug while doing
>> a data load
>>
>> http://archives.postgresql.org/pgsql-novice/2007-11/msg00025.php
>
> I am almost 100% sure I've seen this behavior in the field...

I know I've seen bulk loads go significantly faster with autovacuum
turned off.  It always seemed like a bigger difference than what the
ANALYZE would cause.  I bet this explains it.
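
For what it's worth, on 8.1 the simplest way to do that globally is in
postgresql.conf (followed by a reload), re-enabling it when the load is
done:

    autovacuum = off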

-Kevin