Thread: ANALYZE getting dead tuple count hopelessly wrong

ANALYZE getting dead tuple count hopelessly wrong

From
Stuart Brooks
Date:
I have a table with about 15 million rows which is constantly having
tuples added to the head and deleted in blocks from the tail to maintain
the size. The dead tuple count in pg_stat_user_tables tracks the deleted
rows fairly accurately until an auto-ANALYZE is done in the background
at which point the value it calculates is wrong by a factor of 2-3 times
(calculated value is 30-50% of the correct value), which completely throws
off the auto-VACUUMing. For example, the auto-VACUUM only ran
when there were 12 million (real) dead rows! Any ideas?
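
For reference, this is roughly how I am watching the counts. The trigger
estimate is just my reading of the default autovacuum settings (threshold 50,
scale factor 0.2), and the table name is a placeholder from my test setup,
so treat it as an illustration rather than the exact calculation:

SELECT s.relname,
       s.n_dead_tup,
       s.last_autovacuum,
       50 + 0.2 * c.reltuples AS approx_vacuum_trigger
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
WHERE s.relname = 'my_table';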

Thanks
 Stuart

PS. Running 8.3.1 on NetBSD 3.

Re: ANALYZE getting dead tuple count hopelessly wrong

From
"Pavan Deolasee"
Date:
On Mon, Mar 31, 2008 at 1:33 PM, Stuart Brooks <stuartb@cat.co.za> wrote:
> I have a table with about 15 million rows which is constantly having
>  tuples added to the head and deleted in blocks from the tail to maintain
>  the size. The dead tuple count in pg_stat_user_tables tracks the deleted
>  rows fairly accurately until an auto-ANALYZE is done in the background
>  at which point the value it calculates is wrong by a factor of 2-3 times
>  (calculated value is 30-50% of the correct value)

(copying -hackers)

Seems like the redirected-dead line pointers are playing spoil-sport here.
In this particular example, the deleted tuples may get truncated to
redirected-dead line pointers. Analyze would report them as empty
slots and not as dead tuples. So in the worst case, if all the deleted
tuples have already been truncated to redirected-dead line pointers, analyze
may report a dead tuple count of zero.

This is a slightly tricky situation because in the normal case we might want
to delay autovacuum to let subsequent UPDATEs in the page reuse
the space released by the deleted tuples. But in this particular example,
delaying autovacuum is not a good thing because the relation would
just keep growing.

I think we should check for redirected-dead line pointers in analyze.c
and report them as dead tuples. A longer-term alternative
would be to track redirected-dead line pointers and give them some
weight when deciding on autovacuum. We could also update the
FSM information of a page when it's pruned/defragged, so that the page
can be used for subsequent INSERTs, or for non-HOT UPDATEs coming from
other pages. This might be easier said than done.
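
For what it's worth, here is a rough way to see the effect from SQL. This is
just a sketch, not taken from Stuart's setup; whether the reported dead tuple
count actually drops depends on pruning having run (no old transactions
holding back OldestXmin) and on the stats collector having caught up:

CREATE TABLE prune_test (a int, b char(500));
INSERT INTO prune_test SELECT g, 'foo' FROM generate_series(1,1000) g;
DELETE FROM prune_test;
-- a sequential scan gives each page a chance to be pruned, truncating
-- the deleted tuples to dead line pointers
SELECT count(*) FROM prune_test;
ANALYZE prune_test;
-- with the current code the estimate reported here can be far below 1000,
-- even though VACUUM still has 1000 line pointers to reclaim
SELECT n_dead_tup FROM pg_stat_user_tables WHERE relname = 'prune_test';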


Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] ANALYZE getting dead tuple count hopelessly wrong

From
Tom Lane
Date:
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:
> Seems like the redirected-dead line pointers are playing spoil-sport here.
> In this particular example, the deleted tuples may get truncated to
> redirected-dead line pointers. Analyze would report them as empty
> slots and not as dead tuples. So in the worst case, if all the deleted
> tuples have already been truncated to redirected-dead line pointers, analyze
> may report a dead tuple count of zero.

[ Please see if you can stop using the "redirected dead" terminology ]

Yeah, I think I agree.  The page pruning code is set up so that changing
a line pointer to DEAD state doesn't change the count of dead tuples in
the table, so we are counting unreclaimed DEAD pointers as still being
dead tuples requiring VACUUM.  ANALYZE should surely not affect that.

It looks like there's no trivial way to get ANALYZE to do things that
way, though.  heap_release_fetch() doesn't distinguish a DEAD line
pointer from an unused or redirected one.  But in the current
implementation of ANALYZE there's really no benefit to using
heap_release_fetch anyway --- it always examines all line pointers
on each selected page, so we might as well rewrite it to use a simple
loop more like vacuum uses.

I notice that this'd leave heap_release_fetch completely unused...
at least in HEAD I'd be tempted to get rid of it and restore heap_fetch
to its former simplicity.

            regards, tom lane

Re: [HACKERS] ANALYZE getting dead tuple count hopelessly wrong

From
"Pavan Deolasee"
Date:
On Mon, Mar 31, 2008 at 9:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>
>  [ Please see if you can stop using the "redirected dead" terminology ]
>
>

Apologies, will keep that in mind. Seems like a hang-over from the past :-)

>  Yeah, I think I agree.  The page pruning code is set up so that changing
>  a line pointer to DEAD state doesn't change the count of dead tuples in
>  the table, so we are counting unreclaimed DEAD pointers as still being
>  dead tuples requiring VACUUM.  ANALYZE should surely not affect that.
>
>  It looks like there's no trivial way to get ANALYZE to do things that
>  way, though.  heap_release_fetch() doesn't distinguish a DEAD line
>  pointer from an unused or redirected one.  But in the current
>  implementation of ANALYZE there's really no benefit to using
>  heap_release_fetch anyway --- it always examines all line pointers
>  on each selected page, so we might as well rewrite it to use a simple
>  loop more like vacuum uses.
>

I agree. I will write a patch along these lines, unless you are already on to it.


Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] ANALYZE getting dead tuple count hopelessly wrong

From
Tom Lane
Date:
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:
> On Mon, Mar 31, 2008 at 9:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It looks like there's no trivial way to get ANALYZE to do things that
>> way, though.  heap_release_fetch() doesn't distinguish a DEAD line
>> pointer from an unused or redirected one.  But in the current
>> implementation of ANALYZE there's really no benefit to using
>> heap_release_fetch anyway --- it always examines all line pointers
>> on each selected page, so we might as well rewrite it to use a simple
>> loop more like vacuum uses.

> I agree. I will write a patch along these lines, unless you are already on to it.

Please do --- I have a lot of other stuff on my plate.

            regards, tom lane

Re: [HACKERS] ANALYZE getting dead tuple count hopelessly wrong

From
"Pavan Deolasee"
Date:
On Tue, Apr 1, 2008 at 1:22 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>
>  Please do --- I have a lot of other stuff on my plate.
>

Please see the attached patch. One change I made is to hold the SHARE lock
on the page while ANALYZE is reading tuples from it. I thought it would
be the right thing to do instead of repeatedly acquiring/releasing the lock.

Another thing I noticed while working on this is that VACUUM probably reports the
number of dead tuples incorrectly. We don't count the DEAD line pointers in
"tups_vacuumed", which is fine if the line pointer was marked DEAD in the
immediately preceding heap_page_prune(); in that case the DEAD line pointer
is counted in the "ndeleted" count returned by heap_page_prune(). But it fails to
count line pointers that were already DEAD.

For example

postgres=# CREATE TABLE test (a int, b char(500));
CREATE TABLE
postgres=# INSERT INTO test VALUES (generate_series(1,15),'foo');
INSERT 0 15
postgres=# DELETE FROM test;
DELETE 15
postgres=# select count(*) from test;
 count
-------
     0
(1 row)

postgres=# VACUUM VERBOSE test;
INFO:  vacuuming "public.test"
INFO:  "test": removed 0 row versions in 1 pages
INFO:  "test": found 0 removable, 0 nonremovable row versions in 1 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
1 pages contain useful free space.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "test": truncated 1 to 0 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM


So VACUUM reports "zero" dead row versions which may seem
counter-intuitive especially in the autovac log message (as
someone may wonder why autovac got triggered on the table)
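
As an aside, for anyone who wants to confirm what the page actually contains,
the pageinspect contrib module shows the line pointers directly (lp_flags = 3
is a dead line pointer). This is only an illustration against the example
table above, run after the DELETE and the count(*) but before VACUUM
truncates the page away:

SELECT lp, lp_flags, lp_len, t_xmax
FROM heap_page_items(get_raw_page('test', 0));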

I am thinking we could make heap_page_prune() return only the
number of HOT tuples pruned, and then explicitly count the DEAD
line pointers in tups_vacuumed.


Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Attachment

Re: [HACKERS] ANALYZE getting dead tuple count hopelessly wrong

From
Simon Riggs
Date:
On Tue, 2008-04-01 at 13:07 +0530, Pavan Deolasee wrote:

> Please see the attached patch. One change I made is to hold the SHARE lock
> on the page while ANALYZE is reading tuples from it. I thought it would
> be the right thing to do instead of repeatedly acquiring/releasing the lock.

ANALYZE is a secondary task and so we shouldn't be speeding it up at the
possible expense of other primary tasks. So I think holding locks for
longer than minimum is not good in this case and I see no reason to make
the change described.

We can speed up ANALYZE by using the background reader to preread the
blocks, assuming bgreader will happen one day.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk


Re: [HACKERS] ANALYZE getting dead tuple count hopelessly wrong

From
Stuart Brooks
Date:
>>  Please do --- I have a lot of other stuff on my plate.
>>
>>
>
> Please see the attached patch. One change I made is to hold the SHARE lock
> on the page while ANALYZE is reading tuples from it. I thought it would
> be the right thing to do instead of repeatedly acquiring/releasing the lock.
>
I have applied the patch and started my test again; it takes a
little while to fill up, so I should have the results sometime tomorrow.

Thanks for the quick response.
 Stuart

Re: [HACKERS] ANALYZE getting dead tuple count hopelessly wrong

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On Tue, 2008-04-01 at 13:07 +0530, Pavan Deolasee wrote:
>> Please see the attached patch. One change I made is to hold the SHARE lock
>> on the page while ANALYZE is reading tuples from it. I thought it would
>> be the right thing to do instead of repeatedly acquiring/releasing the lock.

> ANALYZE is a secondary task and so we shouldn't be speeding it up at the
> possible expense of other primary tasks. So I think holding locks for
> longer than minimum is not good in this case and I see no reason to make
> the change described.

I think Pavan's change is probably good.  In the first place, it's only
a shared buffer lock and besides ANALYZE isn't going to be holding it
long (all it's doing at this point is counting tuples and copying some
of them into memory).  In the second place, repeated lock release and
re-grab threatens cache line contention and a context swap storm if
there is anyone else trying to access the page.  In the third, whether
there's contention or not the extra acquire/release work will cost CPU
cycles.  In the fourth, if we actually believed this was a problem we'd
need to redesign VACUUM too, as it does the same thing.

I haven't read the patch yet, but I'm inclined to go with the design
Pavan suggests.

            regards, tom lane

Re: [HACKERS] ANALYZE getting dead tuple count hopelessly wrong

From
Simon Riggs
Date:
On Tue, 2008-04-01 at 10:22 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Tue, 2008-04-01 at 13:07 +0530, Pavan Deolasee wrote:
> >> Please see the attached patch. One change I made is to hold the SHARE lock
> >> on the page while ANALYZE is reading tuples from it. I thought it would
> >> be the right thing to do instead of repeatedly acquiring/releasing the lock.
>
> > ANALYZE is a secondary task and so we shouldn't be speeding it up at the
> > possible expense of other primary tasks. So I think holding locks for
> > longer than minimum is not good in this case and I see no reason to make
> > the change described.
>
> I think Pavan's change is probably good.  In the first place, it's only
> a shared buffer lock and besides ANALYZE isn't going to be holding it
> long (all it's doing at this point is counting tuples and copying some
> of them into memory).  In the second place, repeated lock release and
> re-grab threatens cache line contention and a context swap storm if
> there is anyone else trying to access the page. In the third, whether
> there's contention or not the extra acquire/release work will cost CPU
> cycles.  In the fourth, if we actually believed this was a problem we'd
> need to redesign VACUUM too, as it does the same thing.

VACUUM waits until nobody else has the buffer pinned, so lock contention
is much less of a consideration there. Plus it rearranges the block,
which is hard to do one tuple at a time even if we wanted to.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk


Re: [HACKERS] ANALYZE getting dead tuple count hopelessly wrong

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On Tue, 2008-04-01 at 10:22 -0400, Tom Lane wrote:
>> In the fourth, if we actually believed this was a problem we'd
>> need to redesign VACUUM too, as it does the same thing.

> VACUUM waits until nobody else has the buffer pinned, so lock contention
> is much less of a consideration there. Plus it rearranges the block,
> which is hard to do one tuple at a time even if we wanted to.

That's the second scan.  The first scan acts exactly like Pavan is
proposing for ANALYZE.

            regards, tom lane

Re: ANALYZE getting dead tuple count hopelessly wrong

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Simon Riggs <simon@2ndquadrant.com> writes:
>> On Tue, 2008-04-01 at 13:07 +0530, Pavan Deolasee wrote:
>>> Please see the attached patch. One change I made is to hold the SHARE lock
>>> on the page while ANALYZE is reading tuples from it. I thought it would
>>> be the right thing to do instead of repeatedly acquiring/releasing the lock.
>
>> ANALYZE is a secondary task and so we shouldn't be speeding it up at the
>> possible expense of other primary tasks. So I think holding locks for
>> longer than minimum is not good in this case and I see no reason to make
>> the change described.
>
> I think Pavan's change is probably good.  In the first place, it's only
> a shared buffer lock and besides ANALYZE isn't going to be holding it
> long (all it's doing at this point is counting tuples and copying some
> of them into memory).  In the second place, repeated lock release and
> re-grab threatens cache line contention and a context swap storm if
> there is anyone else trying to access the page.  In the third, whether
> there's contention or not the extra acquire/release work will cost CPU
> cycles.  In the fourth, if we actually believed this was a problem we'd
> need to redesign VACUUM too, as it does the same thing.

I'm not sure all those arguments are valid (at least the first two seem
contradictory). However, I'm skeptical about Simon's premise. It's not clear that
any changes to ANALYZE here are at the expense of other processes. Any cycles
saved in ANALYZE are available for those other processes, after all...



--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!

Re: [HACKERS] ANALYZE getting dead tuple count hopelessly wrong

From
Tom Lane
Date:
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:
> Please see the attached patch. One change I made is to hold the SHARE lock
> on the page while ANALYZE is reading tuples from it. I thought it would
> be the right thing to do instead of repeatedly acquiring/releasing the lock.

I've applied a modified/extended form of this patch for 8.3.2.

            regards, tom lane

Re: [HACKERS] ANALYZE getting dead tuple count hopelessly wrong

From
"Pavan Deolasee"
Date:
On Thu, Apr 3, 2008 at 10:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>
>  I've applied a modified/extended form of this patch for 8.3.2.
>

Thanks. I had another concern about VACUUM not reporting DEAD line
pointers (please see upthread). Any comments on that?

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] ANALYZE getting dead tuple count hopelessly wrong

From
Tom Lane
Date:
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:
> Thanks. I had another concern about VACUUM not reporting DEAD line
> pointers (please see upthread). Any comments on that?

If you want to work on that, go ahead, but I wanted it separate because
I didn't think it merited back-patching.  It's strictly cosmetic in
terms of being about what VACUUM VERBOSE prints, no?

            regards, tom lane

Re: [HACKERS] ANALYZE getting dead tuple count hopelessly wrong

From
"Pavan Deolasee"
Date:
On Thu, Apr 3, 2008 at 10:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Pavan Deolasee" <pavan.deolasee@gmail.com> writes:
>
> > Thanks. I had another concern about VACUUM not reporting DEAD line
> > pointers (please see upthread). Any comments on that?
>
>  If you want to work on that, go ahead

Ok, I will do that.

>  but I wanted it separate because
>  I didn't think it merited back-patching.  It's strictly cosmetic in
>  terms of being about what VACUUM VERBOSE prints, no?
>

Umm.. Whatever we decide on the fix, I think we should backpatch it to
8.3, because I am worried that someone may get completely confused by
the current vacuum report, especially if autovac is triggered just
because the heap is full of DEAD line pointers. The number of dead rows
reported may be awfully low in that case.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] ANALYZE getting dead tuple count hopelessly wrong

From
Tom Lane
Date:
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:
> On Thu, Apr 3, 2008 at 10:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I didn't think it merited back-patching.  It's strictly cosmetic in
>> terms of being about what VACUUM VERBOSE prints, no?

> Umm.. Whatever we decide on the fix, I think we should backpatch it to
> 8.3, because I am worried that someone may get completely confused by
> the current vacuum report,

"Somebody might misread an optional report" doesn't seem to me to be on
the same risk level as "we might destabilize a stable release".  The
policy of this project is that we only put nontrivial bug fixes into
back branches, and I don't think this item qualifies ...

            regards, tom lane

Re: [HACKERS] ANALYZE getting dead tuple count hopelessly wrong

From
"Pavan Deolasee"
Date:
On Fri, Apr 4, 2008 at 11:10 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>  The
>  policy of this project is that we only put nontrivial bug fixes into
>  back branches, and I don't think this item qualifies ...
>

Got it. I will submit a patch for HEAD.

Thanks,
Pavan


--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] ANALYZE getting dead tuple count hopelessly wrong

From
Stuart Brooks
Date:
Pavan Deolasee wrote:
> On Fri, Apr 4, 2008 at 11:10 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>
>>  The
>>  policy of this project is that we only put nontrivial bug fixes into
>>  back branches, and I don't think this item qualifies ...
>>
>>
>
> Got it. I will submit a patch for HEAD.
>
> Thanks,
As I mentioned earlier, I patched 8.3.1 with Pavan's patch and have been
running tests. After a few days postgres has locked up - I'm not sure if
it is related. Below is a ps from my system (NetBSD 3).

TEST> ps -ax | grep post
 1952 ?      IW<s  13:52.24 postgres: writer process
 2113 ?      S<s    0:03.04 postgres: logger process
 2157 ?      S<s    0:03.12 postgres: autovacuum launcher process
 2199 ?      I<s    0:00.04 postgres: metauser metadb [local] SELECT
 2472 ?      DW<s 814:23.50 postgres: metauser metadb localhost(65524)
COMMIT
 2661 ?      DW<s   0:11.27 postgres: metauser metadb localhost(65525) idle
 2680 ?      S<s    1:18.75 postgres: stats collector process
 3156 ?      S<s    0:45.12 postgres: wal writer process
24362 ?      IW<s   0:00.00 postgres: autovacuum worker process
25024 ?      IW<s   0:00.00 postgres: autovacuum worker process
25134 ?      IW<s   0:00.00 postgres: autovacuum worker process
 3289 ttyp5  I<     0:01.96 /usr/local/pgsql/bin/postgres -D ../data/metadb

and I was disconnected in my client app with the following message:

[WARN] PSQL:exec - failed in command <SELECT
relname,n_tup_ins,n_live_tup,n_dead_tup,pg_total_relation_size('s8_0000.'
|| relname)*10/(1024*1024),last_autovacuum FROM pg_stat_user_tables
WHERE schemaname='s8_0000' ORDER BY n_tup_ins DESC>
[WARN]   error = 'server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.'
[WARN] ConnectionNB: PQconsumeInput failed with error 'server closed the
connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.'


The server is still running but I can't access it. A top yields:

load averages:  0.23,  0.23,  0.21    09:53:58
110 processes: 109 sleeping, 1 on processor

Memory: 513M Act, 256M Inact, 1336K Wired, 75M Exec, 557M File, 2776K Free
Swap: 600M Total, 600M Free


  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
  463 root       2    0  6132K   14M select     0:06  0.05%  0.05% kdeinit
 2472 postgres -22   -2  4580K    4K mclpl    814:23  0.00%  0.00% <postgres>
 2631 root     -22    0   644K    4K mclpl    606:25  0.00%  0.00% <test_writer
 1622 root       2    0  8456K   14M select    19:05  0.00%  0.00% kdeinit
 1952 postgres   2   -2  3544K    4K netlck    13:52  0.00%  0.00% <postgres>
  233 root       2    0    24M   31M select     4:47  0.00%  0.00% XFree86
  451 root       2    0  3544K   15M select     4:45  0.00%  0.00% kdeinit
   16 root      18    0     0K  182M syncer     3:51  0.00%  0.00% [ioflush]
   17 root     -18    0     0K  182M aiodoned   1:46  0.00%  0.00% [aiodoned]
   15 root     -18    0     0K  182M pgdaemon   1:30  0.00%  0.00% [pagedaemon]
 1301 root     -22    0  4092K    4K mclpl      1:23  0.00%  0.00% <kdeinit>
 2680 postgres   2   -2  3560K 1588K poll       1:18  0.00%  0.00% postgres
 1493 root       2    0  3488K   17M select     1:09  0.00%  0.00% korgac
  461 root       2    0  3748K   16M select     0:57  0.00%  0.00% kdeinit
 3156 postgres   2   -2  3448K 1792K select     0:45  0.00%  0.00% postgres
 1174 root       2    0  2608K 2928K select     0:40  0.00%  0.00% profiler
 1428 root       2    0  3376K   13M select     0:26  0.00%  0.00% kdeinit
 2661 postgres -22   -2  4896K    4K mclpl      0:11  0.00%  0.00% <postgres>

I'm not convinced this is a postgresql bug (state=mclpl concerns me),
but it's the first time I've seen it. I suppose it could be:
http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=35224.

Is there anything I can do that might help isolate the problem?

Regards
 Stuart