Re: BUG #7853: Incorrect statistics in table with many dead rows. - Mailing list pgsql-bugs
From: James R Skaggs
Subject: Re: BUG #7853: Incorrect statistics in table with many dead rows.
Date:
Msg-id: CAHqB0wQhGzjzbtK1t0t+xixcdJ8L3FxX+XouJw4-LCaG+B8Sjw@mail.gmail.com
In response to: Re: BUG #7853: Incorrect statistics in table with many dead rows. (Kevin Grittner <kgrittn@ymail.com>)
Responses: Re: BUG #7853: Incorrect statistics in table with many dead rows.
List: pgsql-bugs
Kevin -

Agreed, we shouldn't have so many dead rows. Our autovacuum is on but has default parameters. We are clustering today. This, of course, means downtime and inconvenience to the users.

Here is the troublesome query:

    select sum(case when t1.cdlc < 0 then 0 else t1.cdlc end) as code_line_cnt,
           sum(case when t1.cmlc < 0 then 0 else t1.cmlc end) as comment_line_cnt,
           sum(case when t1.bllc < 0 then 0 else t1.bllc end) as blank_line_cnt
    from (
        select max(sf.current_code_line_count) as cdlc,
               max(sf.current_comment_line_count) as cmlc,
               max(sf.current_blank_line_count) as bllc
        from stream_file sf
        group by sf.file_path_id, sf.current_source_md5
    ) as t1;

And the explain plan. Against an optimized database, the plan was better, as was the execution time:

    Aggregate  (cost=32073953.42..32073953.43 rows=1 width=12) (actual time=77394.354..77394.354 rows=1 loops=1)
      ->  GroupAggregate  (cost=29271854.25..31808889.97 rows=15146482 width=53) (actual time=62490.514..77333.105 rows=168591 loops=1)
            ->  Sort  (cost=29271854.25..29650516.30 rows=151464819 width=53) (actual time=62490.492..73098.009 rows=14403547 loops=1)
                  Sort Key: sf.file_path_id, sf.current_source_md5
                  Sort Method: external merge  Disk: 942440kB
                  ->  Seq Scan on stream_file sf  (cost=0.00..3514999.19 rows=151464819 width=53) (actual time=0.327..20620.230 rows=14403547 loops=1)
    Total runtime: 77836.949 ms

Thanks and Regards,

James Skaggs
IT/FIS Longmont
SeaTel: 8 684 1048
General: +1 612 367 6224

On Tue, Feb 5, 2013 at 3:00 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> "jimbob@seagate.com" <jimbob@seagate.com> wrote:
>
> > INFO:  analyzing "public.stream_file"
> > INFO:  "stream_file": scanned 30000 of 2123642 pages, containing
> > 184517 live rows and 2115512 dead rows; 30000 rows in sample,
> > 158702435 estimated total rows
>
> 184517 live rows in 30000 randomly sampled pages out of 2123642
> total pages means that the statistics predict that a select
> count(*) will find about 13 million live rows to count.
> > After "analyze verbose", the table shows 158 million rows. A
> > select count(1) yields 13.8 million rows.
>
> OK, the estimate was 13 million and there were actually 13.8
> million, but it is a random sample used to generate estimates.
> That seems worse than average, but close enough to be useful.
> The 158.7 million total rows includes dead rows, which must be
> visited to determine visibility, but will not be counted because
> they are not visible to the counting transaction. Having over 90%
> of your table filled with dead rows is a bad situation to be in,
> from a performance standpoint. You should use aggressive
> maintenance (like VACUUM FULL or CLUSTER) to fix the existing
> extreme bloat, and then review your autovacuum settings and overall
> vacuum regimen to prevent future bloat.
>
> This does not look like a bug from the information provided so far.
>
> -Kevin
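[Editor's note: Kevin's back-of-the-envelope figures can be reproduced from the ANALYZE VERBOSE output quoted above. The sketch below (not part of the original thread) uses the simple per-page scaling; PostgreSQL's actual estimator is more refined, but the plain ratio already lands on his "about 13 million" and "over 90% dead" figures.]

    # All figures come from the quoted ANALYZE VERBOSE output.
    sampled_pages = 30_000
    total_pages = 2_123_642
    live_rows_in_sample = 184_517
    dead_rows_in_sample = 2_115_512

    # Scale the sampled live-row density up to the whole table:
    # this is roughly where the ~13 million live-row estimate comes from.
    estimated_live_rows = live_rows_in_sample / sampled_pages * total_pages

    # Fraction of sampled rows that were dead -- the bloat Kevin flags.
    dead_fraction = dead_rows_in_sample / (live_rows_in_sample + dead_rows_in_sample)

    print(f"estimated live rows: {estimated_live_rows:,.0f}")  # about 13.1 million
    print(f"dead fraction: {dead_fraction:.1%}")               # about 92%

The actual count(1) of 13.8 million is within about 6% of the 13.1 million estimate, which is why Kevin calls the sample "close enough to be useful" and points at the dead-row bloat, not the statistics, as the real problem.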