Thread: Indexes not always used after inserts/updates/vacuum analyze
I recently upgraded to 7.2 from 7.1. Prior to 7.2, I was shutting down the database, dropping indexes, running vacuum analyze, and re-building all the indexes on a nightly basis (all automated of course ;) ). Things ran fine. After upgrading to 7.2, I replaced all that with a nightly on-line /usr/local/pgsql/bin/vacuumdb -v -z on the database.

The problem I am seeing is this. As new data is added and modified, indexes are not being used with some queries, but work fine on others. Even a vacuum full analyze did not fix the problem. I had to drop and re-build the index for the query to use the index. Now, I see the problem starting again. The table has about 20-30 million rows (4-5 gigs in size), so a seq scan kills any access. This email is lengthy, but I wanted to detail this well.

Here is the table definition:

         Table "symbol_data"
       Column       |         Type          | Modifiers
--------------------+-----------------------+-----------
 symbol_name        | character varying(10) | not null
 date               | date                  | not null
 open               | numeric(15,3)         |
 high               | numeric(15,3)         |
 low                | numeric(15,3)         |
 d_close            | numeric(15,3)         |
 volume             | numeric(15,0)         |
 earnings           | numeric(15,3)         |
 dividend           | numeric(15,3)         |
 source             | character varying(50) |
 daily_mp_12a_long  | character(3)          |
 weekly_mp_12a_long | character(3)          |
 daily_mp_32a_long  | character(3)          |
 weekly_mp_32a_long | character(3)          |
Indexes: symbol_data_date_indx
Unique keys: symbol_data_pkey

Index "symbol_data_date_indx"
 Column | Type
--------+------
 date   | date
btree

Index "symbol_data_pkey"
   Column    |         Type
-------------+-----------------------
 symbol_name | character varying(10)
 date        | date
unique btree

Here is what I would expect, which usually happens:

explain select * from symbol_data where symbol_name='IBM';
Index Scan using symbol_data_pkey on symbol_data (cost=0.00..512.99 rows=128 width=129)

Here is one that
fails:

explain select * from symbol_data where symbol_name='ELTE';
Seq Scan on symbol_data (cost=0.00..707415.32 rows=438015 width=129)

Now I thought maybe it had something to do with the concatenated primary key, but:

explain select * from symbol_data where symbol_name='IBM' and date between '1990-01-01' and '2002-01-01';
Index Scan using symbol_data_pkey on symbol_data (cost=0.00..402.55 rows=100 width=129)

explain select * from symbol_data where symbol_name='ELTE' and date between '1990-01-01' and '2002-01-01';
Seq Scan on symbol_data (cost=0.00..810075.06 rows=342903 width=129)

Now, changing the date range will eventually use the index:

explain select * from symbol_data where symbol_name='ELTE' and date between '2002-01-01' and '2002-02-01';
NOTICE: QUERY PLAN:
Index Scan using symbol_data_pkey on symbol_data (cost=0.00..10815.42 rows=2706 width=129)

...now I do a vacuum analyze:

VACUUM verbose ANALYZE symbol_data;
NOTICE: --Relation symbol_data--
NOTICE: Index symbol_data_date_indx: Pages 49709; Tuples 20536054: Deleted 4221. CPU 4.35s/16.30u sec elapsed 45.33 sec.
NOTICE: Index symbol_data_pkey: Pages 74029; Tuples 20536054: Deleted 4221. CPU 6.44s/15.15u sec elapsed 31.00 sec.
NOTICE: Removed 4221 tuples in 70 pages. CPU 0.00s/0.04u sec elapsed 0.08 sec.
NOTICE: Pages 450911: Changed 0, Empty 0; Tup 20536054: Vac 4221, Keep 0, UnUsed 1858963. Total CPU 49.20s/36.31u sec elapsed 149.00 sec.
NOTICE: Analyzing symbol_data
VACUUM

explain select * from symbol_data where symbol_name='ELTE' and date between '1990-01-01' and '2002-01-01';
Seq Scan on symbol_data (cost=0.00..810291.94 rows=292916 width=129)

No change on the name.
A new index:

create index test on symbol_data (symbol_name);
explain select * from symbol_data where symbol_name='ELTE';
NOTICE: QUERY PLAN:
Seq Scan on symbol_data (cost=0.00..707611.68 rows=383340 width=129)
EXPLAIN

vpm=> VACUUM verbose ANALYZE symbol_data;
NOTICE: --Relation symbol_data--
NOTICE: Pages 450911: Changed 0, Empty 0; Tup 20536054: Vac 0, Keep 0, UnUsed 1863184. Total CPU 38.99s/4.50u sec elapsed 67.95 sec.
NOTICE: Analyzing symbol_data
VACUUM

vpm=> explain select * from symbol_data where symbol_name='ELTE';
NOTICE: QUERY PLAN:
Seq Scan on symbol_data (cost=0.00..707611.68 rows=355958 width=129)

Doesn't work. I think the only way to make this ever work is to drop the indexes, vacuum full, and rebuild. Any thoughts? This is a very dynamic table, but I was hoping the online vacuum in 7.2 would fix the problems. I'm in the process of splitting this table up into smaller pieces, which will make life easier anyway, but I think there is something going on here.

Thanks,
Michael
"Michael G. Martin" <michael@vpmonline.com> writes:
> Here is what I would expect which usually happens:
> explain select * from symbol_data where symbol_name='IBM';
> Index Scan using symbol_data_pkey on symbol_data (cost=0.00..512.99 rows=128 width=129)
> Here is one that fails:
> explain select * from symbol_data where symbol_name='ELTE';
> Seq Scan on symbol_data (cost=0.00..707415.32 rows=438015 width=129)

The planner thinks that there are 438K occurrences of 'ELTE' in your
table. If that's true, a sequential scan is not obviously the wrong
choice. How many are there, in reality? What do you get from

select * from pg_stats where tablename = 'symbol_data' and attname = 'symbol_name';

Also, to put the rubber to the road: if you force an indexscan by
doing "set enable_seqscan = off", does it get faster or slower?
(EXPLAIN ANALYZE would be useful here.)

			regards, tom lane
Hi Tom,
Here is what is actually there:
select count(*) from symbol_data where symbol_name='ELTE';
count
-------
687
Here is the pg_stat query:
select * from pg_stats where tablename = 'symbol_data' and attname ='symbol_name';
tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
-------------+-------------+-----------+-----------+------------+----------------------------------------------+---------------------------------------------------------------------------------------------------------------+-----------------------------------------------------+-------------
symbol_data | symbol_name | 0 | 7 | 152988 | {EBALX,ELTE,LIT,OEX,RESC,BS,ESH,HOC,IBC,IDA} | {0.0183333,0.0173333,0.00166667,0.00166667,0.00166667,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333} | {A,BMO,DBD,FSCHX,IIX,MAS,NSANY,PTEC,SR,UTIL,_^^VPM} | 0.128921
(1 row)
Interesting eh? EBALX also does a full scan--all others in the above list get an index scan.
Here's the variable stuff--I attached the verbose outputs.
set enable_seqscan = on;
SET VARIABLE
vpm=> explain select * from symbol_data where symbol_name='ELTE';
NOTICE: QUERY PLAN:
Seq Scan on symbol_data (cost=0.00..707611.68 rows=355958 width=129)
set enable_seqscan = off;
SET VARIABLE
vpm=> explain select * from symbol_data where symbol_name='ELTE';
NOTICE: QUERY PLAN:
Index Scan using symbol_data_pkey on symbol_data (cost=0.00..1420899.51 rows=355958 width=129)
Index scan appears slower in explain, but the rows value is weird.
Thanks,
Michael
Tom Lane wrote:
"Michael G. Martin" <michael@vpmonline.com> writes:
> Here is what I would expect which usually happens:
> explain select * from symbol_data where symbol_name='IBM';
> Index Scan using symbol_data_pkey on symbol_data (cost=0.00..512.99 rows=128 width=129)
> Here is one that fails:
> explain select * from symbol_data where symbol_name='ELTE';
> Seq Scan on symbol_data (cost=0.00..707415.32 rows=438015 width=129)
The planner thinks that there are 438K occurrences of 'ELTE' in your
table. If that's true, a sequential scan is not obviously the wrong
choice. How many are there, in reality? What do you get from
select * from pg_stats where tablename = 'symbol_data' and attname =
'symbol_name';
Also, to put the rubber to the road: if you force an indexscan by
doing "set enable_seqscan = off", does it get faster or slower?
(EXPLAIN ANALYZE would be useful here.)
regards, tom lane
"Michael G. Martin" <michael@vpmonline.com> writes:
> Here is what is actually there:
> select count(*) from symbol_data where symbol_name='ELTE';
> 687

Hmm. Do you have reason to think that that was also true when you last
did VACUUM ANALYZE or VACUUM?

> Here is the pg_stat query:
> select * from pg_stats where tablename = 'symbol_data' and attname ='symbol_name';
> symbol_data | symbol_name | 0 | 7 | 152988 | {EBALX,ELTE,LIT,OEX,RESC,BS,ESH,HOC,IBC,IDA} | {0.0183333,0.0173333,0.00166667,0.00166667,0.00166667,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333} | {A,BMO,DBD,FSCHX,IIX,MAS,NSANY,PTEC,SR,UTIL,_^^VPM} | 0.128921
> (1 row)

What this says is that in the last ANALYZE, EBALX accounted for 18% of
the sample, and ELTE for 17%. Does that seem plausible to you? If the
sample was accurate then I'd agree with the planner's choices. It'd
seem that either your table contents are changing drastically (in which
case more-frequent ANALYZEs may be the answer), or you had the bad luck
to get a very unrepresentative sample, or there's some bug in the
statistical calculations.

			regards, tom lane
Yes. Each symbol_name only gets one row added and maybe a few updated each market day.

This is interesting too. The planner thinks 128 rows on this symbol, GE, yet there are really 5595. Not as off as ELTE, but a large factor. At least the index gets hit here.

explain select * from symbol_data where symbol_name='GE';
NOTICE: QUERY PLAN:
Index Scan using symbol_data_pkey on symbol_data (cost=0.00..513.09 rows=128 width=129)
EXPLAIN

vpm=> select count(*) from symbol_data where symbol_name='GE';
count
-------
5595

Tom Lane wrote:
>Hmm. Do you have reason to think that that was also true when you last
>did VACUUM ANALYZE or VACUUM?
>
>What this says is that in the last ANALYZE, EBALX accounted for 18% of
>the sample, and ELTE for 17%. Does that seem plausible to you? If the
>sample was accurate then I'd agree with the planner's choices. It'd
>seem that either your table contents are changing drastically (in which
>case more-frequent ANALYZEs may be the answer), or you had the bad luck
>to get a very unrepresentative sample, or there's some bug in the
>statistical calculations.
>
>			regards, tom lane
I said:
> What this says is that in the last ANALYZE, EBALX accounted for 18% of
> the sample, and ELTE for 17%.

Argh, make that 1.8% and 1.7%.

That's still orders of magnitude away from what you say the correct
frequency is, however: 687 out of 20+ million. I'd like to think that
the statistical sampling would be unlikely to make such a large error.

			regards, tom lane
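The planner's row estimate appears to come straight from the most_common_freqs entry times the table's tuple count; a quick sanity check (a sketch, using the numbers quoted earlier in the thread) reproduces the rows=355958 figure from the EXPLAIN output above:

```python
# Sketch: how a most_common_freqs entry turns into a planner row
# estimate. Numbers are taken from the pg_stats row and VACUUM VERBOSE
# output quoted earlier in this thread.
reltuples = 20536054      # tuple count reported by VACUUM VERBOSE
elte_freq = 0.0173333     # most_common_freqs entry for 'ELTE'

estimated_rows = round(elte_freq * reltuples)
print(estimated_rows)     # 355958, matching the EXPLAIN output

# The true count is 687, so the estimate is off by a factor of ~500.
error_factor = estimated_rows / 687
print(round(error_factor))
```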
Sorry, I missed your bottom part before I replied last.

The table breakdown consists of about 8000 symbol_names with at most 5000 rows of data for each symbol (stock market history). So, those sample percents seem huge. The most any symbol would have would be about 5000 / (8000*5000) = .0125%

--Michael

Tom Lane wrote:
>Hmm. Do you have reason to think that that was also true when you last
>did VACUUM ANALYZE or VACUUM?
>
>What this says is that in the last ANALYZE, EBALX accounted for 18% of
>the sample, and ELTE for 17%. Does that seem plausible to you? If the
>sample was accurate then I'd agree with the planner's choices. It'd
>seem that either your table contents are changing drastically (in which
>case more-frequent ANALYZEs may be the answer), or you had the bad luck
>to get a very unrepresentative sample, or there's some bug in the
>statistical calculations.
>
>			regards, tom lane
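The bound Michael describes can be checked directly: with ~8000 symbols of at most 5000 rows each, no single symbol can account for more than about 0.0125% of the table, so a sampled frequency of 1.7% is impossibly high. A minimal sketch of that arithmetic:

```python
# Upper bound on any one symbol's share of the table, using the
# figures from the message above.
n_symbols = 8000                # distinct symbol_names
max_rows_per_symbol = 5000
total_rows = n_symbols * max_rows_per_symbol   # 40 million worst case

max_freq = max_rows_per_symbol / total_rows
print(max_freq)                 # 0.000125, i.e. 0.0125%

# The sampled frequency reported for ELTE (~1.7%) exceeds this bound
# by more than two orders of magnitude.
print(0.0173333 / max_freq)
```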
Heh--I was gonna ask why the strange percent representation in the stats table.

I just ran a vacuum analyze with the specific column. Still get the same explain plan:

Seq Scan on symbol_data (cost=0.00..709962.90 rows=369782 width=129)

--Michael

Tom Lane wrote:
>Argh, make that 1.8% and 1.7%.
>
>That's still orders of magnitude away from what you say the correct
>frequency is, however: 687 out of 20+ million. I'd like to think that
>the statistical sampling would be unlikely to make such a large error.
>
>			regards, tom lane
"Michael G. Martin" <michael@vpmonline.com> writes:
> I just ran a vacuum analyze with the specific column. Still get the
> same explain plan:

Did the pg_stats data change noticeably?

ANALYZE is a statistical sampling process in 7.2, so I'd expect the
results to move around somewhat each time you repeat it. But if it
changes a lot then we have a problem.

You could also try

ALTER TABLE symbol_data ALTER symbol_name SET STATISTICS n

for larger values of n (10 is the default) and then re-ANALYZE to see
if the stats get any more accurate. The default of 10 was more or less
picked out of the air ... perhaps it's too small.

			regards, tom lane
Here's the new stats since the vacuum on that column--quite a few changes.

select * from pg_stats where tablename = 'symbol_data' and attname ='symbol_name';

symbol_data | symbol_name | 0 | 7 | 150712 | {EBALX,ELTE,SRP,KMG,MKC,AEN,BAC,BDX,BKF,BRT} | {0.0233333,0.018,0.00266667,0.00166667,0.00166667,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333} | {A,BRN,DPAC,FUTR,INTC,MDSN,OCA,RAA,SSYS,USTB,_^^VPM} | 0.112971
(1 row)

I'll alter and play with the table tomorrow and let you know what I find.

-Michael

Tom Lane wrote:
>Did the pg_stats data change noticeably?
>
>You could also try
>
>ALTER TABLE symbol_data ALTER symbol_name SET STATISTICS n
>
>for larger values of n (10 is the default) and then re-ANALYZE
>to see if the stats get any more accurate.
>
>			regards, tom lane
On Wed, 27 Feb 2002 at 22:57, Tom Lane wrote:
> Also, to put the rubber to the road: if you force an indexscan by
> doing "set enable_seqscan = off", does it get faster or slower?
> (EXPLAIN ANALYZE would be useful here.)

I've just found a case where forcing indexscans results in much higher speed. On the 350000 rows table mentioned in my other post, after a "VACUUM FULL ANALYZE":

max=# set enable_seqscan to false;
max=# EXPLAIN analyze SELECT count(foo.id) FROM foo, bar WHERE foo.id = bar.ref2foo;
Aggregate (cost=27513.65..27513.65 rows=1 width=8) (actual time=652.38..652.38 rows=1 loops=1)
  -> Merge Join (cost=0.00..27417.57 rows=38431 width=8) (actual time=0.06..603.02 rows=38431 loops=1)
    -> Index Scan using foo_pkey on foo (cost=0.00..25153.18 rows=352072 width=4) (actual time=0.03..157.57 rows=38432 loops=1)
    -> Index Scan using idx_bar_ref2foo on bar (cost=0.00..807.74 rows=38431 width=4) (actual time=0.02..170.25 rows=38431 loops=1)
Total runtime: 652.58 msec
               ^^^^^^^^^^^

max=# set enable_seqscan to true;
max=# EXPLAIN analyze SELECT count(foo.id) FROM foo, bar WHERE foo.id = bar.ref2foo;
Aggregate (cost=18560.65..18560.65 rows=1 width=8) (actual time=4951.57..4951.57 rows=1 loops=1)
  -> Hash Join (cost=911.39..18464.58 rows=38431 width=8) (actual time=653.26..4905.37 rows=38431 loops=1)
    -> Seq Scan on foo (cost=0.00..9251.72 rows=352072 width=4) (actual time=0.02..769.60 rows=352072 loops=1)
    -> Hash (cost=683.31..683.31 rows=38431 width=4) (actual time=140.60..140.60 rows=0 loops=1)
      -> Seq Scan on bar (cost=0.00..683.31 rows=38431 width=4) (actual time=0.02..78.57 rows=38431 loops=1)
Total runtime: 4951.70 msec
               ^^^^^^^^^^^^

I've reproduced that several times. Even on a newly started postmaster the query takes less than 2.5 seconds with seqscans switched off.

cu
	Reinhard
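Comparing the two plans above by actual runtime rather than estimated cost makes the disagreement explicit: the index plan is roughly 7.6x faster in reality, even though the cost model scores it about 1.5x more expensive. A quick check of those ratios:

```python
# Actual runtimes from the two EXPLAIN ANALYZE outputs above (ms).
index_plan_ms = 652.58
seqscan_plan_ms = 4951.70

# Estimated total costs, which point the other way.
index_plan_cost = 27513.65
seqscan_plan_cost = 18560.65

print(round(seqscan_plan_ms / index_plan_ms, 1))      # real speedup from forcing the index
print(round(index_plan_cost / seqscan_plan_cost, 1))  # but the cost model prefers the seq scan
```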
OK, so this morning after the automated nightly vacuum -z -v on the database, ELTE no longer appears in the pg_stats table, and the index is picked no problem. The table data has not changed since last eve.

However, now there is a new symbol which is behaving the same way--I. This symbol was just loaded into the database yesterday. There are officially 4108 rows in the symbol_data table where symbol_name='I'. I bumped the STATISTICS value up to 100, re-analyzed, but the pg_stats table still shows I first on the list with a value of 0.0182--didn't change much from the original STATISTICS value of 10.

Here are the explain analyzes:

set enable_seqscan = on;
explain analyze select * from symbol_data where symbol_name='I' order by date;
NOTICE: QUERY PLAN:
Sort (cost=811813.33..811813.33 rows=373904 width=129) (actual time=93423.45..93427.02 rows=4108 loops=1)
  -> Seq Scan on symbol_data (cost=0.00..709994.20 rows=373904 width=129) (actual time=92483.55..93399.60 rows=4108 loops=1)
Total runtime: 93431.50 msec

set enable_seqscan = off;
SET VARIABLE
vpm=> explain analyze select * from symbol_data where symbol_name='I' order by date;
NOTICE: QUERY PLAN:
Sort (cost=1584564.49..1584564.49 rows=373904 width=129) (actual time=129.38..133.01 rows=4108 loops=1)
  -> Index Scan using symbol_data_pkey on symbol_data (cost=0.00..1482745.36 rows=373904 width=129) (actual time=21.54..105.46 rows=4108 loops=1)
Total runtime: 137.55 msec

Even though the optimizer thinks the index will cost more, it does pick it and use it with the performance expected when enable_seqscan = off.

-Michael

Tom Lane wrote:
>Did the pg_stats data change noticeably?
>
>ANALYZE is a statistical sampling process in 7.2, so I'd expect the
>results to move around somewhat each time you repeat it. But if it
>changes a lot then we have a problem.
>
>You could also try
>
>ALTER TABLE symbol_data ALTER symbol_name SET STATISTICS n
>
>for larger values of n (10 is the default) and then re-ANALYZE
>to see if the stats get any more accurate.
>
>			regards, tom lane
Reinhard Max <max@suse.de> writes:
> I've just found a case where forcing indexscans results in much higher
> speed.

> -> Index Scan using foo_pkey on foo
>    (cost=0.00..25153.18 rows=352072 width=4)
>    (actual time=0.03..157.57 rows=38432 loops=1)

The major estimation error is evidently in this indexscan. What
statistics does pg_stats show for this table?

			regards, tom lane
Good news. I looked through the code and after a little debugging found that STATISTICS * 300 gives you the sample size of rows used to gather statistics. With the symbol_data table at 20 million tuples, and this column having about 8000 unique values, I needed a very large sample size. Even with a STATISTICS of 500 (150,000 random rows) I still got a few symbols with a most_common_freqs of .01 or so. Bumping the STATISTICS to 1000 put the highest most_common_freqs at 0.00788667, so no seq scans now. Not too much of a time difference in the analyze either--at least not an impact.

The only strange thing I see is still the estimated rows returned. The index is picked, so I don't know that it matters. Even though this query has 688 tuples, the explain thinks 17k+:

Index Scan using symbol_data_pkey on symbol_data (cost=0.00..70648.22 rows=17700 width=129)

-Michael
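The 300x multiplier Michael found can be checked against the numbers in the thread: a statistics target of 500 samples 150,000 rows, and at the default target of 10 the sample is only 3,000 rows, in which a 687-row symbol is expected to appear about 0.1 times. A rough sketch of that arithmetic (the 300-rows-per-target factor is from his reading of the 7.2 source, not something I've verified independently):

```python
# Sample-size arithmetic for ANALYZE in 7.2, per the 300x multiplier
# described above.
def sample_rows(stats_target):
    return 300 * stats_target

table_rows = 20536054     # from the VACUUM VERBOSE output
elte_rows = 687           # actual count for symbol_name='ELTE'

for target in (10, 100, 500, 1000):
    n = sample_rows(target)
    # Expected number of 'ELTE' rows in a random sample of n rows.
    expected = n * elte_rows / table_rows
    print(target, n, round(expected, 2))

# At the default target of 10 (3000 sampled rows), a 687-row symbol is
# expected to show up ~0.1 times -- the 1.7% most_common_freqs entry
# implies it was seen 50+ times, a wildly unrepresentative sample.
```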
I've had this happen on 2 separate servers now.

After reading the docs, I bumped up shared_buffers. On one machine with 2G physical RAM, I set the param to use 1G of memory (131072 value); on another machine with 800M of RAM, I set the value to about 500M (64000). ipcs shows the correct amounts allocated.

Both servers run fine for a bit, then at some point, the entire box freezes. Pings work, but nothing else does, so a hard reboot is necessary.

Any ideas? Any limits on what you can set these to? I thought these values would leave plenty for the other stuff to run on the server.

Here is a top output before freezing:

9:14pm up 38 days, 12:47, 2 users, load average: 4.78, 5.12, 4.91
101 processes: 99 sleeping, 2 running, 0 zombie, 0 stopped
CPU0 states: 24.1% user, 7.2% system, 0.0% nice, 68.1% idle
CPU1 states: 28.0% user, 6.4% system, 0.0% nice, 64.4% idle
Mem: 898892K av, 897300K used, 1592K free, 0K shrd, 0K buff
Swap: 819272K av, 65792K used, 753480K free, 805924K cached

  PID USER     PRI NI  SIZE  RSS SHARE STAT %CPU %MEM  TIME COMMAND
 3370 postgres   9  0  382M 382M  381M S     0.1 43.5 10:53 postmaster
32762 postgres  10  0  104M 104M  103M S    15.2 11.9 65:25 postmaster
 1226 postgres   9  0 54372  53M 52852 S     0.0  6.0  0:08 postmaster
 1334 postgres   9  0 47756  46M 46240 S     0.0  5.3  0:03 postmaster
 1181 postgres   9  0 46184  45M 44592 S     0.0  5.1  0:12 postmaster
 1227 postgres   9  0 39796  38M 38328 S     0.0  4.4  0:06 postmaster
 1228 postgres   9  0 25072  24M 23580 S     0.0  2.7  0:05 postmaster
 9082 postgres  10  0 16608  16M 15180 D     5.0  1.8  0:00 postmaster
 9084 postgres  10  0 14700  14M 13316 S     4.6  1.6  0:00 postmaster
 3244 postgres   9  0 13376  13M 12052 S     0.0  1.4  0:00 postmaster
32668 postgres   9  0 11488  11M 10224 S     0.0  1.2  0:02 postmaster
32669 postgres   9  0 11136  10M  9888 S     0.0  1.2  0:55 postmaster
 9085 postgres  15  0 10820  10M  9520 S     2.5  1.1  0:00 postmaster
 9087 postgres  18  0 10796  10M  9496 R     2.9  1.1  0:00 postmaster
 9086 postgres  16  0 10696  10M  9400 S     2.3  1.1  0:00 postmaster

Thanks,
Michael
I read an earlier post by Tom where he recommends 1/4 of physical RAM. I will go to 1/5 to be safe and I assume it will be OK. I'm guessing my 50% was probably overkill.

--Michael

Michael G. Martin wrote:
> I've had this happen on 2 separate servers now.
>
> After reading the docs, I bumped up shared_buffers. On one machine
> with 2G physical RAM, I set the param to use 1G of memory (131072
> value); on another machine with 800M of RAM, I set the value to about
> 500M (64000). ipcs shows the correct amounts allocated.
>
> Both servers run fine for a bit, then at some point, the entire box
> freezes. Pings work, but nothing else does, so a hard reboot is
> necessary.
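The figures in this sub-thread line up if shared_buffers is a count of 8 KB buffer pages (the default block size; a non-default BLCKSZ would change this). A small sketch of the sizing arithmetic, reproducing the 131072-for-1GB value from the post:

```python
# shared_buffers is a count of buffer pages, 8 KB each assuming the
# default block size. Sketch of the sizing used in the messages above.
PAGE_BYTES = 8 * 1024

def buffers_for(ram_bytes, fraction):
    """Buffer count covering a given fraction of physical RAM."""
    return int(ram_bytes * fraction) // PAGE_BYTES

GB = 1024 ** 3
# Half of a 2 GB machine -> the 131072 value from the post.
print(buffers_for(2 * GB, 0.5))     # 131072

# Tom's 1/4-of-RAM guideline on the same machine.
print(buffers_for(2 * GB, 0.25))    # 65536
```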