Re: Help with a seq scan on multi-million row table - Mailing list pgsql-sql
From | |
---|---|
Subject | Re: Help with a seq scan on multi-million row table |
Date | |
Msg-id | 20060511034711.46266.qmail@web50305.mail.yahoo.com Whole thread Raw |
In response to | Re: Help with a seq scan on multi-million row table (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Help with a seq scan on multi-million row table
|
List | pgsql-sql |
Not sure if I'm showing you what you asked for, but here it is: select * from pg_stats where tablename='user_url_tag' and attname='user_url_id';schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation ------------+--------------+-------------+-----------+-----------+------------+------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-------------public | user_url_tag | user_url_id | 0 | 4 | 60825 | {458321,1485346,16304,68027,125417,153465,182503,201175,202973,218423}| {0.00133333,0.001,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667}| {195,195993,325311,480323,647778,782598,1014527,1201726,1424822,1614712,1853719}| 0.795521 You asked if the table has been analyzed recently. I think so - I run ANALYZE on the whole DB every night, like this: $ psql -U me -c "ANALYZE;" mydb For a good measure, I just analyzed the table now: $ psql -U me -c "ANALYZE user_url_tag;" mydb Then I set the enable_hashjoin back to ON and re-run the EXPLAIN ANALYZE. I still get the sequential scan, even after analyzing the table :( I'm not sure which numbers you are referring to when you said the estimate is off, but here are some numbers: The whole tablehas 6-7 M rows. That query matches about 2500 rows. If there are other things I can play with and help narrow this down, please let me know. Thanks, Otis ----- Original Message ---- From: Tom Lane <tgl@sss.pgh.pa.us> To: ogjunk-pgjedan@yahoo.com Cc: pgsql-sql@postgresql.org Sent: Wednesday, May 10, 2006 9:53:49 PM Subject: Re: [SQL] Help with a seq scan on multi-million row table <ogjunk-pgjedan@yahoo.com> writes: > Aha! set hashjoin=off did the trick. > -> Index Scan using ix_user_url_tag_user_url_id on user_url_tag userurltag0_ (cost=0.00..157.34 rows=103width=14) (actual time=1.223..1.281 rows=5 loops=1666) > Index Cond: (userurltag0_.user_url_id = "outer".id) This seems to be the problem right here: the estimate of matching rows is off by a factor of 20, and that inflates the overall cost estimate for this plan about the same, causing the planner to think the other way is cheaper. What does the pg_stats row for user_url_tag.user_url_id contain? Have you analyzed that table recently? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend