Thread: Query does not use index
Hi,

I hope this message won't be sent to the list several times. If so,
please accept my apologies. It would be very nice if someone could have
a look at the query I'm trying to optimize. At the moment, I don't
understand PostgreSQL's behaviour and am stuck. Thanks a lot in advance.

I am using PostgreSQL 7.4.1. My database contains the following tables:

          Table "public.token"
  Column  |          Type          | Modifiers
----------+------------------------+-----------
 text_id  | integer                | not null
 position | integer                | not null
 word     | character varying(255) |
Indexes:
    "token_pkey" primary key, btree (text_id, "position")
    "word_idx" btree (word)

and

        Table "public.s"
 Column  |  Type   | Modifiers
---------+---------+-----------
 text_id | integer | not null
 s       | integer | not null
 start   | integer |
 stop    | integer |
Indexes:
    "s_pkey" primary key, btree (text_id, s)
    "s_begin_idx" btree (text_id, "start")
    "s_end_idx" btree (text_id, stop)
Foreign-key constraints:
    "$1" FOREIGN KEY (text_id, "start") REFERENCES token(text_id, "position")
    "$2" FOREIGN KEY (text_id, stop) REFERENCES token(text_id, "position")

(I hope it's readable.)

I have the following query:

    select *
    from token, s
    where
        token.word = 'FACTSHEET' and
        s.text_id = token.text_id and
        s.start = token.position;

PostgreSQL generates the following query plan:

                                    QUERY PLAN
--------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..39120.95 rows=1 width=32)
   ->  Index Scan using s_pkey on s  (cost=0.00..24698.44 rows=3367 width=16)
   ->  Index Scan using token_pkey on token  (cost=0.00..4.27 rows=1 width=16)
         Index Cond: (("outer".text_id = token.text_id) AND ("outer"."start" = token."position"))
         Filter: ((word)::text = 'FACTSHEET'::text)

and the execution takes ages. Now, 'FACTSHEET' is *very* rare (5 rows
out of 23 million), and I intended PostgreSQL to first search for
'FACTSHEET' and then use index s_begin_idx (and the values of
token.text_id and token.position) to find the corresponding records in
s. I tried to force PostgreSQL to do it in this order with the
following:

    select *
    from (select text_id, position from token where word = 'FACTSHEET') t
    left join s on (s.text_id = t.text_id and s.start = t.position);

                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Hash Left Join  (cost=24715.28..56630.78 rows=7892 width=24)
   Hash Cond: (("outer".text_id = "inner".text_id) AND ("outer"."position" = "inner"."start"))
   ->  Index Scan using word_idx on token  (cost=0.00..31402.51 rows=7892 width=8)
         Index Cond: ((word)::text = 'FACTSHEET'::text)
   ->  Hash  (cost=24698.44..24698.44 rows=3367 width=16)
         ->  Index Scan using s_pkey on s  (cost=0.00..24698.44 rows=3367 width=16)

Now it *does* first search for 'FACTSHEET', but it still does not use
s_begin_idx, and I have no idea why. Any ideas what I could do?

Thanks,
Martin.
Martin Hampl <Martin.Hampl@gmx.de> writes:
> It would be very nice if someone could have a look at the query I'm
> trying to optimize. At the moment, I don't understand PostgreSQL's
> behaviour and am stuck. Thanks a lot in advance.

Did you ANALYZE these tables?  Also, please post EXPLAIN ANALYZE, not
just EXPLAIN, when complaining about bad plans.  Since the essence of
your complaint is that the planner's estimates are wrong, showing us
only estimates and not reality makes it hard to give constructive
suggestions ...

			regards, tom lane
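For anyone following along, a minimal sketch of what Tom is asking for,
using the table and column names from the first message:

    -- Refresh the planner's statistics for both tables:
    ANALYZE token;
    ANALYZE s;

    -- EXPLAIN ANALYZE actually runs the query and reports real row
    -- counts and timings next to the planner's estimates:
    EXPLAIN ANALYZE
    SELECT *
    FROM token, s
    WHERE token.word = 'FACTSHEET'
      AND s.text_id = token.text_id
      AND s.start = token.position;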
Hi,

On 30.04.2004, at 01:32, Tom Lane wrote:
> Martin Hampl <Martin.Hampl@gmx.de> writes:
>> It would be very nice if someone could have a look at the query I'm
>> trying to optimize. At the moment, I don't understand PostgreSQL's
>> behaviour and am stuck. Thanks a lot in advance.
>
> Did you ANALYZE these tables?

I did.

> Also, please post EXPLAIN ANALYZE, not just EXPLAIN, when complaining
> about bad plans.  Since the essence of your complaint is that the
> planner's estimates are wrong, showing us only estimates and not
> reality makes it hard to give constructive suggestions ...

OK.

bnc23Mio=# EXPLAIN ANALYZE select * from
bnc23Mio-#     token,
bnc23Mio-#     s
bnc23Mio-# where
bnc23Mio-#     token.word = 'FACTSHEET' and
bnc23Mio-#     s.text_id = token.text_id and
bnc23Mio-#     s.start = token.position;

                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..39120.95 rows=1 width=32) (actual time=102.263..692248.553 rows=3 loops=1)
   ->  Index Scan using s_pkey on s  (cost=0.00..24698.44 rows=3367 width=16) (actual time=75.933..589743.642 rows=1111220 loops=1)
   ->  Index Scan using token_pkey on token  (cost=0.00..4.27 rows=1 width=16) (actual time=0.086..0.086 rows=0 loops=1111220)
         Index Cond: (("outer".text_id = token.text_id) AND ("outer"."start" = token."position"))
         Filter: ((word)::text = 'FACTSHEET'::text)
 Total runtime: 692249.314 ms

bnc23Mio=# EXPLAIN ANALYZE select * from
    (select text_id, position from token where word = 'FACTSHEET') t
    left join s on (s.text_id = t.text_id and s.start = t.position);

                                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=24715.28..56630.78 rows=7892 width=24) (actual time=255329.976..255355.967 rows=5 loops=1)
   Hash Cond: (("outer".text_id = "inner".text_id) AND ("outer"."position" = "inner"."start"))
   ->  Index Scan using word_idx on token  (cost=0.00..31402.51 rows=7892 width=8) (actual time=91.010..109.394 rows=5 loops=1)
         Index Cond: ((word)::text = 'FACTSHEET'::text)
   ->  Hash  (cost=24698.44..24698.44 rows=3367 width=16) (actual time=255236.914..255236.914 rows=0 loops=1)
         ->  Index Scan using s_pkey on s  (cost=0.00..24698.44 rows=3367 width=16) (actual time=105.100..247798.661 rows=1111220 loops=1)
 Total runtime: 255502.736 ms

Maybe that *is* what I wanted it to do? However, searching just for
'FACTSHEET' is very quick (I rebooted before this query to clear any
cache---is there a better way to do this?):

bnc23Mio=# EXPLAIN ANALYZE select * from token where word = 'FACTSHEET';

                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Index Scan using word_idx on token  (cost=0.00..31402.51 rows=7892 width=16) (actual time=102.350..125.032 rows=5 loops=1)
   Index Cond: ((word)::text = 'FACTSHEET'::text)
 Total runtime: 125.289 ms

and I would have thought that the results of this query could have been
used to search for the respective records in s (using one of the
indexes)?

Regards,
Martin.
Martin Hampl <Martin.Hampl@gmx.de> writes:
> On 30.04.2004, at 01:32, Tom Lane wrote:
>> Did you ANALYZE these tables?

> I did.

Hm.  I'm wondering why the row estimates for 's' are off by several
orders of magnitude:

>    ->  Index Scan using s_pkey on s  (cost=0.00..24698.44 rows=3367
> width=16) (actual time=75.933..589743.642 rows=1111220 loops=1)

It could be that this table has a lot of empty pages near the front,
which is a condition that's known to lead to underestimated row counts
from ANALYZE.  (Manfred is working on a better ANALYZE sampling method
that should avoid such errors in future.)  Try doing a straight VACUUM
and see if the row count estimate gets better.  If so, it might be
worth the trouble to do a VACUUM FULL to get rid of the empty space.
(And you should also think about doing routine vacuums more often, and
perhaps increasing the FSM settings, to ensure you don't get back into
this state.)

The other thing I'm wondering about is why the devil it's choosing an
indexscan at all, when it has no indexscan conditions to use.  Are you
perhaps forcing that choice via "enable_seqscan = false"?  If so, don't.

> Maybe that *is* what I wanted it to do? However, searching just for
> 'FACTSHEET' is very quick (I rebooted before this query to clear any
> cache---is there a better way to do this?):

> and I would have thought that the results of this query could have
> been used to search for the respective records in s (using one of the
> indexes)?

Undoubtedly it did consider that plan, but rejected it because it
looked more expensive than the alternatives.  This is not too
surprising given the overestimate of the number of rows matching
'FACTSHEET' (7892 vs reality of 5).  You might need to increase the
statistics target for token.word (see ALTER TABLE SET STATISTICS) to
give the planner more data to work with about the distribution of
words.

			regards, tom lane
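Taken together, Tom's suggestions boil down to something like the
following sketch (the statistics target of 100 is an example value, not
a recommendation from the thread; syntax as in 7.4):

    -- Check whether sequential scans have been disabled by hand:
    SHOW enable_seqscan;    -- should be 'on'

    -- A plain VACUUM updates the row-count estimate in pg_class;
    -- VACUUM FULL additionally compacts away the empty pages:
    VACUUM ANALYZE s;
    -- VACUUM FULL s;       -- only if plain VACUUM reports lots of dead space

    -- Give the planner a larger sample for token.word, then re-ANALYZE
    -- (the 7.4 default statistics target was 10):
    ALTER TABLE token ALTER COLUMN word SET STATISTICS 100;
    ANALYZE token;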
On 30.04.2004, at 17:52, Tom Lane wrote:
> It could be that this table has a lot of empty pages near the front,
> which is a condition that's known to lead to underestimated row counts
> from ANALYZE.  (Manfred is working on a better ANALYZE sampling method
> that should avoid such errors in future.)  Try doing a straight VACUUM
> and see if the row count estimate gets better.

It did! Thanks! EXPLAIN ANALYZE now yields the following result:

                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..70650.10 rows=12 width=32) (actual time=170.449..181.663 rows=3 loops=1)
   ->  Index Scan using word_idx on token  (cost=0.00..30403.79 rows=7621 width=16) (actual time=158.237..179.105 rows=5 loops=1)
         Index Cond: ((word)::text = 'FACTSHEET'::text)
   ->  Index Scan using s_begin_idx on s  (cost=0.00..5.27 rows=1 width=16) (actual time=0.429..0.434 rows=1 loops=5)
         Index Cond: ((s.text_id = "outer".text_id) AND (s."start" = "outer"."position"))
 Total runtime: 182.207 ms

I never deleted anything from this database, but I might have dropped
some of the tables a couple of times while populating the database.
Also, 'copy from' failed a couple of times. Can that have the same
effect?

Thanks again,
Martin
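One rough way to see how much dead space is still left is to compare
the page count the planner sees with the live row count (a sketch;
pg_class columns as in 7.4, refreshed by VACUUM/ANALYZE):

    -- A page count far out of proportion to the row count
    -- suggests leftover bloat worth a VACUUM FULL:
    SELECT relname, relpages, reltuples
    FROM pg_class
    WHERE relname IN ('token', 's');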
Martin Hampl <Martin.Hampl@gmx.de> writes:
> On 30.04.2004, at 17:52, Tom Lane wrote:
>> It could be that this table has a lot of empty pages near the front,

> I never deleted anything from this database, but I might have dropped
> some of the tables a couple of times while populating the database.
> Also, 'copy from' failed a couple of times. Can that have the same
> effect?

A failed copy-in would leave a dead row for each input line that it was
able to process before hitting the error.  So potentially that could
account for a lot of dead rows.  I think there is a hint on the COPY
reference page suggesting that you VACUUM in such a situation ...

			regards, tom lane
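In other words, a bulk load that dies partway through still writes row
versions that are invisible but occupy space, roughly like this sketch
(the file name is hypothetical):

    -- Suppose this aborts on a malformed line halfway through the file.
    -- Every line processed before the error becomes a dead row version:
    COPY token FROM '/tmp/tokens.dat';

    -- Reclaim that space and fix the statistics afterwards:
    VACUUM ANALYZE token;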
On 01.05.2004, at 17:48, Tom Lane wrote:
> Martin Hampl <Martin.Hampl@gmx.de> writes:
>> On 30.04.2004, at 17:52, Tom Lane wrote:
>>> It could be that this table has a lot of empty pages near the front,
>
>> I never deleted anything from this database, but I might have dropped
>> some of the tables a couple of times while populating the database.
>> Also, 'copy from' failed a couple of times. Can that have the same
>> effect?
>
> A failed copy-in would leave a dead row for each input line that it was
> able to process before hitting the error.  So potentially that could
> account for a lot of dead rows.  I think there is a hint on the COPY
> reference page suggesting that you VACUUM in such a situation ...

There is. Well...

Regards,
Martin
Thread: SSL test environment

Hi,

I would like to set up an SSL test environment to further develop my
frontend/backend protocol implementation and add SSL support to it.
Trouble is, I don't know much about SSL (or command shells)--I'm on
OS X and come from a traditional Mac culture. I've been searching the
docs and archives, and it's all very cryptic to me. I guess this is the
best hint to what I'm looking for:

http://archives.postgresql.org/pgsql-docs/2002-09/msg00038.php

But I'm not sure how to interpret it, and it looks like a step
somewhere in the middle rather than the whole procedure. Can anyone
direct me to some good entry-level documentation, or go to the trouble
of explaining the whole thing to me if it's not too much--I bet it is,
though?

Thanks,
Marc
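For what it's worth, the usual recipe for a self-signed test
certificate looks roughly like the sketch below. The file names follow
the convention in the PostgreSQL documentation of the time; treat this
as an outline, not gospel:

    # Generate a key pair and a certificate request (answer the prompts;
    # the Common Name should be the server's host name). This writes the
    # key to privkey.pem by default:
    openssl req -new -text -out server.req

    # Remove the passphrase so the server can start unattended:
    openssl rsa -in privkey.pem -out server.key

    # Turn the request into a self-signed certificate:
    openssl req -x509 -in server.req -text -key server.key -out server.crt

    # The key must not be readable by other users:
    chmod og-rwx server.key

    # Copy server.key and server.crt into the data directory ($PGDATA),
    # set "ssl = true" in postgresql.conf, and restart the postmaster.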
I found a not-too-high-level explanation here:

http://www.gtlib.cc.gatech.edu/pub/linux/docs/HOWTO/other-formats/html_single/SSL-Certificates-HOWTO.html

Now what I'd really like to find next is an OpenSSL GUI for OS X. Does
anyone know of one? I checked VersionTracker, Google, Apple, etc.
without luck so far. I hear there's one that comes with OS X Server.

Marc