Thread:
I run the following command three times to prevent cache/disk results. [...] dps=> explain analyze SELECT rec_id FROM url WHERE crc32!=0 AND crc32=764518963 AND status IN (200,304,206) ORDER BY rec_id LIMIT 1; QUERY PLAN ------------------------------------------------------------------------ ----------------------------------------------------- Limit (cost=173.14..173.14 rows=1 width=4) (actual time=0.357..0.358 rows=1 loops=1) -> Sort (cost=173.14..173.22 rows=32 width=4) (actual time=0.354..0.354 rows=1 loops=1) Sort Key: rec_id -> Index Scan using url_crc on url (cost=0.00..172.34 rows=32 width=4) (actual time=0.039..0.271 rows=50 loops=1) Index Cond: (crc32 = 764518963) Filter: ((crc32 <> 0) AND ((status = 200) OR (status = 304) OR (status = 206))) Total runtime: 0.410 ms (7 rows) dps=> explain analyze SELECT rec_id FROM url WHERE crc32!=0 AND crc32=764518963 AND status IN (200,304,206) ORDER BY crc32,rec_id LIMIT 1; QUERY PLAN ------------------------------------------------------------------------ ----------------------------------------------------- Limit (cost=173.14..173.14 rows=1 width=8) (actual time=0.378..0.378 rows=1 loops=1) -> Sort (cost=173.14..173.22 rows=32 width=8) (actual time=0.375..0.375 rows=1 loops=1) Sort Key: crc32, rec_id -> Index Scan using url_crc on url (cost=0.00..172.34 rows=32 width=8) (actual time=0.038..0.278 rows=50 loops=1) Index Cond: (crc32 = 764518963) Filter: ((crc32 <> 0) AND ((status = 200) OR (status = 304) OR (status = 206))) Total runtime: 0.432 ms (7 rows) dps=> explain analyze SELECT rec_id FROM url WHERE crc32!=0 AND crc32=419903683 AND status IN (200,304,206) ORDER BY rec_id LIMIT 1; QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------------ Limit (cost=0.00..37.03 rows=1 width=4) (actual time=156.712..156.713 rows=1 loops=1) -> Index Scan using url_pkey on url (cost=0.00..14996.82 rows=405 width=4) (actual time=156.707..156.707 rows=1 loops=1) Filter: ((crc32 <> 0) AND (crc32 = 419903683) AND ((status = 200) OR (status = 304) OR (status = 206))) Total runtime: 156.769 ms (4 rows) dps=> explain analyze SELECT rec_id FROM url WHERE crc32!=0 AND crc32=419903683 AND status IN (200,304,206) ORDER BY crc32,rec_id LIMIT 1; QUERY PLAN ------------------------------------------------------------------------ -------------------------------------------------------- Limit (cost=1910.14..1910.14 rows=1 width=8) (actual time=4.558..4.559 rows=1 loops=1) -> Sort (cost=1910.14..1911.15 rows=405 width=8) (actual time=4.555..4.555 rows=1 loops=1) Sort Key: crc32, rec_id -> Index Scan using url_crc on url (cost=0.00..1892.60 rows=405 width=8) (actual time=0.042..2.935 rows=719 loops=1) Index Cond: (crc32 = 419903683) Filter: ((crc32 <> 0) AND ((status = 200) OR (status = 304) OR (status = 206))) Total runtime: 4.636 ms (7 rows) The value 764518963 is not common, it appears 50 times in the table. The value 419903683 is the third most common value of the table url. dps=> select u.crc32, count(*) from url u group by u.crc32 order by count(*) desc; crc32 | count -------------+------ 0 | 82202 -946427862 | 10545 419903683 | 719 945866756 | 670 [...] How to setup pgsql to correctly select the good index for index scan ? I run Pgsql 7.4.x The database runs under pg_autovacuum daemon. And a VACUUM FULL VERBOSE ANALYZE was done 10 hours before. Cordialement, Jean-Gérard Pailloncy
=?ISO-8859-1?Q?Pailloncy_Jean-G=E9rard?= <pailloncy@ifrance.com> writes: > I run the following command three times to prevent cache/disk results. Do you think that's actually representative of how your database will behave under load? If the DB is small enough to be completely cached in RAM, and you expect it to remain so, then it's sensible to optimize on the basis of fully-cached test cases. Otherwise I think you are optimizing the wrong thing. If you do want to plan on this basis, you want to set random_page_cost to 1, make sure effective_cache_size is large, and perhaps increase the cpu_xxx cost numbers. (What you're essentially doing here is reducing the estimated cost of a page fetch relative to CPU effort.) regards, tom lane
Hi, I apologize for the mistake. So, I dump the database, I reload it then VACUUM ANALYZE. For each statement: I then quit postgres, start it, execute one command, then quit. Le 14 avr. 04, à 14:39, Pailloncy Jean-Gérard a écrit : dps=# explain analyze SELECT rec_id FROM url WHERE crc32!=0 AND crc32=764518963 AND status IN (200,304,206) ORDER BY rec_id LIMIT 1; QUERY PLAN ------------------------------------------------------------------------ -------------------------------------------------------- Limit (cost=169.79..169.79 rows=1 width=4) (actual time=502.397..502.398 rows=1 loops=1) -> Sort (cost=169.79..169.86 rows=30 width=4) (actual time=502.393..502.393 rows=1 loops=1) Sort Key: rec_id -> Index Scan using url_crc on url (cost=0.00..169.05 rows=30 width=4) (actual time=43.545..490.895 rows=56 loops=1) Index Cond: (crc32 = 764518963) Filter: ((crc32 <> 0) AND ((status = 200) OR (status = 304) OR (status = 206))) Total runtime: 502.520 ms (7 rows) dps=# \q dps=# explain analyze SELECT rec_id FROM url WHERE crc32!=0 AND crc32=764518963 AND status IN (200,304,206) ORDER BY crc32,rec_id LIMIT 1; QUERY PLAN ------------------------------------------------------------------------ ----------------------------------------------------- Limit (cost=169.79..169.79 rows=1 width=8) (actual time=5.893..5.894 rows=1 loops=1) -> Sort (cost=169.79..169.86 rows=30 width=8) (actual time=5.889..5.889 rows=1 loops=1) Sort Key: crc32, rec_id -> Index Scan using url_crc on url (cost=0.00..169.05 rows=30 width=8) (actual time=0.445..5.430 rows=56 loops=1) Index Cond: (crc32 = 764518963) Filter: ((crc32 <> 0) AND ((status = 200) OR (status = 304) OR (status = 206))) Total runtime: 6.020 ms (7 rows) dps=# \q dps=# explain analyze SELECT rec_id FROM url WHERE crc32!=0 AND crc32=419903683 AND status IN (200,304,206) ORDER BY rec_id LIMIT 1; QUERY PLAN ------------------------------------------------------------------------ ---------------------------------------------------------- Limit (cost=0.00..27.95 rows=1 width=4) (actual time=11021.875..11021.876 rows=1 loops=1) -> Index Scan using url_pkey on url (cost=0.00..11625.49 rows=416 width=4) (actual time=11021.868..11021.868 rows=1 loops=1) Filter: ((crc32 <> 0) AND (crc32 = 419903683) AND ((status = 200) OR (status = 304) OR (status = 206))) Total runtime: 11021.986 ms (4 rows) dps=# \q dps=# explain analyze SELECT rec_id FROM url WHERE crc32!=0 AND crc32=419903683 AND status IN (200,304,206) ORDER BY crc32,rec_id LIMIT 1; QUERY PLAN ------------------------------------------------------------------------ --------------------------------------------------------- Limit (cost=2000.41..2000.41 rows=1 width=8) (actual time=48.503..48.504 rows=1 loops=1) -> Sort (cost=2000.41..2001.45 rows=416 width=8) (actual time=48.499..48.499 rows=1 loops=1) Sort Key: crc32, rec_id -> Index Scan using url_crc on url (cost=0.00..1982.31 rows=416 width=8) (actual time=4.848..45.452 rows=796 loops=1) Index Cond: (crc32 = 419903683) Filter: ((crc32 <> 0) AND ((status = 200) OR (status = 304) OR (status = 206))) Total runtime: 48.656 ms (7 rows) dps=# \q So, with all fresh data, everything rebuild from scratch, on a backend that will done one and only one query, the results is strange. Why adding an ORDER BY clause on a column with one value speed up the stuff 502ms to 6ms ? Why when crc32=419903683, which is one of the most often used value in the table, the query planner chose a plan so bad (225 times slower) ? Cordialement, Jean-Gérard Pailloncy
=?ISO-8859-1?Q?Pailloncy_Jean-G=E9rard?= <pailloncy@ifrance.com> writes: > dps=# explain analyze SELECT rec_id FROM url WHERE crc32!=0 AND > crc32=419903683 AND status IN (200,304,206) ORDER BY rec_id LIMIT 1; > QUERY PLAN > ------------------------------------------------------------------------ > ---------------------------------------------------------- > Limit (cost=0.00..27.95 rows=1 width=4) (actual > time=11021.875..11021.876 rows=1 loops=1) > -> Index Scan using url_pkey on url (cost=0.00..11625.49 rows=416 > width=4) (actual time=11021.868..11021.868 rows=1 loops=1) > Filter: ((crc32 <> 0) AND (crc32 = 419903683) AND ((status = > 200) OR (status = 304) OR (status = 206))) > Total runtime: 11021.986 ms > (4 rows) > dps=# \q The planner is guessing that scanning in rec_id order will produce a matching row fairly quickly (sooner than selecting all the matching rows and sorting them would do). It's wrong in this case, but I'm not sure it could do better without very detailed cross-column statistics. Am I right to guess that the rows that match the WHERE clause are not evenly distributed in the rec_id order, but rather there are no such rows till you get well up in the ordering? regards, tom lane