using LIMIT only on primary table - Mailing list pgsql-sql
From:    Dan Langille
Subject: using LIMIT only on primary table
Date:
Msg-id:  3D42D7AA.27447.3EE190A0@localhost
List:    pgsql-sql
This is an extension of the problem solved by http://archives.postgresql.org/pgsql-sql/2002-03/msg00020.php, but with a slightly different complication. I want to get the last 100 port commits from the database. Commits are stored in commit_log, and commit_log_ports relates commits to ports. A given commit may affect more than one port (i.e. there is a 1-N relationship between commit_log and commit_log_ports).

This gives me the last 100 commits:

    SELECT commit_log.*
      FROM commit_log
     ORDER BY commit_date DESC, id
     LIMIT 100;

So a starting point for the last 100 port commits is:

    explain analyze
    SELECT distinct commit_log.*
      FROM commit_log_ports, commit_log
     WHERE commit_log.id = commit_log_ports.commit_log_id
     ORDER BY commit_log.commit_date DESC, commit_log_ports.commit_log_id
     LIMIT 100;

But it has a very high cost:

    Limit  (cost=11275.92..11283.42 rows=100 width=55) (actual time=5769.07..5771.92 rows=100 loops=1)
      ->  Unique  (cost=11275.92..11643.73 rows=4904 width=55) (actual time=5769.05..5770.93 rows=101 loops=1)
            ->  Sort  (cost=11275.92..11275.92 rows=49042 width=55) (actual time=5769.04..5769.68 rows=112 loops=1)
                  ->  Hash Join  (cost=3478.15..6387.22 rows=49042 width=55) (actual time=1263.69..4319.53 rows=49042 loops=1)
                        ->  Seq Scan on commit_log_ports  (cost=0.00..825.42 rows=49042 width=4) (actual time=0.12..346.32 rows=49042 loops=1)
                        ->  Hash  (cost=1749.51..1749.51 rows=42951 width=51) (actual time=1074.15..1074.15 rows=0 loops=1)
                              ->  Seq Scan on commit_log  (cost=0.00..1749.51 rows=42951 width=51) (actual time=0.14..396.99 rows=42953 loops=1)
    Total runtime: 6158.10 msec

I'd like to get that time down. I suspect the cost is high because the two tables are joined in their entirety first, and only then does the LIMIT apply.
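One rewrite worth trying (only a sketch; I have not benchmarked it) is to replace the DISTINCT join with an EXISTS test. Each commit_log row then qualifies at most once, so no duplicates have to be sorted away:

    SELECT commit_log.*
      FROM commit_log
     WHERE EXISTS (SELECT 1
                     FROM commit_log_ports
                    WHERE commit_log_ports.commit_log_id = commit_log.id)
     ORDER BY commit_log.commit_date DESC, commit_log.id
     LIMIT 100;

With the index on commit_log_ports.commit_log_id, the subquery should cost one index probe per candidate row; whether the planner can stop early after 100 qualifying rows depends on whether it can satisfy the ORDER BY from an index on (commit_date, id) instead of sorting the whole table.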
When I add "commit_log.id = 1" to the WHERE clause, the query plan changes to:

    Limit  (cost=30.42..30.43 rows=1 width=55) (actual time=0.96..0.99 rows=1 loops=1)
      ->  Unique  (cost=30.42..30.43 rows=1 width=55) (actual time=0.95..0.97 rows=1 loops=1)
            ->  Sort  (cost=30.42..30.42 rows=1 width=55) (actual time=0.94..0.94 rows=1 loops=1)
                  ->  Nested Loop  (cost=0.00..30.41 rows=1 width=55) (actual time=0.61..0.64 rows=1 loops=1)
                        ->  Index Scan using commit_log_pkey on commit_log  (cost=0.00..5.93 rows=1 width=51) (actual time=0.31..0.32 rows=1 loops=1)
                        ->  Index Scan using commit_log_ports_cli on commit_log_ports  (cost=0.00..24.41 rows=6 width=4) (actual time=0.25..0.27 rows=1 loops=1)

I think that confirms that my indexes and PK/FK are set correctly. I could do something like:

    SELECT distinct commit_log_id
      FROM commit_log_ports
     ORDER BY commit_log_id desc
     LIMIT 100;

But that doesn't take into account the fact that commits can be added out of order and are not necessarily added in commit_date order.

Any clues please? Thanks.

--
Dan Langille
And yes, I'm looking for a computer job: http://www.freebsddiary.org/dan_langille.php