using LIMIT only on primary table - Mailing list pgsql-sql

From: Dan Langille
Subject: using LIMIT only on primary table
Msg-id: 3D42D7AA.27447.3EE190A0@localhost
List: pgsql-sql
This is an extension of the problem solved by
http://archives.postgresql.org/pgsql-sql/2002-03/msg00020.php
but with a slightly different complication.

I want to get the last 100 port commits from the database.  Commits are
stored in commit_log, and commit_log_ports relates commits to ports.  A
given commit may affect more than one port (i.e. there is a 1-N
relationship between commit_log and commit_log_ports).
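
For reference, the relevant pieces look roughly like this (a simplified
sketch; the port_id column name and the exact index definition are just
my shorthand here, not copied from the real DDL):

  CREATE TABLE commit_log (
      id          serial PRIMARY KEY,
      commit_date timestamp NOT NULL
      -- plus the rest of the commit columns
  );

  CREATE TABLE commit_log_ports (
      commit_log_id integer NOT NULL REFERENCES commit_log (id),
      port_id       integer NOT NULL   -- illustrative name only
  );

  CREATE INDEX commit_log_ports_cli ON commit_log_ports (commit_log_id);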

This gives me the last 100 commits:
  SELECT commit_log.*
    FROM commit_log
   ORDER BY commit_date DESC, id
   LIMIT 100;

So a starting point for the last 100 port commits is:

  EXPLAIN ANALYZE
  SELECT DISTINCT commit_log.*
    FROM commit_log_ports, commit_log
   WHERE commit_log.id = commit_log_ports.commit_log_id
   ORDER BY commit_log.commit_date DESC, commit_log_ports.commit_log_id
   LIMIT 100;

But it has a very high cost:

Limit  (cost=11275.92..11283.42 rows=100 width=55) (actual time=5769.07..5771.92 rows=100 loops=1)
  ->  Unique  (cost=11275.92..11643.73 rows=4904 width=55) (actual time=5769.05..5770.93 rows=101 loops=1)
        ->  Sort  (cost=11275.92..11275.92 rows=49042 width=55) (actual time=5769.04..5769.68 rows=112 loops=1)
              ->  Hash Join  (cost=3478.15..6387.22 rows=49042 width=55) (actual time=1263.69..4319.53 rows=49042 loops=1)
                    ->  Seq Scan on commit_log_ports  (cost=0.00..825.42 rows=49042 width=4) (actual time=0.12..346.32 rows=49042 loops=1)
                    ->  Hash  (cost=1749.51..1749.51 rows=42951 width=51) (actual time=1074.15..1074.15 rows=0 loops=1)
                          ->  Seq Scan on commit_log  (cost=0.00..1749.51 rows=42951 width=51) (actual time=0.14..396.99 rows=42953 loops=1)
Total runtime: 6158.10 msec

I'd like to get that time down.  I suspect the cost is high because the
entire table is being joined, and only then does the limit kick in.
When I add "commit_log.id = 1" to the WHERE clause, the query plan
changes to:

Limit  (cost=30.42..30.43 rows=1 width=55) (actual time=0.96..0.99 rows=1 loops=1)
  ->  Unique  (cost=30.42..30.43 rows=1 width=55) (actual time=0.95..0.97 rows=1 loops=1)
        ->  Sort  (cost=30.42..30.42 rows=1 width=55) (actual time=0.94..0.94 rows=1 loops=1)
              ->  Nested Loop  (cost=0.00..30.41 rows=1 width=55) (actual time=0.61..0.64 rows=1 loops=1)
                    ->  Index Scan using commit_log_pkey on commit_log  (cost=0.00..5.93 rows=1 width=51) (actual time=0.31..0.32 rows=1 loops=1)
                    ->  Index Scan using commit_log_ports_cli on commit_log_ports  (cost=0.00..24.41 rows=6 width=4) (actual time=0.25..0.27 rows=1 loops=1)

I think that confirms that my indexes and PK/FK are set correctly.
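
One thing I've been wondering about (untested, so I may be off base) is
pushing the LIMIT into a subselect so the join and the DISTINCT only
ever see 100 rows.  The catch is that this limits commit_log before
checking for ports, so if any of those 100 commits touch no ports it
will return fewer than 100 rows:

  SELECT DISTINCT last100.*
    FROM (SELECT *
            FROM commit_log
           ORDER BY commit_date DESC, id
           LIMIT 100) AS last100, commit_log_ports
   WHERE commit_log_ports.commit_log_id = last100.id
   ORDER BY last100.commit_date DESC, last100.id;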

Alternatively, I could do something like:
  SELECT DISTINCT commit_log_id
    FROM commit_log_ports
   ORDER BY commit_log_id DESC
   LIMIT 100;
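
That list of ids could then be joined back to commit_log to pick up the
actual rows, roughly (again untested):

  SELECT commit_log.*
    FROM commit_log,
         (SELECT DISTINCT commit_log_id
            FROM commit_log_ports
           ORDER BY commit_log_id DESC
           LIMIT 100) AS last_ports
   WHERE commit_log.id = last_ports.commit_log_id
   ORDER BY commit_log.commit_date DESC, commit_log.id;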

But that doesn't take into account the fact that commits can be added
out of order, i.e. they are not necessarily added in commit_date order.

Any clues please?

thanks

-- 
Dan Langille
And yes, I'm looking for a computer job:
http://www.freebsddiary.org/dan_langille.php


