On Wed, 2014-06-18 at 18:08 -0600, Rob Sargent wrote:
On 06/18/2014 05:47 PM, Jason Long wrote:
I have a large table of access logs to an application.
I want is to find all rows that overlap startdate and enddate with any
other rows.
The query below seems to work, but does not finish unless I specify a
single id.
select distinct a1.id
from t_access a1, t_access a2
where tstzrange(a1.startdate, a1.enddate) && tstzrange(a2.startdate, a2.enddate)
I'm sure you're best bet is a windowing function, but your descriptions suggests there is no index on start/end date columns. Probably want those in any event.
There are indexs on startdate and enddate.
If I specify a known a1.id=1234 then the query returns all records that overlap it, but this takes 1.7 seconds.
There are about 2 million records in the table.
I will see what I come up with on the window function.
If anyone else has some suggestions let me know.
I get with for EXPLAIN ANALYZE the id specified.
Nested Loop (cost=0.43..107950.50 rows=8825 width=84) (actual time=2803.932..2804.558 rows=11 loops=1)
Join Filter: (tstzrange(a1.startdate, a1.enddate) && tstzrange(a2.startdate, a2.enddate))
Rows Removed by Join Filter: 1767741
-> Index Scan using t_access_pkey on t_access a1 (cost=0.43..8.45 rows=1 width=24) (actual time=0.016..0.019 rows=1 loops=1)
Index Cond: (id = 1928761)
-> Seq Scan on t_access a2 (cost=0.00..77056.22 rows=1764905 width=60) (actual time=0.006..1200.657 rows=1767752 loops=1)
Filter: (enddate IS NOT NULL)
Rows Removed by Filter: 159270
Total runtime: 2804.599 ms
and for EXPLAIN without the id specified. EXPLAIN ANALYZE will not complete without the id specified.
Nested Loop (cost=0.00..87949681448.20 rows=17005053815 width=84)
Join Filter: (tstzrange(a1.startdate, a1.enddate) && tstzrange(a2.startdate, a2.enddate))
-> Seq Scan on t_access a2 (cost=0.00..77056.22 rows=1764905 width=60)
Filter: (enddate IS NOT NULL)
-> Materialize (cost=0.00..97983.33 rows=1927022 width=24)
-> Seq Scan on t_access a1 (cost=0.00..77056.22 rows=1927022 width=24)