Re: Overlapping ranges - Mailing list pgsql-general

From Jason Long
Subject Re: Overlapping ranges
Date
Msg-id 1403138745.32369.5.camel@localhost.localdomain
Whole thread Raw
In response to Re: Overlapping ranges  (Rob Sargent <robjsargent@gmail.com>)
List pgsql-general
On Wed, 2014-06-18 at 18:08 -0600, Rob Sargent wrote:
On 06/18/2014 05:47 PM, Jason Long wrote:

I have a large table of access logs to an application.  

I want is to find all rows that overlap startdate and enddate with any
other rows.

The query below seems to work, but does not finish unless I specify a
single id.  

select distinct a1.id
from t_access a1,        t_access a2 
where tstzrange(a1.startdate, a1.enddate) &&      tstzrange(a2.startdate, a2.enddate) 




I'm sure you're best bet is a windowing function, but your descriptions suggests there is no index on start/end date columns.  Probably want those in any event.

There are indexs on startdate and enddate.
If I specify a known a1.id=1234 then the query returns all records that overlap it, but this takes 1.7 seconds.

There are about 2 million records in the table.

I will see what I come up with on the window function.

If anyone else has some suggestions let me know.

I get with for EXPLAIN ANALYZE the id specified.

Nested Loop  (cost=0.43..107950.50 rows=8825 width=84) (actual time=2803.932..2804.558 rows=11 loops=1)
   Join Filter: (tstzrange(a1.startdate, a1.enddate) && tstzrange(a2.startdate, a2.enddate))
   Rows Removed by Join Filter: 1767741
   ->  Index Scan using t_access_pkey on t_access a1  (cost=0.43..8.45 rows=1 width=24) (actual time=0.016..0.019 rows=1 loops=1)
         Index Cond: (id = 1928761)
   ->  Seq Scan on t_access a2  (cost=0.00..77056.22 rows=1764905 width=60) (actual time=0.006..1200.657 rows=1767752 loops=1)
         Filter: (enddate IS NOT NULL)
         Rows Removed by Filter: 159270
Total runtime: 2804.599 ms


and for EXPLAIN without the id specified.  EXPLAIN ANALYZE will not complete without the id specified.

Nested Loop  (cost=0.00..87949681448.20 rows=17005053815 width=84)
   Join Filter: (tstzrange(a1.startdate, a1.enddate) && tstzrange(a2.startdate, a2.enddate))
   ->  Seq Scan on t_access a2  (cost=0.00..77056.22 rows=1764905 width=60)
         Filter: (enddate IS NOT NULL)
   ->  Materialize  (cost=0.00..97983.33 rows=1927022 width=24)
         ->  Seq Scan on t_access a1  (cost=0.00..77056.22 rows=1927022 width=24)

pgsql-general by date:

Previous
From: Rob Sargent
Date:
Subject: Re: Overlapping ranges
Next
From: Edson Richter
Date:
Subject: Global value/global variable?