Re: Optimize date query for large child tables: GiST or GIN? - Mailing list pgsql-performance

From David Jarvis
Subject Re: Optimize date query for large child tables: GiST or GIN?
Date
Msg-id AANLkTimpwoKQw2sXOl4n2lNOzZz_YpBYLKIkOhVgaTD4@mail.gmail.com
Whole thread Raw
In response to Re: Optimize date query for large child tables: GiST or GIN?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Optimize date query for large child tables: GiST or GIN?  (Matthew Wakeling <matthew@flymine.org>)
List pgsql-performance
I took out the date conditions:

SELECT
  m.*
FROM
  climate.measurement m
WHERE
  m.category_id = 1 and
  m.station_id = 2043

This uses the station indexes:

"Result  (cost=0.00..21781.18 rows=8090 width=28)"
"  ->  Append  (cost=0.00..21781.18 rows=8090 width=28)"
"        ->  Seq Scan on measurement m  (cost=0.00..28.00 rows=1 width=38)"
"              Filter: ((category_id = 1) AND (station_id = 2043))"
"        ->  Bitmap Heap Scan on measurement_01_001 m  (cost=11.79..1815.67 rows=677 width=28)"
"              Recheck Cond: (station_id = 2043)"
"              Filter: (category_id = 1)"
"              ->  Bitmap Index Scan on measurement_01_001_s_idx  (cost=0.00..11.62 rows=677 width=0)"
"                    Index Cond: (station_id = 2043)"
"        ->  Bitmap Heap Scan on measurement_02_001 m  (cost=14.47..1682.18 rows=627 width=28)"
"              Recheck Cond: (station_id = 2043)"
"              Filter: (category_id = 1)"
"              ->  Bitmap Index Scan on measurement_02_001_s_idx  (cost=0.00..14.32 rows=627 width=0)"
"                    Index Cond: (station_id = 2043)"

2500+ rows in 185 milliseconds.

That is pretty good (I'll need it to be better but for now it works).

Then combined the selection of the station:

SELECT
  m.*
FROM
  climate.measurement m,
  (SELECT
     s.id
   FROM
     climate.station s,
     climate.city c
   WHERE
     c.id = 5182 AND
     s.elevation BETWEEN 0 AND 3000 AND
     6371.009 * SQRT(
       POW(RADIANS(c.latitude_decimal - s.latitude_decimal), 2) +
       (COS(RADIANS(c.latitude_decimal + s.latitude_decimal) / 2) *
        POW(RADIANS(c.longitude_decimal - s.longitude_decimal), 2))
     ) <= 25
   ) t
WHERE
  m.category_id = 1 and
  m.station_id = t.id

The station index is no longer used, resulting in full table scans:

"Hash Join  (cost=1045.52..1341150.09 rows=14556695 width=28)"
"  Hash Cond: (m.station_id = s.id)"
"  ->  Append  (cost=0.00..867011.99 rows=43670085 width=28)"
"        ->  Seq Scan on measurement m  (cost=0.00..25.00 rows=6 width=38)"
"              Filter: (category_id = 1)"
"        ->  Seq Scan on measurement_01_001 m  (cost=0.00..71086.96 rows=3580637 width=28)"
"              Filter: (category_id = 1)"
"        ->  Seq Scan on measurement_02_001 m  (cost=0.00..64877.40 rows=3267872 width=28)"
"              Filter: (category_id = 1)"
"        ->  Seq Scan on measurement_03_001 m  (cost=0.00..71131.44 rows=3582915 width=28)"
"              Filter: (category_id = 1)"

How do I avoid the FTS?

(I know about PostGIS but I can only learn and do so much at once.) ;-)

Here's the station query:

SELECT
  s.id
FROM
  climate.station s,
  climate.city c
WHERE
  c.id = 5182 AND
  s.elevation BETWEEN 0 AND 3000 AND
  6371.009 * SQRT(
    POW(RADIANS(c.latitude_decimal - s.latitude_decimal), 2) +
    (COS(RADIANS(c.latitude_decimal + s.latitude_decimal) / 2) *
    POW(RADIANS(c.longitude_decimal - s.longitude_decimal), 2))
  ) <= 25

And its EXPLAIN:

"Nested Loop  (cost=0.00..994.94 rows=4046 width=4)"
"  Join Filter: ((6371.009::double precision * sqrt((pow(radians(((c.latitude_decimal - s.latitude_decimal))::double precision), 2::double precision) + (cos((radians(((c.latitude_decimal + s.latitude_decimal))::double precision) / 2::double precision)) * pow(radians(((c.longitude_decimal - s.longitude_decimal))::double precision), 2::double precision))))) <= 25::double precision)"
"  ->  Index Scan using city_pkey1 on city c  (cost=0.00..6.27 rows=1 width=16)"
"        Index Cond: (id = 5182)"
"  ->  Seq Scan on station s  (cost=0.00..321.08 rows=12138 width=20)"
"        Filter: ((s.elevation >= 0) AND (s.elevation <= 3000))"

I get a set of 78 rows returned in very little time.

Thanks again!
Dave

pgsql-performance by date:

Previous
From: David Jarvis
Date:
Subject: Re: Optimize date query for large child tables: GiST or GIN?
Next
From: Matthew Wakeling
Date:
Subject: Re: Optimize date query for large child tables: GiST or GIN?