Any ideas how can I speed up this query? - Mailing list pgsql-performance

From Priyank Tiwari
Subject Any ideas how can I speed up this query?
Date
Msg-id CADbPz3G+rUaf3gDm5QQ97=38E2HNd619ZJ=9v61uY24p3P0xUA@mail.gmail.com
Whole thread Raw
Responses Re: Any ideas how can I speed up this query?
Re: Any ideas how can I speed up this query?
List pgsql-performance

Hi,

I have following table definition with 6209888 rows in it. It stores the occurrences of species in various regions.

TABLE DEFINITION

    Column    |          Type          |                        Modifiers                         

--------------+------------------------+----------------------------------------------------------

 id           | integer                | not null default nextval('occurrences_id_seq'::regclass)

 gbifid       | integer                | not null

 sname        | character varying(512) | 

 cname        | character varying(512) | 

 species      | character varying(512) | 

 location     | geometry               | not null

 month        | integer                | 

 year         | integer                | 

 event_date   | date                   | 

 dataset_key  | character varying(512) | 

 taxon_key    | character varying(512) | 

 taxon_rank   | character varying(512) | 

 record_basis | character varying(512) | 

 category_id  | integer                | 

 country      | character varying(512) | 

 lat          | double precision       | 

 lng          | double precision       | 

Indexes:

    "occurrences_pkey" PRIMARY KEY, btree (id)

    "unique_occurrences_gbifid" UNIQUE, btree (gbifid)

    "index_occurences_taxon_key" btree (taxon_key)

    "index_occurrences_category_id" btree (category_id)

    "index_occurrences_cname" btree (cname)

    "index_occurrences_country" btree (country)

    "index_occurrences_lat" btree (lat)

    "index_occurrences_lng" btree (lng)

    "index_occurrences_month" btree (month)

    "index_occurrences_sname" btree (sname)

    "occurrence_location_gix" gist (location)


I am trying to fetch the count of number of occurrences within a certain region. I save the location of each occurrence as a geometric field as well as lat, lng combination. Both fields are indexed. The query that is issued is as follows.

QUERY

SELECT COUNT(*) FROM "occurrences" WHERE ("lat" >= -27.91550355958 AND "lat" <= -27.015680440420002 AND "lng" >= 152.13307044728307 AND "lng" <= 153.03137355271693 AND "category_id" = 1 AND (ST_Intersects( ST_Buffer(ST_PointFromText('POINT(152.582222 -27.465592)')::geography, 50000)::geography, location::geography)));

The problem is it takes more than acceptable time to execute the query. Below is the explain analyze output for the same query.

EXPLAIN ANALYZE QUERY OUTPUT  (http://explain.depesz.com/s/p2a)

Aggregate  (cost=127736.06..127736.07 rows=1 width=0) (actual time=13491.678..13491.679 rows=1 loops=1)

   Buffers: shared hit=3 read=56025

   ->  Bitmap Heap Scan on occurrences  (cost=28249.46..127731.08 rows=1995 width=0) (actual time=528.053..13388.458 rows=167511 loops=1)

         Recheck Cond: ((lat >= (-27.91550355958)::double precision) AND (lat <= (-27.01568044042)::double precision) AND (lng >= 152.133070447283::double precision) AND (lng <= 153.031373552717::double precision))

         Rows Removed by Index Recheck: 748669

         Filter: ((category_id = 1) AND ('0103000020E6100000010000002100000090D8AD28D32263403905504558773BC0CADDAF0384226340E7AD43F4E38D3BC0B559D93A98216340B7BE554692A33BC0C904C18C18206340DF8EA9338DB73BC052F75181131E6340A1D9E30E0FC93BC00BDCB5E39C1B6340A1A40E496AD73BC074D30D03CD1863405DD7BF5110E23BC05DD3A2C0BF156340439B784797E83BC078E9287593126340EF9E5C37BEEA3BC072EB40B9670F63409E25A1BA6FE83BC06964481F5C0C6340B331B5D2C2E13BC08F6785ED8E0963409979FBF9F9D63BC0135DB3E71B0763402F78807480C83BC0E321E8351B0563405E96CB00E6B63BC0672CD874A00363403BC84B1BD9A23BC018FAE8F8B90263400314CD15208D3BC039DE8C4A70026340653B324F91763BC0F5BA5CDFC502634086322DDE0A603BC0E90E8A10B7036340C5FA1C046A4A3BC01ECCC14C3A056340CE4011BC82363BC022F38481400763407655DBB517253BC05B3B5AB6B50963404C9AA306D3163BC079EF01D3810C634010A732D03F0C3BC05152F188890F634044CE5D16C5053BC0EDDABA57AF126340926E14EFA1033BC08D7E9CA3D41563401EFB9E2DEB053BC071A929D5DA186340A3F1F29C8A0C3BC049A7E478A41B63404F8BD2CF3F173BC04B409855161E634057CC1080A2253BC0F9C65E70182063404BEB146926373BC0349D83F596216340449EEA7C204B3BC07803D7FD82226340A16F1747CD603BC090D8AD28D32263403905504558773BC0'::geography && (location)::geography) AND (_st_distance('0103000020E6100000010000002100000090D8AD28D32263403905504558773BC0CADDAF0384226340E7AD43F4E38D3BC0B559D93A98216340B7BE554692A33BC0C904C18C18206340DF8EA9338DB73BC052F75181131E6340A1D9E30E0FC93BC00BDCB5E39C1B6340A1A40E496AD73BC074D30D03CD1863405DD7BF5110E23BC05DD3A2C0BF156340439B784797E83BC078E9287593126340EF9E5C37BEEA3BC072EB40B9670F63409E25A1BA6FE83BC06964481F5C0C6340B331B5D2C2E13BC08F6785ED8E0963409979FBF9F9D63BC0135DB3E71B0763402F78807480C83BC0E321E8351B0563405E96CB00E6B63BC0672CD874A00363403BC84B1BD9A23BC018FAE8F8B90263400314CD15208D3BC039DE8C4A70026340653B324F91763BC0F5BA5CDFC502634086322DDE0A603BC0E90E8A10B7036340C5FA1C046A4A3BC01ECCC14C3A056340CE4011BC82363BC022F38481400763407655DBB517253BC05B3B5AB6B50963404C9AA306D3163BC079EF01D3810C634010A732D03F0C3BC05152F188890F634044CE5D16C5053BC0EDDABA57AF126340926E14EFA1033BC08D7E9CA3D41563401EFB9E2DEB053BC071A929D5DA186340A3F1F29C8A0C3BC049A7E478A41B63404F8BD2CF3F173BC04B409855161E634057CC1080A2253BC0F9C65E70182063404BEB146926373BC0349D83F596216340449EEA7C204B3BC07803D7FD82226340A16F1747CD603BC090D8AD28D32263403905504558773BC0'::geography, (location)::geography, 0::double precision, false) < 1e-05::double precision))

         Rows Removed by Filter: 6357

         Heap Blocks: exact=29947 lossy=22601

         Buffers: shared hit=3 read=56025

         ->  BitmapAnd  (cost=28249.46..28249.46 rows=32476 width=0) (actual time=519.091..519.091 rows=0 loops=1)

               Buffers: shared read=3477

               ->  Bitmap Index Scan on index_occurrences_lat  (cost=0.00..11691.20 rows=365877 width=0) (actual time=218.999..218.999 rows=392415 loops=1)

                     Index Cond: ((lat >= (-27.91550355958)::double precision) AND (lat <= (-27.01568044042)::double precision))

                     Buffers: shared read=1444

               ->  Bitmap Index Scan on index_occurrences_lng  (cost=0.00..16557.01 rows=517658 width=0) (actual time=285.211..285.211 rows=550523 loops=1)

                     Index Cond: ((lng >= 152.133070447283::double precision) AND (lng <= 153.031373552717::double precision))

                     Buffers: shared read=2033

 Planning time: 2.812 ms

 Execution time: 13493.617 ms

(19 rows)


It seems that the planner is underestimating the number of rows returned in Bitmap Heap Scan on occurrences. I have run vacuum analyze on this table couple of times, but it still produces the same result. Any idea how I can speed up this query? How I can assist planner in providing better row estimates for Bitmap Heap Scan section?

POSTGRESQL VERSION INFO

                                               version                                                

------------------------------------------------------------------------------------------------------

 PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit


HARDWARE

I am running the Postgresql instance on a digital ocean vm with 1 core, SSD disk and 1 GB of ram.


Appreciate your help.

Thanks,
Priyank

pgsql-performance by date:

Previous
From: Jeff Janes
Date:
Subject: Re: Are many idle connections bad?
Next
From: "Graeme B. Bell"
Date:
Subject: Re: Any ideas how can I speed up this query?