BUG #8620: SELECT on Materialized View Fails to Use Index - Mailing list pgsql-bugs

From jesse.denardo@myfarms.com
Subject BUG #8620: SELECT on Materialized View Fails to Use Index
Date
Msg-id E1VkE3v-0006jl-7n@wrigleys.postgresql.org
Whole thread Raw
Responses Re: BUG #8620: SELECT on Materialized View Fails to Use Index  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      8620
Logged by:          Jesse Denardo
Email address:      jesse.denardo@myfarms.com
PostgreSQL version: 9.3.1
Operating system:   Arch Linux x86_64 3.11.6-1-ARCH
Description:

I've got two seemingly identical servers running on two different machines.
Both are running Postgres 9.3.1, PostGIS 2.1.0, and both on Arch Linux.


On both servers I used this SQL script to create a materialized view and two
indexes on it:


CREATE MATERIALIZED VIEW gis.mat_land_boundaries AS (
    SELECT 'US.' || id AS id
        , ( SELECT id FROM i18n.countries WHERE code = 'US' ) AS
countries_id
        , ST_SetSRID(geom,4326) AS geom
    FROM gis.clu
    UNION ALL
    SELECT 'ZA.' || gid AS id
        , ( SELECT id FROM i18n.countries WHERE code = 'ZA' ) AS
countries_id
        , ST_SetSRID(geom,4326) AS geom
    FROM gis.za_clu
);


CREATE UNIQUE INDEX mat_land_boundaries_id_unique
  ON gis.mat_land_boundaries (id);


CREATE INDEX mat_land_boundaries_geom_gist
  ON gis.mat_land_boundaries USING gist (geom);


This view is several gigabytes large and holds 28 million rows of geometry
data. I then run the following query on both servers:


explain analyze SELECT "id",
ST_AsGeoJSON(ST_SimplifyPreserveTopology(geom,0.00003)) as geometry FROM
"gis"."mat_land_boundaries" WHERE "geom" IS NOT NULL AND
ST_Intersects(geom,ST_SetSRID(ST_GeomFromText('POLYGON((-84.98954772949219
40.327701904195926, -84.98954772949219 40.362961345186555,
-84.89994049072266 40.362961345186555, -84.89994049072266
40.327701904195926, -84.98954772949219 40.327701904195926))'),4326)) LIMIT
800;




On one server, the query returns in a couple hundred milliseconds because it
used the mat_land_boundaries_geom_gist index:


                                            QUERY PLAN




----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
-----------------
 Limit  (cost=0.42..1523.40 rows=116 width=536) (actual time=34.701..293.895
rows=
670 loops=1)
   ->  Index Scan using mat_land_boundaries_geom_gist on mat_land_boundaries
 (cos
t=0.42..1523.40 rows=116 width=536) (actual time=34.700..293.720 rows=670
loops=1)
         Index Cond: ((geom IS NOT NULL) AND (geom &&
'0103000020E6100000010000000
5000000000000C0543F55C0E1ADD022F2294440000000C0543F55C0EBA47184752E4440000000A0983
955C0EBA47184752E4440000000A0983955C0E1ADD022F2294440000000C0543F55C0E1ADD022F2294
440'::geometry))
         Filter: _st_intersects(geom,
'0103000020E61000000100000005000000000000C05
43F55C0E1ADD022F2294440000000C0543F55C0EBA47184752E4440000000A0983955C0EBA47184752
E4440000000A0983955C0E1ADD022F2294440000000C0543F55C0E1ADD022F2294440'::geometry)
         Rows Removed by Filter: 1
 Total runtime: 294.120 ms
(6 rows)




The other server takes well over a minute because it does not use the
index:



QUERY PL
AN






-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------
 Limit  (cost=0.00..4187.64 rows=800 width=64) (actual
time=23768.060..112201.963 r
ows=670 loops=1)
   ->  Seq Scan on mat_land_boundaries  (cost=0.00..9898135.26 rows=1890924
width=6
4) (actual time=23768.057..112201.846 rows=670 loops=1)
         Filter: ((geom IS NOT NULL) AND (geom &&
'0103000020E610000001000000050000
00000000C0543F55C0E1ADD022F2294440000000C0543F55C0EBA47184752E4440000000A0983955C0E
BA47184752E4440000000A0983955C0E1ADD022F2294440000000C0543F55C0E1ADD022F2294440'::g
eometry) AND _st_intersects(geom,
'0103000020E61000000100000005000000000000C0543F55
C0E1ADD022F2294440000000C0543F55C0EBA47184752E4440000000A0983955C0EBA47184752E44400
00000A0983955C0E1ADD022F2294440000000C0543F55C0E1ADD022F2294440'::geometry))
         Rows Removed by Filter: 28505727
 Total runtime: 112202.105 ms
(5 rows)




(Interestingly, the query returned one fewer row, even though both views
should contain identical data.) I've confirmed that the views were created
correctly, and the indexes were created. I tried dropping/recreating the
index and rebooting PostgreSQL and the entire box, but the behavior
persists. I've even ensured that the packages on both machines were all up
to date with identical versions. Any idea why Postgres would fail to use the
index on one machine but not the other, for identical queries?


# \d mat_land_boundaries
Materialized view "gis.mat_land_boundaries"
    Column    |   Type   | Modifiers
--------------+----------+-----------
 id           | text     |
 countries_id | integer  |
 geom         | geometry |
Indexes:
    "mat_land_boundaries_id_unique" UNIQUE, btree (id)
    "mat_land_boundaries_geom_gist" gist (geom)


# \di mat_land_boundaries_geom_gist
                                List of relations
 Schema |             Name              | Type  |  Owner   |        Table

--------+-------------------------------+-------+----------+---------------------
 gis    | mat_land_boundaries_geom_gist | index | postgres |
mat_land_boundaries
(1 row)

pgsql-bugs by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: pgdump not dumping my database
Next
From: "wanganshi00"
Date:
Subject: Could you give me some suopport ,thanks!