Thread: How to properly index hstore tags column to faster search for keys
I imported a large area of OpenStreetMap's planet.osm file into a postgresql database. The database contains a table called nodes. Each node has a geometry column called geom and a hstore column called tags. I need to extract nodes along a line that have certain keys in the tags column. To do that I use the following query:
SELECT id, tags
FROM nodes
WHERE ST_DWithin(nodes.geom, ST_MakeLine('{$geom1}', '{$geom2}'), 0.001)
AND tags ? '{$type}';
$geom1 and $geom2 are geometries for start and end points of my line.
The $type variable contains the key I want to search for. Now, it can have one of the following values: 'historic' or 'tourist'.
The query given above works but it is too slow. I guess searching for a key in tags column takes too much time. I read about GIN and GIST indexes and I generated a GIN index using the following query:
CREATE INDEX nodes_tags_idx ON nodes USING GIN(tags);
After creating the index I searched again for nodes using the same first query but there is no change in performance.
How can I properly use GIN and GIST to index tags column so I can faster search for nodes that have a certain key in tags column?
Thank you,
Radu-Stefan
On 07/07/13 08:28, Radu-Stefan Zugravu wrote: > Each node has a geometry column called geom and a hstore column > called tags. I need to extract nodes along a line that have certain > keys in the tags column. To do that I use the following query: > SELECT id, tags > FROM nodes > WHERE ST_DWithin(nodes.geom, ST_MakeLine('{$geom1}', '{$geom2}'), 0.001) > AND tags ? '{$type}'; > CREATE INDEX nodes_tags_idx ON nodes USING GIN(tags); > > After creating the index I searched again for nodes using the same first > query but there is no change in performance. > > How can I properly use GIN and GIST to index tags column so I can faster > search for nodes that have a certain key in tags column? Your index definition looks OK. Try showing the output of EXPLAIN ANALYSE for your query - that way we'll see if the index is being used. You can always paste explain output to: http://explain.depesz.com/ if it's too long for the email. -- Richard Huxton Archonet Ltd
Re: How to properly index hstore tags column to faster search for keys
On 07/07/13 08:28, Radu-Stefan Zugravu wrote:Each node has a geometry column called geom and a hstore column
called tags. I need to extract nodes along a line that have certain
keys in the tags column. To do that I use the following query:SELECT id, tags
FROM nodes
WHERE ST_DWithin(nodes.geom, ST_MakeLine('{$geom1}', '{$geom2}'), 0.001)
AND tags ? '{$type}';Your index definition looks OK. Try showing the output of EXPLAIN ANALYSE for your query - that way we'll see if the index is being used. You can always paste explain output to: http://explain.depesz.com/ if it's too long for the email.CREATE INDEX nodes_tags_idx ON nodes USING GIN(tags);
After creating the index I searched again for nodes using the same first
query but there is no change in performance.
How can I properly use GIN and GIST to index tags column so I can faster
search for nodes that have a certain key in tags column?
--
Richard Huxton
Archonet Ltd
0760 903 464
raduzugravu90@gmail.com
radu.zugravu@yahoo.com
On 08/07/13 09:31, Radu-Stefan Zugravu wrote: > Hi, > Thank you for your answer. > My EXPLAIN ANALYZE output can be found here: > http://explain.depesz.com/s/Wbo. Thanks > Also, there is a discution on this subject on dba.stackexchange.com > <http://dba.stackexchange.com>: > http://dba.stackexchange.com/questions/45820/how-to-properly-index-hstore-tags-column-to-faster-search-for-keys Thanks - also useful to know. I can't see anything wrong with your query. Reading it from the bottom upwards: 1. Index used for "historic" search - builds a bitmap of blocks 2. Index used for geometry search - builds a bitmap of blocks 3. See where the bitmaps overlap (BitmapAnd) 4. Grab those disk blocks and find the rows (Bitmap Heap Scan) The whole thing takes under 20ms - what sort of time were you hoping for? The bulk of it (15ms) is taken up locating the "historic" rows. There are 36351 of those, but presumably most of them are far away on the map. Could you post the explain without the index? I'm curious as to how slow it is just testing the tags after doing the geometry search. -- Richard Huxton Archonet Ltd
Re: How to properly index hstore tags column to faster search for keys
On 08/07/13 09:31, Radu-Stefan Zugravu wrote:ThanksHi,
Thank you for your answer.
My EXPLAIN ANALYZE output can be found here:
http://explain.depesz.com/s/Wbo.Also, there is a discution on this subject on dba.stackexchange.com<http://dba.stackexchange.com>:
http://dba.stackexchange.com/questions/45820/how-to-properly-index-hstore-tags-column-to-faster-search-for-keys
Thanks - also useful to know.
I can't see anything wrong with your query. Reading it from the bottom upwards:
1. Index used for "historic" search - builds a bitmap of blocks
2. Index used for geometry search - builds a bitmap of blocks
3. See where the bitmaps overlap (BitmapAnd)
4. Grab those disk blocks and find the rows (Bitmap Heap Scan)
The whole thing takes under 20ms - what sort of time were you hoping for?
The bulk of it (15ms) is taken up locating the "historic" rows. There are 36351 of those, but presumably most of them are far away on the map.
Could you post the explain without the index? I'm curious as to how slow it is just testing the tags after doing the geometry search.
--
Richard Huxton
Archonet Ltd
0760 903 464
raduzugravu90@gmail.com
radu.zugravu@yahoo.com
On 08/07/13 10:20, Radu-Stefan Zugravu wrote: > Any improvement is welcomed. The overall performance of the application > is not very good. It takes about 200 seconds to compute a path for not > so far star and end points. So you have to call this query 1000 times with different start and end points? > I want to improve this query as much as I can. There's only two ways I can see to get this much below 20ms. This will only work if you want a very restricted range of tags. Drop the tag index and create multiple geometry indexes instead: CREATE INDEX node_geo_tourist_idx <index details> WHERE tags ? 'tourist'; CREATE INDEX node_geo_tourist_idx <index details> WHERE tags ? 'history'; etc. This will only work if you have a literal WHERE clause that checks the tag. It should be fast though. The second way would be to delete all the nodes that aren't tagged tourist or history. That assumes you are never interested in them of course. > How exactly should I post the explain without the index? Do I have to > drop all created indexes for the tags column? It takes some time to > create them back. Not important - I was just curious. -- Richard Huxton Archonet Ltd
Re: How to properly index hstore tags column to faster search for keys
CREATE INDEX nodes_tags_btree_historic_idx on nodes USING BTREE ((tags ? 'historic'));
On 08/07/13 10:20, Radu-Stefan Zugravu wrote:So you have to call this query 1000 times with different start and end points?Any improvement is welcomed. The overall performance of the application
is not very good. It takes about 200 seconds to compute a path for not
so far star and end points.There's only two ways I can see to get this much below 20ms. This will only work if you want a very restricted range of tags.
> I want to improve this query as much as I can.
Drop the tag index and create multiple geometry indexes instead:
CREATE INDEX node_geo_tourist_idx <index details> WHERE tags ? 'tourist';
CREATE INDEX node_geo_tourist_idx <index details> WHERE tags ? 'history';
etc.
This will only work if you have a literal WHERE clause that checks the tag. It should be fast though.
The second way would be to delete all the nodes that aren't tagged tourist or history. That assumes you are never interested in them of course.Not important - I was just curious.How exactly should I post the explain without the index? Do I have to
drop all created indexes for the tags column? It takes some time to
create them back.
--
Richard Huxton
Archonet Ltd
0760 903 464
raduzugravu90@gmail.com
radu.zugravu@yahoo.com
From: Radu-Stefan Zugravu <raduzugravu90@gmail.com>
To: Richard Huxton <dev@archonet.com>
Cc: pgsql-performance@postgresql.org
Sent: Monday, July 8, 2013 2:01 PM
Subject: Re: [PERFORM] How to properly index hstore tags column to faster search for keys
CREATE INDEX nodes_tags_btree_historic_idx on nodes USING BTREE ((tags ? 'historic'));
On 08/07/13 10:20, Radu-Stefan Zugravu wrote:So you have to call this query 1000 times with different start and end points?Any improvement is welcomed. The overall performance of the application
is not very good. It takes about 200 seconds to compute a path for not
so far star and end points.There's only two ways I can see to get this much below 20ms. This will only work if you want a very restricted range of tags.
> I want to improve this query as much as I can.
Drop the tag index and create multiple geometry indexes instead:
CREATE INDEX node_geo_tourist_idx <index details> WHERE tags ? 'tourist';
CREATE INDEX node_geo_tourist_idx <index details> WHERE tags ? 'history';
etc.
This will only work if you have a literal WHERE clause that checks the tag. It should be fast though.
The second way would be to delete all the nodes that aren't tagged tourist or history. That assumes you are never interested in them of course.Not important - I was just curious.How exactly should I post the explain without the index? Do I have to
drop all created indexes for the tags column? It takes some time to
create them back.
--
Richard Huxton
Archonet Ltd
0760 903 464
raduzugravu90@gmail.com
radu.zugravu@yahoo.com
Dear Radu-Stefan,
It seems to me that you trying hard to solve a problem by SQL that probably can't be solved. Take a look please on Apache HBase. You can access HBase from PostgreSQL as well by utilizing Java or Python for example.
Sincerely yours,
Yuri Levinsky, DBA
Celltick Technologies Ltd., 32 Maskit St., Herzliya 46733, Israel
Mobile: +972 54 6107703, Office: +972 9 9710239; Fax: +972 9 9710222
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Radu-Stefan Zugravu
Sent: Monday, July 08, 2013 12:20 PM
To: Richard Huxton
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] How to properly index hstore tags column to faster search for keys
Any improvement is welcomed. The overall performance of the application is not very good. It takes about 200 seconds to compute a path for not so far star and end points. I want to improve this query as much as I can.
How exactly should I post the explain without the index? Do I have to drop all created indexes for the tags column? It takes some time to create them back.
On Mon, Jul 8, 2013 at 11:53 AM, Richard Huxton <dev@archonet.com> wrote:
On 08/07/13 09:31, Radu-Stefan Zugravu wrote:
Hi,
Thank you for your answer.
My EXPLAIN ANALYZE output can be found here:
http://explain.depesz.com/s/Wbo.
Thanks
Also, there is a discution on this subject on dba.stackexchange.com
<http://dba.stackexchange.com>:
http://dba.stackexchange.com/questions/45820/how-to-properly-index-hstore-tags-column-to-faster-search-for-keys
Thanks - also useful to know.
I can't see anything wrong with your query. Reading it from the bottom upwards:
1. Index used for "historic" search - builds a bitmap of blocks
2. Index used for geometry search - builds a bitmap of blocks
3. See where the bitmaps overlap (BitmapAnd)
4. Grab those disk blocks and find the rows (Bitmap Heap Scan)
The whole thing takes under 20ms - what sort of time were you hoping for?
The bulk of it (15ms) is taken up locating the "historic" rows. There are 36351 of those, but presumably most of them are far away on the map.
Could you post the explain without the index? I'm curious as to how slow it is just testing the tags after doing the geometry search.
--
Richard Huxton
Archonet Ltd
--
Radu-Stefan Zugravu
0755 950 145
0760 903 464
raduzugravu90@gmail.com
radu.zugravu@yahoo.com
This mail was received via Mail-SeCure System.
Attachment
Dear Radu-Stefan,
It seems to me that you trying hard to solve a problem by SQL that probably can't be solved. Take a look please on Apache HBase. You can access HBase from PostgreSQL as well by utilizing Java or Python for example.
Sincerely yours,
Yuri Levinsky, DBA
Celltick Technologies Ltd., 32 Maskit St., Herzliya 46733, Israel
Mobile: +972 54 6107703, Office: +972 9 9710239; Fax: +972 9 9710222
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Radu-Stefan Zugravu
Sent: Monday, July 08, 2013 12:20 PM
To: Richard Huxton
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] How to properly index hstore tags column to faster search for keys
Any improvement is welcomed. The overall performance of the application is not very good. It takes about 200 seconds to compute a path for not so far star and end points. I want to improve this query as much as I can.
How exactly should I post the explain without the index? Do I have to drop all created indexes for the tags column? It takes some time to create them back.
On Mon, Jul 8, 2013 at 11:53 AM, Richard Huxton <dev@archonet.com> wrote:
On 08/07/13 09:31, Radu-Stefan Zugravu wrote:
Hi,
Thank you for your answer.
My EXPLAIN ANALYZE output can be found here:
http://explain.depesz.com/s/Wbo.
Thanks
Also, there is a discution on this subject on dba.stackexchange.com
<http://dba.stackexchange.com>:
http://dba.stackexchange.com/questions/45820/how-to-properly-index-hstore-tags-column-to-faster-search-for-keys
Thanks - also useful to know.
I can't see anything wrong with your query. Reading it from the bottom upwards:
1. Index used for "historic" search - builds a bitmap of blocks
2. Index used for geometry search - builds a bitmap of blocks
3. See where the bitmaps overlap (BitmapAnd)
4. Grab those disk blocks and find the rows (Bitmap Heap Scan)
The whole thing takes under 20ms - what sort of time were you hoping for?
The bulk of it (15ms) is taken up locating the "historic" rows. There are 36351 of those, but presumably most of them are far away on the map.
Could you post the explain without the index? I'm curious as to how slow it is just testing the tags after doing the geometry search.
--
Richard Huxton
Archonet Ltd
--
Radu-Stefan Zugravu
0755 950 145
0760 903 464
raduzugravu90@gmail.com
radu.zugravu@yahoo.com
This mail was received via Mail-SeCure System.