Thread: How to properly index hstore tags column to faster search for keys

How to properly index hstore tags column to faster search for keys

From
Radu-Stefan Zugravu
Date:

I imported a large area of OpenStreetMap's planet.osm file into a postgresql database. The database contains a table called nodes. Each node has a geometry column called geom and a hstore column called tags. I need to extract nodes along a line that have certain keys in the tags column. To do that I use the following query:

SELECT id, tags  
FROM nodes  
WHERE ST_DWithin(nodes.geom, ST_MakeLine('{$geom1}', '{$geom2}'), 0.001)  
AND tags ? '{$type}';

$geom1 and $geom2 are geometries for start and end points of my line.
The $type variable contains the key I want to search for. Now, it can have one of the following values: 'historic' or 'tourist'.

The query given above works but it is too slow. I guess searching for a key in tags column takes too much time. I read about GIN and GIST indexes and I generated a GIN index using the following query:

CREATE INDEX nodes_tags_idx ON nodes USING GIN(tags);

After creating the index I searched again for nodes using the same first query but there is no change in performance.

How can I properly use GIN and GIST to index tags column so I can faster search for nodes that have a certain key in tags column?

Thank you,

Radu-Stefan

Re: How to properly index hstore tags column to faster search for keys

From
Richard Huxton
Date:
On 07/07/13 08:28, Radu-Stefan Zugravu wrote:
> Each node has a geometry column called geom and a hstore column
> called tags. I need to extract nodes along a line that have certain
> keys in the tags column. To do that I use the following query:

> SELECT id, tags
> FROM nodes
> WHERE ST_DWithin(nodes.geom, ST_MakeLine('{$geom1}', '{$geom2}'), 0.001)
> AND tags ? '{$type}';

> CREATE  INDEX  nodes_tags_idx ON nodes USING GIN(tags);
>
> After creating the index I searched again for nodes using the same first
> query but there is no change in performance.
>
> How can I properly use GIN and GIST to index tags column so I can faster
> search for nodes that have a certain key in tags column?

Your index definition looks OK. Try showing the output of EXPLAIN
ANALYSE for your query - that way we'll see if the index is being used.
You can always paste explain output to: http://explain.depesz.com/ if
it's too long for the email.

--
   Richard Huxton
   Archonet Ltd


Re: How to properly index hstore tags column to faster search for keys

From
Radu-Stefan Zugravu
Date:
Hi,
Thank you for your answer.
My EXPLAIN ANALYZE output can be found here: http://explain.depesz.com/s/Wbo.


On Mon, Jul 8, 2013 at 10:44 AM, Richard Huxton <dev@archonet.com> wrote:
On 07/07/13 08:28, Radu-Stefan Zugravu wrote:
Each node has a geometry column called geom and a hstore column
called tags. I need to extract nodes along a line that have certain
keys in the tags column. To do that I use the following query:

SELECT id, tags
FROM nodes
WHERE ST_DWithin(nodes.geom, ST_MakeLine('{$geom1}', '{$geom2}'), 0.001)
AND tags ? '{$type}';

CREATE  INDEX  nodes_tags_idx ON nodes USING GIN(tags);

After creating the index I searched again for nodes using the same first
query but there is no change in performance.

How can I properly use GIN and GIST to index tags column so I can faster
search for nodes that have a certain key in tags column?

Your index definition looks OK. Try showing the output of EXPLAIN ANALYSE for your query - that way we'll see if the index is being used. You can always paste explain output to: http://explain.depesz.com/ if it's too long for the email.

--
  Richard Huxton
  Archonet Ltd



--
Radu-Stefan Zugravu
0755 950 145
0760 903 464
raduzugravu90@gmail.com
radu.zugravu@yahoo.com

Re: How to properly index hstore tags column to faster search for keys

From
Richard Huxton
Date:
On 08/07/13 09:31, Radu-Stefan Zugravu wrote:
> Hi,
> Thank you for your answer.
> My EXPLAIN ANALYZE output can be found here:
> http://explain.depesz.com/s/Wbo.

Thanks

> Also, there is a discution on this subject on dba.stackexchange.com
> <http://dba.stackexchange.com>:
> http://dba.stackexchange.com/questions/45820/how-to-properly-index-hstore-tags-column-to-faster-search-for-keys

Thanks - also useful to know.

I can't see anything wrong with your query. Reading it from the bottom
upwards:
1. Index used for "historic" search - builds a bitmap of blocks
2. Index used for geometry search - builds a bitmap of blocks
3. See where the bitmaps overlap (BitmapAnd)
4. Grab those disk blocks and find the rows (Bitmap Heap Scan)

The whole thing takes under 20ms - what sort of time were you hoping for?

The bulk of it (15ms) is taken up locating the "historic" rows. There
are 36351 of those, but presumably most of them are far away on the map.

Could you post the explain without the index? I'm curious as to how slow
it is just testing the tags after doing the geometry search.

--
   Richard Huxton
   Archonet Ltd


Re: How to properly index hstore tags column to faster search for keys

From
Radu-Stefan Zugravu
Date:
Any improvement is welcomed. The overall performance of the application is not very good. It takes about 200 seconds to compute a path for not so far star and end points. I want to improve this query as much as I can.
How exactly should I post the explain without the index? Do I have to drop all created indexes for the tags column? It takes some time to create them back.


On Mon, Jul 8, 2013 at 11:53 AM, Richard Huxton <dev@archonet.com> wrote:
On 08/07/13 09:31, Radu-Stefan Zugravu wrote:
Hi,
Thank you for your answer.
My EXPLAIN ANALYZE output can be found here:
http://explain.depesz.com/s/Wbo.

Thanks

Also, there is a discution on this subject on dba.stackexchange.com
<http://dba.stackexchange.com>:
http://dba.stackexchange.com/questions/45820/how-to-properly-index-hstore-tags-column-to-faster-search-for-keys

Thanks - also useful to know.

I can't see anything wrong with your query. Reading it from the bottom upwards:
1. Index used for "historic" search - builds a bitmap of blocks
2. Index used for geometry search - builds a bitmap of blocks
3. See where the bitmaps overlap (BitmapAnd)
4. Grab those disk blocks and find the rows (Bitmap Heap Scan)

The whole thing takes under 20ms - what sort of time were you hoping for?

The bulk of it (15ms) is taken up locating the "historic" rows. There are 36351 of those, but presumably most of them are far away on the map.

Could you post the explain without the index? I'm curious as to how slow it is just testing the tags after doing the geometry search.


--
  Richard Huxton
  Archonet Ltd



--
Radu-Stefan Zugravu
0755 950 145
0760 903 464
raduzugravu90@gmail.com
radu.zugravu@yahoo.com

Re: How to properly index hstore tags column to faster search for keys

From
Richard Huxton
Date:
On 08/07/13 10:20, Radu-Stefan Zugravu wrote:
> Any improvement is welcomed. The overall performance of the application
> is not very good. It takes about 200 seconds to compute a path for not
> so far star and end points.

So you have to call this query 1000 times with different start and end
points?

 > I want to improve this query as much as I can.

There's only two ways I can see to get this much below 20ms. This will
only work if you want a very restricted range of tags.

Drop the tag index and create multiple geometry indexes instead:

CREATE INDEX node_geo_tourist_idx <index details> WHERE tags ? 'tourist';
CREATE INDEX node_geo_tourist_idx <index details> WHERE tags ? 'history';
etc.

This will only work if you have a literal WHERE clause that checks the
tag. It should be fast though.


The second way would be to delete all the nodes that aren't tagged
tourist or history. That assumes you are never interested in them of course.

> How exactly should I post the explain without the index? Do I have to
> drop all created indexes for the tags column? It takes some time to
> create them back.

Not important - I was just curious.

--
   Richard Huxton
   Archonet Ltd


Re: How to properly index hstore tags column to faster search for keys

From
Radu-Stefan Zugravu
Date:
I do call the query for each neighbour node to find which one is better in building my path.
I think I will try the first way you mentioned. I also found some references using BTREE indexes:

CREATE INDEX nodes_tags_btree_historic_idx on nodes USING BTREE ((tags ? 'historic'));
CREATE INDEX nodes_tags_btree_tourist_idx on nodes USING BTREE ((tags ? 'tourist));

Do you think this could make a difference?


On Mon, Jul 8, 2013 at 1:27 PM, Richard Huxton <dev@archonet.com> wrote:
On 08/07/13 10:20, Radu-Stefan Zugravu wrote:
Any improvement is welcomed. The overall performance of the application
is not very good. It takes about 200 seconds to compute a path for not
so far star and end points.

So you have to call this query 1000 times with different start and end points?


> I want to improve this query as much as I can.

There's only two ways I can see to get this much below 20ms. This will only work if you want a very restricted range of tags.

Drop the tag index and create multiple geometry indexes instead:

CREATE INDEX node_geo_tourist_idx <index details> WHERE tags ? 'tourist';
CREATE INDEX node_geo_tourist_idx <index details> WHERE tags ? 'history';
etc.

This will only work if you have a literal WHERE clause that checks the tag. It should be fast though.


The second way would be to delete all the nodes that aren't tagged tourist or history. That assumes you are never interested in them of course.


How exactly should I post the explain without the index? Do I have to
drop all created indexes for the tags column? It takes some time to
create them back.

Not important - I was just curious.


--
  Richard Huxton
  Archonet Ltd



--
Radu-Stefan Zugravu
0755 950 145
0760 903 464
raduzugravu90@gmail.com
radu.zugravu@yahoo.com

Re: How to properly index hstore tags column to faster search for keys

From
idc danny
Date:
Hi Stefan
1 - If you have a fixed data that does not change a lot, like I assume is your fixed 'map' try implementing in your app the hashtrie method. This looks as better approach as your query is quite fast. Usually I am starting to query my queries (or the query planner) when they start to take more the 2 seconds. The fact that you continuously call it for your next node it might not be the best approach.
2 - As mentioned by Richard, try either to delete the nodes that does not belong to "historic" / "tourist" or simply split the table in 2. One that have only them and the rest to the other table. Assuming this will not change a lot the other already implemented queries in your app (because you'll have to make a 1-to-1 now) it might save your day.
Danny


From: Radu-Stefan Zugravu <raduzugravu90@gmail.com>
To: Richard Huxton <dev@archonet.com>
Cc: pgsql-performance@postgresql.org
Sent: Monday, July 8, 2013 2:01 PM
Subject: Re: [PERFORM] How to properly index hstore tags column to faster search for keys

I do call the query for each neighbour node to find which one is better in building my path.
I think I will try the first way you mentioned. I also found some references using BTREE indexes:

CREATE INDEX nodes_tags_btree_historic_idx on nodes USING BTREE ((tags ? 'historic'));
CREATE INDEX nodes_tags_btree_tourist_idx on nodes USING BTREE ((tags ? 'tourist));

Do you think this could make a difference?


On Mon, Jul 8, 2013 at 1:27 PM, Richard Huxton <dev@archonet.com> wrote:
On 08/07/13 10:20, Radu-Stefan Zugravu wrote:
Any improvement is welcomed. The overall performance of the application
is not very good. It takes about 200 seconds to compute a path for not
so far star and end points.

So you have to call this query 1000 times with different start and end points?


> I want to improve this query as much as I can.

There's only two ways I can see to get this much below 20ms. This will only work if you want a very restricted range of tags.

Drop the tag index and create multiple geometry indexes instead:

CREATE INDEX node_geo_tourist_idx <index details> WHERE tags ? 'tourist';
CREATE INDEX node_geo_tourist_idx <index details> WHERE tags ? 'history';
etc.

This will only work if you have a literal WHERE clause that checks the tag. It should be fast though.


The second way would be to delete all the nodes that aren't tagged tourist or history. That assumes you are never interested in them of course.


How exactly should I post the explain without the index? Do I have to
drop all created indexes for the tags column? It takes some time to
create them back.

Not important - I was just curious.


--
  Richard Huxton
  Archonet Ltd



--
Radu-Stefan Zugravu
0755 950 145
0760 903 464
raduzugravu90@gmail.com
radu.zugravu@yahoo.com


Re: How to properly index hstore tags column to faster search for keys

From
Yuri Levinsky
Date:

Dear Radu-Stefan,

It seems to me that you trying hard to solve a problem by SQL that probably can't be solved. Take a look please on Apache HBase. You can access HBase from PostgreSQL as well by utilizing Java or Python for example.      

 

Sincerely yours,

 

Description: Celltick logo_highres

Yuri Levinsky, DBA

Celltick Technologies Ltd., 32 Maskit St., Herzliya 46733, Israel

Mobile: +972 54 6107703, Office: +972 9 9710239; Fax: +972 9 9710222

 

From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Radu-Stefan Zugravu
Sent: Monday, July 08, 2013 12:20 PM
To: Richard Huxton
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] How to properly index hstore tags column to faster search for keys

 

Any improvement is welcomed. The overall performance of the application is not very good. It takes about 200 seconds to compute a path for not so far star and end points. I want to improve this query as much as I can.

How exactly should I post the explain without the index? Do I have to drop all created indexes for the tags column? It takes some time to create them back.

 

On Mon, Jul 8, 2013 at 11:53 AM, Richard Huxton <dev@archonet.com> wrote:

On 08/07/13 09:31, Radu-Stefan Zugravu wrote:

Hi,
Thank you for your answer.
My EXPLAIN ANALYZE output can be found here:
http://explain.depesz.com/s/Wbo.

 

Thanks

Also, there is a discution on this subject on dba.stackexchange.com

<http://dba.stackexchange.com>:
http://dba.stackexchange.com/questions/45820/how-to-properly-index-hstore-tags-column-to-faster-search-for-keys


Thanks - also useful to know.

I can't see anything wrong with your query. Reading it from the bottom upwards:
1. Index used for "historic" search - builds a bitmap of blocks
2. Index used for geometry search - builds a bitmap of blocks
3. See where the bitmaps overlap (BitmapAnd)
4. Grab those disk blocks and find the rows (Bitmap Heap Scan)

The whole thing takes under 20ms - what sort of time were you hoping for?

The bulk of it (15ms) is taken up locating the "historic" rows. There are 36351 of those, but presumably most of them are far away on the map.

Could you post the explain without the index? I'm curious as to how slow it is just testing the tags after doing the geometry search.



--
  Richard Huxton
  Archonet Ltd



 

--

Radu-Stefan Zugravu

0755 950 145
0760 903 464
raduzugravu90@gmail.com
radu.zugravu@yahoo.com


This mail was received via Mail-SeCure System.

Attachment

Re: How to properly index hstore tags column to faster search for keys

From
Stefan Keller
Date:
Hi Yuri and Radu-Stefan

I would'nt give too fast on PostgreSQL! 
When looking at your query plan I wonder if one could reformulate the query to compute the ST_DWithin first (assuming you have an index on the node geometries!) before it filters the tags.
To investigate that you could formulate a CTE query [1] which computes the ST_DWithin first.

Yours, Stefan



2013/7/8 Yuri Levinsky <yuril@celltick.com>

Dear Radu-Stefan,

It seems to me that you trying hard to solve a problem by SQL that probably can't be solved. Take a look please on Apache HBase. You can access HBase from PostgreSQL as well by utilizing Java or Python for example.      

 

Sincerely yours,

 

Description: Celltick logo_highres

Yuri Levinsky, DBA

Celltick Technologies Ltd., 32 Maskit St., Herzliya 46733, Israel

Mobile: +972 54 6107703, Office: +972 9 9710239; Fax: +972 9 9710222

 

From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Radu-Stefan Zugravu
Sent: Monday, July 08, 2013 12:20 PM
To: Richard Huxton
Cc: pgsql-performance@postgresql.org


Subject: Re: [PERFORM] How to properly index hstore tags column to faster search for keys

 

Any improvement is welcomed. The overall performance of the application is not very good. It takes about 200 seconds to compute a path for not so far star and end points. I want to improve this query as much as I can.

How exactly should I post the explain without the index? Do I have to drop all created indexes for the tags column? It takes some time to create them back.

 

On Mon, Jul 8, 2013 at 11:53 AM, Richard Huxton <dev@archonet.com> wrote:

On 08/07/13 09:31, Radu-Stefan Zugravu wrote:

Hi,
Thank you for your answer.
My EXPLAIN ANALYZE output can be found here:
http://explain.depesz.com/s/Wbo.

 

Thanks

Also, there is a discution on this subject on dba.stackexchange.com

<http://dba.stackexchange.com>:
http://dba.stackexchange.com/questions/45820/how-to-properly-index-hstore-tags-column-to-faster-search-for-keys


Thanks - also useful to know.

I can't see anything wrong with your query. Reading it from the bottom upwards:
1. Index used for "historic" search - builds a bitmap of blocks
2. Index used for geometry search - builds a bitmap of blocks
3. See where the bitmaps overlap (BitmapAnd)
4. Grab those disk blocks and find the rows (Bitmap Heap Scan)

The whole thing takes under 20ms - what sort of time were you hoping for?

The bulk of it (15ms) is taken up locating the "historic" rows. There are 36351 of those, but presumably most of them are far away on the map.

Could you post the explain without the index? I'm curious as to how slow it is just testing the tags after doing the geometry search.



--
  Richard Huxton
  Archonet Ltd



 

--

Radu-Stefan Zugravu

0755 950 145
0760 903 464
raduzugravu90@gmail.com
radu.zugravu@yahoo.com


This mail was received via Mail-SeCure System.


Attachment