Thread: Suitable Index for my Table

Suitable Index for my Table

From
"Janek Sendrowski"
Date:
Hi,

I've got a table with many Values of the Type REAL.
These are my metric distances or my pivots to my sentences.
The table looks like this:

ID INTEGER, distance1 REAL, distance2 REAL, distance3 REAL, distance4 REAL, ..., distance24 REAL

The range of the Value is in between 0 and 1. So it looks like this 0.196 or 0.891

That my query

WHERE value BETWEEN (distance1 - radius) AND (distance1 + radius)
WHERE value BETWEEN (distance2 - radius) AND (distance2 + radius)
WHERE value BETWEEN (distance3 - radius) AND (distance3 + radius)
WHERE value BETWEEN (distance4 - radius) AND (distance4 + radius)
...

Now I'm searching for a suitable index.

Does any of you have an idea?

TX very much for any support!

Janek Sendrowski


Re: Suitable Index for my Table

From
Kevin Grittner
Date:
Janek Sendrowski <janek12@web.de> wrote:

> I've got a table with many Values of the Type REAL.
> These are my metric distances or my pivots to my sentences.
> The table looks like this:
>
> ID INTEGER, distance1 REAL, distance2 REAL, distance3 REAL,
> distance4 REAL, ..., distance24 REAL

It should always raise a big red flag when you see column names
with numeric suffixes like that.  Usually these represent data
which should be normalized out to another table, or possibly
represented by a single column which is an array, hstore, or json
type.

> The range of the Value is in between 0 and 1. So it looks like
> this 0.196 or 0.891
>
> That my query
>
> WHERE value BETWEEN (distance1 - radius) AND (distance1 + radius)
> WHERE value BETWEEN (distance2 - radius) AND (distance2 + radius)
> WHERE value BETWEEN (distance3 - radius) AND (distance3 + radius)
> WHERE value BETWEEN (distance4 - radius) AND (distance4 + radius)
> ...

Are those the WHERE clauses of four different queries, or did you
mean for those to be four criteria on a single query?  It would be
better to show an actual, working query and table layout, so people
have a more clear idea of the problem they are being asked to help
solve.

> Now I'm searching for a suitable index.

A btree index on each distance column, maybe?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Suitable Index for my Table

From
Martijn van Oosterhout
Date:
On Mon, Nov 04, 2013 at 07:21:11PM +0100, Janek Sendrowski wrote:
> Hi,
>
> I've got a table with many Values of the Type REAL.
> These are my metric distances or my pivots to my sentences.
> The table looks like this:
>
> ID INTEGER, distance1 REAL, distance2 REAL, distance3 REAL, distance4 REAL, ..., distance24 REAL
>
> The range of the Value is in between 0 and 1. So it looks like this 0.196 or 0.891
>
> That my query
>
> WHERE value BETWEEN (distance1 - radius) AND (distance1 + radius)
> WHERE value BETWEEN (distance2 - radius) AND (distance2 + radius)
> WHERE value BETWEEN (distance3 - radius) AND (distance3 + radius)
> WHERE value BETWEEN (distance4 - radius) AND (distance4 + radius)
> ...
>
> Now I'm searching for a suitable index.

This sounds like a job for a geometric datatype, a la GiST.

http://www.postgresql.org/docs/9.3/static/cube.html

CREATE INDEX foo ON bar USING GIST ( cube( ARRAY(distance1), ARRAY(distance1) ) );

The you can do lookups with:

SELECT * FROM bar WHERE
      cube( ARRAY(distance1), ARRAY(distance1) )
      &&
      cube( ARRAY(value-radius), ARRAY(value+radius) )

If you commonly use sets of columns you can go multiple dimensional for
extra benefit.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer

Attachment

Re: Suitable Index for my Table

From
"Janek Sendrowski"
Date:
Thanks for your Answers!
 
@Martijn van Oosterhout
 
I am always searching for single colums. The  values are different every time, because these are distances from my new
sentenceto my pivots. 
Does the Index support a query with this WHERE statement: WHERE value BETWEEN (distance1 - radius) AND (distance1 +
radius)?
 
@Kevin Grittner
okay, I will use arrays instead of multiple columns.
A working query looks like this:
SELECT id FROM distance WHERE
value BETWEEN (distance1 - radius) AND (distance1 + radius) AND
value BETWEEN (distance2 - radius) AND (distance2 + radius) AND
value BETWEEN (distance3 - radius) AND (distance3 + radius) AND
value BETWEEN (distance4 - radius) AND (distance4 + radius) AND
value BETWEEN (distance5 - radius) AND (distance5 + radius) AND
value BETWEEN (distance6 - radius) AND (distance6 + radius) AND
value BETWEEN (distance7 - radius) AND (distance7 + radius) AND
value BETWEEN (distance8 - radius) AND (distance8 + radius) AND
value BETWEEN (distance9 - radius) AND (distance9 + radius) AND
value BETWEEN (distance10 - radius) AND (distance10 + radius) AND
value BETWEEN (distance11 - radius) AND (distance11 + radius) AND
value BETWEEN (distance12 - radius) AND (distance12 + radius) AND
value BETWEEN (distance13 - radius) AND (distance13 + radius) AND
value BETWEEN (distance14 - radius) AND (distance14 + radius) AND
value BETWEEN (distance15 - radius) AND (distance15 + radius) AND
value BETWEEN (distance16 - radius) AND (distance16 + radius) AND
value BETWEEN (distance17 - radius) AND (distance17 + radius) AND
value BETWEEN (distance18 - radius) AND (distance18 + radius) AND
value BETWEEN (distance19 - radius) AND (distance19 + radius) AND
value BETWEEN (distance20 - radius) AND (distance20 + radius) AND
value BETWEEN (distance21 - radius) AND (distance22 + radius) AND
value BETWEEN (distance22 - radius) AND (distance23 + radius) AND
value BETWEEN (distance23 - radius) AND (distance24 + radius);
 
I'm not sure about the number of pivots yet. It can get bigger,
but this is a single query to determine the fitting sentences.
The query just returns the ID of the colum which is a foreign key to a sentence in another table.
The table distance contains the metric distance of every pivot to all of my sentences in the other table.
I haven't found yet the right pivots, but the distance should be between 0 and 1 in most cases, but it's also likely
thatit's 0. 
Until now It just does a Seq Scan, when I'm searching through the table 'distances'. I can show your the Query Plan, if
youwant. 
The number of rows which are resulting have a range of 0 until something like 100 for the begining.
 
Hope these were enough informations
 
Thank you for your help
Janek Sendrowki


Re: Suitable Index for my Table

From
Kevin Grittner
Date:
Janek Sendrowski <janek12@web.de> wrote:

> Does the Index support a query with this WHERE statement: WHERE
> value BETWEEN (distance1 - radius) AND (distance1 + radius)?

Note that the following are all equivalent:

value BETWEEN (distance1 - radius) AND (distance1 + radius)

value >= (distance1 - radius) and value <= (distance1 + radius)

(value + radius) >= distance1 and (value - radius) <= distance1

distance1 >= (value - radius) and distance1 <= (value + radius)

distance BETWEEN (value - radius) AND (value + radius)

The first two are not suitable for an index scan on distance1, but
the last three are.  If you can rewrite your query to use that
syntax, it will be able to use a btree index on distance1.

> okay, I will use arrays instead of multiple columns.

It's probably worth looking at, but I can't say that is the best
way from information available so far.

> A working query looks like this:
> SELECT id FROM distance WHERE
> value BETWEEN (distance1 - radius) AND (distance1 + radius) AND
> value BETWEEN (distance2 - radius) AND (distance2 + radius) AND
> value BETWEEN (distance3 - radius) AND (distance3 + radius) AND
> value BETWEEN (distance4 - radius) AND (distance4 + radius) AND
> value BETWEEN (distance5 - radius) AND (distance5 + radius) AND
> value BETWEEN (distance6 - radius) AND (distance6 + radius) AND
> value BETWEEN (distance7 - radius) AND (distance7 + radius) AND
> value BETWEEN (distance8 - radius) AND (distance8 + radius) AND
> value BETWEEN (distance9 - radius) AND (distance9 + radius) AND
> value BETWEEN (distance10 - radius) AND (distance10 + radius) AND
> value BETWEEN (distance11 - radius) AND (distance11 + radius) AND
> value BETWEEN (distance12 - radius) AND (distance12 + radius) AND
> value BETWEEN (distance13 - radius) AND (distance13 + radius) AND
> value BETWEEN (distance14 - radius) AND (distance14 + radius) AND
> value BETWEEN (distance15 - radius) AND (distance15 + radius) AND
> value BETWEEN (distance16 - radius) AND (distance16 + radius) AND
> value BETWEEN (distance17 - radius) AND (distance17 + radius) AND
> value BETWEEN (distance18 - radius) AND (distance18 + radius) AND
> value BETWEEN (distance19 - radius) AND (distance19 + radius) AND
> value BETWEEN (distance20 - radius) AND (distance20 + radius) AND
> value BETWEEN (distance21 - radius) AND (distance22 + radius) AND
> value BETWEEN (distance22 - radius) AND (distance23 + radius) AND
> value BETWEEN (distance23 - radius) AND (distance24 + radius);

An array column called dist_array might support something along the
lines of (untested):

SELECT id FROM distance
  WHERE (value - radius) <= ALL (dist_array)
    AND (value + radius) >= ALL (dist_array);

I'm not sure whether a GIN index on the dist_array column would be
usable by such a query, but it might be worth testing.

> Until now It just does a Seq Scan, when I'm searching through the
> table 'distances'. I can show your the Query Plan, if you want.

Actual query text, table definitions (with indexes), and EXPLAIN
ANALYZE output are always helpful.

http://wiki.postgresql.org/wiki/SlowQueryQuestions

Which reminds me, this sort of question might be better on the
pgsql-performance list next time.

> The number of rows which are resulting have a range of 0 until
> something like 100 for the begining.

Keep in mind that indexes will rarely be used on small tables.  It
isn't until there are many data pages that access through indexes
begins to be faster.

Also note that (as previously mentioned) the names of variables
here suggest that geometry or PostGIS types may be a cleaner way to
implement this than dealing in raw coordinates.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Suitable Index for my Table

From
Martijn van Oosterhout
Date:
On Mon, Nov 04, 2013 at 10:44:29PM +0100, Janek Sendrowski wrote:
> I am always searching for single colums. The  values are different every time, because these are distances from my
newsentence to my pivots. 
> Does the Index support a query with this WHERE statement: WHERE value BETWEEN (distance1 - radius) AND (distance1 +
radius)?

Ok, this is not consistant. You say you're only searching single
columns, but in your example query you're doing 23 columns in one
query.

> A working query looks like this:
> SELECT id FROM distance WHERE
> value BETWEEN (distance1 - radius) AND (distance1 + radius) AND
> value BETWEEN (distance2 - radius) AND (distance2 + radius) AND
> value BETWEEN (distance3 - radius) AND (distance3 + radius) AND

So you're looking for the same value amongst all distances? That's
doesn't seem very useful. Then if distance1=0 and distance2=1 and
radius=0.2 then this query will never match that row, no matter what
value of value.

Anyway, unless you can describe this problem as something geometric
(such that you can consider it a search for overlapping cubes) I'd do
what the other post suggests and just put a btree index on every
column.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer

Attachment