Thread: Suitable Index for my Table
Hi, I've got a table with many Values of the Type REAL. These are my metric distances or my pivots to my sentences. The table looks like this: ID INTEGER, distance1 REAL, distance2 REAL, distance3 REAL, distance4 REAL, ..., distance24 REAL The range of the Value is in between 0 and 1. So it looks like this 0.196 or 0.891 That my query WHERE value BETWEEN (distance1 - radius) AND (distance1 + radius) WHERE value BETWEEN (distance2 - radius) AND (distance2 + radius) WHERE value BETWEEN (distance3 - radius) AND (distance3 + radius) WHERE value BETWEEN (distance4 - radius) AND (distance4 + radius) ... Now I'm searching for a suitable index. Does any of you have an idea? TX very much for any support! Janek Sendrowski
Janek Sendrowski <janek12@web.de> wrote: > I've got a table with many Values of the Type REAL. > These are my metric distances or my pivots to my sentences. > The table looks like this: > > ID INTEGER, distance1 REAL, distance2 REAL, distance3 REAL, > distance4 REAL, ..., distance24 REAL It should always raise a big red flag when you see column names with numeric suffixes like that. Usually these represent data which should be normalized out to another table, or possibly represented by a single column which is an array, hstore, or json type. > The range of the Value is in between 0 and 1. So it looks like > this 0.196 or 0.891 > > That my query > > WHERE value BETWEEN (distance1 - radius) AND (distance1 + radius) > WHERE value BETWEEN (distance2 - radius) AND (distance2 + radius) > WHERE value BETWEEN (distance3 - radius) AND (distance3 + radius) > WHERE value BETWEEN (distance4 - radius) AND (distance4 + radius) > ... Are those the WHERE clauses of four different queries, or did you mean for those to be four criteria on a single query? It would be better to show an actual, working query and table layout, so people have a more clear idea of the problem they are being asked to help solve. > Now I'm searching for a suitable index. A btree index on each distance column, maybe? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Nov 04, 2013 at 07:21:11PM +0100, Janek Sendrowski wrote: > Hi, > > I've got a table with many Values of the Type REAL. > These are my metric distances or my pivots to my sentences. > The table looks like this: > > ID INTEGER, distance1 REAL, distance2 REAL, distance3 REAL, distance4 REAL, ..., distance24 REAL > > The range of the Value is in between 0 and 1. So it looks like this 0.196 or 0.891 > > That my query > > WHERE value BETWEEN (distance1 - radius) AND (distance1 + radius) > WHERE value BETWEEN (distance2 - radius) AND (distance2 + radius) > WHERE value BETWEEN (distance3 - radius) AND (distance3 + radius) > WHERE value BETWEEN (distance4 - radius) AND (distance4 + radius) > ... > > Now I'm searching for a suitable index. This sounds like a job for a geometric datatype, a la GiST. http://www.postgresql.org/docs/9.3/static/cube.html CREATE INDEX foo ON bar USING GIST ( cube( ARRAY(distance1), ARRAY(distance1) ) ); The you can do lookups with: SELECT * FROM bar WHERE cube( ARRAY(distance1), ARRAY(distance1) ) && cube( ARRAY(value-radius), ARRAY(value+radius) ) If you commonly use sets of columns you can go multiple dimensional for extra benefit. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > He who writes carelessly confesses thereby at the very outset that he does > not attach much importance to his own thoughts. -- Arthur Schopenhauer
Attachment
Thanks for your Answers! @Martijn van Oosterhout I am always searching for single colums. The values are different every time, because these are distances from my new sentenceto my pivots. Does the Index support a query with this WHERE statement: WHERE value BETWEEN (distance1 - radius) AND (distance1 + radius)? @Kevin Grittner okay, I will use arrays instead of multiple columns. A working query looks like this: SELECT id FROM distance WHERE value BETWEEN (distance1 - radius) AND (distance1 + radius) AND value BETWEEN (distance2 - radius) AND (distance2 + radius) AND value BETWEEN (distance3 - radius) AND (distance3 + radius) AND value BETWEEN (distance4 - radius) AND (distance4 + radius) AND value BETWEEN (distance5 - radius) AND (distance5 + radius) AND value BETWEEN (distance6 - radius) AND (distance6 + radius) AND value BETWEEN (distance7 - radius) AND (distance7 + radius) AND value BETWEEN (distance8 - radius) AND (distance8 + radius) AND value BETWEEN (distance9 - radius) AND (distance9 + radius) AND value BETWEEN (distance10 - radius) AND (distance10 + radius) AND value BETWEEN (distance11 - radius) AND (distance11 + radius) AND value BETWEEN (distance12 - radius) AND (distance12 + radius) AND value BETWEEN (distance13 - radius) AND (distance13 + radius) AND value BETWEEN (distance14 - radius) AND (distance14 + radius) AND value BETWEEN (distance15 - radius) AND (distance15 + radius) AND value BETWEEN (distance16 - radius) AND (distance16 + radius) AND value BETWEEN (distance17 - radius) AND (distance17 + radius) AND value BETWEEN (distance18 - radius) AND (distance18 + radius) AND value BETWEEN (distance19 - radius) AND (distance19 + radius) AND value BETWEEN (distance20 - radius) AND (distance20 + radius) AND value BETWEEN (distance21 - radius) AND (distance22 + radius) AND value BETWEEN (distance22 - radius) AND (distance23 + radius) AND value BETWEEN (distance23 - radius) AND (distance24 + radius); I'm not sure about the number of pivots yet. It can get bigger, but this is a single query to determine the fitting sentences. The query just returns the ID of the colum which is a foreign key to a sentence in another table. The table distance contains the metric distance of every pivot to all of my sentences in the other table. I haven't found yet the right pivots, but the distance should be between 0 and 1 in most cases, but it's also likely thatit's 0. Until now It just does a Seq Scan, when I'm searching through the table 'distances'. I can show your the Query Plan, if youwant. The number of rows which are resulting have a range of 0 until something like 100 for the begining. Hope these were enough informations Thank you for your help Janek Sendrowki
Janek Sendrowski <janek12@web.de> wrote: > Does the Index support a query with this WHERE statement: WHERE > value BETWEEN (distance1 - radius) AND (distance1 + radius)? Note that the following are all equivalent: value BETWEEN (distance1 - radius) AND (distance1 + radius) value >= (distance1 - radius) and value <= (distance1 + radius) (value + radius) >= distance1 and (value - radius) <= distance1 distance1 >= (value - radius) and distance1 <= (value + radius) distance BETWEEN (value - radius) AND (value + radius) The first two are not suitable for an index scan on distance1, but the last three are. If you can rewrite your query to use that syntax, it will be able to use a btree index on distance1. > okay, I will use arrays instead of multiple columns. It's probably worth looking at, but I can't say that is the best way from information available so far. > A working query looks like this: > SELECT id FROM distance WHERE > value BETWEEN (distance1 - radius) AND (distance1 + radius) AND > value BETWEEN (distance2 - radius) AND (distance2 + radius) AND > value BETWEEN (distance3 - radius) AND (distance3 + radius) AND > value BETWEEN (distance4 - radius) AND (distance4 + radius) AND > value BETWEEN (distance5 - radius) AND (distance5 + radius) AND > value BETWEEN (distance6 - radius) AND (distance6 + radius) AND > value BETWEEN (distance7 - radius) AND (distance7 + radius) AND > value BETWEEN (distance8 - radius) AND (distance8 + radius) AND > value BETWEEN (distance9 - radius) AND (distance9 + radius) AND > value BETWEEN (distance10 - radius) AND (distance10 + radius) AND > value BETWEEN (distance11 - radius) AND (distance11 + radius) AND > value BETWEEN (distance12 - radius) AND (distance12 + radius) AND > value BETWEEN (distance13 - radius) AND (distance13 + radius) AND > value BETWEEN (distance14 - radius) AND (distance14 + radius) AND > value BETWEEN (distance15 - radius) AND (distance15 + radius) AND > value BETWEEN (distance16 - radius) AND (distance16 + radius) AND > value BETWEEN (distance17 - radius) AND (distance17 + radius) AND > value BETWEEN (distance18 - radius) AND (distance18 + radius) AND > value BETWEEN (distance19 - radius) AND (distance19 + radius) AND > value BETWEEN (distance20 - radius) AND (distance20 + radius) AND > value BETWEEN (distance21 - radius) AND (distance22 + radius) AND > value BETWEEN (distance22 - radius) AND (distance23 + radius) AND > value BETWEEN (distance23 - radius) AND (distance24 + radius); An array column called dist_array might support something along the lines of (untested): SELECT id FROM distance WHERE (value - radius) <= ALL (dist_array) AND (value + radius) >= ALL (dist_array); I'm not sure whether a GIN index on the dist_array column would be usable by such a query, but it might be worth testing. > Until now It just does a Seq Scan, when I'm searching through the > table 'distances'. I can show your the Query Plan, if you want. Actual query text, table definitions (with indexes), and EXPLAIN ANALYZE output are always helpful. http://wiki.postgresql.org/wiki/SlowQueryQuestions Which reminds me, this sort of question might be better on the pgsql-performance list next time. > The number of rows which are resulting have a range of 0 until > something like 100 for the begining. Keep in mind that indexes will rarely be used on small tables. It isn't until there are many data pages that access through indexes begins to be faster. Also note that (as previously mentioned) the names of variables here suggest that geometry or PostGIS types may be a cleaner way to implement this than dealing in raw coordinates. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Nov 04, 2013 at 10:44:29PM +0100, Janek Sendrowski wrote: > I am always searching for single colums. The values are different every time, because these are distances from my newsentence to my pivots. > Does the Index support a query with this WHERE statement: WHERE value BETWEEN (distance1 - radius) AND (distance1 + radius)? Ok, this is not consistant. You say you're only searching single columns, but in your example query you're doing 23 columns in one query. > A working query looks like this: > SELECT id FROM distance WHERE > value BETWEEN (distance1 - radius) AND (distance1 + radius) AND > value BETWEEN (distance2 - radius) AND (distance2 + radius) AND > value BETWEEN (distance3 - radius) AND (distance3 + radius) AND So you're looking for the same value amongst all distances? That's doesn't seem very useful. Then if distance1=0 and distance2=1 and radius=0.2 then this query will never match that row, no matter what value of value. Anyway, unless you can describe this problem as something geometric (such that you can consider it a search for overlapping cubes) I'd do what the other post suggests and just put a btree index on every column. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > He who writes carelessly confesses thereby at the very outset that he does > not attach much importance to his own thoughts. -- Arthur Schopenhauer