Thread: Re: [GENERAL] Query is not using index when it should
On Fri, 10 Dec 2004, Tomas [iso-8859-1] Sk�re wrote: > I have a table that looks like this: > > Table "public.cjm_object" > Column | Type | Modifiers > -----------+-------------------+----------- > timestamp | bigint | not null > jobid | bigint | not null > objectid | bigint | not null > class | integer | not null > field | character varying | not null In 7.4.x and earlier, you need to cast the value you're comparing to into a bigint in order to make sure the indexes are used (in your timestamp case it appears to work because the value doesn't fit in a plain integer). 8.0 should handle this better. > But when doing a search with objectid, class and field, it doesn't use > the idx_cjm_object1 index. > db=# explain analyze select * from cjm_object where objectid=4534 and class=12 and field='paroid'; Using one of objectid=4534::bigint objectid='4534' objectid=CAST(4534 as bigint) rather than objectid=4534 should make this indexable in 7.4.x.
Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > On Fri, 10 Dec 2004, Tomas [iso-8859-1] Skäre wrote: > > > I have a table that looks like this: > > > > Table "public.cjm_object" > > Column | Type | Modifiers > > -----------+-------------------+----------- > > timestamp | bigint | not null > > jobid | bigint | not null > > objectid | bigint | not null > > class | integer | not null > > field | character varying | not null > > In 7.4.x and earlier, you need to cast the value you're comparing to into > a bigint in order to make sure the indexes are used (in your timestamp > case it appears to work because the value doesn't fit in a plain integer). > 8.0 should handle this better. Thanks, casting worked well for that query. Now, could someone please help me to get this query faster? With the 283465 rows, it takes far too long time, I think. This is on a 2GHz Celeron running Linux 2.6. shared_buffers=1000, sort_mem=1024. select c.* from cjm_object c inner join (select max(timestamp) as timestamp,objectid,field from cjm_object group by objectid,field) t using(timestamp,objectid,field) where 1=1 and data is not null order by objectid,field; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=145511.85..150759.75 rows=1 width=54) (actual time=17036.147..20968.811 rows=208246 loops=1) Merge Cond: (("outer".objectid = "inner".objectid) AND ("outer"."?column7?" = "inner"."?column4?") AND ("outer"."timestamp"= "inner"."timestamp")) -> Sort (cost=47007.75..47611.06 rows=241324 width=54) (actual time=5113.099..5586.094 rows=236710 loops=1) Sort Key: c.objectid, (c.field)::text, c."timestamp" -> Seq Scan on cjm_object c (cost=0.00..5862.65 rows=241324 width=54) (actual time=0.129..1788.125 rows=236710loops=1) Filter: (data IS NOT NULL) -> Sort (cost=98504.09..99212.75 rows=283465 width=48) (actual time=11922.081..12427.683 rows=255001 loops=1) Sort Key: t.objectid, (t.field)::text, t."timestamp" -> Subquery Scan t (cost=45534.39..51912.35 rows=283465 width=48) (actual time=5484.943..9289.061 rows=255001loops=1) -> GroupAggregate (cost=45534.39..49077.70 rows=283465 width=25) (actual time=5484.925..8178.531 rows=255001loops=1) -> Sort (cost=45534.39..46243.05 rows=283465 width=25) (actual time=5484.285..6324.067 rows=283465loops=1) Sort Key: objectid, field -> Seq Scan on cjm_object (cost=0.00..5862.65 rows=283465 width=25) (actual time=0.124..852.749rows=283465 loops=1) Total runtime: 21161.144 ms Quick explanation of the query: Each row in the table is a field, which is part of an object. Ex: timestamp objectid field data 1 1 name test 1 1 type something 1 2 name test2 1 2 type whatever Timestamp is when the entry was inserted in the databas. When updating a single field for an object, a new line with the new value is added, data set to NULL if the field is deleted. So the above content could now be: timestamp objectid field data 1 1 name test 1 1 type something 1 2 name test2 1 2 type whatever 2 1 name newname 2 1 type <NULL> Now, the query picks out the highest timestamp for each (objectid,field) and then selects all columns for each match, filtering out NULL data and ordering per objectid. Is there any way to make this query faster? I've tried rewriting it, putting the subquery as EXISTS condition, but it doesn't make it faster. I've tried to create different indices, but they don't seem to be used in this query. Greetings, Tomas
On Sat, Dec 11, 2004 at 03:17:13PM +0100, Tomas Skäre wrote: > select c.* from cjm_object c > inner join > (select max(timestamp) as timestamp,objectid,field from cjm_object > group by objectid,field) t > using(timestamp,objectid,field) > where 1=1 and data is not null > order by objectid,field; Usually, SELECT max(field) FROM table is better written in PostgreSQL as SELECT field FROM table ORDER field DESC LIMIT 1. I don't see the point of "where 1=1", though... /* Steinar */ -- Homepage: http://www.sesse.net/
On Sat, Dec 11, 2004 at 03:32:13PM +0100, Steinar H. Gunderson wrote: > On Sat, Dec 11, 2004 at 03:17:13PM +0100, Tomas Skäre wrote: > > select c.* from cjm_object c > > inner join > > (select max(timestamp) as timestamp,objectid,field from cjm_object > > group by objectid,field) t > > using(timestamp,objectid,field) > > where 1=1 and data is not null > > order by objectid,field; > > Usually, SELECT max(field) FROM table is better written in PostgreSQL as > SELECT field FROM table ORDER field DESC LIMIT 1. > > I don't see the point of "where 1=1", though... I've seen that in generated queries. The generating program uses "WHERE 1=1" to simplify the addition of other conditions: instead of checking if it needs to add a WHERE and putting ANDs in the right places, it simply appends subsequent conditions with " AND condition". -- Michael Fuhr http://www.fuhr.org/~mfuhr/
"Steinar H. Gunderson" <sgunderson@bigfoot.com> writes: > On Sat, Dec 11, 2004 at 03:17:13PM +0100, Tomas Skäre wrote: > > select c.* from cjm_object c > > inner join > > (select max(timestamp) as timestamp,objectid,field from cjm_object > > group by objectid,field) t > > using(timestamp,objectid,field) > > where 1=1 and data is not null > > order by objectid,field; > > Usually, SELECT max(field) FROM table is better written in PostgreSQL as > SELECT field FROM table ORDER field DESC LIMIT 1. Well, my subquery doesn't return just one row, but one for each objectid,field combination in the table. I could rewrite it to something like this: select c.* from cjm_object c where exists (select timestamp from cjm_object t where c.objectid=t.objectid and c.field=t.field order by timestamp desc limit 1) and data is not null order by objectid; But that seems to be even slower, even if it can use an index scan in the subquery. Also it doesn't give the same result set, but I haven't looked into what's wrong yet. > I don't see the point of "where 1=1", though... It's just because the actual query is generated by a program, and it's easier to always have "where 1=1" and then add optional conditions with "and ...". Tomas