Thread: faster search

faster search

From
Clark Slater
Date:
Hi-

Would someone please enlighten me as
to why I'm not seeing a faster execution
time on the simple scenario below?

there are 412,485 rows in the table and the
query matches on 132,528 rows, taking
almost a minute to execute.  vaccuum
analyze was just run.

Thanks!
Clark

           test
-------------------------
  id            | integer
  partnumber    | character varying(32)
  productlistid | integer
  typeid        | integer


Indexes:
"test_id" btree (id)
"test_plid" btree (productlistid)
"test_typeid" btree (typeid)
"test_plidtypeid" btree (productlistid, typeid)


explain analyze select * from test where productlistid=3 and typeid=9
order by partnumber limit 15;

                                    QUERY PLAN
---------------------------------------------------------------------------------------------
  Limit  (cost=201073.76..201073.79 rows=15 width=722) (actual
time=58092.477..58092.518 rows=15 loops=1)
    ->  Sort  (cost=201073.76..201451.76 rows=151200 width=722) (actual
time=58092.470..58092.505 rows=15 loops=1)
          Sort Key: partnumber
          ->  Seq Scan on test  (cost=0.00..96458.27 rows=151200 width=722)
(actual time=2.515..40201.275 rows=132528 loops=1)
                Filter: ((productlistid = 3) AND (typeid = 9))
  Total runtime: 59664.765 ms
(6 rows)


System specs:
PostgreSQL 7.4.2 on RedHat 9
dual AMD Athlon 2GHz processors
1 gig memory
mirrored 7200 RPM IDE disks


Re: faster search

From
"Steinar H. Gunderson"
Date:
On Fri, Jun 10, 2005 at 01:45:05PM -0400, Clark Slater wrote:
> Indexes:
> "test_id" btree (id)
> "test_plid" btree (productlistid)
> "test_typeid" btree (typeid)
> "test_plidtypeid" btree (productlistid, typeid)
>
>
> explain analyze select * from test where productlistid=3 and typeid=9
> order by partnumber limit 15;

You do not have an index on partnumber. Try adding one.

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: faster search

From
John A Meinel
Date:
Clark Slater wrote:
> Hi-
>
> Would someone please enlighten me as
> to why I'm not seeing a faster execution
> time on the simple scenario below?
>
> there are 412,485 rows in the table and the
> query matches on 132,528 rows, taking
> almost a minute to execute.  vaccuum
> analyze was just run.

Well, if you are matching 130k out of 400k rows, then a sequential scan
is certainly prefered to an index scan. And then you have to sort those
130k rows by partnumber. This *might* be spilling to disk depending on
what your workmem/sortmem is set to.

I would also say that what you would really want is some way to get the
whole thing from an index. And I think the way to do that is:

CREATE INDEX test_partnum_listid_typeid_idx ON
    test(partnumber, productlistid, typeid);

VACUUM ANALYZE test;

EXPLAIN ANALYZE SELECT * FROM test
    WHERE productlistid=3 AND typeid=9
    ORDER BY partnumber, productlistid, typeid
    LIMIT 15
;

The trick is that you have to match the order by exactly with the index,
so the planner realizes it can do an indexed lookup to get the information.

You could also just create an index on partnumber, and see how that
affects your original query. I think the planner could use an index
lookup on partnumber to get the ordering correct. But it will have to do
filtering after the fact based on productlistid and typeid.
With my extended index, I think the planner can be smarter and lookup
all 3 by the index.

>
> Thanks!
> Clark

Good luck,
John
=:->

Attachment

Re: faster search

From
Tobias Brox
Date:
[Clark Slater - Fri at 01:45:05PM -0400]
> Would someone please enlighten me as
> to why I'm not seeing a faster execution
> time on the simple scenario below?

Just some thoughts from a novice PG-DBA .. :-)

My general experience is that PG usually prefers sequal scans to indices if
a large portion of the table is to be selected, because it is faster to do a
seqscan than to follow an index and constantly seek between different
positions on the hard disk.

However, most of the time is spent sorting on partnumber, and you only want
15 rows, so of course you should have an index on partnumber!  Picking up 15
rows will be ligtning fast with that index.

If you may want to select significantly more than 15 rows, you can also try
to make a partial index:

create index test_pli3_ti9_by_part on test (partnumber) where
productlistid=3 and typeid=9;

If 3 and 9 are not constants in the query, try to make a three-key index
(it's important with partnumber because a lot of time is spent sorting):

create index test_pli_type_part on test (productslistid,typeid,partnumber);

To get pg to recognize the index, you will probably have to help it a bit:

select * from test where productlistid=3 and typeid=9 order by
productlistid,typeid,partnumber limit 15;

--
Tobias Brox, +47-91700050


Re: faster search

From
Steve Atkins
Date:
On Fri, Jun 10, 2005 at 01:45:05PM -0400, Clark Slater wrote:
> Hi-
>
> Would someone please enlighten me as
> to why I'm not seeing a faster execution
> time on the simple scenario below?

Because you need to extract a huge number of rows via a seqscan, sort
them and then throw them away, I think.

> explain analyze select * from test where productlistid=3 and typeid=9
> order by partnumber limit 15;

Create an index on (productlistid, typeid, partnumber) then

  select * from test where productlistid=3 and typeid=9
   order by productlistid, typeid, partnumber LIMIT 15;

?

Cheers,
  Steve

Re: faster search

From
Clark Slater
Date:
hmm, i'm baffled.  i simplified the query
and it is still taking forever...


           test
-------------------------
  id            | integer
  partnumber    | character varying(32)
  productlistid | integer
  typeid        | integer


Indexes:
"test_productlistid" btree (productlistid)
"test_typeid" btree (typeid)
"test_productlistid_typeid" btree (productlistid, typeid)


explain analyze select * from test where (productlistid=3 and typeid=9);

                               QUERY PLAN
-----------------------------------------------------------------------
  Seq Scan on test  (cost=0.00..96458.27 rows=156194 width=725) (actual
time=516.459..41930.250 rows=132528 loops=1)
    Filter: ((productlistid = 3) AND (typeid = 9))
  Total runtime: 41975.154 ms
(3 rows)


System specs:
PostgreSQL 7.4.2 on RedHat 9
dual AMD Athlon 2GHz processors
1 gig memory
mirrored 7200 RPM IDE disks


On Fri, 10 Jun 2005, John A Meinel wrote:

> Clark Slater wrote:
>> Hi-
>>
>> Would someone please enlighten me as
>> to why I'm not seeing a faster execution
>> time on the simple scenario below?
>>
>> there are 412,485 rows in the table and the
>> query matches on 132,528 rows, taking
>> almost a minute to execute.  vaccuum
>> analyze was just run.
>
> Well, if you are matching 130k out of 400k rows, then a sequential scan
> is certainly prefered to an index scan. And then you have to sort those
> 130k rows by partnumber. This *might* be spilling to disk depending on
> what your workmem/sortmem is set to.
>
> I would also say that what you would really want is some way to get the
> whole thing from an index. And I think the way to do that is:
>
> CREATE INDEX test_partnum_listid_typeid_idx ON
>     test(partnumber, productlistid, typeid);
>
> VACUUM ANALYZE test;
>
> EXPLAIN ANALYZE SELECT * FROM test
>     WHERE productlistid=3 AND typeid=9
>     ORDER BY partnumber, productlistid, typeid
>     LIMIT 15
> ;
>
> The trick is that you have to match the order by exactly with the index,
> so the planner realizes it can do an indexed lookup to get the information.
>
> You could also just create an index on partnumber, and see how that
> affects your original query. I think the planner could use an index
> lookup on partnumber to get the ordering correct. But it will have to do
> filtering after the fact based on productlistid and typeid.
> With my extended index, I think the planner can be smarter and lookup
> all 3 by the index.
>
>>
>> Thanks!
>> Clark
>
> Good luck,
> John
> =:->
>

Re: faster search

From
"Joshua D. Drake"
Date:
Clark Slater wrote:
> hmm, i'm baffled.  i simplified the query
> and it is still taking forever...

What happens if you:

alter table test alter column productlistid set statistics 150;
alter table test alter column typeid set statistics 150;
explain analyze select * from test where (productlistid=3 and typeid=9);

Sincerely,

Joshua D. Drake


>
>
>           test
> -------------------------
>  id            | integer
>  partnumber    | character varying(32)
>  productlistid | integer
>  typeid        | integer
>
>
> Indexes:
> "test_productlistid" btree (productlistid)
> "test_typeid" btree (typeid)
> "test_productlistid_typeid" btree (productlistid, typeid)
>
>
> explain analyze select * from test where (productlistid=3 and typeid=9);
>
>                               QUERY PLAN
> -----------------------------------------------------------------------
>  Seq Scan on test  (cost=0.00..96458.27 rows=156194 width=725) (actual
> time=516.459..41930.250 rows=132528 loops=1)
>    Filter: ((productlistid = 3) AND (typeid = 9))
>  Total runtime: 41975.154 ms
> (3 rows)
>
>
> System specs:
> PostgreSQL 7.4.2 on RedHat 9
> dual AMD Athlon 2GHz processors
> 1 gig memory
> mirrored 7200 RPM IDE disks
>
>
> On Fri, 10 Jun 2005, John A Meinel wrote:
>
>> Clark Slater wrote:
>>
>>> Hi-
>>>
>>> Would someone please enlighten me as
>>> to why I'm not seeing a faster execution
>>> time on the simple scenario below?
>>>
>>> there are 412,485 rows in the table and the
>>> query matches on 132,528 rows, taking
>>> almost a minute to execute.  vaccuum
>>> analyze was just run.
>>
>>
>> Well, if you are matching 130k out of 400k rows, then a sequential scan
>> is certainly prefered to an index scan. And then you have to sort those
>> 130k rows by partnumber. This *might* be spilling to disk depending on
>> what your workmem/sortmem is set to.
>>
>> I would also say that what you would really want is some way to get the
>> whole thing from an index. And I think the way to do that is:
>>
>> CREATE INDEX test_partnum_listid_typeid_idx ON
>>     test(partnumber, productlistid, typeid);
>>
>> VACUUM ANALYZE test;
>>
>> EXPLAIN ANALYZE SELECT * FROM test
>>     WHERE productlistid=3 AND typeid=9
>>     ORDER BY partnumber, productlistid, typeid
>>     LIMIT 15
>> ;
>>
>> The trick is that you have to match the order by exactly with the index,
>> so the planner realizes it can do an indexed lookup to get the
>> information.
>>
>> You could also just create an index on partnumber, and see how that
>> affects your original query. I think the planner could use an index
>> lookup on partnumber to get the ordering correct. But it will have to do
>> filtering after the fact based on productlistid and typeid.
>> With my extended index, I think the planner can be smarter and lookup
>> all 3 by the index.
>>
>>>
>>> Thanks!
>>> Clark
>>
>>
>> Good luck,
>> John
>> =:->
>>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)


--
Your PostgreSQL solutions provider, Command Prompt, Inc.
24x7 support - 1.800.492.2240, programming, and consulting
Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
http://www.commandprompt.com / http://www.postgresql.org

Re: faster search

From
Clark Slater
Date:
thanks for your suggestion.
a small improvement.  still pretty slow...

vbp=# alter table test alter column productlistid set statistics 150;
ALTER TABLE
vbp=# alter table test alter column typeid set statistics 150;
ALTER TABLE
vbp=# explain analyze select * from test where (productlistid=3 and typeid=9);
                             QUERY PLAN
------------------------------------------------------------------------------
  Seq Scan on test  (cost=0.00..96458.27 rows=156194 width=725) (actual
time=525.617..36802.556 rows=132528 loops=1)
    Filter: ((productlistid = 3) AND (typeid = 9))
  Total runtime: 36847.754 ms
(3 rows)

Time: 36850.719 ms


On Fri, 10 Jun 2005, Joshua D. Drake wrote:

> Clark Slater wrote:
>> hmm, i'm baffled.  i simplified the query
>> and it is still taking forever...
>
> What happens if you:
>
> alter table test alter column productlistid set statistics 150;
> alter table test alter column typeid set statistics 150;
> explain analyze select * from test where (productlistid=3 and typeid=9);
>
> Sincerely,
>
> Joshua D. Drake
>
>
>>
>>
>>           test
>> -------------------------
>>  id            | integer
>>  partnumber    | character varying(32)
>>  productlistid | integer
>>  typeid        | integer
>>
>>
>> Indexes:
>> "test_productlistid" btree (productlistid)
>> "test_typeid" btree (typeid)
>> "test_productlistid_typeid" btree (productlistid, typeid)
>>
>>
>> explain analyze select * from test where (productlistid=3 and typeid=9);
>>
>>                               QUERY PLAN
>> -----------------------------------------------------------------------
>>  Seq Scan on test  (cost=0.00..96458.27 rows=156194 width=725) (actual
>> time=516.459..41930.250 rows=132528 loops=1)
>>    Filter: ((productlistid = 3) AND (typeid = 9))
>>  Total runtime: 41975.154 ms
>> (3 rows)
>>
>>
>> System specs:
>> PostgreSQL 7.4.2 on RedHat 9
>> dual AMD Athlon 2GHz processors
>> 1 gig memory
>> mirrored 7200 RPM IDE disks
>>
>>
>> On Fri, 10 Jun 2005, John A Meinel wrote:
>>
>>> Clark Slater wrote:
>>>
>>>> Hi-
>>>>
>>>> Would someone please enlighten me as
>>>> to why I'm not seeing a faster execution
>>>> time on the simple scenario below?
>>>>
>>>> there are 412,485 rows in the table and the
>>>> query matches on 132,528 rows, taking
>>>> almost a minute to execute.  vaccuum
>>>> analyze was just run.
>>>
>>>
>>> Well, if you are matching 130k out of 400k rows, then a sequential scan
>>> is certainly prefered to an index scan. And then you have to sort those
>>> 130k rows by partnumber. This *might* be spilling to disk depending on
>>> what your workmem/sortmem is set to.
>>>
>>> I would also say that what you would really want is some way to get the
>>> whole thing from an index. And I think the way to do that is:
>>>
>>> CREATE INDEX test_partnum_listid_typeid_idx ON
>>>     test(partnumber, productlistid, typeid);
>>>
>>> VACUUM ANALYZE test;
>>>
>>> EXPLAIN ANALYZE SELECT * FROM test
>>>     WHERE productlistid=3 AND typeid=9
>>>     ORDER BY partnumber, productlistid, typeid
>>>     LIMIT 15
>>> ;
>>>
>>> The trick is that you have to match the order by exactly with the index,
>>> so the planner realizes it can do an indexed lookup to get the
>>> information.
>>>
>>> You could also just create an index on partnumber, and see how that
>>> affects your original query. I think the planner could use an index
>>> lookup on partnumber to get the ordering correct. But it will have to do
>>> filtering after the fact based on productlistid and typeid.
>>> With my extended index, I think the planner can be smarter and lookup
>>> all 3 by the index.
>>>
>>>>
>>>> Thanks!
>>>> Clark
>>>
>>>
>>> Good luck,
>>> John
>>> =:->
>>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 2: you can get off all lists at once with the unregister command
>>    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>
>
> --
> Your PostgreSQL solutions provider, Command Prompt, Inc.
> 24x7 support - 1.800.492.2240, programming, and consulting
> Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
> http://www.commandprompt.com / http://www.postgresql.org
>

Re: faster search

From
"Joshua D. Drake"
Date:
Clark Slater wrote:
> thanks for your suggestion.
> a small improvement.  still pretty slow...
>
> vbp=# alter table test alter column productlistid set statistics 150;
> ALTER TABLE
> vbp=# alter table test alter column typeid set statistics 150;
> ALTER TABLE
> vbp=# explain analyze select * from test where (productlistid=3 and
> typeid=9);
>                             QUERY PLAN
> ------------------------------------------------------------------------------
>
>  Seq Scan on test  (cost=0.00..96458.27 rows=156194 width=725) (actual
> time=525.617..36802.556 rows=132528 loops=1)
>    Filter: ((productlistid = 3) AND (typeid = 9))
>  Total runtime: 36847.754 ms
> (3 rows)
>
> Time: 36850.719 ms
>
>
> On Fri, 10 Jun 2005, Joshua D. Drake wrote:
>
>> Clark Slater wrote:
>>
>>> hmm, i'm baffled.  i simplified the query
>>> and it is still taking forever...
>>
>>
>> What happens if you:
>>
>> alter table test alter column productlistid set statistics 150;
>> alter table test alter column typeid set statistics 150;
>> explain analyze select * from test where (productlistid=3 and typeid=9);

How many rows should it return?

>>
>> Sincerely,
>>
>> Joshua D. Drake
>>
>>
>>>
>>>
>>>           test
>>> -------------------------
>>>  id            | integer
>>>  partnumber    | character varying(32)
>>>  productlistid | integer
>>>  typeid        | integer
>>>
>>>
>>> Indexes:
>>> "test_productlistid" btree (productlistid)
>>> "test_typeid" btree (typeid)
>>> "test_productlistid_typeid" btree (productlistid, typeid)
>>>
>>>
>>> explain analyze select * from test where (productlistid=3 and typeid=9);
>>>
>>>                               QUERY PLAN
>>> -----------------------------------------------------------------------
>>>  Seq Scan on test  (cost=0.00..96458.27 rows=156194 width=725) (actual
>>> time=516.459..41930.250 rows=132528 loops=1)
>>>    Filter: ((productlistid = 3) AND (typeid = 9))
>>>  Total runtime: 41975.154 ms
>>> (3 rows)
>>>
>>>
>>> System specs:
>>> PostgreSQL 7.4.2 on RedHat 9
>>> dual AMD Athlon 2GHz processors
>>> 1 gig memory
>>> mirrored 7200 RPM IDE disks
>>>
>>>
>>> On Fri, 10 Jun 2005, John A Meinel wrote:
>>>
>>>> Clark Slater wrote:
>>>>
>>>>> Hi-
>>>>>
>>>>> Would someone please enlighten me as
>>>>> to why I'm not seeing a faster execution
>>>>> time on the simple scenario below?
>>>>>
>>>>> there are 412,485 rows in the table and the
>>>>> query matches on 132,528 rows, taking
>>>>> almost a minute to execute.  vaccuum
>>>>> analyze was just run.
>>>>
>>>>
>>>>
>>>> Well, if you are matching 130k out of 400k rows, then a sequential scan
>>>> is certainly prefered to an index scan. And then you have to sort those
>>>> 130k rows by partnumber. This *might* be spilling to disk depending on
>>>> what your workmem/sortmem is set to.
>>>>
>>>> I would also say that what you would really want is some way to get the
>>>> whole thing from an index. And I think the way to do that is:
>>>>
>>>> CREATE INDEX test_partnum_listid_typeid_idx ON
>>>>     test(partnumber, productlistid, typeid);
>>>>
>>>> VACUUM ANALYZE test;
>>>>
>>>> EXPLAIN ANALYZE SELECT * FROM test
>>>>     WHERE productlistid=3 AND typeid=9
>>>>     ORDER BY partnumber, productlistid, typeid
>>>>     LIMIT 15
>>>> ;
>>>>
>>>> The trick is that you have to match the order by exactly with the
>>>> index,
>>>> so the planner realizes it can do an indexed lookup to get the
>>>> information.
>>>>
>>>> You could also just create an index on partnumber, and see how that
>>>> affects your original query. I think the planner could use an index
>>>> lookup on partnumber to get the ordering correct. But it will have
>>>> to do
>>>> filtering after the fact based on productlistid and typeid.
>>>> With my extended index, I think the planner can be smarter and lookup
>>>> all 3 by the index.
>>>>
>>>>>
>>>>> Thanks!
>>>>> Clark
>>>>
>>>>
>>>>
>>>> Good luck,
>>>> John
>>>> =:->
>>>>
>>>
>>> ---------------------------(end of broadcast)---------------------------
>>> TIP 2: you can get off all lists at once with the unregister command
>>>    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>>
>>
>>
>> --
>> Your PostgreSQL solutions provider, Command Prompt, Inc.
>> 24x7 support - 1.800.492.2240, programming, and consulting
>> Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
>> http://www.commandprompt.com / http://www.postgresql.org
>>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org


--
Your PostgreSQL solutions provider, Command Prompt, Inc.
24x7 support - 1.800.492.2240, programming, and consulting
Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
http://www.commandprompt.com / http://www.postgresql.org

Re: faster search

From
"Joshua D. Drake"
Date:
Clark Slater wrote:
> thanks for your suggestion.
> a small improvement.  still pretty slow...
>
> vbp=# alter table test alter column productlistid set statistics 150;
> ALTER TABLE
> vbp=# alter table test alter column typeid set statistics 150;
> ALTER TABLE
> vbp=# explain analyze select * from test where (productlistid=3 and

Hello,

Also what happens if you:

set enable_seqscan = false;
explain analyze query....

Sincerely,

Joshua D. Drake



> typeid=9);
>                             QUERY PLAN
> ------------------------------------------------------------------------------
>
>  Seq Scan on test  (cost=0.00..96458.27 rows=156194 width=725) (actual
> time=525.617..36802.556 rows=132528 loops=1)
>    Filter: ((productlistid = 3) AND (typeid = 9))
>  Total runtime: 36847.754 ms
> (3 rows)
>
> Time: 36850.719 ms
>
>
> On Fri, 10 Jun 2005, Joshua D. Drake wrote:
>
>> Clark Slater wrote:
>>
>>> hmm, i'm baffled.  i simplified the query
>>> and it is still taking forever...
>>
>>
>> What happens if you:
>>
>> alter table test alter column productlistid set statistics 150;
>> alter table test alter column typeid set statistics 150;
>> explain analyze select * from test where (productlistid=3 and typeid=9);
>>
>> Sincerely,
>>
>> Joshua D. Drake
>>
>>
>>>
>>>
>>>           test
>>> -------------------------
>>>  id            | integer
>>>  partnumber    | character varying(32)
>>>  productlistid | integer
>>>  typeid        | integer
>>>
>>>
>>> Indexes:
>>> "test_productlistid" btree (productlistid)
>>> "test_typeid" btree (typeid)
>>> "test_productlistid_typeid" btree (productlistid, typeid)
>>>
>>>
>>> explain analyze select * from test where (productlistid=3 and typeid=9);
>>>
>>>                               QUERY PLAN
>>> -----------------------------------------------------------------------
>>>  Seq Scan on test  (cost=0.00..96458.27 rows=156194 width=725) (actual
>>> time=516.459..41930.250 rows=132528 loops=1)
>>>    Filter: ((productlistid = 3) AND (typeid = 9))
>>>  Total runtime: 41975.154 ms
>>> (3 rows)
>>>
>>>
>>> System specs:
>>> PostgreSQL 7.4.2 on RedHat 9
>>> dual AMD Athlon 2GHz processors
>>> 1 gig memory
>>> mirrored 7200 RPM IDE disks
>>>
>>>
>>> On Fri, 10 Jun 2005, John A Meinel wrote:
>>>
>>>> Clark Slater wrote:
>>>>
>>>>> Hi-
>>>>>
>>>>> Would someone please enlighten me as
>>>>> to why I'm not seeing a faster execution
>>>>> time on the simple scenario below?
>>>>>
>>>>> there are 412,485 rows in the table and the
>>>>> query matches on 132,528 rows, taking
>>>>> almost a minute to execute.  vaccuum
>>>>> analyze was just run.
>>>>
>>>>
>>>>
>>>> Well, if you are matching 130k out of 400k rows, then a sequential scan
>>>> is certainly prefered to an index scan. And then you have to sort those
>>>> 130k rows by partnumber. This *might* be spilling to disk depending on
>>>> what your workmem/sortmem is set to.
>>>>
>>>> I would also say that what you would really want is some way to get the
>>>> whole thing from an index. And I think the way to do that is:
>>>>
>>>> CREATE INDEX test_partnum_listid_typeid_idx ON
>>>>     test(partnumber, productlistid, typeid);
>>>>
>>>> VACUUM ANALYZE test;
>>>>
>>>> EXPLAIN ANALYZE SELECT * FROM test
>>>>     WHERE productlistid=3 AND typeid=9
>>>>     ORDER BY partnumber, productlistid, typeid
>>>>     LIMIT 15
>>>> ;
>>>>
>>>> The trick is that you have to match the order by exactly with the
>>>> index,
>>>> so the planner realizes it can do an indexed lookup to get the
>>>> information.
>>>>
>>>> You could also just create an index on partnumber, and see how that
>>>> affects your original query. I think the planner could use an index
>>>> lookup on partnumber to get the ordering correct. But it will have
>>>> to do
>>>> filtering after the fact based on productlistid and typeid.
>>>> With my extended index, I think the planner can be smarter and lookup
>>>> all 3 by the index.
>>>>
>>>>>
>>>>> Thanks!
>>>>> Clark
>>>>
>>>>
>>>>
>>>> Good luck,
>>>> John
>>>> =:->
>>>>
>>>
>>> ---------------------------(end of broadcast)---------------------------
>>> TIP 2: you can get off all lists at once with the unregister command
>>>    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>>
>>
>>
>> --
>> Your PostgreSQL solutions provider, Command Prompt, Inc.
>> 24x7 support - 1.800.492.2240, programming, and consulting
>> Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
>> http://www.commandprompt.com / http://www.postgresql.org
>>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org


--
Your PostgreSQL solutions provider, Command Prompt, Inc.
24x7 support - 1.800.492.2240, programming, and consulting
Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
http://www.commandprompt.com / http://www.postgresql.org

Re: faster search

From
John A Meinel
Date:
Clark Slater wrote:
> hmm, i'm baffled.  i simplified the query
> and it is still taking forever...
>
>
>           test
> -------------------------
>  id            | integer
>  partnumber    | character varying(32)
>  productlistid | integer
>  typeid        | integer
>
>
> Indexes:
> "test_productlistid" btree (productlistid)
> "test_typeid" btree (typeid)
> "test_productlistid_typeid" btree (productlistid, typeid)
>
>
> explain analyze select * from test where (productlistid=3 and typeid=9);
>
>                               QUERY PLAN
> -----------------------------------------------------------------------
>  Seq Scan on test  (cost=0.00..96458.27 rows=156194 width=725) (actual
> time=516.459..41930.250 rows=132528 loops=1)
>    Filter: ((productlistid = 3) AND (typeid = 9))
>  Total runtime: 41975.154 ms
> (3 rows)
>
>

This query is still going to take a long time, because you have to scan
the whole table. Your WHERE clause is not very specific (it takes 25% of
the table). Convention says that any time you want > 5-10% of a table, a
sequential scan is better, because it does it in order.

Now if you did:

explain analyze select * from test where (productlistid=3 and typeid=9)
limit 15;

I think that would be very fast.

I am a little surprised that it is taking 40s to scan only 400k rows,
though. On an older machine of mine (with only 256M ram and dual 450MHz
Celerons), I have a table with 74k rows which takes about .5 sec. At
those numbers it should take more like 4s not 40.

John
=:->

Attachment

Re: faster search

From
Clark Slater
Date:
Query should return 132,528 rows.

vbp=# set enable_seqscan = false;
SET
vbp=# explain analyze select * from test where (productlistid=3 and typeid=9);

                            QUERY PLAN
------------------------------------------------------------------------
  Index Scan using test_typeid on test  (cost=0.00..137223.89 rows=156194
width=725) (actual time=25.999..25708.478 rows=132528
  loops=1)
    Index Cond: (typeid = 9)
    Filter: (productlistid = 3)
  Total runtime: 25757.679 ms
(4 rows)


On Fri, 10 Jun 2005, Joshua D. Drake wrote:

> Clark Slater wrote:
>> thanks for your suggestion.
>> a small improvement.  still pretty slow...
>>
>> vbp=# alter table test alter column productlistid set statistics 150;
>> ALTER TABLE
>> vbp=# alter table test alter column typeid set statistics 150;
>> ALTER TABLE
>> vbp=# explain analyze select * from test where (productlistid=3 and
>
> Hello,
>
> Also what happens if you:
>
> set enable_seqscan = false;
> explain analyze query....
>
> Sincerely,
>
> Joshua D. Drake
>
>
>
>> typeid=9);
>>                             QUERY PLAN
>>
>> ------------------------------------------------------------------------------
>>  Seq Scan on test  (cost=0.00..96458.27 rows=156194 width=725) (actual
>> time=525.617..36802.556 rows=132528 loops=1)
>>    Filter: ((productlistid = 3) AND (typeid = 9))
>>  Total runtime: 36847.754 ms
>> (3 rows)
>>
>> Time: 36850.719 ms
>>
>>
>> On Fri, 10 Jun 2005, Joshua D. Drake wrote:
>>
>>> Clark Slater wrote:
>>>
>>>> hmm, i'm baffled.  i simplified the query
>>>> and it is still taking forever...
>>>
>>>
>>> What happens if you:
>>>
>>> alter table test alter column productlistid set statistics 150;
>>> alter table test alter column typeid set statistics 150;
>>> explain analyze select * from test where (productlistid=3 and typeid=9);
>>>
>>> Sincerely,
>>>
>>> Joshua D. Drake
>>>
>>>
>>>>
>>>>
>>>>           test
>>>> -------------------------
>>>>  id            | integer
>>>>  partnumber    | character varying(32)
>>>>  productlistid | integer
>>>>  typeid        | integer
>>>>
>>>>
>>>> Indexes:
>>>> "test_productlistid" btree (productlistid)
>>>> "test_typeid" btree (typeid)
>>>> "test_productlistid_typeid" btree (productlistid, typeid)
>>>>
>>>>
>>>> explain analyze select * from test where (productlistid=3 and typeid=9);
>>>>
>>>>                               QUERY PLAN
>>>> -----------------------------------------------------------------------
>>>>  Seq Scan on test  (cost=0.00..96458.27 rows=156194 width=725) (actual
>>>> time=516.459..41930.250 rows=132528 loops=1)
>>>>    Filter: ((productlistid = 3) AND (typeid = 9))
>>>>  Total runtime: 41975.154 ms
>>>> (3 rows)
>>>>
>>>>
>>>> System specs:
>>>> PostgreSQL 7.4.2 on RedHat 9
>>>> dual AMD Athlon 2GHz processors
>>>> 1 gig memory
>>>> mirrored 7200 RPM IDE disks
>>>>
>>>>
>>>> On Fri, 10 Jun 2005, John A Meinel wrote:
>>>>
>>>>> Clark Slater wrote:
>>>>>
>>>>>> Hi-
>>>>>>
>>>>>> Would someone please enlighten me as
>>>>>> to why I'm not seeing a faster execution
>>>>>> time on the simple scenario below?
>>>>>>
>>>>>> there are 412,485 rows in the table and the
>>>>>> query matches on 132,528 rows, taking
>>>>>> almost a minute to execute.  vaccuum
>>>>>> analyze was just run.
>>>>>
>>>>>
>>>>>
>>>>> Well, if you are matching 130k out of 400k rows, then a sequential scan
>>>>> is certainly prefered to an index scan. And then you have to sort those
>>>>> 130k rows by partnumber. This *might* be spilling to disk depending on
>>>>> what your workmem/sortmem is set to.
>>>>>
>>>>> I would also say that what you would really want is some way to get the
>>>>> whole thing from an index. And I think the way to do that is:
>>>>>
>>>>> CREATE INDEX test_partnum_listid_typeid_idx ON
>>>>>     test(partnumber, productlistid, typeid);
>>>>>
>>>>> VACUUM ANALYZE test;
>>>>>
>>>>> EXPLAIN ANALYZE SELECT * FROM test
>>>>>     WHERE productlistid=3 AND typeid=9
>>>>>     ORDER BY partnumber, productlistid, typeid
>>>>>     LIMIT 15
>>>>> ;
>>>>>
>>>>> The trick is that you have to match the order by exactly with the index,
>>>>> so the planner realizes it can do an indexed lookup to get the
>>>>> information.
>>>>>
>>>>> You could also just create an index on partnumber, and see how that
>>>>> affects your original query. I think the planner could use an index
>>>>> lookup on partnumber to get the ordering correct. But it will have to do
>>>>> filtering after the fact based on productlistid and typeid.
>>>>> With my extended index, I think the planner can be smarter and lookup
>>>>> all 3 by the index.
>>>>>
>>>>>>
>>>>>> Thanks!
>>>>>> Clark
>>>>>
>>>>>
>>>>>
>>>>> Good luck,
>>>>> John
>>>>> =:->
>>>>>
>>>>
>>>> ---------------------------(end of broadcast)---------------------------
>>>> TIP 2: you can get off all lists at once with the unregister command
>>>>    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>>>
>>>
>>>
>>> --
>>> Your PostgreSQL solutions provider, Command Prompt, Inc.
>>> 24x7 support - 1.800.492.2240, programming, and consulting
>>> Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
>>> http://www.commandprompt.com / http://www.postgresql.org
>>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>
>
> --
> Your PostgreSQL solutions provider, Command Prompt, Inc.
> 24x7 support - 1.800.492.2240, programming, and consulting
> Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
> http://www.commandprompt.com / http://www.postgresql.org
>

Re: faster search

From
"Joshua D. Drake"
Date:
Clark Slater wrote:
> Query should return 132,528 rows.

O.k. then the planner is doing fine it looks like. The problem is you
are pulling 132,528 rows. I would suggest moving to a cursor which will
allow you to fetch in smaller chunks much quicker.

Sincerely,

Joshua D. Drake


>
> vbp=# set enable_seqscan = false;
> SET
> vbp=# explain analyze select * from test where (productlistid=3 and
> typeid=9);
>
>                            QUERY PLAN
> ------------------------------------------------------------------------
>  Index Scan using test_typeid on test  (cost=0.00..137223.89 rows=156194
> width=725) (actual time=25.999..25708.478 rows=132528
>  loops=1)
>    Index Cond: (typeid = 9)
>    Filter: (productlistid = 3)
>  Total runtime: 25757.679 ms
> (4 rows)
>
>
> On Fri, 10 Jun 2005, Joshua D. Drake wrote:
>
>> Clark Slater wrote:
>>
>>> thanks for your suggestion.
>>> a small improvement.  still pretty slow...
>>>
>>> vbp=# alter table test alter column productlistid set statistics 150;
>>> ALTER TABLE
>>> vbp=# alter table test alter column typeid set statistics 150;
>>> ALTER TABLE
>>> vbp=# explain analyze select * from test where (productlistid=3 and
>>
>>
>> Hello,
>>
>> Also what happens if you:
>>
>> set enable_seqscan = false;
>> explain analyze query....
>>
>> Sincerely,
>>
>> Joshua D. Drake
>>
>>
>>
>>> typeid=9);
>>>                             QUERY PLAN
>>>
>>> ------------------------------------------------------------------------------
>>>  Seq Scan on test  (cost=0.00..96458.27 rows=156194 width=725)
>>> (actual time=525.617..36802.556 rows=132528 loops=1)
>>>    Filter: ((productlistid = 3) AND (typeid = 9))
>>>  Total runtime: 36847.754 ms
>>> (3 rows)
>>>
>>> Time: 36850.719 ms
>>>
>>>
>>> On Fri, 10 Jun 2005, Joshua D. Drake wrote:
>>>
>>>> Clark Slater wrote:
>>>>
>>>>> hmm, i'm baffled.  i simplified the query
>>>>> and it is still taking forever...
>>>>
>>>>
>>>>
>>>> What happens if you:
>>>>
>>>> alter table test alter column productlistid set statistics 150;
>>>> alter table test alter column typeid set statistics 150;
>>>> explain analyze select * from test where (productlistid=3 and
>>>> typeid=9);
>>>>
>>>> Sincerely,
>>>>
>>>> Joshua D. Drake
>>>>
>>>>
>>>>>
>>>>>
>>>>>           test
>>>>> -------------------------
>>>>>  id            | integer
>>>>>  partnumber    | character varying(32)
>>>>>  productlistid | integer
>>>>>  typeid        | integer
>>>>>
>>>>>
>>>>> Indexes:
>>>>> "test_productlistid" btree (productlistid)
>>>>> "test_typeid" btree (typeid)
>>>>> "test_productlistid_typeid" btree (productlistid, typeid)
>>>>>
>>>>>
>>>>> explain analyze select * from test where (productlistid=3 and
>>>>> typeid=9);
>>>>>
>>>>>                               QUERY PLAN
>>>>> -----------------------------------------------------------------------
>>>>>
>>>>>  Seq Scan on test  (cost=0.00..96458.27 rows=156194 width=725) (actual
>>>>> time=516.459..41930.250 rows=132528 loops=1)
>>>>>    Filter: ((productlistid = 3) AND (typeid = 9))
>>>>>  Total runtime: 41975.154 ms
>>>>> (3 rows)
>>>>>
>>>>>
>>>>> System specs:
>>>>> PostgreSQL 7.4.2 on RedHat 9
>>>>> dual AMD Athlon 2GHz processors
>>>>> 1 gig memory
>>>>> mirrored 7200 RPM IDE disks
>>>>>
>>>>>
>>>>> On Fri, 10 Jun 2005, John A Meinel wrote:
>>>>>
>>>>>> Clark Slater wrote:
>>>>>>
>>>>>>> Hi-
>>>>>>>
>>>>>>> Would someone please enlighten me as
>>>>>>> to why I'm not seeing a faster execution
>>>>>>> time on the simple scenario below?
>>>>>>>
>>>>>>> there are 412,485 rows in the table and the
>>>>>>> query matches on 132,528 rows, taking
>>>>>>> almost a minute to execute.  vaccuum
>>>>>>> analyze was just run.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Well, if you are matching 130k out of 400k rows, then a sequential
>>>>>> scan
>>>>>> is certainly prefered to an index scan. And then you have to sort
>>>>>> those
>>>>>> 130k rows by partnumber. This *might* be spilling to disk
>>>>>> depending on
>>>>>> what your workmem/sortmem is set to.
>>>>>>
>>>>>> I would also say that what you would really want is some way to
>>>>>> get the
>>>>>> whole thing from an index. And I think the way to do that is:
>>>>>>
>>>>>> CREATE INDEX test_partnum_listid_typeid_idx ON
>>>>>>     test(partnumber, productlistid, typeid);
>>>>>>
>>>>>> VACUUM ANALYZE test;
>>>>>>
>>>>>> EXPLAIN ANALYZE SELECT * FROM test
>>>>>>     WHERE productlistid=3 AND typeid=9
>>>>>>     ORDER BY partnumber, productlistid, typeid
>>>>>>     LIMIT 15
>>>>>> ;
>>>>>>
>>>>>> The trick is that you have to match the order by exactly with the
>>>>>> index,
>>>>>> so the planner realizes it can do an indexed lookup to get the
>>>>>> information.
>>>>>>
>>>>>> You could also just create an index on partnumber, and see how that
>>>>>> affects your original query. I think the planner could use an index
>>>>>> lookup on partnumber to get the ordering correct. But it will have
>>>>>> to do
>>>>>> filtering after the fact based on productlistid and typeid.
>>>>>> With my extended index, I think the planner can be smarter and lookup
>>>>>> all 3 by the index.
>>>>>>
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Clark
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Good luck,
>>>>>> John
>>>>>> =:->
>>>>>>
>>>>>
>>>>> ---------------------------(end of
>>>>> broadcast)---------------------------
>>>>> TIP 2: you can get off all lists at once with the unregister command
>>>>>    (send "unregister YourEmailAddressHere" to
>>>>> majordomo@postgresql.org)
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Your PostgreSQL solutions provider, Command Prompt, Inc.
>>>> 24x7 support - 1.800.492.2240, programming, and consulting
>>>> Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
>>>> http://www.commandprompt.com / http://www.postgresql.org
>>>>
>>>
>>> ---------------------------(end of broadcast)---------------------------
>>> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>>
>>
>>
>> --
>> Your PostgreSQL solutions provider, Command Prompt, Inc.
>> 24x7 support - 1.800.492.2240, programming, and consulting
>> Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
>> http://www.commandprompt.com / http://www.postgresql.org
>>


--
Your PostgreSQL solutions provider, Command Prompt, Inc.
24x7 support - 1.800.492.2240, programming, and consulting
Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
http://www.commandprompt.com / http://www.postgresql.org

Re: faster search

From
Cosimo Streppone
Date:
Steve Atkins wrote:

> On Fri, Jun 10, 2005 at 01:45:05PM -0400, Clark Slater wrote:
>
>>Hi-
>>
>>Would someone please enlighten me as
>>to why I'm not seeing a faster execution
>>time on the simple scenario below?
>
 > [...]
 >
> Create an index on (productlistid, typeid, partnumber) then
>
>   select * from test where productlistid=3 and typeid=9
>    order by productlistid, typeid, partnumber LIMIT 15;
>

Clark, try also adding (just for testing) partnumber to your
where clause, like this:

    select * from test where productlistid=3 and typeid=9
    and partnumber='foo' order by productlistid,
    typeid, partnumber;

and check output of explain analyze.

I had experiences of planner "bad" use of indexes when attribute
types were integer and cardinality was low (a single attribute
value, like "typeid=9" selects one or few rows).
However, this was on 7.1.3, and probably is not relevant to your case.

--
Cosimo


Re: faster search

From
Tom Lane
Date:
John A Meinel <john@arbash-meinel.com> writes:
> I am a little surprised that it is taking 40s to scan only 400k rows,
> though.

Yeah, that seemed high to me too.  Table bloat maybe?  It would be
interesting to look at the output of "vacuum verbose test" to see
how much dead space there is.

            regards, tom lane