Thread: Optimizing query

Optimizing query

From

Poul Møller Hansen

Date:

15 August 2005, 05:39:45

I have a problem creating a usable index for the following simple query:
SELECT * FROM my.table WHERE node = '10' ORDER BY id DESC LIMIT 1

id is a serial, so the query is to find the latest entry to a given node
and id is the primary key.
The table contains around 1 million records and the query takes around 2
seconds.

I have tried to make an index on node and also on both id & node, but is
doesn't lower the
query time.

What am I doing wrong ?

Thanks,
Poul

Re: Optimizing query

From

Richard Huxton

Date:

15 August 2005, 06:15:24

Poul Møller Hansen wrote:
> I have a problem creating a usable index for the following simple query:
> SELECT * FROM my.table WHERE node = '10' ORDER BY id DESC LIMIT 1
>
> id is a serial, so the query is to find the latest entry to a given node
> and id is the primary key.

You're not necessarily getting the latest entry, just the one with the
highest "id". Sequences guarantee uniqueness but if you have concurrent
inserts not necessarily ordering.

> The table contains around 1 million records and the query takes around 2
> seconds.

Well, you don't say how many different values for "node" there are, nor
how many rows you would expect where node='10'.

> I have tried to make an index on node and also on both id & node, but is
> doesn't lower the query time.

Difficult to say what's happening since you don't supply any EXPLAIN
ANALYSE output.

However, if you have an index on (node,id) you might want to try:
   SELECT ... ORDER BY node DESC, id DESC LIMIT 1;
That way the "ORDER BY" part clearly tells the planner that a
reverse-order on your index will be useful.

--
   Richard Huxton
   Archonet Ltd

Re: Optimizing query

From

Poul Møller Hansen

Date:

15 August 2005, 06:46:51

>> I have a problem creating a usable index for the following simple query:
>> SELECT * FROM my.table WHERE node = '10' ORDER BY id DESC LIMIT 1
>>
>> id is a serial, so the query is to find the latest entry to a given
>> node and id is the primary key.
>
>
> You're not necessarily getting the latest entry, just the one with the
> highest "id". Sequences guarantee uniqueness but if you have
> concurrent inserts not necessarily ordering.
>
Right you are, but I have no concurrent inserts from the same node.

>
> Difficult to say what's happening since you don't supply any EXPLAIN
> ANALYSE output.
>
> However, if you have an index on (node,id) you might want to try:
>   SELECT ... ORDER BY node DESC, id DESC LIMIT 1;
> That way the "ORDER BY" part clearly tells the planner that a
> reverse-order on your index will be useful.
>
Thanks a lot, that did the trick !

explain analyze SELECT * FROM my.table WHERE node = '10' ORDER BY id
DESC LIMIT 1

QUERY
PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..764.00 rows=1 width=246) (actual
time=1874.890..1874.896 rows=1 loops=1)
   ->  Index Scan Backward using table_pkey on table
(cost=0.00..4347913.94 rows=5691 width=246) (actual
time=1874.867..1874.867 rows=1 loops=1)
         Filter: ((node)::text = '10'::text)
 Total runtime: 1875.111 ms

explain analyze SELECT * FROM my.table WHERE node = '10' ORDER BY node,
id DESC LIMIT 1
                                                                 QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=22638.36..22638.36 rows=1 width=246) (actual
time=3.001..3.007 rows=1 loops=1)
   ->  Sort  (cost=22638.36..22652.59 rows=5691 width=246) (actual
time=2.984..2.984 rows=1 loops=1)
         Sort Key: node, id
         ->  Index Scan using node_date on table  (cost=0.00..21898.65
rows=5691 width=246) (actual time=0.077..1.852 rows=62 loops=1)
               Index Cond: ((node)::text = '10'::text)
 Total runtime: 3.127 ms


Poul

Re: Optimizing query

From

Dennis Bjorklund

Date:

15 August 2005, 07:11:15

On Mon, 15 Aug 2005, Poul Møller Hansen wrote:

> I have a problem creating a usable index for the following simple query:
> SELECT * FROM my.table WHERE node = '10' ORDER BY id DESC LIMIT 1
>
> id is a serial, so the query is to find the latest entry to a given node
> and id is the primary key.
> The table contains around 1 million records and the query takes around 2
> seconds.
>
> I have tried to make an index on node and also on both id & node, but is
> doesn't lower the
> query time.

Try to make an index on (node,id) and write the query as:

SELECT * FROM my.table WHERE node = '10' ORDER BY node desc, id desc LIMIT 1;

Then i'm pretty sure it will use that index.

--
/Dennis Björklund

Re: Optimizing query

From

Tom Lane

Date:

15 August 2005, 10:46:17

=?ISO-8859-1?Q?Poul_M=F8ller_Hansen?= <freebsd@pbnet.dk> writes:
> explain analyze SELECT * FROM my.table WHERE node = '10' ORDER BY node,
> id DESC LIMIT 1
>                                                                  QUERY
> PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=22638.36..22638.36 rows=1 width=246) (actual
> time=3.001..3.007 rows=1 loops=1)
>    ->  Sort  (cost=22638.36..22652.59 rows=5691 width=246) (actual
> time=2.984..2.984 rows=1 loops=1)
>          Sort Key: node, id
>          ->  Index Scan using node_date on table  (cost=0.00..21898.65
> rows=5691 width=246) (actual time=0.077..1.852 rows=62 loops=1)
>                Index Cond: ((node)::text = '10'::text)
>  Total runtime: 3.127 ms

You're not there yet: you want what Richard said, namely

explain analyze SELECT * FROM my.table WHERE node = '10' ORDER BY node DESC,
id DESC LIMIT 1

There shouldn't be any Sort in the plan, just the indexscan and Limit.
The plan above is going to suck if there are a lot of rows with node = '10'.

            regards, tom lane

Re: Optimizing query

From

Poul Møller Hansen

Date:

15 August 2005, 11:54:17

>
> You're not there yet: you want what Richard said, namely
>

I realized that it wasn't optimal for all nodes, namely those with a lot
of rows.

So you are absolutely right, I followed the suggestion of Richard and it
works perfect.
Thank you all, I learned a lesson of indexes today...


Poul

Question about the NAME type used in pg_proc and pg_class

From

Tony Caduto

Date:

15 August 2005, 12:06:27

Hi,
Just had a quick question about the name type used by pg_proc and
pg_class etc to return the name of a function,table,seq,view etc.

Is this type limited to 64  bytes?  ( could not find it in the docs)

Must  function/table names be limited to 64 characters in length?

Thanks,

Tony

Re: Question about the NAME type used in pg_proc and pg_class

From

Tony Caduto

Date:

15 August 2005, 12:17:00

Never mind, I found it.
I just did not scroll down on the page to see the "Special Char" types.

Thanks,

Tony

> Hi,
> Just had a quick question about the name type used by pg_proc and
> pg_class etc to return the name of a function,table,seq,view etc.
>
> Is this type limited to 64  bytes?  ( could not find it in the docs)
>
> Must  function/table names be limited to 64 characters in length?
>

Re: Question about the NAME type used in pg_proc and pg_class

From

Michael Fuhr

Date:

15 August 2005, 12:28:55

On Mon, Aug 15, 2005 at 10:02:01AM -0500, Tony Caduto wrote:
> Just had a quick question about the name type used by pg_proc and
> pg_class etc to return the name of a function,table,seq,view etc.
>
> Is this type limited to 64  bytes?  ( could not find it in the docs)

See the "Character Types" section of the "Data Types" chapter:

http://www.postgresql.org/docs/8.0/static/datatype-character.html

"The name type exists _only_ for storage of identifiers in the internal
system catalogs and is not intended for use by the general user.  Its
length is currently defined as 64 bytes (63 usable characters plus
terminator) but should be referenced using the constant NAMEDATALEN.
The length is set at compile time (and is therefore adjustable for
special uses); the default maximum length may change in a future
release."

> Must  function/table names be limited to 64 characters in length?

See "Identifiers and Key Words" in the "SQL Syntax" chapter:

http://www.postgresql.org/docs/8.0/static/sql-syntax.html#SQL-SYNTAX-IDENTIFIERS

"The system uses no more than NAMEDATALEN-1 characters of an identifier;
longer names can be written in commands, but they will be truncated.  By
default, NAMEDATALEN is 64 so the maximum identifier length is 63."

--
Michael Fuhr

Re: Question about the NAME type used in pg_proc and pg_class

From

Tom Lane

Date:

15 August 2005, 13:53:32

Michael Fuhr <mike@fuhr.org> writes:
> See "Identifiers and Key Words" in the "SQL Syntax" chapter:

> "The system uses no more than NAMEDATALEN-1 characters of an identifier;
> longer names can be written in commands, but they will be truncated.  By
> default, NAMEDATALEN is 64 so the maximum identifier length is 63."

This limit also applies to operator names, and I just noticed that
scan.l isn't enforcing the limit for operators.  In a build with asserts
enabled this leads to an assertion failure :-(

regression=# select 1 *********************************************************************************** 2;
server closed the connection unexpectedly

with this in the log:
TRAP: FailedAssertion("!(keylen < 64)", File: "hashfunc.c", Line: 129)

I believe that there would be no real ill effect in a non-assertion
build, it would just say it couldn't find the operator.  Too lazy to
recompile that way to find out though.

I kinda think that truncation isn't a real sensible way to deal with
overly long operator names anyway, and that throwing an ERROR would be
more reasonable; if the scanner thinks it is looking at an 80-character
operator name, you've probably messed up the syntax somewhere along the
line.  Comments?

            regards, tom lane

Re: Question about the NAME type used in pg_proc and pg_class

From

Michael Fuhr

Date:

15 August 2005, 16:11:20

On Mon, Aug 15, 2005 at 12:53:25PM -0400, Tom Lane wrote:
>
> regression=# select 1 *********************************************************************************** 2;
> server closed the connection unexpectedly
>
> with this in the log:
> TRAP: FailedAssertion("!(keylen < 64)", File: "hashfunc.c", Line: 129)
>
> I believe that there would be no real ill effect in a non-assertion
> build, it would just say it couldn't find the operator.  Too lazy to
> recompile that way to find out though.

Confirmed.  The following is from REL8_0_STABLE built without assertions:

test=> select 1 *********************************************************************************** 2;
ERROR:  operator does not exist: integer
***********************************************************************************integer 
HINT:  No operator matches the given name and argument type(s). You may need to add explicit type casts.

> I kinda think that truncation isn't a real sensible way to deal with
> overly long operator names anyway, and that throwing an ERROR would be
> more reasonable; if the scanner thinks it is looking at an 80-character
> operator name, you've probably messed up the syntax somewhere along the
> line.  Comments?

An error sounds reasonable to me.

--
Michael Fuhr