Interesting behaviour ! - Mailing list pgsql-hackers

From Constantin Teodorescu
Subject Interesting behaviour !
Date
Msg-id 378C394C.3487DBAC@flex.ro
Whole thread Raw
Responses Re: [HACKERS] Interesting behaviour !
List pgsql-hackers
Please Cc: to teo@flex.ro 

I have a table called "note" that looks like this :

create table "note" (id serial,perioada int2,schema int2,explicatie
text,...);

The "note" table has 22.000 records and the record length is about 75
bytes (is has also a "text" field").

Because I am frequently accesing the table with queries like "... where
perioada=12" I was tempted to make also indexes on "perioada" and
"schema" field.

The tables have the following sizes (their file sizes into
/usr/local/pgsql/data/base....)

note         2.890 Kb
note_id        385 Kb
note_perioada  409 Kb
note_schema    466 Kb      

I ran previusly "vacuum analyze" on that database ensuring that
statistical tables have been updated.

Trying some selects with explain I got the following results:

contabil=> explain select * from note where id=15;
NOTICE:  QUERY PLAN:
Index Scan using note_id on note  (cost=2.05 rows=2 width=87)  


contabil=> explain select * from note where perioada=15;
NOTICE:  QUERY PLAN:
Seq Scan on note  (cost=1099.99 rows=1600 width=87)


contabil=> explain select * from note where schema=15;
NOTICE:  QUERY PLAN:
Seq Scan on note  (cost=1099.99 rows=432 width=87)


That means that searching on "perioada" field don't use "note_perioada"
index!!!

I know that the query optimisation take care of record lengths, table
sizes, index sizes, but I thought that in this case it will use
"note_perioada" index.

The distribution of "perioada" values within "note" records is like that
:

contabil=> select perioada,count(*) from note group by perioada;
perioada|count
--------+-----      4|    2      7|   66      8|  108      9|  135     10|  151     11|  146     12| 4468     13| 3045
  14| 3377     15| 3207     16| 3100     17| 3039     18| 1789     19|    1     22|    2
 
(15 rows)  

So, I think that PostgreSQL is doing right when he chooses not to use
"note_perioada" index for that type of query by comparing different
costs (althought it still remains strange at the first look).

Is there any chance to speed up that type of simple query (select * from
note where perioada=N) ?

I dropped the index and try with a "hash" index on the same "perioada"
field. The same result.

In this case, it seems that the "note_perioada" index will never be
used. That means it can be safely dropped without affecting the
application performance, isn't it? It is expected that the database will
grow in the same manner, with approx. the same nr. of records per
"perioada" field every month.

Best regards,

Please Cc: to teo@flex.ro

===============================
Constantin Teodorescu
FLEX Consulting Braila, ROMANIA


pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: RE: [HACKERS] Updated TODO list
Next
From: Nuchanach Klinjun
Date:
Subject: upgrade problem