Thread: Indexes not used in 7.1RC4: Bug?

Indexes not used in 7.1RC4: Bug?

From
Alvar Freude
Date:
Hi,

I have the following table, containing about 570000 Rows, but some
indexes are not used, on 7.1RC4, freshly vacuumed (analyse). It was the
same at least in 7.1RC1


 --- SNIP ---

      CREATE TABLE access_log(
         site_id     int2        NOT NULL DEFAULT 0,
         access_time timestamp   NOT NULL DEFAULT NOW(),
         time_taken  interval    NOT NULL,
         remote_ip   inet        NOT NULL,
         method_num  int2        NOT NULL,
         url_id      int4        NOT NULL REFERENCES urls(id),
         referer_id  int4                 REFERENCES referer(id),
         browser_id  int4                 REFERENCES browser(id),
         status      int2        NOT NULL DEFAULT 0,
         bytes       int4        NOT NULL DEFAULT 0,
         content_id  int2        NOT NULL REFERENCES content_types(id),

         https_flag  boolean     NOT NULL DEFAULT 'f',

         session_id  char(32),
         user_id     int4                 REFERENCES users(id),
         uname       varchar(255),

         note_id     int4

         );

      CREATE INDEX site_idx         ON access_log(site_id);
      CREATE INDEX access_time_idx  ON access_log(access_time);
      CREATE INDEX time_taken_idx   ON access_log(time_taken);
      CREATE INDEX remote_ip_idx    ON access_log(remote_ip);
      CREATE INDEX method_idx       ON access_log(method_num);
      CREATE INDEX url_idx          ON access_log(url_id);
      CREATE INDEX referer_idx      ON access_log(referer_id);
      CREATE INDEX browser_idx      ON access_log(browser_id);
      CREATE INDEX status_idx       ON access_log(status);
      CREATE INDEX bytes_idx        ON access_log(bytes);
      CREATE INDEX content_idx      ON access_log(content_id);
      CREATE INDEX https_idx        ON access_log(https_flag);
      CREATE INDEX session_idx      ON access_log(session_id);
      CREATE INDEX user_id_idx      ON access_log(user_id);
      CREATE INDEX user_idx         ON access_log(uname);
      CREATE INDEX note_idx         ON access_log(note_id);


 --- SNAP ---


url_idx seems OK:

  logger=# EXPLAIN SELECT * FROM access_log WHERE url_id = 1000;
  Index Scan using url_idx on access_log
     (cost=0.00..3618.92 rows=1002 width=89)



But the others not:

  logger=# EXPLAIN SELECT * FROM access_log WHERE method_num = 0;
  Seq Scan on access_log  (cost=0.00..16443.71 rows=559371 width=89)

  logger=# EXPLAIN SELECT * FROM access_log WHERE browser_id = 500;
  Seq Scan on access_log  (cost=0.00..16443.71 rows=7935 width=89)

  logger=# EXPLAIN SELECT * FROM access_log WHERE content_id = 20;
  Seq Scan on access_log  (cost=0.00..16443.71 rows=20579 width=89)

....


And very strange:

  logger=# EXPLAIN SELECT * FROM access_log WHERE access_time >
           '2001-04-10 10:10:10';
  Index Scan using access_time_idx on access_log
           (cost=0.00..10605.12 rows=3251 width=89)

  logger=# EXPLAIN SELECT * FROM access_log WHERE access_time >
           '2001-04-08 10:10:10';
  Seq Scan on access_log  (cost=0.00..16443.71 rows=152292 width=89)



Indexes are also not used in Subselects:


  logger=# EXPLAIN SELECT * FROM access_log WHERE url_id in (1);
  Index Scan using url_idx on access_log
           (cost=0.00..3618.92 rows=1002 width=89)

  logger=# EXPLAIN SELECT * FROM access_log WHERE url_id in (1,2,3);
  Index Scan using url_idx, url_idx, url_idx on access_log
           (cost=0.00..10871.79 rows=3000 width=89)


But:

  logger=# EXPLAIN SELECT * FROM access_log
                           WHERE url_id IN (SELECT 1);
  Seq Scan on access_log  (cost=0.00..16443.71 rows=572537 width=89)
  SubPlan
    ->  Materialize  (cost=0.00..0.00 rows=0 width=0)
          ->  Result  (cost=0.00..0.00 rows=0 width=0)



Indexes are also not used for remote_ip, ORDER BY access_time
(timestamp), ORDER BY time_taken (interval), status, method_num etc. The
only I found where indexes are used is url_id!

hmmm, any hints? Bug?


Postgres configuration is default for all optimizations (geqo_...);
others:

  sort_mem       = 1024
  shared_buffers = 512


Tested on Linux and Win2K/Cygwin.


For the hackers: explain verbose follows:

logger=# EXPLAIN VERBOSE SELECT * FROM access_log WHERE url_id = 1000;
NOTICE:  QUERY DUMP:

{ INDEXSCAN :startup_cost 0.00 :total_cost 3618.92 :rows 1002 :width 89
:qptargetlist ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 21
:restypmod -1 :resname site_id :reskey 0 :reskeyop 0 :ressortgroupref 0
:resjunk false } :expr { VAR :varno 1 :varattno 1 :vartype 21 :vartypmod
-1  :varlevelsup 0 :varnoold 1 :varoattno 1}} { TARGETENTRY :resdom {
RESDOM :resno 2 :restype 1184 :restypmod -1 :resname access_time :reskey
0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 1
:varattno 2 :vartype 1184 :vartypmod -1  :varlevelsup 0 :varnoold 1
:varoattno 2}} { TARGETENTRY :resdom { RESDOM :resno 3 :restype 1186
:restypmod -1 :resname time_taken :reskey 0 :reskeyop 0 :ressortgroupref
0 :resjunk false } :expr { VAR :varno 1 :varattno 3 :vartype 1186
:vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 3}} { TARGETENTRY
:resdom { RESDOM :resno 4 :restype 869 :restypmod -1 :resname remote_ip
:reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR
:varno 1 :varattno 4 :vartype 869 :vartypmod -1  :varlevelsup 0
:varnoold 1 :varoattno 4}} { TARGETENTRY :resdom { RESDOM :resno 5
:restype 21 :restypmod -1 :resname method_num :reskey 0 :reskeyop 0
:ressortgroupref 0 :resjunk false } :expr { VAR :varno 1 :varattno 5
:vartype 21 :vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 5}} {
TARGETENTRY :resdom { RESDOM :resno 6 :restype 23 :restypmod -1 :resname
url_id :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr {
VAR :varno 1 :varattno 6 :vartype 23 :vartypmod -1  :varlevelsup 0
:varnoold 1 :varoattno 6}} { TARGETENTRY :resdom { RESDOM :resno 7
:restype 23 :restypmod -1 :resname referer_id :reskey 0 :reskeyop 0
:ressortgroupref 0 :resjunk false } :expr { VAR :varno 1 :varattno 7
:vartype 23 :vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 7}} {
TARGETENTRY :resdom { RESDOM :resno 8 :restype 23 :restypmod -1 :resname
browser_id :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false }
:expr { VAR :varno 1 :varattno 8 :vartype 23 :vartypmod -1  :varlevelsup
0 :varnoold 1 :varoattno 8}} { TARGETENTRY :resdom { RESDOM :resno 9
:restype 21 :restypmod -1 :resname status :reskey 0 :reskeyop 0
:ressortgroupref 0 :resjunk false } :expr { VAR :varno 1 :varattno 9
:vartype 21 :vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 9}} {
TARGETENTRY :resdom { RESDOM :resno 10 :restype 23 :restypmod -1
:resname bytes :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false }
:expr { VAR :varno 1 :varattno 10 :vartype 23 :vartypmod -1
:varlevelsup 0 :varnoold 1 :varoattno 10}} { TARGETENTRY :resdom {
RESDOM :resno 11 :restype 21 :restypmod -1 :resname content_id :reskey 0
:reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 1
:varattno 11 :vartype 21 :vartypmod -1  :varlevelsup 0 :varnoold 1
:varoattno 11}} { TARGETENTRY :resdom { RESDOM :resno 12 :restype 16
:restypmod -1 :resname https_flag :reskey 0 :reskeyop 0 :ressortgroupref
0 :resjunk false } :expr { VAR :varno 1 :varattno 12 :vartype 16
:vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 12}} { TARGETENTRY
:resdom { RESDOM :resno 13 :restype 1042 :restypmod 36 :resname
session_id :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false }
:expr { VAR :varno 1 :varattno 13 :vartype 1042 :vartypmod 36
:varlevelsup 0 :varnoold 1 :varoattno 13}} { TARGETENTRY :resdom {
RESDOM :resno 14 :restype 23 :restypmod -1 :resname user_id :reskey 0
:reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 1
:varattno 14 :vartype 23 :vartypmod -1  :varlevelsup 0 :varnoold 1
:varoattno 14}} { TARGETENTRY :resdom { RESDOM :resno 15 :restype 1043
:restypmod 259 :resname uname :reskey 0 :reskeyop 0 :ressortgroupref 0
:resjunk false } :expr { VAR :varno 1 :varattno 15 :vartype 1043
:vartypmod 259  :varlevelsup 0 :varnoold 1 :varoattno 15}} { TARGETENTRY
:resdom { RESDOM :resno 16 :restype 23 :restypmod -1 :resname note_id
:reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR
:varno 1 :varattno 16 :vartype 23 :vartypmod -1  :varlevelsup 0
:varnoold 1 :varoattno 16}}) :qpqual <> :lefttree <> :righttree <>
:extprm () :locprm () :initplan <> :nprm 0  :scanrelid 1 :indxid (
1870492) :indxqual (({ EXPR :typeOid 16  :opType op :oper { OPER :opno
96 :opid 65 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 1
:vartype 23 :vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 6} {
CONST :consttype 23 :constlen 4 :constbyval true :constisnull false
:constvalue  4 [ -24 3 0 0 ] })})) :indxqualorig (({ EXPR :typeOid 16
:opType op :oper { OPER :opno 96 :opid 65 :opresulttype 16 } :args ({
VAR :varno 1 :varattno 6 :vartype 23 :vartypmod -1  :varlevelsup 0
:varnoold 1 :varoattno 6} { CONST :consttype 23 :constlen 4 :constbyval
true :constisnull false :constvalue  4 [ -24 3 0 0 ] })})) :indxorderdir
1 }
NOTICE:  QUERY PLAN:

Index Scan using url_idx on access_log  (cost=0.00..3618.92 rows=1002
width=89)

EXPLAIN



logger=# EXPLAIN VERBOSE SELECT * FROM access_log WHERE referer_id =
1000;
NOTICE:  QUERY DUMP:

{ SEQSCAN :startup_cost 0.00 :total_cost 16443.71 :rows 11715 :width 89
:qptargetlist ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 21
:restypmod -1 :resname site_id :reskey 0 :reskeyop 0 :ressortgroupref 0
:resjunk false } :expr { VAR :varno 1 :varattno 1 :vartype 21 :vartypmod
-1  :varlevelsup 0 :varnoold 1 :varoattno 1}} { TARGETENTRY :resdom {
RESDOM :resno 2 :restype 1184 :restypmod -1 :resname access_time :reskey
0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 1
:varattno 2 :vartype 1184 :vartypmod -1  :varlevelsup 0 :varnoold 1
:varoattno 2}} { TARGETENTRY :resdom { RESDOM :resno 3 :restype 1186
:restypmod -1 :resname time_taken :reskey 0 :reskeyop 0 :ressortgroupref
0 :resjunk false } :expr { VAR :varno 1 :varattno 3 :vartype 1186
:vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 3}} { TARGETENTRY
:resdom { RESDOM :resno 4 :restype 869 :restypmod -1 :resname remote_ip
:reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR
:varno 1 :varattno 4 :vartype 869 :vartypmod -1  :varlevelsup 0
:varnoold 1 :varoattno 4}} { TARGETENTRY :resdom { RESDOM :resno 5
:restype 21 :restypmod -1 :resname method_num :reskey 0 :reskeyop 0
:ressortgroupref 0 :resjunk false } :expr { VAR :varno 1 :varattno 5
:vartype 21 :vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 5}} {
TARGETENTRY :resdom { RESDOM :resno 6 :restype 23 :restypmod -1 :resname
url_id :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr {
VAR :varno 1 :varattno 6 :vartype 23 :vartypmod -1  :varlevelsup 0
:varnoold 1 :varoattno 6}} { TARGETENTRY :resdom { RESDOM :resno 7
:restype 23 :restypmod -1 :resname referer_id :reskey 0 :reskeyop 0
:ressortgroupref 0 :resjunk false } :expr { VAR :varno 1 :varattno 7
:vartype 23 :vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 7}} {
TARGETENTRY :resdom { RESDOM :resno 8 :restype 23 :restypmod -1 :resname
browser_id :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false }
:expr { VAR :varno 1 :varattno 8 :vartype 23 :vartypmod -1  :varlevelsup
0 :varnoold 1 :varoattno 8}} { TARGETENTRY :resdom { RESDOM :resno 9
:restype 21 :restypmod -1 :resname status :reskey 0 :reskeyop 0
:ressortgroupref 0 :resjunk false } :expr { VAR :varno 1 :varattno 9
:vartype 21 :vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 9}} {
TARGETENTRY :resdom { RESDOM :resno 10 :restype 23 :restypmod -1
:resname bytes :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false }
:expr { VAR :varno 1 :varattno 10 :vartype 23 :vartypmod -1
:varlevelsup 0 :varnoold 1 :varoattno 10}} { TARGETENTRY :resdom {
RESDOM :resno 11 :restype 21 :restypmod -1 :resname content_id :reskey 0
:reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 1
:varattno 11 :vartype 21 :vartypmod -1  :varlevelsup 0 :varnoold 1
:varoattno 11}} { TARGETENTRY :resdom { RESDOM :resno 12 :restype 16
:restypmod -1 :resname https_flag :reskey 0 :reskeyop 0 :ressortgroupref
0 :resjunk false } :expr { VAR :varno 1 :varattno 12 :vartype 16
:vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 12}} { TARGETENTRY
:resdom { RESDOM :resno 13 :restype 1042 :restypmod 36 :resname
session_id :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false }
:expr { VAR :varno 1 :varattno 13 :vartype 1042 :vartypmod 36
:varlevelsup 0 :varnoold 1 :varoattno 13}} { TARGETENTRY :resdom {
RESDOM :resno 14 :restype 23 :restypmod -1 :resname user_id :reskey 0
:reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 1
:varattno 14 :vartype 23 :vartypmod -1  :varlevelsup 0 :varnoold 1
:varoattno 14}} { TARGETENTRY :resdom { RESDOM :resno 15 :restype 1043
:restypmod 259 :resname uname :reskey 0 :reskeyop 0 :ressortgroupref 0
:resjunk false } :expr { VAR :varno 1 :varattno 15 :vartype 1043
:vartypmod 259  :varlevelsup 0 :varnoold 1 :varoattno 15}} { TARGETENTRY
:resdom { RESDOM :resno 16 :restype 23 :restypmod -1 :resname note_id
:reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR
:varno 1 :varattno 16 :vartype 23 :vartypmod -1  :varlevelsup 0
:varnoold 1 :varoattno 16}}) :qpqual ({ EXPR :typeOid 16  :opType op
:oper { OPER :opno 96 :opid 65 :opresulttype 16 } :args ({ VAR :varno 1
:varattno 7 :vartype 23 :vartypmod -1  :varlevelsup 0 :varnoold 1
:varoattno 7} { CONST :consttype 23 :constlen 4 :constbyval true
:constisnull false :constvalue  4 [ -24 3 0 0 ] })}) :lefttree <>
:righttree <> :extprm () :locprm () :initplan <> :nprm 0  :scanrelid 1 }
NOTICE:  QUERY PLAN:

Seq Scan on access_log  (cost=0.00..16443.71 rows=11715 width=89)

EXPLAIN



Thanks & Ciao

  Alvar

--
AGI
Magirusstrasse 21B, 70469 Stuttgart
Fon +49 (0)711.228 74-50, Fax +49 (0)711.228 74-88
+++news+++news+++news+++
Beste Image-Website 2001 kommt von AGI
http://www.agi.de/tagebuch
http://www.agi.com/diary (english)

Re: Indexes not used in 7.1RC4: Bug?

From
Thomas Lockhart
Date:
> I have the following table, containing about 570000 Rows, but some
> indexes are not used, on 7.1RC4, freshly vacuumed (analyse). It was the
> same at least in 7.1RC1
>       CREATE TABLE access_log(
>          access_time timestamp   NOT NULL DEFAULT NOW(),
>          method_num  int2        NOT NULL,
>          url_id      int4        NOT NULL REFERENCES urls(id),
>          );
>       CREATE INDEX method_idx       ON access_log(method_num);
>       CREATE INDEX url_idx          ON access_log(url_id);
> url_idx seems OK:
> But the others not:
>   logger=# EXPLAIN SELECT * FROM access_log WHERE method_num = 0;
>   Seq Scan on access_log  (cost=0.00..16443.71 rows=559371 width=89)

The parser does not know that your int4 constant "0" can be represented
as an int2. Try

  SELECT * FROM access_log WHERE method_num = int2 '0';

(note the type coersion on the constant; there are other ways of
specifying the same thing).

For the other cases, PostgreSQL is estimating the query cost to be lower
with a sequential scan. For the "SELECT 1" subselect case, it may be
that the optimizer does not cheat and determine that there will be only
one row returned, or that the query can be reformulated to use a simple
constant.

HTH

                     - Thomas

Re: Indexes not used in 7.1RC4: Bug?

From
Stephan Szabo
Date:
> url_idx seems OK:
>
>   logger=# EXPLAIN SELECT * FROM access_log WHERE url_id = 1000;
>   Index Scan using url_idx on access_log
>      (cost=0.00..3618.92 rows=1002 width=89)
>
>
>
> But the others not:
>
>   logger=# EXPLAIN SELECT * FROM access_log WHERE method_num = 0;
>   Seq Scan on access_log  (cost=0.00..16443.71 rows=559371 width=89)

In this case you'd need to coerce the 0 to int2 in any case, but it
figures that most of your rows are returned so a sequence scan will
be faster.  For an index scan, it's got to seek around the heap file
doing random access to determine if rows are visible to your transaction
which is more expensive than sequential reads, so at some point the
optimizer will guess the sequence scan to be faster.

>   logger=# EXPLAIN SELECT * FROM access_log WHERE browser_id = 500;
>   Seq Scan on access_log  (cost=0.00..16443.71 rows=7935 width=89)
>
>   logger=# EXPLAIN SELECT * FROM access_log WHERE content_id = 20;
>   Seq Scan on access_log  (cost=0.00..16443.71 rows=20579 width=89)

Not sure on these two, it probably is estimating less disk access
for doing the sequence scan, my guess would be that the break point
is probably somewhere between the 1000 and 8000 row point for the
two queries.  And I believe the second was an int2, so you'll need
to cast the 20.

> And very strange:
>
>   logger=# EXPLAIN SELECT * FROM access_log WHERE access_time >
>            '2001-04-10 10:10:10';
>   Index Scan using access_time_idx on access_log
>            (cost=0.00..10605.12 rows=3251 width=89)
>
>   logger=# EXPLAIN SELECT * FROM access_log WHERE access_time >
>            '2001-04-08 10:10:10';
>   Seq Scan on access_log  (cost=0.00..16443.71 rows=152292 width=89)

Same as above, for 3000 rows it thinks index scan will be faster,
for 152000 rows the sequence scan.

> But:
>
>   logger=# EXPLAIN SELECT * FROM access_log
>                            WHERE url_id IN (SELECT 1);
>   Seq Scan on access_log  (cost=0.00..16443.71 rows=572537 width=89)
>   SubPlan
>     ->  Materialize  (cost=0.00..0.00 rows=0 width=0)
>           ->  Result  (cost=0.00..0.00 rows=0 width=0)

My guess is it doesn't realize that SELECT 1 is a constant that it
can use the index for.  IN (subselect) isn't handled very well right
now.

> Indexes are also not used for remote_ip, ORDER BY access_time
> (timestamp), ORDER BY time_taken (interval), status, method_num etc. The
> only I found where indexes are used is url_id!

Any of the int2s will require explicit casting of a constant in order
to use the index.  I'm not sure on the others.


Re: Indexes not used in 7.1RC4: Bug?

From
Alvar Freude
Date:
Thomas Lockhart wrote:
>
> The parser does not know that your int4 constant "0" can be represented
> as an int2. Try
>
>   SELECT * FROM access_log WHERE method_num = int2 '0';

hmmm, but its still a sequentiell scan:


  logger=# explain SELECT * FROM access_log
                           WHERE method_num = int2 '0';
  Seq Scan on access_log  (cost=0.00..16443.71 rows=559371 width=89)

But:
Now I realised: the number of rows! :)
If I make "WHERE method_num = int2 '2', then the index is used,
interesting -- so it seems that the optimizer uses the value of the
WHERE clause to check what might be faster and guesses, that an index
scan is more overhead and slower. Nice!


> For the other cases, PostgreSQL is estimating the query cost to be lower
> with a sequential scan.

hm, OK, but I guess, that he is estimating wrong ;)

After re-reading the using-explain chapter in the docs I guess I
understand the problems of estimating the number of rows ...


Do you have any hints how to optimize the ..._cost-Values?

Perhaps it is possible to write a test program, which checks out some
good ..._cost-Values -- I'm volunteer, but I guess it should possible
for this to force some optimizer results to measure the real time some
different methods cost.



> For the "SELECT 1" subselect case, it may be that the optimizer does not cheat and > determine that there will be
only
> one row returned, or that the query can be reformulated to use a simple
> constant.

yes, it was only an example -- i hope nobody is really so stupid and
uses a "select 1" subselect ;)

It might be an optimization, that the hole subselect is performed before
the outer select is called, so the result of the subselect can be used
in the query planer.


Ciao
  Alvar

--
AGI
Magirusstrasse 21B, 70469 Stuttgart
Fon +49 (0)711.228 74-50, Fax +49 (0)711.228 74-88
+++news+++news+++news+++
Beste Image-Website 2001 kommt von AGI
http://www.agi.de/tagebuch
http://www.agi.com/diary (english)

Re: Indexes not used in 7.1RC4: Bug?

From
Mark Butler
Date:
Thomas Lockhart wrote:

> The parser does not know that your int4 constant "0" can be represented
> as an int2. Try
> 
>   SELECT * FROM access_log WHERE method_num = int2 '0';
> 
> (note the type coersion on the constant; there are other ways of
> specifying the same thing).

Surely this is something that should be fixed.  An int2 column ought to behave
exactly like an int4 with a CHECK() constraint forcing the value to be in
range. 

In object oriented terms:
 a smallint isA integer a integer isA bigint

Likewise:
 a integer isA smallint if it falls in -32768..32767 a bigint isA integer if it falls in -2147483648..2147483647

Similar promotion rules should apply for all other numeric types. Any floating
point value without a fractional part should be treated exactly like a big
integer.

The issues here are closely related to the 7.1 changes in INHERITS semantics. 
If any operator treats a smaller precision (more highly constrained) type in a
materially different way than a compatible higher precision type, it is
fundamentally broken for exactly the same reason that we expect a query on a
super-class would be if if did not return all matching instances of every sub
class.

If a function is overloaded with multiple compatible scalar data types, the
database should be free to call any matching implementation after performing
an arbitrary number of *lossless* compatible type conversions.

i.e. if you have f(smallint), f(integer), and f(double) the actual function
called by f(0) should be undefined.  The distinction between smallint '0',
integer '0', and double '0' is meaningless and should be explicitly ignored.

This is a little extreme, but I do not think it makes a lot of sense to
maintain semantic differences between different representations of the same
number. (Oracle certainly doesn't)

Any comments?

- Mark Butler


Re: Indexes not used in 7.1RC4: Bug?

From
Thomas Lockhart
Date:
Hmm. The problem is as you describe, but the requirements for a solution
are more severe than you (or I) would hope. 

We would like to have an extensible mechanism for type promotion and
demotion, but it is not (yet) clear how to implement it. In this case,
we must demote a constant assigned as "int4" by the parser into an
"int2" to be directly comparable to the indexed column. We could
probably do this with some hack code as a brute-force exercise, but no
one has yet bothered (patches welcome ;) But in general, we must handle
the case that the specified constraint is *not* directly convertible to
the indexed type (e.g. is out of range) even though this would seem to
reduce to a choice between a trivial noop or a sequential scan of the
entire table. If we can do this without cluttering up the code too much,
we should go ahead and do it, but it has apparently been a low priority.
                        - Thomas


Speaking of Indexing... (Text indexing)

From
Poet/Joshua Drake
Date:
Good day,

I've been experimenting a bit with Full Text Indexing in PostgreSQL. I
have found several conflicting sites various places on the net pertaining
to whether or not PostgreSQL supports FTI, and I was hoping I could find
an authoritative answer here - I tried searching the website's archives,
but the search seems to be having some problems.

At any rate, I am running a CVS snapshot of 7.1, and I have been trying to
create a full text index on a series of resumes. Some of these exceed 8k
in size, which is no longer a storage problem of course with 7.1, but I
seem to have run into the wicked 8k once again. Specifically:

ERROR:  index_formtuple: data takes 9344 bytes, max is 8191

Furthermore, after trying to just index on a 8191-character long substring
of the resume, I run into the following:

ERROR:  btree: index item size 3948 exceeds maximum 2713

The only way I could actually get the index created was to substring the
body of the resumes down to 2k. I also later tried using HASH rather than
BTREE, which worked, but none of these solutions really appreciably
increased performance in the way we were hoping.

Are these known and accepted limitations of the current 7.1
implementation, or am I doing something terribly wrong? ;)
On Tue, 10 Apr 2001, Thomas Lockhart wrote:

>> I have the following table, containing about 570000 Rows, but some
>> indexes are not used, on 7.1RC4, freshly vacuumed (analyse). It was the
>> same at least in 7.1RC1
>>       CREATE TABLE access_log(
>>          access_time timestamp   NOT NULL DEFAULT NOW(),
>>          method_num  int2        NOT NULL,
>>          url_id      int4        NOT NULL REFERENCES urls(id),
>>          );
>>       CREATE INDEX method_idx       ON access_log(method_num);
>>       CREATE INDEX url_idx          ON access_log(url_id);
>> url_idx seems OK:
>> But the others not:
>>   logger=# EXPLAIN SELECT * FROM access_log WHERE method_num = 0;
>>   Seq Scan on access_log  (cost=0.00..16443.71 rows=559371 width=89)
>
>The parser does not know that your int4 constant "0" can be represented
>as an int2. Try
>
>  SELECT * FROM access_log WHERE method_num = int2 '0';
>
>(note the type coersion on the constant; there are other ways of
>specifying the same thing).
>
>For the other cases, PostgreSQL is estimating the query cost to be lower
>with a sequential scan. For the "SELECT 1" subselect case, it may be
>that the optimizer does not cheat and determine that there will be only
>one row returned, or that the query can be reformulated to use a simple
>constant.
>
>HTH
>
>                     - Thomas
>
>---------------------------(end of broadcast)---------------------------
>TIP 4: Don't 'kill -9' the postmaster
>

--
--
<COMPANY>CommandPrompt    - http://www.commandprompt.com    </COMPANY>
<PROJECT>OpenDocs, LLC.    - http://www.opendocs.org    </PROJECT>
<PROJECT>LinuxPorts     - http://www.linuxports.com     </PROJECT>
<WEBMASTER>LDP        - http://www.linuxdoc.org    </WEBMASTER>
--
Instead of asking why a piece of software is using "1970s technology,"
start asking why software is ignoring 30 years of accumulated wisdom.
--


Re: Extensible mechanism for type promotion / demotion

From
Mark Butler
Date:
I believe that the basis for such a mechanism should be a model of the
semantic type inheritance for primitive data types.  Note that type
inheritance is a completely different concept than representation inheritance,
as witnessed by the confusion over the now implemented proposal to correct the
semantics of table inheritance.

Logically, a sub-type expresses the idea that any instance of the sub-type is
also an instance of the super-type.

For example, semantically speaking, a smallint is an integer because the set
of all small integers is a subset of the set of all integers.

We could represent this fact with something like a the pg_inherits table with
entries for conversion functions to convert the canonical representation of
the 
sub-type into the canonical representation of the super-type and vice versa.

In a normal implementation, the index scan boundary values should be stored
internally using the representation of the lowest common super-type.  That way
you can get a correct result for queries like(*):
 select * from table where smallint_column < 100000

Alternatively, the query engine could internally down cast the value to be
compared to the index column type extended with flags like the following:

COMPATIBLE_VALUE_GREATER   - value is comparable and always greater than                            any instance of
columntype
 
COMPATIBLE_VALUE_LESS      - value is comparable and always less than                            any instance of column
type
INCOMPATIBLE_VALUE         - value is not comparable to column type

The type down-conversion function would need to clear the resulting value and
set the appropriate flag if the conversion does not succeed.

The flags would then be used to calculate which index scan boundary values are
equivalent to the original query predicate by substituting the maximum and
minimum allowed values of the column type as appropriate.

I have not looked at the source code in detail yet, but I believe the basic
idea is sound.
- Mark Butler


Note:  Oracle avoids this whole problem for numeric types by using a common
variable precision format for *all* numbers.  The nice thing is that you can
increase the precision / scale of any numeric column without touching the data
in each row.

Thomas Lockhart wrote:
> 
> Hmm. The problem is as you describe, but the requirements for a solution
> are more severe than you (or I) would hope.
> 
> We would like to have an extensible mechanism for type promotion and
> demotion, but it is not (yet) clear how to implement it. In this case,
> we must demote a constant assigned as "int4" by the parser into an
> "int2" to be directly comparable to the indexed column. We could
> probably do this with some hack code as a brute-force exercise, but no
> one has yet bothered (patches welcome ;) But in general, we must handle
> the case that the specified constraint is *not* directly convertible to
> the indexed type (e.g. is out of range) even though this would seem to
> reduce to a choice between a trivial noop or a sequential scan of the
> entire table. If we can do this without cluttering up the code too much,
> we should go ahead and do it, but it has apparently been a low priority.
> 
>                          - Thomas


Re: Speaking of Indexing... (Text indexing)

From
Andrew McMillan
Date:
Poet/Joshua Drake wrote:
> 
> Good day,
> 
> I've been experimenting a bit with Full Text Indexing in PostgreSQL. I
> have found several conflicting sites various places on the net pertaining
> to whether or not PostgreSQL supports FTI, and I was hoping I could find
> an authoritative answer here - I tried searching the website's archives,
> but the search seems to be having some problems.
> 
> At any rate, I am running a CVS snapshot of 7.1, and I have been trying to
> create a full text index on a series of resumes. Some of these exceed 8k
> in size, which is no longer a storage problem of course with 7.1, but I
> seem to have run into the wicked 8k once again. Specifically:
> 
> ERROR:  index_formtuple: data takes 9344 bytes, max is 8191
> 
> Furthermore, after trying to just index on a 8191-character long substring
> of the resume, I run into the following:
> 
> ERROR:  btree: index item size 3948 exceeds maximum 2713
> 
> The only way I could actually get the index created was to substring the
> body of the resumes down to 2k. I also later tried using HASH rather than
> BTREE, which worked, but none of these solutions really appreciably
> increased performance in the way we were hoping.
> 
> Are these known and accepted limitations of the current 7.1
> implementation, or am I doing something terribly wrong? ;)
> On Tue, 10 Apr 2001, Thomas Lockhart wrote:

You need to use the 'contrib' code for full-text indexing.  The indexing you are
trying to do with that is just using the whole content of the string as the index
value.  Close to useless.

The contrib code is in contrib/fulltextindex.

I have a hacked version of that which changes it to keyword indexing, if you're
interested.

Regards,                Andrew.
-- 
_____________________________________________________________________          Andrew McMillan, e-mail:
Andrew@catalyst.net.nz
Catalyst IT Ltd, PO Box 10-225, Level 22, 105 The Terrace, Wellington
Me: +64 (21) 635 694, Fax: +64 (4) 499 5596, Office: +64 (4) 499 2267


Re: Indexes not used in 7.1RC4: Bug?

From
Thomas Swan
Date:
At 4/10/2001 02:42 PM, Thomas Lockhart wrote:
>Hmm. The problem is as you describe, but the requirements for a solution
>are more severe than you (or I) would hope.
>
>We would like to have an extensible mechanism for type promotion and
>demotion, but it is not (yet) clear how to implement it. In this case,
>we must demote a constant assigned as "int4" by the parser into an
>"int2" to be directly comparable to the indexed column. We could
>probably do this with some hack code as a brute-force exercise, but no
>one has yet bothered (patches welcome ;) But in general, we must handle
>the case that the specified constraint is *not* directly convertible to
>the indexed type (e.g. is out of range) even though this would seem to
>reduce to a choice between a trivial noop or a sequential scan of the
>entire table. If we can do this without cluttering up the code too much,
>we should go ahead and do it, but it has apparently been a low priority.

What about going the other way around... Promote the int2 to an int4 
(lossless).  Actually for all int1,int2 datatypes (regardless of whether it 
was the constant or the column) you could promote all to a common int4 and 
then do comparisons.   Promoting all to int8 and then doing a comparison 
would be excessively slow.



Re: Indexes not used in 7.1RC4: Bug?

From
Thomas Lockhart
Date:
> What about going the other way around... Promote the int2 to an int4
> (lossless).  Actually for all int1,int2 datatypes (regardless of whether it
> was the constant or the column) you could promote all to a common int4 and
> then do comparisons.

That is why the index is not used: the backend is promoting all of the
int2 column values to 
int4 for the comparison, and concludes that the available index is not
relevant.

The index traversal code would need to know how to promote individual
values in the index for comparison, which is an interesting idea but I
haven't thought about how efficient it would be. Clearly the cost would
be different than a simple comparison.
                      - Thomas


Re: Speaking of Indexing... (Text indexing)

From
"Mitch Vincent"
Date:
> Hmm. I'm pretty sure that a single index on the entire contents of a
> resume *as a single field* is close to useless. And an index on an 8k
> piece is also useless. Presumably you really want an index covering each
> significant word of each resume, in which case you would not run into
> the 4k limit (or 2k limit? it is documented somewhere) on the size of an
> *index* field (which is still a limitation on PostgreSQL built with the
> standard 8k block size. Of course, you can build with a larger block
> size).

Just an FYI..

I asked the other day and someone (Tom?) told me it was about 2k..

-Mitch


Re: Index type promotion

From
Mark Butler
Date:
There are several ways to solve the problem:

1. Convert to common numeric format for all numbers, ala Oracle
2. Promote for comparison during the index scan
3. Promote index boundary values for comparison in query planner only  Convert back to index column type for actual
scan

Option 1 doesn't solve the general problem, has a space / performance penalty,
and would be a major change.

Option 2 involves making serious changes to every index access method, and
also has a performance penalty.

Option 3 appears to me to be the way to go.  The main general requirement is
method similar to typeInheritsFrom() in backend/parser/parse_func.c to
determine whether a true promotion is possible for a pair of non-complex data
types.

One thing I am not clear on is how much re-planning is done when a query is
executed with different parameter values.  If re-planning is not done, is it
acceptable to make minor plan changes according to the parameter values? 

For example, it would be necessary to change a "<" operator to a "<=" operator
to get proper index scan behavior on a smallint index if the original right
hand side was greater than 32767.

- Mark

Thomas Lockhart wrote:

> That is why the index is not used: the backend is promoting all of the
> int2 column values to
> int4 for the comparison, and concludes that the available index is not
> relevant.
> 
> The index traversal code would need to know how to promote individual
> values in the index for comparison, which is an interesting idea but I
> haven't thought about how efficient it would be. Clearly the cost would
> be different than a simple comparison.