Thread: Interesting behaviour !

Interesting behaviour !

From
Constantin Teodorescu
Date:
Please Cc: to teo@flex.ro 

I have a table called "note" that looks like this :

create table "note" (id serial,perioada int2,schema int2,explicatie
text,...);

The "note" table has 22.000 records and the record length is about 75
bytes (is has also a "text" field").

Because I am frequently accesing the table with queries like "... where
perioada=12" I was tempted to make also indexes on "perioada" and
"schema" field.

The tables have the following sizes (their file sizes into
/usr/local/pgsql/data/base....)

note         2.890 Kb
note_id        385 Kb
note_perioada  409 Kb
note_schema    466 Kb      

I ran previusly "vacuum analyze" on that database ensuring that
statistical tables have been updated.

Trying some selects with explain I got the following results:

contabil=> explain select * from note where id=15;
NOTICE:  QUERY PLAN:
Index Scan using note_id on note  (cost=2.05 rows=2 width=87)  


contabil=> explain select * from note where perioada=15;
NOTICE:  QUERY PLAN:
Seq Scan on note  (cost=1099.99 rows=1600 width=87)


contabil=> explain select * from note where schema=15;
NOTICE:  QUERY PLAN:
Seq Scan on note  (cost=1099.99 rows=432 width=87)


That means that searching on "perioada" field don't use "note_perioada"
index!!!

I know that the query optimisation take care of record lengths, table
sizes, index sizes, but I thought that in this case it will use
"note_perioada" index.

The distribution of "perioada" values within "note" records is like that
:

contabil=> select perioada,count(*) from note group by perioada;
perioada|count
--------+-----      4|    2      7|   66      8|  108      9|  135     10|  151     11|  146     12| 4468     13| 3045
  14| 3377     15| 3207     16| 3100     17| 3039     18| 1789     19|    1     22|    2
 
(15 rows)  

So, I think that PostgreSQL is doing right when he chooses not to use
"note_perioada" index for that type of query by comparing different
costs (althought it still remains strange at the first look).

Is there any chance to speed up that type of simple query (select * from
note where perioada=N) ?

I dropped the index and try with a "hash" index on the same "perioada"
field. The same result.

In this case, it seems that the "note_perioada" index will never be
used. That means it can be safely dropped without affecting the
application performance, isn't it? It is expected that the database will
grow in the same manner, with approx. the same nr. of records per
"perioada" field every month.

Best regards,

Please Cc: to teo@flex.ro

===============================
Constantin Teodorescu
FLEX Consulting Braila, ROMANIA


Re: [HACKERS] Interesting behaviour !

From
Hannu Krosing
Date:
Constantin Teodorescu wrote:
> 
> 
> create table "note" (id serial,perioada int2,schema int2,explicatie
> text,...);
> 
...
> 
> contabil=> explain select * from note where perioada=15;
> NOTICE:  QUERY PLAN:
> Seq Scan on note  (cost=1099.99 rows=1600 width=87)

You may try :

explain select * from note where perioada=15::int2;

i think that the default for 'untyped' numbers is int4 and 
this currently confuses the optimiser.

------------------
Hannu


Re: [HACKERS] Interesting behaviour !

From
Constantin Teodorescu
Date:
Hannu Krosing wrote:
> 
> You may try :
> 
> explain select * from note where perioada=15::int2;
> 
> i think that the default for 'untyped' numbers is int4 and
> this currently confuses the optimiser.

You are right! Thanks a lot! Watch this!

contabil=> explain select * from note where perioada=29::int2;
NOTICE:  QUERY PLAN:
Index Scan using note_perioada on note  (cost=108.96 rows=1600 width=87)

EXPLAIN
contabil=> explain select * from note where perioada=29;
NOTICE:  QUERY PLAN:
Seq Scan on note  (cost=1099.99 rows=1600 width=87)

My queries are faster now!

I think that this thing should be fixed. You need more than common SQL
in order to optimize your queries.
That conversions should be automatically assumed by the query optimizer
in order to deliver real performances.
I don't know how difficult that would be.

Thanks a lot,
Best regards,
-- 
Constantin Teodorescu
FLEX Consulting Braila, ROMANIA


Re: [HACKERS] Interesting behaviour !

From
Tom Lane
Date:
Constantin Teodorescu <teo@flex.ro> writes:
> So, I think that PostgreSQL is doing right when he chooses not to use
> "note_perioada" index for that type of query by comparing different
> costs (althought it still remains strange at the first look).

Although the real problem here was a type clash (which I agree ought
to be fixed), it should be pointed out that there *is* a threshold of
selectivity below which the optimizer will choose not to use an index
scan.  I'm not sure what it is offhand, nor whether it's set at a
good level.  This behavior emerges indirectly from the cost estimate
functions for sequential and index scans, and I'm not convinced that
they are as accurate as they need to be...
        regards, tom lane


Re: [HACKERS] Interesting behaviour !

From
Bruce Momjian
Date:
> Hannu Krosing wrote:
> > 
> > You may try :
> > 
> > explain select * from note where perioada=15::int2;
> > 
> > i think that the default for 'untyped' numbers is int4 and
> > this currently confuses the optimiser.
> 
> You are right! Thanks a lot! Watch this!
> 
> contabil=> explain select * from note where perioada=29::int2;
> NOTICE:  QUERY PLAN:
> Index Scan using note_perioada on note  (cost=108.96 rows=1600 width=87)
> 
> EXPLAIN
> contabil=> explain select * from note where perioada=29;
> NOTICE:  QUERY PLAN:
> Seq Scan on note  (cost=1099.99 rows=1600 width=87)
> 
> My queries are faster now!
> 
> I think that this thing should be fixed. You need more than common SQL
> in order to optimize your queries.
> That conversions should be automatically assumed by the query optimizer
> in order to deliver real performances.
> I don't know how difficult that would be.

I thought we had this fixed in 6.5.  Is that what you are using?

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Interesting behaviour !

From
Constantin Teodorescu
Date:
Bruce Momjian wrote:
> 
> 
> I thought we had this fixed in 6.5.  Is that what you are using?

Yes!

Sorry that I forgot to describe the environment :

RedHat 5.2 i386 + PostgreSQL 6.5 final version

-- 
Constantin Teodorescu
FLEX Consulting Braila, ROMANIA


Re: [HACKERS] Interesting behaviour !

From
Tom Lane
Date:
Constantin Teodorescu <teo@flex.ro> writes:
> Bruce Momjian wrote:
>> I thought we had this fixed in 6.5.  Is that what you are using?

> Yes!

It's not fixed, obviously.  We talked about the issue a while back,
but it isn't 100% clear what the most reasonable fix is.  Just making
the parser label small constants as int2 rather than int4 is no
answer; that'll only move the problem over to int4 tables :-(.

I have not looked closely at the parse trees produced for this sort
of thing, but if we are lucky they come out likevar int2eq (int4toint2(int4constant))
in which case the constant-subexpression-folding pass that I want to
add would solve the problem by reducing the righthand side to a simple
constant.

But it's more likely that the parser is producingint2toint4(var) int4eq int4constant
in which case it would take some actual intelligence to decide that this
could and should be converted to the other form.  I'd still be inclined
to tackle the issue in a post-rewriter, pre-planner pass over the tree.
        regards, tom lane


Re: [HACKERS] Interesting behaviour !

From
Tom Lane
Date:
I wrote:
> But it's more likely that the parser is producing
>     int2toint4(var) int4eq int4constant
> in which case it would take some actual intelligence to decide that this
> could and should be converted to the other form.

Hmm, actually it seems to be producingvar int24eq int4constant
My first thought on seeing this was that int24eq didn't have an entry
in pg_amop, but it does.  So why doesn't the system realize it can use
an index scan?  This might be a relatively simple bug to fix after all,
but it needs more time to find exactly where things are going wrong...
and I have to get some Real Work done...
        regards, tom lane


Re: [HACKERS] Interesting behaviour !

From
Thomas Lockhart
Date:
> This might be a relatively simple bug to fix after all,
> but it needs more time to find exactly where things are going wrong...
> and I have to get some Real Work done...

Don't let me stop anyone from looking at this, but fyi this is the one
area I didn't yet touch for the "transparent type coersion" work I did
for v6.4 and which is still ongoing of course. 

istm that wherever index use is evaluated one could allow
pre-evaluated functions on constants, rather than just strict
constants as is the case now. There is a precedent for pre-evaluation
of elements of the query tree.

If function calls are allowed, then we can try coercing constants
using these existing coersion functions, at least when the target
column is a "superset type" of the constant. You still run into
trouble for cases like
 select intcol from table1 where intcol < 2.5;
                  - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] Interesting behaviour !

From
Bruce Momjian
Date:
> Constantin Teodorescu <teo@flex.ro> writes:
> > Bruce Momjian wrote:
> >> I thought we had this fixed in 6.5.  Is that what you are using?
> 
> > Yes!
> 
> It's not fixed, obviously.  We talked about the issue a while back,
> but it isn't 100% clear what the most reasonable fix is.  Just making
> the parser label small constants as int2 rather than int4 is no
> answer; that'll only move the problem over to int4 tables :-(.

Added to TODO:
* Allow SELECT * FROM tab WHERE int2col = 4 use int2col index 

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Interesting behaviour !

From
Tom Lane
Date:
I wrote:
>> This might be a relatively simple bug to fix after all,
>> but it needs more time to find exactly where things are going wrong...
>> and I have to get some Real Work done...

Well, no, it's not simple.  After looking at the executor I can see that
indexscan support is only prepared to deal with comparison operators
that are in the pg_amop class associated with the index.  In other
words, for an int2 index the indexquals have to be "int2 op int2".
The optimizer is doing the right thing by not trying to use int24eq
as an indexqual.

So, we're back to needing to figure out that we can reduce the int4
constant to an int2 constant and adjust the comparison operator
appropriately.

Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> Don't let me stop anyone from looking at this, but fyi this is the one
> area I didn't yet touch for the "transparent type coersion" work I did
> for v6.4 and which is still ongoing of course. 
>
> istm that wherever index use is evaluated one could allow
> pre-evaluated functions on constants, rather than just strict
> constants as is the case now. There is a precedent for pre-evaluation
> of elements of the query tree.

Perhaps that could be handled by the constant-subexpression-reducer
that I want to add.  That is, "typeconversionfunction(constant)" would
be reduced to "constant", and then the optimizer has the same case to
deal with as it has now.

> If function calls are allowed, then we can try coercing constants
> using these existing coersion functions,

Where are said functions?  I have not run across them yet...

> at least when the target
> column is a "superset type" of the constant. You still run into
> trouble for cases like
>   select intcol from table1 where intcol < 2.5;

Right, you don't want to truncate the float constant to integer
(at least not without adding even more smarts).

I think we are probably going to have to do this in the form of code
that has some type-specific knowledge about conversions between certain
standard types, and knows some things about the operators on those types
as well.  Here is another example that can produce trouble:select int2col + 30000 from table1;
If we reduce the int4 constant to int2 and change int24add to int2add,
we have just created a potential for int2 overflow in an expression
that did not have it before.  So, while folding an int4 constant to int2
if it's within int2 range is safe when the constant is an argument of
a comparison operator, it is *not* safe in the general case.

I don't see any real good way to build a type-independent transformation
routine that can do this sort of thing.  Probably best just to hardcode
it for the standard numeric and character-string types, and leave it at
that.
        regards, tom lane


RE: [HACKERS] Interesting behaviour !

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: owner-pgsql-hackers@postgreSQL.org
> [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Tom Lane
> Sent: Thursday, July 15, 1999 7:48 AM
> To: Thomas Lockhart
> Cc: Constantin Teodorescu; Bruce Momjian; Hannu Krosing;
> pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] Interesting behaviour ! 
> 
> 
> I wrote:
> >> This might be a relatively simple bug to fix after all,
> >> but it needs more time to find exactly where things are going wrong...
> >> and I have to get some Real Work done...
> 
> Well, no, it's not simple.  After looking at the executor I can see that
> indexscan support is only prepared to deal with comparison operators
> that are in the pg_amop class associated with the index.  In other
> words, for an int2 index the indexquals have to be "int2 op int2".
> The optimizer is doing the right thing by not trying to use int24eq
> as an indexqual.
> 
> So, we're back to needing to figure out that we can reduce the int4
> constant to an int2 constant and adjust the comparison operator
> appropriately.
> 
> Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> > Don't let me stop anyone from looking at this, but fyi this is the one
> > area I didn't yet touch for the "transparent type coersion" work I did
> > for v6.4 and which is still ongoing of course. 
> >
> > istm that wherever index use is evaluated one could allow
> > pre-evaluated functions on constants, rather than just strict
> > constants as is the case now. There is a precedent for pre-evaluation
> > of elements of the query tree.
> 
> Perhaps that could be handled by the constant-subexpression-reducer
> that I want to add.  That is, "typeconversionfunction(constant)" would
> be reduced to "constant", and then the optimizer has the same case to
> deal with as it has now.
>

Each type has a typeinput(char * => type ) proc and a typeoutput(
type -> char *) proc.

For example int2in/int2out for type int2 and int4in/int4out for type int4.
Doesn't int2in(int4out()) convert int4 to int2 ?

However,typeinput proc causes elog(ERROR) in most cases if it 
couldn't convert correctly.  

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp  


Re: [HACKERS] Interesting behaviour !

From
Thomas Lockhart
Date:
> Each type has a typeinput(char * => type ) proc and a typeoutput(
> type -> char *) proc.
> Doesn't int2in(int4out()) convert int4 to int2 ?
> However,typeinput proc causes elog(ERROR) in most cases if it
> couldn't convert correctly.

Conversion using an intermediate string is possible, but not the
preferred technique.

The "automatic type coersion" code, used earlier in the parser, uses
the convention that any single-argument function taking the source
type as input and with the same name as the target type can be used
for type conversion. For example, the function int4(int2) would
convert int2 to int4. There are now routines in the parser for
choosing conversion strategies and for finding candidates, and these
could be reused for similar purposes when trying to match index
arguments.
                 - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


RE: [HACKERS] Interesting behaviour !

From
"Hiroshi Inoue"
Date:
> 
> > Each type has a typeinput(char * => type ) proc and a typeoutput(
> > type -> char *) proc.
> > Doesn't int2in(int4out()) convert int4 to int2 ?
> > However,typeinput proc causes elog(ERROR) in most cases if it
> > couldn't convert correctly.
> 
> Conversion using an intermediate string is possible, but not the
> preferred technique.
>

Every type of PostgreSQL must have typeinput/typeoutput procs.
So this technique doesn't need new procs/operators any more.
Isn't it an advantage ? 
> The "automatic type coersion" code, used earlier in the parser, uses
> the convention that any single-argument function taking the source
> type as input and with the same name as the target type can be used
> for type conversion. For example, the function int4(int2) would
> convert int2 to int4. There are now routines in the parser for
> choosing conversion strategies and for finding candidates, and these
> could be reused for similar purposes when trying to match index
> arguments.
>

It seems reasonable.
But I'm afraid that the defintion of new type requires many functions 
of type conversion.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp


RE: [HACKERS] Interesting behaviour !

From
"Hiroshi Inoue"
Date:
Sorry,I've misunderstood Thomas's posting.
Please ignore my previous posting.

Hiroshi Inoue
Inoue@tpf.co.jp

> -----Original Message-----
> From: owner-pgsql-hackers@postgreSQL.org
> [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Hiroshi Inoue
> Sent: Friday, July 16, 1999 9:56 AM
> To: Thomas Lockhart
> Cc: Tom Lane; Constantin Teodorescu; Bruce Momjian; Hannu Krosing;
> pgsql-hackers@postgreSQL.org
> Subject: RE: [HACKERS] Interesting behaviour !
> 
> 
> > 
> > > Each type has a typeinput(char * => type ) proc and a typeoutput(
> > > type -> char *) proc.
> > > Doesn't int2in(int4out()) convert int4 to int2 ?
> > > However,typeinput proc causes elog(ERROR) in most cases if it
> > > couldn't convert correctly.
> > 
> > Conversion using an intermediate string is possible, but not the
> > preferred technique.
> >
> 
> Every type of PostgreSQL must have typeinput/typeoutput procs.
> So this technique doesn't need new procs/operators any more.
> Isn't it an advantage ?
>   
> > The "automatic type coersion" code, used earlier in the parser, uses
> > the convention that any single-argument function taking the source
> > type as input and with the same name as the target type can be used
> > for type conversion. For example, the function int4(int2) would
> > convert int2 to int4. There are now routines in the parser for
> > choosing conversion strategies and for finding candidates, and these
> > could be reused for similar purposes when trying to match index
> > arguments.
> >
> 
> It seems reasonable.
> But I'm afraid that the defintion of new type requires many functions 
> of type conversion.
> 
> Regards.
> 
> Hiroshi Inoue
> Inoue@tpf.co.jp
> 
>