Thread: 335 times faster (!)

335 times faster (!)

From
Mikael Carneholm
Date:
I discovered a strange thing when doing a simple search on a (comparably) large table with ~900K rows today:

When searching for a specific row on the primary key (type: bigint), the search took about 6,5 seconds. The column has
adefault btree index as created by the primary key constraint. However, when searching for the same row on one of it's
columns(type: text) which has a functional index on lower(column name), the same row was retrieved in 19ms! That's ~335
timesfaster! 

My idea is thus that one could create tables with a text type column holding the value of the identifier without using
the'primary key' clause, and then afterwards add a functional index on lower(column name). If performance is the main
issue,this must be an interesting solution. The downside is of course that the text data type may result in invalid
integervalues being inserted as keys. 

Anyone tried this before?

- Mikael

_____________________________________________________________
Här börjar internet!
Skaffa gratis e-mail och gratis internet på http://www.spray.se

Träffa folk från hela Sverige på ett och samma ställe - http://chat.spray.se/



Re: 335 times faster (!)

From
Stephan Szabo
Date:
On Mon, 3 Feb 2003, Mikael Carneholm wrote:

> When searching for a specific row on the primary key (type: bigint),
> the search took about 6,5 seconds. The column has a default btree
> index as created by the primary key constraint. However, when
> searching for the same row on one of it's columns (type: text) which
> has a functional index on lower(column name), the same row was
> retrieved in 19ms! That's ~335 times faster!

Did you remember to cast the constant into bigint?  If not, it probably
ignored the bigint index and did a table scan.


Re: 335 times faster (!)

From
Bruno Wolff III
Date:
On Mon, Feb 03, 2003 at 18:42:31 +0100,
  Mikael Carneholm <carniz@spray.se> wrote:
> I discovered a strange thing when doing a simple search on a (comparably) large table with ~900K rows today:
>
> When searching for a specific row on the primary key (type: bigint), the search took about 6,5 seconds. The column
hasa default btree index as created by the primary key constraint. However, when searching for the same row on one of
it'scolumns (type: text) which has a functional index on lower(column name), the same row was retrieved in 19ms! That's
~335times faster! 

This is probably a type coersion issue. You can probably get the first search
to run much faster by including an explicit cast to bigint.

Re: 335 times faster (!)

From
"Nigel J. Andrews"
Date:
On Mon, 3 Feb 2003, Mikael Carneholm wrote:

> I discovered a strange thing when doing a simple search on a (comparably) large table with ~900K rows today:
>
> When searching for a specific row on the primary key (type: bigint), the search took about 6,5 seconds. The column
hasa default btree index as created by the primary key constraint. However, when searching for the same row on one of
it'scolumns (type: text) which has a functional index on lower(column name), the same row was retrieved in 19ms! That's
~335times faster! 
>
> My idea is thus that one could create tables with a text type column holding the value of the identifier without
usingthe 'primary key' clause, and then afterwards add a functional index on lower(column name). If performance is the
mainissue, this must be an interesting solution. The downside is of course that the text data type may result in
invalidinteger values being inserted as keys. 
>
> Anyone tried this before?


Could it be that you've got a reasonably powerful machine and that your table
isn't very wide?

Are you sure your search using the primary key was actually using the primary
key index, i.e. did you just do:

     SELECT * FROM mytable WHERE pkcol = 45

or did you quote the number or cast to bigint? Perhaps this has changed in 7.3
I don't know.

Also, did you perhaps do your search on the text type column just after doing
the first SELECT? You might find there's some caching issue.

Not sure about anyone else but I think we'd want to see the plans used for your
queries, in addition to the queries, before accepting this.


--
Nigel J. Andrews


Re: 335 times faster (!)

From
Mikael Carneholm
Date:
> ------- Ursprungligt meddelande -------
>
> Från:     Nigel J. Andrews  <nandrews@investsystems.co.uk>
> Datum:    Mon, 3 Feb 2003 17:59:12 +0000 (GMT)
>
>
>Could it be that you've got a reasonably powerful machine and that your table
>isn't very wide?

Machine: Intel P3 650 laptop /w 256 RAM
Table: 10 columns

>Are you sure your search using the primary key was actually using the primary
>key index, i.e. did you just do:
>
>  SELECT * FROM mytable WHERE pkcol = 45
>

Yep:
select * from enheter where enhetsid = xxxxxxxxx;

>Also, did you perhaps do your search on the text type column just after doing
>the first SELECT? You might find there's some caching issue.

I tested this (after you pointed it out) by alterating between the same two queries back and forth, but they still
differby the same amount. 

>
>Not sure about anyone else but I think we'd want to see the plans used for your
>queries, in addition to the queries, before accepting this.
>

explain select * from enheter where enhetsid = 200178146;
                    QUERY PLAN
------------------------------------------------------------
                                                                  Seq Scan on enheter  (cost=0.00..15678.50 rows=1
width=91)                                                                    Filter: (enhetsid = 200178146) 
(2 rows)


explain select * from enheter where lower(enhetsnamn1) = 'donalds foto ab';
                    QUERY PLAN
---------------------------------------------------------------------------------------------
                                 Index Scan using idx_enheter_enhetsnamn1 on enheter  (cost=0.00..1342.82 rows=337
width=91)                                   Index Cond: (lower(enhetsnamn1) = 'donalds foto ab'::text) 
(2 rows)

- Mikael

_____________________________________________________________
Här börjar internet!
Skaffa gratis e-mail och gratis internet på http://www.spray.se

Träffa folk från hela Sverige på ett och samma ställe - http://chat.spray.se/



Re: 335 times faster (!)

From
Mikael Carneholm
Date:
> ------- Ursprungligt meddelande -------
>
> Från:    Mario Weilguni <mweilguni@sime.com>
> Datum:    Mon, 3 Feb 2003 20:05:48 +0100
>
>try:
>explain select * from enheter where enhetsid = '200178146';
>or
>explain select * from enheter where enhetsid = 200178146::bigint
>

explain select * from enheter where enhetsid = '200178146';
            QUERY PLAN
---------------------------------------------------------------------------
Index Scan using pk_enheter on enheter  (cost=0.00..4.05 rows=1 width=91)
Index Cond: (enhetsid = 200178146::bigint)
(2 rows)

Strange...using:
200178146::bigint
or
'200178146'
..the query is lightning fast. Since the PK column is of integer type, I don't think it's logical to pass a string-type
argument...oram I different than most people on this point? :) 

What about third party frameworks (such as Hibernate, eg) - I'm sure they will look at the column datatype and think:
'Oh,it's an integer...I'll pass an integer argument then', which will result in unnecessary poor performance. 

What's the cause of this behaviour? Why isn't psql (or the backend) converting integer type arguments into char/string
types,if there's this much to gain? 

(Sorry for being quite a Pg newbie..have never used Pg for such "heavy" load before)

- Mikael

_____________________________________________________________
Här börjar internet!
Skaffa gratis e-mail och gratis internet på http://www.spray.se

Träffa folk från hela Sverige på ett och samma ställe - http://chat.spray.se/



Re: 335 times faster (!)

From
Dennis Gearon
Date:
My surrogate primary keys will all be INT4's. Seems like it would change something passed in
to an INT4 to match the primary key if that's what the key is.

2/3/2003 11:23:28 AM, Mikael Carneholm <carniz@spray.se> wrote:
<snip>
>Strange...using:
>200178146::bigint
>or
>'200178146'
>..the query is lightning fast. Since the PK column is of integer type, I don't think it's
logical to pass a string-type argument...or am I different than most people on this point? :)
>
>What about third party frameworks (such as Hibernate, eg) - I'm sure they will look at the
column datatype and think: 'Oh, it's an integer...I'll pass an integer argument then', which
will result in unnecessary poor performance.
>
>What's the cause of this behaviour? Why isn't psql (or the backend) converting integer type
arguments into char/string types, if there's this much to gain?
>
>(Sorry for being quite a Pg newbie..have never used Pg for such "heavy" load before)
>
>- Mikael
>
>_____________________________________________________________
>Här börjar internet!
>Skaffa gratis e-mail och gratis internet på http://www.spray.se
>
>Träffa folk från hela Sverige på ett och samma ställe - http://chat.spray.se/
>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 6: Have you searched our list archives?
>
>http://archives.postgresql.org
>




Re: 335 times faster (!)

From
"Nigel J. Andrews"
Date:
On Mon, 3 Feb 2003, Mikael Carneholm wrote:

> > ------- Ursprungligt meddelande -------
> >
> > Från:    Mario Weilguni <mweilguni@sime.com>
> > Datum:    Mon, 3 Feb 2003 20:05:48 +0100
> >
> >try:
> >explain select * from enheter where enhetsid = '200178146';
> >or
> >explain select * from enheter where enhetsid = 200178146::bigint
> >
>
> explain select * from enheter where enhetsid = '200178146';
>             QUERY PLAN
---------------------------------------------------------------------------
> Index Scan using pk_enheter on enheter  (cost=0.00..4.05 rows=1 width=91)
> Index Cond: (enhetsid = 200178146::bigint)
> (2 rows)
>
> Strange...using:
> 200178146::bigint
> or
> '200178146'
> ..the query is lightning fast. Since the PK column is of integer type, I don't think it's logical to pass a
string-typeargument...or am I different than most people on this point? :) 

No, I think a lot of us have been caught by this in the past.

> What about third party frameworks (such as Hibernate, eg) - I'm sure they will look at the column datatype and think:
'Oh,it's an integer...I'll pass an integer argument then', which will result in unnecessary poor performance. 
>
> What's the cause of this behaviour? Why isn't psql (or the backend) converting integer type arguments into
char/stringtypes, if there's this much to gain? 

Someone more familiar with the backend can answer this a lot better than I
could so I won't give a half assed comment.

>
> (Sorry for being quite a Pg newbie..have never used Pg for such "heavy" load before)

I feel like that most days.


--
Nigel J. Andrews


Re: 335 times faster (!)

From
Mikael Carneholm
Date:
> ------- Ursprungligt meddelande -------
>
> Från:     Nigel J. Andrews  <nandrews@investsystems.co.uk>
> Datum:    Mon, 3 Feb 2003 20:00:32 +0000 (GMT)
>
>No, I think a lot of us have been caught by this in the past.

Then I suggest that this is 'fixed' in the next release (i.e., the query optimizer should automatically translate
integer/biginttype arguments into char/string/text type arguments, or something like that) 

- Mikael

_____________________________________________________________
Här börjar internet!
Skaffa gratis e-mail och gratis internet på http://www.spray.se

Träffa folk från hela Sverige på ett och samma ställe - http://chat.spray.se/



Re: 335 times faster (!)

From
Jan Wieck
Date:
"Nigel J. Andrews" wrote:
>
> On Mon, 3 Feb 2003, Mikael Carneholm wrote:
>
> > What's the cause of this behaviour? Why isn't psql (or the backend) converting integer type arguments into
char/stringtypes, if there's this much to gain? 
>
> Someone more familiar with the backend can answer this a lot better than I
> could so I won't give a half assed comment.

It is the other way round. The backend converts a non-quoted sequence of
digits too early into an int4 and cannot recover from that "mistake".

A single quoted literal value is treated as a constant of unknown
datatype and get's parsed into what fits best much later.


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

Re: 335 times faster (!)

From
Mikael Carneholm
Date:
> ------- Ursprungligt meddelande -------
>
> Från:    Jan Wieck <JanWieck@Yahoo.com>
> Datum:    Mon, 03 Feb 2003 16:11:53 -0500
>
>It is the other way round. The backend converts a non-quoted sequence of
>digits too early into an int4 and cannot recover from that "mistake".
>

Is this on the TODO list for 7.4? The current docs
(http://developer.postgresql.org/docs/postgres/datatype.html#DATATYPE-INT)state that "PostgreSQL currently cannot use
anindex when two different data types are involved" - which makes it sound like this is a current-only drawback that
willbe fixed later on. T/F? 

- Mikael

_____________________________________________________________
Här börjar internet!
Skaffa gratis e-mail och gratis internet på http://www.spray.se

Träffa folk från hela Sverige på ett och samma ställe - http://chat.spray.se/



Re: 335 times faster (!) [Viruschecked]

From
"Patric Bechtel"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mon, 03 Feb 2003 20:23:28 +0100, Mikael Carneholm wrote:

Hallo Mikael,

it's been around for about a month or so:
I've submitted a patch on the patches list, which fixes
this (kind of, at least for Castor/OJB/Hibernate) when
the URL-parameter useExplicitTyping=true is set.
Just try it, there are several other very nice
features; but nobody seems to had an eye on it till now... :-(

If there's interest, I've improved the patch even
further, and keep it in sync with the current CVS;
since there's no interest so far, I'm not submitting
them; except someone tries and applies them.

tia
Patric


>> ------- Ursprungligt meddelande -------
>>
>> Från:    Mario Weilguni <mweilguni@sime.com>
>> Datum:    Mon, 3 Feb 2003 20:05:48 +0100
>>
>>try:
>>explain select * from enheter where enhetsid = '200178146';
>>or
>>explain select * from enheter where enhetsid = 200178146::bigint
>>

>explain select * from enheter where enhetsid = '200178146';
>            QUERY PLAN
- ---------------------------------------------------------------------------
>Index Scan using pk_enheter on enheter  (cost=0.00..4.05 rows=1 width=91)
>Index Cond: (enhetsid = 200178146::bigint)
>(2 rows)

>Strange...using:
>200178146::bigint
>or
>'200178146'
>..the query is lightning fast. Since the PK column is of integer type, I don't think it's logical to pass a
string-typeargument...or am I different than most  
people on this point? :)

>What about third party frameworks (such as Hibernate, eg) - I'm sure they will look at the column datatype and think:
'Oh,it's an integer...I'll pass an  
integer argument then', which will result in unnecessary poor performance.

>What's the cause of this behaviour? Why isn't psql (or the backend) converting integer type arguments into char/string
types,if there's this much to gain? 

>(Sorry for being quite a Pg newbie..have never used Pg for such "heavy" load before)

>- Mikael

>_____________________________________________________________
>Här börjar internet!
>Skaffa gratis e-mail och gratis internet på http://www.spray.se

>Träffa folk från hela Sverige på ett och samma ställe - http://chat.spray.se/



>---------------------------(end of broadcast)---------------------------
>TIP 6: Have you searched our list archives?

>http://archives.postgresql.org




PGP Public Key Fingerprint: 2636 F26E F523 7D62  4377 D206 7C68 06BB

-----BEGIN PGP SIGNATURE-----
Version: PGPsdk version 1.7.1 (C) 1997-1999 Network Associates, Inc. and its affiliated companies.

iQA/AwUBPj7VW3xoBrvMu8qQEQLfAwCgl0Dd0K4QLtt4E8Seqr4ArRm8Kv0An3Vd
mmL6pYa9PZDb4osUxw7q5xSZ
=NZGE
-----END PGP SIGNATURE-----





Re: 335 times faster (!) [Viruschecked] [Viruschecked]

From
"Patric Bechtel"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mon, 03 Feb 2003 22:47:22 +0100, Patric Bechtel wrote:

Hello Mikael,

sorry for bothering, but I oversaw that you wrote on the
GENERAL list... my patch is only for the JDBC drive. It
probably isn't the "right" way to fix it, but digging into
that backend issue is clearly beyond my skills... :-)

Patric

>On Mon, 03 Feb 2003 20:23:28 +0100, Mikael Carneholm wrote:

>Hallo Mikael,

>it's been around for about a month or so:
>I've submitted a patch on the patches list, which fixes
>this (kind of, at least for Castor/OJB/Hibernate) when
>the URL-parameter useExplicitTyping=true is set.
>Just try it, there are several other very nice
>features; but nobody seems to had an eye on it till now... :-(

>If there's interest, I've improved the patch even
>further, and keep it in sync with the current CVS;
>since there's no interest so far, I'm not submitting
>them; except someone tries and applies them.

>tia
>Patric


>>> ------- Ursprungligt meddelande -------
>>>
>>> Från:    Mario Weilguni <mweilguni@sime.com>
>>> Datum:    Mon, 3 Feb 2003 20:05:48 +0100
>>>
>>>try:
>>>explain select * from enheter where enhetsid = '200178146';
>>>or
>>>explain select * from enheter where enhetsid = 200178146::bigint
>>>

>>explain select * from enheter where enhetsid = '200178146';
>>            QUERY PLAN
>---------------------------------------------------------------------------
>>Index Scan using pk_enheter on enheter  (cost=0.00..4.05 rows=1 width=91)
>>Index Cond: (enhetsid = 200178146::bigint)
>>(2 rows)

>>Strange...using:
>>200178146::bigint
>>or
>>'200178146'
>>..the query is lightning fast. Since the PK column is of integer type, I don't think it's logical to pass a
string-typeargument...or am I different than most  
>people on this point? :)

>>What about third party frameworks (such as Hibernate, eg) - I'm sure they will look at the column datatype and think:
'Oh,it's an integer...I'll pass an  
>integer argument then', which will result in unnecessary poor performance.

>>What's the cause of this behaviour? Why isn't psql (or the backend) converting integer type arguments into
char/stringtypes, if there's this much to gain? 

>>(Sorry for being quite a Pg newbie..have never used Pg for such "heavy" load before)

>>- Mikael

>>_____________________________________________________________
>>Här börjar internet!
>>Skaffa gratis e-mail och gratis internet på http://www.spray.se

>>Träffa folk från hela Sverige på ett och samma ställe - http://chat.spray.se/



>>---------------------------(end of broadcast)---------------------------
>>TIP 6: Have you searched our list archives?

>>http://archives.postgresql.org




>PGP Public Key Fingerprint: 2636 F26E F523 7D62  4377 D206 7C68 06BB





>---------------------------(end of broadcast)---------------------------
>TIP 3: if posting/reading through Usenet, please send an appropriate
>subscribe-nomail command to majordomo@postgresql.org so that your
>message can get through to the mailing list cleanly







>---------------------------(end of broadcast)---------------------------
>TIP 3: if posting/reading through Usenet, please send an appropriate
>subscribe-nomail command to majordomo@postgresql.org so that your
>message can get through to the mailing list cleanly




PGP Public Key Fingerprint: 2636 F26E F523 7D62  4377 D206 7C68 06BB

-----BEGIN PGP SIGNATURE-----
Version: PGPsdk version 1.7.1 (C) 1997-1999 Network Associates, Inc. and its affiliated companies.

iQA/AwUBPj7XNnxoBrvMu8qQEQLphgCdFp4zC7Rgn44jqOGJGttnmf0vBuIAn2P9
8YhRpfuzfvaU/ErQ1VOH0NL1
=wwrt
-----END PGP SIGNATURE-----





Re: 335 times faster (!)

From
Neil Conway
Date:
On Mon, 2003-02-03 at 15:13, Mikael Carneholm wrote:
> Then I suggest that this is 'fixed' in the next release

Please read the list archives -- this has been discussed before (many
times).

Short answer: the fix is non-trivial. It will be fixed eventually, but
it's difficult to do without causing other undesirable changes.

Cheers,

Neil
--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC




Re: 335 times faster (!)

From
Jan Wieck
Date:
Mikael Carneholm wrote:
>
> > ------- Ursprungligt meddelande -------
> >
> > Från: Jan Wieck <JanWieck@Yahoo.com>
> > Datum:        Mon, 03 Feb 2003 16:11:53 -0500
> >
> >It is the other way round. The backend converts a non-quoted sequence of
> >digits too early into an int4 and cannot recover from that "mistake".
> >
>
> Is this on the TODO list for 7.4? The current docs
(http://developer.postgresql.org/docs/postgres/datatype.html#DATATYPE-INT)state that "PostgreSQL currently cannot use
anindex when two different data types are involved" - which makes it sound like this is a current-only drawback that
willbe fixed later on. T/F? 

Exactly ... only I cannot define "later on" very precise. Assuming the
time-line going forward, it's in the future, somewhere.


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #