Thread: 335 times faster (!)
I discovered a strange thing when doing a simple search on a (comparably) large table with ~900K rows today: When searching for a specific row on the primary key (type: bigint), the search took about 6,5 seconds. The column has adefault btree index as created by the primary key constraint. However, when searching for the same row on one of it's columns(type: text) which has a functional index on lower(column name), the same row was retrieved in 19ms! That's ~335 timesfaster! My idea is thus that one could create tables with a text type column holding the value of the identifier without using the'primary key' clause, and then afterwards add a functional index on lower(column name). If performance is the main issue,this must be an interesting solution. The downside is of course that the text data type may result in invalid integervalues being inserted as keys. Anyone tried this before? - Mikael _____________________________________________________________ Här börjar internet! Skaffa gratis e-mail och gratis internet på http://www.spray.se Träffa folk från hela Sverige på ett och samma ställe - http://chat.spray.se/
On Mon, 3 Feb 2003, Mikael Carneholm wrote: > When searching for a specific row on the primary key (type: bigint), > the search took about 6,5 seconds. The column has a default btree > index as created by the primary key constraint. However, when > searching for the same row on one of it's columns (type: text) which > has a functional index on lower(column name), the same row was > retrieved in 19ms! That's ~335 times faster! Did you remember to cast the constant into bigint? If not, it probably ignored the bigint index and did a table scan.
On Mon, Feb 03, 2003 at 18:42:31 +0100, Mikael Carneholm <carniz@spray.se> wrote: > I discovered a strange thing when doing a simple search on a (comparably) large table with ~900K rows today: > > When searching for a specific row on the primary key (type: bigint), the search took about 6,5 seconds. The column hasa default btree index as created by the primary key constraint. However, when searching for the same row on one of it'scolumns (type: text) which has a functional index on lower(column name), the same row was retrieved in 19ms! That's ~335times faster! This is probably a type coersion issue. You can probably get the first search to run much faster by including an explicit cast to bigint.
On Mon, 3 Feb 2003, Mikael Carneholm wrote: > I discovered a strange thing when doing a simple search on a (comparably) large table with ~900K rows today: > > When searching for a specific row on the primary key (type: bigint), the search took about 6,5 seconds. The column hasa default btree index as created by the primary key constraint. However, when searching for the same row on one of it'scolumns (type: text) which has a functional index on lower(column name), the same row was retrieved in 19ms! That's ~335times faster! > > My idea is thus that one could create tables with a text type column holding the value of the identifier without usingthe 'primary key' clause, and then afterwards add a functional index on lower(column name). If performance is the mainissue, this must be an interesting solution. The downside is of course that the text data type may result in invalidinteger values being inserted as keys. > > Anyone tried this before? Could it be that you've got a reasonably powerful machine and that your table isn't very wide? Are you sure your search using the primary key was actually using the primary key index, i.e. did you just do: SELECT * FROM mytable WHERE pkcol = 45 or did you quote the number or cast to bigint? Perhaps this has changed in 7.3 I don't know. Also, did you perhaps do your search on the text type column just after doing the first SELECT? You might find there's some caching issue. Not sure about anyone else but I think we'd want to see the plans used for your queries, in addition to the queries, before accepting this. -- Nigel J. Andrews
> ------- Ursprungligt meddelande ------- > > Från: Nigel J. Andrews <nandrews@investsystems.co.uk> > Datum: Mon, 3 Feb 2003 17:59:12 +0000 (GMT) > > >Could it be that you've got a reasonably powerful machine and that your table >isn't very wide? Machine: Intel P3 650 laptop /w 256 RAM Table: 10 columns >Are you sure your search using the primary key was actually using the primary >key index, i.e. did you just do: > > SELECT * FROM mytable WHERE pkcol = 45 > Yep: select * from enheter where enhetsid = xxxxxxxxx; >Also, did you perhaps do your search on the text type column just after doing >the first SELECT? You might find there's some caching issue. I tested this (after you pointed it out) by alterating between the same two queries back and forth, but they still differby the same amount. > >Not sure about anyone else but I think we'd want to see the plans used for your >queries, in addition to the queries, before accepting this. > explain select * from enheter where enhetsid = 200178146; QUERY PLAN ------------------------------------------------------------ Seq Scan on enheter (cost=0.00..15678.50 rows=1 width=91) Filter: (enhetsid = 200178146) (2 rows) explain select * from enheter where lower(enhetsnamn1) = 'donalds foto ab'; QUERY PLAN --------------------------------------------------------------------------------------------- Index Scan using idx_enheter_enhetsnamn1 on enheter (cost=0.00..1342.82 rows=337 width=91) Index Cond: (lower(enhetsnamn1) = 'donalds foto ab'::text) (2 rows) - Mikael _____________________________________________________________ Här börjar internet! Skaffa gratis e-mail och gratis internet på http://www.spray.se Träffa folk från hela Sverige på ett och samma ställe - http://chat.spray.se/
> ------- Ursprungligt meddelande ------- > > Från: Mario Weilguni <mweilguni@sime.com> > Datum: Mon, 3 Feb 2003 20:05:48 +0100 > >try: >explain select * from enheter where enhetsid = '200178146'; >or >explain select * from enheter where enhetsid = 200178146::bigint > explain select * from enheter where enhetsid = '200178146'; QUERY PLAN --------------------------------------------------------------------------- Index Scan using pk_enheter on enheter (cost=0.00..4.05 rows=1 width=91) Index Cond: (enhetsid = 200178146::bigint) (2 rows) Strange...using: 200178146::bigint or '200178146' ..the query is lightning fast. Since the PK column is of integer type, I don't think it's logical to pass a string-type argument...oram I different than most people on this point? :) What about third party frameworks (such as Hibernate, eg) - I'm sure they will look at the column datatype and think: 'Oh,it's an integer...I'll pass an integer argument then', which will result in unnecessary poor performance. What's the cause of this behaviour? Why isn't psql (or the backend) converting integer type arguments into char/string types,if there's this much to gain? (Sorry for being quite a Pg newbie..have never used Pg for such "heavy" load before) - Mikael _____________________________________________________________ Här börjar internet! Skaffa gratis e-mail och gratis internet på http://www.spray.se Träffa folk från hela Sverige på ett och samma ställe - http://chat.spray.se/
My surrogate primary keys will all be INT4's. Seems like it would change something passed in to an INT4 to match the primary key if that's what the key is. 2/3/2003 11:23:28 AM, Mikael Carneholm <carniz@spray.se> wrote: <snip> >Strange...using: >200178146::bigint >or >'200178146' >..the query is lightning fast. Since the PK column is of integer type, I don't think it's logical to pass a string-type argument...or am I different than most people on this point? :) > >What about third party frameworks (such as Hibernate, eg) - I'm sure they will look at the column datatype and think: 'Oh, it's an integer...I'll pass an integer argument then', which will result in unnecessary poor performance. > >What's the cause of this behaviour? Why isn't psql (or the backend) converting integer type arguments into char/string types, if there's this much to gain? > >(Sorry for being quite a Pg newbie..have never used Pg for such "heavy" load before) > >- Mikael > >_____________________________________________________________ >Här börjar internet! >Skaffa gratis e-mail och gratis internet på http://www.spray.se > >Träffa folk från hela Sverige på ett och samma ställe - http://chat.spray.se/ > > > >---------------------------(end of broadcast)--------------------------- >TIP 6: Have you searched our list archives? > >http://archives.postgresql.org >
On Mon, 3 Feb 2003, Mikael Carneholm wrote: > > ------- Ursprungligt meddelande ------- > > > > Från: Mario Weilguni <mweilguni@sime.com> > > Datum: Mon, 3 Feb 2003 20:05:48 +0100 > > > >try: > >explain select * from enheter where enhetsid = '200178146'; > >or > >explain select * from enheter where enhetsid = 200178146::bigint > > > > explain select * from enheter where enhetsid = '200178146'; > QUERY PLAN --------------------------------------------------------------------------- > Index Scan using pk_enheter on enheter (cost=0.00..4.05 rows=1 width=91) > Index Cond: (enhetsid = 200178146::bigint) > (2 rows) > > Strange...using: > 200178146::bigint > or > '200178146' > ..the query is lightning fast. Since the PK column is of integer type, I don't think it's logical to pass a string-typeargument...or am I different than most people on this point? :) No, I think a lot of us have been caught by this in the past. > What about third party frameworks (such as Hibernate, eg) - I'm sure they will look at the column datatype and think: 'Oh,it's an integer...I'll pass an integer argument then', which will result in unnecessary poor performance. > > What's the cause of this behaviour? Why isn't psql (or the backend) converting integer type arguments into char/stringtypes, if there's this much to gain? Someone more familiar with the backend can answer this a lot better than I could so I won't give a half assed comment. > > (Sorry for being quite a Pg newbie..have never used Pg for such "heavy" load before) I feel like that most days. -- Nigel J. Andrews
> ------- Ursprungligt meddelande ------- > > Från: Nigel J. Andrews <nandrews@investsystems.co.uk> > Datum: Mon, 3 Feb 2003 20:00:32 +0000 (GMT) > >No, I think a lot of us have been caught by this in the past. Then I suggest that this is 'fixed' in the next release (i.e., the query optimizer should automatically translate integer/biginttype arguments into char/string/text type arguments, or something like that) - Mikael _____________________________________________________________ Här börjar internet! Skaffa gratis e-mail och gratis internet på http://www.spray.se Träffa folk från hela Sverige på ett och samma ställe - http://chat.spray.se/
"Nigel J. Andrews" wrote: > > On Mon, 3 Feb 2003, Mikael Carneholm wrote: > > > What's the cause of this behaviour? Why isn't psql (or the backend) converting integer type arguments into char/stringtypes, if there's this much to gain? > > Someone more familiar with the backend can answer this a lot better than I > could so I won't give a half assed comment. It is the other way round. The backend converts a non-quoted sequence of digits too early into an int4 and cannot recover from that "mistake". A single quoted literal value is treated as a constant of unknown datatype and get's parsed into what fits best much later. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
> ------- Ursprungligt meddelande ------- > > Från: Jan Wieck <JanWieck@Yahoo.com> > Datum: Mon, 03 Feb 2003 16:11:53 -0500 > >It is the other way round. The backend converts a non-quoted sequence of >digits too early into an int4 and cannot recover from that "mistake". > Is this on the TODO list for 7.4? The current docs (http://developer.postgresql.org/docs/postgres/datatype.html#DATATYPE-INT)state that "PostgreSQL currently cannot use anindex when two different data types are involved" - which makes it sound like this is a current-only drawback that willbe fixed later on. T/F? - Mikael _____________________________________________________________ Här börjar internet! Skaffa gratis e-mail och gratis internet på http://www.spray.se Träffa folk från hela Sverige på ett och samma ställe - http://chat.spray.se/
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Mon, 03 Feb 2003 20:23:28 +0100, Mikael Carneholm wrote: Hallo Mikael, it's been around for about a month or so: I've submitted a patch on the patches list, which fixes this (kind of, at least for Castor/OJB/Hibernate) when the URL-parameter useExplicitTyping=true is set. Just try it, there are several other very nice features; but nobody seems to had an eye on it till now... :-( If there's interest, I've improved the patch even further, and keep it in sync with the current CVS; since there's no interest so far, I'm not submitting them; except someone tries and applies them. tia Patric >> ------- Ursprungligt meddelande ------- >> >> Från: Mario Weilguni <mweilguni@sime.com> >> Datum: Mon, 3 Feb 2003 20:05:48 +0100 >> >>try: >>explain select * from enheter where enhetsid = '200178146'; >>or >>explain select * from enheter where enhetsid = 200178146::bigint >> >explain select * from enheter where enhetsid = '200178146'; > QUERY PLAN - --------------------------------------------------------------------------- >Index Scan using pk_enheter on enheter (cost=0.00..4.05 rows=1 width=91) >Index Cond: (enhetsid = 200178146::bigint) >(2 rows) >Strange...using: >200178146::bigint >or >'200178146' >..the query is lightning fast. Since the PK column is of integer type, I don't think it's logical to pass a string-typeargument...or am I different than most people on this point? :) >What about third party frameworks (such as Hibernate, eg) - I'm sure they will look at the column datatype and think: 'Oh,it's an integer...I'll pass an integer argument then', which will result in unnecessary poor performance. >What's the cause of this behaviour? Why isn't psql (or the backend) converting integer type arguments into char/string types,if there's this much to gain? >(Sorry for being quite a Pg newbie..have never used Pg for such "heavy" load before) >- Mikael >_____________________________________________________________ >Här börjar internet! >Skaffa gratis e-mail och gratis internet på http://www.spray.se >Träffa folk från hela Sverige på ett och samma ställe - http://chat.spray.se/ >---------------------------(end of broadcast)--------------------------- >TIP 6: Have you searched our list archives? >http://archives.postgresql.org PGP Public Key Fingerprint: 2636 F26E F523 7D62 4377 D206 7C68 06BB -----BEGIN PGP SIGNATURE----- Version: PGPsdk version 1.7.1 (C) 1997-1999 Network Associates, Inc. and its affiliated companies. iQA/AwUBPj7VW3xoBrvMu8qQEQLfAwCgl0Dd0K4QLtt4E8Seqr4ArRm8Kv0An3Vd mmL6pYa9PZDb4osUxw7q5xSZ =NZGE -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Mon, 03 Feb 2003 22:47:22 +0100, Patric Bechtel wrote: Hello Mikael, sorry for bothering, but I oversaw that you wrote on the GENERAL list... my patch is only for the JDBC drive. It probably isn't the "right" way to fix it, but digging into that backend issue is clearly beyond my skills... :-) Patric >On Mon, 03 Feb 2003 20:23:28 +0100, Mikael Carneholm wrote: >Hallo Mikael, >it's been around for about a month or so: >I've submitted a patch on the patches list, which fixes >this (kind of, at least for Castor/OJB/Hibernate) when >the URL-parameter useExplicitTyping=true is set. >Just try it, there are several other very nice >features; but nobody seems to had an eye on it till now... :-( >If there's interest, I've improved the patch even >further, and keep it in sync with the current CVS; >since there's no interest so far, I'm not submitting >them; except someone tries and applies them. >tia >Patric >>> ------- Ursprungligt meddelande ------- >>> >>> Från: Mario Weilguni <mweilguni@sime.com> >>> Datum: Mon, 3 Feb 2003 20:05:48 +0100 >>> >>>try: >>>explain select * from enheter where enhetsid = '200178146'; >>>or >>>explain select * from enheter where enhetsid = 200178146::bigint >>> >>explain select * from enheter where enhetsid = '200178146'; >> QUERY PLAN >--------------------------------------------------------------------------- >>Index Scan using pk_enheter on enheter (cost=0.00..4.05 rows=1 width=91) >>Index Cond: (enhetsid = 200178146::bigint) >>(2 rows) >>Strange...using: >>200178146::bigint >>or >>'200178146' >>..the query is lightning fast. Since the PK column is of integer type, I don't think it's logical to pass a string-typeargument...or am I different than most >people on this point? :) >>What about third party frameworks (such as Hibernate, eg) - I'm sure they will look at the column datatype and think: 'Oh,it's an integer...I'll pass an >integer argument then', which will result in unnecessary poor performance. >>What's the cause of this behaviour? Why isn't psql (or the backend) converting integer type arguments into char/stringtypes, if there's this much to gain? >>(Sorry for being quite a Pg newbie..have never used Pg for such "heavy" load before) >>- Mikael >>_____________________________________________________________ >>Här börjar internet! >>Skaffa gratis e-mail och gratis internet på http://www.spray.se >>Träffa folk från hela Sverige på ett och samma ställe - http://chat.spray.se/ >>---------------------------(end of broadcast)--------------------------- >>TIP 6: Have you searched our list archives? >>http://archives.postgresql.org >PGP Public Key Fingerprint: 2636 F26E F523 7D62 4377 D206 7C68 06BB >---------------------------(end of broadcast)--------------------------- >TIP 3: if posting/reading through Usenet, please send an appropriate >subscribe-nomail command to majordomo@postgresql.org so that your >message can get through to the mailing list cleanly >---------------------------(end of broadcast)--------------------------- >TIP 3: if posting/reading through Usenet, please send an appropriate >subscribe-nomail command to majordomo@postgresql.org so that your >message can get through to the mailing list cleanly PGP Public Key Fingerprint: 2636 F26E F523 7D62 4377 D206 7C68 06BB -----BEGIN PGP SIGNATURE----- Version: PGPsdk version 1.7.1 (C) 1997-1999 Network Associates, Inc. and its affiliated companies. iQA/AwUBPj7XNnxoBrvMu8qQEQLphgCdFp4zC7Rgn44jqOGJGttnmf0vBuIAn2P9 8YhRpfuzfvaU/ErQ1VOH0NL1 =wwrt -----END PGP SIGNATURE-----
On Mon, 2003-02-03 at 15:13, Mikael Carneholm wrote: > Then I suggest that this is 'fixed' in the next release Please read the list archives -- this has been discussed before (many times). Short answer: the fix is non-trivial. It will be fixed eventually, but it's difficult to do without causing other undesirable changes. Cheers, Neil -- Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC
Mikael Carneholm wrote: > > > ------- Ursprungligt meddelande ------- > > > > Från: Jan Wieck <JanWieck@Yahoo.com> > > Datum: Mon, 03 Feb 2003 16:11:53 -0500 > > > >It is the other way round. The backend converts a non-quoted sequence of > >digits too early into an int4 and cannot recover from that "mistake". > > > > Is this on the TODO list for 7.4? The current docs (http://developer.postgresql.org/docs/postgres/datatype.html#DATATYPE-INT)state that "PostgreSQL currently cannot use anindex when two different data types are involved" - which makes it sound like this is a current-only drawback that willbe fixed later on. T/F? Exactly ... only I cannot define "later on" very precise. Assuming the time-line going forward, it's in the future, somewhere. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #