Thread: Query performance
Hi everyone, i have a table with around 57 million tuples, with the following columns: pid(varchar), crit(varchar), val1(varchar), val2(varchar).Example: pid crit val1 val2 p1 c1 x y p1 c2 x z p1 c3 y x ... What i am doing is to query all val1 and val2 for one pid and all crit values: select val1, val2, crit from mytable where pid='somepid' and crit in(select crit from myCritTable); where myCritTable is a table that contains all crit values (around 42.000) ordered by their insertion date. QUERY PLAN -------------------------------------------------------------------------------- ---------------------------------------------------------- Hash IN Join (cost=1033.67..134959.41 rows=37120 width=23) (actual time=357.11 6..356984.535 rows=37539 loops=1) Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text) -> Bitmap Heap Scan on test2 (cost=232.92..132766.66 rows=37120 width=23) ( actual time=291.600..356707.737 rows=37539 loops=1) Recheck Cond: ((pid)::text = '1'::text) -> Bitmap Index Scan on idx_test2_pid (cost=0.00..232.92 rows=37120 w idth=0) (actual time=234.516..234.516 rows=37539 loops=1) Index Cond: ((pid)::text = '1'::text) -> Hash (cost=700.20..700.20 rows=40220 width=13) (actual time=65.055..65.0 55 rows=40220 loops=1) -> Seq Scan on snps_test (cost=0.00..700.20 rows=40220 width=13) (act ual time=0.020..30.131 rows=40220 loops=1) Total runtime: 357017.259 ms Unfortunately the query takes pretty long for the big table, so maybe one of you has a suggestion on how to make it faster. -- Christian Rengstl M.A. Klinik und Poliklinik für Innere Medizin II Kardiologie - Forschung Universitätsklinikum Regensburg B3 1.388 Franz-Josef-Strauss-Allee 11 93053 Regensburg Tel.: +49-941-944-7230
On 8/3/06, Christian Rengstl <Christian.Rengstl@klinik.uni-regensburg.de> wrote: > ... > Unfortunately the query takes pretty long for the big table, so maybe one of you has a suggestion on how to make it faster. > try smth like this: select val1, val2, crit from mytable as a where pid='somepid' and exists(select 1 from myCritTable as b where a.crit = b.crit); -- Best regards, Nikolay
Hi, the complete query is the one i posted, but here comes the schema for mytable: entry_no int8 NOT NULL DEFAULT nextval('entry_no_seq''::regclass), pid varchar(15) NOT NULL, crit varchar(13) NOT NULL, val1 varchar(1), val2 varchar(1), aendat text, aennam varchar(8), CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no) myCritTable: crit varchar(13) NOT NULL, chr int2, aendat timestamp, CONSTRAINT pk_crit_master PRIMARY KEY (crit) My server is 8.1.4. As a matter of fact, i have no idea where the text type comes from, because as you can see from abovethere are only varchar with maximum 15 characters. "Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06 10:34 am: > Hi, > > can you post the complete query,schema- and > table-definition,server-version etc. ? > This will help to identity the main problem. > > So at the moment i'm just guessing: > > Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text) > -> Bitmap Heap Scan on test2 (cost=232.92..132766.66 rows=37120 > width=23) > (actual time=291.600..356707.737 rows=37539 loops=1) > This part is very expensive, but i got no clue why. > Maybe the text-type is not so ideal. > > Best regards > > Hakan Kocaman > Software-Development > > digame.de GmbH > Richard-Byrd-Str. 4-8 > 50829 Köln > > Tel.: +49 (0) 221 59 68 88 31 > Fax: +49 (0) 221 59 68 88 98 > Email: hakan.kocaman@digame.de > > > >> -----Original Message----- >> From: pgsql-general-owner@postgresql.org >> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of >> Christian Rengstl >> Sent: Thursday, August 03, 2006 10:13 AM >> To: pgsql-general@postgresql.org >> Subject: [GENERAL] Query performance >> >> >> Hi everyone, >> >> i have a table with around 57 million tuples, with the >> following columns: pid(varchar), crit(varchar), >> val1(varchar), val2(varchar). Example: >> pid crit val1 val2 >> p1 c1 x y >> p1 c2 x z >> p1 c3 y x >> ... >> What i am doing is to query all val1 and val2 for one pid and >> all crit values: >> >> select val1, val2, crit from mytable where pid='somepid' and >> crit in(select crit from myCritTable); >> where myCritTable is a table that contains all crit values >> (around 42.000) ordered by their insertion date. >> >> >> QUERY PLAN >> >> -------------------------------------------------------------- >> ------------------ >> ---------------------------------------------------------- >> Hash IN Join (cost=1033.67..134959.41 rows=37120 width=23) >> (actual time=357.11 >> 6..356984.535 rows=37539 loops=1) >> Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text) >> -> Bitmap Heap Scan on test2 (cost=232.92..132766.66 >> rows=37120 width=23) ( >> actual time=291.600..356707.737 rows=37539 loops=1) >> Recheck Cond: ((pid)::text = '1'::text) >> -> Bitmap Index Scan on idx_test2_pid >> (cost=0.00..232.92 rows=37120 w >> idth=0) (actual time=234.516..234.516 rows=37539 loops=1) >> Index Cond: ((pid)::text = '1'::text) >> -> Hash (cost=700.20..700.20 rows=40220 width=13) >> (actual time=65.055..65.0 >> 55 rows=40220 loops=1) >> -> Seq Scan on snps_test (cost=0.00..700.20 >> rows=40220 width=13) (act >> ual time=0.020..30.131 rows=40220 loops=1) >> Total runtime: 357017.259 ms >> >> Unfortunately the query takes pretty long for the big table, >> so maybe one of you has a suggestion on how to make it faster. >> >> -- >> Christian Rengstl M.A. >> Klinik und Poliklinik für Innere Medizin II >> Kardiologie - Forschung >> Universitätsklinikum Regensburg >> B3 1.388 >> Franz-Josef-Strauss-Allee 11 >> 93053 Regensburg >> Tel.: +49-941-944-7230 >> >> >> ---------------------------(end of >> broadcast)--------------------------- >> TIP 4: Have you searched our list archives? >> >> http://archives.postgresql.org >> -- Christian Rengstl M.A. Klinik und Poliklinik für Innere Medizin II Kardiologie - Forschung Universitätsklinikum Regensburg B3 1.388 Franz-Josef-Strauss-Allee 11 93053 Regensburg Tel.: +49-941-944-7230
Hi, can you post the complete query,schema- and table-definition,server-version etc. ? This will help to identity the main problem. So at the moment i'm just guessing: Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text) -> Bitmap Heap Scan on test2 (cost=232.92..132766.66 rows=37120 width=23) (actual time=291.600..356707.737 rows=37539 loops=1) This part is very expensive, but i got no clue why. Maybe the text-type is not so ideal. Best regards Hakan Kocaman Software-Development digame.de GmbH Richard-Byrd-Str. 4-8 50829 Köln Tel.: +49 (0) 221 59 68 88 31 Fax: +49 (0) 221 59 68 88 98 Email: hakan.kocaman@digame.de > -----Original Message----- > From: pgsql-general-owner@postgresql.org > [mailto:pgsql-general-owner@postgresql.org] On Behalf Of > Christian Rengstl > Sent: Thursday, August 03, 2006 10:13 AM > To: pgsql-general@postgresql.org > Subject: [GENERAL] Query performance > > > Hi everyone, > > i have a table with around 57 million tuples, with the > following columns: pid(varchar), crit(varchar), > val1(varchar), val2(varchar). Example: > pid crit val1 val2 > p1 c1 x y > p1 c2 x z > p1 c3 y x > ... > What i am doing is to query all val1 and val2 for one pid and > all crit values: > > select val1, val2, crit from mytable where pid='somepid' and > crit in(select crit from myCritTable); > where myCritTable is a table that contains all crit values > (around 42.000) ordered by their insertion date. > > > QUERY PLAN > > -------------------------------------------------------------- > ------------------ > ---------------------------------------------------------- > Hash IN Join (cost=1033.67..134959.41 rows=37120 width=23) > (actual time=357.11 > 6..356984.535 rows=37539 loops=1) > Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text) > -> Bitmap Heap Scan on test2 (cost=232.92..132766.66 > rows=37120 width=23) ( > actual time=291.600..356707.737 rows=37539 loops=1) > Recheck Cond: ((pid)::text = '1'::text) > -> Bitmap Index Scan on idx_test2_pid > (cost=0.00..232.92 rows=37120 w > idth=0) (actual time=234.516..234.516 rows=37539 loops=1) > Index Cond: ((pid)::text = '1'::text) > -> Hash (cost=700.20..700.20 rows=40220 width=13) > (actual time=65.055..65.0 > 55 rows=40220 loops=1) > -> Seq Scan on snps_test (cost=0.00..700.20 > rows=40220 width=13) (act > ual time=0.020..30.131 rows=40220 loops=1) > Total runtime: 357017.259 ms > > Unfortunately the query takes pretty long for the big table, > so maybe one of you has a suggestion on how to make it faster. > > -- > Christian Rengstl M.A. > Klinik und Poliklinik für Innere Medizin II > Kardiologie - Forschung > Universitätsklinikum Regensburg > B3 1.388 > Franz-Josef-Strauss-Allee 11 > 93053 Regensburg > Tel.: +49-941-944-7230 > > > ---------------------------(end of > broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org >
Christian Rengstl wrote: > Hi, > > the complete query is the one i posted, but here comes the schema for mytable: > entry_no int8 NOT NULL DEFAULT nextval('entry_no_seq''::regclass), > pid varchar(15) NOT NULL, > crit varchar(13) NOT NULL, > val1 varchar(1), > val2 varchar(1), > aendat text, > aennam varchar(8), > CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no) > > myCritTable: > crit varchar(13) NOT NULL, > chr int2, > aendat timestamp, > CONSTRAINT pk_crit_master PRIMARY KEY (crit) Still doesn't match the EXPLAIN output - where's snp_id? Where's table test2? > My server is 8.1.4. As a matter of fact, i have no idea where the text > type comes from, because as you can see from above there are only > varchar with maximum 15 characters. PG is casting it to text. There's no real difference between the types (other than the size limit) and it's not expensive. > "Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06 10:34 am: >> Hi, >> >> can you post the complete query,schema- and >> table-definition,server-version etc. ? >> This will help to identity the main problem. >> >> So at the moment i'm just guessing: >> >> Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text) >> -> Bitmap Heap Scan on test2 (cost=232.92..132766.66 rows=37120 >> width=23) >> (actual time=291.600..356707.737 rows=37539 loops=1) >> This part is very expensive, but i got no clue why. Yep, it looks like the "Bitmap Heap Scan" is at the heart of this. You might want to increase work_mem, it could be that the bitmap is spilling to disk (which is much slower than keeping it all in RAM) http://www.postgresql.org/docs/8.1/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-MEMORY If it's still at 1MB (1024), try 5MB, 10MB, 50MB and see what happens. -- Richard Huxton Archonet Ltd
Hi, > -----Original Message----- > From: Richard Huxton [mailto:dev@archonet.com] > Sent: Thursday, August 03, 2006 11:00 AM > To: Christian Rengstl > Cc: Hakan Kocaman; pgsql-general@postgresql.org > Subject: Re: [GENERAL] Query performance > > > Christian Rengstl wrote: > > Hi, > > > > the complete query is the one i posted, but here comes the > schema for mytable: > > entry_no int8 NOT NULL DEFAULT nextval('entry_no_seq''::regclass), > > pid varchar(15) NOT NULL, > > crit varchar(13) NOT NULL, > > val1 varchar(1), > > val2 varchar(1), > > aendat text, > > aennam varchar(8), > > CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no) > > > > myCritTable: > > crit varchar(13) NOT NULL, > > chr int2, > > aendat timestamp, > > CONSTRAINT pk_crit_master PRIMARY KEY (crit) > > Still doesn't match the EXPLAIN output - where's snp_id? > Where's table > test2? > Yep, that bothered me too. > > My server is 8.1.4. As a matter of fact, i have no idea > where the text > > type comes from, because as you can see from above there are only > > varchar with maximum 15 characters. > > PG is casting it to text. There's no real difference between > the types > (other than the size limit) and it's not expensive. But wouldn't a comparison between int4 be much cheaper. If i see smth like "id" (here snp_id) in a fieldname it should be a int-type, i think. > > > "Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06 > 10:34 am: > >> Hi, > >> > >> can you post the complete query,schema- and > >> table-definition,server-version etc. ? > >> This will help to identity the main problem. > >> > >> So at the moment i'm just guessing: > >> > >> Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text) > >> -> Bitmap Heap Scan on test2 (cost=232.92..132766.66 > rows=37120 > >> width=23) > >> (actual time=291.600..356707.737 rows=37539 loops=1) > >> This part is very expensive, but i got no clue why. > > Yep, it looks like the "Bitmap Heap Scan" is at the heart of > this. You > might want to increase work_mem, it could be that the bitmap > is spilling > to disk (which is much slower than keeping it all in RAM) > > http://www.postgresql.org/docs/8.1/static/runtime-config-resou rce.html#RUNTIME-CONFIG-RESOURCE-MEMORY If it's still at 1MB (1024), try 5MB, 10MB, 50MB and see what happens. -- Richard Huxton Archonet Ltd Hakan Kocaman Software-Development digame.de GmbH Richard-Byrd-Str. 4-8 50829 Köln Tel.: +49 (0) 221 59 68 88 31 Fax: +49 (0) 221 59 68 88 98 Email: hakan.kocaman@digame.de
Hi, i would rather compare int4 too, but the snp_id can be something like "abc123" unfortunately. "Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06 11:08 am: > Hi, > > >> -----Original Message----- >> From: Richard Huxton [mailto:dev@archonet.com] >> Sent: Thursday, August 03, 2006 11:00 AM >> To: Christian Rengstl >> Cc: Hakan Kocaman; pgsql-general@postgresql.org >> Subject: Re: [GENERAL] Query performance >> >> >> Christian Rengstl wrote: >> > Hi, >> > >> > the complete query is the one i posted, but here comes the >> schema for mytable: >> > entry_no int8 NOT NULL DEFAULT nextval('entry_no_seq''::regclass), >> > pid varchar(15) NOT NULL, >> > crit varchar(13) NOT NULL, >> > val1 varchar(1), >> > val2 varchar(1), >> > aendat text, >> > aennam varchar(8), >> > CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no) >> > >> > myCritTable: >> > crit varchar(13) NOT NULL, >> > chr int2, >> > aendat timestamp, >> > CONSTRAINT pk_crit_master PRIMARY KEY (crit) >> >> Still doesn't match the EXPLAIN output - where's snp_id? >> Where's table >> test2? >> > > Yep, that bothered me too. > >> > My server is 8.1.4. As a matter of fact, i have no idea >> where the text >> > type comes from, because as you can see from above there are only >> > varchar with maximum 15 characters. >> >> PG is casting it to text. There's no real difference between >> the types >> (other than the size limit) and it's not expensive. > > But wouldn't a comparison between int4 be much cheaper. > If i see smth like "id" (here snp_id) in a fieldname it should be a > int-type, i think. > >> >> > "Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06 >> 10:34 am: >> >> Hi, >> >> >> >> can you post the complete query,schema- and >> >> table-definition,server-version etc. ? >> >> This will help to identity the main problem. >> >> >> >> So at the moment i'm just guessing: >> >> >> >> Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text) >> >> -> Bitmap Heap Scan on test2 (cost=232.92..132766.66 >> rows=37120 >> >> width=23) >> >> (actual time=291.600..356707.737 rows=37539 loops=1) >> >> This part is very expensive, but i got no clue why. >> >> Yep, it looks like the "Bitmap Heap Scan" is at the heart of >> this. You >> might want to increase work_mem, it could be that the bitmap >> is spilling >> to disk (which is much slower than keeping it all in RAM) >> >> http://www.postgresql.org/docs/8.1/static/runtime-config-resou > rce.html#RUNTIME-CONFIG-RESOURCE-MEMORY > > If it's still at 1MB (1024), try 5MB, 10MB, 50MB and see what happens. > > -- > Richard Huxton > Archonet Ltd > > > > Hakan Kocaman > Software-Development > > digame.de GmbH > Richard-Byrd-Str. 4-8 > 50829 Köln > > Tel.: +49 (0) 221 59 68 88 31 > Fax: +49 (0) 221 59 68 88 98 > Email: hakan.kocaman@digame.de > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match -- Christian Rengstl M.A. Klinik und Poliklinik für Innere Medizin II Kardiologie - Forschung Universitätsklinikum Regensburg B3 1.388 Franz-Josef-Strauss-Allee 11 93053 Regensburg Tel.: +49-941-944-7230
Hi, maybe you should overthink your db-design, but thats a bit premature whithout your complete table-definitions(including table-names, datatypes, indexes, foreign-key constraints,etc.) If your are using pgadmin3 just cut'n paste the content of the window on the bottom left for the corresponding tables. If you're using psql try \d yur-table-name. Best regards Hakan Kocaman Software-Development digame.de GmbH Richard-Byrd-Str. 4-8 50829 Köln Tel.: +49 (0) 221 59 68 88 31 Fax: +49 (0) 221 59 68 88 98 Email: hakan.kocaman@digame.de > -----Original Message----- > From: Christian Rengstl > [mailto:Christian.Rengstl@klinik.uni-regensburg.de] > Sent: Thursday, August 03, 2006 11:18 AM > To: Richard Huxton; Hakan Kocaman > Cc: pgsql-general@postgresql.org > Subject: Re: [GENERAL] Query performance > > > Hi, > > i would rather compare int4 too, but the snp_id can be > something like "abc123" unfortunately. > > "Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06 11:08 am: > > Hi, > > > > > >> -----Original Message----- > >> From: Richard Huxton [mailto:dev@archonet.com] > >> Sent: Thursday, August 03, 2006 11:00 AM > >> To: Christian Rengstl > >> Cc: Hakan Kocaman; pgsql-general@postgresql.org > >> Subject: Re: [GENERAL] Query performance > >> > >> > >> Christian Rengstl wrote: > >> > Hi, > >> > > >> > the complete query is the one i posted, but here comes the > >> schema for mytable: > >> > entry_no int8 NOT NULL DEFAULT > nextval('entry_no_seq''::regclass), > >> > pid varchar(15) NOT NULL, > >> > crit varchar(13) NOT NULL, > >> > val1 varchar(1), > >> > val2 varchar(1), > >> > aendat text, > >> > aennam varchar(8), > >> > CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no) > >> > > >> > myCritTable: > >> > crit varchar(13) NOT NULL, > >> > chr int2, > >> > aendat timestamp, > >> > CONSTRAINT pk_crit_master PRIMARY KEY (crit) > >> > >> Still doesn't match the EXPLAIN output - where's snp_id? > >> Where's table > >> test2? > >> > > > > Yep, that bothered me too. > > > >> > My server is 8.1.4. As a matter of fact, i have no idea > >> where the text > >> > type comes from, because as you can see from above > there are only > >> > varchar with maximum 15 characters. > >> > >> PG is casting it to text. There's no real difference between > >> the types > >> (other than the size limit) and it's not expensive. > > > > But wouldn't a comparison between int4 be much cheaper. > > If i see smth like "id" (here snp_id) in a fieldname it should be a > > int-type, i think. > > > >> > >> > "Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06 > >> 10:34 am: > >> >> Hi, > >> >> > >> >> can you post the complete query,schema- and > >> >> table-definition,server-version etc. ? > >> >> This will help to identity the main problem. > >> >> > >> >> So at the moment i'm just guessing: > >> >> > >> >> Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text) > >> >> -> Bitmap Heap Scan on test2 (cost=232.92..132766.66 > >> rows=37120 > >> >> width=23) > >> >> (actual time=291.600..356707.737 rows=37539 loops=1) > >> >> This part is very expensive, but i got no clue why. > >> > >> Yep, it looks like the "Bitmap Heap Scan" is at the heart of > >> this. You > >> might want to increase work_mem, it could be that the bitmap > >> is spilling > >> to disk (which is much slower than keeping it all in RAM) > >> > >> http://www.postgresql.org/docs/8.1/static/runtime-config-resou > > rce.html#RUNTIME-CONFIG-RESOURCE-MEMORY > > > > If it's still at 1MB (1024), try 5MB, 10MB, 50MB and see > what happens. > > > > -- > > Richard Huxton > > Archonet Ltd > > > > > > > > Hakan Kocaman > > Software-Development > > > > digame.de GmbH > > Richard-Byrd-Str. 4-8 > > 50829 Köln > > > > Tel.: +49 (0) 221 59 68 88 31 > > Fax: +49 (0) 221 59 68 88 98 > > Email: hakan.kocaman@digame.de > > > > ---------------------------(end of > broadcast)--------------------------- > > TIP 9: In versions below 8.0, the planner will ignore your desire to > > choose an index scan if your joining column's > datatypes do not > > match > > > -- > Christian Rengstl M.A. > Klinik und Poliklinik für Innere Medizin II > Kardiologie - Forschung > Universitätsklinikum Regensburg > B3 1.388 > Franz-Josef-Strauss-Allee 11 > 93053 Regensburg > Tel.: +49-941-944-7230 > >
Hi, here is the definition of the master table which is inherited by around 30 tables based on the value of chr: CREATE TABLE snp_master ( entry_no int8 NOT NULL DEFAULT nextval('entry_no_seq'::regclass), pid varchar(15) NOT NULL, snp_id varchar(13) NOT NULL, val1 varchar(1), val2 varchar(1), chr int2 NOT NULL, aendat text, aennam varchar(8), CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no), CONSTRAINT "UNIQUE_SNP_ALLEL_MASTER" UNIQUE (pid, entry_no, snp_id) ) WITHOUT OIDS; The thing is that i load the data from txt files which themselves have more or less the same structure. So for every pidof 1500 there are up to 42000 different snp_id values and for each of this combinations there are different val1 and val2entries (all together this accounts for up to around 58 million tuples in the biggest table). MyCritTable then just containsthe distinct snp_ids so that at least this query does not take very long any more. CREATE TABLE snps_master ( snp_id varchar(13) NOT NULL, chr int2, aendat timestamp, CONSTRAINT pk_snp_master PRIMARY KEY (snp_id) ) WITHOUT OIDS; Up to now there are no foreign key constraints, as once the data is loaded into the db there will be no update or deleteoperations at all. I only have to export the data to different file formats for which i need the query posted originally. "Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06 11:36 am: > Hi, > > maybe you should overthink your db-design, but thats a bit premature > whithout your complete > table-definitions(including table-names, datatypes, indexes, > foreign-key constraints,etc.) > > If your are using pgadmin3 just cut'n paste the content of the window > on the bottom left for > the corresponding tables. > > If you're using psql try \d yur-table-name. > > Best regards > > Hakan Kocaman > Software-Development > > digame.de GmbH > Richard-Byrd-Str. 4-8 > 50829 Köln > > Tel.: +49 (0) 221 59 68 88 31 > Fax: +49 (0) 221 59 68 88 98 > Email: hakan.kocaman@digame.de > > > >> -----Original Message----- >> From: Christian Rengstl >> [mailto:Christian.Rengstl@klinik.uni-regensburg.de] >> Sent: Thursday, August 03, 2006 11:18 AM >> To: Richard Huxton; Hakan Kocaman >> Cc: pgsql-general@postgresql.org >> Subject: Re: [GENERAL] Query performance >> >> >> Hi, >> >> i would rather compare int4 too, but the snp_id can be >> something like "abc123" unfortunately. >> >> "Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06 11:08 am: >> > Hi, >> > >> > >> >> -----Original Message----- >> >> From: Richard Huxton [mailto:dev@archonet.com] >> >> Sent: Thursday, August 03, 2006 11:00 AM >> >> To: Christian Rengstl >> >> Cc: Hakan Kocaman; pgsql-general@postgresql.org >> >> Subject: Re: [GENERAL] Query performance >> >> >> >> >> >> Christian Rengstl wrote: >> >> > Hi, >> >> > >> >> > the complete query is the one i posted, but here comes the >> >> schema for mytable: >> >> > entry_no int8 NOT NULL DEFAULT >> nextval('entry_no_seq''::regclass), >> >> > pid varchar(15) NOT NULL, >> >> > crit varchar(13) NOT NULL, >> >> > val1 varchar(1), >> >> > val2 varchar(1), >> >> > aendat text, >> >> > aennam varchar(8), >> >> > CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no) >> >> > >> >> > myCritTable: >> >> > crit varchar(13) NOT NULL, >> >> > chr int2, >> >> > aendat timestamp, >> >> > CONSTRAINT pk_crit_master PRIMARY KEY (crit) >> >> >> >> Still doesn't match the EXPLAIN output - where's snp_id? >> >> Where's table >> >> test2? >> >> >> > >> > Yep, that bothered me too. >> > >> >> > My server is 8.1.4. As a matter of fact, i have no idea >> >> where the text >> >> > type comes from, because as you can see from above >> there are only >> >> > varchar with maximum 15 characters. >> >> >> >> PG is casting it to text. There's no real difference between >> >> the types >> >> (other than the size limit) and it's not expensive. >> > >> > But wouldn't a comparison between int4 be much cheaper. >> > If i see smth like "id" (here snp_id) in a fieldname it should be a >> > int-type, i think. >> > >> >> >> >> > "Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06 >> >> 10:34 am: >> >> >> Hi, >> >> >> >> >> >> can you post the complete query,schema- and >> >> >> table-definition,server-version etc. ? >> >> >> This will help to identity the main problem. >> >> >> >> >> >> So at the moment i'm just guessing: >> >> >> >> >> >> Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text) >> >> >> -> Bitmap Heap Scan on test2 (cost=232.92..132766.66 >> >> rows=37120 >> >> >> width=23) >> >> >> (actual time=291.600..356707.737 rows=37539 loops=1) >> >> >> This part is very expensive, but i got no clue why. >> >> >> >> Yep, it looks like the "Bitmap Heap Scan" is at the heart of >> >> this. You >> >> might want to increase work_mem, it could be that the bitmap >> >> is spilling >> >> to disk (which is much slower than keeping it all in RAM) >> >> >> >> http://www.postgresql.org/docs/8.1/static/runtime-config-resou >> > rce.html#RUNTIME-CONFIG-RESOURCE-MEMORY >> > >> > If it's still at 1MB (1024), try 5MB, 10MB, 50MB and see >> what happens. >> > >> > -- >> > Richard Huxton >> > Archonet Ltd >> > >> > >> > >> > Hakan Kocaman >> > Software-Development >> > >> > digame.de GmbH >> > Richard-Byrd-Str. 4-8 >> > 50829 Köln >> > >> > Tel.: +49 (0) 221 59 68 88 31 >> > Fax: +49 (0) 221 59 68 88 98 >> > Email: hakan.kocaman@digame.de >> > >> > ---------------------------(end of >> broadcast)--------------------------- >> > TIP 9: In versions below 8.0, the planner will ignore your desire to >> > choose an index scan if your joining column's >> datatypes do not >> > match >> >> >> -- >> Christian Rengstl M.A. >> Klinik und Poliklinik für Innere Medizin II >> Kardiologie - Forschung >> Universitätsklinikum Regensburg >> B3 1.388 >> Franz-Josef-Strauss-Allee 11 >> 93053 Regensburg >> Tel.: +49-941-944-7230 >> >> -- Christian Rengstl M.A. Klinik und Poliklinik für Innere Medizin II Kardiologie - Forschung Universitätsklinikum Regensburg B3 1.388 Franz-Josef-Strauss-Allee 11 93053 Regensburg Tel.: +49-941-944-7230
> i have a table with around 57 million tuples, with the following columns: pid(varchar), crit(varchar), val1(varchar), val2(varchar).Example: > pid crit val1 val2 > p1 c1 x y > p1 c2 x z > p1 c3 y x > ... > What i am doing is to query all val1 and val2 for one pid and all crit values: > > select val1, val2, crit from mytable where pid='somepid' and crit in(select crit from myCritTable); > where myCritTable is a table that contains all crit values (around 42.000) ordered by their insertion date. In case myCritTable doesn't change a lot and this select by contrast is executed a lot, have you considered precomputing whether a record from your big table has a crit value from myCritTable? Of course this info would be invalidated each time myCritTable is updated, so you would trade fast selects on the big table vs. slow updates on myCritTable. Don't know wether that makes sence for you... Bye, Chris.