Thread: Trivial query, large query, but very sad results.
Hello. I'm using postgresql 7.1.3 (linux 2.4.17, 2*1.13 Mhz SMP), and I've created a table with 10 million rows. When I do a query, say, SELECT count(id) FROM bigtable WHERE somestring='something'; then what I would expect from the database is that it would completely hog the cpu and disk i/o for a while until the query is completed, which on this machine well.. should take one or two minutes. Instead, it takes 12 minutes. Doing the "same" thing with grep(1) takes 37 seconds. Now, of course I realize the difference between these two scenarios, but bear with me. During the grep(1)-run, grep uses about 20% userspace cpu, and vmstat(1) reveals a throughput of 24 MB/s. During the postgres query, postgres uses about 4-5% userspace cpu, and vmstat(1) reveals a throughput of 2 MB/s. System cpu is about 8% in both cases. All cpu figures are from top(1) output. So my question is.. why the hell doesn't it use more cpu and more i/o resources when they are available? Is there anything tunable in postgres that would increase performance on DB:s like this? Regards, Alexander
=?iso-8859-1?Q?Alexander_Hav=E4ng?= <eel@musiknet.se> writes: > I'm using postgresql 7.1.3 (linux 2.4.17, 2*1.13 Mhz SMP), and I've created a table with 10 million rows. > When I do a query, say, > SELECT count(id) FROM bigtable WHERE somestring='something'; What's the query plan (see EXPLAIN)? What fraction of the rows are actually selected by that WHERE clause? Is there anything else going on in Postgres? 7.1.* is known to have performance problems on SMP machines, but I do not think the problems affect a single-query situation. On the other hand, if you are running multiple queries in parallel, that could explain your poor results. Try 7.2... regards, tom lane
On Tue, 2 Apr 2002, [iso-8859-1] Alexander Hav�ng wrote: > When I do a query, say, > SELECT count(id) FROM bigtable WHERE somestring='something'; > then what I would expect from the database is that it would completely hog the cpu > and disk i/o for a while until the query is completed, which on this machine well.. > should take one or two minutes. Instead, it takes 12 minutes. Out of interest, what results do you get from select count(*) ...? I would expect it to be faster.
On Tue, 2 Apr 2002, Dan Langille wrote: > On Tue, 2 Apr 2002, [iso-8859-1] Alexander Hav�ng wrote: > > > When I do a query, say, > > SELECT count(id) FROM bigtable WHERE somestring='something'; > > then what I would expect from the database is that it would completely hog the cpu > > and disk i/o for a while until the query is completed, which on this machine well.. > > should take one or two minutes. Instead, it takes 12 minutes. > > Out of interest, what results do you get from select count(*) ...? I > would expect it to be faster. I also meant to ask: Do you have any indexes? On id? On somestring?
> What's the query plan (see EXPLAIN)? What fraction of the rows are > actually selected by that WHERE clause? Is there anything else > going on in Postgres? This is the only query running. I browsed through the mail archive and came across the thread about poor performance on large tables even when using indices, and someone suggested the use of the (for me previously unknown) cluster command. Using cluster, and then vacuum analyze, cut the count() query down from 12 minutes to 7 seconds. Amazing :) Here's some info on the table and queries, if you're interested. Table is now clustered on the service-index. nnectionstats" Attribute | Type | Modifier -------------+--------------------------+---------- time | timestamp with time zone | client | integer | server | integer | protocol | smallint | service | character varying(80) | server_port | integer | inbound | bigint | outbound | bigint | Indices: service_idx, client_idx Index "service_idx" Attribute | Type -----------+----------------------- service | character varying(80) btree Index "client_idx" Attribute | Type -----------+--------- client | integer btree orvar=# explain verbose select count(*) from connectionstats where service='ftp' ; NOTICE: QUERY DUMP: { AGG :startup_cost 201487.12 :total_cost 201487.12 :rows 1 :width 0 :qptargetli st ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 23 :restypmod -1 :resname c ount :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { AGGREG :a ggname count :basetype 0 :aggtype 23 :target { CONST :consttype 23 :constlen 4 : constbyval true :constisnull false :constvalue 4 [ 1 0 0 0 ] } :aggstar true :a ggdistinct false }}) :qpqual <> :lefttree { INDEXSCAN :startup_cost 0.00 :total_ cost 201309.18 :rows 71177 :width 0 :qptargetlist <> :qpqual <> :lefttree <> :ri ghttree <> :extprm () :locprm () :initplan <> :nprm 0 :scanrelid 1 :indxid ( 11 162877) :indxqual (({ EXPR :typeOid 16 :opType op :oper { OPER :opno 1062 :opid 1070 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 1 :vartype 1043 :vartyp mod 84 :varlevelsup 0 :varnoold 1 :varoattno 5} { CONST :consttype 1043 :constl en -1 :constbyval false :constisnull false :constvalue 7 [ 7 0 0 0 102 116 112 ] })})) :indxqualorig (({ EXPR :typeOid 16 :opType op :oper { OPER :opno 1062 : opid 1070 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 5 :vartype 1043 :va rtypmod 84 :varlevelsup 0 :varnoold 1 :varoattno 5} { CONST :consttype 1043 :co nstlen -1 :constbyval false :constisnull false :constvalue 7 [ 7 0 0 0 102 116 112 ] })})) :indxorderdir 1 } :righttree <> :extprm () :locprm () :initplan <> : nprm 0 } NOTICE: QUERY PLAN: Aggregate (cost=201487.12..201487.12 rows=1 width=0) -> Index Scan using brunfitta on connectionstats (cost=0.00..201309.18 rows= 71177 width=0) EXPLAIN orvar=# select timestamp 'now'; select count(*) from connectionstats WHERE client=1; select timestamp 'now'; ?column? ------------------------ 2002-04-02 17:53:27+02 (1 row) count ------- 65 (1 row) ?column? ------------------------ 2002-04-02 17:53:28+02 (1 row) orvar=# select timestamp 'now'; select count(*) from connectionstats WHERE service='ftp'; select timestamp 'now'; ?column? ------------------------ 2002-04-02 17:53:40+02 (1 row) count --------- 1016614 (1 row) ?column? ------------------------ 2002-04-02 17:53:47+02 (1 row) Now, in the real world, my queries are somewhat more complex than these, and I would like some tables to be "clustered" onmore than one index.. which is confusing at best :) But this will do for now.. thanks for the input. (I'll go try 7.2 tomorrow and see if that helps even further). Cheers, Alexander