Thread: Trivial query, large query, but very sad results.

Trivial query, large query, but very sad results.

From
Alexander Haväng
Date:
Hello.

I'm using postgresql 7.1.3 (linux 2.4.17, 2*1.13 Mhz SMP), and I've created a table with 10 million rows.

When I do a query, say,
SELECT count(id) FROM bigtable WHERE somestring='something';
then what I would expect from the database is that it would completely hog the cpu
and disk i/o for a while until the query is completed, which on this machine well..
should take one or two minutes. Instead, it takes 12 minutes.

Doing the "same" thing with grep(1) takes 37 seconds.

Now, of course I realize the difference between these two scenarios, but bear with me.

During the grep(1)-run, grep uses about 20% userspace cpu, and vmstat(1) reveals a throughput
of 24 MB/s.

During the postgres query, postgres uses about 4-5% userspace cpu, and vmstat(1) reveals a throughput
of 2 MB/s.

System cpu is about 8% in both cases. All cpu figures are from top(1) output.

So my question is.. why the hell doesn't it use more cpu and more i/o resources when they are available?
Is there anything tunable in postgres that would increase performance on DB:s like this?

Regards,
Alexander

Re: Trivial query, large query, but very sad results.

From
Tom Lane
Date:
=?iso-8859-1?Q?Alexander_Hav=E4ng?= <eel@musiknet.se> writes:
> I'm using postgresql 7.1.3 (linux 2.4.17, 2*1.13 Mhz SMP), and I've created a table with 10 million rows.

> When I do a query, say,
> SELECT count(id) FROM bigtable WHERE somestring='something';

What's the query plan (see EXPLAIN)?  What fraction of the rows are
actually selected by that WHERE clause?  Is there anything else
going on in Postgres?

7.1.* is known to have performance problems on SMP machines, but I do
not think the problems affect a single-query situation.  On the other
hand, if you are running multiple queries in parallel, that could
explain your poor results.  Try 7.2...

            regards, tom lane

Re: Trivial query, large query, but very sad results.

From
Dan Langille
Date:
On Tue, 2 Apr 2002, [iso-8859-1] Alexander Hav�ng wrote:

> When I do a query, say,
> SELECT count(id) FROM bigtable WHERE somestring='something';
> then what I would expect from the database is that it would completely hog the cpu
> and disk i/o for a while until the query is completed, which on this machine well..
> should take one or two minutes. Instead, it takes 12 minutes.

Out of interest, what results do you get from select count(*) ...?  I
would expect it to be faster.



Re: Trivial query, large query, but very sad results.

From
Dan Langille
Date:
On Tue, 2 Apr 2002, Dan Langille wrote:

> On Tue, 2 Apr 2002, [iso-8859-1] Alexander Hav�ng wrote:
>
> > When I do a query, say,
> > SELECT count(id) FROM bigtable WHERE somestring='something';
> > then what I would expect from the database is that it would completely hog the cpu
> > and disk i/o for a while until the query is completed, which on this machine well..
> > should take one or two minutes. Instead, it takes 12 minutes.
>
> Out of interest, what results do you get from select count(*) ...?  I
> would expect it to be faster.

I also meant to ask: Do you have any indexes?  On id?  On somestring?


Re: Trivial query, large query, but very sad results.

From
Alexander Haväng
Date:
> What's the query plan (see EXPLAIN)?  What fraction of the rows are
> actually selected by that WHERE clause?  Is there anything else
> going on in Postgres?

This is the only query running.

I browsed through the mail archive and came across the thread about poor performance
on large tables even when using indices, and someone suggested the use of the (for me previously unknown) cluster
command.

Using cluster, and then vacuum analyze, cut the count() query down from 12 minutes to
7 seconds. Amazing :)

Here's some info on the table and queries, if you're interested.

Table is now clustered on the service-index.
nnectionstats"
  Attribute  |           Type           | Modifier
-------------+--------------------------+----------
 time        | timestamp with time zone |
 client      | integer                  |
 server      | integer                  |
 protocol    | smallint                 |
 service     | character varying(80)    |
 server_port | integer                  |
 inbound     | bigint                   |
 outbound    | bigint                   |
Indices: service_idx,
         client_idx

 Index "service_idx"
 Attribute |         Type
-----------+-----------------------
 service   | character varying(80)
btree

 Index "client_idx"
 Attribute |  Type
-----------+---------
 client    | integer
btree

orvar=# explain verbose select count(*) from connectionstats where service='ftp'
;
NOTICE:  QUERY DUMP:

{ AGG :startup_cost 201487.12 :total_cost 201487.12 :rows 1 :width 0 :qptargetli
st ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 23 :restypmod -1 :resname c
ount :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { AGGREG :a
ggname count :basetype 0 :aggtype 23 :target { CONST :consttype 23 :constlen 4 :
constbyval true :constisnull false :constvalue  4 [ 1 0 0 0 ] } :aggstar true :a
ggdistinct false }}) :qpqual <> :lefttree { INDEXSCAN :startup_cost 0.00 :total_
cost 201309.18 :rows 71177 :width 0 :qptargetlist <> :qpqual <> :lefttree <> :ri
ghttree <> :extprm () :locprm () :initplan <> :nprm 0  :scanrelid 1 :indxid ( 11
162877) :indxqual (({ EXPR :typeOid 16  :opType op :oper { OPER :opno 1062 :opid
 1070 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 1 :vartype 1043 :vartyp
mod 84  :varlevelsup 0 :varnoold 1 :varoattno 5} { CONST :consttype 1043 :constl
en -1 :constbyval false :constisnull false :constvalue  7 [ 7 0 0 0 102 116 112
] })})) :indxqualorig (({ EXPR :typeOid 16  :opType op :oper { OPER :opno 1062 :
opid 1070 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 5 :vartype 1043 :va
rtypmod 84  :varlevelsup 0 :varnoold 1 :varoattno 5} { CONST :consttype 1043 :co
nstlen -1 :constbyval false :constisnull false :constvalue  7 [ 7 0 0 0 102 116
112 ] })})) :indxorderdir 1 } :righttree <> :extprm () :locprm () :initplan <> :
nprm 0 }
NOTICE:  QUERY PLAN:

Aggregate  (cost=201487.12..201487.12 rows=1 width=0)
  ->  Index Scan using brunfitta on connectionstats  (cost=0.00..201309.18 rows=
71177 width=0)

EXPLAIN

orvar=# select timestamp 'now'; select count(*) from connectionstats WHERE client=1; select timestamp 'now';
        ?column?
------------------------
 2002-04-02 17:53:27+02
(1 row)

 count
-------
    65
(1 row)

        ?column?
------------------------
 2002-04-02 17:53:28+02
(1 row)

orvar=# select timestamp 'now'; select count(*) from connectionstats WHERE service='ftp'; select timestamp 'now';
        ?column?
------------------------
 2002-04-02 17:53:40+02
(1 row)

  count
---------
 1016614
(1 row)

        ?column?
------------------------
 2002-04-02 17:53:47+02
(1 row)

Now, in the real world, my queries are somewhat more complex than these, and I would like some tables to be "clustered"
onmore than one index.. which is confusing at best :) 
But this will do for now.. thanks for the input. (I'll go try 7.2 tomorrow and see if that helps even further).

Cheers,
Alexander