Thread: query performance question

From: Marcin Citowicki
Hello,

I have a table (transactions) containing 61 414 503 rows. The basic
count query (select count(transid) from transactions) takes 138226
milliseconds.
This is the EXPLAIN ANALYZE output:

Aggregate  (cost=2523970.79..2523970.80 rows=1 width=8) (actual
time=268964.088..268964.090 rows=1 loops=1)
  ->  Seq Scan on transactions  (cost=0.00..2370433.43 rows=61414943
width=8) (actual time=13.886..151776.860 rows=61414503 loops=1)
Total runtime: 268973.248 ms

The table has several indexes defined, including one on the transid column:

non-unique;index-qualifier;index-name;type;ordinal-position;column-name;asc-or-desc;cardinality;pages;filter-condition

f;<null>;transactions_id_key;3;1;transid;<null>;61414488;168877;<null>;
t;<null>;trans_ip_address_index;3;1;ip_address;<null>;61414488;168598;<null>;
t;<null>;trans_member_id_index;3;1;member_id;<null>;61414488;169058;<null>;
t;<null>;trans_payment_id_index;3;1;payment_id;<null>;61414488;168998;<null>;
t;<null>;trans_status_index;3;1;status;<null>;61414488;169005;<null>;
t;<null>;transactions__time_idx;3;1;time;<null>;61414488;168877;<null>;
t;<null>;transactions_offer_id_idx;3;1;offer_id;<null>;61414488;169017;<null>;

I'm not a DBA so I'm not sure if the time it takes to execute this query
is OK or not; it just seems a bit long to me.
I'd appreciate it if someone could share their thoughts on this. Is
there a way to make this table/query perform better?
Any query I run that joins with the transactions table takes forever
to complete, but maybe this is normal for a table this size.
Regards,

Marcin


Re: query performance question

From: hubert depesz lubaczewski
On Tue, Jun 03, 2008 at 09:57:15AM +0200, Marcin Citowicki wrote:
> I'm not a DBA so I'm not sure if the time it takes to execute this query
> is OK or not; it just seems a bit long to me.

This is perfectly OK. count(*) on a whole table is generally slow. There
are some ways to make it faster, depending on whether you need an exact
count or just an estimate.
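
For example, if an estimate is enough, you can read the planner's own
statistics (a well-known catalog trick; the number is only as fresh as
the last VACUUM/ANALYZE):

-- approximate row count from the planner's statistics
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'transactions';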

> I'd appreciate it if someone could share their thoughts on this. Is
> there a way to make this table/query perform better?

You can keep the row count for this table in a separate table and update
it with triggers.
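
A minimal sketch of that approach (the table, function, and trigger names
are just for illustration; assumes plpgsql is installed):

-- one-row table holding the current count
CREATE TABLE transactions_count (n bigint NOT NULL);
INSERT INTO transactions_count SELECT count(*) FROM transactions;

CREATE FUNCTION transactions_count_update() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE transactions_count SET n = n + 1;
    ELSE  -- DELETE
        UPDATE transactions_count SET n = n - 1;
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER transactions_count_trigger
AFTER INSERT OR DELETE ON transactions
FOR EACH ROW EXECUTE PROCEDURE transactions_count_update();

Keep in mind the single counter row is a hot spot: every insert or delete
updates it, so you trade some write throughput (and vacuum churn on the
counter table) for a fast count.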

> Any query I run that joins with the transactions table takes forever
> to complete, but maybe this is normal for a table this size.

As for the other queries - show them, along with their EXPLAIN ANALYZE output.

Performance of count(*) basically depends only on the size of the table.
The other queries might be simple to optimize - or impossible; without
seeing the queries it's impossible to tell.

Do you really care about count(*) on a 60M+ record table? How often do
you count the records?

Best regards,

depesz


Re: query performance question

From: Marcin Citowicki
Hello Hubert,

Thank you for your reply. I don't really need to count rows in the
transactions table; I just thought this was a good example to show how
slow the query was.
But based on what you wrote, it looks like count(*) is slow in general,
so this seems to be OK given that the table is rather large.
I just ran other queries (joining the transactions table) and they
returned quickly, which leads me to believe the problem could be not
with the database but with the box the db is running on. Sometimes those
same queries take forever, and now they complete in no time at all, so
perhaps a process that runs periodically is slowing the db down.
I'll need to take a look at this.
Thank you for your help!

Marcin


Re: query performance question

From: tv@fuzzy.cz
Hi,

Hubert already answered your question - this is expected behavior.
count(*) has to read all the tuples from the table (even dead ones!), so
with a really huge table it will take a long time.

There are several ways to speed it up - some of them are simple (but the
speedup is limited), while others require changing the application logic
and rewriting part of the application (using triggers to count the rows,
etc.):

1) If the transactions have sequential IDs without gaps, you can simply
select MAX(id) and that gives you the count (see the sketch after this
list). This won't work if some of the transactions were deleted or if you
need to use other filtering criteria. The change needed in the application
is quite small (basically just a single SQL query).

2) Move the table to a separate tablespace (a separate disk if possible).
This will speed up the reads, as the table will be 'compact'. This is a
database-only change that does not touch the application logic. It will
give you some speedup, but not as much as 1) or 3).

3) Build a table with totals or maybe subtotals, updated by triggers.
This requires serious changes in the application as well as the database,
but it solves the limitations of 1) and may give you even better results.
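
A sketch of 1) and 2) (the tablespace name and path below are invented for
illustration; for 3), see the trigger sketch in Hubert's reply above):

-- 1) with gap-free sequential IDs, the max ID equals the row count
SELECT max(transid) AS row_count FROM transactions;

-- 2) move the table to its own tablespace on a separate disk
CREATE TABLESPACE fastdisk LOCATION '/mnt/disk2/pgdata';
ALTER TABLE transactions SET TABLESPACE fastdisk;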

Tomas


Re: query performance question

From: PFC
> Thank you for your reply. I don't really need to count rows in the
> transactions table; I just thought this was a good example to show how
> slow the query was.

    Usually you're more interested in the performance of the queries you
need to run than the ones you don't need to run ;)

> But based on what you wrote, it looks like count(*) is slow in general,
> so this seems to be OK given that the table is rather large.

    Well, any query that needs to scan 60 million rows will be slow...
    Note that this is not a problem with count(*) itself, which can be
very fast if you "select count(*) ... where ..." and the condition
produces a reasonable number of rows to count; it is simply the cost of
scanning all 60 million rows. But fortunately, since it is perfectly
useless to know the rowcount of this 60-million-row table with perfect
precision, you never need to run that query ;)
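
    To illustrate the filtered case (purely illustrative, using the
indexed "time" column from the original post):

-- a selective predicate lets the index on "time" limit the scan
-- instead of reading all 60M rows
SELECT count(*) FROM transactions
WHERE time > now() - interval '1 day';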

> I just ran other queries (joining the transactions table) and they
> returned quickly, which leads me to believe the problem could be not
> with the database but with the box the db is running on. Sometimes those
> same queries take forever, and now they complete in no time at all, so
> perhaps a process that runs periodically is slowing the db down.

    If you have specific queries that you need to optimize, run EXPLAIN
ANALYZE on them and post the results, both when they are fast and when
they are slow, to see whether the plans differ. The output of vmstat
during the times of big slowness can also provide useful information.
Crosschecking with your cron jobs, etc. is a good idea, as is checking
the usual suspects, like whether your tables are VACUUM'd and ANALYZE'd.
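
    One quick check on that last point (the statistics columns below
exist from PostgreSQL 8.2 on):

-- when did (auto)vacuum and analyze last run on this table?
SELECT relname, last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'transactions';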

Re: query performance question

From: Dan Harris
tv@fuzzy.cz wrote:
>
> 3) Build a table with totals or maybe subtotals, updated by triggers.
> This requires serious changes in the application as well as the database,
> but it solves the limitations of 1) and may give you even better results.
>
> Tomas
>
>
I have tried this.  It's not a magic bullet.  We do our billing based on
counts from huge tables, so accuracy is important to us.  I tried
implementing such a scheme and ended up abandoning it because the
summary table became so full of dead tuples during and after large bulk
inserts that it slowed down selects on that table to an unacceptable
speed.  Even with a VACUUM issued every few hundred inserts, it still
bogged down due to the constant churn of the inserts.

I ended up moving this count tracking into the application level.  It's
messy and only allows a single instance of the insert program, since the
counts are kept in program memory, but it was the only way I found to
avoid the penalty of constant table churn from the triggered inserts.

-Dan

Re: query performance question

From: Kenneth Marshall
Dan,

Did you try this with 8.3 and its new HOT functionality?

Ken


Re: query performance question

From: Dan Harris
Kenneth Marshall wrote:
> Dan,
>
> Did you try this with 8.3 and its new HOT functionality?
>
> Ken
>
I did not.  I had to come up with the solution before we were able to
move to 8.3.  But Tom did mention that HOT might help, and I forgot
about that when writing the prior message.  I'm in the midst of moving
30 databases from 8.0 to 8.3 at the moment, but when I'm finished I
might have time to test it.

-Dan