Re: Really really slow select count(*) - Mailing list pgsql-performance

From Merlin Moncure
Subject Re: Really really slow select count(*)
Date
Msg-id AANLkTinD0Ks+nnQZNVgA_r9aOtMMhnOG7ErPcvmJFjPw@mail.gmail.com
Whole thread Raw
In response to Really really slow select count(*)  (felix <crucialfelix@gmail.com>)
List pgsql-performance
On Fri, Feb 4, 2011 at 8:46 AM, felix <crucialfelix@gmail.com> wrote:
>
> I am having huge performance problems with a table. Performance deteriorates
> every day and I have to run REINDEX and ANALYZE on it every day.  auto
> vacuum is on.  yes, I am reading the other thread about count(*) :)
> but obviously I'm doing something wrong here
>
> explain analyze select count(*) from fastadder_fastadderstatus;
> Aggregate  (cost=62458.73..62458.74 rows=1 width=0) (actual
> time=77130.000..77130.000 rows=1 loops=1)
>    ->  Seq Scan on fastadder_fastadderstatus  (cost=0.00..61701.18
> rows=303018 width=0) (actual time=50.000..76930.000 rows=302479 loops=1)
>  Total runtime: 77250.000 ms
> directly after REINDEX and ANALYZE:
>  Aggregate  (cost=62348.70..62348.71 rows=1 width=0) (actual
> time=15830.000..15830.000 rows=1 loops=1)
>    ->  Seq Scan on fastadder_fastadderstatus  (cost=0.00..61613.16
> rows=294216 width=0) (actual time=30.000..15570.000 rows=302479 loops=1)
>  Total runtime: 15830.000 ms
> still very bad for a 300k row table
> a similar table:
> explain analyze select count(*) from fastadder_fastadderstatuslog;
>  Aggregate  (cost=8332.53..8332.54 rows=1 width=0) (actual
> time=1270.000..1270.000 rows=1 loops=1)
>    ->  Seq Scan on fastadder_fastadderstatuslog  (cost=0.00..7389.02
> rows=377402 width=0) (actual time=0.000..910.000 rows=377033 loops=1)
>  Total runtime: 1270.000 ms
>
> It gets updated quite a bit each day, and this is perhaps the problem.
> To me it doesn't seem like that many updates
> 100-500 rows inserted per day
> no deletes
> 10k-50k updates per day
> mostly of this sort:   set priority=1 where id=12345
> is it perhaps this that is causing the performance problem ?
> I could rework the app to be more efficient and do updates using batches
> where id IN (1,2,3,4...)
> I assume that means a more efficient index update compared to individual
> updates.
> There is one routine that updates position_in_queue using a lot (too many)
> update statements.
> Is that likely to be the culprit ?
> What else can I do to investigate ?

I scanned the thread and I don't think anyone mentioned this: updates
that only hit unindexed columns are much cheaper long term in terms of
bloat purposes than those that touch indexed columns in 8.3+ because
of the 'hot' feature.  Do you really need the priority index?  If you
don't you are much better off without it if priority gets updated a
lot.  Of course, you might still need it -- it's going to depend on
your queries.

On my workstation, I can brute force a 90mb table in about 300ms.
Your table can be much smaller than that if you keep the bloat down
unless your text column is very large and not toasted (how large is it
on average?)

merlin

pgsql-performance by date:

Previous
From: Mark Stosberg
Date:
Subject: Re: application of KNN code to US zipcode searches?
Next
From: Stephen Frost
Date:
Subject: Re: application of KNN code to US zipcode searches?