Re: strategies for dealing with frequently updated tables - Mailing list pgsql-general

From Dave Vitek
Subject Re: strategies for dealing with frequently updated tables
Date
Msg-id 4F4D3487.2030907@grammatech.com
In response to Re: strategies for dealing with frequently updated tables  (Andy Colson <andy@squeakycode.net>)
Responses Re: strategies for dealing with frequently updated tables
List pgsql-general
On 2/27/2012 10:29 PM, Andy Colson wrote:
> On 02/27/2012 06:55 PM, Dave Vitek wrote:
>> Hi all,
>>
>> I have a relation where a tuple typically undergoes a lifecycle
>> something like:
>> 1) Created
>> 2) Updated maybe thousands of times (no updates to indexed columns
>> though)
>> 3) Rarely or never modified again
>>
>> The following query takes about 100 minutes (3 seconds per tuple):
>> SELECT count(id) from T
>>
>> (2) is causing a lot of auto vacuum/analyze activity (which is OK).
>> HOT seems to get used for about 90% of the updates, but there are
>> enough updates that don't use it to cause issues. I'm using PostgreSQL
>> 9.0.3 on a 32-bit Windows XP machine with 3 GB of RAM. The
>> .conf file is using default settings.
>>
>> Table Info:
>> n_live_tup 1799
>> n_dead_tup 191
>> pg_relation_size 2343 MB
>> indexsize 10 MB
>> toastsize 552 kB
>> toastindexsize 16 kB
>>
>> This reports under 10 kB for most tuples:
>> psql -A -c "select * from T where id=123" | wc -c
>>
>> auto-vacuum and auto-analyze both ran yesterday with default
>> settings. There are only one or two new tuples since yesterday. The
>> database is fairly old (was probably created using pg_restore about
>> when 9.0.3 came out).
>>
>> Here is the output from VACUUM VERBOSE:
>> INFO: vacuuming "public.T"
>> 0 pages are entirely empty.
>> CPU 0.00s/0.00u sec elapsed 0.56 sec.
>> VACUUM
>>
>>
>> I imagine CLUSTERing the table would make things happier, but I'm
>> hoping for a permanent solution that avoids periodic downtime.
>>
>> One thought was to partition so rows that are still changing live in
>> a separate table from the more stable rows. I imagine the number of
>> rows in stage (2) rarely exceeds 10 at any time. Can I still get into
>> performance trouble with a table that small after enough updates?
>> Anyone have other ideas?
>>
>> - Dave
>>
>
> I'm a little confused... what's the problem?
Sequential scans of this table seem to be pathologically slow for a
table with only 2000 rows.
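For what it's worth, here's roughly how I'd confirm where the time goes
(a sketch -- EXPLAIN's BUFFERS option is new in 9.0, and the plan shape
is my assumption):

    EXPLAIN (ANALYZE, BUFFERS) SELECT count(id) FROM T;
    -- Expecting something like:
    --   Aggregate
    --     ->  Seq Scan on t
    -- with buffer counts near the full ~300,000 pages implied by the
    -- 2343 MB relation size (at 8 kB per page), despite only ~1800
    -- live rows.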
>
>
>> The following query takes about 100 minutes (3 seconds per tuple):
>> SELECT count(id) from T
>
> so table T has 18,000 rows?   (100 * 60 * 3)
100 minutes * 60 (sec/min) / 3 (sec/tuple) = 2000 tuples
>
>> Table Info:
>> n_live_tup 1799
>
> Oh, so the table has 1,800 rows?
Correct.
>
>
>> This reports under 10kb for most tuples:
>> psql -A -c "select * from T where id=123" | wc -c
>
> 10 kb?  what? So that's one row?  And that one row is about 10,000 bytes?
Correct.  I mentioned this since pg_relation_size would suggest that one
tuple is using over a megabyte including overhead, fragmentation, and
free space.  So only about 1% of the space is used by the payload, which
might be a red flag?
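The arithmetic, as a sketch (assuming the table is plain lowercase t):

    SELECT pg_relation_size('t') / 1799          AS bytes_per_live_tuple,
           pg_size_pretty(pg_relation_size('t')) AS total_size;
    -- 2343 MB / 1799 live tuples ~= 1.3 MB per tuple, against ~10 kB
    -- of actual row data, so roughly 1% of the file is live payload.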
>
>
> So you have a large row, who knows how many, and the problem is
> "SELECT count(id) from T" is slow?
Correct. I think anything requiring a sequential scan of the table takes
100 minutes to run.  However, I wanted to be careful about drawing
conclusions and just present the data.
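For reference, the ~90% HOT figure came from pg_stat_user_tables; a
sketch, again assuming the table is named t:

    SELECT n_tup_upd, n_tup_hot_upd,
           round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1)
               AS hot_update_pct
    FROM pg_stat_user_tables
    WHERE relname = 't';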
>
> You say:
>
>> 2) Updated maybe thousands of times (no updates to indexed columns
>> though)
>
> Are the updates slow?  How slow, and how fast do you need them to be?
The updates are perfectly responsive.
>   Is ID a unique key? Or does "update ... where id=123" update more
> than one row?
It's the primary key, it's unique, and it updates a single row.
>
> What does the actual table look like?  Looks like it has lots of
> indexes, but I'm just guessing!  Are you sure you need all those
> indexes?  Updating a row has to update all those indexes... I assume,
> but you give no details.
It has 5 indices, all of which are warranted, and writes are performing
fine -- it's just sequential scans that seem to be pathological.  Index
scans are fine too.
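In case the breakdown helps, this is how I'd list the per-index sizes
(a sketch, assuming lowercase t):

    SELECT indexrelname,
           pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
           idx_scan
    FROM pg_stat_user_indexes
    WHERE relname = 't';
    -- All five indexes together are only ~10 MB, versus 2343 MB of heap.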
>
> Also... in the end, you're on Windows, so you probably can't tell me if
> you are IO bound, but you probably need a better IO subsystem.  And
> stop running on raid 5 (which I'm assuming since you don't mention
> anything)
I think this is typical consumer hardware from maybe 3 years ago.
Copying a 1 GB file on the disk takes... 11 minutes when the machine is
under heavy load.  There are many instances of this database schema+app,
so making it work well on low-end hardware is important.
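Back of the envelope, assuming that copy is representative of disk
throughput:

    1 GB / 11 min         ~= 1.5 MB/s (read + write combined)
    2343 MB / 1.5-3 MB/s  ~= 15-25 minutes

So the 100-minute scan is at least the right order of magnitude for
dragging 2.3 GB through a disk this slow, while the live data is only
about 1800 rows * 10 kB ~= 18 MB.  The dead space looks like the real
cost, not raw I/O speed.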
>
> Wow.  Re-reading that I seem to be angry.  Please don't take it that
> way.  Blood sugar is high, makes me angry person.  Others will be
> along to scold me.  Bad Andy!  Bad!  Go to your room until you find
> your happy face.
No problem, I should have been clearer about sequential scans being the
issue.
>
> -Andy
> ps: based on you're email gramma, I almost changed all your to you're,
> just to see if it annoyed you.  Sounds like you work for grammar
> technology.
How embarrassing!  After writing code all day I tend to have some
natural language issues.
