Re: strategies for dealing with frequently updated tables - Mailing list pgsql-general
From: Dave Vitek
Subject: Re: strategies for dealing with frequently updated tables
Msg-id: 4F4D3487.2030907@grammatech.com
In response to: Re: strategies for dealing with frequently updated tables (Andy Colson <andy@squeakycode.net>)
Responses: Re: strategies for dealing with frequently updated tables
List: pgsql-general
On 2/27/2012 10:29 PM, Andy Colson wrote:
> On 02/27/2012 06:55 PM, Dave Vitek wrote:
>> Hi all,
>>
>> I have a relation where a tuple typically undergoes a lifecycle
>> something like:
>> 1) Created
>> 2) Updated maybe thousands of times (no updates to indexed columns
>> though)
>> 3) Rarely or never modified again
>>
>> The following query takes about 100 minutes (3 seconds per tuple):
>> SELECT count(id) from T
>>
>> (2) is causing a lot of auto vacuum/analyze activity (which is OK).
>> HOT seems to get used for about 90% of the updates, but there are
>> enough updates that don't use it to cause issues. I'm using pg
>> version 9.0.3 on a 32-bit Windows XP machine with 3GB of RAM. The
>> .conf file is using default settings.
>>
>> Table Info:
>> n_live_tup       1799
>> n_dead_tup       191
>> pg_relation_size 2343MB
>> indexsize        10MB
>> toastsize        552kB
>> toastindexsize   16kB
>>
>> This reports under 10kB for most tuples:
>> psql -A -c "select * from T where id=123" | wc -c
>>
>> auto-vacuum and auto-analyze both ran yesterday with default
>> settings. There are only one or two new tuples since yesterday. The
>> database is fairly old (was probably created using pg_restore about
>> when 9.0.3 came out).
>>
>> Here is the output from VACUUM VERBOSE:
>> INFO: vacuuming "public.T"
>> 0 pages are entirely empty.
>> CPU 0.00s/0.00u sec elapsed 0.56 sec.
>> VACUUM
>>
>> I imagine CLUSTERing the table would make things happier, but I'm
>> hoping for a permanent solution that avoids periodic downtime.
>>
>> One thought was to partition so rows that are still changing live in
>> a separate table from the more stable rows. I imagine the cardinality
>> of (2) rarely exceeds 10. Can I still get into performance trouble
>> with a table that small after enough updates? Anyone have other ideas?
>>
>> - Dave
>
> I'm a little confused... what's the problem?

Sequential scans of this table seem to be pathologically slow for a
table with only 2000 rows.

>> The following query takes about 100 minutes (3 seconds per tuple):
>> SELECT count(id) from T
>
> so table T has 18,000 rows? (100 * 60 * 3)

100 minutes * 60 (sec/min) / 3 (sec/tuple) = 2000 tuples

>> Table Info:
>> n_live_tup 1799
>
> Oh, so the table has 1,800 rows?

Correct.

>> This reports under 10kB for most tuples:
>> psql -A -c "select * from T where id=123" | wc -c
>
> 10kB? What? So that's one row? And that one row is about 10,000 bytes?

Correct. I mentioned this since pg_relation_size would suggest that one
tuple is using over a meg including overhead, fragmentation, and free
space. So only about 1% of the space is used by the payload, which
might be a red flag?
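A sketch of one way to put a number on that directly from the
statistics views (only the table name T comes from this thread; the
size functions and pg_stat_user_tables are stock 9.0):

-- bytes of heap per live row: ~2343MB / ~1800 rows is over a megabyte
-- each, versus the ~10kB of actual payload measured above with wc -c
SELECT pg_size_pretty(pg_relation_size('T')) AS heap_size,
       n_live_tup,
       n_dead_tup,
       pg_relation_size('T') / greatest(n_live_tup, 1) AS bytes_per_live_row
  FROM pg_stat_user_tables
 WHERE relname = 'T';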
> So you have a large row, who knows how many, and the problem is
> "SELECT count(id) from T" is slow?

Correct. I think anything requiring a sequential scan of the table
takes 100 minutes to run. However, I wanted to be careful about drawing
conclusions and just present data.

> You say:
>
>> 2) Updated maybe thousands of times (no updates to indexed columns
>> though)
>
> Are the updates slow? How slow, and how fast do you need them to be?

The updates are perfectly responsive.

> Is ID a unique key? Or does "update ... where id=123" update more
> than one row?

It's the primary key, it's unique, and it updates a single row.

> What does the actual table look like? Looks like it has lots of
> indexes, but I'm just guessing! Are you sure you need all those
> indexes? Updating a row has to update all those indexes... I assume,
> but you give no details.

It has 5 indices, which are warranted, but writes are performing fine --
it's just sequential scans that seem to be pathological. Index scans
are fine too.

> Also... in the end, you're on Windows, so you probably can't tell me
> if you are IO bound, but you probably need a better IO subsystem. And
> stop running on RAID 5 (which I'm assuming since you don't mention
> anything).

I think this is typical consumer hardware from maybe 3 years ago.
Copying a 1GB file on the disk takes... 11 minutes when the machine is
under heavy load. There are many instances of this database schema+app,
so making it work well on low-end hardware is important.

> Wow. Re-reading that I seem to be angry. Please don't take it that
> way. Blood sugar is high, makes me an angry person. Others will be
> along to scold me. Bad Andy! Bad! Go to your room until you find
> your happy face.

No problem, I should have been clearer about sequential scans being the
issue.

> -Andy
> ps: based on you're email gramma, I almost changed all your to you're,
> just to see if it annoyed you. Sounds like you work for grammar
> technology.

How embarrassing! After writing code all day I tend to have some
natural language issues.
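For the archives, the usual knobs for this many-updates-per-row
pattern, as a sketch (the primary key index name T_pkey is an
assumption, and the numbers are starting points rather than tuned
values):

-- leave slack in each heap page so more updates qualify for HOT
-- (affects newly written pages; existing pages change only on rewrite)
ALTER TABLE T SET (fillfactor = 70);

-- vacuum this small but hot table far more aggressively than the
-- default scale-factor-based trigger would
ALTER TABLE T SET (autovacuum_vacuum_scale_factor = 0.0,
                   autovacuum_vacuum_threshold = 100);

-- one-time rewrite to give back the ~2.3GB of dead space; takes an
-- exclusive lock, so this is the "periodic downtime" option (on 9.0,
-- VACUUM FULL also rewrites the table and is an alternative)
CLUSTER T USING T_pkey;  -- T_pkey: assumed index name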