Re: One tuple per transaction - Mailing list pgsql-performance
| From | Tambet Matiisen |
|---|---|
| Subject | Re: One tuple per transaction |
| Date | |
| Msg-id | A66A11DBF5525341AEF6B8DE39CDE770088052@black.aprote.com |
| In response to | One tuple per transaction ("Tambet Matiisen" <t.matiisen@aprote.ee>) |
| Responses | Re: One tuple per transaction |
| List | pgsql-performance |
> ------------------------------
>
> Date: Mon, 14 Mar 2005 09:41:30 +0800
> From: "Qingqing Zhou" <zhouqq@cs.toronto.edu>
> To: pgsql-performance@postgresql.org
> Subject: Re: One tuple per transaction
> Message-ID: <d12qcr$a1r$1@news.hub.org>
>
> ""Tambet Matiisen"" <t.matiisen@aprote.ee> writes ...
> > If I'm correct, the dead tuples must be scanned also during table and
> > index scan, so a lot of dead tuples slows down queries considerably,
> > especially when the table doesn't fit into shared buffers any more.
> > And as I'm in transaction, I can't VACUUM to get rid of those tuples.
> > In one occasion the page count for a table went from 400 to 22000 at
> > the end.
>
> Not exactly. The dead tuple in the index will be scanned the
> first time (and its pointed heap tuple as well), then we will
> mark it dead, then next time we came here, we will know that
> the index tuple actually points to a useless tuple, so we
> will not scan its pointed heap tuple.

But the dead index tuple will still be read from disk next time? Maybe the performance loss is really negligible, but if most of the tuples in your table/index are dead, it might be significant.

Consider the often-suggested solution for speeding up a "select count(*) from table" query: make another table, rowcounts, and for each of the original tables add insert and delete triggers to update the row count in the rowcounts table. This is actually a standard denormalization technique, which I use often. For example, to ensure that order.total = sum(order_line.total).

Now, if typical inserts into your most active table occur in batches of 3 rows in one transaction, then the row count for this table is updated 3 times during the transaction. 3 updates generate 3 tuples, while 2 of them are dead from the very start. You effectively commit 2 useless tuples. After millions of inserts you end up with a rowcounts table having 2/3 dead tuples, and queries start to slow down. The current solution is to vacuum often.
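The arithmetic above can be sketched as a small model (a hypothetical illustration of MVCC update churn, not PostgreSQL code):

```python
# Minimal model of MVCC update churn in the rowcounts pattern described
# above: an insert trigger bumps a single counter row once per inserted
# row, so an N-row batch rewrites the counter tuple N times in one
# transaction, and only the last version is ever visible afterwards.

def wasted_version_fraction(rows_per_batch: int) -> float:
    """Fraction of counter-tuple versions that are dead on arrival."""
    versions_written = rows_per_batch  # one new tuple per trigger firing
    versions_visible = 1               # only the final version survives commit
    return (versions_written - versions_visible) / versions_written

# With typical 3-row batches, 2 of every 3 versions are useless:
print(wasted_version_fraction(3))  # → 0.6666666666666666
```

Collapsing the three trigger firings into one statement (e.g. a single per-statement update) would avoid the two dead versions entirely, which is essentially the proposal below.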
My proposal was to create a new tuple only with the first update. The next updates in the same transaction would update the existing tuple, not create a new one.

But as I'm writing this, I'm starting to see some of the associated implementation problems. The updated tuple might not be the same size as the previous tuple. In-place tuple updates are probably not implemented anyway, and for a reason: a disk write takes the same time regardless of whether you update or write new data. And there are tons of other problems, which the developers are surely more aware of.

But one thing still bothers me. Why is a new index tuple generated when I update a non-indexed column? OK, I get it again. The index tuple points to the heap tuple, so after the update it would point to a dead tuple. And as it takes the same time to update the pointer as to write a new tuple, it's easier to write a new one. Case closed.

Tambet
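The index/heap reasoning in the last paragraph can be illustrated with a toy model (hypothetical data structures, not PostgreSQL internals): index entries point at heap tuple positions, and an UPDATE always writes the new row version at a new position, so even an update to a non-indexed column produces a new index entry.

```python
# Toy model of MVCC heap and index. Position in `heap` plays the role
# of the TID that a real index entry would carry.

heap = []   # tuple versions: {"row_id", "value", "alive"}
index = []  # (indexed key, tid) pairs; here row_id is the indexed column

def insert(row_id, value):
    heap.append({"row_id": row_id, "value": value, "alive": True})
    index.append((row_id, len(heap) - 1))

def update_value(row_id, new_value):
    """Update the NON-indexed column `value`: the old version dies, the
    new version gets a new TID, so the index needs a fresh entry with
    the same key just to point at the live version."""
    for tup in heap:
        if tup["row_id"] == row_id and tup["alive"]:
            tup["alive"] = False  # old version is now a dead tuple
            heap.append({"row_id": row_id, "value": new_value, "alive": True})
            index.append((row_id, len(heap) - 1))  # same key, new TID
            return

insert(1, "a")
update_value(1, "b")      # indexed key unchanged...
assert len(index) == 2    # ...yet a second index entry appears
```

Updating the existing index entry's TID in place instead would, as the post concludes, cost roughly the same write as appending a new entry.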