Re: vacuum locking - Mailing list pgsql-performance

From:           Rob Nagler
Subject:        Re: vacuum locking
Date:
Msg-id:         16281.45354.209000.280418@gargle.gargle.HOWL
In response to: Re: vacuum locking (Greg Stark <gsstark@mit.edu>)
Responses:      Re: vacuum locking
List:           pgsql-performance
Greg Stark writes:
> Note that pctfree/pctused are a big performance drain on the usual case. Try
> setting them to 0/100 on a table that doesn't get updates (like a many-many
> relation table) and see how much faster it is to insert and scan.

Right. You can optimize each table independently. The "usual" case
doesn't exist in most databases, I've found, which is why Oracle does
better.

> Judging by the number of FAQ lists out there that explain various quirks of
> rollback segment configuration I wouldn't say it's so easily configured.

Maybe we just got lucky. :-)

> The biggest problem is on systems where there's a combination of both users.

As is ours.

> You need tremendous rollback segments to deal with the huge volume of oltp
> transactions that can occur during a single DSS query. And the DSS query
> performance is terrible as it has to check the rollback segments for a large
> portion of the blocks it reads.

The DSS issues only come into play, I think, when the queries are long.
This is our problem. I believe Postgres does a poor job with DSS. I
mentioned the "select avg(f1) from t1 group by f2" query in another
message. If the standard SQL aggregates (avg, sum, etc.) were better
optimized, I think it would handle DSS-type problems much better. Our
problem seems to be that the DSS queries almost always hit disk to
sort.

> Arguably it's the other way around. Postgres's approach wins whenever most of
> the tuples in a table have been updated, in that case it just has to scan the
> whole table ignoring old records not visible to the transaction. Oracle has to
> consult the rollback segment for any recently updated tuple. Oracle wins in
> the case where most of the tuples haven't changed so it can just scan the
> table without consulting lots of rollback segments.

I see what you're saying. I'm not a db expert, just a programmer trying
to make his queries go faster, so I'll acknowledge that the design is
theoretically better. In practice, I'm still stuck. As a simple
example, this query:

    select avg(f1) from t1 group by f2

takes 33 seconds (see the explain analyze in another note in this
thread) to run on idle hardware with about 1GB available in the cache.
It's clearly hitting disk to do the sort. Being a dumb programmer, I
changed the query to:

    select f1 from t1;

and wrote the rest in Perl. It takes 4 seconds to run. Why? The Perl
doesn't sort to disk; it aggregates in memory. There are 18 rows
returned.
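Roughly, the Perl side looks like the sketch below (simplified: the
connection details are placeholders, and note the select also has to
pull f2 so the script knows which group each row belongs to). It just
streams the rows and keeps a running sum and count per f2 value in a
hash:

    use DBI;

    # Placeholder connection parameters.
    my $dbh = DBI->connect('dbi:Pg:dbname=mydb', '', '', { RaiseError => 1 });
    my $sth = $dbh->prepare('select f1, f2 from t1');
    $sth->execute;

    # Accumulate sum and count per group; 18 groups fit trivially in memory.
    my (%sum, %count);
    while (my ($f1, $f2) = $sth->fetchrow_array) {
        $sum{$f2} += $f1;
        $count{$f2}++;
    }

    # Emit one row per group: f2 and avg(f1).
    printf("%s %s\n", $_, $sum{$_} / $count{$_}) for sort keys %sum;

    $dbh->disconnect;

The database just streams the table; all the grouping work happens
client-side, and nothing ever spills to disk.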
What I didn't mention is that I originally had:

    select avg(f1), t2.name
      from t1, t2
     where t2.f2 = t1.f2
     group by t2.name;

which is much worse:

    Aggregate  (cost=161046.30..162130.42 rows=8673 width=222) (actual time=72069.10..87455.69 rows=18 loops=1)
      ->  Group  (cost=161046.30..161479.95 rows=86729 width=222) (actual time=71066.38..78108.17 rows=963660 loops=1)
            ->  Sort  (cost=161046.30..161263.13 rows=86729 width=222) (actual time=71066.36..72445.74 rows=963660 loops=1)
                  Sort Key: t2.name
                  ->  Merge Join  (cost=148030.15..153932.66 rows=86729 width=222) (actual time=19850.52..27266.40 rows=963660 loops=1)
                        Merge Cond: ("outer".f2 = "inner".f2)
                        ->  Sort  (cost=148028.59..150437.74 rows=963660 width=58) (actual time=19850.18..21750.12 rows=963660 loops=1)
                              Sort Key: t1.f2
                              ->  Seq Scan on t1  (cost=0.00..32479.60 rows=963660 width=58) (actual time=0.06..3333.39 rows=963660 loops=1)
                        ->  Sort  (cost=1.56..1.60 rows=18 width=164) (actual time=0.30..737.59 rows=931007 loops=1)
                              Sort Key: t2.f2
                              ->  Seq Scan on t2  (cost=0.00..1.18 rows=18 width=164) (actual time=0.05..0.08 rows=18 loops=1)
    Total runtime: 87550.31 msec

Again, there are only about 18 values of f2. The optimizer even knows
this (t1.f2 is a foreign key to t2.f2), yet it builds the plan in
exactly the wrong order: it sorts and groups 963660 joined rows by
t2.name instead of aggregating first. Judging by the runtimes (33 vs.
88 seconds), it hits disk probably three times as much as the simpler
query. BTW, adding an index on t1.f2 has seriously negative effects on
many other DSS queries.

I'm still not sure that the sort problem is our only problem when
vacuum runs. It's tough to pin down. We'll be adding more memory to
see if that helps with the disk contention.
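One rewrite that might sidestep the bad plan (a sketch, untested here;
whether the planner cooperates depends on the version) is to force the
aggregation to run before the join by pushing it into a subselect in
FROM, so only 18 aggregated rows ever reach the sort on t2.name:

    select t2.name, agg.avg_f1
      from (select f2, avg(f1) as avg_f1
              from t1
             group by f2) as agg,
           t2
     where t2.f2 = agg.f2;

Rob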