Re: vacuum locking - Mailing list pgsql-performance
From: Rob Nagler
Subject: Re: vacuum locking
Msg-id: 16285.17811.994000.543635@gargle.gargle.HOWL
In response to: Re: vacuum locking (Greg Stark <gsstark@mit.edu>)
Responses: Re: vacuum locking
List: pgsql-performance
Greg Stark writes:
> Sorry I was unclear. By "usual case" I meant reading, as opposed to updates.
> The size of the on-disk representation turns out to be a major determinant in
> a lot of database applications, since the dominant resource is i/o bandwidth.
> Try doing a fresh import of a large table with pctfree 0 pctuse 100 and
> compare how long a select takes on it compared to the original table.

BTW, I greatly appreciate your support on this stuff. This list is a
fantastic resource.

I think we agree. The question is what the workload is. On tables
without updates, postgres will be fast enough. However, postgres is
slow on tables with updates, afaict. I think of OLTP as a system with
updates. One can do DSS on an OLTP database with Oracle; at least it
seems to work for one of our projects.

> Firstly, that type of query will be faster in 7.4 due to implementing a new
> method for doing groups called hash aggregates.

We'll be trying it as soon as it is out.

> Secondly you could try raising sort_mem. Postgres can't know how much memory
> it really has before it swaps, so there's a parameter to tell it. And swapping
> would be much worse than doing disk sorts.

It is at 8000. This is probably as high as I can go with multiple
postmasters. The sort area is shared in Oracle (I think :-) in the UGA.

> You can raise sort_mem to tell it how much memory it's allowed to
> use before it goes to disk sorts. You can even use ALTER SESSION to
> raise it in a few DSS sessions but leave it low for the many OLTP
> sessions. If it's high in OLTP sessions then you could quickly hit
> swap when they all happen to decide to use the maximum amount at the
> same time. But then you don't want to be doing big sorts in OLTP
> sessions anyways.

This is a web app. I can't control what the user wants to do.
Sometimes they update data, and other times they simply look at it.

I didn't find ALTER SESSION for postgres (isn't that Oracle?), so I
set sort_mem in the conf file to 512000 and restarted postgres.
Reran the simpler query (no name) 3 times, and it was still 27 secs.

> Unfortunately there's no way to tell how much memory it thinks it's
> going to use. I used to use a script to monitor the pgsql_tmp
> directory in the database to watch for usage.

I don't have to. The queries that run slow are hitting disk. Anything
that takes a minute has to be writing to disk.

> Well, first of all it doesn't really because you said to group by t2.name not
> f1. You might expect it to at least optimize something like this:

I put f2 in the group by, and it doesn't matter. That's the point.
It's the on-disk sort before the aggregate that's killing the query.

> but even then I don't think it actually is capable of using foreign keys as a
> hint like that. I don't think Oracle does either actually, but I'm not sure.

I'll be finding out this week.

> To convince it to do the right thing you would have to do either:
>
> SELECT a, t2.name
> FROM (SELECT avg(f1), f2 FROM t1 GROUP BY f2) AS t1
>   JOIN t2 USING (f2)
>
> Or use a subquery:
>
> SELECT a, (SELECT name FROM t2 WHERE t2.f2 = t1.f2)
> FROM t1
> GROUP BY f2

This doesn't solve the problem. It's the GROUP BY that is doing the
wrong thing. It's grouping, then aggregating.

Rob
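For reference, the per-session equivalent of Oracle's ALTER SESSION in
postgres is a plain SET, so sort_mem can be raised for a single DSS
connection without touching postgresql.conf or the other backends. A
minimal sketch, assuming a 7.3-era server; the table and column names
follow Greg's t1/t2/f1/f2 example and the 64000 (kilobytes) figure is
only an illustration, not a recommendation:

    -- Raise the per-sort memory limit for this session only (value in KB).
    -- Other sessions keep the postgresql.conf default.
    SET sort_mem TO 64000;

    -- Check whether the plan still sorts before aggregating; EXPLAIN ANALYZE
    -- shows the Sort and Aggregate nodes with their actual times.
    EXPLAIN ANALYZE
    SELECT t2.name, avg(t1.f1)
    FROM t1 JOIN t2 USING (f2)
    GROUP BY t2.name;

    -- Return to the default once the big query is done.
    RESET sort_mem;

Since SET only lasts for the connection, a web app could issue it just
before the reporting query and RESET afterwards, keeping the OLTP
sessions at the low default.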