Re: vacuum locking - Mailing list pgsql-performance

From Rob Nagler
Subject Re: vacuum locking
Date
Msg-id 16285.17811.994000.543635@gargle.gargle.HOWL
In response to Re: vacuum locking  (Greg Stark <gsstark@mit.edu>)
Responses Re: vacuum locking
List pgsql-performance
Greg Stark writes:
> Sorry I was unclear. By "usual case" I meant reading, as opposed to updates.
> The size of the on-disk representation turns out to be a major determinant in
> a lot of database applications, since the dominant resource is i/o bandwidth.
> Try doing a fresh import of a large table with pctfree 0 pctuse 100 and
> compare how long a select takes on it compared to the original table.

BTW, I greatly appreciate your support on this stuff.  This list is a
fantastic resource.

I think we agree.  The question is what the workload is.  On tables
without updates, postgres will be fast enough.  However, postgres is
slow on tables with updates afaict.  I think of OLTP as a system with
updates.  One can do DSS on an OLTP database with Oracle; at least, it
seems to work for one of our projects.

> Firstly, that type of query will be faster in 7.4 due to implementing a new
> method for doing groups called hash aggregates.

We'll be trying it as soon as it is out.
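
(For reference, here's how I plan to check whether the new hash
aggregates actually kick in once 7.4 is installed; the t1/f1/f2 names
are just the shorthand from earlier in the thread:

  EXPLAIN ANALYZE
  SELECT f2, avg(f1)
    FROM t1
   GROUP BY f2;

On 7.3 I expect to see a sort step feeding the aggregate; if the hash
aggregates are used, that sort should disappear from the plan.)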

> Secondly you could try raising sort_mem. Postgres can't know how much memory
> it really has before it swaps, so there's a parameter to tell it. And swapping
> would be much worse than doing disk sorts.

It is at 8000 (KB).  This is probably as high as I can go with
multiple postmasters.  The sort area is shared in Oracle (I think :-)
in the UGA.
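
(Back-of-the-envelope, assuming something like 100 active backends and
one sort apiece:

  100 backends * 8000 KB sort_mem  ~  800 MB of sort memory, worst case

which is why I don't want to push it much higher across the board.)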

> You can raise sort_mem to tell it how much memory it's allowed to
> use before it goes to disk sorts. You can even use ALTER SESSION to
> raise it in a few DSS sessions but leave it low the many OLTP
> sessions. If it's high in OLTP sessions then you could quickly hit
> swap when they all happen to decide to use the maximum amount at the
> same time. But then you don't want to be doing big sorts in OLTP
> sessions anyways.

This is a web app.  I can't control what the user wants to do.
Sometimes they update data, and other times they simply look at it.

I didn't find ALTER SESSION for postgres (isn't that Oracle?), so I
set sort_mem in the conf file to 512000 and restarted postgres.  I
reran the simpler query (no name) 3 times, and it was still 27 secs.
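
(It looks like the per-session equivalent in postgres is plain SET, so
at least the big value doesn't have to live in the conf file for every
backend; something like:

  -- raise sort memory for this DSS session only (value is in KB)
  SET sort_mem = 512000;
  -- ... run the big GROUP BY query ...
  RESET sort_mem;

Though given the conf-file test above, I don't expect it to change the
27 seconds.)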

> Unfortunately there's no way to tell how much memory it thinks it's
> going to use. I used to use a script to monitor the pgsql_tmp
> directory in the database to watch for usage.

I don't have to.  The queries that run slow are hitting disk.
Anything that takes a minute has to be writing to disk.

> Well, first of all it doesn't really because you said to group by t2.name not
> f1. You might expect it to at least optimize something like this:

I put f2 in the group by, and it doesn't matter.  That's the point.
It's the on-disk sort before the aggregate that's killing the query.

> but even then I don't think it actually is capable of using foreign keys as a
> hint like that. I don't think Oracle does either actually, but I'm not sure.

I'll be finding out this week.

> To convince it to do the right thing you would have to do either:
>
> SELECT a, t2.name
>   FROM (SELECT avg(f1),f2 FROM t1 GROUP BY f2) AS t1
>   JOIN t2 USING (f2)
>
> Or use a subquery:
>
> SELECT a, (SELECT name FROM t2 WHERE t2.f2 = t1.f2)
>   FROM t1
>  GROUP BY f2

This doesn't solve the problem.  It's the GROUP BY that is doing the
wrong thing: it sorts to group first, then aggregates.

Rob


