Thread: index v. seqscan for certain values
I've searched the archives and can't find an answer to this seemingly simple question. Apologies if it's too common.
The table in question has ~1.3M rows. It has 85 columns, 5 of which have single-column indexes.
The column in question (CID) has 183 distinct values. For these values, the largest has ~38,000 rows, and the smallest has 1 row. About 30 values have < 100 rows, and about 10 values have > 20,000 rows.
The database is 7.2.3 running on RedHat 7.1 (we are in the process of upgrading to PG 7.4.2). All of the query plan options are enabled, and the cpu costs are set to the default values (cpu_tuple_cost is 0.01, cpu_index_tuple_cost is 0.001). The database is VACUUM'd every night.
The problem:
A simple query:
select count(*) from xxx where CID=<smalval>
where <smalval> is a CID value with relatively few rows, produces a plan that uses the index on that column:
explain analyze select count(*) from xxx where cid=869366;
Aggregate (cost=19136.33..19136.33 rows=1 width=0) (actual time=78.49..78.49 rows=1 loops=1)
-> Index Scan using xxx_cid on emailrcpts (cost=0.00..19122.21 rows=5648 width=0) (actual time=63.40..78.46 rows=1 loops=1)
Total runtime: 78.69 msec
The same plan is chosen for values that have up to about 20,000 rows:
explain analyze select count(*) from xxx where cid=6223341;
Aggregate (cost=74384.19..74384.19 rows=1 width=0) (actual time=11614.89..11614.89 rows=1 loops=1)
-> Index Scan using xxx_cid on emailrcpts (cost=0.00..74329.26 rows=21974 width=0) (actual time=35.75..11582.10 rows=20114 loops=1)
Total runtime: 11615.05 msec
However for the values that have > 20,000 rows, the plan changes to a sequential scan, which is proportionately much slower.
explain analyze select count(*) from xxx where cid=7191032;
Aggregate (cost=97357.61..97357.61 rows=1 width=0) (actual time=46427.81..46427.82 rows=1 loops=1)
-> Seq Scan on xxx (cost=0.00..97230.62 rows=50792 width=0) (actual time=9104.45..46370.27 rows=37765 loops=1)
Total runtime: 46428.00 msec
The question: why does the planner consider a sequential scan to be better for these top 10 values? In terms of elapsed time it is more than twice as slow, proportionally, as an index scan for the same number of rows.
What I tried:
A) alter table xxx alter column cid set statistics 500;
analyze xxx;
This does not affect the results.
B) dropped/rebuilt the index, with no improvement.
C) decreasing cpu_index_tuple_cost by a factor of up to 1000, with no success
D) forced an index scan for the larger values by using a very high value for cpu_tuple_cost (e.g. 0.5), but this doesn't seem like a wise thing to do.
Your thoughts appreciated in advance!
- Jeremy
7+ years experience in Oracle performance-tuning
relatively new to postgresql
Quick bit of input, since you didn't mention it. How often do you run ANALYZE? I found it interesting that a database I was doing tests on sped up by a factor of 20 after ANALYZE. If your data changes a lot, you should probably schedule ANALYZE to run with VACUUM.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com
Sorry, I should have written that we do VACUUM VERBOSE ANALYZE every night.

- Jeremy
On Mon, 12 Apr 2004, Jeremy Dunn wrote:

> The question: why does the planner consider a sequential scan to be
> better for these top 10 values?

One thing to do is to set enable_seqscan=off, run the queries above again with EXPLAIN ANALYZE, and compare the estimated and real costs. It may be possible to lower random_page_cost to a still reasonable number in order to move the point of the switchover to seqscan.
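For concreteness, that experiment might look something like this (a sketch only; xxx and the CID value are the placeholders used earlier in the thread, and the settings apply just to the current session):

set enable_seqscan to off;
explain analyze select count(*) from xxx where cid=7191032;   -- forced index scan: compare estimated vs. actual cost
set enable_seqscan to on;

set random_page_cost to 2;    -- default is 4; lower values make index scans look cheaper
explain analyze select count(*) from xxx where cid=7191032;
set random_page_cost to 4;    -- back to the default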
"Jeremy Dunn" <jdunn@autorevenue.com> writes: > The question: why does the planner consider a sequential scan to be > better for these top 10 values? At some point a seqscan *will* be better. In the limit, if the key being sought is common enough to occur on every page of the table, it's certain that a seqscan will require less I/O than an indexscan (because reading the index isn't actually saving you any heap fetches). In practice the breakeven point is less than that because Unix kernels are better at handling sequential than random access. Your gripe appears to be basically that the planner's idea of the breakeven point is off a bit. It looks to me like it's within about a factor of 2 of being right, though, which is not all that bad when it's using generic cost parameters. > A) alter table xxx alter column cid set statistics 500; > analyze xxx; > This does not affect the results. It probably improved the accuracy of the row count estimates, no? The estimate you show for cid=7191032 is off by more than 25% (37765 vs 50792), which seems like a lot of error for one of the most common values in the table. (I hope that was with default stats target and not 500.) That leads directly to a 25% overestimate of the cost of an indexscan, while having IIRC no impact on the cost of a seqscan. Since the cost ratio was more than 25%, this didn't change the selected plan, but you want to fix that error as best you can before you move on to tweaking cost parameters. > C) decreasing cpu_index_tuple_cost by a factor of up to 1000, with no > success Wrong thing. You should be tweaking random_page_cost. Looks to me like a value near 2 might be appropriate for your setup. Also it is likely appropriate to increase effective_cache_size, which is awfully small in the default configuration. I'd set that to something related to your available RAM before trying to home in on a suitable random_page_cost. AFAIK hardly anyone bothers with changing the cpu_xxx costs ... regards, tom lane
> "Jeremy Dunn" <jdunn@autorevenue.com> writes: > > The question: why does the planner consider a sequential scan to be > > better for these top 10 values? > > At some point a seqscan *will* be better. In the limit, if > the key being sought is common enough to occur on every page > of the table, it's certain that a seqscan will require less > I/O than an indexscan (because reading the index isn't > actually saving you any heap fetches). In practice the > breakeven point is less than that because Unix kernels are > better at handling sequential than random access. > > Your gripe appears to be basically that the planner's idea of > the breakeven point is off a bit. It looks to me like it's > within about a factor of 2 of being right, though, which is > not all that bad when it's using generic cost parameters. Agreed. However, given that count(*) is a question that can be answered _solely_ using the index (without reference to the actual data blocks), I'd expect that the break-even point would be considerably higher than the < 3% (~38,000 / ~1.3M) I'm currently getting. Does PG not use solely the index in this situation?? > > A) alter table xxx alter column cid set statistics 500; > > analyze xxx; > > This does not affect the results. > > It probably improved the accuracy of the row count estimates, > no? The estimate you show for cid=7191032 is off by more than > 25% (37765 vs 50792), which seems like a lot of error for one > of the most common values in the table. (I hope that was > with default stats target and not 500.) That leads directly > to a 25% overestimate of the cost of an indexscan, while > having IIRC no impact on the cost of a seqscan. Since the > cost ratio was more than 25%, this didn't change the selected > plan, but you want to fix that error as best you can before > you move on to tweaking cost parameters. Actually it made them worse! Yes, this was the default statistics (10). When I just tried it again with a value of 300, analyze, then run the query, I get a *worse* result for an estimate. I don't understand this. alter table xxx alter column cid set statistics 300; analyze emailrcpts; set random_page_cost to 2; explain analyze select count(*) from xxx where cid=7191032; Aggregate (cost=20563.28..20563.28 rows=1 width=0) (actual time=7653.90..7653.90 rows=1 loops=1) -> Index Scan using xxx_cid on xxx (cost=0.00..20535.82 rows=10983 width=0) (actual time=72.24..7602.38 rows=37765 loops=1) Total runtime: 7654.14 msec Now it estimates I have only 10,983 rows (~3x too low) instead of the old estimate 50,792 (1.3x too high). Why is that ?? Anyway, a workable solution seems to be using a lower value for Random_Page_Cost. Thanks to everyone who replied with this answer. > Also it is likely appropriate to increase > effective_cache_size, which is awfully small in the default > configuration. I'd set that to something related to your > available RAM before trying to home in on a suitable random_page_cost. We have ours set to the default value of 1000, which does seem low for a system with 1GB of RAM. We'll up this once we figure out what's available. Then tweak the Random_Page_Cost appropriately at that point. I'd still like to understand the strangeness above, if anyone can shed light. - Jeremy
On Mon, Apr 12, 2004 at 15:05:02 -0400, Jeremy Dunn <jdunn@autorevenue.com> wrote:
> Agreed. However, given that count(*) is a question that can be answered
> _solely_ using the index (without reference to the actual data blocks),
> I'd expect that the break-even point would be considerably higher than
> the < 3% (~38,000 / ~1.3M) I'm currently getting. Does PG not use
> solely the index in this situation??

That isn't true. In order to check visibility you need to look at the data blocks.
"Jeremy Dunn" <jdunn@autorevenue.com> writes: > Agreed. However, given that count(*) is a question that can be answered > _solely_ using the index (without reference to the actual data blocks), As Bruno noted, that is not the case in Postgres; we must visit the table rows anyway. > When I just tried it again with a value of 300, analyze, then run the > query, I get a *worse* result for an estimate. I don't understand this. That's annoying. How repeatable are these results --- if you do ANALYZE over again several times, how much does the row count estimate change each time? (It should change somewhat, since ANALYZE is taking a random sample, but one would like to think not a whole lot.) Is the variance more or less at the higher stats target? Take a look at a few different CID values to get a sense of the accuracy, don't look at just one ... (Actually, you might find it more profitable to look at the pg_stats entry for the CID column rather than reverse-engineering the stats via ANALYZE. Look at how well the most-common-values list and associated frequency numbers track reality.) Also, can you think of any reason for the distribution of CID values to be nonuniform within the table? For instance, do rows get inserted in order of increasing CID, or is there any clustering of rows with the same CID? regards, tom lane
> > When I just tried it again with a value of 300, analyzed, then ran the
> > query, I got a *worse* estimate. I don't understand this.
>
> That's annoying. How repeatable are these results --- if you
> do ANALYZE over again several times, how much does the row
> count estimate change each time? (It should change somewhat,
> since ANALYZE is taking a random sample, but one would like
> to think not a whole lot.) Is the variance more or less at
> the higher stats target? Take a look at a few different CID
> values to get a sense of the accuracy, don't look at just one ...

Yes, it's repeatable. I tried a bunch of times, and there are only small variations in the stats for the higher stat targets.

> (Actually, you might find it more profitable to look at the
> pg_stats entry for the CID column rather than
> reverse-engineering the stats via ANALYZE. Look at how well
> the most-common-values list and associated frequency numbers
> track reality.)

I checked the accuracy of the stats for various values, and there is a wide variation. I see some values where the estimate is 1.75x the actual, and others where the estimate is .44x the actual.

> Also, can you think of any reason for the distribution of CID
> values to be nonuniform within the table? For instance, do
> rows get inserted in order of increasing CID, or is there any
> clustering of rows with the same CID?

This is almost certainly the answer. The data is initially inserted in chunks for each CID; later on there is a more normal distribution of inserts/updates/deletes across all CIDs, and then again a new CID will come in with a large chunk of rows, etc.

Interestingly, I tried increasing the stat size for the CID column to 2000, analyzing, and checking the accuracy of the stats again. Even with this relatively high value, the accuracy of the stats is not that close. The value giving .44x previously now gives an estimate .77x of actual. Another value which was at 1.38x of actual is now at .71x of actual!

Then just for kicks I set the statistics size to 100,000 (!), analyzed, and ran the query again. For the same CID I still got an estimated row count that is .71x the actual rows returned. Why is this not better? I wonder how high I'd have to set the statistics target to get really good data, given the uneven data distribution of this table. Is there any other technique that works better for getting good estimates, given the uneven distribution of values?

So I think this explains the inaccurate stats; and the solution as far as I'm concerned is to adjust the two params mentioned yesterday (raising effective_cache_size and lowering random_page_cost). Thanks again for the help!

- Jeremy
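One way to put numbers on that kind of check (a sketch; 7191032 stands in for whichever CID is being examined):

select reltuples from pg_class where relname = 'xxx';   -- planner's estimate of total rows in the table
select count(*) from xxx where cid = 7191032;           -- actual rows for this CID
-- the planner's estimate for that CID is roughly its most_common_freqs entry times reltuples;
-- dividing that estimate by the actual count gives ratios like the .44x / 1.75x quoted above.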
"Jeremy Dunn" <jdunn@autorevenue.com> writes: > Interestingly, I tried increasing the stat size for the CID column to > 2000, analyzing, and checking the accuracy of the stats again. There's a hard limit of 1000, I believe. Didn't it give you a warning saying so? At 1000 the ANALYZE sample size would be 300000 rows, or about a quarter of your table. I would have thought this would give frequency estimates with much better precision than you seem to be seeing --- but my statistics are rusty enough that I'm not sure about it. Possibly the nonuniform clumping of CID has something to do with the poor results. Any stats majors on the list? regards, tom lane
> There's a hard limit of 1000, I believe. Didn't it give you
> a warning saying so?

No warning at 2000, and no warning at 100,000 either!

Remember we are still on 7.2.x. The docs here
http://www.postgresql.org/docs/7.2/static/sql-altertable.html
don't say anything about a limit.

This is good to know, if it's true. Can anyone confirm?

- Jeremy
On Tue, 2004-04-13 at 14:04, Jeremy Dunn wrote:
> > There's a hard limit of 1000, I believe. Didn't it give you
> > a warning saying so?
>
> No warning at 2000, and no warning at 100,000 either!
>
> Remember we are still on 7.2.x. The docs here
> http://www.postgresql.org/docs/7.2/static/sql-altertable.html
> don't say anything about a limit.
>
> This is good to know, if it's true. Can anyone confirm?

transform=# alter table data_pull alter column msg set statistics 100000;
WARNING: lowering statistics target to 1000
ERROR: column "msg" of relation "data_pull" does not exist
transform=# select version();
                            version
----------------------------------------------------------------
 PostgreSQL 7.4beta4 on i686-pc-linux-gnu, compiled by GCC 2.96
(1 row)

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Hi, all,

I have got a new Mac OS G5 with 8GB RAM. So I tried to increase shmmax in the kernel so that I can take advantage of the RAM.

I searched the web and read the manual for PG 7.4, chapter 16.5.1. After that, I edited the /etc/rc file:

sysctl -w kern.sysv.shmmax=4294967296   // bytes
sysctl -w kern.sysv.shmmin=1
sysctl -w kern.sysv.shmmni=32
sysctl -w kern.sysv.shmseg=8
sysctl -w kern.sysv.shmall=1048576      // 4k pages, for 4GB of shared RAM

Then I changed postgresql.conf:

shared_buffers = 100000   // could be bigger?

and restarted the machine and the postgres server. To my surprise, the postgres server wouldn't start, saying that the requested shared memory exceeds the kernel's shmmax.

My suspicion is that the change I made in /etc/rc does not take effect. Is there a way to check it? Is there an upper limit on how much RAM can be allocated for shared buffers in Mac OS X? Or is there something wrong with my calculation of the numbers?

Thanks a lot!

Qing
On OS X, I've always made these changes in
/System/Library/StartupItems/SystemTuning/SystemTuning
and manually checked them with sysctl after reboot. Works for me.

100k buffers is probably overkill. There can be a performance penalty with too many buffers; see this list's archives for more. 10k would probably be a better start.

- Jeff

--
Jeff Bohmer
VisionLink, Inc.
http://www.visionlink.org
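As a rough sanity check on those figures (assuming the default 8 kB block size):

-- shared_buffers = 100000  ->  100000 * 8 kB  =  roughly 780 MB of shared memory requested
-- shared_buffers = 10000   ->   10000 * 8 kB  =  roughly  78 MB
show shared_buffers;   -- reports the value the running server actually picked up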
Qing Zhao <qzhao@quotefx.net> writes:
> My suspicion is that the change I made in /etc/rc does not take
> effect. Is there a way to check it?

sysctl has an option to show the values currently in effect.

I believe that /etc/rc is the correct place to set shmmax on OSX 10.3 or later ... but we have seen prior reports of people having trouble getting the setting to "take". There may be some other constraint involved.

> sysctl -w kern.sysv.shmmax=4294967296   // bytes

Hmm, does sysctl work for values that exceed the range of int?

There's no particularly good reason to try to set shmmax as high as you are trying anyhow; you really don't need more than a couple hundred meg in Postgres shared memory. It's better to leave the kernel to manage the bulk of your RAM.

regards, tom lane
Tom:

I used sysctl -A to see the kernel state, and I got:

kern.sysv.shmmax: -1

It looks like the value is too big!

Thanks!

Qing
Hello,

I found that if your SHMALL value was less than your SHMMAX value, the setting wouldn't take.

J
On Tue, 13 Apr 2004 13:55:49 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Possibly the nonuniform clumping of CID has something to do with the
> poor results.

It shouldn't. The sampling algorithm is designed to give each tuple the same chance of ending up in the sample, and tuples are selected independently. (IOW each one of the {N \choose n} possible samples has the same probability.)

There are known problems with nonuniform distribution of dead vs. live and large vs. small tuples, but AFAICS the order of values does not matter.

Servus
Manfred