Thread: Out of Memory - 8.2.4

Out of Memory - 8.2.4

From
Jeff Amiel
Date:
"PostgreSQL 8.2.4 on i386-pc-solaris2.10, compiled by
GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)"


Week-old install....still tuning and tweaking this
thing.

Over last 2 days, have spotted 10 "Out of Memory"
errors in postgres logs (never saw before with same
app/usage patterns on tuned hardware/postgres under
FreeBSD)

Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848
local0.warning] [6-1] 2007-08-22 18:08:24 CDT   ERROR:
 out of memory.
Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848
local0.warning] [6-2] 2007-08-22 18:08:24 CDT
DETAIL:  Failed on request of size 536870910.

What I found interesting is that It's ALWAYS the same
size....536870910

I am running autovacuum and slony.....but I see
nothing in the logs anywhere near the "out of memory"
errors related to either (autovacuum used to under
8.0.X log INFO messages every time it vacuumed which
came in handy...I assume it doesn't so this any more?)


The events are fairly spread out...and cannot (by
looking at app logs and rest of DB logs) correlate to
any specific query or activity.

Any help would be appreciated

Box is a Sun X4600 with 8 dual-core processors and 32
gig of ram.

# su - pgsql
Sun Microsystems Inc.   SunOS 5.10      Generic
January 2005
-bash-3.00$ ulimit -a
core file size        (blocks, -c) unlimited
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
open files                    (-n) 256
pipe size          (512 bytes, -p) 10
stack size            (kbytes, -s) 10240
cpu time             (seconds, -t) unlimited
max user processes            (-u) 16357
virtual memory        (kbytes, -v) unlimited

shared_buffers = 3GB                    # min 128kB or
max_connections*16kB
temp_buffers = 1000                     # min 800kB
was 8MB
max_prepared_transactions = 450         # can be 0 or
more
work_mem = 100MB                                # min
64kB
maintenance_work_mem = 512MB            # min 1MB
#max_stack_depth = 2MB                  # min 100kB
max_fsm_pages = 208000          # min
max_fsm_relations*16, 6 bytes each
max_fsm_relations = 10000               # min 100, ~70
bytes each
#max_files_per_process = 1000           # min 25
#shared_preload_libraries = ''          # (change
requires restart)
fsync = on                              # turns forced
synchronization on or off
wal_sync_method = fdatasync             # the default
is the first option
full_page_writes = off                  # recover from
partial page writes
wal_buffers = 2300                      # min 32kB
commit_delay = 10                       # range
0-100000, in microseconds
#commit_siblings = 5                    # range 1-1000
checkpoint_segments = 128               # in logfile
segments, min 1, 16MB each
checkpoint_timeout = 5min               # range 30s-1h
checkpoint_warning = 99s                # 0 is off





____________________________________________________________________________________
Be a better Globetrotter. Get better travel answers from someone who knows. Yahoo! Answers - Check it out.
http://answers.yahoo.com/dir/?link=list&sid=396545469

Re: Out of Memory - 8.2.4

From
Erik Jones
Date:
On Aug 24, 2007, at 10:09 AM, Jeff Amiel wrote:

> "PostgreSQL 8.2.4 on i386-pc-solaris2.10, compiled by
> GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)"
>
>
> Week-old install....still tuning and tweaking this
> thing.
>
> Over last 2 days, have spotted 10 "Out of Memory"
> errors in postgres logs (never saw before with same
> app/usage patterns on tuned hardware/postgres under
> FreeBSD)
>
> Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848
> local0.warning] [6-1] 2007-08-22 18:08:24 CDT   ERROR:
>  out of memory.
> Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848
> local0.warning] [6-2] 2007-08-22 18:08:24 CDT
> DETAIL:  Failed on request of size 536870910.
>
> What I found interesting is that It's ALWAYS the same
> size....536870910
>
> I am running autovacuum and slony.....but I see
> nothing in the logs anywhere near the "out of memory"
> errors related to either (autovacuum used to under
> 8.0.X log INFO messages every time it vacuumed which
> came in handy...I assume it doesn't so this any more?)
>
>
> The events are fairly spread out...and cannot (by
> looking at app logs and rest of DB logs) correlate to
> any specific query or activity.
>
> Any help would be appreciated
>
> Box is a Sun X4600 with 8 dual-core processors and 32
> gig of ram.
>
> # su - pgsql
> Sun Microsystems Inc.   SunOS 5.10      Generic
> January 2005
> -bash-3.00$ ulimit -a
> core file size        (blocks, -c) unlimited
> data seg size         (kbytes, -d) unlimited
> file size             (blocks, -f) unlimited
> open files                    (-n) 256
> pipe size          (512 bytes, -p) 10
> stack size            (kbytes, -s) 10240
> cpu time             (seconds, -t) unlimited
> max user processes            (-u) 16357
> virtual memory        (kbytes, -v) unlimited
>
> shared_buffers = 3GB                    # min 128kB or
> max_connections*16kB
> temp_buffers = 1000                     # min 800kB
> was 8MB
> max_prepared_transactions = 450         # can be 0 or
> more
> work_mem = 100MB                                # min
> 64kB
> maintenance_work_mem = 512MB            # min 1MB
> #max_stack_depth = 2MB                  # min 100kB
> max_fsm_pages = 208000          # min
> max_fsm_relations*16, 6 bytes each
> max_fsm_relations = 10000               # min 100, ~70
> bytes each
> #max_files_per_process = 1000           # min 25
> #shared_preload_libraries = ''          # (change
> requires restart)
> fsync = on                              # turns forced
> synchronization on or off
> wal_sync_method = fdatasync             # the default
> is the first option
> full_page_writes = off                  # recover from
> partial page writes
> wal_buffers = 2300                      # min 32kB
> commit_delay = 10                       # range
> 0-100000, in microseconds
> #commit_siblings = 5                    # range 1-1000
> checkpoint_segments = 128               # in logfile
> segments, min 1, 16MB each
> checkpoint_timeout = 5min               # range 30s-1h
> checkpoint_warning = 99s                # 0 is off

A few weeks ago I got the same error on the same server.  In fact,
the only difference is our memory where you have 32GB and I have 16GB
and you have 512MB maintenance_work_mem and I have 256MB.  I point
out the maintenance work memory setting as that is pretty much
exactly what the request size that your error pointed out as was mine
(yours/2).  However, that was the only time I've seen this.  Below is
the full context of the error report in my log.  I see that there is
an Autovacuum context as well as references to a toast table so,
something to do with autovacuum?

TopMemoryContext: 14466424 total in 1758 blocks; 7160792 free (12578
chunks); 7305632 used
TopTransactionContext: 8192 total in 1 blocks; 7688 free (10 chunks);
504 used
Type information cache: 8192 total in 1 blocks; 1800 free (0 chunks);
6392 used
Operator class cache: 8192 total in 1 blocks; 4872 free (0 chunks);
3320 used
Autovacuum context: 16769024 total in 11 blocks; 6959320 free (11
chunks); 9809704 used
smgr relation table: 24576 total in 2 blocks; 11952 free (4 chunks);
12624 used
TransactionAbortContext: 32768 total in 1 blocks; 32752 free (0
chunks); 16 used
Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
PortalMemory: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Relcache by OID: 8192 total in 1 blocks; 256 free (0 chunks); 7936 used
CacheMemoryContext: 1183288 total in 20 blocks; 889824 free (4094
chunks); 293464 used
pg_toast_356294_index: 1024 total in 1 blocks; 328 free (0 chunks);
696 used
pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 288 free (0
chunks); 736 used
pg_index_indrelid_index: 1024 total in 1 blocks; 352 free (0 chunks);
672 used
pg_namespace_oid_index: 1024 total in 1 blocks; 352 free (0 chunks);
672 used
pg_authid_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672
used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 288 free (0
chunks); 736 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_operator_oid_index: 1024 total in 1 blocks; 352 free (0 chunks);
672 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 352 free (0
chunks); 672 used
pg_class_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672
used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 288 free (0
chunks); 736 used
pg_amproc_opc_proc_index: 1024 total in 1 blocks; 216 free (0
chunks); 808 used
pg_amop_opc_strat_index: 1024 total in 1 blocks; 216 free (0 chunks);
808 used
Per-database table: 57344 total in 3 blocks; 37592 free (13 chunks);
19752 used
Per-database table: 57344 total in 3 blocks; 37592 free (13 chunks);
19752 used
Per-database table: 57344 total in 3 blocks; 37592 free (13 chunks);
19752 used
Per-database table: 57344 total in 3 blocks; 37592 free (13 chunks);
19752 used
Per-database table: 24576 total in 2 blocks; 13040 free (5 chunks);
11536 used
Per-database table: 100671512 total in 22 blocks; 5803552 free (91
chunks); 94867960 used
Databases hash: 8192 total in 1 blocks; 4936 free (0 chunks); 3256 used
MdSmgr: 8192 total in 1 blocks; 7936 free (226 chunks); 256 used
LOCALLOCK hash: 24576 total in 2 blocks; 10000 free (4 chunks); 14576
used
Timezones: 48616 total in 2 blocks; 5968 free (0 chunks); 42648 used
Postmaster: 24576 total in 2 blocks; 20264 free (155 chunks); 4312 used
ErrorContext: 8192 total in 1 blocks; 8176 free (4 chunks); 16 used
2007-08-08 20:21:05 CDT 3716 :ERROR:  out of memory
2007-08-08 20:21:05 CDT 3716 :DETAIL:  Failed on request of size
268435452.


Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



Re: Out of Memory - 8.2.4

From
Tom Lane
Date:
Jeff Amiel <becauseimjeff@yahoo.com> writes:
> Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848
> local0.warning] [6-1] 2007-08-22 18:08:24 CDT   ERROR:
>  out of memory.
> Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848
> local0.warning] [6-2] 2007-08-22 18:08:24 CDT
> DETAIL:  Failed on request of size 536870910.

> What I found interesting is that It's ALWAYS the same
> size....536870910

> maintenance_work_mem = 512MB            # min 1MB

Apparently this maintenance_work_mem setting is higher than your system
can reliably provide.  Knock it back a bit.

            regards, tom lane

Re: Out of Memory - 8.2.4

From
Erik Jones
Date:
On Aug 24, 2007, at 11:46 AM, Tom Lane wrote:

> Jeff Amiel <becauseimjeff@yahoo.com> writes:
>> Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848
>> local0.warning] [6-1] 2007-08-22 18:08:24 CDT   ERROR:
>>  out of memory.
>> Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848
>> local0.warning] [6-2] 2007-08-22 18:08:24 CDT
>> DETAIL:  Failed on request of size 536870910.
>
>> What I found interesting is that It's ALWAYS the same
>> size....536870910
>
>> maintenance_work_mem = 512MB            # min 1MB
>
> Apparently this maintenance_work_mem setting is higher than your
> system
> can reliably provide.  Knock it back a bit.
>
>             regards, tom lane

I'm not so sure.  In my case, the request size was only 256MB and we
maintain about 10 - 11 GB free of our 16 GB of memory (2GB
shared_buffers, 42MB work_mem, and 256 MB maintenance_work_mem).  The
toast table that was involved in the error was pretty small and I was
able to successfully vacuum it myself virtually instantly.  However,
in my case, this (so far) being a one time error I don't have much
more data to contribute.  We constantly monitor and graph our
system's I/O, CPU, and memory usage and scan our logs for errors so
if anything else comes up I'll be sure to share.


Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



Re: Out of Memory - 8.2.4

From
"Joshua D. Drake"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Erik Jones wrote:
> On Aug 24, 2007, at 11:46 AM, Tom Lane wrote:
>
>> Jeff Amiel <becauseimjeff@yahoo.com> writes:
>>> Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848
>>> local0.warning] [6-1] 2007-08-22 18:08:24 CDT   ERROR:
>>>  out of memory.
>>> Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848
>>> local0.warning] [6-2] 2007-08-22 18:08:24 CDT
>>> DETAIL:  Failed on request of size 536870910.
>>
>>> What I found interesting is that It's ALWAYS the same
>>> size....536870910
>>
>>> maintenance_work_mem = 512MB            # min 1MB
>>
>> Apparently this maintenance_work_mem setting is higher than your system
>> can reliably provide.  Knock it back a bit.
>>
>>             regards, tom lane
>
> I'm not so sure.  In my case, the request size was only 256MB and we
> maintain about 10 - 11 GB free of our 16 GB of memory (2GB
> shared_buffers, 42MB work_mem, and 256 MB maintenance_work_mem).  The
> toast table that was involved in the error was pretty small and I was
> able to successfully vacuum it myself virtually instantly.  However, in
> my case, this (so far) being a one time error I don't have much more
> data to contribute.  We constantly monitor and graph our system's I/O,
> CPU, and memory usage and scan our logs for errors so if anything else
> comes up I'll be sure to share.

We are actually diagnosing a similar problem on this end, where we get a
failure at 1920... I am currently trying to get some DEBUG output.

Sincerely,

Joshua D. Drake


>
>
> Erik Jones
>
> Software Developer | Emma®
> erik@myemma.com
> 800.595.4401 or 615.292.5888
> 615.292.0777 (fax)
>
> Emma helps organizations everywhere communicate & market in style.
> Visit us online at http://www.myemma.com
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>               http://archives.postgresql.org/
>


- --

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
            UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGzx/gATb/zqfZUUQRAg10AJ9bmIUZ8V99vVCDZfWH05PWckf49QCfa4ta
G1daeagQY2CMUR1QDMtuXTQ=
=HxG3
-----END PGP SIGNATURE-----

Re: Out of Memory - 8.2.4

From
Jeff Amiel
Date:
--- "Joshua D. Drake" <jd@commandprompt.com> wrote:


> We are actually diagnosing a similar problem on this
> end, where we get a
> failure at 1920... I am currently trying to get some
> DEBUG output.

We are actually getting it semi-regularly today (3
times already)....I would be happy to provide some
more info if somebody  guides me (just set
log_min_messages to one of the debug settings?)





      ____________________________________________________________________________________
Park yourself in front of a world of choices in alternative vehicles. Visit the Yahoo! Auto Green Center.
http://autos.yahoo.com/green_center/

Re: Out of Memory - 8.2.4

From
"Joshua D. Drake"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jeff Amiel wrote:
> --- "Joshua D. Drake" <jd@commandprompt.com> wrote:
>
>
>> We are actually diagnosing a similar problem on this
>> end, where we get a
>> failure at 1920... I am currently trying to get some
>> DEBUG output.
>
> We are actually getting it semi-regularly today (3
> times already)....I would be happy to provide some
> more info if somebody  guides me (just set
> log_min_messages to one of the debug settings?)

Having log_line_prefix with at least %p and %m (or %t) plus a
log_min_messages of DEBUG2 would be great.

Joshua D. Drake

>
>
>
>
>
>       ____________________________________________________________________________________
> Park yourself in front of a world of choices in alternative vehicles. Visit the Yahoo! Auto Green Center.
> http://autos.yahoo.com/green_center/
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq
>


- --

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
            UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGzySaATb/zqfZUUQRAqxJAJwL8VcEjDJ1dwQYuvEPh4pORCRUQQCeIwAO
ajfjr7m1bTy9r5DFuNmUP6Y=
=zq4y
-----END PGP SIGNATURE-----

Re: Out of Memory - 8.2.4

From
Jeff Amiel
Date:
--- "Joshua D. Drake" <jd@commandprompt.com> wrote:


> Having log_line_prefix with at least %p and %m (or
> %t) plus a
> log_min_messages of DEBUG2 would be great.

i am getting the additional timestampt/pid on my log
lines now....but no additional debug output...
is log_min_messages one of them that requires a
restart?




____________________________________________________________________________________
Be a better Heartthrob. Get better relationship answers from someone who knows. Yahoo! Answers - Check it out.
http://answers.yahoo.com/dir/?link=list&sid=396545433

Re: Out of Memory - 8.2.4

From
Tom Lane
Date:
Jeff Amiel <becauseimjeff@yahoo.com> writes:
> is log_min_messages one of them that requires a
> restart?

No, SIGHUP (pg_ctl reload) should be sufficient.

            regards, tom lane

Re: Out of Memory - 8.2.4

From
Jeff Amiel
Date:
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Jeff Amiel <becauseimjeff@yahoo.com> writes:
> > is log_min_messages one of them that requires a
> > restart?
>
> No, SIGHUP (pg_ctl reload) should be sufficient.

Weird....
looks like some items are going to syslog and some to
my defined postgres logfile (from -L option).
Bizarre.
Anyway....I hope this helps someone.....

At 10:46, I find this in my syslog files..

Aug 27 10:46:01 db-1 postgres[27008]: [ID 748848
local0.warning] [85355-1] 2007-08-27 10:46:01.888 CDT
  27008ERROR:  out of memory
Aug 27 10:46:01 db-1 postgres[27008]: [ID 748848
local0.warning] [85355-2] 2007-08-27 10:46:01.888 CDT
  27008DETAIL:  Failed on request of size 536870910.

and at the same time in my postgres logfile I see this
(and only this)

TopMemoryContext: 169608 total in 10 blocks; 18832
free (34 chunks); 150776 used
TopTransactionContext: 8192 total in 1 blocks; 7648
free (9 chunks); 544 used
CFuncHash: 8192 total in 1 blocks; 4936 free (0
chunks); 3256 used
Type information cache: 8192 total in 1 blocks; 1800
free (0 chunks); 6392 used
Operator class cache: 8192 total in 1 blocks; 4872
free (0 chunks); 3320 used
Autovacuum context: 8192 total in 1 blocks; 5416 free
(8 chunks); 2776 used
smgr relation table: 8192 total in 1 blocks; 2808 free
(0 chunks); 5384 used
TransactionAbortContext: 32768 total in 1 blocks;
32752 free (0 chunks); 16 used
Portal hash: 8192 total in 1 blocks; 3912 free (0
chunks); 4280 used
PortalMemory: 0 total in 0 blocks; 0 free (0 chunks);
0 used
Relcache by OID: 8192 total in 1 blocks; 3376 free (0
chunks); 4816 used
CacheMemoryContext: 659000 total in 19 blocks; 264904
free (15 chunks); 394096 used
sl_seqlog_idx: 1024 total in 1 blocks; 256 free (0
chunks); 768 used
PartInd_istream_replication_cluster_sl_log_1-node-1:
1024 total in 1 blocks; 392 free (0 chunks); 632 used
sl_log_1_idx1: 1024 total in 1 blocks; 256 free (0
chunks); 768 used
pg_index_indrelid_index: 1024 total in 1 blocks; 352
free (0 chunks); 672 used
pg_attrdef_adrelid_adnum_index: 1024 total in 1
blocks; 288 free (0 chunks); 736 used
pg_autovacuum_vacrelid_index: 1024 total in 1 blocks;
392 free (0 chunks); 632 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 328
free (0 chunks); 696 used
pg_type_oid_index: 1024 total in 1 blocks; 352 free (0
chunks); 672 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1
blocks; 288 free (0 chunks); 736 used
pg_statistic_relid_att_index: 1024 total in 1 blocks;
288 free (0 chunks); 736 used
pg_auth_members_member_role_index: 1024 total in 1
blocks; 328 free (0 chunks); 696 used
pg_auth_members_role_member_index: 1024 total in 1
blocks; 328 free (0 chunks); 696 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks;
328 free (0 chunks); 696 used
pg_proc_proname_args_nsp_index: 1024 total in 1
blocks; 256 free (0 chunks); 768 used
pg_proc_oid_index: 1024 total in 1 blocks; 352 free (0
chunks); 672 used
pg_operator_oprname_l_r_n_index: 1024 total in 1
blocks; 192 free (0 chunks); 832 used
pg_operator_oid_index: 1024 total in 1 blocks; 352
free (0 chunks); 672 used
pg_opclass_oid_index: 1024 total in 1 blocks; 352 free
(0 chunks); 672 used
pg_opclass_am_name_nsp_index: 1024 total in 1 blocks;
216 free (0 chunks); 808 used
pg_namespace_oid_index: 1024 total in 1 blocks; 352
free (0 chunks); 672 used
pg_namespace_nspname_index: 1024 total in 1 blocks;
392 free (0 chunks); 632 used
pg_language_oid_index: 1024 total in 1 blocks; 392
free (0 chunks); 632 used
pg_language_name_index: 1024 total in 1 blocks; 392
free (0 chunks); 632 used
pg_inherits_relid_seqno_index: 1024 total in 1 blocks;
328 free (0 chunks); 696 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 352
free (0 chunks); 672 used
pg_authid_oid_index: 1024 total in 1 blocks; 352 free
(0 chunks); 672 used
pg_authid_rolname_index: 1024 total in 1 blocks; 392
free (0 chunks); 632 used
pg_database_oid_index: 1024 total in 1 blocks; 352
free (0 chunks); 672 used
pg_conversion_oid_index: 1024 total in 1 blocks; 392
free (0 chunks); 632 used
pg_conversion_name_nsp_index: 1024 total in 1 blocks;
328 free (0 chunks); 696 used
pg_conversion_default_index: 1024 total in 1 blocks;
192 free (0 chunks); 832 used
pg_class_relname_nsp_index: 1024 total in 1 blocks;
328 free (0 chunks); 696 used
pg_class_oid_index: 1024 total in 1 blocks; 352 free
(0 chunks); 672 used
pg_cast_source_target_index: 1024 total in 1 blocks;
288 free (0 chunks); 736 used
pg_attribute_relid_attnum_index: 1024 total in 1
blocks; 288 free (0 chunks); 736 used
pg_attribute_relid_attnam_index: 1024 total in 1
blocks; 328 free (0 chunks); 696 used
pg_amproc_opc_proc_index: 1024 total in 1 blocks; 216
free (0 chunks); 808 used
pg_amop_opr_opc_index: 1024 total in 1 blocks; 288
free (0 chunks); 736 used
pg_amop_opc_strat_index: 1024 total in 1 blocks; 216
free (0 chunks); 808 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 392
free (0 chunks); 632 used
Per-database table: 122880 total in 4 blocks; 44680
free (19 chunks); 78200 used
Per-database table: 24576 total in 2 blocks; 13040
free (5 chunks); 11536 used
Per-database table: 24576 total in 2 blocks; 13040
free (5 chunks); 11536 used
Per-database table: 24576 total in 2 blocks; 13040
free (5 chunks); 11536 used
Databases hash: 8192 total in 1 blocks; 4936 free (0
chunks); 3256 used
MdSmgr: 8192 total in 1 blocks; 8056 free (1 chunks);
136 used
LOCALLOCK hash: 8192 total in 1 blocks; 3912 free (0
chunks); 4280 used
Timezones: 48616 total in 2 blocks; 5968 free (0
chunks); 42648 used
Postmaster: 24576 total in 2 blocks; 13576 free (123
chunks); 11000 used
ErrorContext: 8192 total in 1 blocks; 8176 free (11
chunks); 16 used




      ____________________________________________________________________________________
Park yourself in front of a world of choices in alternative vehicles. Visit the Yahoo! Auto Green Center.
http://autos.yahoo.com/green_center/

Re: Out of Memory - 8.2.4

From
Jeff Amiel
Date:
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Jeff Amiel <becauseimjeff@yahoo.com> writes:
> > is log_min_messages one of them that requires a
> > restart?
>
> No, SIGHUP (pg_ctl reload) should be sufficient.

Weird....
looks like some items are going to syslog and some to
my defined postgres logfile (from -L option).
Bizarre.
Anyway....I hope this helps someone.....

At 10:46, I find this in my syslog files..

Aug 27 10:46:01 db-1 postgres[27008]: [ID 748848
local0.warning] [85355-1] 2007-08-27 10:46:01.888 CDT
  27008ERROR:  out of memory
Aug 27 10:46:01 db-1 postgres[27008]: [ID 748848
local0.warning] [85355-2] 2007-08-27 10:46:01.888 CDT
  27008DETAIL:  Failed on request of size 536870910.

and at the same time in my postgres logfile I see this
(and only this)

TopMemoryContext: 169608 total in 10 blocks; 18832
free (34 chunks); 150776 used
TopTransactionContext: 8192 total in 1 blocks; 7648
free (9 chunks); 544 used
CFuncHash: 8192 total in 1 blocks; 4936 free (0
chunks); 3256 used
Type information cache: 8192 total in 1 blocks; 1800
free (0 chunks); 6392 used
Operator class cache: 8192 total in 1 blocks; 4872
free (0 chunks); 3320 used
Autovacuum context: 8192 total in 1 blocks; 5416 free
(8 chunks); 2776 used
smgr relation table: 8192 total in 1 blocks; 2808 free
(0 chunks); 5384 used
TransactionAbortContext: 32768 total in 1 blocks;
32752 free (0 chunks); 16 used
Portal hash: 8192 total in 1 blocks; 3912 free (0
chunks); 4280 used
PortalMemory: 0 total in 0 blocks; 0 free (0 chunks);
0 used
Relcache by OID: 8192 total in 1 blocks; 3376 free (0
chunks); 4816 used
CacheMemoryContext: 659000 total in 19 blocks; 264904
free (15 chunks); 394096 used
sl_seqlog_idx: 1024 total in 1 blocks; 256 free (0
chunks); 768 used
PartInd_istream_replication_cluster_sl_log_1-node-1:
1024 total in 1 blocks; 392 free (0 chunks); 632 used
sl_log_1_idx1: 1024 total in 1 blocks; 256 free (0
chunks); 768 used
pg_index_indrelid_index: 1024 total in 1 blocks; 352
free (0 chunks); 672 used
pg_attrdef_adrelid_adnum_index: 1024 total in 1
blocks; 288 free (0 chunks); 736 used
pg_autovacuum_vacrelid_index: 1024 total in 1 blocks;
392 free (0 chunks); 632 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 328
free (0 chunks); 696 used
pg_type_oid_index: 1024 total in 1 blocks; 352 free (0
chunks); 672 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1
blocks; 288 free (0 chunks); 736 used
pg_statistic_relid_att_index: 1024 total in 1 blocks;
288 free (0 chunks); 736 used
pg_auth_members_member_role_index: 1024 total in 1
blocks; 328 free (0 chunks); 696 used
pg_auth_members_role_member_index: 1024 total in 1
blocks; 328 free (0 chunks); 696 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks;
328 free (0 chunks); 696 used
pg_proc_proname_args_nsp_index: 1024 total in 1
blocks; 256 free (0 chunks); 768 used
pg_proc_oid_index: 1024 total in 1 blocks; 352 free (0
chunks); 672 used
pg_operator_oprname_l_r_n_index: 1024 total in 1
blocks; 192 free (0 chunks); 832 used
pg_operator_oid_index: 1024 total in 1 blocks; 352
free (0 chunks); 672 used
pg_opclass_oid_index: 1024 total in 1 blocks; 352 free
(0 chunks); 672 used
pg_opclass_am_name_nsp_index: 1024 total in 1 blocks;
216 free (0 chunks); 808 used
pg_namespace_oid_index: 1024 total in 1 blocks; 352
free (0 chunks); 672 used
pg_namespace_nspname_index: 1024 total in 1 blocks;
392 free (0 chunks); 632 used
pg_language_oid_index: 1024 total in 1 blocks; 392
free (0 chunks); 632 used
pg_language_name_index: 1024 total in 1 blocks; 392
free (0 chunks); 632 used
pg_inherits_relid_seqno_index: 1024 total in 1 blocks;
328 free (0 chunks); 696 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 352
free (0 chunks); 672 used
pg_authid_oid_index: 1024 total in 1 blocks; 352 free
(0 chunks); 672 used
pg_authid_rolname_index: 1024 total in 1 blocks; 392
free (0 chunks); 632 used
pg_database_oid_index: 1024 total in 1 blocks; 352
free (0 chunks); 672 used
pg_conversion_oid_index: 1024 total in 1 blocks; 392
free (0 chunks); 632 used
pg_conversion_name_nsp_index: 1024 total in 1 blocks;
328 free (0 chunks); 696 used
pg_conversion_default_index: 1024 total in 1 blocks;
192 free (0 chunks); 832 used
pg_class_relname_nsp_index: 1024 total in 1 blocks;
328 free (0 chunks); 696 used
pg_class_oid_index: 1024 total in 1 blocks; 352 free
(0 chunks); 672 used
pg_cast_source_target_index: 1024 total in 1 blocks;
288 free (0 chunks); 736 used
pg_attribute_relid_attnum_index: 1024 total in 1
blocks; 288 free (0 chunks); 736 used
pg_attribute_relid_attnam_index: 1024 total in 1
blocks; 328 free (0 chunks); 696 used
pg_amproc_opc_proc_index: 1024 total in 1 blocks; 216
free (0 chunks); 808 used
pg_amop_opr_opc_index: 1024 total in 1 blocks; 288
free (0 chunks); 736 used
pg_amop_opc_strat_index: 1024 total in 1 blocks; 216
free (0 chunks); 808 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 392
free (0 chunks); 632 used
Per-database table: 122880 total in 4 blocks; 44680
free (19 chunks); 78200 used
Per-database table: 24576 total in 2 blocks; 13040
free (5 chunks); 11536 used
Per-database table: 24576 total in 2 blocks; 13040
free (5 chunks); 11536 used
Per-database table: 24576 total in 2 blocks; 13040
free (5 chunks); 11536 used
Databases hash: 8192 total in 1 blocks; 4936 free (0
chunks); 3256 used
MdSmgr: 8192 total in 1 blocks; 8056 free (1 chunks);
136 used
LOCALLOCK hash: 8192 total in 1 blocks; 3912 free (0
chunks); 4280 used
Timezones: 48616 total in 2 blocks; 5968 free (0
chunks); 42648 used
Postmaster: 24576 total in 2 blocks; 13576 free (123
chunks); 11000 used
ErrorContext: 8192 total in 1 blocks; 8176 free (11
chunks); 16 used





____________________________________________________________________________________
Yahoo! oneSearch: Finally, mobile search
that gives answers, not web links.
http://mobile.yahoo.com/mobileweb/onesearch?refer=1ONXIC

Re: Out of Memory - 8.2.4

From
Jeff Amiel
Date:
--- "Joshua D. Drake" <jd@commandprompt.com> wrote:

> We are actually diagnosing a similar problem on this
> end, where we get a
> failure at 1920... I am currently trying to get some
> DEBUG output.

Tracking for last few days.
Does not appear to happen when little or no user
activity (like Saturday)  I don't know if that rules
out autovacuum or not (if no update threshholds are
reached, no vacuuming will take place anyway)

Aug 23 11:11:51 db-1 postgres[8455]: [ID 748848
local0.warning] [2-1] 2007-08-23 11:11:51 CDT   ERROR:
 out of memory
Aug 23 11:11:51 db-1 postgres[8455]: [ID 748848
local0.warning] [2-2] 2007-08-23 11:11:51 CDT
DETAIL:  Failed on request of size 536870910.
--
Aug 23 11:34:49 db-1 postgres[8910]: [ID 748848
local0.warning] [2-1] 2007-08-23 11:34:49 CDT   ERROR:
 out of memory
Aug 23 11:34:49 db-1 postgres[8910]: [ID 748848
local0.warning] [2-2] 2007-08-23 11:34:49 CDT
DETAIL:  Failed on request of size 536870910.
--
Aug 23 12:06:47 db-1 postgres[9562]: [ID 748848
local0.warning] [2-1] 2007-08-23 12:06:47 CDT   ERROR:
 out of memory
Aug 23 12:06:47 db-1 postgres[9562]: [ID 748848
local0.warning] [2-2] 2007-08-23 12:06:47 CDT
DETAIL:  Failed on request of size 536870910.
--
Aug 23 12:58:47 db-1 postgres[10617]: [ID 748848
local0.warning] [2-1] 2007-08-23 12:58:47 CDT   ERROR:
 out of memory
Aug 23 12:58:47 db-1 postgres[10617]: [ID 748848
local0.warning] [2-2] 2007-08-23 12:58:47 CDT
DETAIL:  Failed on request of size 536870910.
--
Aug 23 15:15:35 db-1 postgres[13400]: [ID 748848
local0.warning] [2-1] 2007-08-23 15:15:35 CDT   ERROR:
 out of memory
Aug 23 15:15:35 db-1 postgres[13400]: [ID 748848
local0.warning] [2-2] 2007-08-23 15:15:35 CDT
DETAIL:  Failed on request of size 536870910.
--
Aug 23 16:50:47 db-1 postgres[15422]: [ID 748848
local0.warning] [2-1] 2007-08-23 16:50:47 CDT   ERROR:
 out of memory
Aug 23 16:50:47 db-1 postgres[15422]: [ID 748848
local0.warning] [2-2] 2007-08-23 16:50:47 CDT
DETAIL:  Failed on request of size 536870910.
--
Aug 24 10:46:46 db-1 postgres[10508]: [ID 748848
local0.warning] [2-1] 2007-08-24 10:46:46 CDT   ERROR:
 out of memory
Aug 24 10:46:46 db-1 postgres[10508]: [ID 748848
local0.warning] [2-2] 2007-08-24 10:46:46 CDT
DETAIL:  Failed on request of size 536870910.
--
Aug 24 11:29:00 db-1 postgres[11539]: [ID 748848
local0.warning] [2-1] 2007-08-24 11:29:00 CDT   ERROR:
 out of memory
Aug 24 11:29:00 db-1 postgres[11539]: [ID 748848
local0.warning] [2-2] 2007-08-24 11:29:00 CDT
DETAIL:  Failed on request of size 536870910.
--
Aug 24 11:50:04 db-1 postgres[12051]: [ID 748848
local0.warning] [2-1] 2007-08-24 11:50:04 CDT   ERROR:
 out of memory
Aug 24 11:50:04 db-1 postgres[12051]: [ID 748848
local0.warning] [2-2] 2007-08-24 11:50:04 CDT
DETAIL:  Failed on request of size 536870910.
--
Aug 24 12:00:33 db-1 postgres[12310]: [ID 748848
local0.warning] [2-1] 2007-08-24 12:00:33 CDT   ERROR:
 out of memory
Aug 24 12:00:33 db-1 postgres[12310]: [ID 748848
local0.warning] [2-2] 2007-08-24 12:00:33 CDT
DETAIL:  Failed on request of size 536870910.
--
Aug 24 16:03:19 db-1 postgres[18263]: [ID 748848
local0.warning] [2493-1] 2007-08-24 16:03:19.296 CDT
 18263ERROR:  out of memory
Aug 24 16:03:19 db-1 postgres[18263]: [ID 748848
local0.warning] [2493-2] 2007-08-24 16:03:19.296 CDT
 18263DETAIL:  Failed on request of size 536870910.
--
Aug 24 16:45:46 db-1 postgres[19313]: [ID 748848
local0.warning] [3356-1] 2007-08-24 16:45:46.804 CDT
 19313ERROR:  out of memory
Aug 24 16:45:46 db-1 postgres[19313]: [ID 748848
local0.warning] [3356-2] 2007-08-24 16:45:46.804 CDT
 19313DETAIL:  Failed on request of size 536870910.
--
Aug 24 17:29:16 db-1 postgres[20379]: [ID 748848
local0.warning] [4238-1] 2007-08-24 17:29:16.926 CDT
 20379ERROR:  out of memory
Aug 24 17:29:16 db-1 postgres[20379]: [ID 748848
local0.warning] [4238-2] 2007-08-24 17:29:16.926 CDT
 20379DETAIL:  Failed on request of size 536870910.
--
Aug 24 17:40:02 db-1 postgres[20651]: [ID 748848
local0.warning] [4452-1] 2007-08-24 17:40:02.682 CDT
 20651ERROR:  out of memory
Aug 24 17:40:02 db-1 postgres[20651]: [ID 748848
local0.warning] [4452-2] 2007-08-24 17:40:02.682 CDT
 20651DETAIL:  Failed on request of size 536870910.
--
Aug 26 11:14:56 db-1 postgres[22161]: [ID 748848
local0.warning] [56115-1] 2007-08-26 11:14:56.077 CDT
  22161ERROR:  out of memory
Aug 26 11:14:56 db-1 postgres[22161]: [ID 748848
local0.warning] [56115-2] 2007-08-26 11:14:56.077 CDT
  22161DETAIL:  Failed on request of size 536870910.
--
Aug 26 11:27:41 db-1 postgres[22477]: [ID 748848
local0.warning] [56381-1] 2007-08-26 11:27:41.141 CDT
  22477ERROR:  out of memory
Aug 26 11:27:41 db-1 postgres[22477]: [ID 748848
local0.warning] [56381-2] 2007-08-26 11:27:41.141 CDT
  22477DETAIL:  Failed on request of size 536870910.
--
Aug 26 11:37:27 db-1 postgres[22729]: [ID 748848
local0.warning] [56603-1] 2007-08-26 11:37:27.476 CDT
  22729ERROR:  out of memory
Aug 26 11:37:27 db-1 postgres[22729]: [ID 748848
local0.warning] [56603-2] 2007-08-26 11:37:27.476 CDT
  22729DETAIL:  Failed on request of size 536870910.
--
Aug 26 13:02:47 db-1 postgres[24831]: [ID 748848
local0.warning] [58357-1] 2007-08-26 13:02:47.721 CDT
  24831ERROR:  out of memory
Aug 26 13:02:47 db-1 postgres[24831]: [ID 748848
local0.warning] [58357-2] 2007-08-26 13:02:47.721 CDT
  24831DETAIL:  Failed on request of size 536870910.
--
Aug 26 14:15:54 db-1 postgres[26625]: [ID 748848
local0.warning] [59885-1] 2007-08-26 14:15:54.583 CDT
  26625ERROR:  out of memory
Aug 26 14:15:54 db-1 postgres[26625]: [ID 748848
local0.warning] [59885-2] 2007-08-26 14:15:54.583 CDT
  26625DETAIL:  Failed on request of size 536870910.
--
Aug 26 14:38:10 db-1 postgres[27167]: [ID 748848
local0.warning] [60334-1] 2007-08-26 14:38:10.817 CDT
  27167ERROR:  out of memory
Aug 26 14:38:10 db-1 postgres[27167]: [ID 748848
local0.warning] [60334-2] 2007-08-26 14:38:10.817 CDT
  27167DETAIL:  Failed on request of size 536870910.
--
Aug 26 14:57:42 db-1 postgres[27662]: [ID 748848
local0.warning] [60748-1] 2007-08-26 14:57:42.690 CDT
  27662ERROR:  out of memory
Aug 26 14:57:42 db-1 postgres[27662]: [ID 748848
local0.warning] [60748-2] 2007-08-26 14:57:42.690 CDT
  27662DETAIL:  Failed on request of size 536870910.
--
Aug 26 17:25:41 db-1 postgres[1352]: [ID 748848
local0.warning] [63840-1] 2007-08-26 17:25:41.189 CDT
  1352ERROR:  out of memory
Aug 26 17:25:41 db-1 postgres[1352]: [ID 748848
local0.warning] [63840-2] 2007-08-26 17:25:41.189 CDT
  1352DETAIL:  Failed on request of size 536870910.
--
Aug 26 18:10:21 db-1 postgres[2467]: [ID 748848
local0.warning] [64756-1] 2007-08-26 18:10:21.684 CDT
  2467ERROR:  out of memory
Aug 26 18:10:21 db-1 postgres[2467]: [ID 748848
local0.warning] [64756-2] 2007-08-26 18:10:21.684 CDT
  2467DETAIL:  Failed on request of size 536870910.
--
Aug 26 18:42:15 db-1 postgres[3246]: [ID 748848
local0.warning] [65420-1] 2007-08-26 18:42:15.973 CDT
  3246ERROR:  out of memory
Aug 26 18:42:15 db-1 postgres[3246]: [ID 748848
local0.warning] [65420-2] 2007-08-26 18:42:15.973 CDT
  3246DETAIL:  Failed on request of size 536870910.
--
Aug 27 08:05:48 db-1 postgres[23092]: [ID 748848
local0.warning] [82122-1] 2007-08-27 08:05:48.214 CDT
  23092ERROR:  out of memory
Aug 27 08:05:48 db-1 postgres[23092]: [ID 748848
local0.warning] [82122-2] 2007-08-27 08:05:48.214 CDT
  23092DETAIL:  Failed on request of size 536870910.
--
Aug 27 08:25:06 db-1 postgres[23569]: [ID 748848
local0.warning] [82520-1] 2007-08-27 08:25:06.407 CDT
  23569ERROR:  out of memory
Aug 27 08:25:06 db-1 postgres[23569]: [ID 748848
local0.warning] [82520-2] 2007-08-27 08:25:06.407 CDT
  23569DETAIL:  Failed on request of size 536870910.
--
Aug 27 08:38:05 db-1 postgres[23909]: [ID 748848
local0.warning] [82785-1] 2007-08-27 08:38:05.991 CDT
  23909ERROR:  out of memory
Aug 27 08:38:05 db-1 postgres[23909]: [ID 748848
local0.warning] [82785-2] 2007-08-27 08:38:05.991 CDT
  23909DETAIL:  Failed on request of size 536870910.
--
Aug 27 09:20:09 db-1 postgres[24945]: [ID 748848
local0.warning] [83640-1] 2007-08-27 09:20:09.331 CDT
  24945ERROR:  out of memory
Aug 27 09:20:09 db-1 postgres[24945]: [ID 748848
local0.warning] [83640-2] 2007-08-27 09:20:09.331 CDT
  24945DETAIL:  Failed on request of size 536870910.
--
Aug 27 09:30:08 db-1 postgres[25155]: [ID 748848
local0.warning] [83857-1] 2007-08-27 09:30:08.536 CDT
  25155ERROR:  out of memory
Aug 27 09:30:08 db-1 postgres[25155]: [ID 748848
local0.warning] [83857-2] 2007-08-27 09:30:08.536 CDT
  25155DETAIL:  Failed on request of size 536870910.
--
Aug 27 09:40:01 db-1 postgres[25396]: [ID 748848
local0.warning] [84040-1] 2007-08-27 09:40:01.195 CDT
  25396ERROR:  out of memory
Aug 27 09:40:01 db-1 postgres[25396]: [ID 748848
local0.warning] [84040-2] 2007-08-27 09:40:01.195 CDT
  25396DETAIL:  Failed on request of size 536870910.
--
Aug 27 09:53:16 db-1 postgres[25729]: [ID 748848
local0.warning] [84289-1] 2007-08-27 09:53:16.815 CDT
  25729ERROR:  out of memory
Aug 27 09:53:16 db-1 postgres[25729]: [ID 748848
local0.warning] [84289-2] 2007-08-27 09:53:16.815 CDT
  25729DETAIL:  Failed on request of size 536870910.
--
Aug 27 10:46:01 db-1 postgres[27008]: [ID 748848
local0.warning] [85355-1] 2007-08-27 10:46:01.888 CDT
  27008ERROR:  out of memory
Aug 27 10:46:01 db-1 postgres[27008]: [ID 748848
local0.warning] [85355-2] 2007-08-27 10:46:01.888 CDT
  27008DETAIL:  Failed on request of size 536870910.






____________________________________________________________________________________
Sick sense of humor? Visit Yahoo! TV's
Comedy with an Edge to see what's on, when.
http://tv.yahoo.com/collections/222

Re: Out of Memory - 8.2.4

From
Tom Lane
Date:
Jeff Amiel <becauseimjeff@yahoo.com> writes:
> Tracking for last few days.
> Does not appear to happen when little or no user
> activity (like Saturday)  I don't know if that rules
> out autovacuum or not (if no update threshholds are
> reached, no vacuuming will take place anyway)

Can you correlate these occurrences with anything in the regular system
logs (kernel log in particular)?  The Postgres log shows nothing out of
the ordinary --- it's simply that the kernel won't give us 512M for some
reason.  I'm guessing it's got something to do with overall system load.

            regards, tom lane

Re: Out of Memory - 8.2.4

From
Martijn van Oosterhout
Date:
On Mon, Aug 27, 2007 at 09:12:17AM -0700, Jeff Amiel wrote:
> Tracking for last few days.
> Does not appear to happen when little or no user
> activity (like Saturday)  I don't know if that rules
> out autovacuum or not (if no update threshholds are
> reached, no vacuuming will take place anyway)

I don't think I've seen it so far this thread, but what are your memory
overcommit settings and allocated swap? At least on Linux you would
need a significant chunk of swap to be able to work with that much
memory, even with overcommit off. Check the rules for your system.

Another thing I havn't seen mentioned: you appear to be on a 32-bit
architecture and with 2GB shared_buffers you've lost half your address
space on that alone. Perhaps you simply don't have enough contiguous
address space to alloc 512MB.

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

Re: Out of Memory - 8.2.4

From
Erik Jones
Date:
On Aug 27, 2007, at 12:15 PM, Martijn van Oosterhout wrote:

> On Mon, Aug 27, 2007 at 09:12:17AM -0700, Jeff Amiel wrote:
>> Tracking for last few days.
>> Does not appear to happen when little or no user
>> activity (like Saturday)  I don't know if that rules
>> out autovacuum or not (if no update threshholds are
>> reached, no vacuuming will take place anyway)
>
> I don't think I've seen it so far this thread, but what are your
> memory
> overcommit settings and allocated swap? At least on Linux you would
> need a significant chunk of swap to be able to work with that much
> memory, even with overcommit off. Check the rules for your system.
>
> Another thing I havn't seen mentioned: you appear to be on a 32-bit
> architecture and with 2GB shared_buffers you've lost half your address
> space on that alone. Perhaps you simply don't have enough contiguous
> address space to alloc 512MB.

The X4600 runs with 64-bit Dual Opterons.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



Re: Out of Memory - 8.2.4

From
"Joshua D. Drake"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Erik Jones wrote:
>
> On Aug 27, 2007, at 12:15 PM, Martijn van Oosterhout wrote:
>
>> On Mon, Aug 27, 2007 at 09:12:17AM -0700, Jeff Amiel wrote:
>>> Tracking for last few days.
>>> Does not appear to happen when little or no user
>>> activity (like Saturday)  I don't know if that rules
>>> out autovacuum or not (if no update threshholds are
>>> reached, no vacuuming will take place anyway)
>>
>> I don't think I've seen it so far this thread, but what are your memory
>> overcommit settings and allocated swap? At least on Linux you would
>> need a significant chunk of swap to be able to work with that much
>> memory, even with overcommit off. Check the rules for your system.
>>
>> Another thing I havn't seen mentioned: you appear to be on a 32-bit
>> architecture and with 2GB shared_buffers you've lost half your address
>> space on that alone. Perhaps you simply don't have enough contiguous
>> address space to alloc 512MB.
>
> The X4600 runs with 64-bit Dual Opterons.

The machine we are tracking this problem on is also 64bit.

Joshua D. Drake

>
> Erik Jones
>
> Software Developer | Emma®
> erik@myemma.com
> 800.595.4401 or 615.292.5888
> 615.292.0777 (fax)
>
> Emma helps organizations everywhere communicate & market in style.
> Visit us online at http://www.myemma.com
>
>
>


- --

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
            UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFG0xomATb/zqfZUUQRAnfUAJ4jQhMV9vEqL8I7zyT59qo0vhbxuACeLH9d
+PpbVOWYxMkrNC/+V4meHSs=
=DK8s
-----END PGP SIGNATURE-----

Re: Out of Memory - 8.2.4

From
Jeff Amiel
Date:
--- "Joshua D. Drake" <jd@commandprompt.com> wrote:
>
> The machine we are tracking this problem on is also 64bit.

Hmmmm.....looks like 3 different people are tracking a similar issue on 64 bit platforms.....you,
Erik and myself.








____________________________________________________________________________________Ready for the edge of your seat?
Check out tonight's top picks on Yahoo! TV.
http://tv.yahoo.com/

Re: Out of Memory - 8.2.4

From
Erik Jones
Date:
Yes, but fortunately for me, unfortunately for the list, it's only
happened to me once so I don't really have anything to go on wrt
repeating the problem.  I can only say, "Yep!  It's happened!"  I am
watching my db closely, though.  Well, my monitoring scripts are :)

On Aug 27, 2007, at 1:56 PM, Jeff Amiel wrote:

> --- "Joshua D. Drake" <jd@commandprompt.com> wrote:
>>
>> The machine we are tracking this problem on is also 64bit.
>
> Hmmmm.....looks like 3 different people are tracking a similar
> issue on 64 bit platforms.....you,
> Erik and myself.
>
>
>
>
>
>
>
>
> ______________________________________________________________________
> ______________Ready for the edge of your seat?
> Check out tonight's top picks on Yahoo! TV.
> http://tv.yahoo.com/
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that
> your
>        message can get through to the mailing list cleanly

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



Re: Out of Memory - 8.2.4

From
Jeff Amiel
Date:
I notice in the log entries for the out of memory events have no username, database name or host
identifier (while regular logged events do)  Does that mean anything to anybody?

Aug 28 08:25:50 db-1 postgres[29019]: [ID 748848 local0.warning] [111900-1] 2007-08-28
08:25:50.081 CDT    29019ERROR:  out of memory
Aug 28 08:25:50 db-1 postgres[29019]: [ID 748848 local0.warning] [111900-2] 2007-08-28
08:25:50.081 CDT    29019DETAIL:  Failed on request of size 536870910.

(regular log entry)
Aug 28 08:26:45 db-1 postgres[28785]: [ID 748848 local0.info] [114999-1] 2007-08-28 08:26:45.413
CDT jboss prod 192.168.20.44 28785LOG:  duration: 22606.146 ms  execute <unnamed>: select


--- Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Can you correlate these occurrences with anything in the regular system
> logs (kernel log in particular)?  The Postgres log shows nothing out of
> the ordinary --- it's simply that the kernel won't give us 512M for some
> reason.  I'm guessing it's got something to do with overall system load.
>
>             regards, tom lane
>




____________________________________________________________________________________
Moody friends. Drama queens. Your life? Nope! - their life, your story. Play Sims Stories at Yahoo! Games.
http://sims.yahoo.com/

Re: Out of Memory - 8.2.4

From
Alvaro Herrera
Date:
Jeff Amiel wrote:
> I notice in the log entries for the out of memory events have no username, database name or host
> identifier (while regular logged events do)  Does that mean anything to anybody?
>
> Aug 28 08:25:50 db-1 postgres[29019]: [ID 748848 local0.warning] [111900-1] 2007-08-28
> 08:25:50.081 CDT    29019ERROR:  out of memory
> Aug 28 08:25:50 db-1 postgres[29019]: [ID 748848 local0.warning] [111900-2] 2007-08-28
> 08:25:50.081 CDT    29019DETAIL:  Failed on request of size 536870910.
>
> (regular log entry)
> Aug 28 08:26:45 db-1 postgres[28785]: [ID 748848 local0.info] [114999-1] 2007-08-28 08:26:45.413
> CDT jboss prod 192.168.20.44 28785LOG:  duration: 22606.146 ms  execute <unnamed>: select

Interesting.  What's your log_line_prefix?  Does it have "%q" somewhere?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Out of Memory - 8.2.4

From
Tom Lane
Date:
Jeff Amiel <becauseimjeff@yahoo.com> writes:
> I notice in the log entries for the out of memory events have no username, database name or host
> identifier (while regular logged events do)  Does that mean anything to anybody?

Means they're coming from autovacuum, likely?

Autovacuum probably *should* populate the database field when
possible, but the other two are meaningless for it.

            regards, tom lane

Re: Out of Memory - 8.2.4

From
Jeff Amiel
Date:
--- Alvaro Herrera <alvherre@commandprompt.com> wrote:

>2.168.20.44 28785LOG:  duration: 22606.146 ms  execute <unnamed>: select
>
> Interesting.  What's your log_line_prefix?  Does it have "%q" somewhere?

No, no %q...not quite sure what it means: "stop here in non-session processes"



      ____________________________________________________________________________________
Fussy? Opinionated? Impossible to please? Perfect.  Join Yahoo!'s user panel and lay it on us.
http://surveylink.yahoo.com/gmrs/yahoo_panel_invite.asp?a=7 


Re: Out of Memory - 8.2.4

From
Erik Jones
Date:
On Aug 28, 2007, at 3:55 PM, Tom Lane wrote:

> Jeff Amiel <becauseimjeff@yahoo.com> writes:
>> I notice in the log entries for the out of memory events have no
>> username, database name or host
>> identifier (while regular logged events do)  Does that mean
>> anything to anybody?
>
> Means they're coming from autovacuum, likely?

Such was my original supposition given that the memory context output
that preceded the actual error in my log included a line for
"Autovacuum Context: ..."

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



Re: Out of Memory - 8.2.4

From
"Marko Kreen"
Date:
On 8/24/07, Jeff Amiel <becauseimjeff@yahoo.com> wrote:
> Over last 2 days, have spotted 10 "Out of Memory"
> errors in postgres logs (never saw before with same
> app/usage patterns on tuned hardware/postgres under
> FreeBSD)
>
> Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848
> local0.warning] [6-1] 2007-08-22 18:08:24 CDT   ERROR:
>  out of memory.
> Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848
> local0.warning] [6-2] 2007-08-22 18:08:24 CDT
> DETAIL:  Failed on request of size 536870910.
>
> What I found interesting is that It's ALWAYS the same
> size....536870910
>
> I am running autovacuum and slony.....but I see
> nothing in the logs anywhere near the "out of memory"
> errors related to either (autovacuum used to under
> 8.0.X log INFO messages every time it vacuumed which
> came in handy...I assume it doesn't so this any more?)
>
>
> The events are fairly spread out...and cannot (by
> looking at app logs and rest of DB logs) correlate to
> any specific query or activity.
>
> Any help would be appreciated

I've experienced something similar.  The reason turned out to be
combination of overcommit=off, big maint_mem and several parallel
vacuums for fast-changing tables.  Seems like VACUUM allocates
full maint_mem before start, whatever the actual size of the table.

Fix was to put "set maint_mem=32M" before small vacuums and
serialize some of them.

--
marko

Re: Out of Memory - 8.2.4

From
Erik Jones
Date:
On Aug 28, 2007, at 4:33 PM, Marko Kreen wrote:

> On 8/24/07, Jeff Amiel <becauseimjeff@yahoo.com> wrote:
>> Over last 2 days, have spotted 10 "Out of Memory"
>> errors in postgres logs (never saw before with same
>> app/usage patterns on tuned hardware/postgres under
>> FreeBSD)
>>
>> Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848
>> local0.warning] [6-1] 2007-08-22 18:08:24 CDT   ERROR:
>>  out of memory.
>> Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848
>> local0.warning] [6-2] 2007-08-22 18:08:24 CDT
>> DETAIL:  Failed on request of size 536870910.
>>
>> What I found interesting is that It's ALWAYS the same
>> size....536870910
>>
>> I am running autovacuum and slony.....but I see
>> nothing in the logs anywhere near the "out of memory"
>> errors related to either (autovacuum used to under
>> 8.0.X log INFO messages every time it vacuumed which
>> came in handy...I assume it doesn't so this any more?)
>>
>>
>> The events are fairly spread out...and cannot (by
>> looking at app logs and rest of DB logs) correlate to
>> any specific query or activity.
>>
>> Any help would be appreciated
>
> I've experienced something similar.  The reason turned out to be
> combination of overcommit=off, big maint_mem and several parallel
> vacuums for fast-changing tables.  Seems like VACUUM allocates
> full maint_mem before start, whatever the actual size of the table.
>
> Fix was to put "set maint_mem=32M" before small vacuums and
> serialize some of them.
>

I'm not sure about the OP but I know that we just run autovacuum so
no problem with parallel vacuums.  In addition, Solaris doesn't have
overcommit.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



Re: Out of Memory - 8.2.4

From
Alvaro Herrera
Date:
Marko Kreen escribió:

> I've experienced something similar.  The reason turned out to be
> combination of overcommit=off, big maint_mem and several parallel
> vacuums for fast-changing tables.  Seems like VACUUM allocates
> full maint_mem before start, whatever the actual size of the table.

Hmm.  Maybe we should have VACUUM estimate how much is the maximum
amount of memory that would be used, given the size of the table, and
allocate only that much.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Out of Memory - 8.2.4

From
Alvaro Herrera
Date:
Erik Jones escribió:
> On Aug 28, 2007, at 4:33 PM, Marko Kreen wrote:
>
>> I've experienced something similar.  The reason turned out to be
>> combination of overcommit=off, big maint_mem and several parallel
>> vacuums for fast-changing tables.  Seems like VACUUM allocates
>> full maint_mem before start, whatever the actual size of the table.
>>
>> Fix was to put "set maint_mem=32M" before small vacuums and
>> serialize some of them.
>
> I'm not sure about the OP but I know that we just run autovacuum so no
> problem with parallel vacuums.  In addition, Solaris doesn't have
> overcommit.

Note that as of 8.3 you will have vacuums running in parallel with
autovac ... 3 in the default config.  So keep an eye on that
maintenance_work_mem setting.

--
Alvaro Herrera                  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"XML!" Exclaimed C++.  "What are you doing here? You're not a programming
language."
"Tell that to the people who use me," said XML.

Re: Out of Memory - 8.2.4

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Marko Kreen escribi�:
>> I've experienced something similar.  The reason turned out to be
>> combination of overcommit=off, big maint_mem and several parallel
>> vacuums for fast-changing tables.  Seems like VACUUM allocates
>> full maint_mem before start, whatever the actual size of the table.

> Hmm.  Maybe we should have VACUUM estimate how much is the maximum
> amount of memory that would be used, given the size of the table, and
> allocate only that much.

Yeah --- given the likelihood of parallel vacuum activity in 8.3,
it'd be good to not expend memory we certainly aren't going to need.

We could set a hard limit at RelationGetNumberOfBlocks *
MaxHeapTuplesPerPage TIDs, but that is *extremely* conservative
(it'd work out to allocating about a quarter of the table's actual size
in bytes, if I did the math right).

Given that the worst-case consequence is extra index vacuum passes,
which don't hurt that much when a table is small, maybe some smaller
estimate like 100 TIDs per page would be enough.  Or, instead of
using a hard-wired constant, look at pg_class.reltuples/relpages
to estimate the average tuple density ...

            regards, tom lane

Re: Out of Memory - 8.2.4

From
Alvaro Herrera
Date:
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Marko Kreen escribi�:
> >> I've experienced something similar.  The reason turned out to be
> >> combination of overcommit=off, big maint_mem and several parallel
> >> vacuums for fast-changing tables.  Seems like VACUUM allocates
> >> full maint_mem before start, whatever the actual size of the table.
>
> > Hmm.  Maybe we should have VACUUM estimate how much is the maximum
> > amount of memory that would be used, given the size of the table, and
> > allocate only that much.
>
> Yeah --- given the likelihood of parallel vacuum activity in 8.3,
> it'd be good to not expend memory we certainly aren't going to need.
>
> We could set a hard limit at RelationGetNumberOfBlocks *
> MaxHeapTuplesPerPage TIDs, but that is *extremely* conservative
> (it'd work out to allocating about a quarter of the table's actual size
> in bytes, if I did the math right).

Another idea is to consider applying this patch:
http://thread.gmane.org/gmane.comp.db.postgresql.devel.patches/19384/focus=19393
which is said to reduce the amount of memory needed to store the TID
array.

> Given that the worst-case consequence is extra index vacuum passes,
> which don't hurt that much when a table is small, maybe some smaller
> estimate like 100 TIDs per page would be enough.  Or, instead of
> using a hard-wired constant, look at pg_class.reltuples/relpages
> to estimate the average tuple density ...

This sounds like a reasonable compromise.

--
Alvaro Herrera       Valdivia, Chile   ICBM: S 39º 49' 18.1", W 73º 13' 56.4"
Management by consensus: I have decided; you concede.
(Leonard Liu)

Re: Out of Memory - 8.2.4

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane escribió:
>> We could set a hard limit at RelationGetNumberOfBlocks *
>> MaxHeapTuplesPerPage TIDs, but that is *extremely* conservative
>> (it'd work out to allocating about a quarter of the table's actual size
>> in bytes, if I did the math right).

> Another idea is to consider applying this patch:
> http://thread.gmane.org/gmane.comp.db.postgresql.devel.patches/19384/focus=19393
> which is said to reduce the amount of memory needed to store the TID
> array.

Yeah, but that's orthogonal to the question of how many TID slots we need.

>> Given that the worst-case consequence is extra index vacuum passes,
>> which don't hurt that much when a table is small, maybe some smaller
>> estimate like 100 TIDs per page would be enough.  Or, instead of
>> using a hard-wired constant, look at pg_class.reltuples/relpages
>> to estimate the average tuple density ...

> This sounds like a reasonable compromise.

Do you want to make it happen?

            regards, tom lane

Re: Out of Memory - 8.2.4

From
Alvaro Herrera
Date:
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:

> >> Given that the worst-case consequence is extra index vacuum passes,
> >> which don't hurt that much when a table is small, maybe some smaller
> >> estimate like 100 TIDs per page would be enough.  Or, instead of
> >> using a hard-wired constant, look at pg_class.reltuples/relpages
> >> to estimate the average tuple density ...
>
> > This sounds like a reasonable compromise.
>
> Do you want to make it happen?

OK, I'll work on it tomorrow.

--
Alvaro Herrera                               http://www.PlanetPostgreSQL.org/
Hi! I'm a .signature virus!
cp me into your .signature file to help me spread!

Re: Out of Memory - 8.2.4

From
Alvaro Herrera
Date:
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:

> >> Given that the worst-case consequence is extra index vacuum passes,
> >> which don't hurt that much when a table is small, maybe some smaller
> >> estimate like 100 TIDs per page would be enough.  Or, instead of
> >> using a hard-wired constant, look at pg_class.reltuples/relpages
> >> to estimate the average tuple density ...
>
> > This sounds like a reasonable compromise.
>
> Do you want to make it happen?

I'm not having much luck really.  I think the problem is that ANALYZE
stores reltuples as the number of live tuples, so if you delete a big
portion of a big table, then ANALYZE and then VACUUM, there's a huge
misestimation and extra index cleanup passes happen, which is a bad
thing.

There seems to be no way to estimate the dead space, is there?  We could
go to pgstats but that seems backwards.

I was having a problem at first with estimating for small tables which
had no valid info in pg_class.reltuples, but I worked around that by
using MaxHeapTuplesPerPage.  (I was experimenting with the code that
estimates average tuple width in estimate_rel_size() but then figured it
was too much work.)  So this part is fine AFAICS.

I attach the patch I am playing with, and the simple test I've been
examining (on which I comment the ANALYZE on some runs, change the
conditions on the DELETE, put the CREATE INDEX before insertion instead
of after it, etc).

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Attachment

Re: Out of Memory - 8.2.4

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I'm not having much luck really.  I think the problem is that ANALYZE
> stores reltuples as the number of live tuples, so if you delete a big
> portion of a big table, then ANALYZE and then VACUUM, there's a huge
> misestimation and extra index cleanup passes happen, which is a bad
> thing.

Yeah ... so just go with a constant estimate of say 200 deletable tuples
per page?

            regards, tom lane

Re: Out of Memory - 8.2.4

From
"Marko Kreen"
Date:
On 8/29/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > I'm not having much luck really.  I think the problem is that ANALYZE
> > stores reltuples as the number of live tuples, so if you delete a big
> > portion of a big table, then ANALYZE and then VACUUM, there's a huge
> > misestimation and extra index cleanup passes happen, which is a bad
> > thing.
>
> Yeah ... so just go with a constant estimate of say 200 deletable tuples
> per page?

Note that it's much better to err on the smaller values.

Extra index pass is really no problem.  VACUUM getting
"Out of memory" may not sound like a big problem, but the scary
thing is - the last VACUUM's memory request may succeed and that
means following queries start failing and that is big problem.

--
marko

Re: Out of Memory - 8.2.4

From
Tom Lane
Date:
"Marko Kreen" <markokr@gmail.com> writes:
> Note that it's much better to err on the smaller values.

> Extra index pass is really no problem.

I beg to differ ...

What this may actually suggest is that autovacuum needs its own value of
"maintenance_work_mem", or that it should automatically divide the value
by the authorized number of workers.

            regards, tom lane

Re: Out of Memory - 8.2.4

From
Alvaro Herrera
Date:
Marko Kreen escribió:
> On 8/29/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alvaro Herrera <alvherre@commandprompt.com> writes:
> > > I'm not having much luck really.  I think the problem is that ANALYZE
> > > stores reltuples as the number of live tuples, so if you delete a big
> > > portion of a big table, then ANALYZE and then VACUUM, there's a huge
> > > misestimation and extra index cleanup passes happen, which is a bad
> > > thing.
> >
> > Yeah ... so just go with a constant estimate of say 200 deletable tuples
> > per page?
>
> Note that it's much better to err on the smaller values.
>
> Extra index pass is really no problem.

Humm, is it?  If you have a really big table (say, a hundred million
tuples) and two indexes then you are not happy when vacuum must make two
passes over the indexes.  It may mean vacuum taking five hours instead
of three with vacuum delay.  Remember, you must scan each index
*completely* each time.

> VACUUM getting "Out of memory" may not sound like a big problem, but
> the scary thing is - the last VACUUM's memory request may succeed and
> that means following queries start failing and that is big problem.

Maybe what we should do is spill the TID list to disk instead.  TODO for
8.4?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Out of Memory - 8.2.4

From
Alvaro Herrera
Date:
Tom Lane escribió:
> "Marko Kreen" <markokr@gmail.com> writes:
> > Note that it's much better to err on the smaller values.
>
> > Extra index pass is really no problem.
>
> I beg to differ ...
>
> What this may actually suggest is that autovacuum needs its own value of
> "maintenance_work_mem", or that it should automatically divide the value
> by the authorized number of workers.

Hmm, this makes sense.

--
Alvaro Herrera       Valdivia, Chile   ICBM: S 39º 49' 18.1", W 73º 13' 56.4"
"Cómo ponemos nuestros dedos en la arcilla del otro. Eso es la amistad; jugar
al alfarero y ver qué formas se pueden sacar del otro" (C. Halloway en
La Feria de las Tinieblas, R. Bradbury)

Re: Out of Memory - 8.2.4

From
Alvaro Herrera
Date:
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > I'm not having much luck really.  I think the problem is that ANALYZE
> > stores reltuples as the number of live tuples, so if you delete a big
> > portion of a big table, then ANALYZE and then VACUUM, there's a huge
> > misestimation and extra index cleanup passes happen, which is a bad
> > thing.
>
> Yeah ... so just go with a constant estimate of say 200 deletable tuples
> per page?

How about we use a constant estimate using the average tuple width code?

--
Alvaro Herrera                 http://www.amazon.com/gp/registry/CTMLCN8V17R4
"In fact, the basic problem with Perl 5's subroutines is that they're not
crufty enough, so the cruft leaks out into user-defined code instead, by
the Conservation of Cruft Principle."  (Larry Wall, Apocalypse 6)

Re: Out of Memory - 8.2.4

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane escribi�:
>> Yeah ... so just go with a constant estimate of say 200 deletable tuples
>> per page?

> How about we use a constant estimate using the average tuple width code?

I think that's overthinking the problem.  The point here is mostly for
vacuum to not consume 512MB (or whatever you have maintenance_work_mem
set to) when vacuuming a ten-page table.  I think that if we
significantly increase the risk of having to make multiple index passes
on medium-size tables, we'll not be doing anyone any favors.

If we went with allocating MaxHeapTuplesPerPage slots per page (292 in
CVS HEAD), 512MB would correspond to a bit over 300,000 pages, and you'd
get memory savings for anything less than that.  But that's already a
2GB table --- do you want to risk multiple index passes because you were
chintzy with your memory allocation?

Ultimately, the answer for a DBA who sees "out of memory" a lot is to
reduce his maintenance_work_mem.  I don't think VACUUM should be trying
to substitute for the DBA's judgment.

BTW, if an autovac worker gets an elog(ERROR) on one table, does it die
or continue on with the next table?

            regards, tom lane

Re: Out of Memory - 8.2.4

From
"Marko Kreen"
Date:
On 8/30/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Marko Kreen" <markokr@gmail.com> writes:
> > Note that it's much better to err on the smaller values.
>
> > Extra index pass is really no problem.
>
> I beg to differ ...

Well, if Postgres tries to cut down passes by using max memory
then admin is forced to cut down maint_mem for safety reasons...

> What this may actually suggest is that autovacuum needs its own value of
> "maintenance_work_mem", or that it should automatically divide the value
> by the authorized number of workers.

+1

Autovacuum having it's own value and cutting pieces from there
and giving to vacuums is probably the easiest path.

Then the per-backend maint_mem does not need to be large.

--
marko

Re: Out of Memory - 8.2.4

From
Alvaro Herrera
Date:
Tom Lane escribió:

> BTW, if an autovac worker gets an elog(ERROR) on one table, does it die
> or continue on with the next table?

It continues with the next table if interrupted (SIGINT), but the worker
exits on any other error.  I would ask you to review that code -- it's
in do_autovacuum, the PG_TRY block at the end.  It was committed in rev
1.52 of autovacuum.c.

--
Alvaro Herrera                        http://www.advogato.org/person/alvherre
"We are who we choose to be", sang the goldfinch
when the sun is high (Sandman)

Re: Out of Memory - 8.2.4

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane escribió:
>> BTW, if an autovac worker gets an elog(ERROR) on one table, does it die
>> or continue on with the next table?

> It continues with the next table if interrupted (SIGINT), but the worker
> exits on any other error.

Hmm, that seems backwards.  What's the rationale?

            regards, tom lane

Re: Out of Memory - 8.2.4

From
Alvaro Herrera
Date:
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Tom Lane escribió:
> >> BTW, if an autovac worker gets an elog(ERROR) on one table, does it die
> >> or continue on with the next table?
>
> > It continues with the next table if interrupted (SIGINT), but the worker
> > exits on any other error.
>
> Hmm, that seems backwards.  What's the rationale?

I don't remember :-(  We now use SIGTERM for shutdown, which closes the
worker for good.

It may well be that we need to change this so that the worker is shut
down on specific error codes, instead of continuing on some other codes.
Or it may need more thorough surgery.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Out of Memory - 8.2.4

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
>> VACUUM getting "Out of memory" may not sound like a big problem, but
>> the scary thing is - the last VACUUM's memory request may succeed and
>> that means following queries start failing and that is big problem.

> Maybe what we should do is spill the TID list to disk instead.  TODO for
> 8.4?

I'm inclined to think that that'd be counterproductive.  Searching the
TID list has to be *fast*, else the index cleanup scans will take
forever.  It's probably better to have a smaller list and do two index
passes than to try to do it in one pass using a list that doesn't fit in
memory --- in the former case you're at least doing a seqscan of the
index, rather than randomly faulting in different parts of the TID list.

It's fairly likely that 512MB is already too high and is causing some
swapping on the OP's machine ...

            regards, tom lane

Re: Out of Memory - 8.2.4

From
Alvaro Herrera
Date:
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> >> VACUUM getting "Out of memory" may not sound like a big problem, but
> >> the scary thing is - the last VACUUM's memory request may succeed and
> >> that means following queries start failing and that is big problem.
>
> > Maybe what we should do is spill the TID list to disk instead.  TODO for
> > 8.4?
>
> I'm inclined to think that that'd be counterproductive.  Searching the
> TID list has to be *fast*, else the index cleanup scans will take
> forever.  It's probably better to have a smaller list and do two index
> passes than to try to do it in one pass using a list that doesn't fit in
> memory --- in the former case you're at least doing a seqscan of the
> index, rather than randomly faulting in different parts of the TID list.

Maybe we could load it in a more compact form after the heap cleanup
pass, instead of a plain TID list.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Out of Memory - 8.2.4

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Maybe we could load it in a more compact form after the heap cleanup
> pass, instead of a plain TID list.

Hmm ... a non-lossy bitmap would be an interesting alternative.
I think I went with bsearch mainly because I knew I could generate
the TID list already sorted.  Somebody should try to measure the
probe time of the tidbitmap code against bsearch ...

            regards, tom lane

Re: Out of Memory - 8.2.4

From
Alvaro Herrera
Date:
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > I'm not having much luck really.  I think the problem is that ANALYZE
> > stores reltuples as the number of live tuples, so if you delete a big
> > portion of a big table, then ANALYZE and then VACUUM, there's a huge
> > misestimation and extra index cleanup passes happen, which is a bad
> > thing.
>
> Yeah ... so just go with a constant estimate of say 200 deletable tuples
> per page?

This seems the most reasonable approach for now.  So this is the patch.

--
Alvaro Herrera                               http://www.PlanetPostgreSQL.org/
"I call it GNU/Linux. Except the GNU/ is silent." (Ben Reiter)

Attachment

Re: Out of Memory - 8.2.4

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> !         /* no need to allocate more space than we have pages */
> !         maxtuples = LAZY_ALLOC_TUPLES * relblocks;

Uh ... you need to guard against integer overflow in this calculation.
Perhaps leave the current initialization alone, and then after the
Min-reduction steps add

        /* curious coding to ensure the multiplication can't overflow */
    if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > relblocks)
        maxtuples = relblocks * LAZY_ALLOC_TUPLES;

which is demonstrably overflow-proof once maxtuples has been clamped to
less than INT_MAX.

            regards, tom lane

Re: Out of Memory - 8.2.4

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Maybe we could load it in a more compact form after the heap cleanup
>> pass, instead of a plain TID list.
>
> Hmm ... a non-lossy bitmap would be an interesting alternative.
> I think I went with bsearch mainly because I knew I could generate
> the TID list already sorted.  Somebody should try to measure the
> probe time of the tidbitmap code against bsearch ...

Incidentally, there was a previous discussion about this a while back.
I can't seem to find it in the archives though.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: Out of Memory - 8.2.4

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> It continues with the next table if interrupted (SIGINT), but the worker
> exits on any other error.  I would ask you to review that code -- it's
> in do_autovacuum, the PG_TRY block at the end.  It was committed in rev
> 1.52 of autovacuum.c.

While looking at this I came across something I didn't like at all:

         * We somewhat ignore the risk that the launcher changes its PID
         * between we reading it and the actual kill; we expect ProcKill to be
         * called shortly after us, and we assume that PIDs are not reused too
         * quickly after a process exits.

I'm fairly sure that Windows has a bad habit of recycling PIDs almost
immediately.  I didn't actually read the code to see what the assumption
is for --- I just noticed this comment and it set off alarm bells.  Can
you rework the logic to not depend on PIDs at all?  (Perhaps the
"session IDs" that Florian's patch will create would serve instead?
I imagine those will be assigned during InitProcess, so they should
be available to identify individual autovac workers.)

            regards, tom lane

Re: Out of Memory - 8.2.4

From
Alvaro Herrera
Date:
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > It continues with the next table if interrupted (SIGINT), but the worker
> > exits on any other error.  I would ask you to review that code -- it's
> > in do_autovacuum, the PG_TRY block at the end.  It was committed in rev
> > 1.52 of autovacuum.c.
>
> While looking at this I came across something I didn't like at all:
>
>          * We somewhat ignore the risk that the launcher changes its PID
>          * between we reading it and the actual kill; we expect ProcKill to be
>          * called shortly after us, and we assume that PIDs are not reused too
>          * quickly after a process exits.
>
> I'm fairly sure that Windows has a bad habit of recycling PIDs almost
> immediately.  I didn't actually read the code to see what the assumption
> is for --- I just noticed this comment and it set off alarm bells.

Well, this is not much of a risk, because what's going on is that the
worker wants to signal the launcher.  So the launcher would need to shut
down for this to happen, which would be pretty rare on its own.  Also,
note that the time interval we're talking about is between one proc_exit
handler fires and the next.

Also, note that the worst thing that can happen is that the wrong
process gets a SIGUSR1 signal, and the launcher misses an opportunity
for starting another worker and rebalancing the vacuum cost parameters.

--
Alvaro Herrera                  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"No me acuerdo, pero no es cierto.  No es cierto, y si fuera cierto,
 no me acuerdo."                 (Augusto Pinochet a una corte de justicia)

Re: Out of Memory - 8.2.4

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Also, note that the worst thing that can happen is that the wrong
> process gets a SIGUSR1 signal, and the launcher misses an opportunity
> for starting another worker and rebalancing the vacuum cost parameters.

Hmmm ... okay, but I note that part of that assumption is that every
postgres-owned process either ignores SIGUSR1 or handles it in a fashion
such that an extra signal won't cause any Bad Things.  This is not
obvious, especially considering that the Unix default action for SIGUSR1
is abnormal process termination.  I'm starting to think that we need a
README somewhere collecting all the system's assumptions about signal
handling.

            regards, tom lane

Re: Out of Memory - 8.2.4

From
Alvaro Herrera
Date:
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > !         /* no need to allocate more space than we have pages */
> > !         maxtuples = LAZY_ALLOC_TUPLES * relblocks;
>
> Uh ... you need to guard against integer overflow in this calculation.

Done this way.  I applied it all the way back to 8.0, because 8.2 was
requested and supporting 8.1 in this way still seems necessary; and the
8.1 patch applied cleanly to 8.0 so I did that one too.  It didn't apply
cleanly to 7.4 though, but I don't think I've ever seen a complaint
about this with so old a version.  It's only a wording change in a
comment though, so it's trivial to apply it back if somebody feels it's
needed.

Sorry it didn't make it into 8.2.5 :-(

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support