Thread: Is my database now too big?

Is my database now too big?

From
"Darren Reed"
Date:
I'm starting to wonder if my database has finally grown too big for my
computer.

But, rather than perform badly, I see a number of "bad things" happen:

* pg_dump fails to run, causing an "out of memory" error (I don't
  understand why this happens, but it does);

* restarting postgres across a reboot results in tables disappearing
  without any error messages being given out (this is *really disturbing*.)

postgres itself, however, starts up fine.

"wc -l" of the output from the last successuful dump_all is around
8million
lines, spread across half a dozen or so tables.

I'm afraid that if I restructure the db to have smaller tables, I'd
just be putting off "hitting" the ceiling.

What can I safely do to get postgres running normally?

I'm currently using 8.1.3; will I fare any better with a more recent
version?

My end goal is to have maybe 50 million records.  Will postgresql
handle this (with only 1.5GB of RAM) or do I need to look elsewhere
for a db of this size?

Thoughts?

Cheers,
Darren

Re: Is my database now too big?

From
"Joshua D. Drake"
Date:
Darren Reed wrote:
> I'm starting to wonder if my database has finally grown too big for my
> computer.
>
> But, rather than perform badly, I see a number of "bad things" happen:
>
> * pg_dump fails to run, causing an "out of memory" error (I don't
> understand
>   why this happens, but it does);
>
> * restarting postgres across a reboot results in tables disappearing
>   without any error messages being given out (this is *really
>   disturbing*.)

Are you vacuuming?

>
> postgres itself, however, starts up fine.
>
> "wc -l" of the output from the last successuful dump_all is around
> 8million
> lines, spread across half a dozen or so tables.
>
> I'm afraid that if I restructure the db to have smaller tables, I'd just
> be
> putting off "hitting" the ceiling.
>
> What can I safely do to get postgres running normally?
>
> I'm currently using 8.1.3, will I fare any better with a more recent
> version?
>
> My end goal is to have maybe 50 million records.  Will postgresql handle
> this
> (with only 1.5GB of RAM) or do I need to look elsewhere for a db of this
> size?
>
> Thoughts?
>
> Cheers,
> Darren
>


Re: Is my database now too big?

From
"Darren Reed"
Date:
On Sat, 06 Oct 2007 23:05:23 -0700, "Joshua D. Drake"
<jd@commandprompt.com> said:
> Darren Reed wrote:
> > I'm starting to wonder if my database has finally grown too big for my
> > computer.
> >
> > But, rather than perform badly, I see a number of "bad things" happen:
> >
> > * pg_dump fails to run, causing an "out of memory" error (I don't
> > understand
> >   why this happens, but it does);
> >
> > * restarting postgres across a reboot results in tables disappearing
> >   without any error messages being given out (this is *really
> >   disturbing*.)
>
> Are you vacuuming?

No.  I'm just using the standard config.

Are you implying I should?

The problems I've come across have become evident with as little work as
this:
- run "initdb"
- start postgres
- run "psql" to import the dumped data
- run for a short period of time (less than an hour)
- shutdown postgres
- reboot
- postgres starts
- the large tables are now missing

With the above, I wouldn't have thought there is enough time to do
vacuums, which seem to take quite a while on large tables.  The
activity is a mixture of inserts and updates (from triggers and plsql
functions.)

Darren

Re: Is my database now too big?

From
"Scott Marlowe"
Date:
On 10/7/07, Darren Reed <darrenr+postgres@fastmail.net> wrote:
> On Sat, 06 Oct 2007 23:05:23 -0700, "Joshua D. Drake"
> <jd@commandprompt.com> said:
> > Darren Reed wrote:
> > > I'm starting to wonder if my database has finally grown too big for my
> > > computer.
> > >
> > > But, rather than perform badly, I see a number of "bad things" happen:
> > >
> > > * pg_dump fails to run, causing an "out of memory" error (I don't
> > > understand
> > >   why this happens, but it does);
> > >
> > > * restarting postgres across a reboot results in tables disappearing
> > >   without any error messages being given out (this is *really
> > >   disturbing*.)
> >
> > Are you vacuuming?
>
> No.  I'm just using the standard config
>
> If you're implying I should?
>
> The problems I've come across have become evident with as little work as
> this:
> - run "initdb"
> - start postgres
> - run "psql" to import the dumped data
> - run for a short period of time (less than an hour)
> - shutdown postgres
> - reboot
> - postgres starts
> - the large tables are now missing
>
> With the above, I wouldn't have thought there is enough time to do
> vacuums,
> which seem to take quite a while on large tables.  The activity is a
> mixture
> of inserts and updates (from triggers and plsql functions.)

Any reasonably modern version of pgsql should simply stop accepting
requests rather than suffering loss due to txid wraparound.  So, I can
think of two possibilities here: bad hardware or operator error.
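
You can check how far you are from transaction wraparound with
something like:

SELECT datname, age(datfrozenxid) FROM pg_database;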

Assuming you've checked out your machine thoroughly for bad hardware,
I can see a scenario where one does something like:

begin;
create table xyz;
load 10,000,000 rows
manipulate rows
shutdown db without committing
start database
voila, table xyz is gone, and rightly so.
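
In concrete SQL terms that would be something like this (table and
file names made up for the example):

BEGIN;
CREATE TABLE xyz (id integer, payload text);
COPY xyz FROM '/tmp/rows.dat';              -- load 10,000,000 rows
UPDATE xyz SET payload = upper(payload);    -- manipulate rows
-- server goes down here, before any COMMIT; after restart the table
-- does not exist, because the transaction that created it never
-- committed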

Got more detailed info on what you're doing?

There's no problem with a few million rows.  That's actually a pretty
small pgsql db.  I have a reporting db that grows by about 1-2 million
rows a month and takes up 40 gigs, and that's pretty small in the
pgsql universe.

Re: Is my database now too big?

From
"Scott Marlowe"
Date:
On 10/7/07, Darren Reed <darrenr@fastmail.net> wrote:
> Scott Marlowe wrote:
> > ...
> >
> > Any reasonably modern version of pgsql should simply stop accepting
> > requests rather than suffering loss due to txid wraparound.  So, I can
> > think of two possibilities here: bad hardware or operator error.
> >
> > Assuming you've checked out your machine thoroughly for bad hardware,
> > I can see a scenario where one does something like:
> >
> > begin;
> > create table xyz;
> > load 10,000,000 rows
> > manipulate rows
> > shutdown db without committing
> > start database
> > voila, table xyz is gone, and rightly so.
> >
> > Got more detailed info on what you're doing?
>
> That does describe what was happening (I haven't used BEGIN/COMMIT.)

then it isn't the same thing.  If you did a begin, then did everything
else without commit, the table would rightly disappear.

> Nothing very special, I thought...
>
> But, doing "SELECT * FROM ifl LIMIT 1;" causes postgres to grow its
> process to 2GB and then die because the OS ran out of swap!

I doubt that exact query is causing the db to run out of memory,
unless ifl is a complex view or something.

Can you be more specific on what exact query causes the problem to show up?

> Actually, this is a table that sees a lot of INSERT/DELETE (it's a place to
> store work to be done and bits get removed when completed) and I haven't
> been using BEGIN/COMMIT.  This is how postgres currently handles it:
>
> LOG:  database system was not properly shut down; automatic recovery in
> progress
> LOG:  record with zero length at 0/891157C8
> LOG:  redo is not required
> LOG:  database system is ready
> LOG:  transaction ID wrap limit is 2147484146, limited by database
> "postgres"
> LOG:  unexpected EOF on client connection
> LOG:  server process (PID 7212) was terminated by signal 9
> LOG:  terminating any other active server processes
> WARNING:  terminating connection because of crash of another server process

Looks like some query is running the server out of memory.  Normally,
postgresql will spill to disk if it needs more memory, unless it's
misconfigured.

> I'm modifying the work to use BEGIN/COMMIT, but the ifl table worries me...
> I can't seem to do anything with it that doesn't cause postgres to crap
> out ;(

begin/commit ain't the problem here.  Looks like you've either got
pgsql set to use too much memory or it's choosing a bad plan where it
thinks something will fit in memory but it won't.

Have you been analyzing your data before you start working on it?

Can we see your postgresql.conf file?

Re: Is my database now too big?

From
Tom Lane
Date:
"Darren Reed" <darrenr+postgres@fastmail.net> writes:
> * pg_dump fails to run, causing an "out of memory" error (I don't
> understand
>   why this happens, but it does);

Let's see the exact command to pg_dump and the exact failure message.
It might be useful to add -v so we can get some sense of where in the
process it fails.
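
For example, substituting your real database name:

pg_dump -v yourdb >/dev/null 2>pg_dump.log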

> * restarting postgres across a reboot results in tables disappearing
>   without any error messages being given out (this is *really
>   disturbing*.)

Hm, system reboot you mean?  To be honest I think this is likely pilot
error, along the lines of the postmaster you are running after the
reboot is using a different data directory than the one you were talking
to before.  Comparing the output of "SHOW data_directory;" before and
after might be illuminating.

I remember a vaguely similar incident awhile back in which it turned out
that the DBA had put the data directory on a soft-mounted NFS server,
which sometimes hadn't come up by the time Postgres started, and so
sometimes he got the NFS server's version of the database and sometimes
he got a version that was on the local disk underneath the NFS mount
point.  *That* was a mess :-(, and that was one reason why the PGDG
and Red Hat init scripts for PG no longer do an automatic initdb if
they don't find a valid data directory where it's supposed to be.

Exactly how are you starting the postmaster, anyway, and what is the
underlying platform here?  Did you roll your own Postgres build or
is this a prepackaged distribution?  If the latter, whose?

While I'm asking questions, is it just the tables that disappear or
is it all your database objects (including functions etc)?  Do
*all* your tables disappear or just some of them?  What do you
mean by "disappear" exactly --- do you get 'relation "foo" does not
exist', or some weirder error, or are the tables present but empty?

> "wc -l" of the output from the last successuful dump_all is around
> 8million lines, spread across half a dozen or so tables.

As already pointed out, this database is tiny.  I'm not sure what your
problem is, but I'm quite sure it's not "database is too big".

> I'm currently using 8.1.3, will I fare any better with a more recent
> version?

You should certainly be on 8.1.10 just on general principles, though
I don't immediately recall any bugs that seem like they could be
related.  There have been plenty of bugs fixed since 8.1.3 though.

            regards, tom lane

Re: Is my database now too big?

From
Darren Reed
Date:
Scott Marlowe wrote:
> On 10/7/07, Darren Reed <darrenr@fastmail.net> wrote:
> > Scott Marlowe wrote:
> > > ...
> > >
> > > Any reasonably modern version of pgsql should simply stop accepting
> > > requests rather than suffering loss due to txid wraparound.  So, I can
> > > think of two possibilities here: bad hardware or operator error.
> > >
> > > Assuming you've checked out your machine thoroughly for bad hardware,
> > > I can see a scenario where one does something like:
> > >
> > > begin;
> > > create table xyz;
> > > load 10,000,000 rows
> > > manipulate rows
> > > shutdown db without committing
> > > start database
> > > voila, table xyz is gone, and rightly so.
> > >
> > > Got more detailed info on what you're doing?
> >
> > That does describe what was happening (I haven't used BEGIN/COMMIT.)
>
> then it isn't the same thing.  If you did a begin, then did everything
> else without commit, the table would rightly disappear.
>

Right, I'm with you on that.
A few days ago I did:
pg_dumpall > foo
What I was doing yesterday was:
rm -rf /data/db/*
initdb -D /data/db
start
psql < foo
run for some period
stop
reboot
start
...tables have gone but disk space is still in use.
I don't know if it was during the period of running that the
database got corrupted (interrupted insert/update/query?)
or what happened.


> > Nothing very special, I thought...
> >
> > But, doing "SELECT * FROM ifl LIMIT 1;" causes postgres to grow its
> > process to 2GB and then die because the OS ran out of swap!
>
> I doubt that exact query is causing the db to run out of memory,
> unless ifl is a complex view or something.
>
> Can you be more specific on what exact query causes the problem to show up?
>

It turned out that _any_ query on that table caused the problem to show up.

I couldn't even do "DROP TABLE ifl;" without postgres growing until it
ran out of memory.

So in the end, I wiped it clean and reloaded the data - this time
bounding all of the work with BEGIN/COMMIT.  So far things are looking
better.  All of the data I've been building the tables with is
elsewhere, so I can reconstruct it.  Maybe adding BEGIN/COMMIT makes
no difference compared to not using them before, but I'm curious to
see if it does.  Ideally I'd like to get to a place where I don't need
to use vacuum at all.

> > Actually, this is a table that sees a lot of INSERT/DELETE (it's a place to
> > store work to be done and bits get removed when completed) and I haven't
> > been using BEGIN/COMMIT.  This is how postgres currently handles it:
> >
> > LOG:  database system was not properly shut down; automatic recovery in
> > progress
> > LOG:  record with zero length at 0/891157C8
> > LOG:  redo is not required
> > LOG:  database system is ready
> > LOG:  transaction ID wrap limit is 2147484146, limited by database
> > "postgres"
> > LOG:  unexpected EOF on client connection
> > LOG:  server process (PID 7212) was terminated by signal 9
> > LOG:  terminating any other active server processes
> > WARNING:  terminating connection because of crash of another server process
>
> Looks like some query is running the server out of memory.  Normally,
> postgresql will spill to disk if it needs more memory, unless it's
> misconfigured.
>

Yes.  I tried increasing the swap space but that just meant it grew
larger... from "limit":
datasize     3145728 kbytes

This is on NetBSD 4.99.  I ended up running with 3.5GB of swap and
1.5GB of RAM.


> > I'm modifying the work to use BEGIN/COMMIT, but the ifl table worries me...
> > I can't seem to do anything with it that doesn't cause postgres to crap
> > out ;(
>
> begin/commit ain't the problem here.  Looks like you've either got
> pgsql set to use too much memory or it's choosing a bad plan where it
> thinks something will fit in memory but it won't.
>

I have no other problems with any of the other tables and it is only a
small table (at the time it should have had less than 5000 rows.)

> Have you been analyzing your data before you start working on it?
>

No.

> Can we see your postgresql.conf file?
>

Sure, I've attached it.
I've also run with the "default" .conf file without tuning it (down.)

Darren

# -----------------------------
# PostgreSQL configuration file
# -----------------------------
#
# This file consists of lines of the form:
#
#   name = value
#
# (The '=' is optional.) White space may be used. Comments are introduced
# with '#' anywhere on a line. The complete list of option names and
# allowed values can be found in the PostgreSQL documentation. The
# commented-out settings shown in this file represent the default values.
#
# Please note that re-commenting a setting is NOT sufficient to revert it
# to the default value, unless you restart the postmaster.
#
# Any option can also be given as a command line switch to the
# postmaster, e.g. 'postmaster -c log_connections=on'. Some options
# can be changed at run-time with the 'SET' SQL command.
#
# This file is read on postmaster startup and when the postmaster
# receives a SIGHUP. If you edit the file on a running system, you have
# to SIGHUP the postmaster for the changes to take effect, or use
# "pg_ctl reload". Some settings, such as listen_addresses, require
# a postmaster shutdown and restart to take effect.


#---------------------------------------------------------------------------
# FILE LOCATIONS
#---------------------------------------------------------------------------

# The default values of these variables are driven from the -D command line
# switch or PGDATA environment variable, represented here as ConfigDir.

#data_directory = 'ConfigDir'        # use data in another directory
#hba_file = 'ConfigDir/pg_hba.conf'    # host-based authentication file
#ident_file = 'ConfigDir/pg_ident.conf'    # IDENT configuration file

# If external_pid_file is not explicitly set, no extra pid file is written.
#external_pid_file = '(none)'        # write an extra pid file


#---------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#---------------------------------------------------------------------------

# - Connection Settings -

#listen_addresses = 'localhost'        # what IP address(es) to listen on;
                    # comma-separated list of addresses;
                    # defaults to 'localhost', '*' = all
#port = 5432
max_connections = 15
# note: increasing max_connections costs ~400 bytes of shared memory per
# connection slot, plus lock space (see max_locks_per_transaction).  You
# might also need to raise shared_buffers to support more connections.
superuser_reserved_connections = 2
#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777        # octal
#bonjour_name = ''            # defaults to the computer name

# - Security & Authentication -

#authentication_timeout = 60        # 1-600, in seconds
#ssl = off
#password_encryption = on
#db_user_namespace = off

# Kerberos
#krb_server_keyfile = ''
#krb_srvname = 'postgres'
#krb_server_hostname = ''        # empty string matches any keytab entry
#krb_caseins_users = off

# - TCP Keepalives -
# see 'man 7 tcp' for details

#tcp_keepalives_idle = 0        # TCP_KEEPIDLE, in seconds;
                    # 0 selects the system default
#tcp_keepalives_interval = 0        # TCP_KEEPINTVL, in seconds;
                    # 0 selects the system default
#tcp_keepalives_count = 0        # TCP_KEEPCNT;
                    # 0 selects the system default


#---------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------

# - Memory -

shared_buffers = 2000            # min 16 or max_connections*2, 8KB each
temp_buffers = 200            # min 100, 8KB each
max_prepared_transactions = 5        # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 4096            # min 64, size in KB
maintenance_work_mem = 8192        # min 1024, size in KB
max_stack_depth = 400            # min 100, size in KB

# - Free Space Map -

max_fsm_pages = 20000        # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 200        # min 100, ~70 bytes each

# - Kernel Resource Usage -

#max_files_per_process = 25        # min 25
#preload_libraries = ''

# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0            # 0-1000 milliseconds
#vacuum_cost_page_hit = 1        # 0-10000 credits
#vacuum_cost_page_miss = 10        # 0-10000 credits
#vacuum_cost_page_dirty = 20        # 0-10000 credits
#vacuum_cost_limit = 200        # 0-10000 credits

# - Background writer -

#bgwriter_delay = 200            # 10-10000 milliseconds between rounds
#bgwriter_lru_percent = 1.0        # 0-100% of LRU buffers scanned/round
#bgwriter_lru_maxpages = 5        # 0-1000 buffers max written/round
#bgwriter_all_percent = 0.333        # 0-100% of all buffers scanned/round
#bgwriter_all_maxpages = 5        # 0-1000 buffers max written/round


#---------------------------------------------------------------------------
# WRITE AHEAD LOG
#---------------------------------------------------------------------------

# - Settings -

#fsync = on                # turns forced synchronization on or off
#wal_sync_method = fsync        # the default is the first option
                    # supported by the operating system:
                    #   open_datasync
                    #   fdatasync
                    #   fsync
                    #   fsync_writethrough
                    #   open_sync
#full_page_writes = on            # recover from partial page writes
#wal_buffers = 8            # min 4, 8KB each
#commit_delay = 0            # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000

# - Checkpoints -

#checkpoint_segments = 3        # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 300        # range 30-3600, in seconds
#checkpoint_warning = 30        # in seconds, 0 is off

# - Archiving -

#archive_command = ''            # command to use to archive a logfile
                    # segment


#---------------------------------------------------------------------------
# QUERY TUNING
#---------------------------------------------------------------------------

# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

effective_cache_size = 1000        # typically 8KB each
#random_page_cost = 4            # units are one sequential page fetch
                    # cost
#cpu_tuple_cost = 0.01            # (same)
#cpu_index_tuple_cost = 0.001        # (same)
#cpu_operator_cost = 0.0025        # (same)

# - Genetic Query Optimizer -

#geqo = on
#geqo_threshold = 12
#geqo_effort = 5            # range 1-10
#geqo_pool_size = 0            # selects default based on effort
#geqo_generations = 0            # selects default based on effort
#geqo_selection_bias = 2.0        # range 1.5-2.0

# - Other Planner Options -

#default_statistics_target = 10        # range 1-1000
#constraint_exclusion = off
#from_collapse_limit = 8
#join_collapse_limit = 8        # 1 disables collapsing of explicit
                    # JOINs


#---------------------------------------------------------------------------
# ERROR REPORTING AND LOGGING
#---------------------------------------------------------------------------

# - Where to Log -

#log_destination = 'stderr'        # Valid values are combinations of
                    # stderr, syslog and eventlog,
                    # depending on platform.

# This is used when logging to stderr:
#redirect_stderr = off            # Enable capturing of stderr into log
                    # files

# These are only used if redirect_stderr is on:
#log_directory = 'pg_log'        # Directory where log files are written
                    # Can be absolute or relative to PGDATA
#log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # Log file name pattern.
                    # Can include strftime() escapes
#log_truncate_on_rotation = off # If on, any existing log file of the same
                    # name as the new log file will be
                    # truncated rather than appended to. But
                    # such truncation only occurs on
                    # time-driven rotation, not on restarts
                    # or size-driven rotation. Default is
                    # off, meaning append to existing files
                    # in all cases.
#log_rotation_age = 1440        # Automatic rotation of logfiles will
                    # happen after so many minutes.  0 to
                    # disable.
#log_rotation_size = 10240        # Automatic rotation of logfiles will
                    # happen after so many kilobytes of log
                    # output.  0 to disable.

# These are relevant when logging to syslog:
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'


# - When to Log -

#client_min_messages = notice        # Values, in order of decreasing detail:
                    #   debug5
                    #   debug4
                    #   debug3
                    #   debug2
                    #   debug1
                    #   log
                    #   notice
                    #   warning
                    #   error

#log_min_messages = notice        # Values, in order of decreasing detail:
                    #   debug5
                    #   debug4
                    #   debug3
                    #   debug2
                    #   debug1
                    #   info
                    #   notice
                    #   warning
                    #   error
                    #   log
                    #   fatal
                    #   panic

#log_error_verbosity = default        # terse, default, or verbose messages

#log_min_error_statement = panic    # Values in order of increasing severity:
                     #   debug5
                    #   debug4
                    #   debug3
                    #   debug2
                    #   debug1
                     #   info
                    #   notice
                    #   warning
                    #   error
                    #   panic(off)

#log_min_duration_statement = -1    # -1 is disabled, 0 logs all statements
                    # and their durations, in milliseconds.

#silent_mode = off            # DO NOT USE without syslog or
                    # redirect_stderr

# - What to Log -

#debug_print_parse = off
#debug_print_rewritten = off
#debug_print_plan = off
#debug_pretty_print = off
#log_connections = off
#log_disconnections = off
#log_duration = off
#log_line_prefix = ''            # Special values:
                    #   %u = user name
                    #   %d = database name
                    #   %r = remote host and port
                    #   %h = remote host
                    #   %p = PID
                    #   %t = timestamp (no milliseconds)
                    #   %m = timestamp with milliseconds
                    #   %i = command tag
                    #   %c = session id
                    #   %l = session line number
                    #   %s = session start timestamp
                    #   %x = transaction id
                    #   %q = stop here in non-session
                    #        processes
                    #   %% = '%'
                    # e.g. '<%u%%%d> '
#log_statement = 'none'            # none, mod, ddl, all
#log_hostname = off


#---------------------------------------------------------------------------
# RUNTIME STATISTICS
#---------------------------------------------------------------------------

# - Statistics Monitoring -

#log_parser_stats = off
#log_planner_stats = off
#log_executor_stats = off
#log_statement_stats = off

# - Query/Index Statistics Collector -

#stats_start_collector = on
#stats_command_string = off
#stats_block_level = off
#stats_row_level = off
#stats_reset_on_server_start = off


#---------------------------------------------------------------------------
# AUTOVACUUM PARAMETERS
#---------------------------------------------------------------------------

#autovacuum = off            # enable autovacuum subprocess?
#autovacuum_naptime = 60        # time between autovacuum runs, in secs
#autovacuum_vacuum_threshold = 1000    # min # of tuple updates before
                    # vacuum
#autovacuum_analyze_threshold = 500    # min # of tuple updates before
                    # analyze
#autovacuum_vacuum_scale_factor = 0.4    # fraction of rel size before
                    # vacuum
#autovacuum_analyze_scale_factor = 0.2    # fraction of rel size before
                    # analyze
#autovacuum_vacuum_cost_delay = -1    # default vacuum cost delay for
                    # autovac, -1 means use
                    # vacuum_cost_delay
#autovacuum_vacuum_cost_limit = -1    # default vacuum cost limit for
                    # autovac, -1 means use
                    # vacuum_cost_limit


#---------------------------------------------------------------------------
# CLIENT CONNECTION DEFAULTS
#---------------------------------------------------------------------------

# - Statement Behavior -

#search_path = '$user,public'        # schema names
#default_tablespace = ''        # a tablespace name, '' uses
                    # the default
#check_function_bodies = on
#default_transaction_isolation = 'read committed'
#default_transaction_read_only = off
#statement_timeout = 0            # 0 is disabled, in milliseconds

# - Locale and Formatting -

#datestyle = 'iso, mdy'
#timezone = unknown            # actually, defaults to TZ
                    # environment setting
#australian_timezones = off
#extra_float_digits = 0            # min -15, max 2
#client_encoding = sql_ascii        # actually, defaults to database
                    # encoding

# These settings are initialized by initdb -- they might be changed
#lc_messages = 'C'            # locale for system error message
                    # strings
#lc_monetary = 'C'            # locale for monetary formatting
#lc_numeric = 'C'            # locale for number formatting
#lc_time = 'C'                # locale for time formatting

# - Other Defaults -

#explain_pretty_print = on
#dynamic_library_path = '$libdir'


#---------------------------------------------------------------------------
# LOCK MANAGEMENT
#---------------------------------------------------------------------------

#deadlock_timeout = 1000        # in milliseconds
#max_locks_per_transaction = 64        # min 10
# note: each lock table slot uses ~220 bytes of shared memory, and there are
# max_locks_per_transaction * (max_connections + max_prepared_transactions)
# lock table slots.


#---------------------------------------------------------------------------
# VERSION/PLATFORM COMPATIBILITY
#---------------------------------------------------------------------------

# - Previous Postgres Versions -

#add_missing_from = off
#regex_flavor = advanced        # advanced, extended, or basic
#sql_inheritance = on
#default_with_oids = off
#escape_string_warning = off

# - Other Platforms & Clients -

#transform_null_equals = off


#---------------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#---------------------------------------------------------------------------

#custom_variable_classes = ''        # list of custom variable class names

Re: Is my database now too big?

From
"Scott Marlowe"
Date:
On 10/7/07, Darren Reed <darrenr+postgres@fastmail.net> wrote:
> Scott Marlowe wrote:
> > On 10/7/07, Darren Reed <darrenr@fastmail.net> wrote:
> > > Scott Marlowe wrote:

> A few days ago I did:
> pg_dumpall > foo
> What I was doing yesterday was:
> rm -rf /data/db/*
> initdb -D /data/db
> start
> psql < foo
> run for some period
> stop
> reboot
> start
> ...tables have gone but disk space is still in use.
> I don't know if it was during the period of running that the
> database got corrupted (interrupted insert/update/query?)
> or what happened.

Are you sure postgresql was starting up in the /data/db directory
after reboot and not somewhere else like /var/lib/pgsql/data?

If you're definitely hitting the right directory, then is the database
shutting down cleanly on reboot?  It might be that it's getting killed
during a write and you've got some kind of problem with fsync on your
machine, so the db is getting corrupted.
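
It's at least worth confirming that fsync hasn't been turned off:

SHOW fsync;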

> > Can you be more specific on what exact query causes the problem to show up?
> >
>
> It turned out that _any_ query on that table caused the problem to show up.
>
> I couldn't even do "DROP TABLE ifl;" without postgres growing until it
> ran out of memory.

definitely sounds like some kind of issue other than just the size of the
table, like some kind of corruption.

>
> So in the end, I wiped it clean and reloaded the data - this time
> bounding all of the
> work with BEGIN/COMMIT.  So far things are looking better.  All of the
> data I've
> been building the tables with is elsewhere, so I can reconstruct it.
> Maybe adding
> BEGIN/COMMIT makes no difference to not using them before, but I'm curious
> to see if it does.  Ideally I'd like to get to a place where I don't
> need to use vacuum
> at all.

If nothing else, wrapping your load and building in begin;commit;
should make it faster.
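
i.e. something along the lines of:

BEGIN;
-- all of the INSERT / UPDATE / COPY work for the load goes here
COMMIT;

That way the whole load pays for one commit instead of one per
statement.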

> > Have you been analyzing your data before you start working on it?
> >
>
> No.

You should definitely run analyze after updating the table.  It helps
the query planner make the optimal choice for query plans.
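
e.g., for the table that's been giving you trouble:

ANALYZE ifl;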

>
> > Can we see your postgresql.conf file?
> >
>
> Sure, I've attached it.
> I've also run with the "default" .conf file without tuning it (down.)
>
> Darren
>
>

Nothing odd here:

> shared_buffers = 2000                   # min 16 or max_connections*2, 8KB each
> temp_buffers = 200                      # min 100, 8KB each
> max_prepared_transactions = 5           # can be 0 or more
> # note: increasing max_prepared_transactions costs ~600 bytes of shared memory
> # per transaction slot, plus lock space (see max_locks_per_transaction).
> work_mem = 4096                 # min 64, size in KB
> maintenance_work_mem = 8192             # min 1024, size in KB
> max_stack_depth = 400                   # min 100, size in KB
>
> # - Free Space Map -
>
> max_fsm_pages = 20000           # min max_fsm_relations*16, 6 bytes each
> max_fsm_relations = 200         # min 100, ~70 bytes each
>

This:
> effective_cache_size = 1000             # typically 8KB each
is a little low, but that's not a huge deal.  effective_cache_size
doesn't allocate anything; it just tells the planner roughly how much
memory the OS is using to cache your database.
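
On your 1.5GB machine, assuming most of that RAM ends up in the OS
cache, something more like this would be realistic:

effective_cache_size = 131072        # 8KB pages, i.e. about 1GB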

And I don't see anything else in your postgresql.conf that looks
suspicious.  I'm leaning towards possible pilot error in shutting down
or starting up the db.

Re: Is my database now too big?

From
Darren Reed
Date:
Scott Marlowe wrote:
> ...
>
> Any reasonably modern version of pgsql should simply stop accepting
> requests rather than suffering loss due to txid wraparound.  So, I can
> think of two possibilities here: bad hardware or operator error.
>
> Assuming you've checked out your machine thoroughly for bad hardware,
> I can see a scenario where one does something like:
>
> begin;
> create table xyz;
> load 10,000,000 rows
> manipulate rows
> shutdown db without committing
> start database
> voila, table xyz is gone, and rightly so.
>
> Got more detailed info on what you're doing?

That does describe what was happening (I haven't used BEGIN/COMMIT.)

Nothing very special, I thought...

But, doing "SELECT * FROM ifl LIMIT 1;" causes postgres to grow its
process to 2GB and then die because the OS ran out of swap!

Actually, this is a table that sees a lot of INSERT/DELETE (it's a place to
store work to be done and bits get removed when completed) and I haven't
been using BEGIN/COMMIT.  This is how postgres currently handles it:

LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  record with zero length at 0/891157C8
LOG:  redo is not required
LOG:  database system is ready
LOG:  transaction ID wrap limit is 2147484146, limited by database "postgres"
LOG:  unexpected EOF on client connection
LOG:  server process (PID 7212) was terminated by signal 9
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted at 2007-10-07 02:22:05 PDT
LOG:  checkpoint record is at 0/891157C8
LOG:  redo record is at 0/891157C8; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 2715; next OID: 24576
LOG:  next MultiXactId: 1; next MultiXactOffset: 0
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  record with zero length at 0/8911580C
LOG:  redo is not required
LOG:  database system is ready
LOG:  transaction ID wrap limit is 2147484146, limited by database "postgres"

I'm modifying the work to use BEGIN/COMMIT, but the ifl table worries me...
I can't seem to do anything with it that doesn't cause postgres to crap
out ;(

Darren


Re: Is my database now too big?

From
Darren Reed
Date:
Scott Marlowe wrote:
> On 10/7/07, Darren Reed <darrenr+postgres@fastmail.net> wrote:
> > Scott Marlowe wrote:
> > > On 10/7/07, Darren Reed <darrenr@fastmail.net> wrote:
> > > > Scott Marlowe wrote:
>
> > A few days ago I did:
> > pg_dumpall > foo
> > What I was doing yesterday was:
> > rm -rf /data/db/*
> > initdb -D /data/db
> > start
> > psql < foo
> > run for some period
> > stop
> > reboot
> > start
> > ...tables have gone but disk space is still in use.
> > I don't know if it was during the period of running that the
> > database got corrupted (interrupted insert/update/query?)
> > or what happened.
>
> Are you sure postgresql was starting up in the /data/db directory
> after reboot and not somewhere else like /var/lib/pgsql/data???
>
> IF you're definitely hitting the right directory, then Is the database
> shutting down cleanly on reboot?  It might be that it's getting killed
> during a write and you've got some kind of problem with fsync on your
> machine so the db is getting corrupted
>
> > > Can you be more specific on what exact query causes the problem to show up?
> > >
> >
> > It turned out that _any_ query on that table caused the problem to show up.
> >
> > I couldn't even do "DROP TABLE ifl;" without postgres growing until it
> > ran out of memory.
>
> definitely sounds like some kind of issue other than just the size of the
> table, like some kind of corruption.
>
> ...
> And I don't see anything else in your postgresql.conf that looks
> suspicious.  I'm leaning towards possible pilot error in shutting down
> or starting up the db.
>

Ok, I've had another recurrence of this problem.

The sequence of events was something like this:
CREATE TABLESPACE foo LOCATION '/data/index/ext';
<wait>
<machine hangs>
<reboot>
Of course postgresql didn't shut down cleanly because it was
naughty earlier and ate all my RAM, causing the box to hang.
Now I'm back to the prior problem: entire tables are missing
when postgresql starts back up again.  Obviously there is some
sort of corruption (caused by postgresql) and it isn't able to
recover properly.

So I'm moving quite squarely away from letting postgres gobble
up as much RAM as it desires.

Darren


Re: Is my database now too big?

From
Darren Reed
Date:
Darren Reed wrote:
> Scott Marlowe wrote:
>> On 10/7/07, Darren Reed <darrenr+postgres@fastmail.net> wrote:
>> > Scott Marlowe wrote:
>> > > On 10/7/07, Darren Reed <darrenr@fastmail.net> wrote:
>> > > > Scott Marlowe wrote:
>>
>> > A few days ago I did:
>> > pg_dumpall > foo
>> > What I was doing yesterday was:
>> > rm -rf /data/db/*
>> > initdb -D /data/db
>> > start
>> > psql < foo
>> > run for some period
>> > stop
>> > reboot
>> > start
>> > ...tables have gone but disk space is still in use.
>> > I don't know if it was during the period of running that the
>> > database got corrupted (interrupted insert/update/query?)
>> > or what happened.
>>
>> Are you sure postgresql was starting up in the /data/db directory
>> after reboot and not somewhere else like /var/lib/pgsql/data???
>>
>> IF you're definitely hitting the right directory, then Is the database
>> shutting down cleanly on reboot?  It might be that it's getting killed
>> during a write and you've got some kind of problem with fsync on your
>> machine so the db is getting corrupted
>>
>> > > Can you be more specific on what exact query causes the problem
>> to show up?
>> > >
>> >
>> > It turned out that _any_ query on that table caused the problem to
>> show up.
>> >
>> > I couldn't even do "DROP TABLE ifl;" without postgres growing until it
>> > ran out of memory.
>>
>> definitely sounds like some kind of issue other than just the size of the
>> table, like some kind of corruption.
>>
>> ...
>> And I don't see anything else in your postgresql.conf that looks
>> suspicious.  I'm leaning towards possible pilot error in shutting down
>> or starting up the db.
>>
>
> Ok, I've had another recurrence of this problem.
>
> The sequence of events was something like this:
> CREATE TABLESPACE foo LOCATION '/data/index/ext';
> <wait>
> <machine hangs>
> <reboot>
> Of course postgresql didn't shut down cleanly because it was
> naughty earlier and ate all my RAM, causing the box to hang.
> Now I'm back to the prior problem: entire tables are missing
> when postgresql starts back up again.  Obviously there is some
> sort of corruption (caused by postgresql) and it isn't able to
> recover properly.

I suppose the obvious question here is how do I restore the
missing tables?  The data is all there; is there some sort of
transaction log that can be unwound to restore visibility of
the missing tables?  Where do I start if I want to attempt some
forensics?  Or am I SOL?

Darren


Re: Is my database now too big?

From
Darren Reed
Date:
So having upgraded to 8.1.10, I thought I'd try out a few things...


# /usr/pkg/bin/pg_dumpall
--
-- PostgreSQL database cluster dump
--

\connect postgres

--
-- Roles
--

CREATE ROLE postgres;
ALTER ROLE postgres WITH SUPERUSER INHERIT CREATEROLE CREATEDB LOGIN;
CREATE ROLE root;
ALTER ROLE root WITH SUPERUSER INHERIT CREATEROLE CREATEDB LOGIN;






--
-- Database creation
--

REVOKE ALL ON DATABASE template1 FROM PUBLIC;
REVOKE ALL ON DATABASE template1 FROM postgres;
GRANT ALL ON DATABASE template1 TO postgres;


\connect postgres

pg_dump: SQL command failed
pg_dump: Error message from server: ERROR:  out of memory
DETAIL:  Failed on request of size 20.
pg_dump: The command was: LOCK TABLE public.foo IN ACCESS SHARE MODE
pg_dumpall: pg_dump failed on database "postgres", exiting
# SELECT * FROM foo LIMIT 1;
ERROR:  out of memory
DETAIL:  Failed on request of size 20.

I don't know if this is of any significance:
# \l
        List of databases
   Name    |  Owner   | Encoding
-----------+----------+-----------
 postgres  | postgres | SQL_ASCII
 template0 | postgres | SQL_ASCII
 template1 | postgres | SQL_ASCII
(3 rows)

# /usr/pkg/bin/psql -U postgres template1
psql: FATAL:  out of memory
DETAIL:  Failed on request of size 20.

What puzzles me is why the transaction log hasn't
resulted in postgresql being able to restore itself
to a known clean state.

Darren


Re: Is my database now too big?

From
Tom Lane
Date:
Darren Reed <darrenr@fastmail.net> writes:
> # /usr/pkg/bin/psql -U postgres template1
> psql: FATAL:  out of memory
> DETAIL:  Failed on request of size 20.

I'm starting to think there is something very broken about your machine :-(.
Have you run any hardware diagnostics on it lately?  The level of
flakiness you're seeing starts to suggest bad RAM to me.

Anyway, the above error should have also produced a map of per-context
memory usage in the postmaster log (ie, postmaster stderr).  If you
could show us that, it might be revealing.

            regards, tom lane

Re: Is my database now too big?

From
Darren Reed
Date:
Tom Lane wrote:
> Darren Reed <darrenr@fastmail.net> writes:
> > # /usr/pkg/bin/psql -U postgres template1
> > psql: FATAL:  out of memory
> > DETAIL:  Failed on request of size 20.
>
> I'm starting to think there is something very broken about your
> machine :-(.
> Have you run any hardware diagnostics on it lately?  The level of
> flakiness you're seeing starts to suggest bad RAM to me.
>

No, I haven't run any diagnostics.

But I'm not convinced the hardware is a problem, because the flakiness
has only really been a problem since I started doing more than just
inserts and updates.  The table that has shown the most problems (ifl)
is a table of work to do, so I'm inserting records, doing a select of
random items out of it and also deleting records (once the work is
complete.)  Multiple processes can be doing all of these
simultaneously, which should not be anything out of the ordinary.  Or
maybe this workload is just making the database stress the hardware
more?

> Anyway, the above error should have also produced a map of per-context
> memory usage in the postmaster log (ie, postmaster stderr).  If you
> could show us that, it might be revealing.
>

I think what you're referring to is this:

TopMemoryContext: 32768 total in 3 blocks; 7776 free (8 chunks); 24992 used
Operator class cache: 8192 total in 1 blocks; 4936 free (0 chunks); 3256 used
TopTransactionContext: 8192 total in 1 blocks; 7856 free (0 chunks); 336 used
MessageContext: 267378688 total in 43 blocks; 1768 free (45 chunks); 267376920 used
smgr relation table: 8192 total in 1 blocks; 3904 free (0 chunks); 4288 used
Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
PortalMemory: 8192 total in 1 blocks; 8176 free (1 chunks); 16 used
Relcache by OID: 8192 total in 1 blocks; 3896 free (0 chunks); 4296 used
CacheMemoryContext: 253952 total in 5 blocks; 11880 free (1 chunks); 242072 used
pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_type_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_statistic_relid_att_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_auth_members_member_role_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_auth_members_role_member_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_proc_proname_args_nsp_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_proc_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
pg_operator_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_opclass_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_namespace_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_namespace_nspname_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_language_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_language_name_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_inherits_relid_seqno_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_authid_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_authid_rolname_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_database_datname_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_conversion_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_conversion_name_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_conversion_default_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_class_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_cast_source_target_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_amproc_opc_proc_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_amop_opr_opc_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_amop_opc_strat_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_amop_opc_strat_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
MdSmgr: 8192 total in 1 blocks; 8152 free (0 chunks); 40 used
LockTable (locallock hash): 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
Timezones: 47592 total in 2 blocks; 5968 free (0 chunks); 41624 used
ErrorContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
ERROR:  out of memory
DETAIL:  Failed on request of size 20.
TopMemoryContext: 32768 total in 3 blocks; 7776 free (8 chunks); 24992 used

The standout problem is the "MessageContext" line: 267378688 bytes
(about 255MB) in a single context.

Darren

Re: Is my database now too big?

From
"Scott Marlowe"
Date:
On 10/15/07, Darren Reed <darrenr@fastmail.net> wrote:
> Tom Lane wrote:
> > Darren Reed <darrenr@fastmail.net> writes:
> > > # /usr/pkg/bin/psql -U postgres template1
> > > psql: FATAL:  out of memory
> > > DETAIL:  Failed on request of size 20.
> >
> > I'm starting to think there is something very broken about your machine :-(.
> > Have you run any hardware diagnostics on it lately?  The level of
> > flakiness you're seeing starts to suggest bad RAM to me.
> >
>
> No, I haven't run any diagnostics.
>
> But I'm not convinced the hardware is a problem because the flakiness
> has only really
> been a problem when I started doing more than just inserts and updates.
> The table that
> has shown the most problems (ifl) is a table of work to do, so I'm
> inserting records,
> doing a select of random items out of it and also deleting records (once
> the work is
> complete.)  Multiple processes can be trying to simultaneously be doing
> all of these,
> which should not be anything out of the ordinary.  Or maybe this
> workload is just making
> the database stress the hardware more?

So, I'm guessing you're doing something like:

select * from ifl order by random()

in several different threads?  That means that the table has to be
materialized twice in memory, and then most of the result thrown away.

Generally, the preferred way to do a random select of a large table is
to assign a random number to each row and then select a range based on
that number.  Say you have 100,000 rows: you assign the numbers 1
through 100,000 to the rows at random, then you select them using
something like a sequence to make sure that the processes aren't
bumping into each other.

So, if you're going to process 10 records at a time, you create a
sequence with an increment of 10 and use a select from it to get your
"random" row to operate on

Again, I'm kinda shooting in the dark here as you reveal more and more
of what you are doing a little at a time.  A test case that can invoke
this failure would be most useful.

Re: Is my database now too big?

From
Darren Reed
Date:
Tom Lane wrote:
> Darren Reed <darrenr@fastmail.net> writes:
> > # /usr/pkg/bin/psql -U postgres template1
> > psql: FATAL:  out of memory
> > DETAIL:  Failed on request of size 20.
>
> I'm starting to think there is something very broken about your machine :-(.
> Have you run any hardware diagnostics on it lately?  The level of
> flakiness you're seeing starts to suggest bad RAM to me.
>

No, I haven't run any diagnostics.

But I'm not convinced the hardware is a problem because the flakiness
has only really
been a problem when I started doing more than just inserts and updates.
The table that
has shown the most problems (ifl) is a table of work to do, so I'm
inserting records,
doing a select of random items out of it and also deleting records (once
the work is
complete.)  Multiple processes can be trying to simultaneously be doing
all of these,
which should not be anything out of the ordinary.  Or maybe this
workload is just making
the database stress the hardware more?

> Anyway, the above error should have also produced a map of per-context
> memory usage in the postmaster log (ie, postmaster stderr).  If you
> could show us that, it might be revealing.
>

I think what you're referring to is this:

TopMemoryContext: 32768 total in 3 blocks; 7776 free (8 chunks); 24992 used
Operator class cache: 8192 total in 1 blocks; 4936 free (0 chunks); 3256
used
TopTransactionContext: 8192 total in 1 blocks; 7856 free (0 chunks); 336
used
MessageContext: 267378688 total in 43 blocks; 1768 free (45 chunks);
267376920 used
smgr relation table: 8192 total in 1 blocks; 3904 free (0 chunks); 4288 used
Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
PortalMemory: 8192 total in 1 blocks; 8176 free (1 chunks); 16 used
Relcache by OID: 8192 total in 1 blocks; 3896 free (0 chunks); 4296 used
CacheMemoryContext: 253952 total in 5 blocks; 11880 free (1 chunks);
242072 used
pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks);
696 used
pg_type_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_statistic_relid_att_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_auth_members_member_role_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_auth_members_role_member_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_proc_proname_args_nsp_index: 1024 total in 1 blocks; 256 free (0
chunks); 768 used
pg_proc_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 192 free (0
chunks); 832 used
pg_operator_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_opclass_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 256 free (0
chunks); 768 used
pg_namespace_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632
used
pg_namespace_nspname_index: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
pg_language_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_language_name_index: 1024 total in 1 blocks; 392 free (0 chunks); 632
used
pg_inherits_relid_seqno_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
pg_authid_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_authid_rolname_index: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
pg_database_datname_index: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
pg_conversion_oid_index: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
pg_conversion_name_nsp_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_conversion_default_index: 1024 total in 1 blocks; 192 free (0
chunks); 832 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks);
696 used
pg_class_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_cast_source_target_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_amproc_opc_proc_index: 1024 total in 1 blocks; 256 free (0 chunks);
768 used
pg_amop_opr_opc_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_amop_opc_strat_index: 1024 total in 1 blocks; 256 free (0 chunks);
768 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
MdSmgr: 8192 total in 1 blocks; 8152 free (0 chunks); 40 used
LockTable (locallock hash): 8192 total in 1 blocks; 3912 free (0
chunks); 4280 used
Timezones: 47592 total in 2 blocks; 5968 free (0 chunks); 41624 used
ErrorContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
ERROR:  out of memory
DETAIL:  Failed on request of size 20.
TopMemoryContext: 32768 total in 3 blocks; 7776 free (8 chunks); 24992 used

The standout problem is the "MessageContext" count.

Darren


Re: Is my database now too big?

From
Tom Lane
Date:
Darren Reed <darrenr@fastmail.net> writes:
> Tom Lane wrote:
>> Anyway, the above error should have also produced a map of per-context
>> memory usage in the postmaster log (ie, postmaster stderr).  If you
>> could show us that, it might be revealing.

> MessageContext: 267378688 total in 43 blocks; 1768 free (45 chunks);
> 267376920 used

> The standout problem is the "MessageContext" count.

Indeed.  And there shouldn't even be anything in MessageContext until
the first client command has been received.  Maybe you have something
in ~/.psqlrc that you haven't told us about?

            regards, tom lane

Re: Is my database now too big?

From
Darren Reed
Date:
Tom Lane wrote:
> Darren Reed <darrenr@fastmail.net> writes:
> > Tom Lane wrote:
> >> Anyway, the above error should have also produced a map of per-context
> >> memory usage in the postmaster log (ie, postmaster stderr).  If you
> >> could show us that, it might be revealing.
>
> > MessageContext: 267378688 total in 43 blocks; 1768 free (45 chunks);
> > 267376920 used
>
> > The standout problem is the "MessageContext" count.
>
> Indeed.  And there shouldn't even be anything in MessageContext until
> the first client command has been received.  Maybe you have something
> in ~/.psqlrc that you haven't told us about?
>

That's easy - I don't even have one of these files!

Darren


Re: Is my database now too big?

From
Tom Lane
Date:
Darren Reed <darrenr+postgres@fastmail.net> writes:
> Tom Lane wrote:
>> Indeed.  And there shouldn't even be anything in MessageContext until
>> the first client command has been received.  Maybe you have something
>> in ~/.psqlrc that you haven't told us about?

> That's easy - I don't even have one of these files!

Then the behavior you showed is impossible ;-)

There is *something* that is completely broken about your machine,
and the rest of us really don't have enough context to tell what.
You haven't told us anything about the hardware or operating system,
or how you built or obtained the Postgres executables.

I don't think you should dismiss the possibility of a hardware problem,
especially since the failures aren't 100% reproducible (AFAICT from your
previous remarks).  We've seen more than one case where Postgres
stressed a system more than anything else that was being run, and
thereby exposed hardware problems that didn't manifest otherwise.
For instance, a bit of bad RAM up near the end of physical memory might
not get used at all until Postgres starts eating up memory.

Another line of thought is that you built Postgres with a buggy compiler
and thereby got buggy executables.  Have you tried running the PG
regression tests?
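
(A sketch of running those from a source tree; the directory name is
illustrative, and the build needs GNU make, typically installed as
"gmake" on NetBSD:)

$ cd postgresql-8.1.3        # top of the unpacked source tree
$ ./configure
$ gmake
$ gmake check                # builds a temporary install, runs the suite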

            regards, tom lane

Re: Is my database now too big?

From
Darren Reed
Date:
Scott Marlowe wrote:
> ...
>
> Again, I'm kinda shooting in the dark here as you reveal more and more
> what you are doing a little at a time.  A test case that can invoke
> this failure would be most useful.
>
After seeing this today:
ERROR:  duplicate key violates unique constraint "ers_pkey"
ERROR:  duplicate key violates unique constraint "foo_pkey"
ERROR:  duplicate key violates unique constraint "foo_pkey"
ERROR:  duplicate key violates unique constraint "foo_pkey"
ERROR:  duplicate key violates unique constraint "foo_pkey"
ERROR:  duplicate key violates unique constraint "foo_pkey"
ERROR:  could not open segment 1 of relation 1663/10793/2659 (target
block 858862642): No such file or directory
ERROR:  could not open segment 1 of relation 1663/10793/2659 (target
block 858862642): No such file or directory
ERROR:  could not open segment 1 of relation 1663/10793/2659 (target
block 858862642): No such file or directory
ERROR:  could not open segment 1 of relation 1663/10793/2659 (target
block 858862642): No such file or directory
...

...there was little or no activity during this time, apart from
some inserts, maybe some selects, etc.  Nothing that should
have caused this kind of upset.

There is a file that matches this:
-rw-------  1 postgres  wheel  57344 Oct 14 22:57
/data/db/second/base/10793/2659
but it isn't in the directory where I moved most of the indexes to:
ls /data/index/ext/10793/
16390  16397  16399  16406  16407  16410  16414  16425  16434  16435

I don't know if the file numbers have any meaning?
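
(Those numbers are, in order, the tablespace OID, the database OID, and
the relation's on-disk file number; a sketch of decoding them with
standard catalog lookups:)

SELECT spcname FROM pg_tablespace WHERE oid = 1663;   -- pg_default
SELECT datname FROM pg_database WHERE oid = 10793;
SELECT relname, relkind FROM pg_class WHERE relfilenode = 2659;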

But in addition, the list of tables (\dp) is now fubar'd.

I'm starting to wonder if it is a combination of:
- the operating system (NetBSD 4.99.20)
- the hardware (small HP box, not meant for hard work like this but
shouldn't be impossible for it)
- the way pkgsrc compiles postgresql for NetBSD

I'm shying away from the hardware (or at least RAM/CPU) because I'd
expect some other kinds of faults to show up, ultimately leading to a
panic due to random corruption of some kernel data structure.  As it
is, everything else seems to be functioning ok.

Darren


Re: Is my database now too big?

From
"Scott Marlowe"
Date:
On 10/15/07, Darren Reed <darrenr+postgres@fastmail.net> wrote:
> Scott Marlowe wrote:
> > ...
> >
> > Again, I'm kinda shooting in the dark here as you reveal more and more
> > what you are doing a little at a time.  A test case that can invoke
> > this failure would be most useful.
> >
> After seeing this today:
> ERROR:  duplicate key violates unique constraint "ers_pkey"
> ERROR:  duplicate key violates unique constraint "foo_pkey"
> ERROR:  duplicate key violates unique constraint "foo_pkey"
> ERROR:  duplicate key violates unique constraint "foo_pkey"
> ERROR:  duplicate key violates unique constraint "foo_pkey"
> ERROR:  duplicate key violates unique constraint "foo_pkey"
> ERROR:  could not open segment 1 of relation 1663/10793/2659 (target
> block 858862642): No such file or directory
> ERROR:  could not open segment 1 of relation 1663/10793/2659 (target
> block 858862642): No such file or directory
> ERROR:  could not open segment 1 of relation 1663/10793/2659 (target
> block 858862642): No such file or directory
> ERROR:  could not open segment 1 of relation 1663/10793/2659 (target
> block 858862642): No such file or directory
> ...
>
> ...there was little or no activity during this time, apart from
> some inserts, maybe some selects, etc.  Nothing that should
> have caused this kind of upset.
>
> There is a file that matches this:
> -rw-------  1 postgres  wheel  57344 Oct 14 22:57
> /data/db/second/base/10793/2659
> but it isn't in the directory where I moved most of the indexes to:
> ls /data/index/ext/10793/
> 16390  16397  16399  16406  16407  16410  16414  16425  16434  16435

How, exactly, did you move those indexes?

Re: Is my database now too big?

From
Tom Lane
Date:
Darren Reed <darrenr+postgres@fastmail.net> writes:
> I'm starting to wonder if it is a combination of:
> - the operating system (NetBSD 4.99.20)

Um ... what was your motivation for choosing that?  According to
http://www.netbsd.org/releases/release-map.html
a .99 release number signifies "an alpha quality distribution. It isn't
even guaranteed to compile."  It looks like NetBSD 4 is currently
up to an RC2 release, which is probably not what you've got there ...
but even if you were running the RC2 code I'd question the sanity
of insisting on a back-rev Postgres release on top of bleeding edge
operating system.

            regards, tom lane

Re: Is my database now too big?

From
Darren Reed
Date:
Scott Marlowe wrote:
> On 10/15/07, Darren Reed <darrenr+postgres@fastmail.net> wrote:
> > Scott Marlowe wrote:
> > > ...
> > >
> > > Again, I'm kinda shooting in the dark here as you reveal more and more
> > > what you are doing a little at a time.  A test case that can invoke
> > > this failure would be most useful.
> > >
> > After seeing this today:
> > ERROR:  duplicate key violates unique constraint "ers_pkey"
> > ERROR:  duplicate key violates unique constraint "foo_pkey"
> > ERROR:  duplicate key violates unique constraint "foo_pkey"
> > ERROR:  duplicate key violates unique constraint "foo_pkey"
> > ERROR:  duplicate key violates unique constraint "foo_pkey"
> > ERROR:  duplicate key violates unique constraint "foo_pkey"
> > ERROR:  could not open segment 1 of relation 1663/10793/2659 (target
> > block 858862642): No such file or directory
> > ERROR:  could not open segment 1 of relation 1663/10793/2659 (target
> > block 858862642): No such file or directory
> > ERROR:  could not open segment 1 of relation 1663/10793/2659 (target
> > block 858862642): No such file or directory
> > ERROR:  could not open segment 1 of relation 1663/10793/2659 (target
> > block 858862642): No such file or directory
> > ...
> >
> > ...there was little or no activity during this time, apart from
> > some inserts, maybe some selects, etc.  Nothing that should
> > have caused this kind of upset.
> >
> > There is a file that matches this:
> > -rw-------  1 postgres  wheel  57344 Oct 14 22:57
> > /data/db/second/base/10793/2659
> > but it isn't in the directory where I moved most of the indexes to:
> > ls /data/index/ext/10793/
> > 16390  16397  16399  16406  16407  16410  16414  16425  16434  16435
>
> How, exactly, did you move those indexes?
>

With "ALTER TABLE".

Since then I recreated the database and after merging a chunk of data, I
see this:
(various errors about duplicate keys and values too wide for fields...)
ERROR:  duplicate key violates unique constraint "t_a_pkey"
LOG:  unexpected EOF on client connection
ERROR:  value too long for type character(12)
LOG:  unexpected EOF on client connection
LOG:  unexpected EOF on client connection
LOG:  could not send data to client: Broken pipe
LOG:  unexpected EOF on client connection
ERROR:  relation "t_a" does not exist

I've attached the commands I've used to create the schema.
It's called sanitised because I've culled all of the fields that
aren't used.  This script was used to create the database that
I then imported records into before seeing the above.

Darren

CREATE TABLESPACE ext LOCATION '/data/index/ext';

CREATE ROLE root;

CREATE LANGUAGE 'plpgsql' HANDLER plpgsql_call_handler LANCOMPILER 'PL/pgSQL';

CREATE TABLE t_a (
    int1 INTEGER NOT NULL PRIMARY KEY,
    str1 CHARACTER(20),
    bool1 boolean
);

CREATE VIEW a_v1 AS
SELECT DISTINCT(str1),bool1 FROM t_a WHERE bool1 IS TRUE ORDER BY str1;

CREATE VIEW a_v2 AS SELECT distinct(str1),count(*) FROM t_a GROUP BY t_a.str1;

CREATE TABLE t_b (
    int1 INTEGER NOT NULL,
    str2 CHARACTER VARYING(20) NOT NULL,
    bool1 BOOLEAN
);

CREATE TABLE t_c (
    str1 CHAR(20) NOT NULL,
    str2 VARCHAR(20) NOT NULL PRIMARY KEY
);

CREATE TABLE ifl (
    recno SERIAL PRIMARY KEY,
    int1 INTEGER NOT NULL
);
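
-- Trigger functions follow.  (No CREATE TRIGGER statements appear in
-- this sanitised script, so nothing here actually attaches them.)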

DROP FUNCTION add_str1tot_a();
CREATE OR REPLACE FUNCTION add_str1tot_a()
RETURNS TRIGGER AS $ptot_a$
DECLARE
    temp VARCHAR(20);
BEGIN
    SELECT p.str1 INTO temp FROM t_c p WHERE p.str2=NEW.str2;
    UPDATE t_a SET str1=temp WHERE str1 IS NULL AND int1=NEW.int1;
    RETURN NULL;
END;
$ptot_a$ LANGUAGE plpgsql;

DROP FUNCTION sett_astr1bool1();
CREATE OR REPLACE FUNCTION sett_astr1bool1()
RETURNS TRIGGER as $sepi$
DECLARE
    ig BOOLEAN;
BEGIN
    IF NEW.str1 IS NOT NULL AND NEW.bool1 IS NULL THEN
        SELECT b.bool1 INTO ig FROM a_v1 b WHERE b.str1=NEW.str1;
        IF ig IS NOT NULL THEN
            UPDATE t_a SET bool1=ig WHERE int1=NEW.int1;
        END IF;
    END IF;
    RETURN NULL;
END;
$sepi$ LANGUAGE plpgsql;

DROP FUNCTION sett_abool1();
CREATE OR REPLACE FUNCTION sett_abool1()
RETURNS TRIGGER as $sei$
DECLARE
    temp BOOLEAN;
    temp2 CHAR(20);
BEGIN
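    -- NB: badt_b is referenced below but is never created by this script.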
    SELECT b.bool1 INTO temp FROM badt_b b WHERE (b.str2=NEW.str2) AND
        (bool1 IS NOT NULL);
    IF temp IS NOT NULL THEN
        UPDATE t_b SET bool1=temp WHERE str2=NEW.str2;
    END IF;
    SELECT t.str1 INTO temp2 FROM t_a t WHERE t.int1=NEW.int1;
    IF temp2 IS NULL THEN
        SELECT u.str1 INTO temp2 FROM t_c u WHERE u.str2=NEW.str2;
        IF temp2 IS NOT NULL THEN
            IF temp IS NOT NULL THEN
                UPDATE t_a SET str1=temp2,bool1=temp
                    WHERE int1=NEW.int1;
            ELSE
                UPDATE t_a SET str1=temp2 WHERE int1=NEW.int1;
            END IF;
        ELSE
            IF temp IS NOT NULL THEN
                UPDATE t_a SET bool1=temp WHERE int1=NEW.int1;
            END IF;
        END IF;
    ELSE
        IF temp IS NOT NULL THEN
            UPDATE t_a SET bool1=temp WHERE int1=NEW.int1;
        END IF;
    END IF;
    RETURN NULL;
END;
$sei$ LANGUAGE plpgsql;

CREATE INDEX t_a_str1 ON t_a USING btree (str1) TABLESPACE ext;
CREATE INDEX str2_index ON t_b(str2);
CREATE INDEX t_b_int1_index ON t_b(int1);
CREATE INDEX t_c_str1_idx ON t_c(str1) TABLESPACE ext;
CREATE INDEX t_c_str2_idx ON t_c(str2) TABLESPACE ext;
ALTER INDEX t_c_pkey SET TABLESPACE ext;
ALTER INDEX ifl_pkey SET TABLESPACE ext;


Re: Is my database now too big?

From
Darren Reed
Date:
For better or worse, it seems to be behaving itself again for a while...

There was, however, one change to my procedure and that was to
drop the triggers/functions when restoring the data (using copy into).

Cheers,
Darren


Re: Is my database now too big?

From
Darren Reed
Date:
Darren Reed wrote:
> For better or worse, it seems to be behaving itself again for a while...
>
> There was, however, one change to my procedure and that was to
> drop the triggers/functions when restoring the data (using copy into).

Perhaps I spoke too soon...

# /usr/pkg/bin/pg_dumpall -O -U postgres -a > /data/dumpall
pg_dump: [archiver (db)] connection to database "template1" failed:
FATAL:  out of memory
DETAIL:  Failed on request of size 20.
pg_dumpall: pg_dump failed on database "template1", exiting

...for better or worse, I'm interfacing to postgresql using perl's
DBI interface.  Are there likely to be any issues from that?
(Although I can't see how a client could cause the server to become
corrupt unless there's a bug in the server.)

Darren


Re: Is my database now too big?

From
"Scott Marlowe"
Date:
On 10/17/07, Darren Reed <darrenr@fastmail.net> wrote:
> Darren Reed wrote:
> > For better or worse, it seems to be behaving itself again for a while...
> >
> > There was, however, one change to my procedure and that was to
> > drop the triggers/functions when restoring the data (using copy into).
>
> Perhaps I spoke too soon...
>
> # /usr/pkg/bin/pg_dumpall -O -U postgres -a > /data/dumpall
> pg_dump: [archiver (db)] connection to database "template1" failed:
> FATAL:  out of memory
> DETAIL:  Failed on request of size 20.
> pg_dumpall: pg_dump failed on database "template1", exiting
>
> ...for better or worse, I'm interfacing to postgresql using perl's
> DBI interface.  Are there likely to be any issues from that?
> (Although I can't see how a client could cause the server to become
> corrupt unless there's a bug in the server.)

Perl and DBI are fine.  You really need to stop using an alpha release
of an OS though.  There's no way to know if the problems you're
experiencing are coming from that or something else.

Once you're on a stable OS release, start testing again and let us
know if anything breaks.

But like Tom likes to say, if postgresql breaks running on an alpha OS
release, you get to keep both pieces.