Thread: An out of memory error when doing a vacuum full

An out of memory error when doing a vacuum full

From: Sean Shanny
To all,

The facts:

PostgreSQL 7.4.0 running on BSD 5.1 on Dell 2650 with 4GB RAM, 5 SCSI
drives in hardware RAID 0 configuration.  Database size with indexes is
currently 122GB.  DB size before we completed the vacuum full was 150GB.

We have recently done a major update to a table, f_pageviews, in our
data warehouse. The f_pageviews table contains about 118 million rows.
Schema is at the end of this message.  We probably touched 80% of those
rows with the update.  We then commenced to drop all indexes on said
table, except the primary key, and attempted to do a vacuum full on the
entire DB.  You can see from the output below that it failed.  We then
tried the vacuum full on the f_pageviews table alone and got the same
error.  I did a vacuum full on other tables in the schema, one of them
about 8 times larger but with very few dead tuples, and all completed
successfully.  We ended up dumping the table and reloading it to
eliminate the dead tuples.  After the reload we did the vacuum full with
no problems.
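
(The dump and reload was essentially the following, modulo connection options
and any dependent objects; shown only as a rough sketch:)

    pg_dump -t f_pageviews -U tripmaster tripmaster > f_pageviews.sql
    psql -U tripmaster -d tripmaster -c 'DROP TABLE f_pageviews'
    psql -U tripmaster -d tripmaster -f f_pageviews.sql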

Does anyone have an explanation as to why this might occur?

Thanks.

--sean


nohup /usr/local/pgsql/bin/vacuumdb -d tripmaster -U tripmaster -f -z -v
-t f_pageviews >& /tmp/vacuum2.log &

tail -f /tmp/vacuum2.log
INFO:  vacuuming "public.f_pageviews"
INFO:  "f_pageviews": found 17736235 removable, 111796026 nonremovable
row versions in 1552349 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 156 to 244 bytes long.
There were 134565418 unused item pointers.
Total free space (including removable row versions) is 27898448080 bytes.
583420 pages are or will become empty, including 0 at the end of the table.
1088195 pages containing 27860101432 free bytes are potential move
destinations.
CPU 238.91s/27.44u sec elapsed 1261.80 sec.
INFO:  index "f_pageviews_pkey" now contains 111796026 row versions in
210003 pages
DETAIL:  15618120 index row versions were removed.
734 index pages have been deleted, 734 are currently reusable.
CPU 96.09s/139.03u sec elapsed 1569.93 sec.
vacuumdb: vacuuming of table "f_pageviews" in database "tripmaster"
failed: ERROR:  out of memory
DETAIL:  Failed on request of size 350.


 \d f_pageviews
                                   Table "public.f_pageviews"
         Column         |  Type   |                          Modifiers
------------------------+---------+-------------------------------------------------------------
 id                     | integer | not null default
nextval('public.f_pageviews_id_seq'::text)
 date_key               | integer | not null
 time_key               | integer | not null
 content_key            | integer | not null
 location_key           | integer | not null
 session_key            | integer | not null
 subscriber_key         | text    | not null
 persistent_cookie_key  | integer | not null
 ip_key                 | integer | not null
 referral_key           | integer | not null
 servlet_key            | integer | not null
 tracking_key           | integer | not null
 provider_key           | text    | not null
 marketing_campaign_key | integer | not null
 orig_airport           | text    | not null
 dest_airport           | text    | not null
 commerce_page          | boolean | not null default false
 job_control_number     | integer | not null
 sequenceid             | integer | not null default 0
 url_key                | integer | not null
 useragent_key          | integer | not null
 web_server_name        | text    | not null default 'Not Available'::text
 cpc                    | integer | not null default 0
 referring_servlet_key  | integer | default 1
 first_page_key         | integer | default 1
 newsletterid_key       | text    | not null default 'Not Available'::text
Indexes:
    "f_pageviews_pkey" primary key, btree (id)
    "idx_page_views_content" btree (content_key)
    "idx_pageviews_date_dec_2003" btree (date_key) WHERE ((date_key >=
335) AND (date_key <= 365))
    "idx_pageviews_date_nov_2003" btree (date_key) WHERE ((date_key >=
304) AND (date_key <= 334))
    "idx_pageviews_referring_servlet" btree (referring_servlet_key)
    "idx_pageviews_servlet" btree (servlet_key)
    "idx_pageviews_session" btree (session_key)





Re: An out of memory error when doing a vacuum full

From: Fernando Schapachnik
Is your system allowed to use the full RAM?  I.e., what does 'limits -a' show?


Regards.

Fernando.

In an earlier message, Sean Shanny wrote:
> To all,
>
> The facts:
>
> PostgreSQL 7.4.0 running on BSD 5.1 on Dell 2650 with 4GB RAM, 5 SCSI
> drives in hardware RAID 0 configuration.  Database size with indexes is
> currently 122GB.  DB size before we completed the vacuum full was 150GB.

Re: An out of memory error when doing a vacuum full

From: Fernando Schapachnik
Take a look at datasize: your processes are allowed a maximum of 512 MB of RAM.
Read the handbook to find out how to reconfigure your kernel, and the limits
(and/or ulimit) man pages for how to tweak the values for individual processes.
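
Roughly, assuming this is FreeBSD (the values here are only illustrative;
check the handbook for the exact procedure on your release):

    # /boot/loader.conf -- raise the kernel's per-process data segment ceiling
    # (on releases without this loader tunable it is the MAXDSIZ kernel option)
    kern.maxdsiz="1073741824"       # 1 GB

    # /etc/login.conf -- raise datasize for the login class the postmaster
    # runs under, e.g. :datasize=1024M:\ , then rebuild the database:
    cap_mkdb /etc/login.conf

and restart the postmaster so it picks up the new limit.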

Good luck!

Fernando.

In an earlier message, Sean Shanny wrote:
> limits -a
> Resource limits (current):
>  cputime          infinity secs
>  filesize         infinity kb
>  datasize           524288 kb
>  stacksize           65536 kb
>  coredumpsize     infinity kb
>  memoryuse        infinity kb
>  memorylocked     infinity kb
>  maxprocesses         5547
>  openfiles           11095
>  sbsize           infinity bytes
>  vmemoryuse       infinity kb

Re: An out of memory error when doing a vacuum full

From: Tom Lane
Sean Shanny <shannyconsulting@earthlink.net> writes:
> Does anyone have an explanation as to why this might occur?

What have you got vacuum_mem set to?  Also, what ulimit settings is the
postmaster running under?  (I'm wondering exactly how large the backend
process has grown when it gets the failure.)
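
For instance (just a sketch; adjust the ps flags for your platform):

    SHOW vacuum_mem;
    SHOW sort_mem;

and, from the shell while the VACUUM FULL is running,

    ps -axo pid,rss,vsz,command | grep postgres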

            regards, tom lane

Re: An out of memory error when doing a vacuum full

From: Tom Lane
Fernando Schapachnik <fernando@mecon.gov.ar> writes:
> Take a look at datasize: your processes are allowed a maximum of 512 Mb RAM.
> Read the handbook to find out how to reconfigure your kernel and the limits
> (and/or ulimit) man page to tweak the values for individual processes.

512Mb does not seem like an unreasonable per-process limit (indeed, it
may be too high considering he's got just 4G RAM to go around among all
the processes).  So I'd not recommend Sean try to solve the problem by
raising it.  The real question is why is the backend getting that big.
I'm suspicious that sort_mem and/or vacuum_mem are set to
unrealistically large values, but we don't have that info yet.

            regards, tom lane

Re: An out of memory error when doing a vacuum full

From: Sean Shanny
From postgresql.conf:

# - Memory -

shared_buffers = 10000          # min 16, at least max_connections*2, 8KB each
sort_mem = 64000                # min 64, size in KB
vacuum_mem = 32767              # min 1024, size in KB

The ulimit is set to unlimited as far as I can tell.

--sean

Tom Lane wrote:

>Sean Shanny <shannyconsulting@earthlink.net> writes:
>
>
>>Does anyone have an explanation as to why this might occur?
>>
>>
>
>What have you got vacuum_mem set to?  Also, what ulimit settings is the
>postmaster running under?  (I'm wondering exactly how large the backend
>process has grown when it gets the failure.)
>
>            regards, tom lane
>


Re: An out of memory error when doing a vacuum full

From: Tom Lane
Sean Shanny <shannyconsulting@earthlink.net> writes:
> sort_mem = 64000                # min 64, size in KB

You might want to lower that; a complex query could easily use several
times sort_mem.  Whether this is the immediate source of your problem
with the other query is hard to tell.

> vacuum_mem = 32767              # min 1024, size in KB

That seems all right, but I recollect now that it only applies to plain
VACUUM not VACUUM FULL.  VACUUM FULL needs to keep track of *all* the
free space in a table, and so it's certainly possible that vacuuming a
huge table with many dead tuples could require lots of memory.  I can't
recall anyone else ever complaining about VACUUM FULL running out of
memory, though, so there may be some other contributing factor in your
situation.  Too bad you reloaded the table --- it would be interesting
to see if increasing your 512Mb datasize ulimit would have allowed the
VACUUM FULL to complete.  (Not but what it would've taken forever :-()
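
(Just as a sketch, the postmaster could be restarted from a shell whose own
limit has been raised first, e.g.

    ulimit -d unlimited        # sh; or 'limit datasize unlimited' in csh
    pg_ctl start -D $PGDATA

subject to whatever hard limit the kernel enforces.)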

            regards, tom lane

Re: An out of memory error when doing a vacuum full

From: Sean Shanny
Tom,

I will lower the sort_mem and see what happens. :-)

I agree that we probably pushed the limits of a vacuum full with the size of
the table we had and the large percentage of change in it.  We did NOT run
any vacuum jobs during the update process; that might have helped by allowing
the updates to reuse the reclaimed space in place.

We are going to continue to run into this issue as the table is only
going to get larger and we are still making tweaks.  An example would be
adding a new column to the f_pageviews table; we have done this several
times as we discover new data that needs to be captured.  This of course
entails an ALTER TABLE ADD COLUMN, then an ALTER TABLE ALTER COLUMN to
set the default, then an UPDATE to set the 'historical' rows to the
default value, and finally a SET NOT NULL on the column, as none of our
data is allowed to have a value of null.  This pretty much means updating
all the rows in the table.
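
In SQL terms each such change is roughly the following (the column name and
default here are just placeholders, not our real schema):

    ALTER TABLE f_pageviews ADD COLUMN new_key integer;
    ALTER TABLE f_pageviews ALTER COLUMN new_key SET DEFAULT 0;
    -- backfill the 'historical' rows; this rewrites nearly every row
    UPDATE f_pageviews SET new_key = 0 WHERE new_key IS NULL;
    ALTER TABLE f_pageviews ALTER COLUMN new_key SET NOT NULL;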

We are thinking that instead of doing an UPDATE it would be better to create
a new temp table, run the code that contains the update logic, but instead of
updating the real table write both the updated and non-updated rows to the
temp table, and then do a drop and rename.  Does that sound reasonable?
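
Something along these lines (names made up; we would also have to recreate
the other indexes, defaults, and grants on the new table):

    BEGIN;
    CREATE TABLE f_pageviews_new AS
        SELECT * FROM f_pageviews;  -- in reality this SELECT applies the update logic
    ALTER TABLE f_pageviews_new ADD PRIMARY KEY (id);
    DROP TABLE f_pageviews;
    ALTER TABLE f_pageviews_new RENAME TO f_pageviews;
    COMMIT;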

Thanks again for your time.

--sean

p.s.

By the way, I am unable to send mail to you directly.  I get this error
(I assume you just don't want email from earthlink?):

 tgl@sss.pgh.pa.us
    SMTP error from remote mailer after MAIL FROM:<shannyconsulting@earthlink.net>:
    host sss.pgh.pa.us [192.204.191.242]: 550 5.0.0 If you would like to talk to me, find a more responsible ISP than earthlink




Tom Lane wrote:

>Sean Shanny <shannyconsulting@earthlink.net> writes:
>
>
>>sort_mem = 64000                # min 64, size in KB
>>
>>
>
>You might want to lower that; a complex query could easily use several
>times sort_mem.  Whether this is the immediate source of your problem
>with the other query is hard to tell.
>
>
>
>>vacuum_mem = 32767              # min 1024, size in KB
>>
>>
>
>That seems all right, but I recollect now that it only applies to plain
>VACUUM not VACUUM FULL.  VACUUM FULL needs to keep track of *all* the
>free space in a table, and so it's certainly possible that vacuuming a
>huge table with many dead tuples could require lots of memory.  I can't
>recall anyone else ever complaining about VACUUM FULL running out of
>memory, though, so there may be some other contributing factor in your
>situation.  Too bad you reloaded the table --- it would be interesting
>to see if increasing your 512Mb datasize ulimit would have allowed the
>VACUUM FULL to complete.  (Not but what it would've taken forever :-()
>
>            regards, tom lane
>
>
>


Re: An out of memory error when doing a vacuum full

From: Tom Lane
Sean Shanny <shannyconsulting@earthlink.net> writes:
> By the way I am unable to send mail to you directly.  I get these
> errors: (I assume you just don't want email from earthlink?)
>  tgl@sss.pgh.pa.us
>     SMTP error from remote mailer after MAIL FROM:<shannyconsulting@earthlink.net>:
>     host sss.pgh.pa.us [192.204.191.242]: 550 5.0.0 If you would like to talk to me, find a more responsible ISP than earthlink

Sorry about that --- I blocked earthlink in disgust about six months
ago, when I noticed that my virus-rejection script had been faithfully
bouncing virus mails from the *same* earthlink account to abuse at
abuse.earthlink.net every few days for the preceding nine months, and
their abuse people quite obviously had taken zero action to get their
customer disinfected.  Assuming they have any abuse people, that is, and
abuse@ isn't just an alias for /dev/null.  I'm not planning to unblock
until I see some evidence that they have a functioning abuse desk.

            regards, tom lane