[BUGS] BUG #14654: With high statistics targets on ts_vector, unexpectedly high memory use & OOM are triggered - Mailing list pgsql-bugs

From james+postgres@carbocation.com
Subject [BUGS] BUG #14654: With high statistics targets on ts_vector, unexpectedly high memory use & OOM are triggered
Date
Msg-id 20170514200602.1451.46797@wrigleys.postgresql.org
Responses Re: [BUGS] BUG #14654: With high statistics targets on ts_vector, unexpectedly high memory use & OOM are triggered (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      14654
Logged by:          James C
Email address:      james+postgres@carbocation.com
PostgreSQL version: 9.5.5
Operating system:   Linux
Description:

I am running a mirror of PubMed, which contains about 27 million articles.
One of my tables (article) includes a unique ID for each article, the
article title, a ts_vector of the abstract, some timestamps, and a numeric
column. There are (unsurprisingly) about 27 million rows in this table.
Because the abstracts contain scientific terms, the number of lexemes is
quite large.

With some prototyping on a local machine with 64G of RAM, I found queries to
work quite well with the statistics target on the ts_vector column set to
10,000. The production box has about 12G of RAM, but initially I didn't
notice any issues.
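
For reference, this is roughly how the target was set (abstract_tsv here is a
stand-in for the real column name, which isn't shown above):

    -- raise the per-column statistics target used by ANALYZE
    ALTER TABLE article ALTER COLUMN abstract_tsv SET STATISTICS 10000;
    ANALYZE article;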

Today, I attempted to execute 'ANALYZE article;'. However, the OOM killer
kept taking down postgres. I reviewed numerous system settings and postgres
settings, all of which appeared to give postgres roughly 10G+ of RAM and
should have constrained it to using less than that limit.

RhodiumToad in Freenode#postgres helped me troubleshoot, and suggested I try
reducing the statistics target on the ts_vector column. Dropping it from
10,000 to 1,000 allowed ANALYZE to run without consuming large amounts of RAM.
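
Roughly, the workaround was (again with abstract_tsv standing in for the real
column name):

    -- drop the per-column statistics target back down, then re-analyze
    ALTER TABLE article ALTER COLUMN abstract_tsv SET STATISTICS 1000;
    ANALYZE article;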

It seems that ANALYZE on a ts_vector column can consume on the order of
300 * (statistics target) * (size of data in field) bytes of memory, which in
my case ended up being well above 10 gigabytes. I wonder if this might be
considered a bug (either in code or in documentation), as this memory usage
does not seem to obey other memory limits, or at least wasn't documented in a
way that might have helped me guess at the underlying problem.
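
Back-of-envelope, assuming ANALYZE samples 300 * (statistics target) rows:
300 * 10,000 = 3,000,000 sampled rows, and at only ~4 kB of tsvector data per
row that is already roughly 12 GB held in memory at once. A rough way to
estimate the per-row size (abstract_tsv is again a stand-in name, and
pg_column_size reports the stored, possibly compressed size, so the detoasted
in-memory size can be larger):

    -- estimate average stored size of the tsvector column from a 1% sample
    SELECT avg(pg_column_size(abstract_tsv)) AS avg_bytes
    FROM article TABLESAMPLE SYSTEM (1);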


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
