Thread: Thousands of schemas and ANALYZE goes out of memory

Thousands of schemas and ANALYZE goes out of memory

From
"Hugo "
Date:
Hi everyone,

We have two PostgreSQL 9.0 databases (32-bit) with more than 10,000
schemas. When we try to run ANALYZE on those databases, we get errors like
this (after a few hours):

2012-09-14 01:46:24 PDT ERROR:  out of memory
2012-09-14 01:46:24 PDT DETAIL:  Failed on request of size 421.
2012-09-14 01:46:24 PDT STATEMENT:  analyze;

(Note that we do have plenty of memory available for postgresql:
shared_buffers=2048MB, work_mem=128MB, maintenance_work_mem=384MB,
effective_cache_size = 3072MB, etc.)

We have other similar databases with fewer than 10,000 schemas, and ANALYZE
works fine with them (they run on similar machines and configs). For now, we
have had to create shell scripts that run ANALYZE per schema, table by table
(a rough sketch is below). It works that way, so at least we have an
alternative solution. But what exactly causes the out-of-memory error? Is
PostgreSQL trying to run everything in a single transaction? Maybe this
should be improved in future releases. Please let me know what you guys think.
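
For reference, the workaround looks roughly like this. This is only a minimal
sketch, not our exact script; the database name and connection settings are
placeholders:

#!/bin/sh
# Minimal sketch of the per-table workaround: list every user table and
# analyze them one at a time instead of one database-wide ANALYZE.
# DB and connection settings are placeholders.
DB=mydb
psql -At -d "$DB" -c \
  "SELECT quote_ident(schemaname) || '.' || quote_ident(tablename)
     FROM pg_tables
    WHERE schemaname NOT IN ('pg_catalog', 'information_schema');" |
while read -r tbl; do
    psql -d "$DB" -c "ANALYZE $tbl;"
done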

Thanks in advance,
Hugo





Re: Thousands of schemas and ANALYZE goes out of memory

From
Tom Lane
Date:
"Hugo <Nabble>" <hugo.tech@gmail.com> writes:
> We have two postgresql 9.0 databases (32-bits) with more than 10,000
> schemas. When we try to run ANALYZE in those databases we get errors like
> this (after a few hours):

> 2012-09-14 01:46:24 PDT ERROR:  out of memory
> 2012-09-14 01:46:24 PDT DETAIL:  Failed on request of size 421.
> 2012-09-14 01:46:24 PDT STATEMENT:  analyze;

I doubt that the number of *schemas* is a big deal here, but the number
of *tables* might well be.  How many?  Also, 9.0.what?

            regards, tom lane


Re: Thousands of schemas and ANALYZE goes out of memory

From
"Hugo "
Date:
> I doubt that the number of *schemas* is a big deal here, but the number
> of *tables* might well be.  How many?  Also, 9.0.what?

Each schema has 22 tables, so we can count at least 22 x 10,000 = 220,000
tables.
The postgresql version is 9.0.7-1.

Regards,
Hugo






Re: Thousands of schemas and ANALYZE goes out of memory

From
Jeff Janes
Date:
On Mon, Oct 1, 2012 at 12:52 PM, Hugo <Nabble> <hugo.tech@gmail.com> wrote:
> Hi everyone,
>
> We have two postgresql 9.0 databases (32-bits)

Why 32 bits?  Is that what your hardware is?

> with more than 10,000
> schemas. When we try to run ANALYZE in those databases we get errors like
> this (after a few hours):
>
> 2012-09-14 01:46:24 PDT ERROR:  out of memory
> 2012-09-14 01:46:24 PDT DETAIL:  Failed on request of size 421.
> 2012-09-14 01:46:24 PDT STATEMENT:  analyze;
>
> (Note that we do have plenty of memory available for postgresql:
> shared_buffers=2048MB, work_mem=128MB, maintenance_work_mem=384MB,
> effective_cache_size = 3072MB, etc.)

That might be the problem.  I think with 32 bits, you only have 2GB of
address space available to any given process, and you just allowed
shared_buffers to grab all of it.

Cheers,

Jeff


Re: Thousands of schemas and ANALYZE goes out of memory

From
"Hugo "
Date:
> Why 32 bits?  Is that what your hardware is?

The business started in 2005 and we have been using 32 bits since then. We
have several machines, each with a remote replica database (WAL shipping)
configured, and changing this to 64 bits is going to be a lot of work, let
alone the downtime of each server (pg_dump + pg_restore). But we will
probably do it in the future after we finish some higher-priority work.

> That might be the problem.  I think with 32 bits, you only 2GB of
> address space available to any given process, and you just allowed
> shared_buffers to grab all of it.

The address space for 32 bits is 4GB. We just tried to reach a balance in
the configuration and it seems to be working (except for the ANALYZE command
when the number of schemas/tables is huge).

Some questions I have:

1) Is there any reason to run the ANALYZE command in a single transaction?
2) Is there any difference between running ANALYZE on the whole database and
running it per schema, table by table?

Thanks for all the help,
Hugo





Re: Thousands of schemas and ANALYZE goes out of memory

From
Martijn van Oosterhout
Date:
On Tue, Oct 02, 2012 at 10:38:38AM -0700, Hugo <Nabble> wrote:
> > That might be the problem.  I think with 32 bits, you only 2GB of
> > address space available to any given process, and you just allowed
> > shared_buffers to grab all of it.
>
> The address space for 32 bits is 4Gb. We just tried to reach a balance in
> the configuration and it seems to be working (except for the ANALYZE command
> when the number of schemas/tables is huge).

Are you sure about that? You don't say what OS you are using, but on
Linux 3GB is normal and on Windows 2GB.

Here are some nice diagrams:

http://duartes.org/gustavo/blog/post/anatomy-of-a-program-in-memory

In my experience it's better to keep shared_buffers around your
working-set size and let the kernel cache the rest as needed. Setting
shared_buffers to 1GB will give your server much more breathing
space for large operations like the one you are attempting.

Note that, unlike the way some other database servers work,
shared_buffers is the *minimum* postgres will use, not the maximum.
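
On a stock Debian 9.0 install that change might look like this (a sketch
only: the config path and cluster name are assumptions about your setup, the
value is illustrative, and shared_buffers requires a full restart to change):

# Sketch: lower shared_buffers on the 32-bit box. Path, cluster name and
# value are assumptions; adjust for your install.
sudo sed -i 's/^shared_buffers *=.*/shared_buffers = 1GB/' \
    /etc/postgresql/9.0/main/postgresql.conf
sudo pg_ctlcluster 9.0 main restart      # shared_buffers needs a restart
psql -c 'SHOW shared_buffers;'           # confirm the running value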

> Some questions I have:
>
> 1) Is there any reason to run the ANALYZE command in a single transaction?
> 2) Is there any difference running the ANALYZE in the whole database or
> running it per schema, table by table?

I don't think it does do everything in a single transaction, though I
can imagine that if you try to analyse the whole database it uses up
more memory to track the work it has to do. With 220,000 tables I
imagine this could add up.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


Re: Thousands of schemas and ANALYZE goes out of memory

From
Jeff Janes
Date:
On Tue, Oct 2, 2012 at 10:38 AM, Hugo <Nabble> <hugo.tech@gmail.com> wrote:
>> That might be the problem.  I think with 32 bits, you only 2GB of
>> address space available to any given process, and you just allowed
>> shared_buffers to grab all of it.
>
> The address space for 32 bits is 4Gb.

I had thought the highest bit was not usable, but maybe that was just
a Windows thing.

> We just tried to reach a balance in
> the configuration and it seems to be working (except for the ANALYZE command
> when the number of schemas/tables is huge).
>
> Some questions I have:
>
> 1) Is there any reason to run the ANALYZE command in a single transaction?

I don't know how the transactionality of analyze works.  I was
surprised to find that I could even run it in an explicit transaction
block; I thought it would behave like vacuum and create index
concurrently in that regard.

However, I think that would not solve your problem.  When I run
analyze on each of 220,000 tiny tables by name within one session
(using autocommit, so each in its own transaction), it does run about 4
times faster than just doing a database-wide vacuum covering those
same tables.  (Maybe this is the lock/resource manager issue that has
been fixed for 9.3?)  But it takes the same amount of memory.  It is
only by closing the connection periodically that I can reduce the peak
memory usage.  So I think the memory is going to syscache, catcache,
and/or stats collector, which I think are non-transactional, live for
the duration of the backend, and have no way to evict least recently
used members once the caches get too large.  Also, some parts of them
seem to have N^2 performance, albeit with a very low constant.
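
A sketch of that reconnect-periodically approach, assuming one psql
invocation (and therefore one backend with fresh caches) per schema; the
database name is a placeholder and schema names are assumed to need no
quoting:

#!/bin/sh
# Sketch: bound per-backend cache growth by reconnecting for each schema.
# The generated ANALYZE statements for one schema run in one short-lived
# session; the next schema gets a brand-new backend.
DB=mydb
psql -At -d "$DB" -c \
  "SELECT DISTINCT schemaname FROM pg_tables
    WHERE schemaname NOT IN ('pg_catalog', 'information_schema');" |
while read -r schema; do
    psql -At -d "$DB" -c \
      "SELECT 'ANALYZE ' || quote_ident(schemaname) || '.' ||
              quote_ident(tablename) || ';'
         FROM pg_tables WHERE schemaname = '$schema';" | psql -q -d "$DB"
done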

> 2) Is there any difference running the ANALYZE in the whole database or
> running it per schema, table by table?

I can't think of any important ones, unless there are some things you
forget to analyze that way.  Does auto analyze have the same problem
as a manual analyze does?  Probably not, unless your tables become
eligible simultaneously.

Cheers,

Jeff


Re: Thousands of schemas and ANALYZE goes out of memory

From
Chris Angelico
Date:
On Wed, Oct 3, 2012 at 10:09 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Tue, Oct 2, 2012 at 10:38 AM, Hugo <Nabble> <hugo.tech@gmail.com> wrote:
>>> That might be the problem.  I think with 32 bits, you only 2GB of
>>> address space available to any given process, and you just allowed
>>> shared_buffers to grab all of it.
>>
>> The address space for 32 bits is 4Gb.
>
> I had thought the highest bit was not usable, but maybe that was just
> a Windows thing.

Losing the highest bit is common when something treats a number as
signed, but the <4GB memory issue isn't that. It's because other areas
of memory need to be addressable (devices claim certain areas), and
all that has to be factored into the total 4GB addressable space.
Usually you get >3GB, somewhere around the 3.5GB mark.

ChrisA


Re: Thousands of schemas and ANALYZE goes out of memory

From
Jeff Janes
Date:
On Tue, Oct 2, 2012 at 5:09 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

> I don't know how the transactionality of analyze works.  I was
> surprised to find that I even could run it in an explicit transaction
> block, I thought it would behave like vacuum and create index
> concurrently in that regard.
>
> However, I think that that would not solve your problem.  When I run
> analyze on each of 220,000 tiny tables by name within one session
> (using autocommit, so each in a transaction), it does run about 4
> times faster than just doing a database-wide vacuum which covers those
> same tables.  (Maybe this is the lock/resource manager issue that has
> been fixed for 9.3?)

For the record, the culprit that causes "analyze;" of a database with
a large number of small objects to be quadratic in time is
"get_tabstat_entry" and it is not fixed for 9.3.

Cheers,

Jeff


Re: Thousands of schemas and ANALYZE goes out of memory

From
"Hugo "
Date:
We use 32-bit Debian Linux, so the addressable space available really seems
to be in the 3.0GB to 3.5GB range. Last night I decreased shared_buffers
from 2GB to 1GB and tried the global ANALYZE again. It went out of memory
after 3 hours and 40 minutes. That database has 12,197 schemas with 22 tables
each, which means 268,334 tables. Should I keep reducing shared_buffers
and trying again? We don't plan to run the ANALYZE every week, so we can
keep shared_buffers high most of the time and tweak it when needed. I
will try again tonight with ~512MB and see what happens. Please let me know
if you have ideas.

Thanks again!
Hugo





Re: Thousands of schemas and ANALYZE goes out of memory

From
Tom Lane
Date:
"Hugo <Nabble>" <hugo.tech@gmail.com> writes:
> We use Debian linux 32-bits, so the addressable space available really seems
> to be in the 3.0Gb to 3.5Gb range. Last night I decreased the shared_buffers
> from 2Gb to 1Gb and tried the global ANALYZE again. It went out of memory
> after 3 hours 40 minutes. That database has 12,197 schemas with 22 tables
> each, which means 268,334 tables.

It sure sounds to me like you're using the schema for what ought to be a
high-order key column in a much smaller number of tables ... having that
many tables is simply never going to perform well.  We can play
whack-a-mole on individual bottlenecks all day, but that's never going
to fix the fundamental problem.  Some of the bottlenecks will be at
kernel or filesystem level and thus out of our reach, anyway.

            regards, tom lane


Re: Thousands of schemas and ANALYZE goes out of memory

From
Jeff Janes
Date:
On Thu, Oct 4, 2012 at 10:54 AM, Hugo <Nabble> <hugo.tech@gmail.com> wrote:
> We use Debian linux 32-bits, so the addressable space available really seems
> to be in the 3.0Gb to 3.5Gb range. Last night I decreased the shared_buffers
> from 2Gb to 1Gb and tried the global ANALYZE again. It went out of memory
> after 3 hours 40 minutes. That database has 12,197 schemas with 22 tables
> each, which means 268,334 tables. Should I keep reducing the shared_buffers
> and trying again?  We don't plan to run the ANALYZE every week, so we can
> keep the shared_buffers high most of the time and tweak it when needed. I
> will try again tonight with ~512Mb and see what happens. Please let me know
> if you have ideas.


Your analyzes must be taking more memory than mine did; probably your
tables had indexes and my dummy ones did not.  But you said you had
shell scripts to analyze table by table, and that seems like the
solution (until you can upgrade to 64-bit), as it is both faster and
more memory-efficient to do it that way.


Cheers,

Jeff


Re: Thousands of schemas and ANALYZE goes out of memory

From
Tom Lane
Date:
Jeff Janes <jeff.janes@gmail.com> writes:
> For the record, the culprit that causes "analyze;" of a database with
> a large number of small objects to be quadratic in time is
> "get_tabstat_entry" and it is not fixed for 9.3.

I was a bit surprised by this assertion, as I'd thought that tabstats
were flushed to the collector at transaction end, and thus that the
internal transaction boundaries in a VACUUM or ANALYZE should prevent
the tabstats table from getting unreasonably large.  However, a look
at the code shows that pgstat_report_stat() is only called when the main
loop in postgres.c is about to wait for client input.

We could build a lot of infrastructure to try to index the tabstat
arrays more efficiently ... or we could just do something like the
attached.

It appears that the next tallest mole in the VACUUM case is
CleanupTempFiles.  This workload is not creating any temp files, I hope,
so the implication is that have_pending_fd_cleanup is getting set by
FileSetTransient (probably from blind writes).  We might want to revisit
how that works --- particularly since I see no reason that there would
be any actually-blind writes in this example.  But in any case, that
innocent looking flag setting can result in a lot of work.

            regards, tom lane

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 14d1c08..a5d00fc 100644
*** a/src/backend/commands/vacuum.c
--- b/src/backend/commands/vacuum.c
*************** vacuum(VacuumStmt *vacstmt, Oid relid, b
*** 251,256 ****
--- 251,257 ----
                  {
                      PopActiveSnapshot();
                      CommitTransactionCommand();
+                     pgstat_report_stat(false);
                  }
              }
          }
*************** vacuum_rel(Oid relid, VacuumStmt *vacstm
*** 1071,1080 ****
          relation_close(onerel, NoLock);

      /*
!      * Complete the transaction and free all temporary memory used.
       */
      PopActiveSnapshot();
      CommitTransactionCommand();

      /*
       * If the relation has a secondary toast rel, vacuum that too while we
--- 1072,1083 ----
          relation_close(onerel, NoLock);

      /*
!      * Complete the transaction and free all temporary memory used.  Also,
!      * flush table-access statistics to the stats collector after each table.
       */
      PopActiveSnapshot();
      CommitTransactionCommand();
+     pgstat_report_stat(false);

      /*
       * If the relation has a secondary toast rel, vacuum that too while we

Re: Thousands of schemas and ANALYZE goes out of memory

From
Bruce Momjian
Date:
Is there a reason the following patch wasn't applied?

---------------------------------------------------------------------------

On Thu, Oct  4, 2012 at 07:14:31PM -0400, Tom Lane wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
> > For the record, the culprit that causes "analyze;" of a database with
> > a large number of small objects to be quadratic in time is
> > "get_tabstat_entry" and it is not fixed for 9.3.
>
> I was a bit surprised by this assertion, as I'd thought that tabstats
> were flushed to the collector at transaction end, and thus that the
> internal transaction boundaries in a VACUUM or ANALYZE should prevent
> the tabstats table from getting unreasonably large.  However, a look
> at the code shows that pgstat_report_stat() is only called when the main
> loop in postgres.c is about to wait for client input.
>
> We could build a lot of infrastructure to try to index the tabstat
> arrays more efficiently ... or we could just do something like the
> attached.
>
> It appears that the next tallest mole in the VACUUM case is
> CleanupTempFiles.  This workload is not creating any temp files, I hope,
> so the implication is that have_pending_fd_cleanup is getting set by
> FileSetTransient (probably from blind writes).  We might want to revisit
> how that works --- particularly since I see no reason that there would
> be any actually-blind writes in this example.  But in any case, that
> innocent looking flag setting can result in a lot of work.
>
>             regards, tom lane
>

> diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
> index 14d1c08..a5d00fc 100644
> *** a/src/backend/commands/vacuum.c
> --- b/src/backend/commands/vacuum.c
> *************** vacuum(VacuumStmt *vacstmt, Oid relid, b
> *** 251,256 ****
> --- 251,257 ----
>                   {
>                       PopActiveSnapshot();
>                       CommitTransactionCommand();
> +                     pgstat_report_stat(false);
>                   }
>               }
>           }
> *************** vacuum_rel(Oid relid, VacuumStmt *vacstm
> *** 1071,1080 ****
>           relation_close(onerel, NoLock);
>
>       /*
> !      * Complete the transaction and free all temporary memory used.
>        */
>       PopActiveSnapshot();
>       CommitTransactionCommand();
>
>       /*
>        * If the relation has a secondary toast rel, vacuum that too while we
> --- 1072,1083 ----
>           relation_close(onerel, NoLock);
>
>       /*
> !      * Complete the transaction and free all temporary memory used.  Also,
> !      * flush table-access statistics to the stats collector after each table.
>        */
>       PopActiveSnapshot();
>       CommitTransactionCommand();
> +     pgstat_report_stat(false);
>
>       /*
>        * If the relation has a secondary toast rel, vacuum that too while we



--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +


Re: Thousands of schemas and ANALYZE goes out of memory

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Is there a reason the following patch wasn't applied?

I don't think anybody ever did the legwork to verify it was a good idea.
In particular, it'd be good to check if sending a tabstat message for each
table adds noticeable overhead.

            regards, tom lane


Re: Thousands of schemas and ANALYZE goes out of memory

From
Bruce Momjian
Date:
On Wed, May 13, 2015 at 06:10:26PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Is there a reason the following patch wasn't applied?
>
> I don't think anybody ever did the legwork to verify it was a good idea.
> In particular, it'd be good to check if sending a tabstat message for each
> table adds noticeable overhead.

OK, I will mark it as closed then.  Thanks.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +


Re: Thousands of schemas and ANALYZE goes out of memory

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> On Wed, May 13, 2015 at 06:10:26PM -0400, Tom Lane wrote:
>> Bruce Momjian <bruce@momjian.us> writes:
>>> Is there a reason the following patch wasn't applied?

>> I don't think anybody ever did the legwork to verify it was a good idea.
>> In particular, it'd be good to check if sending a tabstat message for each
>> table adds noticeable overhead.

> OK, I will mark it as closed then.  Thanks.

I don't know that it should be closed exactly --- if we don't do this,
we should do something else about the performance issue.  Maybe put it
on TODO?

            regards, tom lane


Re: Thousands of schemas and ANALYZE goes out of memory

From
Bruce Momjian
Date:
On Wed, May 13, 2015 at 06:23:58PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Wed, May 13, 2015 at 06:10:26PM -0400, Tom Lane wrote:
> >> Bruce Momjian <bruce@momjian.us> writes:
> >>> Is there a reason the following patch wasn't applied?
>
> >> I don't think anybody ever did the legwork to verify it was a good idea.
> >> In particular, it'd be good to check if sending a tabstat message for each
> >> table adds noticeable overhead.
>
> > OK, I will mark it as closed then.  Thanks.
>
> I don't know that it should be closed exactly --- if we don't do this,
> we should do something else about the performance issue.  Maybe put it
> on TODO?

Thanks, TODO added:

    Reduce memory use when analyzing many tables

(This is part of my clean up of old issues in preparation for 9.5
feature freeze.)

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +


Re: Thousands of schemas and ANALYZE goes out of memory

From
Jeff Janes
Date:
On Wed, May 13, 2015 at 3:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Is there a reason the following patch wasn't applied?
>
> I don't think anybody ever did the legwork to verify it was a good idea.
> In particular, it'd be good to check if sending a tabstat message for each
> table adds noticeable overhead.
>
>             regards, tom lane

Sorry, I thought you had already committed this change.

It definitely fixes the worst CPU bottleneck.  A manual ANALYZE (with no table name specified) of a database with 210,000 tables goes from 1280 seconds to 120 seconds.  It does not meaningfully change the memory usage, as tabstat is not the bottleneck on memory usage; CacheMemoryContext is.  It takes about 1.5GB either with this patch or without it.

I'm having trouble coming up with an adverse scenario to test for performance regressions.  Perhaps a platform where IPC is pathologically slow?  I don't have one of those at hand, nor do I even know what such a platform might be.  And presumably such a platform would be unsuited to running a very large database anyway.


By the way, this issue can now be partially worked around using "vacuumdb -Z -j8".  The quadratic CPU issue is avoided because, with the parallel option, it issues a series of "ANALYZE table" commands rather than one unqualified command for the whole database.  And the memory usage originally reported is ameliorated because each backend gets its own address space.  The total amount of memory used remains the same, but the 32-bit limit applies to each process rather than to the aggregate.
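
For example (database name and job count are illustrative; vacuumdb's -j
option needs a 9.5-or-later client):

# Illustrative invocation: analyze only (-Z), 8 parallel jobs (-j 8),
# against a placeholder database name.
vacuumdb -Z -j 8 -d mydb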

Cheers,

Jeff