Re: BUG #16363: Memory increases for each table accessed until connection is closed - Mailing list pgsql-bugs

From Eduardo Barreto Alexandre
Subject Re: BUG #16363: Memory increases for each table accessed until connection is closed
Date
Msg-id CAG6JkqN-1GpfSakqyP74ArOy08uTd2qP7enpV-=FRofO3cT7gg@mail.gmail.com
In response to Re: BUG #16363: Memory increases for each table accessed until connection is closed  (luis.roberto@siscobra.com.br)
Responses Re: BUG #16363: Memory increases for each table accessed until connection is closed
Re: BUG #16363: Memory increases for each table accessed until connection is closed
List pgsql-bugs
Unfortunately, same result :(

On Thu, Apr 16, 2020 at 11:43 AM <luis.roberto@siscobra.com.br> wrote:
Sending the reply again, since it seems it didn't go through the mailing list.

Hello Tom, first of all, thanks for the response!
Sorry for not having included a lot of the needed information; I will try to rectify that in this message.
So, first, what I'm trying to do: in my backend I store financial data (candlesticks and trades) in Postgres + TimescaleDB, separating each exchange into one database and each market/timeframe combination into one table. For example, given an `ExchangeA` with 1 market (EUR/USD) and 4 timeframes, I would get these 5 tables:
candles_eur_usd_one_minute
candles_eur_usd_five_minutes
candles_eur_usd_thirty_minutes
candles_eur_usd_one_hour
trades_eur_usd
Normally in my backend I have, per database (exchange), ~750 markets and ~16 timeframes, so 750 * 16 = 12000 tables.
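To make the combinatorics concrete, the table names are generated along these lines (just a sketch; the market and timeframe lists here are illustrative, not the real ones):

markets    = ["eur_usd"]                    # ~750 of these per exchange
timeframes = ["one_minute", "five_minutes",
              "thirty_minutes", "one_hour"] # ~16 in production

candle_tables = for m <- markets, t <- timeframes, do: "candles_#{m}_#{t}"
trade_tables  = for m <- markets, do: "trades_#{m}"
# ~750 markets * ~16 timeframes = ~12000 candle tables per database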
I noticed that Postgres would continuously increase its memory usage per connection until I got an OOM error. In my backend's terms that means inserting candles and trades at ~2000 rows per minute. On a system with 16GB it would take 1 to 3 hours until the OOM; on my other system, which has 64GB, it would take about 2 days, but eventually Postgres would use everything and crash too.
As for versions, I was using Postgres 11 with the latest TimescaleDB. Since I was not sure whether this was a TimescaleDB issue, I first created a minimal example that triggers the issue faster (the one I mentioned in the first e-mail, which uses 700 tables) and then tested it without TimescaleDB on Postgres 11 and 12. I got the same result in both, which takes TimescaleDB out of the equation.
As for the operating system, I'm using Ubuntu, but I also tested on my Gentoo machine with the same results, and I also tested running Postgres in Docker.
For postgresql.conf, I first used one tuned by the TimescaleDB tuner software, but then switched to a default one to see if it would make a difference.
My backend is written in Elixir, so I'm using the Ecto library to connect to Postgres. Ecto creates a pool of connections that stays alive as long as my backend does. To make the issue easier to test, I limited the pool to 1 connection, so I can keep an eye on that connection process's memory usage.
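For reference, the relevant repo configuration is just a pool_size of 1; a minimal sketch (the my_app/MyApp.Repo names and the credentials are placeholders, not my real config):

import Config

# config/config.exs -- a single long-lived connection makes the
# per-connection memory easy to watch in top/htop
config :my_app, MyApp.Repo,
  database: "test_dev",
  hostname: "localhost",
  username: "postgres",
  password: "postgres",
  pool_size: 1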

> Yes, Postgres caches some information about every table your session
> has accessed, and no you don't have a lot of control over that, and
> no it isn't a bug.

> Having said that, the amount of memory consumed this way is in the
> vicinity of a few tens of KB per table, according to some simple
> experiments I just did.  I couldn't get a process touching 700 simple
> tables to eat more than ~70MB of space.  So either you are talking about
> tables with enormously complex schemas, or there's something else going
> on.  (Please note that the SO post you are referencing describes an
> attempt to use circa 100000 tables/views in one session, not 700.)

Here are the details of one of the tables I'm using for the test (all of them are identical; only the name differs):
test_dev=# \d+ trades_eur_usd
                                         Table "public.trades_eur_usd"
  Column   |            Type             | Collation | Nullable | Default | Storage | Stats target | Description
-----------+-----------------------------+-----------+----------+---------+---------+--------------+-------------
 id        | integer                     |           | not null |         | plain   |              |
 timestamp | timestamp without time zone |           | not null |         | plain   |              |
 amount    | numeric                     |           | not null |         | main    |              |
 price     | numeric                     |           | not null |         | main    |              |
Indexes:
    "trades_eur_usd_id_desc_index" btree (id DESC)
    "trades_eur_usd_timestamp_desc_index" btree ("timestamp" DESC)
Access method: heap
One thing that I forgot to explain better is what I'm doing to touch those tables. Basically, I created a little Elixir backend with Ecto which keeps 1 connection alive in the pool and bulk-inserts rows into one of the 700 tables. The steps are:
1) Choose one of the tables randomly;
2) Insert in one query 13000 rows to that table;
3) Go back to step 1 and repeat.
This triggers the issue instantly and makes the memory usage of that connection grow very fast, as shown in the last e-mail; a sketch of the loop follows below.
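In Ecto terms the loop is roughly this (a simplified sketch, reusing the hypothetical MyApp.Repo from above; the real row builder is more involved, and the table names here are stand-ins):

defmodule BulkLoop do
  # "table_1" .. "table_700" stand in for the real candle/trade tables
  @tables for n <- 1..700, do: "table_#{n}"

  def run do
    table = Enum.random(@tables)       # step 1: pick a table at random

    rows =                             # step 2: build 13000 rows for it
      for i <- 1..13_000 do
        %{id: i,
          timestamp: NaiveDateTime.utc_now(),
          amount: Decimal.new(1),
          price: Decimal.new(1)}
      end

    MyApp.Repo.insert_all(table, rows) # one bulk INSERT into that table
    run()                              # step 3: repeat
  end
end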
One very important thing to note: if I change the above algorithm to do the same thing but always choose the same table in step 1 (so the bulk insertion always targets 1 table instead of one of the 700), I do not trigger the issue; memory usage does not go up at all. It only triggers when I'm inserting into a different table each time.

> The memory consumption curve you showed looks suspiciously like you have
> circa 4GB shared buffers allocated and the reported number is mostly about
> how many of those buffers the process has touched so far.  This is a
> common misunderstanding; "top" and similar tools tend to be really bad
> about accounting for shared memory usage in any useful way.  In any case,
> the process you've shown us stopped growing its space consumption
> some time ago, so I wonder where the OOM complaint actually came from.

> The short answer is that you're probably barking up the wrong tree,
> but there's not enough information here to provide any useful guesses
> about which is the right tree.

I guess you are right; maybe the issue is with the shared buffers cache or something. I tried lowering every option I could find in postgresql.conf to its minimum value, but I still got the same OOM result.
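For what it's worth, the floor I lowered things to looked roughly like this (illustrative postgresql.conf values, not my exact file):

shared_buffers = 128kB        # the documented minimum
work_mem = 64kB               # the documented minimum
maintenance_work_mem = 1MB
max_connections = 10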
Thanks a lot for the help, and feel free to ask for more information!
Can you try running it without a connection pool? Just a simple connection to the backend.

