Re: BUG #16363: Memory increases for each table accessed untilconnection is closed - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: BUG #16363: Memory increases for each table accessed untilconnection is closed
Date
Msg-id 20200416213656.arunx2wvbod5hirk@development
Whole thread Raw
In response to Re: BUG #16363: Memory increases for each table accessed untilconnection is closed  (Eduardo Barreto Alexandre <sezdocs@gmail.com>)
List pgsql-bugs
I think you need to fix your mail system, because clearly it's broken in
some strange way. If you look at [1] you'll see the messages indeed got
to the mailing list so either you're not receiving messages from the
list or it's getting stashed in a spam folder or something.

[1] https://www.postgresql.org/message-id/16363-a66916ab04716e58%40postgresql.org

On Thu, Apr 16, 2020 at 06:23:33PM -0300, Eduardo Barreto Alexandre wrote:
>Unfortunately, same result :(
>
>On Thu, Apr 16, 2020 at 11:43 AM <luis.roberto@siscobra.com.br> wrote:
>
>> Sending the reply again since it seems that it didn't go through the
>> mailing list
>>
>>
>> Hello Tom, first of all, thanks for the response!
>> Sorry to not have added a lot of the needed information, I will try to
>> rectify that in this message.
>> So, first is what I'm trying to do, in my backend, I store financial data
>> (candlesticks and trades) in Postgres + TimescaleDB, the way I do it is
>> separating each exchange in one database and each market and timeframe in
>> one table, for example, giving that I have `ExchangeA` with 1 market
>> EUR/USD and 4 timeframes, I would get these 5 tables:
>> candles_eur_usd_one_minute
>> candles_eur_usd_five_minutes
>> candles_eur_usd_third_minutes
>> candles_eur_usd_one_hour
>> trades_eur_usd
>> Normally in my backend, I have, per database (exchange) ~750 markets, and
>> ~16 timeframes, so 750 * 16 = 12000 tables.
>> I noticed that Postgres would continuously increase my memory usage per
>> connection until I got an OOM error, which, in my backend terms, means
>> inserting candles and trades, ~2000 rows per minute. For a system with
>> 16GB, it would take like 1 to 3 hours and then I would get the OOM. For my
>> other system which has 64GB, it would take like 2 days but it would come a
>> time when Postgres would use everything and crash too.
>> For versions, I was using Postgres 11 with latest TimescaleDB, since I was
>> not sure if this was a TimescaleDB issue, I first created a minimal example
>> that triggered the issue faster (the one I talked in the first e-mail that
>> uses 700 tables) and then tested without TimescaleDB in Postgres 11 and 12.
>> I got the same result in both, so removing TimescaleDB out of the equation.
>> For the operational system, I'm using ubuntu, but I do test it in my
>> Gentoo machine with the same results, and I also tested it using Postgres
>> in docker.
>> For postgresql.conf, I was using first one tunned by the TimescaleDB tuner
>> software, but then changed to a default one to see if it would make a
>> change.
>> My backend is written in Elixir, so I'm using Ecto library to connect with
>> Postgres, Ecto will create a pool of connections that will be alive as long
>> as my backend is alive too. To make it easier for me to test the issue, I
>> limited the number of connections to 1, so I can keep an eye on that
>> connection process memory usage, etc.
>>
>> > Yes, Postgres caches some information about every table your session
>> > has accessed, and no you don't have a lot of control over that, and
>> > no it isn't a bug.
>>
>> > Having said that, the amount of memory consumed this way is in the
>> > vicinity of a few tens of KB per table, according to some simple
>> > experiments I just did.  I couldn't get a process touching 700 simple
>> > tables to eat more than ~70MB of space.  So either you are talking about
>> > tables with enormously complex schemas, or there's something else going
>> > on.  (Please note that the SO post you are referencing describes an
>> > attempt to use circa 100000 tables/views in one session, not 700.)
>>
>> Here are the details of one of the tables I'm using to do my test (all of
>> then are equal, just the name is different):
>> test_dev=# \d+ trades_eur_usd
>>                                          Table "public.trades_eur_usd"
>>   Column   |            Type             | Collation | Nullable | Default
>> | Storage | Stats target | Description
>>
>> -----------+-----------------------------+-----------+----------+---------+---------+--------------+-------------
>>  id        | integer                     |           | not null |
>> | plain   |              |
>>  timestamp | timestamp without time zone |           | not null |
>> | plain   |              |
>>  amount    | numeric                     |           | not null |
>> | main    |              |
>>  price     | numeric                     |           | not null |
>> | main    |              |
>> Indexes:
>>     "trades_eur_usd_id_desc_index" btree (id DESC)
>>     "trades_eur_usd_timestamp_desc_index" btree ("timestamp" DESC)
>> Access method: heap
>> One thing that I forgot to explain better is what I'm doing to touch those
>> tables. Basically I created a little Elixir backend with Ecto which will
>> keep 1 connection alive for the pool and start a bulk insertion of rows to
>> one of the 700 tables, so, basically, the steps are:
>> 1) Choose one of the tables randomly;
>> 2) Insert in one query 13000 rows to that table;
>> 3) Go back to step 1 and repeat.
>> This will trigger the issue instantly and will make the memory usage of
>> that connection grow very fast as shown in the last e-mail.
>> One very important thing to note is that if I change the above algorithm
>> to do the same thing, but always choose the same table in step 1 (so I will
>> do the bulk insertion always in 1 table instead of one of the 700), I will
>> not trigger the issue, memory usage will not go up at all. it only triggers
>> it if I'm doing the insertion in different tables each time.
>>
>> > The memory consumption curve you showed looks suspiciously like you have
>> > circa 4GB shared buffers allocated and the reported number is mostly
>> about
>> > how many of those buffers the process has touched so far.  This is a
>> > common misunderstanding; "top" and similar tools tend to be really bad
>> > about accounting for shared memory usage in any useful way.  In any case,
>> > the process you've shown us stopped growing its space consumption
>> > some time ago, so I wonder where the OOM complaint actually came from.
>>
>> > The short answer is that you're probably barking up the wrong tree,
>> > but there's not enough information here to provide any useful guesses
>> > about which is the right tree.
>>
>> I guess you are right, maybe the issue is with some shared buffers cache
>> or something, I tried lowering to the minimum value each possible option in
>> postgresql.conf that I found but still got the same OOM result.
>> Thanks a lot for the help, and feel free to ask for more information!
>>
>> Can you try to run it without a connection pool? just a simple connection
>> to the backend.
>>
>
>
>-- 
>Eduardo Barreto Alexandre
>
>http://lattes.cnpq.br/0983072410155605

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-bugs by date:

Previous
From: Eduardo Barreto Alexandre
Date:
Subject: Re: BUG #16363: Memory increases for each table accessed untilconnection is closed
Next
From: Jehan-Guillaume de Rorthais
Date:
Subject: Re: [BUG] non archived WAL removed during production crash recovery