BUG #15778: Replication slot peak query cause memory spike at the server when big transaction are running - Mailing list pgsql-bugs
From: PG Bug reporting form
Subject: BUG #15778: Replication slot peak query cause memory spike at the server when big transaction are running
Msg-id: 15778-0d88c55df57eb082@postgresql.org
List: pgsql-bugs
The following bug has been logged on the website:

Bug reference:      15778
Logged by:          Nitesh Yadav
Email address:      nitesh@datacoral.co
PostgreSQL version: 9.6.2
Operating system:   x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911

Description:

Hi,

Postgres server setup:
The Postgres server runs as an AWS RDS instance. The server version is PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit. In the parameter group, rds.logical_replication is set to 1, which internally sets the following parameters: wal_level, max_wal_senders, max_replication_slots, and max_connections. We use the test_decoding module to read WAL data through the logical decoding mechanism.

Application setup:
1. Periodically we run a peek query to retrieve data from the slot, e.g.:
SELECT * FROM pg_logical_slot_peek_changes('pgldpublic_cdc_slot', NULL, NULL, 'include-timestamp', 'on') LIMIT 200000 OFFSET 0;
2. From the result of that query, we take the location of the last transaction and use it to remove the consumed data from the slot, e.g.:
SELECT location, xid FROM pg_logical_slot_get_changes('pgldpublic_cdc_slot', 'B92/C7394678', NULL, 'include-timestamp', 'on') LIMIT 1;
We run steps 1 and 2 in a loop, reading data in chunks of 200K records at a time in a given process.

------------------------------------------------------------------------------
Behavior reported (bug):
------------------------------------------------------------------------------
Client-side timeline:
------------------------------------------------------------------------------
We have a process with PID 10032, which started at 2019-04-24T06:47:01.877Z.
At 2019-04-24T06:47:01.893Z it invoked the following peek query:
SELECT * FROM pg_logical_slot_peek_changes('silo2_cdc_slot', NULL, NULL, 'include-timestamp', 'on') LIMIT 500000 OFFSET 0;
Until 2019-04-24T07:01:41.852Z there was no response from the server, and we then killed the process.
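The chunked peek-then-advance loop described in steps 1 and 2 can be sketched as follows. This is a minimal pure-Python simulation, not code from the report: the stub functions `peek_changes` and `get_changes` stand in for the SQL calls `pg_logical_slot_peek_changes` and `pg_logical_slot_get_changes`, and the slot is modeled as an in-memory list of (location, xid, data) rows, so no live server is assumed.

```python
# Sketch of the chunked peek-then-advance consumption loop (simulation only).
# In the real setup, peek_changes/get_changes would be SQL calls against the
# replication slot; here the slot is just an in-memory list of rows.

CHUNK = 3  # stands in for the LIMIT 200000 used in the report


def peek_changes(slot, limit):
    """Like pg_logical_slot_peek_changes: read rows without consuming them."""
    return slot[:limit]


def get_changes(slot, upto_location):
    """Like pg_logical_slot_get_changes with an upto_lsn argument:
    consume (discard) every row up to and including upto_location."""
    while slot and slot[0][0] <= upto_location:
        slot.pop(0)


def drain(slot):
    """Read the whole slot in chunks, advancing the slot after each chunk."""
    consumed = []
    while True:
        rows = peek_changes(slot, CHUNK)          # step 1: peek a chunk
        if not rows:
            break
        consumed.extend(rows)
        last_location = rows[-1][0]               # location of the chunk's last row
        get_changes(slot, last_location)          # step 2: advance past it
    return consumed


# Simulated slot contents: rows are (location, xid, data).
slot = [(lsn, 100 + lsn, f"row-{lsn}") for lsn in range(1, 8)]
out = drain(slot)
print(len(out), slot)  # all 7 rows consumed; the slot list is now empty
```

The design point the simulation captures is that step 2 only confirms what step 1 already read: `get_changes` never returns new data here, it just moves the slot's restart position forward so the server can discard the decoded changes.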
Server-side timeline:
------------------------------------------------------------------------------
A large transaction had been running on the server since 2019-04-24T06:00:00Z.
At 2019-04-24T06:45:38Z the slot size was around 2 GB; at that point the peek query took around 20 s to respond.
At 2019-04-24T06:47:00Z the memory usage of the server backend with PID 10032 started growing drastically.
At 2019-04-24T06:54:00Z the memory usage of that backend reached 20 GB.
At 2019-04-24T06:58:00Z it reached 41 GB.
At 2019-04-24T07:05:40Z it stayed at around 40.98 GB.

Following are the anonymized pg server logs:
------------------------------------------------------------------------------
2019-04-24 06:45:38 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(50260):abc@xyz:[6988]:LOG: connection authorized: user=abc database=xyz SSL enabled (protocol=TLSv1.2, cipher=XXXXX-YYY-AES256-GCM-ABC123, compression=off)
2019-04-24 06:45:38 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(22837):abc@xyz:[6989]:LOG: connection authorized: user=abc database=xyz SSL enabled (protocol=TLSv1.2, cipher=XXXXX-YYY-AES256-GCM-ABC123, compression=off)
2019-04-24 06:45:38 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(50260):abc@xyz:[6988]:LOG: starting logical decoding for slot "pg_slot"
2019-04-24 06:45:38 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(50260):abc@xyz:[6988]:DETAIL: streaming transactions committing after 1003/499E5518, reading WAL from FFB/F8F0F3E8
2019-04-24 06:45:38 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(50260):abc@xyz:[6988]:STATEMENT: SELECT * FROM pg_logical_slot_peek_changes('pg_slot', NULL, NULL, 'include-timestamp', 'on') LIMIT 500000 OFFSET 0;
2019-04-24 06:45:38 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(50260):abc@xyz:[6988]:LOG: logical decoding found consistent point at FFB/F8F0F3E8
2019-04-24 06:45:38 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(50260):abc@xyz:[6988]:DETAIL: Logical decoding will begin using saved snapshot.
2019-04-24 06:45:38 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(50260):abc@xyz:[6988]:STATEMENT: SELECT * FROM pg_logical_slot_peek_changes('pg_slot', NULL, NULL, 'include-timestamp', 'on') LIMIT 500000 OFFSET 0;
2019-04-24 06:46:00 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(50260):abc@xyz:[6988]:LOG: duration: 22294.217 ms execute <unnamed>: SELECT * FROM pg_logical_slot_peek_changes('pg_slot', NULL, NULL, 'include-timestamp', 'on') LIMIT 500000 OFFSET 0;
2019-04-24 06:46:38 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(50260):abc@xyz:[6988]:LOG: starting logical decoding for slot "pg_slot"
2019-04-24 06:46:38 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(50260):abc@xyz:[6988]:DETAIL: streaming transactions committing after 1003/499E5518, reading WAL from FFB/F8F0F3E8
2019-04-24 06:46:38 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(50260):abc@xyz:[6988]:STATEMENT: SELECT xid FROM pg_logical_slot_get_changes('pg_slot', '1003/9B28D200', NULL, 'include-timestamp', 'on') LIMIT 1;
2019-04-24 06:46:38 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(50260):abc@xyz:[6988]:LOG: logical decoding found consistent point at FFB/F8F0F3E8
2019-04-24 06:46:38 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(50260):abc@xyz:[6988]:DETAIL: Logical decoding will begin using saved snapshot.
2019-04-24 06:46:38 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(50260):abc@xyz:[6988]:STATEMENT: SELECT xid FROM pg_logical_slot_get_changes('pg_slot', '1003/9B28D200', NULL, 'include-timestamp', 'on') LIMIT 1;
2019-04-24 06:47:01 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(50260):abc@xyz:[6988]:LOG: duration: 23156.543 ms statement: SELECT xid FROM pg_logical_slot_get_changes('pg_slot', '1003/9B28D200', NULL, 'include-timestamp', 'on') LIMIT 1;
2019-04-24 06:47:01 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(10137):abc@xyz:[10032]:LOG: connection authorized: user=abc database=xyz SSL enabled (protocol=TLSv1.2, cipher=XXXXX-YYY-AES256-GCM-ABC123, compression=off)
2019-04-24 06:47:01 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(46673):abc@xyz:[10034]:LOG: connection authorized: user=abc database=xyz SSL enabled (protocol=TLSv1.2, cipher=XXXXX-YYY-AES256-GCM-ABC123, compression=off)
2019-04-24 06:47:01 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(10137):abc@xyz:[10032]:LOG: starting logical decoding for slot "pg_slot"
2019-04-24 06:47:01 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(10137):abc@xyz:[10032]:DETAIL: streaming transactions committing after 1003/9B28D200, reading WAL from FFB/F8F0F3E8
2019-04-24 06:47:01 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(10137):abc@xyz:[10032]:STATEMENT: SELECT * FROM pg_logical_slot_peek_changes('pg_slot', NULL, NULL, 'include-timestamp', 'on') LIMIT 500000 OFFSET 0;
2019-04-24 06:47:01 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(10137):abc@xyz:[10032]:LOG: logical decoding found consistent point at FFB/F8F0F3E8
2019-04-24 06:47:01 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(10137):abc@xyz:[10032]:DETAIL: Logical decoding will begin using saved snapshot.
2019-04-24 06:47:01 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(10137):abc@xyz:[10032]:STATEMENT: SELECT * FROM pg_logical_slot_peek_changes('pg_slot', NULL, NULL, 'include-timestamp', 'on') LIMIT 500000 OFFSET 0;
2019-04-24 06:50:14 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(60004):abc@xyz:[18122]:LOG: connection authorized: user=abc database=xyz SSL enabled (protocol=TLSv1.2, cipher=XXXXX-YYY-AES256-GCM-ABC123, compression=off)
2019-04-24 06:55:14 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(29409):abc@xyz:[31637]:LOG: connection authorized: user=abc database=xyz SSL enabled (protocol=TLSv1.2, cipher=XXXXX-YYY-AES256-GCM-ABC123, compression=off)
2019-04-24 07:00:14 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(43301):abc@xyz:[5247]:LOG: connection authorized: user=abc database=xyz SSL enabled (protocol=TLSv1.2, cipher=XXXXX-YYY-AES256-GCM-ABC123, compression=off)
2019-04-24 07:02:01 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(46673):abc@xyz:[10034]:LOG: could not receive data from client: Connection reset by peer
2019-04-24 07:05:14 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(59862):abc@xyz:[10813]:LOG: connection authorized: user=abc database=xyz SSL enabled (protocol=TLSv1.2, cipher=XXXXX-YYY-AES256-GCM-ABC123, compression=off)
2019-04-24 07:10:14 UTC:ec2-xx-xx-xx-xx.compute-1.amazonaws.com(15221):abc@xyz:[16183]:LOG: connection authorized: user=abc database=xyz SSL enabled (protocol=TLSv1.2, cipher=XXXXX-YYY-AES256-GCM-ABC123, compression=off)

We have the following questions:
------------------------------------------------------------------------------
1) What causes the memory usage spike on the server? Is it the peek query, or the write of the big transaction running on the server?
2) Why did the peek query take so long at 2019-04-24T06:47:00Z, when the previous run at 2019-04-24T06:45:38.516Z finished within about 20 s?
3) How can we restrict/limit the memory usage of peek queries on the server side? Please suggest any configuration settings we can use.
Regards,
Nitesh