RE: Batch insert heavily affecting query performance. - Mailing list pgsql-performance

From: Jean Baro
Subject: RE: Batch insert heavily affecting query performance.
Msg-id: CA+fQeenBfOT8Krv0COK3aszHVn3J+CTZg9tejc2BF9eL9vue2w@mail.gmail.com
In response to: Batch insert heavily affecting query performance. (Jean Baro <jfbaro@gmail.com>)
List: pgsql-performance
Hi Jean,
I’ve used Postgres on a regular EC2 instance (an m4.xlarge) to store complex genomic data: hundreds of millions of rows in a table, with “normal” queries that used an index returning in 50-100ms, depending on the query… so this isn’t a Postgres issue per se.
Your table and index structures look OK, although in PG you should use the “text” datatype instead of varchar; text is the optimized type for storing string data of any size (even a 2-char country code). Since you have two such columns that you’ve indexed and are querying on, there is a chance you’ll see an improvement.
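For example, something along these lines (a sketch based on the DDL you posted below, untested on my end; note that on a table this size the type change may rewrite the table and its indexes, so try it on a copy first):

-- Convert the two indexed/queried string columns to text
ALTER TABLE public.card
    ALTER COLUMN user_id TYPE text,
    ALTER COLUMN user_country TYPE text;

One side effect: converting char(2) to text drops the blank-padding behavior, which for a 2-character country code is almost certainly what you want anyway.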
I have not yet used Aurora or RDS for any large data…it sure seems like the finger could be pointing there, but it isn’t clear what mechanism in Aurora could be creating the slowness.
Is there a possibility of you creating the same db on a normal EC2 instance with PG installed and running the same test? There is nothing else obvious about your data/structure that could result in such terrible performance.
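It would also help to see actual execution timings and buffer statistics, not just the bare plan. Something like this, using the query shape from the EXPLAIN you posted below (the literal values are just the examples visible in your plan):

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM public.card
WHERE user_id = '4684'
  AND user_country = 'BR';

If the Buffers lines show mostly reads rather than hits, the working set isn't fitting in the m4.large's RAM and queries are going to disk, which would go a long way toward explaining multi-second latencies.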
Mike Sofen
From: Jean Baro [mailto:jfbaro@gmail.com]
Sent: Wednesday, December 27, 2017 7:14 AM

Hello,
We are still seeing queries (by UserID + UserCountry) taking over 2 seconds, even when there is no batch insert going on at the same time.
Each query returns between 100 and 200 messages, about a 400KB payload, which is super tiny.
I don't know what else I can do within these limitations: an m4.large instance, 167MM rows, an almost 500GB database, and 29GB of indexes (all indexes combined).
I am probably too optimistic, but I was expecting queries (up to 50 queries per second) to return (99th percentile) in under 500ms or even less, as the index is simple and there is no aggregation or join involved.
Any suggestions?
The table structure:
CREATE TABLE public.card
(
id character(36) NOT NULL,
user_id character varying(40) NOT NULL,
user_country character(2) NOT NULL,
user_channel character varying(40),
user_role character varying(40),
created_by_system_key character(36) NOT NULL,
created_by_username character varying(40),
created_at timestamp with time zone NOT NULL,
last_modified_at timestamp with time zone NOT NULL,
date_start timestamp with time zone NOT NULL,
date_end timestamp with time zone NOT NULL,
payload json NOT NULL,
tags character varying(500),
menu character varying(50),
deleted boolean NOT NULL,
campaign character varying(500) NOT NULL,
correlation_id character varying(50),
PRIMARY KEY (id)
);
CREATE INDEX idx_user_country
ON public.card USING btree
(user_id COLLATE pg_catalog."default", user_country COLLATE pg_catalog."default");
CREATE INDEX idx_last_modified_at
ON public.card USING btree
(last_modified_at ASC NULLS LAST);
CREATE INDEX idx_campaign
ON public.card USING btree
(campaign ASC NULLS LAST);
The EXPLAIN:
Index Scan using idx_user_country on card  (cost=0.57..1854.66 rows=460 width=922)
  Index Cond: (((user_id)::text = '4684'::text) AND (user_country = 'BR'::bpchar))
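That plan corresponds to a query of roughly this shape (reconstructed from the Index Cond; the actual statement may select specific columns rather than *):

SELECT *
FROM public.card
WHERE user_id = '4684'
  AND user_country = 'BR';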
On Dec 25, 2017 at 01:10, "Jean Baro" <jfbaro@gmail.com> wrote:
Thanks for the clarification guys.
It will be super useful. After trying this I'll post the results!
Merry Christmas!