RE: Batch insert heavily affecting query performance. - Mailing list pgsql-performance

From Jean Baro
Subject RE: Batch insert heavily affecting query performance.
Date
Msg-id CA+fQeenBfOT8Krv0COK3aszHVn3J+CTZg9tejc2BF9eL9vue2w@mail.gmail.com
Whole thread Raw
In response to Batch insert heavily affecting query performance.  (Jean Baro <jfbaro@gmail.com>)
List pgsql-performance
Thanks Mike, 

We are using the standard RDS instance m4.large, it's not Aurora, which is a much more powerful server  (according to AWS).

Yes, we could install it on EC2, but it would take some extra effort from our side, it can be an investment though in case it will help us finding the bottle neck, BUT after tuning the database it must run on RDS for production use. As the company I work for demands we run microseconds DB as a managed service (RDS in this case).

Mike, what can we expect to see if we run PG on EC2? More logging? More tuning options? Let me know what your intention is so that I can convince other people on the team. But keep in mind in the end that payload should run on RDS m4.large (500gb to 1TB of general purpose SSD).

Again, thanks a lot!

Em 27 de dez de 2017 13:59, "Mike Sofen" <msofen@runbox.com> escreveu:

Hi Jean,

 

I’ve used Postgres on a regular EC2 instance (an m4.xlarge), storing complex genomic data, hundreds of millions of rows in a table and “normal” queries that used an index returned in 50-100ms, depending on the query…so this isn’t a Postgres issue per se. 

 

Your table and index structures look ok, although in PG, use the “text” datatype instead of varchar, it is the optimized type for storing string data of any size (even a 2 char country code).  Since you have 2 such columns that you’ve indexed and are querying for, there is a chance you’ll see an improvement. 

 

I have not yet used Aurora or RDS for any large data…it sure seems like the finger could be pointing there, but it isn’t clear what mechanism in Aurora could be creating the slowness.

 

Is there a possibility of you creating the same db on a normal EC2 instance with PG installed and running the same test?  There is nothing else obvious about your data/structure that could result in such terrible performance.

 

Mike Sofen

 

From: Jean Baro [mailto:jfbaro@gmail.com]
Sent: Wednesday, December 27, 2017 7:14 AM

Hello,

 

We are still seeing queries  (by UserID + UserCountry) taking over 2 seconds, even when there is no batch insert going on at the same time.

 

Each query returns from 100 to 200 messagens, which would be a 400kb pay load, which is super tiny.

 

I don't know what else I can do with the limitations (m4.large), 167MM rows, almost 500GB database and 29GB of indexes (all indexes).

 

I am probably to optimistic, but I was expecting queries (up to 50 queries per second) to return  (99th) under 500ms or even less, as the index is simple, there is no aggregation or join involves.

 

Any suggestion?

 

The table structure:

CREATE TABLE public.card

(

    id character(36) NOT NULL,

    user_id character varying(40) NOT NULL,

    user_country character(2) NOT NULL,

    user_channel character varying(40),

    user_role character varying(40),

    created_by_system_key character(36) NOT NULL,

    created_by_username character varying(40),

    created_at timestamp with time zone NOT NULL,

    last_modified_at timestamp with time zone NOT NULL,

    date_start timestamp with time zone NOT NULL,

    date_end timestamp with time zone NOT NULL,

    payload json NOT NULL,

    tags character varying(500),

    menu character varying(50),

    deleted boolean NOT NULL,

    campaign character varying(500) NOT NULL,

    correlation_id character varying(50),

    PRIMARY KEY (id)

);

 

CREATE INDEX idx_user_country

    ON public.card USING btree

    (user_id COLLATE pg_catalog."default", user_country COLLATE pg_catalog."default");

 

CREATE INDEX idx_last_modified_at

    ON public.card USING btree

    (last_modified_at ASC NULLS LAST);

 

CREATE INDEX idx_campaign

    ON public.card USING btree

    (campaign ASC NULLS LAST)

 

The EXPLAIN

 

'Index Scan using idx_user_country on card  (cost=0.57..1854.66 rows=460 width=922)'

'  Index Cond: (((user_id)::text = '4684'::text) AND (user_country = 'BR'::bpchar))'

 

 

 

Em 25 de dez de 2017 01:10, "Jean Baro" <jfbaro@gmail.com> escreveu:

Thanks for the clarification guys.

 

It will be super useful. After trying this I'll post the results!

 

Merry Christmas!

 


pgsql-performance by date:

Previous
From: Jean Baro
Date:
Subject: Re: Batch insert heavily affecting query performance.
Next
From: Jean Baro
Date:
Subject: Re: Batch insert heavily affecting query performance.