Thread: Improving performance of select query
Hi,
I have a postgres(v9.5) table named customer holding 1 billion rows. It is not partitioned but it has an index against the primary key (integer). I need to keep a very few records (say, about 10k rows) and remove everything else.
insert into customer_backup select * from customer where customer_id in (<id1>,<id2>,..);
If I go for something like above I'm afraid the insert-select may take a very long time as when I ran
select count(*) from customer;
it is taking about 45 minutes to return the count.
Are there ways to improve the efficiency of the insert-select by , say, tuning some configurations related to memory to improve the efficiency ?
This is a box with 96GB of RAM overall and I can stop all the data load and DML operations if needed. But need a way to run this query as much efficiently as possible
Thanks and Regards,
Karthik
Disclaimer:
This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is considered confidential, proprietary, sensitive and/or otherwise legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.
Karthik Shivashankar schrieb am 14.12.2020 um 12:38: > I have a postgres(v9.5) table named customer holding 1 billion rows. > It is not partitioned but it has an index against the primary key > (integer). I need to keep a very few records (say, about 10k rows) > and remove everything else. > > /insert into customer_backup select * from customer where customer_id in (<id1>,<id2>,..); / > > > > If I go for something like above I'm afraid the insert-select may take a very long time as when I ran > > /select count(*) from customer;/ > > it is taking about 45 minutes to return the count. Well, you need to compare the time with the same condition you use in your CREATE TABLE .. AS SELECT statement, e.g.: select count(*) from customer where id in (....); Or: explain (analyze) select * from customer where id in (....); Regards Thomas
> On Dec 14, 2020, at 4:47 AM, Thomas Kellerer <shammat@gmx.net> wrote: > > Karthik Shivashankar schrieb am 14.12.2020 um 12:38: >> I have a postgres(v9.5) table named customer holding 1 billion rows. >> It is not partitioned but it has an index against the primary key >> (integer). I need to keep a very few records (say, about 10k rows) >> and remove everything else. >> >> /insert into customer_backup select * from customer where customer_id in (<id1>,<id2>,..); / >> >> >> >> If I go for something like above I'm afraid the insert-select may take a very long time as when I ran >> >> /select count(*) from customer;/ >> >> it is taking about 45 minutes to return the count. > > Well, you need to compare the time with the same condition you use in your > CREATE TABLE .. AS SELECT statement, > > e.g.: > > select count(*) > from customer > where id in (....); > > Or: > > explain (analyze) > select * > from customer > where id in (....); > > > Regards > Thomas > As for the actually copy of the specific records, I would ‘where exists’ (even possibly with a temp table of ids) ratherthan in(id1..id10000) >
> On Dec 14, 2020, at 4:47 AM, Thomas Kellerer <shammat@gmx.net> wrote:
>
> Karthik Shivashankar schrieb am 14.12.2020 um 12:38:
>> I have a postgres(v9.5) table named customer holding 1 billion rows.
>> It is not partitioned but it has an index against the primary key
>> (integer). I need to keep a very few records (say, about 10k rows)
>> and remove everything else.
>>
>> /insert into customer_backup select * from customer where customer_id in (<id1>,<id2>,..); /
>>
>>
>>
>> If I go for something like above I'm afraid the insert-select may take a very long time as when I ran
>>
>> /select count(*) from customer;/
>>
>> it is taking about 45 minutes to return the count.
>
> Well, you need to compare the time with the same condition you use in your
> CREATE TABLE .. AS SELECT statement,
>
> e.g.:
>
> select count(*)
> from customer
> where id in (....);
>
> Or:
>
> explain (analyze)
> select *
> from customer
> where id in (....);
>
>
> Regards
> Thomas
>
As for the actually copy of the specific records, I would ‘where exists’ (even possibly with a temp table of ids) rather than in(id1..id10000)
>
On Dec 14, 2020, at 10:37 AM, Muhammad Bilal Jamil <mbjamil92@gmail.com> wrote:I think you can also increase the query performance by creating indexes?