Thread: Any way to speed up INSERT INTO

Any way to speed up INSERT INTO

From
aditya desai
Date:
Hi,
One of our service-layer apps is inserting millions of records into a table, but one row at a time. COPY is the fastest way to import a file into a table, but the application has a requirement to process each row and insert it individually. Is there any way this INSERT can be tuned by adjusting parameters? It is taking almost 10 hours for just 2.2 million rows. The table does not have any indexes or triggers.

Regards,
Aditya.

Re: Any way to speed up INSERT INTO

From
Bruce Momjian
Date:
On Sat, Mar  5, 2022 at 12:01:52AM +0530, aditya desai wrote:
> Hi,
> One of our service-layer apps is inserting millions of records into a table,
> but one row at a time. COPY is the fastest way to import a file into a table,
> but the application has a requirement to process each row and insert it
> individually. Is there any way this INSERT can be tuned by adjusting
> parameters? It is taking almost 10 hours for just 2.2 million rows. The
> table does not have any indexes or triggers.

Well, sections 14.4 and 14.5 might help:

    https://www.postgresql.org/docs/14/performance-tips.html

Your time seems very slow --- are the rows very wide?

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: Any way to speed up INSERT INTO

From
Tom Lane
Date:
aditya desai <admad123@gmail.com> writes:
> One of our service-layer apps is inserting millions of records into a table,
> but one row at a time. COPY is the fastest way to import a file into a table,
> but the application has a requirement to process each row and insert it
> individually. Is there any way this INSERT can be tuned by adjusting
> parameters? It is taking almost 10 hours for just 2.2 million rows. The
> table does not have any indexes or triggers.

Using a prepared statement for the INSERT would help a little bit.
What would help more, if you don't expect any insertion failures,
is to group multiple inserts per transaction (ie put BEGIN ... COMMIT
around each batch of 100 or 1000 or so insertions).  There's not
going to be any magic bullet that lets you get away without changing
the app, though.

It's quite possible that network round trip costs are a big chunk of your
problem, in which case physically grouping multiple rows into each INSERT
command (... or COPY ...) is the only way to fix it.  But I'd start with
trying to reduce the transaction commit overhead.
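
A minimal sketch of that batching, assuming a hypothetical table
t(id int, payload text):

    BEGIN;
    -- prepared once per session, so later rows skip re-parsing and re-planning
    PREPARE ins (int, text) AS INSERT INTO t (id, payload) VALUES ($1, $2);
    EXECUTE ins(1, 'first row');
    EXECUTE ins(2, 'second row');
    -- ... on the order of 100 to 1000 rows per batch ...
    COMMIT;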

            regards, tom lane



Re: Any way to speed up INSERT INTO

From
aditya desai
Date:
Hi Bruce,
Correct, the rows are wide: one of the columns is text and one is bytea.

Regards,
Aditya.


Re: Any way to speed up INSERT INTO

From
Bruce Momjian
Date:
On Fri, Mar  4, 2022 at 01:42:39PM -0500, Tom Lane wrote:
> aditya desai <admad123@gmail.com> writes:
> > One of our service-layer apps is inserting millions of records into a
> > table, but one row at a time. COPY is the fastest way to import a file
> > into a table, but the application has a requirement to process each row
> > and insert it individually. Is there any way this INSERT can be tuned by
> > adjusting parameters? It is taking almost 10 hours for just 2.2 million
> > rows. The table does not have any indexes or triggers.
> 
> Using a prepared statement for the INSERT would help a little bit.

Yeah, I thought about that but it seems it would only minimally help.

> What would help more, if you don't expect any insertion failures,
> is to group multiple inserts per transaction (ie put BEGIN ... COMMIT
> around each batch of 100 or 1000 or so insertions).  There's not
> going to be any magic bullet that lets you get away without changing
> the app, though.

Yeah, they could insert multiple rows per statement too:

    CREATE TABLE test (x int);
    INSERT INTO test VALUES (1), (2), (3);
    
> It's quite possible that network round trip costs are a big chunk of your
> problem, in which case physically grouping multiple rows into each INSERT
> command (... or COPY ...) is the only way to fix it.  But I'd start with
> trying to reduce the transaction commit overhead.

Agreed, turning off synchronous_commit for those queries would be
my first approach.
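
For example, at the session level (a sketch; with synchronous_commit off, a
crash can lose the most recently reported commits, though it cannot corrupt
the data):

    SET synchronous_commit = off;  -- commits stop waiting for the WAL flush
    -- ... run the batched inserts here ...
    SET synchronous_commit = on;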

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: Any way to speed up INSERT INTO

From
Andres Freund
Date:
Hi,

On March 4, 2022 10:42:39 AM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>aditya desai <admad123@gmail.com> writes:
>> One of our service-layer apps is inserting millions of records into a table,
>> but one row at a time. COPY is the fastest way to import a file into a table,
>> but the application has a requirement to process each row and insert it
>> individually. Is there any way this INSERT can be tuned by adjusting
>> parameters? It is taking almost 10 hours for just 2.2 million rows. The
>> table does not have any indexes or triggers.
>
>Using a prepared statement for the INSERT would help a little bit.
>What would help more, if you don't expect any insertion failures,
>is to group multiple inserts per transaction (ie put BEGIN ... COMMIT
>around each batch of 100 or 1000 or so insertions).  There's not
>going to be any magic bullet that lets you get away without changing
>the app, though.
>
>It's quite possible that network round trip costs are a big chunk of your
>problem, in which case physically grouping multiple rows into each INSERT
>command (... or COPY ...) is the only way to fix it.  But I'd start with
>trying to reduce the transaction commit overhead.

Pipelining could also help.
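
A rough sketch of that idea using pgbench's pipeline meta-commands
(PostgreSQL 14+, hypothetical table t; every INSERT in the pipeline is sent
before waiting for any result):

    \startpipeline
    INSERT INTO t VALUES (1, 'row one');
    INSERT INTO t VALUES (2, 'row two');
    \endpipeline

run with something like: pgbench -n -M extended -f pipeline.sql -t 1000 dbname
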
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Any way to speed up INSERT INTO

From
Edson Richter
Date:
Sorry for disturbing; I had a similar problem storing logs for an e-commerce service mesh producing millions of records per day. To avoid losing anything, I record every log record in Apache ActiveMQ Artemis, and another microservice then collects the data from the MQ and stores it in PostgreSQL. Since our logs arrive in waves, ActiveMQ Artemis reduces the "impedance" between the systems.

Just my 2c.

Regards,

ER.

Re: Any way to speed up INSERT INTO

From
Imre Samu
Date:
> Correct, the rows are wide: one of the columns is text and one is bytea.

With PG14, LZ4 compression is worth checking.

"""
INSERT statements with 16 clients
Another common scenario that I tested was accessing the database from multiple clients - 16 in this case.
What I found out, as can be seen below, is that compression performance of single large files (HTML, English text, source code, executable binary, pictures) using LZ4 was 60% to 70% faster compared to PGLZ, and that there was also a small improvement while inserting multiple small files (PostgreSQL document).
"""

kind regards,
  Imre


Re: Any way to speed up INSERT INTO

From
aditya desai
Date:
Thanks, all, for your input. We will try to implement the inserts in a single transaction. I feel that is the best approach.

Thanks,
AD.


Re: Any way to speed up INSERT INTO

From
aditya desai
Date:
Hi Tom,
I added BEGIN and COMMIT around the inserts as shown below and executed it from pgAdmin for 100,000 rows. It ran in just 1 minute.

BEGIN;
INSERT INTO TABLE VALUES(....);
INSERT INTO TABLE VALUES(....);
.
,
COMMIT;

However, when I run the same statements by passing the file to psql (as shown below), it still takes a lot of time. Am I doing anything wrong? How can I get the psql run down to a minute as well?

psql -h host -U user -p Port -d database < INSERT_FILE.sql

psql still prints a line for every row:
INSERT 0 1
INSERT 0 1


Regards,
Aditya.



Re: Any way to speed up INSERT INTO

From
Bruce Momjian
Date:
On Tue, Mar  8, 2022 at 06:36:17PM +0530, aditya desai wrote:
> Hi Tom,
> I added BEGIN and COMMIT around the inserts as shown below and executed it
> from pgAdmin for 100,000 rows. It ran in just 1 minute.
> 
> BEGIN;
> INSERT INTO TABLE VALUES(....);
> INSERT INTO TABLE VALUES(....);
> .
> ,
> COMMIT;
> 
> However, when I run the same statements by passing the file to psql (as
> shown below), it still takes a lot of time. Am I doing anything wrong? How
> can I get the psql run down to a minute as well?
> 
> psql -h host -U user -p Port -d database < INSERT_FILE.sql
> 
> psql still prints a line for every row:
> INSERT 0 1
> INSERT 0 1

Uh, they should be the same.  You can turn on log_statement=all on the
server and look at what queries are being issued in each case.
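
For example (a sketch; a config reload is enough, no restart needed):

    ALTER SYSTEM SET log_statement = 'all';
    SELECT pg_reload_conf();

If printing the per-row "INSERT 0 1" tags turns out to be part of the
slowdown, discarding psql's output is worth a try, e.g.:

    psql -h host -U user -p Port -d database -f INSERT_FILE.sql > /dev/null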

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: Any way to speed up INSERT INTO

From
aditya desai
Date:
OK, will check. But from pgAdmin it takes 1 minute, while from psql it takes 20 minutes for the same 100,000 rows with BEGIN; ... COMMIT;

Thanks,
Aditya.
