Re: How batch processing works - Mailing list pgsql-general
From:           Adrian Klaver
Subject:        Re: How batch processing works
Date:
Msg-id:         2fe0be15-f64e-465c-9dd2-b55c559ac7e2@aklaver.com
In response to: How batch processing works (Lok P <loknath.73@gmail.com>)
Responses:      Re: How batch processing works
List:           pgsql-general
On 9/21/24 07:36, Peter J. Holzer wrote:
> On 2024-09-21 16:44:08 +0530, Lok P wrote:
> ---------------------------------------------------------------------------------------------------
> #!/usr/bin/python3
>
> import time
> import psycopg2
>
> num_inserts = 10_000
> batch_size = 50
>
> db = psycopg2.connect()
> csr = db.cursor()
>
> csr.execute("drop table if exists parent_table")
> csr.execute("create table parent_table (id int primary key, t text)")
> db.commit()
>
> start_time = time.monotonic()
> for i in range(1, num_inserts+1):
>     csr.execute("insert into parent_table values(%s, %s)", (i, 'a'))
>     if i % batch_size == 0:
>         db.commit()
> db.commit()
> end_time = time.monotonic()
> elapsed_time = end_time - start_time
> print(f"Method 2: Individual Inserts with Commit after {batch_size} Rows: {elapsed_time:.3} seconds")
>
> # vim: tw=99
> ---------------------------------------------------------------------------------------------------

FYI, this is less of a problem with psycopg(3) and pipeline mode:

import time
import psycopg

num_inserts = 10_000
batch_size = 50

db = psycopg.connect("dbname=test user=postgres host=104.237.158.68")
csr = db.cursor()

csr.execute("drop table if exists parent_table")
csr.execute("create table parent_table (id int primary key, t text)")
db.commit()

start_time = time.monotonic()
with db.pipeline():
    for i in range(1, num_inserts+1):
        csr.execute("insert into parent_table values(%s, %s)", (i, 'a'))
        if i % batch_size == 0:
            db.commit()
db.commit()
end_time = time.monotonic()
elapsed_time = end_time - start_time
print(f"Method 2: Individual Inserts(psycopg3 pipeline mode) with Commit after {batch_size} Rows: {elapsed_time:.3} seconds")

For a remote connection to a database in another state, that took the time from:

Method 2: Individual Inserts with Commit after 50 Rows: 2.42e+02 seconds

to:

Method 2: Individual Inserts(psycopg3 pipeline mode) with Commit after 50 Rows: 9.83 seconds

> #!/usr/bin/python3
>
> import itertools
> import time
> import psycopg2
>
> num_inserts = 10_000
> batch_size = 50
>
> db = psycopg2.connect()
> csr = db.cursor()
>
> csr.execute("drop table if exists parent_table")
> csr.execute("create table parent_table (id int primary key, t text)")
> db.commit()
>
> start_time = time.monotonic()
> batch = []
> for i in range(1, num_inserts+1):
>     batch.append((i, 'a'))
>     if i % batch_size == 0:
>         q = "insert into parent_table values" + ",".join(["(%s, %s)"] * len(batch))
>         params = list(itertools.chain.from_iterable(batch))
>         csr.execute(q, params)
>         db.commit()
>         batch = []
> if batch:
>     q = "insert into parent_table values" + ",".join(["(%s, %s)"] * len(batch))
>     csr.execute(q, list(itertools.chain.from_iterable(batch)))
>     db.commit()
>     batch = []
>
> end_time = time.monotonic()
> elapsed_time = end_time - start_time
> print(f"Method 3: Batch Inserts ({batch_size}) with Commit after each batch: {elapsed_time:.3} seconds")
>
> # vim: tw=99
> ---------------------------------------------------------------------------------------------------

The above can also be handled with execute_batch() and execute_values() from:

https://www.psycopg.org/docs/extras.html#fast-execution-helpers

> On my laptop, method2 is about twice as fast as method3. But if I
> connect to a database on the other side of the city, method2 is now more
> than 16 times faster than method3. Simply because the delay in
> communication is now large compared to the time it takes to insert those
> rows.
>
>         hp

--
Adrian Klaver
adrian.klaver@aklaver.com
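[Editorial sketch] In psycopg 3.1 and later, Cursor.executemany() uses pipeline mode internally, so the explicit per-row loop from the pipeline example above can be collapsed into a single call. A minimal sketch under that assumption; the connection string, table name, and the helper names make_rows/timed_executemany are placeholders, not from the original thread:

```python
import time

def make_rows(n):
    # Same (id, 'a') tuples the scripts above insert.
    return [(i, 'a') for i in range(1, n + 1)]

def timed_executemany(conninfo, num_inserts=10_000):
    """Insert all rows with one executemany() call.

    psycopg >= 3.1 batches the statements in pipeline mode internally,
    so network round trips are amortized much like the explicit
    db.pipeline() block above.
    """
    import psycopg  # imported here so the sketch loads where psycopg is absent
    with psycopg.connect(conninfo) as db:
        with db.cursor() as csr:
            csr.execute("drop table if exists parent_table")
            csr.execute("create table parent_table (id int primary key, t text)")
            db.commit()
            start = time.monotonic()
            csr.executemany("insert into parent_table values (%s, %s)",
                            make_rows(num_inserts))
            db.commit()
            return time.monotonic() - start
```

Usage against a reachable test server would be e.g. timed_executemany("dbname=test"); the timing printed by the scripts above can then be reproduced with one call.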
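[Editorial sketch] The query assembly in method 3 is easy to get subtly wrong: the parameter list must be flattened with itertools.chain.from_iterable, not plain chain, or each tuple is passed as a single parameter. Factoring it into a helper makes that explicit; build_multirow_insert is a hypothetical name for illustration:

```python
import itertools

def build_multirow_insert(table, batch):
    """Build one multi-row INSERT plus the flat parameter list psycopg2 expects.

    Mirrors the query assembly in method 3 above: one "(%s, %s)" group per
    row, and the row tuples flattened into a single sequence of values.
    """
    placeholders = ",".join(["(%s, %s)"] * len(batch))
    query = f"insert into {table} values {placeholders}"
    # chain.from_iterable unpacks each (id, t) tuple into the flat list.
    params = list(itertools.chain.from_iterable(batch))
    return query, params
```

With this, both call sites in method 3 reduce to csr.execute(*build_multirow_insert("parent_table", batch)).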