On Thu, 2025-03-13 at 18:13 +0800, bill.poole@ymail.com wrote:
>
> it is noteworthy that inserting 27 MB of data into a newly created table creates
> 191 MB of data including the index and 127 MB of data excluding the index.
PostgreSQL has a lot of overhead per row: each row carries a tuple header of
about 23 bytes plus a line pointer before any column data, and the index adds
its own entries on top of that.
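If you want to see where the space goes, you can compare table and index sizes
yourself, for example (the table name is just a placeholder):

    SELECT pg_size_pretty(pg_table_size('mytable'))   AS table_size,
           pg_size_pretty(pg_indexes_size('mytable')) AS index_size;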
>
> Can you help me understand why performing 3 million lookups on a b-tree index
> with all pages cached in memory takes so long?
It is probably not the lookup, but the *modification* of the index that is slow.
>
> It seems like deleting 3 million rows identified by 3 million IDs should be
> faster than updating 3 million rows (also identified by 3 million IDs).
It should be, yes.
To find out where the time is spent, use EXPLAIN (ANALYZE, BUFFERS) on the
statement.
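For example, something like this (table and column names are only placeholders):

    EXPLAIN (ANALYZE, BUFFERS)
    DELETE FROM mytable WHERE id = ANY ('{1,2,3}'::bigint[]);

That shows the time spent in each plan node and how many buffers were hit,
read or dirtied, which usually makes it clear whether the index maintenance
or the heap access is the expensive part.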
> > To get better performance, the best I can think of is to parallelize loading
> > the data until you saturate CPU, disk or hit internal contention in the database.
>
> Sadly, I cannot do that because I need all rows to be inserted in a single
> database transaction, which I cannot do over multiple database connections.
Then the best you can do is to use COPY rather than INSERT.
It will perform better (but not vastly better).
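For example, from psql (table and file names are just placeholders):

    BEGIN;
    \copy mytable (id, payload) FROM 'data.csv' WITH (FORMAT csv)
    COMMIT;

COPY loads all the rows in a single statement, so it still runs inside one
transaction on one connection.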
Yours,
Laurenz Albe