Improve postgres 13.2 performance for concurrent bulk insert - Mailing list pgsql-admin

From Ivan Petrov
Subject Improve postgres 13.2 performance for concurrent bulk insert
Date
Msg-id CAEARqsGJ3KMCY9MfENnHaxfzux9qZz91gqMG6ehFxANJw42Fbg@mail.gmail.com
Whole thread Raw
List pgsql-admin
Hi, 

What I'm trying to do:

  • I have 3rd party DB (any type Postgres, Oracle...) I have no control over it. Let's call it REMOTE_DB
  • I have my own Postgres instance 13.2. I fully control it. Let's call it MY_PG
  • REMOTE_DB has a 20-200M records table
  • I need from time to time to fetch 100K - 1M records from REMOTE_DB, insert into MY_PG and then do sophisticated joins to produce some analytical results. Let's call it REMOTE_DB_FETCH
  • REMOTE_DB_FETCH saves records to MY_PG
  • MY_PG has table that mimics schema of source table stored in REMOTE_DB. It mimics PK too.
  • there are concurrent processes that run REMOTE_DB_FETCH to insert data into MY_PG

Long story short:

  • I cache subset of REMOTE_DB at MY_PG
  • Do analytics in MY_PG

What I've tried to far

What else can I try in order to UPSERT from REMOTE_DB to MY_PG faster from concurrent REMOTE_DB_FETCH processes?

I've created isolated test to try out performance. Inserting around 48K rows into a table with 150 fields with only PK constraint on to fields: varchar and bigint.

Table is unlogged

isUnlogged: [{relpersistence=u, relname=cache_1694}, {relpersistence=u, relname=cache_1694_pkey}] 

sql

INSERT INTO "cache_1694" ("varchar_column", "bigint_column", "another_column" /* more column 150 in total */) VALUES (?, ?, ? /**/)  
ON CONFLICT ("varchar_column", "bigint_column") DO NOTHING;

It took 30 sec to insert 48819 records. looks slow, what can I try else? I'm running postgres on my Mac Pro with an SSD disk. Here are the settings:

postgres:    image: postgres:13.2    volumes:      - ~/pgdata:/var/lib/postgresql/data    ports:      - "5432:5432"    environment:      - POSTGRES_USER=pguser      - POSTGRES_PASSWORD=pguser      - POSTGRES_DB=pgdb      - PGDATA=/var/lib/postgresql/data/pgdata
    command:      - "postgres"      - "-c"      - "max_connections=50"      - "-c"      - "shared_buffers=1GB"      - "-c"      - "effective_cache_size=1GB"      - "-c"      - "synchronous_commit=off"
SELECT * FROM pg_settings where pg_settings.name like '%commi%'

shows

  {    "name": "synchronous_commit",    "setting": "off"  }
]

Strange thing is that LOGGED table performance is not much greater than UNLOGGED. Same 30 sec for 48819 inserted duplicated rows


pgsql-admin by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Temporary Files
Next
From: Devendra Yadav
Date:
Subject: