What I'm trying to do:
- I have 3rd party DB (any type Postgres, Oracle...) I have no control over it. Let's call it REMOTE_DB
- I have my own Postgres instance 13.2. I fully control it. Let's call it MY_PG
- REMOTE_DB has a 20-200M records table
- I need from time to time to fetch 100K - 1M records from REMOTE_DB, insert into MY_PG and then do sophisticated joins to produce some analytical results. Let's call it REMOTE_DB_FETCH
- REMOTE_DB_FETCH saves records to MY_PG
- MY_PG has table that mimics schema of source table stored in REMOTE_DB. It mimics PK too.
- there are concurrent processes that run REMOTE_DB_FETCH to insert data into MY_PG
Long story short:
- I cache subset of REMOTE_DB at MY_PG
- Do analytics in MY_PG
What I've tried to far
What else can I try in order to UPSERT from REMOTE_DB to MY_PG faster from concurrent REMOTE_DB_FETCH processes?
I've created isolated test to try out performance. Inserting around 48K rows into a table with 150 fields with only PK constraint on to fields: varchar and bigint.
Table is unlogged
isUnlogged: [{relpersistence=u, relname=cache_1694}, {relpersistence=u, relname=cache_1694_pkey}]
sql
INSERT INTO "cache_1694" ("varchar_column", "bigint_column", "another_column" ) VALUES (?, ?, ? )
ON CONFLICT ("varchar_column", "bigint_column") DO NOTHING;
It took 30 sec to insert 48819 records. looks slow, what can I try else? I'm running postgres on my Mac Pro with an SSD disk. Here are the settings:
postgres: image: postgres:13.2 volumes: - ~/pgdata:/var/lib/postgresql/data ports: - "5432:5432" environment: - POSTGRES_USER=pguser - POSTGRES_PASSWORD=pguser - POSTGRES_DB=pgdb - PGDATA=/var/lib/postgresql/data/pgdata
command: - "postgres" - "-c" - "max_connections=50" - "-c" - "shared_buffers=1GB" - "-c" - "effective_cache_size=1GB" - "-c" - "synchronous_commit=off"
SELECT * FROM pg_settings where pg_settings.name like '%commi%'
shows
{ "name": "synchronous_commit", "setting": "off" }
]
Strange thing is that LOGGED table performance is not much greater than UNLOGGED. Same 30 sec for 48819 inserted duplicated rows