Re: Delete tables difference involves seq scan - Mailing list pgsql-performance

From Danylo Hlynskyi
Subject Re: Delete tables difference involves seq scan
Date
Msg-id CANZg+yeKu3S=V=pf8xZzL=pfq9s1_Vbaa2x6dn8WAgtkKMKVKw@mail.gmail.com
Whole thread Raw
In response to Delete tables difference involves seq scan  (Danylo Hlynskyi <abcz2.uprola@gmail.com>)
Responses Re: Delete tables difference involves seq scan  (Danylo Hlynskyi <abcz2.uprola@gmail.com>)
List pgsql-performance
Oh, sorry, this happens on Postgresql 9.6.6. I've checked that on Postgresql 10.0 query plan from setup (1) now uses two seq scans, like in setup (2). 2017-12-01 15:03 GMT+02:00 Danylo Hlynskyi : > Hello. I want to remove rows from first table, that exist in second > (equality is done using PK). However I experience seq scan on second table, > which counters my intuition - I think it should be index-only. Because > tables are large, performance of query is very bad. > > However I got mixed results when trying to reproduce this behavior on > syntetic tables. Here I'll show 3 different plans, which I got for the same > query. > > 1. Setup is: > --------------------------- > create table diff (id uuid constraint diff_pkey primary key); > create table origin (id uuid constraint origin_pkey primary key); > --------------------------- > > The query generates correct plan, which performs only index scans: > > explain delete from origin where exists (select id from diff where > origin.id = diff.id); > QUERY PLAN > ------------------------------------------------------------ > ------------------------------- > Delete on origin (cost=0.30..105.56 rows=1850 width=12) > -> Merge Semi Join (cost=0.30..105.56 rows=1850 width=12) > Merge Cond: (origin.id = diff.id) > -> Index Scan using origin_pkey on origin (cost=0.15..38.90 > rows=1850 width=22) > -> Index Scan using diff_pkey on diff (cost=0.15..38.90 > rows=1850 width=22) > (5 rows) > > 2. Setup is: > -------------------------------- > create table origin (id uuid constraint origin_pkey primary key, data > jsonb); > create table diff (id uuid constraint diff_pkey primary key, data jsonb); > -------------------------------- > > The query generates plan with two seq scans: > > explain delete from origin where exists (select id from diff where > origin.id = diff.id); > QUERY PLAN > ------------------------------------------------------------ > --------------- > Delete on origin (cost=34.08..69.49 rows=1070 width=12) > -> Hash Semi Join (cost=34.08..69.49 rows=1070 width=12) > Hash Cond: (origin.id = diff.id) > -> Seq Scan on origin (cost=0.00..20.70 rows=1070 width=22) > -> Hash (cost=20.70..20.70 rows=1070 width=22) > -> Seq Scan on diff (cost=0.00..20.70 rows=1070 width=22) > (6 rows) > > 3. My real `origin` table has 26 fields and 800 billion rows, real `diff` > table has 12 million rows and the query generates plan with nested loop and > seq scan on `diff` table: > > explain delete from drug_refills origin where exists (select id from > drug_refills_diff diff where origin.id = diff.id); > QUERY PLAN > ------------------------------------------------------------ > ---------------------------------------------- > Delete on drug_refills origin (cost=0.57..22049570.11 rows=11975161 > width=12) > -> Nested Loop (cost=0.57..22049570.11 rows=11975161 width=12) > -> Seq Scan on drug_refills_diff diff (cost=0.00..720405.61 > rows=11975161 width=22) > -> Index Scan using drug_refills_pkey on drug_refills origin > (cost=0.57..1.77 rows=1 width=22) > Index Cond: (id = diff.id) > (5 rows) > > I have run ANALYZE on both tables, but it didn't help. Here are column > types in origin and diff (same schema), if that matters: > > uuid > timestamp with time zone > timestamp with time zone > character varying(255) > character varying(255) > character varying(1024) > numeric(10,4) > integer > numeric(14,8) > numeric(14,8) > numeric(14,8) > numeric(14,8) > numeric(14,8) > character varying(16) > character varying(16) > character varying(16) > character varying(16) > character varying(16) > character varying(16) > date > jsonb > text[] > uuid > uuid > uuid > uuid > >

pgsql-performance by date:

Previous
From: Danylo Hlynskyi
Date:
Subject: Delete tables difference involves seq scan
Next
From: Danylo Hlynskyi
Date:
Subject: Re: Delete tables difference involves seq scan