Home > mailing lists

More correlated (?) index woes - Mailing list pgsql-general

From	Geoff Winkless
Subject	More correlated (?) index woes
Date	March 28, 2016 22:23:53
Msg-id	CAEzk6fex6MRdMfGiiaJyA1U=Q+f0qF=G4iESXvHtaHLb0dfRpg@mail.gmail.com Whole thread Raw
Responses	Re: More correlated (?) index woes (rob stone <floriparob@gmail.com>) Re: More correlated (?) index woes (Geoff Winkless <pgsqladmin@geoff.dj>)
List	pgsql-general

Tree view

So I accept that when using MIN(sc_id) against scdate it makes statistical sense to use the sc_id index for a reasonable percentage of the full range of scdate, unless we know in advance that scdate is closely correlated to sc_id (because using MIN means we can stop immediately we hit a value).

However I'm now finding a similar problem when using a multi-table DELETE, where the same obviously can't apply.

This query:

DELETE FROM pa USING legs WHERE legs.scdate BETWEEN 20160220 AND 20160222 AND legs.sc_id=pa.sc_id;

does what one would hope:

Delete on pa (cost=0.99..705406.24 rows=36570 width=12)

-> Nested Loop (cost=0.99..705406.24 rows=36570 width=12)

-> Index Scan using legs_scdate_idx on legs (cost=0.43..1171.88 rows=14458 width=10)

Index Cond: ((scdate >= 20160220) AND (scdate <= 20160222))

-> Index Scan using pa_pkey on pa (cost=0.56..48.33 rows=38 width=10)

Index Cond: (sc_id = legs.sc_id)

However as soon as I add an extra test for field1 IS NULL, it apparently goes insane:

Delete on pa (cost=577260.90..626296.46 rows=23732 width=12) (actual time=41870.770..41870.770 rows=0 loops=1)

-> Hash Join (cost=577260.90..626296.46 rows=23732 width=12) (actual time=37886.396..41315.668 rows=44960 loops=1)

Hash Cond: (legs.sc_id = pa.sc_id)

-> Index Scan using legs_scdate_idx on legs (cost=0.43..1171.88 rows=14458 width=10) (actual time=0.030..13.667 rows=21281 loops=1)

Index Cond: ((scdate >= 20160220) AND (scdate <= 20160222))

-> Hash (cost=481691.12..481691.12 rows=5497868 width=10) (actual time=37805.756..37805.756 rows=4875870 loops=1)

Buckets: 131072 Batches: 64 Memory Usage: 4311kB

-> Seq Scan on pa (cost=0.00..481691.12 rows=5497868 width=10) (actual time=0.008..35869.304 rows=4875870 loops=1)

Filter: (field1 IS NULL)

Rows Removed by Filter: 2688634

Planning time: 0.447 ms

Execution time: 41870.832 ms

Running ANALYZE makes no difference.

Table pa has 7522676 rows, 4834042 of which have field1 NULL, so it's absolutely not reasonable to expect this to be an optimal strategy.

Any suggestions as to how I can improve this query?

Thanks :)

Geoff

pgsql-general by date:

From: Adrian Klaver
Date: 28 March 2016, 16:45:26
Subject: Re: View deleted records in a table

From: Mat Arye
Date: 28 March 2016, 23:41:07
Subject: Table size for partitioned setup

More correlated (?) index woes - Mailing list pgsql-general

Previous

Next