Re: Interesting slow query - Mailing list pgsql-performance
From | PFC |
---|---|
Subject | Re: Interesting slow query |
Date | |
Msg-id | op.ta214xotcigqcu@apollo13 Whole thread Raw |
In response to | Re: Interesting slow query (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-performance |
> Usually we get complaints the other way around (that the NOT EXISTS > approach is a lot slower). Yes, I know ;) (I rephrased the query this way to exploit the fact that the planner would choose a nested loop) > You did not show any statistics, but I > suspect the key point here is that the condition id > 1130306 excludes > most or all of the A and D tables. Right. Actually : - Table r (raw_annonces) contains raw data waiting to be processed - Table a (annonces) contains processed data ready for display on the website (active data) - Table d (archive) contains old archived data which can be displayed on request but is normally excluded from the searches, which normally only hit recent records. This is to get speedy searches. So, records are added into the "raw" table, these have a SERIAL primary key. Then a script processes them and inserts the results into the active table. 15 days of "raw" records are kept, then they are deleted. Periodically old records from "annonces" are moved to the archive. The promary key stays the same in the 3 tables. The script knows at which id it stopped last time it was run, hence the (id > x) condition. Normally this excludes the entire "annonces" table, because we process only new records. > The planner is not smart about > making transitive inequality deductions, but you could help it along > by adding the implied clauses yourself: > EXPLAIN ANALYZE SELECT r.* FROM raw_annonces r LEFT JOIN annonces a ON (a.id=r.id AND a.id > 1180726) LEFT JOIN archive_data d ON (d.id=r.id AND d.id > 1180726) WHERE a.id IS NULL AND d.id IS NULL AND r.id > 1180726 order by id limit 1; > > Whether this is worth doing in your app depends on how often you do > searches at the end of the ID range ... Quite often actually, so I did the mod. The interesting part is that, yesterday after ANALYZE the query plan was horrible, and today, after adding new data I ANALYZED and retried the slow query, and it was fast again : EXPLAIN ANALYZE SELECT r.* FROM raw_annonces r LEFT JOIN annonces a ON (a.id=r.id) LEFT JOIN archive_data d ON (d.id=r.id) WHERE a.id IS NULL AND d.id IS NULL AND r.id > 1180726 order by id limit 1; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..10.42 rows=1 width=631) (actual time=0.076..0.076 rows=1 loops=1) -> Nested Loop Left Join (cost=0.00..7129.11 rows=684 width=631) (actual time=0.074..0.074 rows=1 loops=1) Filter: ("inner".id IS NULL) -> Nested Loop Left Join (cost=0.00..4608.71 rows=684 width=631) (actual time=0.064..0.064 rows=1 loops=1) Filter: ("inner".id IS NULL) -> Index Scan using raw_annonces_pkey on raw_annonces r (cost=0.00..667.56 rows=684 width=631) (actual time=0.013..0.013 rows=1 loops=1) Index Cond: (id > 1180726) -> Index Scan using annonces_pkey on annonces a (cost=0.00..5.75 rows=1 width=4) (actual time=0.046..0.046 rows=0 loops=1) Index Cond: (a.id = "outer".id) -> Index Scan using archive_data_pkey on archive_data d (cost=0.00..3.67 rows=1 width=4) (actual time=0.008..0.008 rows=0 loops=1) Index Cond: (d.id = "outer".id) Total runtime: 0.197 ms So I did a few tests... CREATE TABLE test.raw (id INTEGER PRIMARY KEY); CREATE TABLE test.active (id INTEGER PRIMARY KEY); CREATE TABLE test.archive (id INTEGER PRIMARY KEY); INSERT INTO test.archive SELECT * FROM generate_series( 1, 1000000 ); INSERT INTO test.active SELECT * FROM generate_series( 1000001, 1100000 ); INSERT INTO test.raw SELECT * FROM generate_series( 1050000, 1101000 ); VACUUM ANALYZE; So we have 1M archived records, 100K active, 51K in the "raw" table of which 1000 are new. Query 1: EXPLAIN ANALYZE SELECT * FROM test.raw AS raw LEFT JOIN test.active AS active ON (active.id=raw.id) LEFT JOIN test.archive AS archive ON (archive.id=raw.id) WHERE raw.id>1100000 AND active.id IS NULL AND archive.id IS NULL LIMIT 1; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..5.29 rows=1 width=12) (actual time=94.478..94.478 rows=1 loops=1) -> Nested Loop Left Join (cost=0.00..5400.09 rows=1021 width=12) (actual time=94.477..94.477 rows=1 loops=1) Filter: ("inner".id IS NULL) -> Merge Left Join (cost=0.00..2310.55 rows=1021 width=8) (actual time=94.458..94.458 rows=1 loops=1) Merge Cond: ("outer".id = "inner".id) Filter: ("inner".id IS NULL) -> Index Scan using raw_pkey on raw (cost=0.00..24.78 rows=1021 width=4) (actual time=0.016..0.016 rows=1 loops=1) Index Cond: (id > 1100000) -> Index Scan using active_pkey on active (cost=0.00..2023.00 rows=100000 width=4) (actual time=0.005..76.572 rows=100000 loops=1) -> Index Scan using archive_pkey on archive (cost=0.00..3.01 rows=1 width=4) (actual time=0.013..0.013 rows=0 loops=1) Index Cond: (archive.id = "outer".id) Total runtime: 94.550 ms Query 2: EXPLAIN ANALYZE SELECT * FROM test.raw AS raw LEFT JOIN test.active AS active ON (active.id=raw.id AND active.id>1100000) LEFT JOIN test.archive AS archive ON (archive.id=raw.id AND archive.id > 1100000) WHERE raw.id>1100000 AND active.id IS NULL AND archive.id IS NULL LIMIT 1; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..0.04 rows=1 width=12) (actual time=0.035..0.035 rows=1 loops=1) -> Merge Left Join (cost=0.00..37.67 rows=1021 width=12) (actual time=0.034..0.034 rows=1 loops=1) Merge Cond: ("outer".id = "inner".id) Filter: ("inner".id IS NULL) -> Merge Left Join (cost=0.00..30.51 rows=1021 width=8) (actual time=0.026..0.026 rows=1 loops=1) Merge Cond: ("outer".id = "inner".id) Filter: ("inner".id IS NULL) -> Index Scan using raw_pkey on raw (cost=0.00..24.78 rows=1021 width=4) (actual time=0.016..0.016 rows=1 loops=1) Index Cond: (id > 1100000) -> Index Scan using active_pkey on active (cost=0.00..3.14 rows=10 width=4) (actual time=0.006..0.006 rows=0 loops=1) Index Cond: (id > 1100000) -> Index Scan using archive_pkey on archive (cost=0.00..4.35 rows=100 width=4) (actual time=0.007..0.007 rows=0 loops=1) Index Cond: (id > 1100000) Total runtime: 0.101 ms OK, you were right ;) Query 3: EXPLAIN ANALYZE SELECT * FROM test.raw AS raw WHERE raw.id > 1100000 AND NOT EXISTS (SELECT 1 FROM test.active AS a WHERE a.id=raw.id) AND NOT EXISTS (SELECT 1 FROM test.archive AS a WHERE a.id=raw.id) LIMIT 1; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..24.23 rows=1 width=4) (actual time=0.036..0.036 rows=1 loops=1) -> Index Scan using raw_pkey on raw (cost=0.00..6178.35 rows=255 width=4) (actual time=0.035..0.035 rows=1 loops=1) Index Cond: (id > 1100000) Filter: ((NOT (subplan)) AND (NOT (subplan))) SubPlan -> Index Scan using archive_pkey on archive a (cost=0.00..3.01 rows=1 width=0) (actual time=0.007..0.007 rows=0 loops=1) Index Cond: (id = $0) -> Index Scan using active_pkey on active a (cost=0.00..3.01 rows=1 width=0) (actual time=0.008..0.008 rows=0 loops=1) Index Cond: (id = $0) Total runtime: 0.086 ms I see a problem with Query 1: The Merge Join goes through tables "raw" and "active" in sorted order. "archive" contains values 1-1000000 "active" contains values 1000001-1100000 "raw" contains values 1050000-1101000 However it starts at the beginning of "active" ; it would be smarter to start the index scan of "active" at the lowest value in "raw", ie. to seek into the right position into the index before beginning to scan it. This is achieved by your advice on manually adding the "id > x" conditions in the query. However, if I want to join the full tables, dropping the id>x condition : EXPLAIN ANALYZE SELECT * FROM test.raw AS raw LEFT JOIN test.active AS active ON (active.id=raw.id) LEFT JOIN test.archive AS archive ON (archive.id=raw.id) WHERE active.id IS NULL AND archive.id IS NULL; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------- Merge Left Join (cost=0.00..27305.04 rows=51001 width=12) (actual time=837.196..838.099 rows=1000 loops=1) Merge Cond: ("outer".id = "inner".id) Filter: ("inner".id IS NULL) -> Merge Left Join (cost=0.00..3943.52 rows=51001 width=8) (actual time=153.495..154.190 rows=1000 loops=1) Merge Cond: ("outer".id = "inner".id) Filter: ("inner".id IS NULL) -> Index Scan using raw_pkey on raw (cost=0.00..1033.01 rows=51001 width=4) (actual time=0.012..23.085 rows=51001 loops=1) -> Index Scan using active_pkey on active (cost=0.00..2023.00 rows=100000 width=4) (actual time=0.004..47.333 rows=100000 loops=1) -> Index Scan using archive_pkey on archive (cost=0.00..20224.00 rows=1000000 width=4) (actual time=0.043..501.953 rows=1000000 loops=1) Total runtime: 838.272 ms This is very slow : the Index Scans on "active" and "archive" have to skip a huge number of rows before getting to the first interesting row. We know that rows in "active" and "archive" will be of no use if their id is < (SELECT min(id) FROM test.raw) which is 1050000. Let's rephrase : EXPLAIN ANALYZE SELECT * FROM test.raw AS raw LEFT JOIN test.active AS active ON (active.id=raw.id AND active.id >= 1050000) LEFT JOIN test.archive AS archive ON (archive.id=raw.id AND archive.id >= 1050000) WHERE active.id IS NULL AND archive.id IS NULL; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Merge Left Join (cost=0.00..2837.93 rows=51001 width=12) (actual time=114.590..115.451 rows=1000 loops=1) Merge Cond: ("outer".id = "inner".id) Filter: ("inner".id IS NULL) -> Merge Left Join (cost=0.00..2705.78 rows=51001 width=8) (actual time=114.576..115.239 rows=1000 loops=1) Merge Cond: ("outer".id = "inner".id) Filter: ("inner".id IS NULL) -> Index Scan using raw_pkey on raw (cost=0.00..1033.01 rows=51001 width=4) (actual time=0.012..51.505 rows=51001 loops=1) -> Index Scan using active_pkey on active (cost=0.00..1158.32 rows=50913 width=4) (actual time=0.009..22.312 rows=50001 loops=1) Index Cond: (id >= 1050000) -> Index Scan using archive_pkey on archive (cost=0.00..4.35 rows=100 width=4) (actual time=0.012..0.012 rows=0 loops=1) Index Cond: (id >= 1050000) Total runtime: 115.601 ms So here's my point : the first operation in the Index Scan in a merge join could be to seek to the right position in the index before scanning it. This value is known : it is the first value yielded by the index scan on "raw". This would remove the need for teaching the planner about transitivity, and also optimize this case where transitivity is useless.
pgsql-performance by date: