Thread: Question about explain of index scan
How does Index scan perform a scan for overlapping Index Cond ? If I get a plan like this, what will actually be performed if EXPLAIN shows this: Sort (cost=12.90..12.91 rows=1 width=207) Sort Key: log_actionseq -> Index Scan using sl_log_1_idx2_hu, sl_log_1_idx2_hu, sl_log_1_idx2_hu, sl_log_1_idx2_hu on sl_log_1 (cost=0.00..12.89 rows=1 width=207) Index Cond: ( ((log_xid < '1349053093') AND (log_xid >= '1349052761')) OR ((log_xid < '1349053093') AND (log_xid >= '1349052761')) OR ((log_xid < '1349053093') AND (log_xid >= '1349052761')) OR ((log_xid < '1349053093') AND (log_xid >= '1349052761')) ) (this is from a query generated by Slony for 4 sets replicated from the same master) Will the same range be scanned 4 times ? Or is the scan method smart enough to collapse them into one pass ? Or does this actually mean 4 conactenated index scans (Index Scan using X, X, X, X on sl_log_1) ? -- Hannu Krosing <hannu@skype.net>
Hannu Krosing <hannu@skype.net> writes: > If I get a plan like this, what will actually be performed if EXPLAIN > shows this: > Sort (cost=12.90..12.91 rows=1 width=207) > Sort Key: log_actionseq > -> Index Scan using sl_log_1_idx2_hu, sl_log_1_idx2_hu, > sl_log_1_idx2_hu, sl_log_1_idx2_hu on sl_log_1 (cost=0.00..12.89 rows=1 > width=207) > Index Cond: ( > ((log_xid < '1349053093') AND (log_xid >= '1349052761')) > OR ((log_xid < '1349053093') AND (log_xid >= '1349052761')) > OR ((log_xid < '1349053093') AND (log_xid >= '1349052761')) > OR ((log_xid < '1349053093') AND (log_xid >= '1349052761')) > ) > Will the same range be scanned 4 times ? Yes. However, I don't understand how you got that result; AFAIK the planner should have eliminated the duplicate subclauses. For example, in 8.0 I get regression=# explain select * from tenk1 where unique1 between 1 and 100 or unique1 between 1 and 100 or unique1 between1 and 100; QUERY PLAN ---------------------------------------------------------------------------------Index Scan using tenk1_unique1 on tenk1 (cost=0.00..360.63 rows=102 width=244) Index Cond: ((unique1 >= 1) AND (unique1 <= 100)) (2 rows) Is Slony doing something to bypass the planner? regards, tom lane
On Fri, Sep 02, 2005 at 10:31:45AM -0400, Tom Lane wrote: > Hannu Krosing <hannu@skype.net> writes: > > If I get a plan like this, what will actually be performed if EXPLAIN > > shows this: > > > Sort (cost=12.90..12.91 rows=1 width=207) > > Sort Key: log_actionseq > > -> Index Scan using sl_log_1_idx2_hu, sl_log_1_idx2_hu, > > sl_log_1_idx2_hu, sl_log_1_idx2_hu on sl_log_1 (cost=0.00..12.89 rows=1 > > width=207) > > Index Cond: ( > > ((log_xid < '1349053093') AND (log_xid >= '1349052761')) > > OR ((log_xid < '1349053093') AND (log_xid >= '1349052761')) > > OR ((log_xid < '1349053093') AND (log_xid >= '1349052761')) > > OR ((log_xid < '1349053093') AND (log_xid >= '1349052761')) > > ) > > > Will the same range be scanned 4 times ? > > Yes. However, I don't understand how you got that result; AFAIK the > planner should have eliminated the duplicate subclauses. Maybe it has to do with the xxid datatype Slony-I adds; maybe it's missing some operator or property. I wonder why we don't support more operators on Xid, so these things are avoided? Right now we only have =, AFAIR. -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com "Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > I wonder why we don't support more operators on Xid, so these things are > avoided? Right now we only have =, AFAIR. I once started to make a btree opclass for XID, and stopped when it occurred to me that XID comparison doesn't obey the transitive law. btree won't like that... regards, tom lane
On R, 2005-09-02 at 10:31 -0400, Tom Lane wrote: > Hannu Krosing <hannu@skype.net> writes: > > If I get a plan like this, what will actually be performed if EXPLAIN > > shows this: > > > Sort (cost=12.90..12.91 rows=1 width=207) > > Sort Key: log_actionseq > > -> Index Scan using sl_log_1_idx2_hu, sl_log_1_idx2_hu, > > sl_log_1_idx2_hu, sl_log_1_idx2_hu on sl_log_1 (cost=0.00..12.89 rows=1 > > width=207) > > Index Cond: ( > > ((log_xid < '1349053093') AND (log_xid >= '1349052761')) > > OR ((log_xid < '1349053093') AND (log_xid >= '1349052761')) > > OR ((log_xid < '1349053093') AND (log_xid >= '1349052761')) > > OR ((log_xid < '1349053093') AND (log_xid >= '1349052761')) > > ) > > > Will the same range be scanned 4 times ? > > Yes. However, I don't understand how you got that result; AFAIK the > planner should have eliminated the duplicate subclauses. For example, > in 8.0 I get This was on 7.4, sorry for forgetting to mention it. I also edited out xid types and filter expression. maybe that filter expression is also something that is shown in a weird way for mulltiple range scans ? the query was similar to this: ----------------------------------------------------------------------------- select log_origin, log_xid, log_tableid, log_actionseq, log_cmdtype, log_cmddata from "_bbb_cluster".sl_log_1 where log_origin = 1 and ( ( log_tableid in (3,9008,9007,9005,9004,2002,2001) and (log_xid < '1312955843' and "_bbb_cluster".xxid_lt_snapshot(log_xid, '1312950014:1312955843:''1312955836'',''1312955783'',''1312955806'',''1312950014'',''1312952044''')) and (log_xid >= '1312942023' and "_bbb_cluster".xxid_ge_snapshot(log_xid, '1312942023:1312947935:''1312947917'',''1312947924'',''1312942023'',''1312946242''')) ) or ( log_tableid in (1002,1003,1013,1041,1037,1028,1026,1023,1031,1012,1048,1050,1046,1021,1019,1024,1027,1029,1025,1035,1011,1009,1010,1016,1032,1018,1030,1138) and (log_xid < '1312955843' and "_bbb_cluster".xxid_lt_snapshot(log_xid, '1312950014:1312955843:''1312955836'',''1312955783'',''1312955806'',''1312950014'',''1312952044''')) and (log_xid >= '1312942023' and "_bbb_cluster".xxid_ge_snapshot(log_xid, '1312942023:1312947935:''1312947917'',''1312947924'',''1312942023'',''1312946242''')) ) or ( log_tableid in (7001,7008,7007,7004,7039,7002,7030,7018,7038,7003,7005,7006,7009,7011,7012,7013,7016,7021, 7022,7025,7026,7027,7028,7029,7031,7033,7034,7035,7036,7037,1075,9009,9011,9012,9013,9014, 9015,9016,9017,1051,1052,1053,1054,1055,1056,1057,1058,1059,1060,1061,1062,1063,1064,1065, 1066,1067,1068,1070,1071,1072,1073,1074,1076,1077,1078) and (log_xid < '1312955843' and "_bbb_cluster".xxid_lt_snapshot(log_xid, '1312950014:1312955843:''1312955836'',''1312955783'',''1312955806'',''1312950014'',''1312952044''')) and (log_xid >= '1312942023' and "_bbb_cluster".xxid_ge_snapshot(log_xid, '1312942023:1312947935:''1312947917'',''1312947924'',''1312942023'',''1312946242''')) ) or ( log_tableid in (7051,7050,7052,7053,7054,7055,7056,7057,7058,7059,7060,7061,7062,7063,7064,7065,7066,7067,7068,7069,7070,7071,7072,7073,7074) and (log_xid < '1312955843' and "_bbb_cluster".xxid_lt_snapshot(log_xid, '1312950014:1312955843:''1312955836'',''1312955783'',''1312955806'',''1312950014'',''1312952044''')) and (log_xid >= '1312942023' and "_bbb_cluster".xxid_ge_snapshot(log_xid, '1312942023:1312947935:''1312947917'',''1312947924'',''1312942023'',''1312946242''')) ) ) order by log_actionseq; ----------------------------------------------------------------------------------- the table used is : CREATE TABLE sl_log_1 ( log_origin integer, log_xid xxid, log_tableid integer, log_actionseq bigint, log_cmdtypecharacter(1), log_cmddata text ); CREATE INDEX sl_log_1_idx1 ON sl_log_1 USING btree (log_origin, log_xid, log_actionseq); ALTER TABLE sl_log_1 CLUSTER ON sl_log_1_idx1; CREATE INDEX sl_log_1_idx2_hu ON sl_log_1 USING btree (log_xid); ----------------------------------------------------------------------------------- to get this plan you need to disable seqscan. without second index you get an indexscan using sl_log_1_idx1 for log_origin (always 1 in my case) and a really heavy filter. -- Hannu Krosing <hannu@skype.net>
On Fri, Sep 02, 2005 at 11:03:24AM -0400, Tom Lane wrote: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > I wonder why we don't support more operators on Xid, so these things are > > avoided? Right now we only have =, AFAIR. > > I once started to make a btree opclass for XID, and stopped when it > occurred to me that XID comparison doesn't obey the transitive law. > btree won't like that... Not having it does affect the planner somehow, right? Maybe we could have the opclass but somehow dictate that making indexes with it is verboten. -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com "Right now the sectors on the hard disk run clockwise, but I heard a rumor that you can squeeze 0.2% more throughput by running them counterclockwise. It's worth the effort. Recommended." (Gerry Pourwelle)
Hannu Krosing <hannu@skype.net> writes: > On R, 2005-09-02 at 10:31 -0400, Tom Lane wrote: >> Yes. However, I don't understand how you got that result; AFAIK the >> planner should have eliminated the duplicate subclauses. > the query was similar to this: > [snip] Oh, the OR arms are actually *not* equivalent because of the log_tableid subclauses. So that explains why canonicalize_qual() didn't eliminate them. [ experiments... ] It looks like 8.0 and up are smart enough to consolidate the identical extracted indexquals, but not 7.4. regards, tom lane
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On Fri, Sep 02, 2005 at 11:03:24AM -0400, Tom Lane wrote: >> I once started to make a btree opclass for XID, and stopped when it >> occurred to me that XID comparison doesn't obey the transitive law. > Not having it does affect the planner somehow, right? > Maybe we could have the opclass but somehow dictate that making indexes > with it is verboten. The reason it affects the planner is that the planner assumes that operators found in a btree opclass obey the normal laws of comparison. Such an opclass would certainly break predtest.c for instance, as it uses the assumption of transitivity directly. (In any case Hannu's problem seems to be unrelated to the datatype, see followups.) regards, tom lane
On R, 2005-09-02 at 11:03 -0400, Tom Lane wrote: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > I wonder why we don't support more operators on Xid, so these things are > > avoided? Right now we only have =, AFAIR. > > I once started to make a btree opclass for XID, and stopped when it > occurred to me that XID comparison doesn't obey the transitive law. > btree won't like that... Does this mean that Slony's usage of btree index on XID gives (occasionally) wrong results ? -- Hannu Krosing <hannu@skype.net>
Hannu Krosing <hannu@skype.net> writes: > On R, 2005-09-02 at 11:03 -0400, Tom Lane wrote: >> I once started to make a btree opclass for XID, and stopped when it >> occurred to me that XID comparison doesn't obey the transitive law. >> btree won't like that... > Does this mean that Slony's usage of btree index on XID gives > (occasionally) wrong results ? I seem to recall some discussion of that in the archives (but can't find it right now). If they do actually make btree indexes on XIDs then they are probably broken. XID comparison works OK as long as you make sure that all the XIDs extant in the system at any one time are within +/- 2 billion of each other, and so transitivity does hold within that subset. The problem with a btree is that upper-level tree nodes are likely to contain page boundary keys copied from data that vanished some time ago from the underlying table. VACUUM-like techniques can guarantee that the underlying table is free of old XIDs before the wraparound horizon is reached, but I don't know how much extra safety margin is needed to guarantee no inconsistencies inside a btree index (if indeed any such guarantee is possible at all). regards, tom lane
On Sun, Sep 04, 2005 at 06:21:51PM -0400, Tom Lane wrote: > XID comparison works OK as long as you make sure that all the XIDs > extant in the system at any one time are within +/- 2 billion of each > other, and so transitivity does hold within that subset. The problem > with a btree is that upper-level tree nodes are likely to contain page > boundary keys copied from data that vanished some time ago from the > underlying table. So there would be no problem if a REINDEX was forced every two billion transactions, right? (A bit less, I think.) -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com "Oh, great altar of passive entertainment, bestow upon me thy discordant images at such speed as to render linear thought impossible" (Calvin a la TV)
On P, 2005-09-04 at 18:21 -0400, Tom Lane wrote: > Hannu Krosing <hannu@skype.net> writes: > > On R, 2005-09-02 at 11:03 -0400, Tom Lane wrote: > >> I once started to make a btree opclass for XID, and stopped when it > >> occurred to me that XID comparison doesn't obey the transitive law. > >> btree won't like that... > > > Does this mean that Slony's usage of btree index on XID gives > > (occasionally) wrong results ? > > I seem to recall some discussion of that in the archives (but can't find > it right now). If they do actually make btree indexes on XIDs then they > are probably broken. > > XID comparison works OK as long as you make sure that all the XIDs > extant in the system at any one time are within +/- 2 billion of each > other, and so transitivity does hold within that subset. The table itself is very dynamic, only a a few hundred thousand rows are live at any time, consumed in FIFO fashion. > The problem > with a btree is that upper-level tree nodes are likely to contain page > boundary keys copied from data that vanished some time ago from the > underlying table. So a reindex / cluster every now and then should keep it usable ? > VACUUM-like techniques can guarantee that the > underlying table is free of old XIDs before the wraparound horizon is > reached, but I don't know how much extra safety margin is needed to > guarantee no inconsistencies inside a btree index (if indeed any such > guarantee is possible at all). I'm not sure how the ordering by xid works in slony though - after a wraparound has happened, it seems very hard to determine the first XID, even though the ordering between any two XID's may be ok. -- Hannu Krosing <hannu@skype.net>
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On Sun, Sep 04, 2005 at 06:21:51PM -0400, Tom Lane wrote: >> XID comparison works OK as long as you make sure that all the XIDs >> extant in the system at any one time are within +/- 2 billion of each >> other, and so transitivity does hold within that subset. The problem >> with a btree is that upper-level tree nodes are likely to contain page >> boundary keys copied from data that vanished some time ago from the >> underlying table. > So there would be no problem if a REINDEX was forced every two billion > transactions, right? (A bit less, I think.) That seems a bit brute-force, but it'd probably work. (IIRC the convention we use for vacuuming is to force some activity every 1 billion transactions, because waiting 2 billion leaves no safety margin.) regards, tom lane