Thread: Question about explain of index scan

Question about explain of index scan

From

Hannu Krosing

Date:

02 September 2005, 03:36:30

How does Index scan perform a scan for overlapping Index Cond ?

If I get a plan like this, what will actually be performed if EXPLAIN
shows this:
Sort  (cost=12.90..12.91 rows=1 width=207)  Sort Key: log_actionseq  ->  Index Scan using sl_log_1_idx2_hu,
sl_log_1_idx2_hu,
sl_log_1_idx2_hu, sl_log_1_idx2_hu on sl_log_1  (cost=0.00..12.89 rows=1
width=207)        Index Cond: (  ((log_xid < '1349053093') AND (log_xid >= '1349052761')) 
OR ((log_xid < '1349053093') AND (log_xid >= '1349052761')) 
OR ((log_xid < '1349053093') AND (log_xid >= '1349052761')) 
OR ((log_xid < '1349053093') AND (log_xid >= '1349052761'))                    )

(this is from a query generated by Slony for 4 sets replicated from the
same master)

Will the same range be scanned 4 times ?

Or is the scan method smart enough to collapse them into one pass ?

Or does this actually mean 4 conactenated index scans (Index Scan using
X, X, X, X on sl_log_1) ?

-- 
Hannu Krosing <hannu@skype.net>

Re: Question about explain of index scan

From

Tom Lane

Date:

02 September 2005, 11:34:22

Hannu Krosing <hannu@skype.net> writes:
> If I get a plan like this, what will actually be performed if EXPLAIN
> shows this:

>  Sort  (cost=12.90..12.91 rows=1 width=207)
>    Sort Key: log_actionseq
>    ->  Index Scan using sl_log_1_idx2_hu, sl_log_1_idx2_hu,
> sl_log_1_idx2_hu, sl_log_1_idx2_hu on sl_log_1  (cost=0.00..12.89 rows=1
> width=207)
>          Index Cond: (
>    ((log_xid < '1349053093') AND (log_xid >= '1349052761')) 
> OR ((log_xid < '1349053093') AND (log_xid >= '1349052761')) 
> OR ((log_xid < '1349053093') AND (log_xid >= '1349052761')) 
> OR ((log_xid < '1349053093') AND (log_xid >= '1349052761'))
>                      )

> Will the same range be scanned 4 times ?

Yes.  However, I don't understand how you got that result; AFAIK the
planner should have eliminated the duplicate subclauses.  For example,
in 8.0 I get

regression=# explain select * from tenk1 where unique1 between 1 and 100 or unique1 between 1 and 100 or unique1
between1 and 100;                                  QUERY PLAN
 
---------------------------------------------------------------------------------Index Scan using tenk1_unique1 on
tenk1 (cost=0.00..360.63 rows=102 width=244)  Index Cond: ((unique1 >= 1) AND (unique1 <= 100))
 
(2 rows)

Is Slony doing something to bypass the planner?
        regards, tom lane

Re: Question about explain of index scan

From

Alvaro Herrera

Date:

02 September 2005, 11:58:43

On Fri, Sep 02, 2005 at 10:31:45AM -0400, Tom Lane wrote:
> Hannu Krosing <hannu@skype.net> writes:
> > If I get a plan like this, what will actually be performed if EXPLAIN
> > shows this:
> 
> >  Sort  (cost=12.90..12.91 rows=1 width=207)
> >    Sort Key: log_actionseq
> >    ->  Index Scan using sl_log_1_idx2_hu, sl_log_1_idx2_hu,
> > sl_log_1_idx2_hu, sl_log_1_idx2_hu on sl_log_1  (cost=0.00..12.89 rows=1
> > width=207)
> >          Index Cond: (
> >    ((log_xid < '1349053093') AND (log_xid >= '1349052761')) 
> > OR ((log_xid < '1349053093') AND (log_xid >= '1349052761')) 
> > OR ((log_xid < '1349053093') AND (log_xid >= '1349052761')) 
> > OR ((log_xid < '1349053093') AND (log_xid >= '1349052761'))
> >                      )
> 
> > Will the same range be scanned 4 times ?
> 
> Yes.  However, I don't understand how you got that result; AFAIK the
> planner should have eliminated the duplicate subclauses.

Maybe it has to do with the xxid datatype Slony-I adds; maybe it's
missing some operator or property.

I wonder why we don't support more operators on Xid, so these things are
avoided?  Right now we only have =, AFAIR.

-- 
Alvaro Herrera -- Valdivia, Chile         Architect, www.EnterpriseDB.com
"Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)

Re: Question about explain of index scan

From

Tom Lane

Date:

02 September 2005, 12:15:58

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> I wonder why we don't support more operators on Xid, so these things are
> avoided?  Right now we only have =, AFAIR.

I once started to make a btree opclass for XID, and stopped when it
occurred to me that XID comparison doesn't obey the transitive law.
btree won't like that...
        regards, tom lane

Re: Question about explain of index scan

From

Hannu Krosing

Date:

02 September 2005, 12:36:02

On R, 2005-09-02 at 10:31 -0400, Tom Lane wrote:
> Hannu Krosing <hannu@skype.net> writes:
> > If I get a plan like this, what will actually be performed if EXPLAIN
> > shows this:
> 
> >  Sort  (cost=12.90..12.91 rows=1 width=207)
> >    Sort Key: log_actionseq
> >    ->  Index Scan using sl_log_1_idx2_hu, sl_log_1_idx2_hu,
> > sl_log_1_idx2_hu, sl_log_1_idx2_hu on sl_log_1  (cost=0.00..12.89 rows=1
> > width=207)
> >          Index Cond: (
> >    ((log_xid < '1349053093') AND (log_xid >= '1349052761')) 
> > OR ((log_xid < '1349053093') AND (log_xid >= '1349052761')) 
> > OR ((log_xid < '1349053093') AND (log_xid >= '1349052761')) 
> > OR ((log_xid < '1349053093') AND (log_xid >= '1349052761'))
> >                      )
> 
> > Will the same range be scanned 4 times ?
> 
> Yes.  However, I don't understand how you got that result; AFAIK the
> planner should have eliminated the duplicate subclauses.  For example,
> in 8.0 I get

This was on 7.4, sorry for forgetting to mention it. I also edited out
xid types and filter expression. maybe that filter expression is also
something that is shown in a weird way for mulltiple range scans ?

the query was similar to this:

-----------------------------------------------------------------------------

select     log_origin, log_xid, log_tableid,     log_actionseq,
log_cmdtype, log_cmddata  from "_bbb_cluster".sl_log_1 where log_origin = 1 
and (  
( log_tableid in (3,9008,9007,9005,9004,2002,2001) 
and (log_xid < '1312955843'     and "_bbb_cluster".xxid_lt_snapshot(log_xid,
'1312950014:1312955843:''1312955836'',''1312955783'',''1312955806'',''1312950014'',''1312952044'''))

and (log_xid >= '1312942023'     and "_bbb_cluster".xxid_ge_snapshot(log_xid,
'1312942023:1312947935:''1312947917'',''1312947924'',''1312942023'',''1312946242'''))
) 
or 
( log_tableid in

(1002,1003,1013,1041,1037,1028,1026,1023,1031,1012,1048,1050,1046,1021,1019,1024,1027,1029,1025,1035,1011,1009,1010,1016,1032,1018,1030,1138)


and (log_xid < '1312955843'     and "_bbb_cluster".xxid_lt_snapshot(log_xid,
'1312950014:1312955843:''1312955836'',''1312955783'',''1312955806'',''1312950014'',''1312952044''')) 

and (log_xid >= '1312942023'     and "_bbb_cluster".xxid_ge_snapshot(log_xid,
'1312942023:1312947935:''1312947917'',''1312947924'',''1312942023'',''1312946242''')) 
)
or 
( log_tableid in
(7001,7008,7007,7004,7039,7002,7030,7018,7038,7003,7005,7006,7009,7011,7012,7013,7016,7021,
7022,7025,7026,7027,7028,7029,7031,7033,7034,7035,7036,7037,1075,9009,9011,9012,9013,9014,
9015,9016,9017,1051,1052,1053,1054,1055,1056,1057,1058,1059,1060,1061,1062,1063,1064,1065,
1066,1067,1068,1070,1071,1072,1073,1074,1076,1077,1078) 
and (log_xid < '1312955843'     and "_bbb_cluster".xxid_lt_snapshot(log_xid,
'1312950014:1312955843:''1312955836'',''1312955783'',''1312955806'',''1312950014'',''1312952044''')) 
and (log_xid >= '1312942023'     and "_bbb_cluster".xxid_ge_snapshot(log_xid,
'1312942023:1312947935:''1312947917'',''1312947924'',''1312942023'',''1312946242''')) 
)
or 
( log_tableid in

(7051,7050,7052,7053,7054,7055,7056,7057,7058,7059,7060,7061,7062,7063,7064,7065,7066,7067,7068,7069,7070,7071,7072,7073,7074)

and (log_xid < '1312955843'     and "_bbb_cluster".xxid_lt_snapshot(log_xid,
'1312950014:1312955843:''1312955836'',''1312955783'',''1312955806'',''1312950014'',''1312952044''')) 
and (log_xid >= '1312942023'     and "_bbb_cluster".xxid_ge_snapshot(log_xid,
'1312942023:1312947935:''1312947917'',''1312947924'',''1312942023'',''1312946242''')) ) 
) 
order by log_actionseq;

-----------------------------------------------------------------------------------

the table used is :

CREATE TABLE sl_log_1 (   log_origin integer,   log_xid xxid,   log_tableid integer,   log_actionseq bigint,
log_cmdtypecharacter(1),   log_cmddata text
 
);

CREATE INDEX sl_log_1_idx1 ON sl_log_1 USING btree (log_origin, log_xid,
log_actionseq);

ALTER TABLE sl_log_1 CLUSTER ON sl_log_1_idx1;

CREATE INDEX sl_log_1_idx2_hu ON sl_log_1 USING btree (log_xid);

-----------------------------------------------------------------------------------

to get this plan you need to disable seqscan.

without second index you get an indexscan using sl_log_1_idx1 for
log_origin (always 1 in my case) and a really heavy filter.

-- 
Hannu Krosing <hannu@skype.net>

Re: Question about explain of index scan

From

Alvaro Herrera

Date:

02 September 2005, 12:39:17

On Fri, Sep 02, 2005 at 11:03:24AM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > I wonder why we don't support more operators on Xid, so these things are
> > avoided?  Right now we only have =, AFAIR.
> 
> I once started to make a btree opclass for XID, and stopped when it
> occurred to me that XID comparison doesn't obey the transitive law.
> btree won't like that...

Not having it does affect the planner somehow, right?

Maybe we could have the opclass but somehow dictate that making indexes
with it is verboten.

-- 
Alvaro Herrera -- Valdivia, Chile         Architect, www.EnterpriseDB.com
"Right now the sectors on the hard disk run clockwise, but I heard a rumor that
you can squeeze 0.2% more throughput by running them counterclockwise.
It's worth the effort. Recommended."  (Gerry Pourwelle)

Re: Question about explain of index scan

From

Tom Lane

Date:

02 September 2005, 12:47:01

Hannu Krosing <hannu@skype.net> writes:
> On R, 2005-09-02 at 10:31 -0400, Tom Lane wrote:
>> Yes.  However, I don't understand how you got that result; AFAIK the
>> planner should have eliminated the duplicate subclauses.

> the query was similar to this:
> [snip]

Oh, the OR arms are actually *not* equivalent because of the log_tableid
subclauses.  So that explains why canonicalize_qual() didn't eliminate
them.

[ experiments... ]  It looks like 8.0 and up are smart enough to
consolidate the identical extracted indexquals, but not 7.4.
        regards, tom lane

Re: Question about explain of index scan

From

Tom Lane

Date:

02 September 2005, 12:52:53

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On Fri, Sep 02, 2005 at 11:03:24AM -0400, Tom Lane wrote:
>> I once started to make a btree opclass for XID, and stopped when it
>> occurred to me that XID comparison doesn't obey the transitive law.

> Not having it does affect the planner somehow, right?
> Maybe we could have the opclass but somehow dictate that making indexes
> with it is verboten.

The reason it affects the planner is that the planner assumes that
operators found in a btree opclass obey the normal laws of comparison.
Such an opclass would certainly break predtest.c for instance, as it
uses the assumption of transitivity directly.

(In any case Hannu's problem seems to be unrelated to the datatype,
see followups.)
        regards, tom lane

Re: Question about explain of index scan

From

Hannu Krosing

Date:

04 September 2005, 16:44:24

On R, 2005-09-02 at 11:03 -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > I wonder why we don't support more operators on Xid, so these things are
> > avoided?  Right now we only have =, AFAIR.
> 
> I once started to make a btree opclass for XID, and stopped when it
> occurred to me that XID comparison doesn't obey the transitive law.
> btree won't like that...

Does this mean that Slony's usage of btree index on XID gives
(occasionally) wrong results ?

-- 
Hannu Krosing <hannu@skype.net>

Re: Question about explain of index scan

From

Tom Lane

Date:

04 September 2005, 19:24:04

Hannu Krosing <hannu@skype.net> writes:
> On R, 2005-09-02 at 11:03 -0400, Tom Lane wrote:
>> I once started to make a btree opclass for XID, and stopped when it
>> occurred to me that XID comparison doesn't obey the transitive law.
>> btree won't like that...

> Does this mean that Slony's usage of btree index on XID gives
> (occasionally) wrong results ?

I seem to recall some discussion of that in the archives (but can't find
it right now).  If they do actually make btree indexes on XIDs then they
are probably broken.

XID comparison works OK as long as you make sure that all the XIDs
extant in the system at any one time are within +/- 2 billion of each
other, and so transitivity does hold within that subset.  The problem
with a btree is that upper-level tree nodes are likely to contain page
boundary keys copied from data that vanished some time ago from the
underlying table.  VACUUM-like techniques can guarantee that the
underlying table is free of old XIDs before the wraparound horizon is
reached, but I don't know how much extra safety margin is needed to
guarantee no inconsistencies inside a btree index (if indeed any such
guarantee is possible at all).
        regards, tom lane

Re: Question about explain of index scan

From

Alvaro Herrera

Date:

04 September 2005, 19:36:37

On Sun, Sep 04, 2005 at 06:21:51PM -0400, Tom Lane wrote:

> XID comparison works OK as long as you make sure that all the XIDs
> extant in the system at any one time are within +/- 2 billion of each
> other, and so transitivity does hold within that subset.  The problem
> with a btree is that upper-level tree nodes are likely to contain page
> boundary keys copied from data that vanished some time ago from the
> underlying table.

So there would be no problem if a REINDEX was forced every two billion
transactions, right?  (A bit less, I think.)

-- 
Alvaro Herrera -- Valdivia, Chile         Architect, www.EnterpriseDB.com
"Oh, great altar of passive entertainment, bestow upon me thy discordant images
at such speed as to render linear thought impossible" (Calvin a la TV)

Re: Question about explain of index scan

From

Hannu Krosing

Date:

04 September 2005, 20:11:54

On P, 2005-09-04 at 18:21 -0400, Tom Lane wrote:
> Hannu Krosing <hannu@skype.net> writes:
> > On R, 2005-09-02 at 11:03 -0400, Tom Lane wrote:
> >> I once started to make a btree opclass for XID, and stopped when it
> >> occurred to me that XID comparison doesn't obey the transitive law.
> >> btree won't like that...
> 
> > Does this mean that Slony's usage of btree index on XID gives
> > (occasionally) wrong results ?
> 
> I seem to recall some discussion of that in the archives (but can't find
> it right now).  If they do actually make btree indexes on XIDs then they
> are probably broken.
> 
> XID comparison works OK as long as you make sure that all the XIDs
> extant in the system at any one time are within +/- 2 billion of each
> other, and so transitivity does hold within that subset. 

The table itself is very dynamic, only a a few hundred thousand rows are
live at any time, consumed in FIFO fashion.

> The problem
> with a btree is that upper-level tree nodes are likely to contain page
> boundary keys copied from data that vanished some time ago from the
> underlying table.

So a reindex / cluster every now and then should keep it usable ?

> VACUUM-like techniques can guarantee that the
> underlying table is free of old XIDs before the wraparound horizon is
> reached, but I don't know how much extra safety margin is needed to
> guarantee no inconsistencies inside a btree index (if indeed any such
> guarantee is possible at all).

I'm not sure how the ordering by xid works in slony though - after a
wraparound has happened, it seems very hard to determine the first XID,
even though the ordering between any two XID's may be ok.

-- 
Hannu Krosing <hannu@skype.net>

Re: Question about explain of index scan

From

Tom Lane

Date:

04 September 2005, 20:29:21

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On Sun, Sep 04, 2005 at 06:21:51PM -0400, Tom Lane wrote:
>> XID comparison works OK as long as you make sure that all the XIDs
>> extant in the system at any one time are within +/- 2 billion of each
>> other, and so transitivity does hold within that subset.  The problem
>> with a btree is that upper-level tree nodes are likely to contain page
>> boundary keys copied from data that vanished some time ago from the
>> underlying table.

> So there would be no problem if a REINDEX was forced every two billion
> transactions, right?  (A bit less, I think.)

That seems a bit brute-force, but it'd probably work.  (IIRC the
convention we use for vacuuming is to force some activity every 1
billion transactions, because waiting 2 billion leaves no safety
margin.)
        regards, tom lane