Thread: Seq Scan because of stats or because of cast?

Seq Scan because of stats or because of cast?

From
Dominique Devienne
Date:
I'm interested in the members of specific roles, providing the roles of interest to the query via an array of integers (binary bind in code, not textual array literal like I had to use to have the EXPLAIN work, see below).

I figured that query would use the "pg_auth_members_role_member_index" index,
but instead it's using a sequential scan.

And I'm wondering is this is because the cardinality of that catalog is small (172),
which is just an artifact of my dev-testing, or whether that's because I cast roleid
to an int4, preventing the use of the index?

In production, the cardinality will be much greator, which is why I worry a bit.
Also, I don't really need the grantor and admin_option columns for now, thus
it could even be an index-only scan, IF the index was used by the plan.

I tried changing the cast around, or allowing an index-only scan,
but it's still a Seq Scan on the table (see below).

Is there a way to know why the index is not used, in any of my attempts?

I currently does not support (binary) binding Oids in my case, thus the ::int4 casts.
Would supporting binding actual Oid arrays instead of Int4 arrays help in this case?

I'd appreciate some insights here. Thanks, --DD

PS: libpq uses unsigned int for Oid. So do OIDs go above 2GB in practice?
    I'm asking, since I'm casting to ::int4, thus if they do, then that case might overflow.

PPS: Are OIDs recycled / reused? Or are they monotonically increasing?
  What happens when the Cluster runs out of OIDs?
  Are they Cluster-wide unique or it depends on the OID type?

dd_pns2=> explain SELECT roleid::int4, member::int4, grantor::int4, admin_option
dd_pns2->   FROM pg_auth_members
dd_pns2->  WHERE roleid::int4 = ANY($1);
ERROR:  there is no parameter $1
LINE 3:  WHERE roleid::int4 = ANY($1);
                                  ^
dd_pns2=> explain SELECT roleid::int4, member::int4, grantor::int4, admin_option
dd_pns2->   FROM pg_auth_members
dd_pns2->  WHERE roleid::int4 = ANY(array[1,2,3]);
                           QUERY PLAN
----------------------------------------------------------------
 Seq Scan on pg_auth_members  (cost=0.00..5.33 rows=3 width=13)
   Filter: ((roleid)::integer = ANY ('{1,2,3}'::integer[]))
(2 rows)

dd_pns2=> explain SELECT roleid::int4, member::int4, grantor::int4, admin_option
dd_pns2->   FROM pg_auth_members
dd_pns2->  WHERE roleid = ANY(array[1::oid,2::oid,3::oid]);
                           QUERY PLAN
----------------------------------------------------------------
 Seq Scan on pg_auth_members  (cost=0.00..5.33 rows=3 width=13)
   Filter: (roleid = ANY ('{1,2,3}'::oid[]))
(2 rows)

dd_pns2=> explain SELECT roleid::int4, member::int4
dd_pns2->   FROM pg_auth_members
dd_pns2->  WHERE roleid = ANY(array[1::oid,2::oid,3::oid]);
                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on pg_auth_members  (cost=0.00..5.33 rows=3 width=8)
   Filter: (roleid = ANY ('{1,2,3}'::oid[]))
(2 rows)

dd_pns2=> \d pg_auth_members
           Table "pg_catalog.pg_auth_members"
    Column    |  Type   | Collation | Nullable | Default
--------------+---------+-----------+----------+---------
 roleid       | oid     |           | not null |
 member       | oid     |           | not null |
 grantor      | oid     |           | not null |
 admin_option | boolean |           | not null |
Indexes:
    "pg_auth_members_member_role_index" UNIQUE, btree (member, roleid), tablespace "pg_global"
    "pg_auth_members_role_member_index" UNIQUE, btree (roleid, member), tablespace "pg_global"
Tablespace: "pg_global"


dd_pns2=> select count(*) from pg_auth_members;
 count
-------
   172
(1 row)

Re: Seq Scan because of stats or because of cast?

From
Tom Lane
Date:
Dominique Devienne <ddevienne@gmail.com> writes:
> I figured that query would use the "pg_auth_members_role_member_index"
> index,
> but instead it's using a sequential scan.
> And I'm wondering is this is because the cardinality of that catalog is
> small (172),
> which is just an artifact of my dev-testing, or whether that's because I
> cast roleid
> to an int4, preventing the use of the index?

Both.

> Is there a way to know why the index is not used, in any of my attempts?

For testing purposes, you could set enable_seqscan = off and then see
whether the plan changes.  When I try this example I get

regression=# explain select * from pg_auth_members WHERE roleid = ANY(array[1::oid,2::oid,3::oid]);
                           QUERY PLAN
----------------------------------------------------------------
 Seq Scan on pg_auth_members  (cost=0.00..1.04 rows=3 width=19)
   Filter: (roleid = ANY ('{1,2,3}'::oid[]))
(2 rows)

regression=# set enable_seqscan to 0;
SET
regression=# explain select * from pg_auth_members WHERE roleid = ANY(array[1::oid,2::oid,3::oid]);
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Index Scan using pg_auth_members_role_member_index on pg_auth_members  (cost=0.13..12.44 rows=3 width=19)
   Index Cond: (roleid = ANY ('{1,2,3}'::oid[]))
(2 rows)

So it's clearly not going to use the index until pg_auth_members gets
a great deal larger than it is on my test installation --- but it's
capable of doing so once it looks cost-attractive.

> PS: libpq uses unsigned int for Oid. So do OIDs go above 2GB in practice?

Yes, eventually.

>     I'm asking, since I'm casting to ::int4, thus if they do, then that
> case might overflow.

I'd use int8.

> PPS: Are OIDs recycled / reused? Or are they monotonically increasing?
>   What happens when the Cluster runs out of OIDs?

The counter wraps around.  But it does so at 2^32 not 2^31.

>   Are they Cluster-wide unique or it depends on the OID type?

They're unique per catalog.  We don't attempt to guarantee
more than that.  In practice, they'll be unique across the
installation until after the first OID wraparound, and then
not so much.

            regards, tom lane



Re: Seq Scan because of stats or because of cast?

From
Dominique Devienne
Date:
On Mon, Mar 13, 2023 at 2:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
regards, tom lane

Thank you very much Tom. Very informative. 

Re: Seq Scan because of stats or because of cast?

From
Dominique Devienne
Date:
On Mon, Mar 13, 2023 at 2:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> PS: libpq uses unsigned int for Oid. So do OIDs go above 2GB in practice?

Yes, eventually.

OK, I've added support for native OIDs values (i.e. unsigned int, sizeof(4)) in my libpq wrapper.
Tested with binary binding and getting of scalar and array values.

But to truly test this is working OK, I'd need OIDs in the range [2^31, 2^32),
while the OIDs in my DB only reach in the 200M range.

So, any way to force the DB to create OIDs in that special range?
Without hosing my DB / Cluster that is... This is not a throw-away DB / Cluster.

Thanks, --DD