Re: [GENERAL] Why is this functional index not used? - Mailing list pgsql-general

From Rafia Sabih
Subject Re: [GENERAL] Why is this functional index not used?
Date
Msg-id CAOGQiiNkbpoH+kdTMSCEB4TTHe_fG7wEP5ocpC2Jo5iu4+zfSg@mail.gmail.com
Whole thread Raw
In response to [GENERAL] Why is this functional index not used?  (ibeq GmbH <info@ibeq.com>)
List pgsql-general


On Mon, Mar 20, 2017 at 10:58 PM, ibeq GmbH <info@ibeq.com> wrote:

Given a country table and an order table:

 

CREATE TABLE g.country

(

-- inherited from table g.standard:  oid uuid NOT NULL,

-- … some more columns inherited…

  lisocode integer NOT NULL, -- Numeric ISO 3166 code

  szisoalpha2 character varying(2), -- The 2 letter country code

  szisoalpha3 character varying(3), -- The 3 letter country code

  szisonum3 character varying(3), -- The NUMERIC country code with leading zeros

-- a few more columns here

CONSTRAINT country_pkey PRIMARY KEY (oid)

) INHERITS (g.standard)

WITH (

  OIDS=FALSE

);

 

This table is filled with ISO 3166 country codes.

 

The order table:

 

CREATE TABLE t."order"

(

-- inherited from table g.standard:  oid uuid NOT NULL,

-- … some more columns inherited…

  szxmladdress text,

-- many more columns in this table

CONSTRAINT country_pkey PRIMARY KEY (oid)

) INHERITS (g.standard)

WITH (

  OIDS=FALSE

);

 

 

A typical entry in t."order".szxmladdress looks like

 

<address entityType="Person" country="DE">

  <field name="Address.oid">ae0eb84f-9b8b-4fef-b87a-d6757bdfeaf9</field>

  <field name="Address.uaddrtyperef">0bbdb48c-21c7-429e-944e-59a4d9ace9d5</field>

  <field name="Address.szstreet">Hauptstraße</field>

  <!—some more xml tags here -->

</address>

 

No XML field in the order table exceeds 2kB.

 

Getting the 2 letter country code from the xml address by this function:

 

CREATE OR REPLACE FUNCTION g.getxmladdresscountryid(xaddr text)

  RETURNS uuid AS

$BODY$BEGIN

        RETURN (SELECT oID FROM G.Country WHERE szIsoAlpha2 = array_to_string(xpath('/address/@country', xAddr::xml), '') ORDER BY lIsoCode LIMIT 1);

END$BODY$

  LANGUAGE plpgsql IMMUTABLE;

 

I know that this function is not really IMMUTABLE but the country table is changing only every several years.

 

Created a functional index on the order table:

 

CREATE INDEX order_getxmladdresscountryid_fidx

  ON t."order"

  USING btree

  (g.getxmladdresscountryid(szxmladdress));

 

Joining order and country table limiting to 10 rows uses the index:

 

explain analyse

SELECT

GCountry.szISOAlpha2,

GCountry.szISOAlpha3,

GCountry.szISONum3

from

t.order torder

left join G.Country GCountry ON G.GetXmlAddressCountryID(TOrder.szXmlAddress) = GCountry.oID

limit 10

 

Gives:

 

Limit  (cost=0.56..8.45 rows=10 width=11) (actual time=0.644..4.764 rows=10 loops=1)

  ->  Merge Right Join  (cost=0.56..10670.45 rows=13517 width=11) (actual time=0.642..4.754 rows=10 loops=1)

        Merge Cond: (gcountry.oid = g.getxmladdresscountryid(torder.szxmladdress))

        ->  Index Scan using country_pkey on country gcountry  (cost=0.27..38.05 rows=258 width=27) (actual time=0.025..0.067 rows=32 loops=1)

        ->  Index Scan using order_getxmladdresscountryid_fidx on "order" torder  (cost=0.29..7019.04 rows=13517 width=366) (actual time=0.020..0.058 rows=10 loops=1)

Planning time: 0.603 ms

Execution time: 4.898 ms

 

But when I remove the “limit 10”, the index is no longer used:

 

Hash Left Join  (cost=17.81..5397.46 rows=13517 width=11) (actual time=0.941..4721.372 rows=13510 loops=1)

  Hash Cond: (g.getxmladdresscountryid(torder.szxmladdress) = gcountry.oid)

  ->  Seq Scan on "order" torder  (cost=0.00..3504.17 rows=13517 width=366) (actual time=0.011..27.542 rows=13510 loops=1)

  ->  Hash  (cost=14.58..14.58 rows=258 width=27) (actual time=0.427..0.427 rows=258 loops=1)

        Buckets: 1024  Batches: 1  Memory Usage: 23kB

        ->  Seq Scan on country gcountry  (cost=0.00..14.58 rows=258 width=27) (actual time=0.008..0.226 rows=258 loops=1)

Planning time: 0.580 ms

Execution time: 4728.602 ms

 

Event a “limit 100”  does not use the index any more.

 

However, the result of 4728 ms is almost a linear increase from 10 rows to 13500 rows.

 

I played around with COST of the function between 1 and 20000 and with several options on postgresql.conf without luck.

 

A sequential scan on the order table alone is carried out in 15 ms.

 

Thanks for any idea.

 

Klaus

 

Notice the number of output rows in both the cases, when number of output rows is low enough usage of index is beneficial otherwise it'll be costly to have that many random accesses. 
Anyhow, if you want to check the performance of an indexed plan for such a query, try lowering down the value of random_page_cost.

--
Regards,
Rafia Sabih

pgsql-general by date:

Previous
From: Rafia Sabih
Date:
Subject: Re: [GENERAL] Why so long?
Next
From: Vick Khera
Date:
Subject: Re: [GENERAL] Large data and slow queries