Thread: partitioned table and ORDER BY indexed_field DESC LIMIT 1
Hi. I just created partitioned table, n_traf, sliced by month (n_traf_y2007m01, n_traf_y2007m02... and so on, see below). They are indexed by 'date_time' column. Then I populate it (last value have date 2007-08-...) and do VACUUM ANALYZE ON n_traf_y2007... all of it. Now I try to select latest value (ORDER BY date_time LIMIT 1), but Postgres produced the ugly plan: =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1; QUERY PLAN --------------------------------------------------------------------------------------------------------- Limit (cost=824637.69..824637.69 rows=1 width=32) -> Sort (cost=824637.69..838746.44 rows=5643499 width=32) Sort Key: public.n_traf.date_time -> Result (cost=0.00..100877.99 rows=5643499 width=32) -> Append (cost=0.00..100877.99 rows=5643499 width=32) -> Seq Scan on n_traf (cost=0.00..22.30 rows=1230 width=32) -> Seq Scan on n_traf_y2007m01 n_traf (cost=0.00..22.30 rows=1230 width=32) -> Seq Scan on n_traf_y2007m02 n_traf (cost=0.00..22.30 rows=1230 width=32) -> Seq Scan on n_traf_y2007m03 n_traf (cost=0.00..22.30 rows=1230 width=32) -> Seq Scan on n_traf_y2007m04 n_traf (cost=0.00..1.01 rows=1 width=32) -> Seq Scan on n_traf_y2007m05 n_traf (cost=0.00..9110.89 rows=509689 width=32) -> Seq Scan on n_traf_y2007m06 n_traf (cost=0.00..32003.89 rows=1790489 width=32) -> Seq Scan on n_traf_y2007m07 n_traf (cost=0.00..33881.10 rows=1895510 width=32) -> Seq Scan on n_traf_y2007m08 n_traf (cost=0.00..25702.70 rows=1437970 width=32) -> Seq Scan on n_traf_y2007m09 n_traf (cost=0.00..22.30 rows=1230 width=32) -> Seq Scan on n_traf_y2007m10 n_traf (cost=0.00..22.30 rows=1230 width=32) -> Seq Scan on n_traf_y2007m11 n_traf (cost=0.00..22.30 rows=1230 width=32) -> Seq Scan on n_traf_y2007m12 n_traf (cost=0.00..22.30 rows=1230 width=32) (18 rows) Why it no uses indexes at all? ------------------------------------------- The simplier query goes fast, use index. =# explain analyze SELECT * FROM n_traf_y2007m08 ORDER BY date_time DESC LIMIT 1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=0.00..0.03 rows=1 width=32) (actual time=0.156..0.158 rows=1 loops=1) -> Index Scan Backward using n_traf_y2007m08_date_time_login_id on n_traf_y2007m08 (cost=0.00..39489.48 rows=1437970 width=32) (actual time=0.150..0.150 rows=1 loops=1) Total runtime: 0.241 ms (3 rows) Table n_traf looks like this: =# \d n_traf Table "public.n_traf" Column | Type | Modifiers -------------+-----------------------------+-------------------- login_id | integer | not null traftype_id | integer | not null date_time | timestamp without time zone | not null bytes_in | bigint | not null default 0 bytes_out | bigint | not null default 0 Indexes: "n_traf_login_id_key" UNIQUE, btree (login_id, traftype_id, date_time) "n_traf_date_time_login_id" btree (date_time, login_id) Foreign-key constraints: "n_traf_login_id_fkey" FOREIGN KEY (login_id) REFERENCES n_logins(login_id) ON UPDATE CASCADE ON DELETE CASCADE "n_traf_traftype_id_fkey" FOREIGN KEY (traftype_id) REFERENCES n_traftypes(traftype_id) ON UPDATE CASCADE Rules: n_traf_insert_y2007m01 AS ON INSERT TO n_traf WHERE new.date_time >= '2007-01-01'::date AND new.date_time < '2007-02-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT INTO n_traf_y2007m01 (login_id, traftype_id, date_time, bytes_in, bytes_out) VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, new.bytes_out) n_traf_insert_y2007m02 AS ON INSERT TO n_traf WHERE new.date_time >= '2007-02-01'::date AND new.date_time < '2007-03-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT INTO n_traf_y2007m02 (login_id, traftype_id, date_time, bytes_in, bytes_out) VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, new.bytes_out) n_traf_insert_y2007m03 AS ON INSERT TO n_traf WHERE new.date_time >= '2007-03-01'::date AND new.date_time < '2007-04-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT INTO n_traf_y2007m03 (login_id, traftype_id, date_time, bytes_in, bytes_out) VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, new.bytes_out) n_traf_insert_y2007m04 AS ON INSERT TO n_traf WHERE new.date_time >= '2007-04-01'::date AND new.date_time < '2007-05-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT INTO n_traf_y2007m04 (login_id, traftype_id, date_time, bytes_in, bytes_out) VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, new.bytes_out) n_traf_insert_y2007m05 AS ON INSERT TO n_traf WHERE new.date_time >= '2007-05-01'::date AND new.date_time < '2007-06-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT INTO n_traf_y2007m05 (login_id, traftype_id, date_time, bytes_in, bytes_out) VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, new.bytes_out) n_traf_insert_y2007m06 AS ON INSERT TO n_traf WHERE new.date_time >= '2007-06-01'::date AND new.date_time < '2007-07-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT INTO n_traf_y2007m06 (login_id, traftype_id, date_time, bytes_in, bytes_out) VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, new.bytes_out) n_traf_insert_y2007m07 AS ON INSERT TO n_traf WHERE new.date_time >= '2007-07-01'::date AND new.date_time < '2007-08-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT INTO n_traf_y2007m07 (login_id, traftype_id, date_time, bytes_in, bytes_out) VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, new.bytes_out) n_traf_insert_y2007m08 AS ON INSERT TO n_traf WHERE new.date_time >= '2007-08-01'::date AND new.date_time < '2007-09-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT INTO n_traf_y2007m08 (login_id, traftype_id, date_time, bytes_in, bytes_out) VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, new.bytes_out) n_traf_insert_y2007m09 AS ON INSERT TO n_traf WHERE new.date_time >= '2007-09-01'::date AND new.date_time < '2007-10-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT INTO n_traf_y2007m09 (login_id, traftype_id, date_time, bytes_in, bytes_out) VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, new.bytes_out) n_traf_insert_y2007m10 AS ON INSERT TO n_traf WHERE new.date_time >= '2007-10-01'::date AND new.date_time < '2007-11-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT INTO n_traf_y2007m10 (login_id, traftype_id, date_time, bytes_in, bytes_out) VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, new.bytes_out) n_traf_insert_y2007m11 AS ON INSERT TO n_traf WHERE new.date_time >= '2007-11-01'::date AND new.date_time < '2007-12-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT INTO n_traf_y2007m11 (login_id, traftype_id, date_time, bytes_in, bytes_out) VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, new.bytes_out) n_traf_insert_y2007m12 AS ON INSERT TO n_traf WHERE new.date_time >= '2007-12-01'::date AND new.date_time < '2008-01-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT INTO n_traf_y2007m12 (login_id, traftype_id, date_time, bytes_in, bytes_out) VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, new.bytes_out) Tables n_traf_y2007m... looks like these Table "public.n_traf_y2007m01" Column | Type | Modifiers -------------+-----------------------------+-------------------- login_id | integer | not null traftype_id | integer | not null date_time | timestamp without time zone | not null bytes_in | bigint | not null default 0 bytes_out | bigint | not null default 0 Indexes: "n_traf_y2007m01_date_time_login_id" btree (date_time, login_id) Check constraints: "n_traf_y2007m01_date_time_check" CHECK (date_time >= '2007-01-01'::date AND date_time < '2007-02-01 00:00:00'::timestamp without time zone) Inherits: n_traf Index "public.n_traf_y2007m01_date_time_login_id" Column | Type -----------+----------------------------- date_time | timestamp without time zone login_id | integer btree, for table "public.n_traf_y2007m01" Table "public.n_traf_y2007m02" Column | Type | Modifiers -------------+-----------------------------+-------------------- login_id | integer | not null traftype_id | integer | not null date_time | timestamp without time zone | not null bytes_in | bigint | not null default 0 bytes_out | bigint | not null default 0 Indexes: "n_traf_y2007m02_date_time_login_id" btree (date_time, login_id) Check constraints: "n_traf_y2007m02_date_time_check" CHECK (date_time >= '2007-02-01'::date AND date_time < '2007-03-01 00:00:00'::timestamp without time zone) Inherits: n_traf ... -- engineer
On 8/24/07, Anton <anton200@gmail.com> wrote:
Hi.
I just created partitioned table, n_traf, sliced by month
(n_traf_y2007m01, n_traf_y2007m02... and so on, see below). They are
indexed by 'date_time' column.
Then I populate it (last value have date 2007-08-...) and do VACUUM
ANALYZE ON n_traf_y2007... all of it.
Now I try to select latest value (ORDER BY date_time LIMIT 1), but
Postgres produced the ugly plan:
=# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Limit (cost=824637.69..824637.69 rows=1 width=32)
-> Sort (cost=824637.69..838746.44 rows=5643499 width=32)
Sort Key: public.n_traf.date_time
-> Result (cost=0.00..100877.99 rows=5643499 width=32)
-> Append (cost= 0.00..100877.99 rows=5643499 width=32)
-> Seq Scan on n_traf (cost=0.00..22.30
rows=1230 width=32)
-> Seq Scan on n_traf_y2007m01 n_traf
(cost=0.00..22.30 rows=1230 width=32)
-> Seq Scan on n_traf_y2007m02 n_traf
(cost=0.00..22.30 rows=1230 width=32)
-> Seq Scan on n_traf_y2007m03 n_traf
(cost=0.00..22.30 rows=1230 width=32)
-> Seq Scan on n_traf_y2007m04 n_traf
(cost=0.00..1.01 rows=1 width=32)
-> Seq Scan on n_traf_y2007m05 n_traf
(cost=0.00..9110.89 rows=509689 width=32)
-> Seq Scan on n_traf_y2007m06 n_traf
(cost= 0.00..32003.89 rows=1790489 width=32)
-> Seq Scan on n_traf_y2007m07 n_traf
(cost=0.00..33881.10 rows=1895510 width=32)
-> Seq Scan on n_traf_y2007m08 n_traf
(cost= 0.00..25702.70 rows=1437970 width=32)
-> Seq Scan on n_traf_y2007m09 n_traf
(cost=0.00..22.30 rows=1230 width=32)
-> Seq Scan on n_traf_y2007m10 n_traf
(cost=0.00..22.30 rows=1230 width=32)
-> Seq Scan on n_traf_y2007m11 n_traf
(cost=0.00..22.30 rows=1230 width=32)
-> Seq Scan on n_traf_y2007m12 n_traf
(cost=0.00..22.30 rows=1230 width=32)
(18 rows)
Why it no uses indexes at all?
-------------------------------------------
I'm no expert but I'd guess that the the planner doesn't know which partition holds the latest time so it has to read them all.
Regards
MP
> > =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1; > > QUERY PLAN > --------------------------------------------------------------------------------------------------------- > > Limit (cost=824637.69..824637.69 rows=1 width=32) > > -> Sort (cost=824637.69..838746.44 rows=5643499 width=32) > > Sort Key: public.n_traf.date_time > > -> Result (cost=0.00..100877.99 rows=5643499 width=32) > > -> Append (cost= 0.00..100877.99 rows=5643499 width=32) > > -> Seq Scan on n_traf (cost=0.00..22.30 > > rows=1230 width=32) > > -> Seq Scan on n_traf_y2007m01 n_traf > > (cost=0.00..22.30 rows=1230 width=32) ... > > -> Seq Scan on n_traf_y2007m12 n_traf > > (cost=0.00..22.30 rows=1230 width=32) > > (18 rows) > > > > Why it no uses indexes at all? > > ------------------------------------------- > I'm no expert but I'd guess that the the planner doesn't know which > partition holds the latest time so it has to read them all. Agree. But why it not uses indexes when it reading them? -- engineer
Anton wrote: >>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1; >>> QUERY PLAN >> --------------------------------------------------------------------------------------------------------- >>> Limit (cost=824637.69..824637.69 rows=1 width=32) >>> -> Sort (cost=824637.69..838746.44 rows=5643499 width=32) >>> Sort Key: public.n_traf.date_time >>> -> Result (cost=0.00..100877.99 rows=5643499 width=32) >>> -> Append (cost= 0.00..100877.99 rows=5643499 width=32) >>> -> Seq Scan on n_traf (cost=0.00..22.30 >>> rows=1230 width=32) >>> -> Seq Scan on n_traf_y2007m01 n_traf >>> (cost=0.00..22.30 rows=1230 width=32) > ... >>> -> Seq Scan on n_traf_y2007m12 n_traf >>> (cost=0.00..22.30 rows=1230 width=32) >>> (18 rows) >>> >>> Why it no uses indexes at all? >>> ------------------------------------------- >> I'm no expert but I'd guess that the the planner doesn't know which >> partition holds the latest time so it has to read them all. > > Agree. But why it not uses indexes when it reading them? The planner isn't smart enough to push the "ORDER BY ... LIMIT ..." below the append node. Therefore it needs to fetch all rows from all the tables, and the fastest way to do that is a seq scan. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
We just fixed this - I'll post a patch, but I don't have time to verify against HEAD. - Luke On 8/24/07 3:38 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote: > Anton wrote: >>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1; >>>> QUERY PLAN >>> ---------------------------------------------------------------------------- >>> ----------------------------- >>>> Limit (cost=824637.69..824637.69 rows=1 width=32) >>>> -> Sort (cost=824637.69..838746.44 rows=5643499 width=32) >>>> Sort Key: public.n_traf.date_time >>>> -> Result (cost=0.00..100877.99 rows=5643499 width=32) >>>> -> Append (cost= 0.00..100877.99 rows=5643499 width=32) >>>> -> Seq Scan on n_traf (cost=0.00..22.30 >>>> rows=1230 width=32) >>>> -> Seq Scan on n_traf_y2007m01 n_traf >>>> (cost=0.00..22.30 rows=1230 width=32) >> ... >>>> -> Seq Scan on n_traf_y2007m12 n_traf >>>> (cost=0.00..22.30 rows=1230 width=32) >>>> (18 rows) >>>> >>>> Why it no uses indexes at all? >>>> ------------------------------------------- >>> I'm no expert but I'd guess that the the planner doesn't know which >>> partition holds the latest time so it has to read them all. >> >> Agree. But why it not uses indexes when it reading them? > > The planner isn't smart enough to push the "ORDER BY ... LIMIT ..." > below the append node. Therefore it needs to fetch all rows from all the > tables, and the fastest way to do that is a seq scan.
Below is a patch against 8.2.4 (more or less), Heikki can you take a look at it? This enables the use of index scan of a child table by recognizing sort order of the append node. Kurt Harriman did the work. - Luke Index: cdb-pg/src/backend/optimizer/path/indxpath.c =================================================================== RCS file: /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz er/path/indxpath.c,v diff -u -N -r1.22 -r1.22.2.1 --- cdb-pg/src/backend/optimizer/path/indxpath.c 25 Apr 2007 22:07:21 -0000 1.22 +++ cdb-pg/src/backend/optimizer/path/indxpath.c 10 Aug 2007 03:41:15 -0000 1.22.2.1 @@ -379,8 +379,51 @@ index_pathkeys = build_index_pathkeys(root, index, ForwardScanDirection, true); - useful_pathkeys = truncate_useless_pathkeys(root, rel, - index_pathkeys); + /* + * CDB: For appendrel child, pathkeys contain Var nodes in terms + * of the child's baserel. Transform the pathkey list to refer to + * columns of the appendrel. + */ + if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL) + { + AppendRelInfo *appinfo = NULL; + RelOptInfo *appendrel = NULL; + ListCell *appcell; + CdbPathLocus notalocus; + + /* Find the appendrel of which this baserel is a child. */ + foreach(appcell, root->append_rel_list) + { + appinfo = (AppendRelInfo *)lfirst(appcell); + if (appinfo->child_relid == rel->relid) + break; + } + Assert(appinfo); + appendrel = find_base_rel(root, appinfo->parent_relid); + + /* + * The pathkey list happens to have the same format as the + * partitioning key of a Hashed locus, so by disguising it + * we can use cdbpathlocus_pull_above_projection() to do the + * transformation. + */ + CdbPathLocus_MakeHashed(¬alocus, index_pathkeys); + notalocus = + cdbpathlocus_pull_above_projection(root, + notalocus, + rel->relids, + rel->reltargetlist, + appendrel->reltargetlist, + appendrel->relid); + if (CdbPathLocus_IsHashed(notalocus)) + index_pathkeys = truncate_useless_pathkeys(root, appendrel, + notalocus.partkey); + else + index_pathkeys = NULL; + } + + useful_pathkeys = truncate_useless_pathkeys(root, rel, + index_pathkeys); } else useful_pathkeys = NIL; Index: cdb-pg/src/backend/optimizer/path/pathkeys.c =================================================================== RCS file: /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz er/path/pathkeys.c,v diff -u -N -r1.18 -r1.18.2.1 --- cdb-pg/src/backend/optimizer/path/pathkeys.c 30 Apr 2007 05:44:07 -0000 1.18 +++ cdb-pg/src/backend/optimizer/path/pathkeys.c 10 Aug 2007 03:41:15 -0000 1.18.2.1 @@ -1403,55 +1403,53 @@ { PathKeyItem *item; Expr *newexpr; + AttrNumber targetindex; Assert(pathkey); - /* Use constant expr if available. Will be at head of list. */ - if (CdbPathkeyEqualsConstant(pathkey)) + /* Find an expr that we can rewrite to use the projected columns. */ + item = cdbpullup_findPathKeyItemInTargetList(pathkey, + relids, + targetlist, + &targetindex); // OUT + + /* If not found, see if the equiv class contains a constant expr. */ + if (!item && + CdbPathkeyEqualsConstant(pathkey)) { item = (PathKeyItem *)linitial(pathkey); newexpr = (Expr *)copyObject(item->key); } - /* New vars for old! */ - else - { - AttrNumber targetindex; + /* Fail if no usable expr. */ + else if (!item) + return NULL; - /* Find an expr that we can rewrite to use the projected columns. */ - item = cdbpullup_findPathKeyItemInTargetList(pathkey, - relids, - targetlist, - &targetindex); // OUT - if (!item) - return NULL; + /* If found matching targetlist item, make a Var that references it. */ + else if (targetindex > 0) + newexpr = (Expr *)cdbpullup_makeVar(newrelid, + targetindex, + newvarlist, + (Expr *)item->key); - /* If found matching targetlist item, make a Var that references it. */ - if (targetindex > 0) - newexpr = (Expr *)cdbpullup_makeVar(newrelid, - targetindex, - newvarlist, - (Expr *)item->key); + /* Replace expr's Var nodes with new ones referencing the targetlist. */ + else + newexpr = cdbpullup_expr((Expr *)item->key, + targetlist, + newvarlist, + newrelid); - /* Replace expr's Var nodes with new ones referencing the targetlist. */ - else - newexpr = cdbpullup_expr((Expr *)item->key, - targetlist, - newvarlist, - newrelid); + /* Pull up RelabelType node too, unless tlist expr has right type. */ + if (IsA(item->key, RelabelType)) + { + RelabelType *oldrelabel = (RelabelType *)item->key; - /* Pull up RelabelType node too, unless tlist expr has right type. */ - if (IsA(item->key, RelabelType)) - { - RelabelType *oldrelabel = (RelabelType *)item->key; - - if (oldrelabel->resulttype != exprType((Node *)newexpr) || - oldrelabel->resulttypmod != exprTypmod((Node *)newexpr)) - newexpr = (Expr *)makeRelabelType(newexpr, - oldrelabel->resulttype, - oldrelabel->resulttypmod, - oldrelabel->relabelformat); - } + if (oldrelabel->resulttype != exprType((Node *)newexpr) || + oldrelabel->resulttypmod != exprTypmod((Node *)newexpr)) + newexpr = (Expr *)makeRelabelType(newexpr, + oldrelabel->resulttype, + oldrelabel->resulttypmod, + oldrelabel->relabelformat); } Insist(newexpr); Index: cdb-pg/src/backend/optimizer/util/pathnode.c =================================================================== RCS file: /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz er/util/pathnode.c,v diff -u -N -r1.52.2.4 -r1.52.2.5 --- cdb-pg/src/backend/optimizer/util/pathnode.c 5 Aug 2007 23:06:44 -0000 1.52.2.4 +++ cdb-pg/src/backend/optimizer/util/pathnode.c 10 Aug 2007 03:41:15 -0000 1.52.2.5 @@ -1563,7 +1563,15 @@ pathnode->path.rescannable = false; } - return pathnode; + /* + * CDB: If there is exactly one subpath, its ordering is preserved. + * Child rel's pathkey exprs are already expressed in terms of the + * columns of the parent appendrel. See find_usable_indexes(). + */ + if (list_length(subpaths) == 1) + pathnode->path.pathkeys = ((Path *)linitial(subpaths))->pathkeys; + + return pathnode; } /* On 8/24/07 3:38 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote: > Anton wrote: >>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1; >>>> QUERY PLAN >>> ---------------------------------------------------------------------------- >>> ----------------------------- >>>> Limit (cost=824637.69..824637.69 rows=1 width=32) >>>> -> Sort (cost=824637.69..838746.44 rows=5643499 width=32) >>>> Sort Key: public.n_traf.date_time >>>> -> Result (cost=0.00..100877.99 rows=5643499 width=32) >>>> -> Append (cost= 0.00..100877.99 rows=5643499 width=32) >>>> -> Seq Scan on n_traf (cost=0.00..22.30 >>>> rows=1230 width=32) >>>> -> Seq Scan on n_traf_y2007m01 n_traf >>>> (cost=0.00..22.30 rows=1230 width=32) >> ... >>>> -> Seq Scan on n_traf_y2007m12 n_traf >>>> (cost=0.00..22.30 rows=1230 width=32) >>>> (18 rows) >>>> >>>> Why it no uses indexes at all? >>>> ------------------------------------------- >>> I'm no expert but I'd guess that the the planner doesn't know which >>> partition holds the latest time so it has to read them all. >> >> Agree. But why it not uses indexes when it reading them? > > The planner isn't smart enough to push the "ORDER BY ... LIMIT ..." > below the append node. Therefore it needs to fetch all rows from all the > tables, and the fastest way to do that is a seq scan.
Pn, 2007 08 24 14:53 +0600, Anton rašė: > Hi. > > I just created partitioned table, n_traf, sliced by month > (n_traf_y2007m01, n_traf_y2007m02... and so on, see below). They are > indexed by 'date_time' column. > Then I populate it (last value have date 2007-08-...) and do VACUUM > ANALYZE ON n_traf_y2007... all of it. > > Now I try to select latest value (ORDER BY date_time LIMIT 1), but > Postgres produced the ugly plan: > > =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1; > QUERY PL can you test performance and send explain results of select like this : select * from n_traf where date_time = (select max(date_time) from n_traf); i have similar problem with ~70M rows table (then using ordering), but my table not partitioned. I`m interesting how this select will works on partitioned table. -- Pagarbiai, Tomas Tamošaitis Projektų Vadovas Connecty Skype://mazgis1009?add Mob: +370 652 86127 e-pastas: tomas.tamosaitis@connecty.lt web: www.connecty.lt
Bruce, would you please add this to the 8.4 patch queue so we remember to look at this later? It didn't occur to me that we can do that in the degenerate case when there's just a single node below the Append. A more general solution would be to check if the pathkeys of all the child nodes match, and do a "merge append" similar to a merge join. Luke Lonergan wrote: > Below is a patch against 8.2.4 (more or less), Heikki can you take a look at > it? > > This enables the use of index scan of a child table by recognizing sort > order of the append node. Kurt Harriman did the work. > > - Luke > > Index: cdb-pg/src/backend/optimizer/path/indxpath.c > =================================================================== > RCS file: > /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz > er/path/indxpath.c,v > diff -u -N -r1.22 -r1.22.2.1 > --- cdb-pg/src/backend/optimizer/path/indxpath.c 25 Apr 2007 22:07:21 > -0000 1.22 > +++ cdb-pg/src/backend/optimizer/path/indxpath.c 10 Aug 2007 03:41:15 > -0000 1.22.2.1 > @@ -379,8 +379,51 @@ > index_pathkeys = build_index_pathkeys(root, index, > ForwardScanDirection, > true); > - useful_pathkeys = truncate_useless_pathkeys(root, rel, > - index_pathkeys); > + /* > + * CDB: For appendrel child, pathkeys contain Var nodes in > terms > + * of the child's baserel. Transform the pathkey list to refer > to > + * columns of the appendrel. > + */ > + if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL) > + { > + AppendRelInfo *appinfo = NULL; > + RelOptInfo *appendrel = NULL; > + ListCell *appcell; > + CdbPathLocus notalocus; > + > + /* Find the appendrel of which this baserel is a child. */ > + foreach(appcell, root->append_rel_list) > + { > + appinfo = (AppendRelInfo *)lfirst(appcell); > + if (appinfo->child_relid == rel->relid) > + break; > + } > + Assert(appinfo); > + appendrel = find_base_rel(root, appinfo->parent_relid); > + > + /* > + * The pathkey list happens to have the same format as the > + * partitioning key of a Hashed locus, so by disguising it > + * we can use cdbpathlocus_pull_above_projection() to do > the > + * transformation. > + */ > + CdbPathLocus_MakeHashed(¬alocus, index_pathkeys); > + notalocus = > + cdbpathlocus_pull_above_projection(root, > + notalocus, > + rel->relids, > + rel->reltargetlist, > + > appendrel->reltargetlist, > + appendrel->relid); > + if (CdbPathLocus_IsHashed(notalocus)) > + index_pathkeys = truncate_useless_pathkeys(root, > appendrel, > + > notalocus.partkey); > + else > + index_pathkeys = NULL; > + } > + > + useful_pathkeys = truncate_useless_pathkeys(root, rel, > + index_pathkeys); > } > else > useful_pathkeys = NIL; > Index: cdb-pg/src/backend/optimizer/path/pathkeys.c > =================================================================== > RCS file: > /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz > er/path/pathkeys.c,v > diff -u -N -r1.18 -r1.18.2.1 > --- cdb-pg/src/backend/optimizer/path/pathkeys.c 30 Apr 2007 05:44:07 > -0000 1.18 > +++ cdb-pg/src/backend/optimizer/path/pathkeys.c 10 Aug 2007 03:41:15 > -0000 1.18.2.1 > @@ -1403,55 +1403,53 @@ > { > PathKeyItem *item; > Expr *newexpr; > + AttrNumber targetindex; > > Assert(pathkey); > > - /* Use constant expr if available. Will be at head of list. */ > - if (CdbPathkeyEqualsConstant(pathkey)) > + /* Find an expr that we can rewrite to use the projected columns. */ > + item = cdbpullup_findPathKeyItemInTargetList(pathkey, > + relids, > + targetlist, > + &targetindex); // OUT > + > + /* If not found, see if the equiv class contains a constant expr. */ > + if (!item && > + CdbPathkeyEqualsConstant(pathkey)) > { > item = (PathKeyItem *)linitial(pathkey); > newexpr = (Expr *)copyObject(item->key); > } > > - /* New vars for old! */ > - else > - { > - AttrNumber targetindex; > + /* Fail if no usable expr. */ > + else if (!item) > + return NULL; > > - /* Find an expr that we can rewrite to use the projected columns. > */ > - item = cdbpullup_findPathKeyItemInTargetList(pathkey, > - relids, > - targetlist, > - &targetindex); // OUT > - if (!item) > - return NULL; > + /* If found matching targetlist item, make a Var that references it. */ > + else if (targetindex > 0) > + newexpr = (Expr *)cdbpullup_makeVar(newrelid, > + targetindex, > + newvarlist, > + (Expr *)item->key); > > - /* If found matching targetlist item, make a Var that references > it. */ > - if (targetindex > 0) > - newexpr = (Expr *)cdbpullup_makeVar(newrelid, > - targetindex, > - newvarlist, > - (Expr *)item->key); > + /* Replace expr's Var nodes with new ones referencing the targetlist. > */ > + else > + newexpr = cdbpullup_expr((Expr *)item->key, > + targetlist, > + newvarlist, > + newrelid); > > - /* Replace expr's Var nodes with new ones referencing the > targetlist. */ > - else > - newexpr = cdbpullup_expr((Expr *)item->key, > - targetlist, > - newvarlist, > - newrelid); > + /* Pull up RelabelType node too, unless tlist expr has right type. */ > + if (IsA(item->key, RelabelType)) > + { > + RelabelType *oldrelabel = (RelabelType *)item->key; > > - /* Pull up RelabelType node too, unless tlist expr has right type. > */ > - if (IsA(item->key, RelabelType)) > - { > - RelabelType *oldrelabel = (RelabelType *)item->key; > - > - if (oldrelabel->resulttype != exprType((Node *)newexpr) || > - oldrelabel->resulttypmod != exprTypmod((Node *)newexpr)) > - newexpr = (Expr *)makeRelabelType(newexpr, > - oldrelabel->resulttype, > - oldrelabel->resulttypmod, > - > oldrelabel->relabelformat); > - } > + if (oldrelabel->resulttype != exprType((Node *)newexpr) || > + oldrelabel->resulttypmod != exprTypmod((Node *)newexpr)) > + newexpr = (Expr *)makeRelabelType(newexpr, > + oldrelabel->resulttype, > + oldrelabel->resulttypmod, > + oldrelabel->relabelformat); > } > Insist(newexpr); > > Index: cdb-pg/src/backend/optimizer/util/pathnode.c > =================================================================== > RCS file: > /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz > er/util/pathnode.c,v > diff -u -N -r1.52.2.4 -r1.52.2.5 > --- cdb-pg/src/backend/optimizer/util/pathnode.c 5 Aug 2007 23:06:44 > -0000 1.52.2.4 > +++ cdb-pg/src/backend/optimizer/util/pathnode.c 10 Aug 2007 03:41:15 > -0000 1.52.2.5 > @@ -1563,7 +1563,15 @@ > pathnode->path.rescannable = false; > } > > - return pathnode; > + /* > + * CDB: If there is exactly one subpath, its ordering is preserved. > + * Child rel's pathkey exprs are already expressed in terms of the > + * columns of the parent appendrel. See find_usable_indexes(). > + */ > + if (list_length(subpaths) == 1) > + pathnode->path.pathkeys = ((Path *)linitial(subpaths))->pathkeys; > + > + return pathnode; > } > > /* > > > On 8/24/07 3:38 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote: > >> Anton wrote: >>>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1; >>>>> QUERY PLAN >>>> ---------------------------------------------------------------------------- >>>> ----------------------------- >>>>> Limit (cost=824637.69..824637.69 rows=1 width=32) >>>>> -> Sort (cost=824637.69..838746.44 rows=5643499 width=32) >>>>> Sort Key: public.n_traf.date_time >>>>> -> Result (cost=0.00..100877.99 rows=5643499 width=32) >>>>> -> Append (cost= 0.00..100877.99 rows=5643499 width=32) >>>>> -> Seq Scan on n_traf (cost=0.00..22.30 >>>>> rows=1230 width=32) >>>>> -> Seq Scan on n_traf_y2007m01 n_traf >>>>> (cost=0.00..22.30 rows=1230 width=32) >>> ... >>>>> -> Seq Scan on n_traf_y2007m12 n_traf >>>>> (cost=0.00..22.30 rows=1230 width=32) >>>>> (18 rows) >>>>> >>>>> Why it no uses indexes at all? >>>>> ------------------------------------------- >>>> I'm no expert but I'd guess that the the planner doesn't know which >>>> partition holds the latest time so it has to read them all. >>> Agree. But why it not uses indexes when it reading them? >> The planner isn't smart enough to push the "ORDER BY ... LIMIT ..." >> below the append node. Therefore it needs to fetch all rows from all the >> tables, and the fastest way to do that is a seq scan. > > -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
I want ask about problem with partioned tables (it was discussed some time ago, see below). Is it fixed somehow in 8.2.5 ? 2007/8/24, Luke Lonergan <llonergan@greenplum.com>: > Below is a patch against 8.2.4 (more or less), Heikki can you take a look at > it? > > This enables the use of index scan of a child table by recognizing sort > order of the append node. Kurt Harriman did the work. ... > > On 8/24/07 3:38 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote: > > > Anton wrote: > >>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1; > >>>> QUERY PLAN > >>> ---------------------------------------------------------------------------- > >>> ----------------------------- > >>>> Limit (cost=824637.69..824637.69 rows=1 width=32) > >>>> -> Sort (cost=824637.69..838746.44 rows=5643499 width=32) > >>>> Sort Key: public.n_traf.date_time > >>>> -> Result (cost=0.00..100877.99 rows=5643499 width=32) > >>>> -> Append (cost= 0.00..100877.99 rows=5643499 width=32) > >>>> -> Seq Scan on n_traf (cost=0.00..22.30 > >>>> rows=1230 width=32) > >>>> -> Seq Scan on n_traf_y2007m01 n_traf > >>>> (cost=0.00..22.30 rows=1230 width=32) > >> ... > >>>> -> Seq Scan on n_traf_y2007m12 n_traf > >>>> (cost=0.00..22.30 rows=1230 width=32) > >>>> (18 rows) > >>>> > >>>> Why it no uses indexes at all? > >>>> ------------------------------------------- > >>> I'm no expert but I'd guess that the the planner doesn't know which > >>> partition holds the latest time so it has to read them all. > >> > >> Agree. But why it not uses indexes when it reading them? > > > > The planner isn't smart enough to push the "ORDER BY ... LIMIT ..." > > below the append node. Therefore it needs to fetch all rows from all the > > tables, and the fastest way to do that is a seq scan. -- engineer
Anton <anton200@gmail.com> writes: > I want ask about problem with partioned tables (it was discussed some > time ago, see below). Is it fixed somehow in 8.2.5 ? No. The patch you mention never was considered at all, since it consisted of a selective quote from Greenplum source code. It would not even compile in community Postgres, because it adds calls to half a dozen Greenplum routines that we've never seen. Not to mention that the base of the diff is Greenplum proprietary code, so the patch itself wouldn't even apply successfully. As to whether it would work if we had the full story ... well, not having the full story, I don't want to opine. regards, tom lane
2007/10/27, Tom Lane <tgl@sss.pgh.pa.us>: > Anton <anton200@gmail.com> writes: > > I want ask about problem with partioned tables (it was discussed some > > time ago, see below). Is it fixed somehow in 8.2.5 ? > > No. The patch you mention never was considered at all, since it > consisted of a selective quote from Greenplum source code. It would ... > As to whether it would work if we had the full story ... well, not > having the full story, I don't want to opine. Sorry, my english is not good enough to understand your last sentence. I repost here my original question "Why it no uses indexes?" (on partitioned table and ORDER BY indexed_field DESC LIMIT 1), if you mean that you miss this discussion. > I just created partitioned table, n_traf, sliced by month > (n_traf_y2007m01, n_traf_y2007m02... and so on, see below). They are > indexed by 'date_time' column. > Then I populate it (last value have date 2007-08-...) and do VACUUM > ANALYZE ON n_traf_y2007... all of it. > > Now I try to select latest value (ORDER BY date_time LIMIT 1), but > Postgres produced the ugly plan: > > =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1; > QUERY PLAN > --------------------------------------------------------------------------------------------------------- > Limit (cost=824637.69..824637.69 rows=1 width=32) > -> Sort (cost=824637.69..838746.44 rows=5643499 width=32) > Sort Key: public.n_traf.date_time > -> Result (cost=0.00..100877.99 rows=5643499 width=32) > -> Append (cost=0.00..100877.99 rows=5643499 width=32) > -> Seq Scan on n_traf (cost=0.00..22.30 > rows=1230 width=32) > -> Seq Scan on n_traf_y2007m01 n_traf > (cost=0.00..22.30 rows=1230 width=32) > -> Seq Scan on n_traf_y2007m02 n_traf > (cost=0.00..22.30 rows=1230 width=32) > -> Seq Scan on n_traf_y2007m03 n_traf > (cost=0.00..22.30 rows=1230 width=32) > -> Seq Scan on n_traf_y2007m04 n_traf > (cost=0.00..1.01 rows=1 width=32) > -> Seq Scan on n_traf_y2007m05 n_traf > (cost=0.00..9110.89 rows=509689 width=32) > -> Seq Scan on n_traf_y2007m06 n_traf > (cost=0.00..32003.89 rows=1790489 width=32) > -> Seq Scan on n_traf_y2007m07 n_traf > (cost=0.00..33881.10 rows=1895510 width=32) > -> Seq Scan on n_traf_y2007m08 n_traf > (cost=0.00..25702.70 rows=1437970 width=32) > -> Seq Scan on n_traf_y2007m09 n_traf > (cost=0.00..22.30 rows=1230 width=32) > -> Seq Scan on n_traf_y2007m10 n_traf > (cost=0.00..22.30 rows=1230 width=32) > -> Seq Scan on n_traf_y2007m11 n_traf > (cost=0.00..22.30 rows=1230 width=32) > -> Seq Scan on n_traf_y2007m12 n_traf > (cost=0.00..22.30 rows=1230 width=32) > (18 rows) > > > Why it no uses indexes at all? > ------------------------------------------- > > The simplier query goes fast, use index. > =# explain analyze SELECT * FROM n_traf_y2007m08 ORDER BY date_time > DESC LIMIT 1; > > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > Limit (cost=0.00..0.03 rows=1 width=32) (actual time=0.156..0.158 > rows=1 loops=1) > -> Index Scan Backward using n_traf_y2007m08_date_time_login_id on > n_traf_y2007m08 (cost=0.00..39489.48 rows=1437970 width=32) (actual > time=0.150..0.150 rows=1 loops=1) > Total runtime: 0.241 ms > (3 rows) > > Table n_traf looks like this: > =# \d n_traf > Table "public.n_traf" > Column | Type | Modifiers > -------------+-----------------------------+-------------------- > login_id | integer | not null > traftype_id | integer | not null > date_time | timestamp without time zone | not null > bytes_in | bigint | not null default 0 > bytes_out | bigint | not null default 0 > Indexes: > "n_traf_login_id_key" UNIQUE, btree (login_id, traftype_id, date_time) > "n_traf_date_time_login_id" btree (date_time, login_id) > Foreign-key constraints: > "n_traf_login_id_fkey" FOREIGN KEY (login_id) REFERENCES > n_logins(login_id) ON UPDATE CASCADE ON DELETE CASCADE > "n_traf_traftype_id_fkey" FOREIGN KEY (traftype_id) REFERENCES > n_traftypes(traftype_id) ON UPDATE CASCADE > Rules: > n_traf_insert_y2007m01 AS > ON INSERT TO n_traf > WHERE new.date_time >= '2007-01-01'::date AND new.date_time < > '2007-02-01 00:00:00'::timestamp without time zone DO INSTEAD > INSERT INTO n_traf_y2007m01 (login_id, traftype_id, date_time, > bytes_in, bytes_out) > VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, > new.bytes_out) > n_traf_insert_y2007m02 AS > ON INSERT TO n_traf > WHERE new.date_time >= '2007-02-01'::date AND new.date_time < > '2007-03-01 00:00:00'::timestamp without time zone DO INSTEAD > INSERT INTO n_traf_y2007m02 (login_id, traftype_id, date_time, > bytes_in, bytes_out) > VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, > new.bytes_out) > n_traf_insert_y2007m03 AS > ON INSERT TO n_traf > WHERE new.date_time >= '2007-03-01'::date AND new.date_time < > '2007-04-01 00:00:00'::timestamp without time zone DO INSTEAD > INSERT INTO n_traf_y2007m03 (login_id, traftype_id, date_time, > bytes_in, bytes_out) > VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, > new.bytes_out) > n_traf_insert_y2007m04 AS > ON INSERT TO n_traf > WHERE new.date_time >= '2007-04-01'::date AND new.date_time < > '2007-05-01 00:00:00'::timestamp without time zone DO INSTEAD > INSERT INTO n_traf_y2007m04 (login_id, traftype_id, date_time, > bytes_in, bytes_out) > VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, > new.bytes_out) > n_traf_insert_y2007m05 AS > ON INSERT TO n_traf > WHERE new.date_time >= '2007-05-01'::date AND new.date_time < > '2007-06-01 00:00:00'::timestamp without time zone DO INSTEAD > INSERT INTO n_traf_y2007m05 (login_id, traftype_id, date_time, > bytes_in, bytes_out) > VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, > new.bytes_out) > n_traf_insert_y2007m06 AS > ON INSERT TO n_traf > WHERE new.date_time >= '2007-06-01'::date AND new.date_time < > '2007-07-01 00:00:00'::timestamp without time zone DO INSTEAD > INSERT INTO n_traf_y2007m06 (login_id, traftype_id, date_time, > bytes_in, bytes_out) > VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, > new.bytes_out) > n_traf_insert_y2007m07 AS > ON INSERT TO n_traf > WHERE new.date_time >= '2007-07-01'::date AND new.date_time < > '2007-08-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT > INTO n_traf_y2007m07 (login_id, traftype_id, date_time, bytes_in, > bytes_out) > VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, > new.bytes_out) > n_traf_insert_y2007m08 AS > ON INSERT TO n_traf > WHERE new.date_time >= '2007-08-01'::date AND new.date_time < > '2007-09-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT > INTO n_traf_y2007m08 (login_id, traftype_id, date_time, bytes_in, > bytes_out) > VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, > new.bytes_out) > n_traf_insert_y2007m09 AS > ON INSERT TO n_traf > WHERE new.date_time >= '2007-09-01'::date AND new.date_time < > '2007-10-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT > INTO n_traf_y2007m09 (login_id, traftype_id, date_time, bytes_in, > bytes_out) > VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, > new.bytes_out) > n_traf_insert_y2007m10 AS > ON INSERT TO n_traf > WHERE new.date_time >= '2007-10-01'::date AND new.date_time < > '2007-11-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT > INTO n_traf_y2007m10 (login_id, traftype_id, date_time, bytes_in, > bytes_out) > VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, > new.bytes_out) > n_traf_insert_y2007m11 AS > ON INSERT TO n_traf > WHERE new.date_time >= '2007-11-01'::date AND new.date_time < > '2007-12-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT > INTO n_traf_y2007m11 (login_id, traftype_id, date_time, bytes_in, > bytes_out) > VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, > new.bytes_out) > n_traf_insert_y2007m12 AS > ON INSERT TO n_traf > WHERE new.date_time >= '2007-12-01'::date AND new.date_time < > '2008-01-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT > INTO n_traf_y2007m12 (login_id, traftype_id, date_time, bytes_in, > bytes_out) > VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, > new.bytes_out) > > > Tables n_traf_y2007m... looks like these > > Table "public.n_traf_y2007m01" > Column | Type | Modifiers > -------------+-----------------------------+-------------------- > login_id | integer | not null > traftype_id | integer | not null > date_time | timestamp without time zone | not null > bytes_in | bigint | not null default 0 > bytes_out | bigint | not null default 0 > Indexes: > "n_traf_y2007m01_date_time_login_id" btree (date_time, login_id) > Check constraints: > "n_traf_y2007m01_date_time_check" CHECK (date_time >= > '2007-01-01'::date AND date_time < '2007-02-01 00:00:00'::timestamp > without time zone) > Inherits: n_traf > > Index "public.n_traf_y2007m01_date_time_login_id" > Column | Type > -----------+----------------------------- > date_time | timestamp without time zone > login_id | integer > btree, for table "public.n_traf_y2007m01" > > Table "public.n_traf_y2007m02" > Column | Type | Modifiers > -------------+-----------------------------+-------------------- > login_id | integer | not null > traftype_id | integer | not null > date_time | timestamp without time zone | not null > bytes_in | bigint | not null default 0 > bytes_out | bigint | not null default 0 > Indexes: > "n_traf_y2007m02_date_time_login_id" btree (date_time, login_id) > Check constraints: > "n_traf_y2007m02_date_time_check" CHECK (date_time >= > '2007-02-01'::date AND date_time < '2007-03-01 00:00:00'::timestamp > without time zone) > Inherits: n_traf > ... -- engineer
Anton wrote: > I repost here my original question "Why it no uses indexes?" (on > partitioned table and ORDER BY indexed_field DESC LIMIT 1), if you > mean that you miss this discussion. As I said back then: The planner isn't smart enough to push the "ORDER BY ... LIMIT ..." below the append node. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > Bruce, would you please add this to the 8.4 patch queue so we remember > to look at this later? OK, added to queue, but Tom's patch queue comment is: This is useless since it does not represent a complete patch; the provided code calls a lot of Greenplum-private routines that weren't provided. It's not even reviewable let alone a candidate to apply. --------------------------------------------------------------------------- > > It didn't occur to me that we can do that in the degenerate case when > there's just a single node below the Append. A more general solution > would be to check if the pathkeys of all the child nodes match, and do a > "merge append" similar to a merge join. > > Luke Lonergan wrote: > > Below is a patch against 8.2.4 (more or less), Heikki can you take a look at > > it? > > > > This enables the use of index scan of a child table by recognizing sort > > order of the append node. Kurt Harriman did the work. > > > > - Luke > > > > Index: cdb-pg/src/backend/optimizer/path/indxpath.c > > =================================================================== > > RCS file: > > /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz > > er/path/indxpath.c,v > > diff -u -N -r1.22 -r1.22.2.1 > > --- cdb-pg/src/backend/optimizer/path/indxpath.c 25 Apr 2007 22:07:21 > > -0000 1.22 > > +++ cdb-pg/src/backend/optimizer/path/indxpath.c 10 Aug 2007 03:41:15 > > -0000 1.22.2.1 > > @@ -379,8 +379,51 @@ > > index_pathkeys = build_index_pathkeys(root, index, > > ForwardScanDirection, > > true); > > - useful_pathkeys = truncate_useless_pathkeys(root, rel, > > - index_pathkeys); > > + /* > > + * CDB: For appendrel child, pathkeys contain Var nodes in > > terms > > + * of the child's baserel. Transform the pathkey list to refer > > to > > + * columns of the appendrel. > > + */ > > + if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL) > > + { > > + AppendRelInfo *appinfo = NULL; > > + RelOptInfo *appendrel = NULL; > > + ListCell *appcell; > > + CdbPathLocus notalocus; > > + > > + /* Find the appendrel of which this baserel is a child. */ > > + foreach(appcell, root->append_rel_list) > > + { > > + appinfo = (AppendRelInfo *)lfirst(appcell); > > + if (appinfo->child_relid == rel->relid) > > + break; > > + } > > + Assert(appinfo); > > + appendrel = find_base_rel(root, appinfo->parent_relid); > > + > > + /* > > + * The pathkey list happens to have the same format as the > > + * partitioning key of a Hashed locus, so by disguising it > > + * we can use cdbpathlocus_pull_above_projection() to do > > the > > + * transformation. > > + */ > > + CdbPathLocus_MakeHashed(¬alocus, index_pathkeys); > > + notalocus = > > + cdbpathlocus_pull_above_projection(root, > > + notalocus, > > + rel->relids, > > + rel->reltargetlist, > > + > > appendrel->reltargetlist, > > + appendrel->relid); > > + if (CdbPathLocus_IsHashed(notalocus)) > > + index_pathkeys = truncate_useless_pathkeys(root, > > appendrel, > > + > > notalocus.partkey); > > + else > > + index_pathkeys = NULL; > > + } > > + > > + useful_pathkeys = truncate_useless_pathkeys(root, rel, > > + index_pathkeys); > > } > > else > > useful_pathkeys = NIL; > > Index: cdb-pg/src/backend/optimizer/path/pathkeys.c > > =================================================================== > > RCS file: > > /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz > > er/path/pathkeys.c,v > > diff -u -N -r1.18 -r1.18.2.1 > > --- cdb-pg/src/backend/optimizer/path/pathkeys.c 30 Apr 2007 05:44:07 > > -0000 1.18 > > +++ cdb-pg/src/backend/optimizer/path/pathkeys.c 10 Aug 2007 03:41:15 > > -0000 1.18.2.1 > > @@ -1403,55 +1403,53 @@ > > { > > PathKeyItem *item; > > Expr *newexpr; > > + AttrNumber targetindex; > > > > Assert(pathkey); > > > > - /* Use constant expr if available. Will be at head of list. */ > > - if (CdbPathkeyEqualsConstant(pathkey)) > > + /* Find an expr that we can rewrite to use the projected columns. */ > > + item = cdbpullup_findPathKeyItemInTargetList(pathkey, > > + relids, > > + targetlist, > > + &targetindex); // OUT > > + > > + /* If not found, see if the equiv class contains a constant expr. */ > > + if (!item && > > + CdbPathkeyEqualsConstant(pathkey)) > > { > > item = (PathKeyItem *)linitial(pathkey); > > newexpr = (Expr *)copyObject(item->key); > > } > > > > - /* New vars for old! */ > > - else > > - { > > - AttrNumber targetindex; > > + /* Fail if no usable expr. */ > > + else if (!item) > > + return NULL; > > > > - /* Find an expr that we can rewrite to use the projected columns. > > */ > > - item = cdbpullup_findPathKeyItemInTargetList(pathkey, > > - relids, > > - targetlist, > > - &targetindex); // OUT > > - if (!item) > > - return NULL; > > + /* If found matching targetlist item, make a Var that references it. */ > > + else if (targetindex > 0) > > + newexpr = (Expr *)cdbpullup_makeVar(newrelid, > > + targetindex, > > + newvarlist, > > + (Expr *)item->key); > > > > - /* If found matching targetlist item, make a Var that references > > it. */ > > - if (targetindex > 0) > > - newexpr = (Expr *)cdbpullup_makeVar(newrelid, > > - targetindex, > > - newvarlist, > > - (Expr *)item->key); > > + /* Replace expr's Var nodes with new ones referencing the targetlist. > > */ > > + else > > + newexpr = cdbpullup_expr((Expr *)item->key, > > + targetlist, > > + newvarlist, > > + newrelid); > > > > - /* Replace expr's Var nodes with new ones referencing the > > targetlist. */ > > - else > > - newexpr = cdbpullup_expr((Expr *)item->key, > > - targetlist, > > - newvarlist, > > - newrelid); > > + /* Pull up RelabelType node too, unless tlist expr has right type. */ > > + if (IsA(item->key, RelabelType)) > > + { > > + RelabelType *oldrelabel = (RelabelType *)item->key; > > > > - /* Pull up RelabelType node too, unless tlist expr has right type. > > */ > > - if (IsA(item->key, RelabelType)) > > - { > > - RelabelType *oldrelabel = (RelabelType *)item->key; > > - > > - if (oldrelabel->resulttype != exprType((Node *)newexpr) || > > - oldrelabel->resulttypmod != exprTypmod((Node *)newexpr)) > > - newexpr = (Expr *)makeRelabelType(newexpr, > > - oldrelabel->resulttype, > > - oldrelabel->resulttypmod, > > - > > oldrelabel->relabelformat); > > - } > > + if (oldrelabel->resulttype != exprType((Node *)newexpr) || > > + oldrelabel->resulttypmod != exprTypmod((Node *)newexpr)) > > + newexpr = (Expr *)makeRelabelType(newexpr, > > + oldrelabel->resulttype, > > + oldrelabel->resulttypmod, > > + oldrelabel->relabelformat); > > } > > Insist(newexpr); > > > > Index: cdb-pg/src/backend/optimizer/util/pathnode.c > > =================================================================== > > RCS file: > > /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz > > er/util/pathnode.c,v > > diff -u -N -r1.52.2.4 -r1.52.2.5 > > --- cdb-pg/src/backend/optimizer/util/pathnode.c 5 Aug 2007 23:06:44 > > -0000 1.52.2.4 > > +++ cdb-pg/src/backend/optimizer/util/pathnode.c 10 Aug 2007 03:41:15 > > -0000 1.52.2.5 > > @@ -1563,7 +1563,15 @@ > > pathnode->path.rescannable = false; > > } > > > > - return pathnode; > > + /* > > + * CDB: If there is exactly one subpath, its ordering is preserved. > > + * Child rel's pathkey exprs are already expressed in terms of the > > + * columns of the parent appendrel. See find_usable_indexes(). > > + */ > > + if (list_length(subpaths) == 1) > > + pathnode->path.pathkeys = ((Path *)linitial(subpaths))->pathkeys; > > + > > + return pathnode; > > } > > > > /* > > > > > > On 8/24/07 3:38 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote: > > > >> Anton wrote: > >>>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1; > >>>>> QUERY PLAN > >>>> ---------------------------------------------------------------------------- > >>>> ----------------------------- > >>>>> Limit (cost=824637.69..824637.69 rows=1 width=32) > >>>>> -> Sort (cost=824637.69..838746.44 rows=5643499 width=32) > >>>>> Sort Key: public.n_traf.date_time > >>>>> -> Result (cost=0.00..100877.99 rows=5643499 width=32) > >>>>> -> Append (cost= 0.00..100877.99 rows=5643499 width=32) > >>>>> -> Seq Scan on n_traf (cost=0.00..22.30 > >>>>> rows=1230 width=32) > >>>>> -> Seq Scan on n_traf_y2007m01 n_traf > >>>>> (cost=0.00..22.30 rows=1230 width=32) > >>> ... > >>>>> -> Seq Scan on n_traf_y2007m12 n_traf > >>>>> (cost=0.00..22.30 rows=1230 width=32) > >>>>> (18 rows) > >>>>> > >>>>> Why it no uses indexes at all? > >>>>> ------------------------------------------- > >>>> I'm no expert but I'd guess that the the planner doesn't know which > >>>> partition holds the latest time so it has to read them all. > >>> Agree. But why it not uses indexes when it reading them? > >> The planner isn't smart enough to push the "ORDER BY ... LIMIT ..." > >> below the append node. Therefore it needs to fetch all rows from all the > >> tables, and the fastest way to do that is a seq scan. > > > > > > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +