Re: [GENERAL] Possible bug with row_to_json - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [GENERAL] Possible bug with row_to_json
Date
Msg-id 27771.1375799015@sss.pgh.pa.us
Whole thread Raw
Responses Re: [GENERAL] Possible bug with row_to_json  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Jack Christensen <jack@jackchristensen.com> writes:
> jack=# create table player(
> jack(#   player_id serial primary key,
> jack(#   name varchar not null unique
> jack(# );
> CREATE TABLE
> jack=# insert into player(name) values('Jack');
> INSERT 0 1
> jack=# select row_to_json(t)
> jack-# from (
> jack(#   select player_id as renamed, name
> jack(#   from player
> jack(#   order by name
> jack(# ) t;
>            row_to_json
> -------------------------------
>   {"player_id":1,"name":"Jack"}
> (1 row)

> It ignored the rename.

I looked into this and found that the culprit is the optimization that
skips ExecProject() if a scan plan node is not doing any useful
projection.  In the plan for this query:

 Subquery Scan on t  (cost=0.15..81.98 rows=1230 width=60)
   Output: row_to_json(t.*)
   ->  Index Scan using player_name_key on public.player  (cost=0.15..66.60 rows=1230 width=36)
         Output: player.player_id, player.name

the indexscan node decides it can just return its "scan tuple" slot
directly to the caller, rather than projecting into the "result tuple"
slot.  The problem is that the scan tuple slot's tuple descriptor is that
of the table being scanned, and in particular it has the actual column
names of the underlying table, plus a tdtypeid matching the table's
rowtype OID.  This info is what's looked at by ExecEvalWholeRowVar to form
a tuple datum, so the datum ends up looking like it has exactly the
table's rowtype, and then row_to_json is going to be given the table's
tuple descriptor when it asks lookup_rowtype_tupdesc for a tupdesc.

The attached quick-hack patch fixes the reported case.  I *think* it's
safe as is, but I wonder whether we ought to install additional checks
to verify physical compatibility of the two tuple descriptors before
we decide it's okay to skip projection.  Another potential issue is
that we're losing whatever benefit there may be in having tuple datums
marked with named rowtypes rather than RECORD --- I'm not sure there's
any meaningful performance difference, but I'm not sure there isn't,
either.  We could be even more paranoid by choosing to only install
the other descriptor if there's actually a difference in column names;
but is it worth the extra complexity?

Comments?

            regards, tom lane

diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 3ea6460..b91ae03 100644
*** a/src/backend/executor/execScan.c
--- b/src/backend/executor/execScan.c
*************** ExecScan(ScanState *node,
*** 234,246 ****
   *        Set up projection info for a scan node, if necessary.
   *
   * We can avoid a projection step if the requested tlist exactly matches
!  * the underlying tuple type.  If so, we just set ps_ProjInfo to NULL.
   * Note that this case occurs not only for simple "SELECT * FROM ...", but
   * also in most cases where there are joins or other processing nodes above
   * the scan node, because the planner will preferentially generate a matching
   * tlist.
   *
!  * ExecAssignScanType must have been called already.
   */
  void
  ExecAssignScanProjectionInfo(ScanState *node)
--- 234,253 ----
   *        Set up projection info for a scan node, if necessary.
   *
   * We can avoid a projection step if the requested tlist exactly matches
!  * the underlying tuple type.  If so, we set ps_ProjInfo to NULL to inform
!  * later processing that the scan tuple can be returned as-is.  We also
!  * have to tweak the scan tuple slot to have the exact same tuple descriptor
!  * the projection slot does --- they are physically compatible, if the tlist
!  * matches, but the projection slot might contain different column names and
!  * we want anything that looks at the slot tupdesc to see the right names.
!  *
   * Note that this case occurs not only for simple "SELECT * FROM ...", but
   * also in most cases where there are joins or other processing nodes above
   * the scan node, because the planner will preferentially generate a matching
   * tlist.
   *
!  * ExecAssignScanType and ExecAssignResultTypeFromTL must have been called
!  * already.
   */
  void
  ExecAssignScanProjectionInfo(ScanState *node)
*************** ExecAssignScanProjectionInfo(ScanState *
*** 258,264 ****
--- 265,277 ----
                                scan->plan.targetlist,
                                varno,
                                node->ss_ScanTupleSlot->tts_tupleDescriptor))
+     {
+         /* no physical projection needed */
          node->ps.ps_ProjInfo = NULL;
+         /* make sure scan tuple slot's tupdesc exposes the right names */
+         ExecSetSlotDescriptor(node->ss_ScanTupleSlot,
+                               node->ps.ps_ResultTupleSlot->tts_tupleDescriptor);
+     }
      else
          ExecAssignProjectionInfo(&node->ps,
                                   node->ss_ScanTupleSlot->tts_tupleDescriptor);

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: File-per-GUC WAS: Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: Proposal for Allow postgresql.conf values to be changed via SQL [review])
Next
From: 'Bruce Momjian'
Date:
Subject: Re: File-per-GUC WAS: Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: Proposal for Allow postgresql.conf values to be changed via SQL [review])