Thread: Trying to compute the median

Trying to compute the median

From
Maciej Bliziński
Date:
Hello,

I started to write the query that should compute the median.
Surprisingly, I get following error message:

"server closed the connection unexpectedly This probably means the server
terminated abnormally before or while processing the request."

I am using PostgreSQL 7.4.2, compiled from source under
Slackware-current, Linux 2.6.4.

Here's the query (it's not finished of course, but generates the error):

------------8<------------- ------------8<-------------

CREATE TEMPORARY TABLE test (
    value INTEGER PRIMARY KEY
);

INSERT INTO test VALUES (-1);
INSERT INTO test VALUES (0);
INSERT INTO test VALUES (3);
INSERT INTO test VALUES (5);
INSERT INTO test VALUES (8);

SELECT
    count(value) AS count,
    CASE
        WHEN mod(count(value), 2) = 1
        THEN
            /* odd number of elements */
            (
                SELECT value FROM test AS t2
                ORDER BY
                    value ASC
                OFFSET (count(t1.value) / 2)::INTEGER
                LIMIT 1
            )
        ELSE
            /* even number of elements */
            0.0
        END
        AS median
FROM
    test AS t1
;

------------8<------------- ------------8<-------------

Is it a PostgreSQL bug, or is my query so badly broken?

Re: Trying to compute the median

From
Tom Lane
Date:
Maciej =?iso-8859-2?Q?Blizi=F1ski?= <maciej.blizinski@dobranet.polbox.pl> writes:
> I started to write the query that should compute the median.
> Surprisingly, I get following error message:
> "server closed the connection unexpectedly

Yeah, that's a bug.  The patch is attached if you need it.  However, I
think you will wind up looking for some other way to solve the problem,
because this query won't scale well to large datasets.

            regards, tom lane

Index: setrefs.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/optimizer/plan/setrefs.c,v
retrieving revision 1.97
diff -c -r1.97 setrefs.c
*** setrefs.c    8 Aug 2003 21:41:50 -0000    1.97
--- setrefs.c    11 May 2004 12:45:54 -0000
***************
*** 189,195 ****
          case T_Sort:
          case T_Unique:
          case T_SetOp:
-         case T_Limit:

              /*
               * These plan types don't actually bother to evaluate their
--- 189,194 ----
***************
*** 201,206 ****
--- 200,214 ----
               * reprocessing subplans that also appear in lower levels of
               * the plan tree!
               */
+             break;
+         case T_Limit:
+             /*
+              * Like the plan types above, Limit doesn't evaluate its
+              * tlist or quals.  It does have live expressions for
+              * limit/offset, however.
+              */
+             fix_expr_references(plan, ((Limit *) plan)->limitOffset);
+             fix_expr_references(plan, ((Limit *) plan)->limitCount);
              break;
          case T_Agg:
          case T_Group:
Index: subselect.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/optimizer/plan/subselect.c,v
retrieving revision 1.83.2.1
diff -c -r1.83.2.1 subselect.c
*** subselect.c    25 Nov 2003 23:59:32 -0000    1.83.2.1
--- subselect.c    11 May 2004 12:45:54 -0000
***************
*** 1018,1023 ****
--- 1018,1030 ----
                                &context);
              break;

+         case T_Limit:
+             finalize_primnode(((Limit *) plan)->limitOffset,
+                               &context);
+             finalize_primnode(((Limit *) plan)->limitCount,
+                               &context);
+             break;
+
          case T_Hash:
              finalize_primnode((Node *) ((Hash *) plan)->hashkeys,
                                &context);
***************
*** 1029,1035 ****
          case T_Sort:
          case T_Unique:
          case T_SetOp:
-         case T_Limit:
          case T_Group:
              break;

--- 1036,1041 ----

Re: Trying to compute the median

From
Maciej Bliziński
Date:
Tom Lane napisał:
> Yeah, that's a bug.  The patch is attached if you need it.  However, I
> think you will wind up looking for some other way to solve the problem,
> because this query won't scale well to large datasets.

I already did some googling and found two examples of median calculating
functions, but they didn't convince me; one of them was creating
temporary table to calculate the median, and it doesn't look for me like
a well-scaling solution.

What would you suggest then?

Maciej Bliziński