Interesting group by behavior - Mailing list pgsql-general

From Culberson, Philip
Subject Interesting group by behavior
Date
Msg-id 83CC9259BB25D31186F400A0C9975D9C9CE216@datmail02.dat.com
Whole thread Raw
List pgsql-general
Greetings:

I am somewhat baffled by the horrendous time difference between doing two
separate queries versus a single query with a group by clause.  The table in
question has ~2.5 million rows, of which 847,324 are in the date range
selected.  An explain plan on all three queries show that the cost is the
same for each, although the select .. group by takes 50 times longer.

Below I've included output from both PSQL and the postmaster log.  Note that
the select .. group by query had a 100% buffer hit rate but a huge jump in
the "Direct blocks" reads and writes.

I have two question's:

1) What is the difference between the "Shared/Local/Direct" block stats?
2) Is the long running "select .. group by" an undocumented feature (bug)?

Regards,
Phil Culberson

test=> \d posting
Table    = posting
+----------------------------------+----------------------------------+-----
--+
|              Field               |              Type                |
Length|
+----------------------------------+----------------------------------+-----
--+
| date                             | date                             |
4 |
| comp_id                          | int4                             |
4 |
| posting_type                     | char()                           |
1 |
| equip_type                       | char()                           |
2 |
| partial_flag                     | bool                             |
1 |
| origin_id                        | int4                             |
4 |
| dest_id                          | int4                             |
4 |
+----------------------------------+----------------------------------+-----
--+
Indices:  date_idx
          dest_idx
          orig_idx

test=> \d date_idx
Table    = date_idx
+----------------------------------+----------------------------------+-----
--+
|              Field               |              Type                |
Length|
+----------------------------------+----------------------------------+-----
--+
| date                             | date                             |
4 |
+----------------------------------+----------------------------------+-----
--+



Log output
==========
991006.12:36:15.659  [1321] StartTransactionCommand
991006.12:36:15.660  [1321] query: SELECT COUNT(posting_type) FROM posting
WHERE date BETWEEN 19990801::date AND 19990831::date AND posting_type = 'L';
991006.12:36:15.667  [1321] ProcessQuery
! system usage stats:
!       41.747049 elapsed 39.280000 user 0.240000 system sec
!       [243.440000 user 2.450000 sys total]
!       1/0 [19/6] filesystem blocks in/out
!       474/0 [7425/0] page faults/reclaims, 0 [0] swaps
!       0 [1] signals rcvd, 0/0 [11/13] messages rcvd/sent
!       475/503 [7472/3142] voluntary/involuntary context switches
! postgres usage stats:
!       Shared blocks:       1800 read,          0 written, buffer hit rate
= 99.79%
!       Local  blocks:          0 read,          0 written, buffer hit rate
= 0.00%
!       Direct blocks:          0 read,          0 written
991006.12:36:57.408  [1321] CommitTransactionCommand

Elapsed time: ~42 seconds

PSQL output
===========
SELECT COUNT(posting_type)
FROM posting
WHERE date BETWEEN 19990801::date AND 19990831::date AND posting_type = 'L';
 count
------
805124
(1 row)


Log output
==========
991006.12:36:57.418  [1321] StartTransactionCommand
991006.12:36:57.418  [1321] query: SELECT COUNT(posting_type) FROM posting
WHERE date BETWEEN 19990801::date AND 19990831::date AND posting_type = 'E';
991006.12:36:57.424  [1321] ProcessQuery
! system usage stats:
!       27.584268 elapsed 27.550000 user 0.000000 system sec
!       [271.000000 user 2.450000 sys total]
!       0/0 [19/6] filesystem blocks in/out
!       0/0 [7425/0] page faults/reclaims, 0 [0] swaps
!       0 [1] signals rcvd, 0/0 [12/14] messages rcvd/sent
!       0/528 [7473/3670] voluntary/involuntary context switches
! postgres usage stats:
!       Shared blocks:          0 read,          0 written, buffer hit rate
= 100.00%
!       Local  blocks:          0 read,          0 written, buffer hit rate
= 0.00%
!       Direct blocks:          0 read,          0 written
991006.12:37:25.004  [1321] CommitTransactionCommand

Elapsed time: ~28 seconds

PSQL output
===========
SELECT COUNT(posting_type)
FROM posting
WHERE date BETWEEN 19990801::date AND 19990831::date AND posting_type = 'E';
count
-----
42200
(1 row)


Log output
==========
991006.12:37:25.014  [1321] StartTransactionCommand
991006.12:37:25.015  [1321] query: SELECT posting_type, COUNT(posting_type)
FROM posting WHERE date BETWEEN 19990801::date AND 19990831::date GROUP BY
posting_type;
991006.12:37:25.019  [1321] ProcessQuery
! system usage stats:
!       3515.408886 elapsed 3508.790000 user 2.050000 system sec
!       [3779.800000 user 4.510000 sys total]
!       4/19 [23/25] filesystem blocks in/out
!       0/0 [7425/0] page faults/reclaims, 0 [0] swaps
!       0 [1] signals rcvd, 0/0 [13/15] messages rcvd/sent
!       41/34863 [7515/38533] voluntary/involuntary context switches
! postgres usage stats:
!       Shared blocks:          0 read,          0 written, buffer hit rate
= 100.00%
!       Local  blocks:          0 read,          0 written, buffer hit rate
= 0.00%
!       Direct blocks:      10137 read,      10964 written
991006.13:36:00.425  [1321] CommitTransactionCommand

Elapsed time: ~58 MINUTES 35 SECONDS!

PSQL output
===========
SELECT posting_type, COUNT(posting_type)
FROM posting
WHERE date BETWEEN 19990801::date AND 19990831::date
GROUP BY posting_type;
posting_type| count
------------+------
E           | 42200
L           |805124
(2 rows)

pgsql-general by date:

Previous
From: Max Buvry
Date:
Subject: [Q] TRIGGER-RULE
Next
From: Bruce Momjian
Date:
Subject: Re: [GENERAL] what does this mean - pg_psort.1922.0 in database directory?]