big un stacking query - save me from myself - Mailing list pgsql-general

From Kirk Wythers
Subject big un stacking query - save me from myself
Date
Msg-id 6B13ACE8-6EE7-4D82-8349-53969961A1ED@umn.edu
Whole thread Raw
In response to big un stacking query - help save me from myself  (Kirk Wythers <kirk.wythers@gmail.com>)
List pgsql-general
I have a fairly large table with two columns that I need to "de-normalize" (235 million rows) There has got to be a
better(i.e. faster) approach than what I am doing. I am using a MAX CASE on each of the 24 variables (column names
variableand value) that I want to unstack.  Any suggestions would be most appreciated.  Here is the ANALYZE  

GroupAggregate  (cost=107156950.16..174988710.12 rows=23511876 width=286)
  ->  Sort  (cost=107156950.16..107744747.04 rows=235118752 width=286)
        Sort Key: site, canopy, block, plot, measurement_interval, warming_treatment, treatment_code, treatment_abbr,
water_treatment,variable_name, variable_channel, variable_id, rowid, time2, "timestamp", block_name, table_name,
batt_volt,program, flag1, flag2, flag3, amb_a_avg, amb_closed_avg, airtc_avg, airtemp_avg, airtemp_max, airtemp_min,
all_avgt,am25tref1, ptemp_avg, rh, s_all_avgt_avg, vp_avg, tabove_sdupper, tabove_sdlower, tabove_meantrim,
tabove_mean_dc,tsoil_sdupper, tsoil_sdlower, tsoil_meantrim, tsoil_mean_dc 
        ->  Seq Scan on derived_15min_joined  (cost=0.00..11782048.52 rows=235118752 width=286)
(4 rows)


And here is the table:


b4warmed3=# \d derived_15min_joined
             Table "public.derived_15min_joined"
       Column        |            Type             | Modifiers
----------------------+-----------------------------+-----------
site                 | character varying(6)        |
canopy               | character varying(24)       |
block                | character(2)                |
plot                 | character(6)                |
measurement_interval | interval                    |
warming_treatment    | character varying(24)       |
treatment_code       | character varying(24)       |
treatment_abbr       | character varying(24)       |
water_treatment      | character varying(24)       |
variable_name        | character varying(24)       |
variable_channel     | character varying(24)       |
variable_id          | character varying(24)       |
rowid                | character varying(48)       |
time2                | timestamp without time zone |
timestamp            | timestamp without time zone |
block_name           | character varying(8)        |
table_name           | character varying(10)       |
batt_volt            | real                        |
program              | character varying(48)       |
flag1                | integer                     |
flag2                | integer                     |
flag3                | integer                     |
amb_a_avg            | real                        |
amb_closed_avg       | real                        |
airtc_avg            | real                        |
airtemp_avg          | real                        |
airtemp_max          | real                        |
airtemp_min          | real                        |
all_avgt             | real                        |
am25tref1            | real                        |
ptemp_avg            | real                        |
rh                   | real                        |
s_all_avgt_avg       | real                        |
vp_avg               | real                        |
tabove_sdupper       | real                        |
tabove_sdlower       | real                        |
tabove_meantrim      | real                        |
tabove_mean_dc       | real                        |
tsoil_sdupper        | real                        |
tsoil_sdlower        | real                        |
tsoil_meantrim       | real                        |
tsoil_mean_dc        | real                        |
variable             | text                        |
value                | real                        |
Indexes:
   "derived_15min_joined_lower_idx" btree (lower(variable_name::text))
   "derived_15min_joined_time2_idx" btree (time2)

b4warmed3=#

And here is my query.

SELECT
    site,
    canopy,
    block,
    plot,
    measurement_interval,
    warming_treatment,
    treatment_code,
    treatment_abbr,
    water_treatment,
    variable_name,
    variable_channel,
    variable_id,
    rowid,
    time2,
    timestamp,
    block_name,
    table_name,
    batt_volt,
    program,
    flag1,
    flag2,
    flag3,
    amb_a_avg,
    amb_closed_avg,
    airtc_avg,
    airtemp_avg,
    airtemp_max,
    airtemp_min,
    all_avgt,
    am25tref1,
    ptemp_avg,
    rh,
    s_all_avgt_avg,
    vp_avg,
    tabove_sdupper,
    tabove_sdlower,
    tabove_meantrim,
    tabove_mean_dc,
    tsoil_sdupper,
    tsoil_sdlower,
    tsoil_meantrim,
    tsoil_mean_dc,
    MAX (
        CASE
        WHEN lower(variable_name) = 'tabove' THEN

        value

        END
    ) AS tabove,
    MAX (
        CASE
        WHEN lower(variable_name) = 'tabove_sc' THEN

        value

        END
    ) AS tabove_sc,
    MAX (
        CASE
        WHEN lower(variable_name) = 'tabove_delta' THEN

        value

        END
    ) AS tabove_delta,
    MAX (
        CASE
        WHEN lower(variable_name) = 'tsoil' THEN

        value

        END
    ) AS tsoil,
    MAX (
        CASE
        WHEN lower(variable_name) = 'tsoil_sc' THEN

        value

        END
    ) AS tsoil_sc,
    MAX (
        CASE
        WHEN lower(variable_name) = 'tsoil_delta' THEN

        value

        END
    ) AS tsoil_delta,
    MAX (
        CASE
        WHEN lower(variable_name) = 's_sdm_out' THEN

        value

        END
    ) AS s_sdm_out,
    MAX (
        CASE
        WHEN lower(variable_name) = 'sbtemp' THEN

        value

        END
    ) AS sbtemp,
    MAX (
        CASE
        WHEN lower(variable_name) = 'heat_a_avg' THEN

        value

        END
    ) AS heat_a_avg,
    MAX (
        CASE
        WHEN lower(variable_name) = 'b_dc_avg' THEN

        value

        END
    ) AS b_dc_avg,
    MAX (
        CASE
        WHEN lower(variable_name) = 'targettemp' THEN

        value

        END
    ) AS targettemp,
    MAX (
        CASE
        WHEN lower(variable_name) = 's_scldout' THEN

        value

        END
    ) AS s_scldout,
    MAX (
        CASE
        WHEN lower(variable_name) = 'tmv' THEN

        value

        END
    ) AS tmv,
    MAX (
        CASE
        WHEN lower(variable_name) = 'a_dc' THEN

        value

        END
    ) AS a_dc,
    MAX (
        CASE
        WHEN lower(variable_name) = 'a_targettemp' THEN

        value

        END
    ) AS a_targettemp,
    MAX (
        CASE
        WHEN lower(variable_name) = 'scldout' THEN

        value

        END
    ) AS scldout,
    MAX (
        CASE
        WHEN lower(variable_name) = 'pid_lmt' THEN

        value

        END
    ) AS pid_lmt,
    MAX (
        CASE
        WHEN lower(variable_name) = 'targettemp_adj' THEN

        value

        END
    ) AS targettemp_adj,
    MAX (
        CASE
        WHEN lower(variable_name) = 'sdm_out' THEN

        value

        END
    ) AS sdm_out,
    MAX (
        CASE
        WHEN lower(variable_name) = 's_pid_lmt' THEN

        value

        END
    ) AS s_pid_lmt,
    MAX (
        CASE
        WHEN lower(variable_name) = 'tsoilr' THEN

        value

        END
    ) AS tsoilr,
    MAX (
        CASE
        WHEN lower(variable_name) = 's_pid_out' THEN

        value

        END
    ) AS s_pid_out,
    MAX (
        CASE
        WHEN lower(variable_name) = 'sctemp' THEN

        value

        END
    ) AS sctemp,
    MAX (
        CASE
        WHEN lower(variable_name) = 'amb_avg' THEN

        value

        END
    ) AS amb_avg
FROM
    derived_15min_joined
GROUP BY
    site,
    canopy,
    block,
    plot,
    measurement_interval,
    warming_treatment,
    treatment_code,
    treatment_abbr,
    water_treatment,
    variable_name,
    variable_channel,
    variable_id,
    rowid,
    time2,
    timestamp,
    block_name,
    table_name,
    batt_volt,
    program,
    flag1,
    flag2,
    flag3,
    amb_a_avg,
    amb_closed_avg,
    airtc_avg,
    airtemp_avg,
    airtemp_max,
    airtemp_min,
    all_avgt,
    am25tref1,
    ptemp_avg,
    rh,
    s_all_avgt_avg,
    vp_avg,
    tabove_sdupper,
    tabove_sdlower,
    tabove_meantrim,
    tabove_mean_dc,
    tsoil_sdupper,
    tsoil_sdlower,
    tsoil_meantrim,
    tsoil_mean_dc
;





pgsql-general by date:

Previous
From: Kirk Wythers
Date:
Subject: big un stacking query - help save me from myself
Next
From: Ian Lawrence Barwick
Date:
Subject: Re: Using psql to feed a file line by line to a table column