Thread: Problem with Crosstab - Allocating value to wrong column
Hi there,
I have a database with statistical tables which look like this:
id | year_start | value
3 1960 736.1
3 1961 723.3
....
4 1960 123.4
Years can cover 40 years for each of the countries (ID) and each (variable) table .
Now, if for a given year there is no value, expressed not in form of NoData, but as missing line in the table, than I have a problem with this SQL:
SELECT
*
FROM
crosstab( 'SELECT
d.year_start AS year,
d.value AS value
FROM
co2_total_cdiac AS d
RIGHT JOIN
countries_view AS c ON c.id = d.id_country
WHERE
(d.year_start = 1960 OR d.year_start = 2007 )
ORDER BY
1,2;', 3) AS ct (name varchar, y_1960 numeric, y_2007 numeric)
For one of the countries, I have a value for 2007, but not for 1960. When using only the inner query, than I see one line: Andorra - 2007 - 539
But when running the whole SQL, the value for year 2007 get's allocated to the year 1960. The table looks as follows:
name | y_1960 | y_2007
Andorra 539 NULL
(Not sure if the copy/pasted part displays correct:
name | y_1960 | y_2007 |
---|---|---|
Afghanistan | 415 | 716 |
Albania | 2026 | 4246 |
Algeria | 6166 | 140234 |
Andorra | 539 | NULL |
)
Can anyone give me a hint in which direction to search for a solution?
Thanks a lot!
Stefan
On 10/19/2010 03:07 AM, Stefan Schwarzer wrote: > For one of the countries, I have a value for 2007, but not for 1960. > When using only the inner query, than I see one line: Andorra - 2007 > - 539 But when running the whole SQL, the value for year 2007 get's > allocated to the year 1960. The table looks as follows: > > name | y_1960 | y_2007 > Andorra | 539 | NULL That is documented behavior. See: http://www.postgresql.org/docs/8.4/interactive/tablefunc.html > F.33.1.2. crosstab(text) ... > The crosstab function produces one output row for each consecutive > group of input rows with the same row_name value. It fills the output > value columns, left to right, with the value fields from these rows. > If there are fewer rows in a group than there are output value > columns, the extra output columns are filled with nulls; if there are > more rows, the extra input rows are skipped. You probably want the other form of crosstab > F.33.1.4. crosstab(text, text) > The main limitation of the single-parameter form of crosstab is that > it treats all values in a group alike, inserting each value into the > first available column. If you want the value columns to correspond to > specific categories of data, and some groups might not have data for > some of the categories, that doesn't work well. The two-parameter form > of crosstab handles this case by providing an explicit list of the > categories corresponding to the output columns. HTH, Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, & 24x7 Support
Attachment
>> For one of the countries, I have a value for 2007, but not for 1960. >> When using only the inner query, than I see one line: Andorra - 2007 >> - 539 But when running the whole SQL, the value for year 2007 get's >> allocated to the year 1960. The table looks as follows: >> >> name | y_1960 | y_2007 >> Andorra | 539 | NULL > > > That is documented behavior. See: > http://www.postgresql.org/docs/8.4/interactive/tablefunc.html > > You probably want the other form of crosstab > >> > F.33.1.4. crosstab(text, text) Thanks a lot for the help. Indeed, that should be the one which should do it. Unfortunately, I ran into an error message: I have this query: SELECT * FROM crosstab( 'SELECT c.name AS name, d.year_start AS year, d.value AS value FROM co2_total_cdiac AS d RIGHT JOIN countries_view AS c ON c.id = d.id_country WHERE (d.year_start = 1960 OR d.year_start = 1965 OR d.year_start = 2003 OR d.year_start = 2007 ) ORDER BY 1', 'SELECT DISTINCT ''y_'' || year_start AS year FROM co2_total_cdiac AS d WHERE (d.year_start = 1960 OR d.year_start = 1965 OR d.year_start = 2003 OR d.year_start = 2007 ) ORDER BY 1' ) AS ct(name varchar, y_1960 numeric, y_1965 numeric, y_2003 numeric, y_2007 numeric) Now, I get an error message: ERROR: invalid input syntax for integer: "SELECT DISTINCT 'y_' || year_start AS year FROM co2_total_cdiac AS d WHERE (d.year_start = 1960 OR d.year_start = 1965 OR d.year_start = 2003 OR d.year_start = 2007 ) ORDER BY 1" LINE 15: 'SELECT ^ I need to convert the integer values for the years into column names, i.e. "1965" into "y_1965". How do I achieve this then? Thanks for any help! Stef
>> For one of the countries, I have a value for 2007, but not for 1960. >> When using only the inner query, than I see one line: Andorra - 2007 >> - 539 But when running the whole SQL, the value for year 2007 get's >> allocated to the year 1960. The table looks as follows: >> >> name | y_1960 | y_2007 >> Andorra | 539 | NULL > > > That is documented behavior. See: > http://www.postgresql.org/docs/8.4/interactive/tablefunc.html > > You probably want the other form of crosstab > >> > F.33.1.4. crosstab(text, text) Thanks a lot for the help. Indeed, that should be the one which should do it. Unfortunately, I ran into an error message: I have this query: SELECT * FROM crosstab( 'SELECT c.name AS name, d.year_start AS year, d.value AS value FROM co2_total_cdiac AS d RIGHT JOIN countries_view AS c ON c.id = d.id_country WHERE (d.year_start = 1960 OR d.year_start = 1965 OR d.year_start = 2003 OR d.year_start = 2007 ) ORDER BY 1', 'SELECT DISTINCT ''y_'' || year_start AS year FROM co2_total_cdiac AS d WHERE (d.year_start = 1960 OR d.year_start = 1965 OR d.year_start = 2003 OR d.year_start = 2007 ) ORDER BY 1' ) AS ct(name varchar, y_1960 numeric, y_1965 numeric, y_2003 numeric, y_2007 numeric) Now, I get an error message: ERROR: invalid input syntax for integer: "SELECT DISTINCT 'y_' || year_start AS year FROM co2_total_cdiac AS d WHERE (d.year_start = 1960 OR d.year_start = 1965 OR d.year_start = 2003 OR d.year_start = 2007 ) ORDER BY 1" LINE 15: 'SELECT ^ I need to convert the integer values for the years into column names, i.e. "1965" into "y_1965". How do I achieve this then? Thanks for any help! Stef
On 11/01/2010 06:24 AM, Stefan Schwarzer wrote: > I need to convert the integer values for the years into column names, i.e. "1965" into "y_1965". How do I achieve thisthen? Try something like: create table foo ( name text, year_start int, value float8); insert into foo values('a',2010,1.23),('b',2011,2.34); SELECT * FROM crosstab( 'SELECT name, year_start, value FROM foo ORDER BY 1', 'SELECT DISTINCT year_start FROM foo' ) AS ct(name varchar, y_2010 float8, y_2011 float8); name | y_2010 | y_2011 ------+--------+-------- a | | 1.23 b | 2.34 | (2 rows) HTH, Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, & 24x7 Support
Attachment
>> I need to convert the integer values for the years into column names, i.e. "1965" into "y_1965". How do I achieve thisthen? > > Try something like: > > create table foo ( > name text, > year_start int, > value float8); > > insert into foo values('a',2010,1.23),('b',2011,2.34); > > SELECT * FROM > crosstab( > 'SELECT name, year_start, value FROM foo ORDER BY 1', > 'SELECT DISTINCT year_start FROM foo' > ) > AS ct(name varchar, y_2010 float8, y_2011 float8); > > name | y_2010 | y_2011 > ------+--------+-------- > a | | 1.23 > b | 2.34 | > (2 rows) Hi Joe. Thanks a lot for the suggestions. Tried it out, but same error message: ERROR: invalid input syntax for integer: "SELECT DISTINCT year_start FROM foo" LINE 4: 'SELECT DISTINCT year_start FROM foo' ^ Did this work for you? Then this would indeed be strange. I wonder if the second crosstab SQL must have the same column names as the final output or not ("2010" vs. "y_2010"). Anyone can help me out? Thanks a lot for any tips! Stef
On 11/1/10 11:54 PM, Stefan Schwarzer wrote: >> SELECT * FROM >> crosstab( >> 'SELECT name, year_start, value FROM foo ORDER BY 1', >> 'SELECT DISTINCT year_start FROM foo' >> ) >> AS ct(name varchar, y_2010 float8, y_2011 float8); > Hi Joe. Thanks a lot for the suggestions. Tried it out, but same error message: > > ERROR: invalid input syntax for integer: "SELECT DISTINCT year_start FROM foo" > LINE 4: 'SELECT DISTINCT year_start FROM foo' > ^ > > Did this work for you? Then this would indeed be strange. > > I wonder if the second crosstab SQL must have the same column names as the final output or not ("2010" vs. "y_2010"). > > Anyone can help me out? Thanks a lot for any tips! What version of PostgreSQL are you running? The error seems to indicate that you don't have the crosstab(text,text) form of the function. In psql do: contrib_regression=# \df crosstab List of functions Schema | Name | Result data type | Argument data types | Type --------+----------+------------------+---------------------+-------- public | crosstab | SETOF record | text | normal public | crosstab | SETOF record | text, integer | normal public | crosstab | SETOF record | text, text | normal (3 rows) Does it look like this? Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, & 24x7 Support
> What version of PostgreSQL are you running? The error seems to indicate > that you don't have the crosstab(text,text) form of the function. > > In psql do: > contrib_regression=# \df crosstab > List of functions > Schema | Name | Result data type | Argument data types | Type > --------+----------+------------------+---------------------+-------- > public | crosstab | SETOF record | text | normal > public | crosstab | SETOF record | text, integer | normal > public | crosstab | SETOF record | text, text | normal > (3 rows) > > Does it look like this? Ha, that's a thing! Indeed, it looks like this in my case: Schema | Name | Result data type | Argument data types | Type --------+----------+------------------+---------------------+-------- public | crosstab | SETOF record | text, integer | normal (1 row) I am running version 8.4.3, on Mac with Kingchaos libraries. Installed the contrib/crosstab myself afterwards. What can I do now to include the other functions too? Or is this only with newer versions of Crosstab? Thanks a lot for your help! Stef
>> What version of PostgreSQL are you running? The error seems to indicate >> that you don't have the crosstab(text,text) form of the function. >> >> In psql do: >> contrib_regression=# \df crosstab >> List of functions >> Schema | Name | Result data type | Argument data types | Type >> --------+----------+------------------+---------------------+-------- >> public | crosstab | SETOF record | text | normal >> public | crosstab | SETOF record | text, integer | normal >> public | crosstab | SETOF record | text, text | normal >> (3 rows) >> >> Does it look like this? > > Ha, that's a thing! Indeed, it looks like this in my case: > > > Schema | Name | Result data type | Argument data types | Type > --------+----------+------------------+---------------------+-------- > public | crosstab | SETOF record | text, integer | normal > (1 row) > > > I am running version 8.4.3, on Mac with Kingchaos libraries. Installed the contrib/crosstab myself afterwards. > > What can I do now to include the other functions too? Or is this only with newer versions of Crosstab? Ok, got it re-compiled and re-inserted, and now they are there, these functions. Thanks a lot for your help!