Thread: Solving my query needs with Rank and may be CrossTab

Solving my query needs with Rank and may be CrossTab

From

Iaam Onkara

Date:

01 December 2019, 22:54:02

Hi Friends,

I have a table with data like this gist https://gist.github.com/daya/d0794efcd4278fc5dce6e7339d03a8fd and I want to fetch the latest values for a given set of attributes so the result set looks like this gist https://gist.github.com/daya/0cb7f8682520a1dd4cdda8c0266f77f6

Please note in the desired result set

There is an assumed mapping of Code to display i.e. code 39156-5 is BMI.
"Oxygen Saturation" has only one value and "Pulse" has no value

In my attempts and with some help I have this query

with v_max as
(SELECT
code, uom, val, created_on, dense_rank() over ( partition by code order by created_on desc) as r
FROM vitals v
where (v.code = '8480-6' or v.code='8462-4' or v.code='39156-5' or v.code='8302-2')
)
SELECT c.display, uom, val, created_on
from v_max v inner join codes c on v.code=c.code
where r = 1;

which gives this result

But the result set that I want I am unable to get. Or if is it even possible to get?

Thanks for your help

Re: Solving my query needs with Rank and may be CrossTab

From

Rob Sargent

Date:

01 December 2019, 23:28:29

On Dec 1, 2019, at 3:54 PM, Iaam Onkara <iamonkara@gmail.com> wrote:

Hi Friends,

I have a table with data like this gist https://gist.github.com/daya/d0794efcd4278fc5dce6e7339d03a8fd and I want to fetch the latest values for a given set of attributes so the result set looks like this gist https://gist.github.com/daya/0cb7f8682520a1dd4cdda8c0266f77f6

Please note in the desired result set
There is an assumed mapping of Code to display i.e. code 39156-5 is BMI.
"Oxygen Saturation" has only one value and "Pulse" has no value
In my attempts and with some help I have this query

with v_max as
(SELECT
code, uom, val, created_on, dense_rank() over ( partition by code order by created_on desc) as r
FROM vitals v
where (v.code = '8480-6' or v.code='8462-4' or v.code='39156-5' or v.code='8302-2')
)
SELECT c.display, uom, val, created_on
from v_max v inner join codes c on v.code=c.code
where r = 1;

which gives this result

But the result set that I want I am unable to get. Or if is it even possible to get?

Thanks for your help

I take it the last value by timestamp per code per patient is the one to be reported? Or is there a time window?

Turning rows into columns can be done with sub-selects per derived column or (usually faster) temporary tables built up in separate selects with each adding usually one column (but possibly more).

Re: Solving my query needs with Rank and may be CrossTab

From

Iaam Onkara

Date:

01 December 2019, 23:38:17

@Rob: There is no time window. It is the latest values for given set of attributes regardless of timestamp. If some attributes have multiple values then multiple rows can be returned with other attributes having blank values.

Creating one sub select for one column is an obvious approach but will not be performant specially when the dataset grows, I am looking for a solution which doesn't require one sub select per column.

On Sun, Dec 1, 2019 at 5:28 PM Rob Sargent <robjsargent@gmail.com> wrote:

On Dec 1, 2019, at 3:54 PM, Iaam Onkara <iamonkara@gmail.com> wrote:

Hi Friends,

I have a table with data like this gist https://gist.github.com/daya/d0794efcd4278fc5dce6e7339d03a8fd and I want to fetch the latest values for a given set of attributes so the result set looks like this gist https://gist.github.com/daya/0cb7f8682520a1dd4cdda8c0266f77f6

Please note in the desired result set
There is an assumed mapping of Code to display i.e. code 39156-5 is BMI.
"Oxygen Saturation" has only one value and "Pulse" has no value
In my attempts and with some help I have this query

with v_max as
(SELECT
code, uom, val, created_on, dense_rank() over ( partition by code order by created_on desc) as r
FROM vitals v
where (v.code = '8480-6' or v.code='8462-4' or v.code='39156-5' or v.code='8302-2')
)
SELECT c.display, uom, val, created_on
from v_max v inner join codes c on v.code=c.code
where r = 1;

which gives this result

But the result set that I want I am unable to get. Or if is it even possible to get?

Thanks for your help

I take it the last value by timestamp per code per patient is the one to be reported? Or is there a time window?
Turning rows into columns can be done with sub-selects per derived column or (usually faster) temporary tables built up in separate selects with each adding usually one column (but possibly more).

Re: Solving my query needs with Rank and may be CrossTab

From

Rob Sargent

Date:

02 December 2019, 04:58:27

On Dec 1, 2019, at 4:38 PM, Iaam Onkara <iamonkara@gmail.com> wrote:

@Rob: There is no time window. It is the latest values for given set of attributes regardless of timestamp. If some attributes have multiple values then multiple rows can be returned with other attributes having blank values.

Creating one sub select for one column is an obvious approach but will not be performant specially when the dataset grows, I am looking for a solution which doesn't require one sub select per column.

On Sun, Dec 1, 2019 at 5:28 PM Rob Sargent <robjsargent@gmail.com> wrote:

On Dec 1, 2019, at 3:54 PM, Iaam Onkara <iamonkara@gmail.com> wrote:

Hi Friends,

I have a table with data like this gist https://gist.github.com/daya/d0794efcd4278fc5dce6e7339d03a8fd and I want to fetch the latest values for a given set of attributes so the result set looks like this gist https://gist.github.com/daya/0cb7f8682520a1dd4cdda8c0266f77f6

Please note in the desired result set
There is an assumed mapping of Code to display i.e. code 39156-5 is BMI.
"Oxygen Saturation" has only one value and "Pulse" has no value
In my attempts and with some help I have this query

with v_max as
(SELECT
code, uom, val, created_on, dense_rank() over ( partition by code order by created_on desc) as r
FROM vitals v
where (v.code = '8480-6' or v.code='8462-4' or v.code='39156-5' or v.code='8302-2')
)
SELECT c.display, uom, val, created_on
from v_max v inner join codes c on v.code=c.code
where r = 1;

which gives this result

But the result set that I want I am unable to get. Or if is it even possible to get?

Thanks for your help

I take it the last value by timestamp per code per patient is the one to be reported? Or is there a time window?
Turning rows into columns can be done with sub-selects per derived column or (usually faster) temporary tables built up in separate selects with each adding usually one column (but possibly more).

I agree the sub select per column can easily become slow. Incremental tables does not, in my experience, suffer the same problem. (One can also repeatedly update a single table with the predefined structure.) There maybe a way to get what you want with multiple CTEs but I suspect that approach would be more akin to multiple sub selects than to incremental tables.

From your further description of the problem it will be critical to have an index on the code AND time stamp columns of the source table.

Re: Solving my query needs with Rank and may be CrossTab

From

Iaam Onkara

Date:

02 December 2019, 06:09:03

Yes indexes on Code and Timestamp column may also be needed even though Patient_ID column will be indexed.

By incremental tables do you mean tables with Auto Increment primary key for ID ?

What I am having tough time figuring out is how to transform the result into this even after using multiple CTEs

On Sun, Dec 1, 2019 at 10:58 PM Rob Sargent <robjsargent@gmail.com> wrote:

On Dec 1, 2019, at 4:38 PM, Iaam Onkara <iamonkara@gmail.com> wrote:

@Rob: There is no time window. It is the latest values for given set of attributes regardless of timestamp. If some attributes have multiple values then multiple rows can be returned with other attributes having blank values.

Creating one sub select for one column is an obvious approach but will not be performant specially when the dataset grows, I am looking for a solution which doesn't require one sub select per column.

On Sun, Dec 1, 2019 at 5:28 PM Rob Sargent <robjsargent@gmail.com> wrote:

On Dec 1, 2019, at 3:54 PM, Iaam Onkara <iamonkara@gmail.com> wrote:

Hi Friends,

I have a table with data like this gist https://gist.github.com/daya/d0794efcd4278fc5dce6e7339d03a8fd and I want to fetch the latest values for a given set of attributes so the result set looks like this gist https://gist.github.com/daya/0cb7f8682520a1dd4cdda8c0266f77f6

Please note in the desired result set
There is an assumed mapping of Code to display i.e. code 39156-5 is BMI.
"Oxygen Saturation" has only one value and "Pulse" has no value
In my attempts and with some help I have this query

with v_max as
(SELECT
code, uom, val, created_on, dense_rank() over ( partition by code order by created_on desc) as r
FROM vitals v
where (v.code = '8480-6' or v.code='8462-4' or v.code='39156-5' or v.code='8302-2')
)
SELECT c.display, uom, val, created_on
from v_max v inner join codes c on v.code=c.code
where r = 1;

which gives this result

But the result set that I want I am unable to get. Or if is it even possible to get?

Thanks for your help

I take it the last value by timestamp per code per patient is the one to be reported? Or is there a time window?
Turning rows into columns can be done with sub-selects per derived column or (usually faster) temporary tables built up in separate selects with each adding usually one column (but possibly more).

I agree the sub select per column can easily become slow. Incremental tables does not, in my experience, suffer the same problem. (One can also repeatedly update a single table with the predefined structure.) There maybe a way to get what you want with multiple CTEs but I suspect that approach would be more akin to multiple sub selects than to incremental tables.
From your further description of the problem it will be critical to have an index on the code AND time stamp columns of the source table.

Re: Solving my query needs with Rank and may be CrossTab

From

Rob Sargent

Date:

02 December 2019, 13:42:28

On Dec 1, 2019, at 11:09 PM, Iaam Onkara <iamonkara@gmail.com> wrote:

Yes indexes on Code and Timestamp column may also be needed even though Patient_ID column will be indexed.

I believe you will want a compound index covering both columns

By incremental tables do you mean tables with Auto Increment primary key for ID

No. I mean a series of intermediate tables each with one more report column. These can be temporary and unlogged but the will need an index on patient. (Again, you have the option of predefining the full report table and repeatedly updating a single column.)

What I am having tough time figuring out is how to transform the result into this even after using multiple CTEs

On Sun, Dec 1, 2019 at 10:58 PM Rob Sargent <robjsargent@gmail.com> wrote:

On Dec 1, 2019, at 4:38 PM, Iaam Onkara <iamonkara@gmail.com> wrote:

@Rob: There is no time window. It is the latest values for given set of attributes regardless of timestamp. If some attributes have multiple values then multiple rows can be returned with other attributes having blank values.

Creating one sub select for one column is an obvious approach but will not be performant specially when the dataset grows, I am looking for a solution which doesn't require one sub select per column.

On Sun, Dec 1, 2019 at 5:28 PM Rob Sargent <robjsargent@gmail.com> wrote:

On Dec 1, 2019, at 3:54 PM, Iaam Onkara <iamonkara@gmail.com> wrote:

Hi Friends,

I have a table with data like this gist https://gist.github.com/daya/d0794efcd4278fc5dce6e7339d03a8fd and I want to fetch the latest values for a given set of attributes so the result set looks like this gist https://gist.github.com/daya/0cb7f8682520a1dd4cdda8c0266f77f6

Please note in the desired result set
There is an assumed mapping of Code to display i.e. code 39156-5 is BMI.
"Oxygen Saturation" has only one value and "Pulse" has no value
In my attempts and with some help I have this query

with v_max as
(SELECT
code, uom, val, created_on, dense_rank() over ( partition by code order by created_on desc) as r
FROM vitals v
where (v.code = '8480-6' or v.code='8462-4' or v.code='39156-5' or v.code='8302-2')
)
SELECT c.display, uom, val, created_on
from v_max v inner join codes c on v.code=c.code
where r = 1;

which gives this result

But the result set that I want I am unable to get. Or if is it even possible to get?

Thanks for your help

I take it the last value by timestamp per code per patient is the one to be reported? Or is there a time window?
Turning rows into columns can be done with sub-selects per derived column or (usually faster) temporary tables built up in separate selects with each adding usually one column (but possibly more).

I agree the sub select per column can easily become slow. Incremental tables does not, in my experience, suffer the same problem. (One can also repeatedly update a single table with the predefined structure.) There maybe a way to get what you want with multiple CTEs but I suspect that approach would be more akin to multiple sub selects than to incremental tables.
From your further description of the problem it will be critical to have an index on the code AND time stamp columns of the source table.

Re: Solving my query needs with Rank and may be CrossTab

From

Iaam Onkara

Date:

02 December 2019, 16:55:27

@Rob. What your referring to sounds like Materialized views, isn't it? An example query would be helpful in understand your recommendation/approach better.

On Mon, Dec 2, 2019 at 7:42 AM Rob Sargent <robjsargent@gmail.com> wrote:

On Dec 1, 2019, at 11:09 PM, Iaam Onkara <iamonkara@gmail.com> wrote:

Yes indexes on Code and Timestamp column may also be needed even though Patient_ID column will be indexed.

I believe you will want a compound index covering both columns
By incremental tables do you mean tables with Auto Increment primary key for ID
No. I mean a series of intermediate tables each with one more report column. These can be temporary and unlogged but the will need an index on patient. (Again, you have the option of predefining the full report table and repeatedly updating a single column.)

What I am having tough time figuring out is how to transform the result into this even after using multiple CTEs

On Sun, Dec 1, 2019 at 10:58 PM Rob Sargent <robjsargent@gmail.com> wrote:

On Dec 1, 2019, at 4:38 PM, Iaam Onkara <iamonkara@gmail.com> wrote:

@Rob: There is no time window. It is the latest values for given set of attributes regardless of timestamp. If some attributes have multiple values then multiple rows can be returned with other attributes having blank values.

Creating one sub select for one column is an obvious approach but will not be performant specially when the dataset grows, I am looking for a solution which doesn't require one sub select per column.

On Sun, Dec 1, 2019 at 5:28 PM Rob Sargent <robjsargent@gmail.com> wrote:

On Dec 1, 2019, at 3:54 PM, Iaam Onkara <iamonkara@gmail.com> wrote:

Hi Friends,

I have a table with data like this gist https://gist.github.com/daya/d0794efcd4278fc5dce6e7339d03a8fd and I want to fetch the latest values for a given set of attributes so the result set looks like this gist https://gist.github.com/daya/0cb7f8682520a1dd4cdda8c0266f77f6

Please note in the desired result set
There is an assumed mapping of Code to display i.e. code 39156-5 is BMI.
"Oxygen Saturation" has only one value and "Pulse" has no value
In my attempts and with some help I have this query

with v_max as
(SELECT
code, uom, val, created_on, dense_rank() over ( partition by code order by created_on desc) as r
FROM vitals v
where (v.code = '8480-6' or v.code='8462-4' or v.code='39156-5' or v.code='8302-2')
)
SELECT c.display, uom, val, created_on
from v_max v inner join codes c on v.code=c.code
where r = 1;

which gives this result

But the result set that I want I am unable to get. Or if is it even possible to get?

Thanks for your help

I take it the last value by timestamp per code per patient is the one to be reported? Or is there a time window?
Turning rows into columns can be done with sub-selects per derived column or (usually faster) temporary tables built up in separate selects with each adding usually one column (but possibly more).

I agree the sub select per column can easily become slow. Incremental tables does not, in my experience, suffer the same problem. (One can also repeatedly update a single table with the predefined structure.) There maybe a way to get what you want with multiple CTEs but I suspect that approach would be more akin to multiple sub selects than to incremental tables.
From your further description of the problem it will be critical to have an index on the code AND time stamp columns of the source table.

Re: Solving my query needs with Rank and may be CrossTab

From

Rob Sargent

Date:

02 December 2019, 21:14:26

On Dec 2, 2019, at 9:55 AM, Iaam Onkara <iamonkara@gmail.com> wrote:

@Rob. What your referring to sounds like Materialized views, isn't it? An example query would be helpful in understand your recommendation/approach better.

On Mon, Dec 2, 2019 at 7:42 AM Rob Sargent <robjsargent@gmail.com> wrote:

Using the update-fixed-table style:

-- Get all possible people, null their values

create table report as

select distinct patient, null::float as bmi, null::float as sysbp, null::float as diabp, null::int as height

from source_table;

create index on report(patient);

-- get the height code (8302-2 using tilde operator because the import included leading blanks)

update report r set height = last_value

from (select distinct patient, last_value(measurement) over

(partition by patient, code

order by sampletime)

from source_table

where code ~ '8302-2') as m where r.patient = m.patient

;

-- then similar for other codes. You may want to format the results, as in combining sys/dia bp readings after the update operations

-- the time drag of course is forever finding max(measurement time). A composite index might help; indeed the unique key on the source is patient,code,timestamp I think.