Thread: Multithreaded query onto 4 postgresql instances

Multithreaded query onto 4 postgresql instances

From
Alessandro Candini
Date:
Here you are my probably uncommon situation.

I have installed 4 different instances of postgresql-9.0.2 on the same
machine, on ports 5433, 5434, 5435, 5436.
On these instances I have splitted a huge database, dividing it per date
(from 1995 to 1998 on 5433, from 1999 to 2002 on 5434 and so on...).
Then I have developed a C function using libpq which creates 4 threads,
each one which query a 1/4 of the db. After that I merge the results in
one single response.

My function works fine, but I need to include it inside a postgresql
instance in order to launch it as a normal SQL query (SELECT myfunc(...);).
Why I have to do something tricky like this is long too explain...

I have read the documentation here
http://www.postgresql.org/docs/9.0/static/xfunc-sql.html#XFUNC-SQL-FUNCTIONS-RETURNING-SET,
but I'm pretty confuse and I don't know if this is the right way to
accomplish my goal.

Have you got any ideas or suggestions?

Thanks a lot!

--
Alessandro Candini

MEEO S.r.l.
Via Saragat 9
I-44122 Ferrara, Italy
Tel: +39 0532 1861501
Fax: +39 0532 1861637
http://www.meeo.it

========================================
"ATTENZIONE:le informazioni contenute in questo messaggio sono
da considerarsi confidenziali ed il loro utilizzo è riservato unicamente
al destinatario sopra indicato. Chi dovesse ricevere questo messaggio
per errore è tenuto ad informare il mittente ed a rimuoverlo
definitivamente da ogni supporto elettronico o cartaceo."

"WARNING:This message contains confidential and/or proprietary
information which may be subject to privilege or immunity and which
is intended for use of its addressee only. Should you receive this
message in error, you are kindly requested to inform the sender and
to definitively remove it from any paper or electronic format."


Re: Multithreaded query onto 4 postgresql instances

From
pasman pasmański
Date:
I think this is bad idea. Better you use cursors.

2011/2/10, Alessandro Candini <candini@meeo.it>:
> Here you are my probably uncommon situation.
>
> I have installed 4 different instances of postgresql-9.0.2 on the same
> machine, on ports 5433, 5434, 5435, 5436.
> On these instances I have splitted a huge database, dividing it per date
> (from 1995 to 1998 on 5433, from 1999 to 2002 on 5434 and so on...).
> Then I have developed a C function using libpq which creates 4 threads,
> each one which query a 1/4 of the db. After that I merge the results in
> one single response.
>
> My function works fine, but I need to include it inside a postgresql
> instance in order to launch it as a normal SQL query (SELECT myfunc(...);).
> Why I have to do something tricky like this is long too explain...
>
> I have read the documentation here
> http://www.postgresql.org/docs/9.0/static/xfunc-sql.html#XFUNC-SQL-FUNCTIONS-RETURNING-SET,
> but I'm pretty confuse and I don't know if this is the right way to
> accomplish my goal.
>
> Have you got any ideas or suggestions?
>
> Thanks a lot!
>
> --
> Alessandro Candini
>
> MEEO S.r.l.
> Via Saragat 9
> I-44122 Ferrara, Italy
> Tel: +39 0532 1861501
> Fax: +39 0532 1861637
> http://www.meeo.it
>
> ========================================
> "ATTENZIONE:le informazioni contenute in questo messaggio sono
> da considerarsi confidenziali ed il loro utilizzo è riservato unicamente
> al destinatario sopra indicato. Chi dovesse ricevere questo messaggio
> per errore è tenuto ad informare il mittente ed a rimuoverlo
> definitivamente da ogni supporto elettronico o cartaceo."
>
> "WARNING:This message contains confidential and/or proprietary
> information which may be subject to privilege or immunity and which
> is intended for use of its addressee only. Should you receive this
> message in error, you are kindly requested to inform the sender and
> to definitively remove it from any paper or electronic format."
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


--
------------
pasman

Re: Multithreaded query onto 4 postgresql instances

From
Allan Kamau
Date:
> 2011/2/10, Alessandro Candini <candini@meeo.it>:
>> Here you are my probably uncommon situation.
>>
>> I have installed 4 different instances of postgresql-9.0.2 on the same
>> machine, on ports 5433, 5434, 5435, 5436.
>> On these instances I have splitted a huge database, dividing it per date
>> (from 1995 to 1998 on 5433, from 1999 to 2002 on 5434 and so on...).
>> Then I have developed a C function using libpq which creates 4 threads,
>> each one which query a 1/4 of the db. After that I merge the results in
>> one single response.
>>
>> My function works fine, but I need to include it inside a postgresql
>> instance in order to launch it as a normal SQL query (SELECT myfunc(...);).
>> Why I have to do something tricky like this is long too explain...
>>
>> I have read the documentation here
>> http://www.postgresql.org/docs/9.0/static/xfunc-sql.html#XFUNC-SQL-FUNCTIONS-RETURNING-SET,
>> but I'm pretty confuse and I don't know if this is the right way to
>> accomplish my goal.
>>
>> Have you got any ideas or suggestions?
>>
>> Thanks a lot!
>>
>> --
>> Alessandro Candini
>>
>> MEEO S.r.l.
>> Via Saragat 9
>> I-44122 Ferrara, Italy
>> Tel: +39 0532 1861501
>> Fax: +39 0532 1861637
>> http://www.meeo.it
>>
>> ========================================
>> "ATTENZIONE:le informazioni contenute in questo messaggio sono
>> da considerarsi confidenziali ed il loro utilizzo è riservato unicamente
>> al destinatario sopra indicato. Chi dovesse ricevere questo messaggio
>> per errore è tenuto ad informare il mittente ed a rimuoverlo
>> definitivamente da ogni supporto elettronico o cartaceo."
>>
>> "WARNING:This message contains confidential and/or proprietary
>> information which may be subject to privilege or immunity and which
>> is intended for use of its addressee only. Should you receive this
>> message in error, you are kindly requested to inform the sender and
>> to definitively remove it from any paper or electronic format."
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
>
> --
> ------------
> pasman
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

2011/2/12 pasman pasmański <pasman.p@gmail.com>:
> I think this is bad idea. Better you use cursors.
>


Instead of splitting a database into several smaller database to
reside on the same hardware, it may be better to make use of data
partitioning and/or making use of compound indexes where the left most
field of such index would be the field which holds the values you've
implicitly partitioned, in your case this field would be "year".


Regards,
Allan.

Re: Multithreaded query onto 4 postgresql instances

From
Alban Hertroys
Date:
On 10 Feb 2011, at 9:01, Alessandro Candini wrote:

> I have installed 4 different instances of postgresql-9.0.2 on the same machine, on ports 5433, 5434, 5435, 5436.

I do hope you intend to put those databases on different machines eventually, or some such? Otherwise you probably
didn'tgain anything by splitting your database up like that - you've just reduced the available resources on that
singlemachine. 

> Why I have to do something tricky like this is long too explain...

It would help to know what you're trying to achieve by splitting your database up like this. We don't need the full
story,just a summary is fine; Maybe this is some experimental setup that's more related to multi-threading than to the
actualdatabase design? Maybe management smoked something outlandish and put you up with this? Maybe this is a
macroscopicattempt to table partitioning? 

> Then I have developed a C function using libpq which creates 4 threads, each one which query a 1/4 of the db. After
thatI merge the results in one single response. 
>
> My function works fine, but I need to include it inside a postgresql instance in order to launch it as a normal SQL
query(SELECT myfunc(...);). 

> I have read the documentation here
http://www.postgresql.org/docs/9.0/static/xfunc-sql.html#XFUNC-SQL-FUNCTIONS-RETURNING-SET,but I'm pretty confuse and I
don'tknow if this is the right way to accomplish my goal. 

What are you confused about? That's a chapter about set-returning functions written in SQL - perhaps you're looking for
informationabout set-returning functions from an external library like yours? Perhaps you're looking for:
http://www.postgresql.org/docs/9.0/interactive/xfunc-c.html

You say you don't know if this is the right way to accomplish your goal - which is...?

> Have you got any ideas or suggestions?


As others have suggested, you should probably have a look at table-partitioning, possibly in combination with
tablespacesif you want to divide your database among multiple disks/filesystems. Putting them in different servers on
thesame hardware is probably not going to be a very good solution. As you've already found out, it makes querying the
datasilly difficult. But, we don't know the reason you're doing that of course. 

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4d567a3611733933816998!



Re: Multithreaded query onto 4 postgresql instances

From
"mark"
Date:
> Otherwise you probably didn't gain anything
> by splitting your database up like that - you've just reduced the
> available resources on that single machine.
>

Unless that single machine has more resource than a single PG instance can
consume to satisfy one query. (see how much faster a greenplum single node
can be at some workloads on a server with lots of disk)

What is commonly considered commodity hardware often has plenty of spare
resources if deployed with stock PG in a data warehouse roll and doesn't
have many queries being run against it at any given time.


Of course if he doesn't have spare resources on the machine he might just be
making his life worse, and far more complex.


Just my thoughts, I don't consider myself an expert on the subject matter.
Mark









Re: Multithreaded query onto 4 postgresql instances

From
Alessandro Candini
Date:
No, this database is on a single machine, but a very powerful one.
Processors with 16 cores each and ssd disks.

I already use partitioning and tablespaces for every instance of my db
and I gain a lot with my splitted configuration.
My db is pretty huge: 600 milions of records and partitioning is not
enough...
I performed tests with a query returning more or less 100000 records and
using my C module I obtain the following results (every test performed
cleaning cache before):
- single db: 9.555 sec
- splitted in 4: 5.496 sec

So I think this can be a good approach...
I have already read this

http://www.postgresql.org/docs/9.0/interactive/xfunc-c.html

I posted it in my previous thread. But it is not clear to me how to
embed my C function into postgresql.
I mean, I know how to compile and insert it into postgresql, I've
already done it for simple function which return a boolean, for example.
But it is not clear to me how to do this if I want as return a complete
table, or set of rows...
Can you give me a minimalistic example?

By the way, my goal is to perform a SELECT query in the smallest time
possible.
For that reason an SQL function is not good beacuse slower than a C
function...

Thank you!

> On 10 Feb 2011, at 9:01, Alessandro Candini wrote:
>
>> I have installed 4 different instances of postgresql-9.0.2 on the same machine, on ports 5433, 5434, 5435, 5436.
> I do hope you intend to put those databases on different machines eventually, or some such? Otherwise you probably
didn'tgain anything by splitting your database up like that - you've just reduced the available resources on that
singlemachine. 
>
>> Why I have to do something tricky like this is long too explain...
> It would help to know what you're trying to achieve by splitting your database up like this. We don't need the full
story,just a summary is fine; Maybe this is some experimental setup that's more related to multi-threading than to the
actualdatabase design? Maybe management smoked something outlandish and put you up with this? Maybe this is a
macroscopicattempt to table partitioning? 
>
>> Then I have developed a C function using libpq which creates 4 threads, each one which query a 1/4 of the db. After
thatI merge the results in one single response. 
>>
>> My function works fine, but I need to include it inside a postgresql instance in order to launch it as a normal SQL
query(SELECT myfunc(...);). 
>> I have read the documentation here
http://www.postgresql.org/docs/9.0/static/xfunc-sql.html#XFUNC-SQL-FUNCTIONS-RETURNING-SET,but I'm pretty confuse and I
don'tknow if this is the right way to accomplish my goal. 
> What are you confused about? That's a chapter about set-returning functions written in SQL - perhaps you're looking
forinformation about set-returning functions from an external library like yours? Perhaps you're looking for:
http://www.postgresql.org/docs/9.0/interactive/xfunc-c.html
>
> You say you don't know if this is the right way to accomplish your goal - which is...?
>
>> Have you got any ideas or suggestions?
>
> As others have suggested, you should probably have a look at table-partitioning, possibly in combination with
tablespacesif you want to divide your database among multiple disks/filesystems. Putting them in different servers on
thesame hardware is probably not going to be a very good solution. As you've already found out, it makes querying the
datasilly difficult. But, we don't know the reason you're doing that of course. 
>
> Alban Hertroys
>
> --
> If you can't see the forest for the trees,
> cut the trees and you'll see there is no forest.
>
>
> !DSPAM:1234,4d567a2a11731320518513!
>
>


--
Alessandro Candini

MEEO S.r.l.
Via Saragat 9
I-44122 Ferrara, Italy
Tel: +39 0532 1861501
Fax: +39 0532 1861637
http://www.meeo.it

========================================
"ATTENZIONE:le informazioni contenute in questo messaggio sono
da considerarsi confidenziali ed il loro utilizzo è riservato unicamente
al destinatario sopra indicato. Chi dovesse ricevere questo messaggio
per errore è tenuto ad informare il mittente ed a rimuoverlo
definitivamente da ogni supporto elettronico o cartaceo."

"WARNING:This message contains confidential and/or proprietary
information which may be subject to privilege or immunity and which
is intended for use of its addressee only. Should you receive this
message in error, you are kindly requested to inform the sender and
to definitively remove it from any paper or electronic format."


Re: Multithreaded query onto 4 postgresql instances

From
Alessandro Candini
Date:
For shure my life is more complex with this configuration :-D

But the performance tests that I performed (described in the prevoius
thread) tell me that this is a good way...
>> Otherwise you probably didn't gain anything
>> by splitting your database up like that - you've just reduced the
>> available resources on that single machine.
>>
> Unless that single machine has more resource than a single PG instance can
> consume to satisfy one query. (see how much faster a greenplum single node
> can be at some workloads on a server with lots of disk)
>
> What is commonly considered commodity hardware often has plenty of spare
> resources if deployed with stock PG in a data warehouse roll and doesn't
> have many queries being run against it at any given time.
>
>
> Of course if he doesn't have spare resources on the machine he might just be
> making his life worse, and far more complex.
>
>
> Just my thoughts, I don't consider myself an expert on the subject matter.
> Mark
>
>
>
>
>
>
>
>


--
Alessandro Candini

MEEO S.r.l.
Via Saragat 9
I-44122 Ferrara, Italy
Tel: +39 0532 1861501
Fax: +39 0532 1861637
http://www.meeo.it

========================================
"ATTENZIONE:le informazioni contenute in questo messaggio sono
da considerarsi confidenziali ed il loro utilizzo è riservato unicamente
al destinatario sopra indicato. Chi dovesse ricevere questo messaggio
per errore è tenuto ad informare il mittente ed a rimuoverlo
definitivamente da ogni supporto elettronico o cartaceo."

"WARNING:This message contains confidential and/or proprietary
information which may be subject to privilege or immunity and which
is intended for use of its addressee only. Should you receive this
message in error, you are kindly requested to inform the sender and
to definitively remove it from any paper or electronic format."


Re: Multithreaded query onto 4 postgresql instances

From
Alban Hertroys
Date:
On 14 Feb 2011, at 9:38, Alessandro Candini wrote:

> I performed tests with a query returning more or less 100000 records and using my C module I obtain the following
results(every test performed cleaning cache before): 
> - single db: 9.555 sec
> - splitted in 4: 5.496 sec

Is that a single query on that one DB compared to 4 queries on 4 DB's? How does a single DB with 4 parallel queries
perform?I'd expect that to win from 4 DB's, due to the overhead those extra DB instances are generating. 
I do suppose you tried tuning that DB before you started making things complicated?

> So I think this can be a good approach...
> I have already read this
>
> http://www.postgresql.org/docs/9.0/interactive/xfunc-c.html

Did you read all the way to section 35.9.10? That explains how to create SRF's like yours, including examples. If the
stepfrom simple functions to SRF's is too large for you, create a few sample-functions to learn how the intermediary
stepswork. We can't upload the info to your brain, after all - you'll have to do the learning part by yourself. 

> I posted it in my previous thread. But it is not clear to me how to embed my C function into postgresql.
> I mean, I know how to compile and insert it into postgresql, I've already done it for simple function which return a
boolean,for example. 
> But it is not clear to me how to do this if I want as return a complete table, or set of rows...
> Can you give me a minimalistic example?

I'm not aware of any difference between adding a function returning a scalar vs. one that returns a set. You have to
createa wrapper function to your library so that PG knows about your function, but apart from the return types (BOOLEAN
vs.SETOF something), that declaration is exactly the same. 

> By the way, my goal is to perform a SELECT query in the smallest time possible.
> For that reason an SQL function is not good beacuse slower than a C function...

I have no idea what you're referring to, since you're top-posting, but I never suggested to use SQL functions. I just
pointedout that the document you linked was about SQL functions instead of C functions, in the assumption that was what
wascausing your confusion. You didn't explain what you were confused about after all, I had to guess. 

>> On 10 Feb 2011, at 9:01, Alessandro Candini wrote:
>>
>>> I have installed 4 different instances of postgresql-9.0.2 on the same machine, on ports 5433, 5434, 5435, 5436.
>> I do hope you intend to put those databases on different machines eventually, or some such? Otherwise you probably
didn'tgain anything by splitting your database up like that - you've just reduced the available resources on that
singlemachine. 
>>
>>> Why I have to do something tricky like this is long too explain...
>> It would help to know what you're trying to achieve by splitting your database up like this. We don't need the full
story,just a summary is fine; Maybe this is some experimental setup that's more related to multi-threading than to the
actualdatabase design? Maybe management smoked something outlandish and put you up with this? Maybe this is a
macroscopicattempt to table partitioning? 
>>
>>> Then I have developed a C function using libpq which creates 4 threads, each one which query a 1/4 of the db. After
thatI merge the results in one single response. 
>>>
>>> My function works fine, but I need to include it inside a postgresql instance in order to launch it as a normal SQL
query(SELECT myfunc(...);). 
>>> I have read the documentation here
http://www.postgresql.org/docs/9.0/static/xfunc-sql.html#XFUNC-SQL-FUNCTIONS-RETURNING-SET,but I'm pretty confuse and I
don'tknow if this is the right way to accomplish my goal. 
>> What are you confused about? That's a chapter about set-returning functions written in SQL - perhaps you're looking
forinformation about set-returning functions from an external library like yours? Perhaps you're looking for:
http://www.postgresql.org/docs/9.0/interactive/xfunc-c.html
>>
>> You say you don't know if this is the right way to accomplish your goal - which is...?
>>
>>> Have you got any ideas or suggestions?
>>
>> As others have suggested, you should probably have a look at table-partitioning, possibly in combination with
tablespacesif you want to divide your database among multiple disks/filesystems. Putting them in different servers on
thesame hardware is probably not going to be a very good solution. As you've already found out, it makes querying the
datasilly difficult. But, we don't know the reason you're doing that of course. 
>>
>> Alban Hertroys
>>
>> --
>> If you can't see the forest for the trees,
>> cut the trees and you'll see there is no forest.
>>
>>
>>
>>
>>
>
>
> --
> Alessandro Candini
>
> MEEO S.r.l.
> Via Saragat 9
> I-44122 Ferrara, Italy
> Tel: +39 0532 1861501
> Fax: +39 0532 1861637
> http://www.meeo.it
>
> ========================================
> "ATTENZIONE:le informazioni contenute in questo messaggio sono
> da considerarsi confidenziali ed il loro utilizzo è riservato unicamente
> al destinatario sopra indicato. Chi dovesse ricevere questo messaggio
> per errore è tenuto ad informare il mittente ed a rimuoverlo
> definitivamente da ogni supporto elettronico o cartaceo."
>
> "WARNING:This message contains confidential and/or proprietary
> information which may be subject to privilege or immunity and which
> is intended for use of its addressee only. Should you receive this
> message in error, you are kindly requested to inform the sender and
> to definitively remove it from any paper or electronic format."
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.


!DSPAM:737,4d59822411731063283927!



Re: Multithreaded query onto 4 postgresql instances

From
Allan Kamau
Date:
On Mon, Feb 14, 2011 at 10:38 AM, Alessandro Candini <candini@meeo.it> wrote:
> No, this database is on a single machine, but a very powerful one.
> Processors with 16 cores each and ssd disks.
>
> I already use partitioning and tablespaces for every instance of my db and I
> gain a lot with my splitted configuration.
> My db is pretty huge: 600 milions of records and partitioning is not
> enough...
> I performed tests with a query returning more or less 100000 records and
> using my C module I obtain the following results (every test performed
> cleaning cache before):
> - single db: 9.555 sec
> - splitted in 4: 5.496 sec
>
> So I think this can be a good approach...
> I have already read this
>
> http://www.postgresql.org/docs/9.0/interactive/xfunc-c.html
>
> I posted it in my previous thread. But it is not clear to me how to embed my
> C function into postgresql.
> I mean, I know how to compile and insert it into postgresql, I've already
> done it for simple function which return a boolean, for example.
> But it is not clear to me how to do this if I want as return a complete
> table, or set of rows...
> Can you give me a minimalistic example?
>
> By the way, my goal is to perform a SELECT query in the smallest time
> possible.
> For that reason an SQL function is not good beacuse slower than a C
> function...
>
> Thank you!
>
>> On 10 Feb 2011, at 9:01, Alessandro Candini wrote:
>>
>>> I have installed 4 different instances of postgresql-9.0.2 on the same
>>> machine, on ports 5433, 5434, 5435, 5436.
>>
>> I do hope you intend to put those databases on different machines
>> eventually, or some such? Otherwise you probably didn't gain anything by
>> splitting your database up like that - you've just reduced the available
>> resources on that single machine.
>>
>>> Why I have to do something tricky like this is long too explain...
>>
>> It would help to know what you're trying to achieve by splitting your
>> database up like this. We don't need the full story, just a summary is fine;
>> Maybe this is some experimental setup that's more related to multi-threading
>> than to the actual database design? Maybe management smoked something
>> outlandish and put you up with this? Maybe this is a macroscopic attempt to
>> table partitioning?
>>
>>> Then I have developed a C function using libpq which creates 4 threads,
>>> each one which query a 1/4 of the db. After that I merge the results in one
>>> single response.
>>>
>>> My function works fine, but I need to include it inside a postgresql
>>> instance in order to launch it as a normal SQL query (SELECT myfunc(...);).
>>> I have read the documentation here
>>> http://www.postgresql.org/docs/9.0/static/xfunc-sql.html#XFUNC-SQL-FUNCTIONS-RETURNING-SET,
>>> but I'm pretty confuse and I don't know if this is the right way to
>>> accomplish my goal.
>>
>> What are you confused about? That's a chapter about set-returning
>> functions written in SQL - perhaps you're looking for information about
>> set-returning functions from an external library like yours? Perhaps you're
>> looking for: http://www.postgresql.org/docs/9.0/interactive/xfunc-c.html
>>
>> You say you don't know if this is the right way to accomplish your goal -
>> which is...?
>>
>>> Have you got any ideas or suggestions?
>>
>> As others have suggested, you should probably have a look at
>> table-partitioning, possibly in combination with tablespaces if you want to
>> divide your database among multiple disks/filesystems. Putting them in
>> different servers on the same hardware is probably not going to be a very
>> good solution. As you've already found out, it makes querying the data silly
>> difficult. But, we don't know the reason you're doing that of course.
>>
>> Alban Hertroys
>>
>> --
>> If you can't see the forest for the trees,
>> cut the trees and you'll see there is no forest.
>>
>>
>> !DSPAM:1234,4d567a2a11731320518513!
>>
>>
>
>
> --
> Alessandro Candini
>
> MEEO S.r.l.
> Via Saragat 9
> I-44122 Ferrara, Italy
> Tel: +39 0532 1861501
> Fax: +39 0532 1861637
> http://www.meeo.it
>
> ========================================
> "ATTENZIONE:le informazioni contenute in questo messaggio sono
> da considerarsi confidenziali ed il loro utilizzo č riservato unicamente
> al destinatario sopra indicato. Chi dovesse ricevere questo messaggio
> per errore č tenuto ad informare il mittente ed a rimuoverlo
> definitivamente da ogni supporto elettronico o cartaceo."
>
> "WARNING:This message contains confidential and/or proprietary
> information which may be subject to privilege or immunity and which
> is intended for use of its addressee only. Should you receive this
> message in error, you are kindly requested to inform the sender and
> to definitively remove it from any paper or electronic format."
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Could it be that your single instance PostgreSQL seems slow because a
query can only use one process(or), and in your case you could be
maxing out the bandwidth of a single core. And the multi-instance
implementation is "faster" simply because of simultaneous queries
using one core per query and hence more bandwidth in total.

Now depending on the situation you are trying to solve, the simple
test you've present may not perform very well in a real situation for
example.
1)If you intend to perform single client, single user, single query
then you may want to formulating a series of queries each different
from another by some value (range) in its where clause then run these
queries in parallel then merge the results somehow. I think this setup
will still be faster than the multi-instance setup.

2)If such large queries will be run simultaneous hence the use of
several CPU cores, the single instance will again most likely
outperform the multi-instance setup.

Try simulating your situation my writing some threaded client and
seeing the performance.


Allan.

Re: Multithreaded query onto 4 postgresql instances

From
Alessandro Candini
Date:
> On 14 Feb 2011, at 9:38, Alessandro Candini wrote:
>
>> I performed tests with a query returning more or less 100000 records and using my C module I obtain the following
results(every test performed cleaning cache before): 
>> - single db: 9.555 sec
>> - splitted in 4: 5.496 sec
> Is that a single query on that one DB compared to 4 queries on 4 DB's? How does a single DB with 4 parallel queries
perform?I'd expect that to win from 4 DB's, due to the overhead those extra DB instances are generating. 
> I do suppose you tried tuning that DB before you started making things complicated?
>

Maybe my configuration and test is not clear

Single instance ---> 600 millions of records in one single db (port
5433) ---> query of 100000 records ---> 9.555 sec

Splitted instances:
600 millions of records in total splitted into 4 postgresql instances
(port 5433, 5434, 5435, 5436), let's say more or less:

5433 ---> 150 millions of records
5434 ---> 150 millions of records
5435 ---> 150 millions of records
5436 ---> 150 millions of records

I've launched the same query reported above with 4 concurrent threads
using a C program with libpq, which merges the results also,
returning the same ordered 100000 tuples retrieved in the single db case
but in 5.496 sec.
>> So I think this can be a good approach...
>> I have already read this
>>
>> http://www.postgresql.org/docs/9.0/interactive/xfunc-c.html
> Did you read all the way to section 35.9.10? That explains how to create SRF's like yours, including examples. If the
stepfrom simple functions to SRF's is too large for you, create a few sample-functions to learn how the intermediary
stepswork. We can't upload the info to your brain, after all - you'll have to do the learning part by yourself. 
>
Do you think is a good idea continue using libpq or should I abandon
them and focus on postgresql documentation examples?
>> I posted it in my previous thread. But it is not clear to me how to embed my C function into postgresql.
>> I mean, I know how to compile and insert it into postgresql, I've already done it for simple function which return a
boolean,for example. 
>> But it is not clear to me how to do this if I want as return a complete table, or set of rows...
>> Can you give me a minimalistic example?
> I'm not aware of any difference between adding a function returning a scalar vs. one that returns a set. You have to
createa wrapper function to your library so that PG knows about your function, but apart from the return types (BOOLEAN
vs.SETOF something), that declaration is exactly the same. 
>
>> By the way, my goal is to perform a SELECT query in the smallest time possible.
>> For that reason an SQL function is not good beacuse slower than a C function...
> I have no idea what you're referring to, since you're top-posting, but I never suggested to use SQL functions. I just
pointedout that the document you linked was about SQL functions instead of C functions, in the assumption that was what
wascausing your confusion. You didn't explain what you were confused about after all, I had to guess. 
>
You're right, I copy/pasted the wrong link, sorry.
>>> On 10 Feb 2011, at 9:01, Alessandro Candini wrote:
>>>
>>>> I have installed 4 different instances of postgresql-9.0.2 on the same machine, on ports 5433, 5434, 5435, 5436.
>>> I do hope you intend to put those databases on different machines eventually, or some such? Otherwise you probably
didn'tgain anything by splitting your database up like that - you've just reduced the available resources on that
singlemachine. 
>>>
>>>> Why I have to do something tricky like this is long too explain...
>>> It would help to know what you're trying to achieve by splitting your database up like this. We don't need the full
story,just a summary is fine; Maybe this is some experimental setup that's more related to multi-threading than to the
actualdatabase design? Maybe management smoked something outlandish and put you up with this? Maybe this is a
macroscopicattempt to table partitioning? 
>>>
>>>> Then I have developed a C function using libpq which creates 4 threads, each one which query a 1/4 of the db.
Afterthat I merge the results in one single response. 
>>>>
>>>> My function works fine, but I need to include it inside a postgresql instance in order to launch it as a normal
SQLquery (SELECT myfunc(...);). 
>>>> I have read the documentation here
http://www.postgresql.org/docs/9.0/static/xfunc-sql.html#XFUNC-SQL-FUNCTIONS-RETURNING-SET,but I'm pretty confuse and I
don'tknow if this is the right way to accomplish my goal. 
>>> What are you confused about? That's a chapter about set-returning functions written in SQL - perhaps you're looking
forinformation about set-returning functions from an external library like yours? Perhaps you're looking for:
http://www.postgresql.org/docs/9.0/interactive/xfunc-c.html
>>>
>>> You say you don't know if this is the right way to accomplish your goal - which is...?
>>>
>>>> Have you got any ideas or suggestions?
>>> As others have suggested, you should probably have a look at table-partitioning, possibly in combination with
tablespacesif you want to divide your database among multiple disks/filesystems. Putting them in different servers on
thesame hardware is probably not going to be a very good solution. As you've already found out, it makes querying the
datasilly difficult. But, we don't know the reason you're doing that of course. 
>>>
>>> Alban Hertroys
>>>
>>> --
>>> If you can't see the forest for the trees,
>>> cut the trees and you'll see there is no forest.
>>>
>>>
>>>
>>>
>>>
>>
>> --
>> Alessandro Candini
>>
>> MEEO S.r.l.
>> Via Saragat 9
>> I-44122 Ferrara, Italy
>> Tel: +39 0532 1861501
>> Fax: +39 0532 1861637
>> http://www.meeo.it
>>
>> ========================================
>> "ATTENZIONE:le informazioni contenute in questo messaggio sono
>> da considerarsi confidenziali ed il loro utilizzo è riservato unicamente
>> al destinatario sopra indicato. Chi dovesse ricevere questo messaggio
>> per errore è tenuto ad informare il mittente ed a rimuoverlo
>> definitivamente da ogni supporto elettronico o cartaceo."
>>
>> "WARNING:This message contains confidential and/or proprietary
>> information which may be subject to privilege or immunity and which
>> is intended for use of its addressee only. Should you receive this
>> message in error, you are kindly requested to inform the sender and
>> to definitively remove it from any paper or electronic format."
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
> Alban Hertroys
>
> --
> Screwing up is an excellent way to attach something to the ceiling.
>
>
> !DSPAM:1234,4d59820a11731774212429!
>
>


--
Alessandro Candini

MEEO S.r.l.
Via Saragat 9
I-44122 Ferrara, Italy
Tel: +39 0532 1861501
Fax: +39 0532 1861637
http://www.meeo.it

========================================
"ATTENZIONE:le informazioni contenute in questo messaggio sono
da considerarsi confidenziali ed il loro utilizzo è riservato unicamente
al destinatario sopra indicato. Chi dovesse ricevere questo messaggio
per errore è tenuto ad informare il mittente ed a rimuoverlo
definitivamente da ogni supporto elettronico o cartaceo."

"WARNING:This message contains confidential and/or proprietary
information which may be subject to privilege or immunity and which
is intended for use of its addressee only. Should you receive this
message in error, you are kindly requested to inform the sender and
to definitively remove it from any paper or electronic format."


Re: Multithreaded query onto 4 postgresql instances

From
Alessandro Candini
Date:
Il 14/02/2011 21:00, Allan Kamau ha scritto:
> On Mon, Feb 14, 2011 at 10:38 AM, Alessandro Candini<candini@meeo.it>  wrote:
>> No, this database is on a single machine, but a very powerful one.
>> Processors with 16 cores each and ssd disks.
>>
>> I already use partitioning and tablespaces for every instance of my db and I
>> gain a lot with my splitted configuration.
>> My db is pretty huge: 600 milions of records and partitioning is not
>> enough...
>> I performed tests with a query returning more or less 100000 records and
>> using my C module I obtain the following results (every test performed
>> cleaning cache before):
>> - single db: 9.555 sec
>> - splitted in 4: 5.496 sec
>>
>> So I think this can be a good approach...
>> I have already read this
>>
>> http://www.postgresql.org/docs/9.0/interactive/xfunc-c.html
>>
>> I posted it in my previous thread. But it is not clear to me how to embed my
>> C function into postgresql.
>> I mean, I know how to compile and insert it into postgresql, I've already
>> done it for simple function which return a boolean, for example.
>> But it is not clear to me how to do this if I want as return a complete
>> table, or set of rows...
>> Can you give me a minimalistic example?
>>
>> By the way, my goal is to perform a SELECT query in the smallest time
>> possible.
>> For that reason an SQL function is not good beacuse slower than a C
>> function...
>>
>> Thank you!
>>
>>> On 10 Feb 2011, at 9:01, Alessandro Candini wrote:
>>>
>>>> I have installed 4 different instances of postgresql-9.0.2 on the same
>>>> machine, on ports 5433, 5434, 5435, 5436.
>>> I do hope you intend to put those databases on different machines
>>> eventually, or some such? Otherwise you probably didn't gain anything by
>>> splitting your database up like that - you've just reduced the available
>>> resources on that single machine.
>>>
>>>> Why I have to do something tricky like this is long too explain...
>>> It would help to know what you're trying to achieve by splitting your
>>> database up like this. We don't need the full story, just a summary is fine;
>>> Maybe this is some experimental setup that's more related to multi-threading
>>> than to the actual database design? Maybe management smoked something
>>> outlandish and put you up with this? Maybe this is a macroscopic attempt to
>>> table partitioning?
>>>
>>>> Then I have developed a C function using libpq which creates 4 threads,
>>>> each one which query a 1/4 of the db. After that I merge the results in one
>>>> single response.
>>>>
>>>> My function works fine, but I need to include it inside a postgresql
>>>> instance in order to launch it as a normal SQL query (SELECT myfunc(...);).
>>>> I have read the documentation here
>>>> http://www.postgresql.org/docs/9.0/static/xfunc-sql.html#XFUNC-SQL-FUNCTIONS-RETURNING-SET,
>>>> but I'm pretty confuse and I don't know if this is the right way to
>>>> accomplish my goal.
>>> What are you confused about? That's a chapter about set-returning
>>> functions written in SQL - perhaps you're looking for information about
>>> set-returning functions from an external library like yours? Perhaps you're
>>> looking for: http://www.postgresql.org/docs/9.0/interactive/xfunc-c.html
>>>
>>> You say you don't know if this is the right way to accomplish your goal -
>>> which is...?
>>>
>>>> Have you got any ideas or suggestions?
>>> As others have suggested, you should probably have a look at
>>> table-partitioning, possibly in combination with tablespaces if you want to
>>> divide your database among multiple disks/filesystems. Putting them in
>>> different servers on the same hardware is probably not going to be a very
>>> good solution. As you've already found out, it makes querying the data silly
>>> difficult. But, we don't know the reason you're doing that of course.
>>>
>>> Alban Hertroys
>>>
>>> --
>>> If you can't see the forest for the trees,
>>> cut the trees and you'll see there is no forest.
>>>
>>>
>>> !DSPAM:1234,4d567a2a11731320518513!
>>>
>>>
>>
>> --
>> Alessandro Candini
>>
>> MEEO S.r.l.
>> Via Saragat 9
>> I-44122 Ferrara, Italy
>> Tel: +39 0532 1861501
>> Fax: +39 0532 1861637
>> http://www.meeo.it
>>
>> ========================================
>> "ATTENZIONE:le informazioni contenute in questo messaggio sono
>> da considerarsi confidenziali ed il loro utilizzo č riservato unicamente
>> al destinatario sopra indicato. Chi dovesse ricevere questo messaggio
>> per errore č tenuto ad informare il mittente ed a rimuoverlo
>> definitivamente da ogni supporto elettronico o cartaceo."
>>
>> "WARNING:This message contains confidential and/or proprietary
>> information which may be subject to privilege or immunity and which
>> is intended for use of its addressee only. Should you receive this
>> message in error, you are kindly requested to inform the sender and
>> to definitively remove it from any paper or electronic format."
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
> Could it be that your single instance PostgreSQL seems slow because a
> query can only use one process(or), and in your case you could be
> maxing out the bandwidth of a single core. And the multi-instance
> implementation is "faster" simply because of simultaneous queries
> using one core per query and hence more bandwidth in total.
>
> Now depending on the situation you are trying to solve, the simple
> test you've present may not perform very well in a real situation for
> example.
> 1)If you intend to perform single client, single user, single query
> then you may want to formulating a series of queries each different
> from another by some value (range) in its where clause then run these
> queries in parallel then merge the results somehow. I think this setup
> will still be faster than the multi-instance setup.
>
> 2)If such large queries will be run simultaneous hence the use of
> several CPU cores, the single instance will again most likely
> outperform the multi-instance setup.
>
> Try simulating your situation my writing some threaded client and
> seeing the performance.
>
>
> Allan.
Writing a threaded client?!
But it is exactly what I did!

Is there a postgresql forum? Using a mailing list is so frustrating...

--
Alessandro Candini

MEEO S.r.l.
Via Saragat 9
I-44122 Ferrara, Italy
Tel: +39 0532 1861501
Fax: +39 0532 1861637
http://www.meeo.it

========================================
"ATTENZIONE:le informazioni contenute in questo messaggio sono
da considerarsi confidenziali ed il loro utilizzo è riservato unicamente
al destinatario sopra indicato. Chi dovesse ricevere questo messaggio
per errore è tenuto ad informare il mittente ed a rimuoverlo
definitivamente da ogni supporto elettronico o cartaceo."

"WARNING:This message contains confidential and/or proprietary
information which may be subject to privilege or immunity and which
is intended for use of its addressee only. Should you receive this
message in error, you are kindly requested to inform the sender and
to definitively remove it from any paper or electronic format."


Re: Multithreaded query onto 4 postgresql instances

From
Christian Ullrich
Date:
* Alessandro Candini wrote:

> Il 14/02/2011 21:00, Allan Kamau ha scritto:
>> On Mon, Feb 14, 2011 at 10:38 AM, Alessandro Candini<candini@meeo.it>   wrote:
>>> No, this database is on a single machine, but a very powerful one.
>>> Processors with 16 cores each and ssd disks.
>>>
>>> I already use partitioning and tablespaces for every instance of my db and I
>>> gain a lot with my splitted configuration.
>>> My db is pretty huge: 600 milions of records and partitioning is not
>>> enough...
>>> I performed tests with a query returning more or less 100000 records and
>>> using my C module I obtain the following results (every test performed
>>> cleaning cache before):
>>> - single db: 9.555 sec
>>> - splitted in 4: 5.496 sec

So your problem is that one query, which is executed by a single backend
process, is too slow. You fixed that by spreading the data across four
database clusters on the same machine, querying them in parallel and
merging the results in the client.

I think you may have thought too far out of the box here. What is the
performance if you leave all the data in a single database, suitably
partitioned, then open multiple connections to that same database and
run as many queries as you need to query each partition at the same
time? As others here, I can hardly imagine that a setup involving four
postmasters with one active backend each can be faster than having a
single postmaster with, say, four active backends.

It's true that PostgreSQL cannot parallelize a single query. But it is
very good at running multiple queries in parallel, especially if you can
limit I/O and lock contention by matching queries to partitions.

--
Christian


Re: Multithreaded query onto 4 postgresql instances

From
Alban Hertroys
Date:
On 15 Feb 2011, at 9:32, Alessandro Candini wrote:

>> Is that a single query on that one DB compared to 4 queries on 4 DB's? How does a single DB with 4 parallel queries
perform?I'd expect that to win from 4 DB's, due to the overhead those extra DB instances are generating. 
>
> Maybe my configuration and test is not clear

It is clear. I gave you another suggestion for something to try instead.

> Splitted instances:
> 600 millions of records in total splitted into 4 postgresql instances (port 5433, 5434, 5435, 5436), let's say more
orless: 
>
> 5433 ---> 150 millions of records
> 5434 ---> 150 millions of records
> 5435 ---> 150 millions of records
> 5436 ---> 150 millions of records

Try the above on a single DB using 4 threads. It will very probably perform better.
To use your example:
5432 ---> 150 million records
5432 ---> 150 million records
5432 ---> 150 million records
5432 ---> 150 million records

>> Did you read all the way to section 35.9.10? That explains how to create SRF's like yours, including examples. If
thestep from simple functions to SRF's is too large for you, create a few sample-functions to learn how the
intermediarysteps work. We can't upload the info to your brain, after all - you'll have to do the learning part by
yourself.
>>
> Do you think is a good idea continue using libpq or should I abandon them and focus on postgresql documentation
examples?

To learn how to write functions like these? Best to use a simple case so it's clear what's going on. If you try to
alteralready complicated code for that purpose you'll probably just get more confused. 

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.


!DSPAM:737,4d5ac6b711731056715104!



Re: Multithreaded query onto 4 postgresql instances

From
Alessandro Candini
Date:
Il 15/02/2011 19:32, Alban Hertroys ha scritto:
> On 15 Feb 2011, at 9:32, Alessandro Candini wrote:
>
>>> Is that a single query on that one DB compared to 4 queries on 4 DB's? How does a single DB with 4 parallel queries
perform?I'd expect that to win from 4 DB's, due to the overhead those extra DB instances are generating. 
>> Maybe my configuration and test is not clear
> It is clear. I gave you another suggestion for something to try instead.
>
>> Splitted instances:
>> 600 millions of records in total splitted into 4 postgresql instances (port 5433, 5434, 5435, 5436), let's say more
orless: 
>>
>> 5433 --->  150 millions of records
>> 5434 --->  150 millions of records
>> 5435 --->  150 millions of records
>> 5436 --->  150 millions of records
> Try the above on a single DB using 4 threads. It will very probably perform better.
> To use your example:
> 5432 --->  150 million records
> 5432 --->  150 million records
> 5432 --->  150 million records
> 5432 --->  150 million records
>

Excuse me but query must to be performed on the whole db...with your
approach, how to merge results of every query in one single response?

>>> Did you read all the way to section 35.9.10? That explains how to create SRF's like yours, including examples. If
thestep from simple functions to SRF's is too large for you, create a few sample-functions to learn how the
intermediarysteps work. We can't upload the info to your brain, after all - you'll have to do the learning part by
yourself.
>>>
>> Do you think is a good idea continue using libpq or should I abandon them and focus on postgresql documentation
examples?
> To learn how to write functions like these? Best to use a simple case so it's clear what's going on. If you try to
alteralready complicated code for that purpose you'll probably just get more confused. 
>
> Alban Hertroys
>
> --
> Screwing up is an excellent way to attach something to the ceiling.
>
>
> !DSPAM:1234,4d5ac6b211738438191676!
>
>


--
Alessandro Candini

MEEO S.r.l.
Via Saragat 9
I-44122 Ferrara, Italy
Tel: +39 0532 1861501
Fax: +39 0532 1861637
http://www.meeo.it

========================================
"ATTENZIONE:le informazioni contenute in questo messaggio sono
da considerarsi confidenziali ed il loro utilizzo è riservato unicamente
al destinatario sopra indicato. Chi dovesse ricevere questo messaggio
per errore è tenuto ad informare il mittente ed a rimuoverlo
definitivamente da ogni supporto elettronico o cartaceo."

"WARNING:This message contains confidential and/or proprietary
information which may be subject to privilege or immunity and which
is intended for use of its addressee only. Should you receive this
message in error, you are kindly requested to inform the sender and
to definitively remove it from any paper or electronic format."


Re: Multithreaded query onto 4 postgresql instances

From
Alban Hertroys
Date:
On 16 Feb 2011, at 9:54, Alessandro Candini wrote:
>> Try the above on a single DB using 4 threads. It will very probably perform better.
>> To use your example:
>> 5432 --->  150 million records
>> 5432 --->  150 million records
>> 5432 --->  150 million records
>> 5432 --->  150 million records
>>
>
> Excuse me but query must to be performed on the whole db...with your approach, how to merge results of every query in
onesingle response? 


You have several options there. You can limit the query results by adding WHERE-clauses restricting each query to a
particularset of data (partial indices may help there). Or you can split the database across multiple schema's,
emulatingthe multi-database setup you have now. Or you can partition your tables (across multiple tablespaces would
probablybe a good idea). Plenty of options there. 

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.


!DSPAM:737,4d5c16fc11737633677592!