Thread: Subqueries - performance and use question
I need some basic advice on how to run a subquery, or if there's a better way. Let me set up a situation, and get some advice on it. This is my first post on this list, so I hope this kind of noob question is ok. Say I have a table of customers and table of accounts receivable transactions, There is a one-to-many relationship between the two (obviously a customer can have more than one purchase/transaction). I want to run a query where I pull a set of customers based on some parameter like, for sake of an example, where their status = new, and also pull the most recent billing date from the accounts receivable table. Here's what I came up with: SELECT customers.id, customers.firstname, customers.lastname, customers.phone number, (SELECT ar.billdate FROM ar WHERE customers.customerid = ar.customerid ORDER BY ar.billdate LIMIT 1) AS lastarbilldate FROM customers WHERE customers.status = 'new'; As you can see, I'm using a subquery here to get the latest billing date. My question is twofold. Is this subquery style the only way to get one record in a one-to-many relationship, or is there a way to do it with a join? Also, if this is fine, is there a way to do it that would get the same results but be faster? It may not matter on a small database, but if you've got millions of customers and transactions, a subquery can get expensive. I'm just looking for some basic direction. I hope my fake example makes sense. I'm running PostgreSQL 8.1, on a Windows 2003 server. TIA -Jeff This email is intended only for the individual or entity to which it is addressed. This email may contain information thatis privileged, confidential or otherwise protected from disclosure. Dissemination, distribution or copying of this e-mailor any attachments by anyone other than the intended recipient, or an employee or agent responsible for deliveringthe message to the intended recipient, is prohibited. If you are not the intended recipient of this message orthe employee or agent responsible for delivery of this email to the intended recipient, please notify the sender by replyingto this message and then delete it from your system. Any use, dissemination, distribution, or reproduction of thismessage by unintended recipients is strictly prohibited and may be unlawful.
try this approach: SELECT c.id, c.firstname, c.lastname, a.latest_billdate FROM customers c INNER JOIN -- or LEFT if you want the NULLs ( SELECT customer_id, max(billdate) as latest_billdate FROM ar ) a ON c.customerid = a.customerid WHERE c.status = 'new'; > -----Original Message----- > From: pgsql-general-owner@postgresql.org > [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Demel, Jeff > Sent: Thursday, February 01, 2007 8:08 AM > To: pgsql-general@postgresql.org > Subject: [GENERAL] Subqueries - performance and use question > > I need some basic advice on how to run a subquery, or if there's a > better way. Let me set up a situation, and get some advice > on it. This > is my first post on this list, so I hope this kind of noob question is > ok. > > Say I have a table of customers and table of accounts receivable > transactions, There is a one-to-many relationship between the two > (obviously a customer can have more than one purchase/transaction). > > I want to run a query where I pull a set of customers based on some > parameter like, for sake of an example, where their status = new, and > also pull the most recent billing date from the accounts receivable > table. > > Here's what I came up with: > > SELECT customers.id, customers.firstname, > customers.lastname, customers.phone number, > (SELECT ar.billdate FROM ar > WHERE customers.customerid = ar.customerid > ORDER BY ar.billdate LIMIT 1) > AS lastarbilldate > FROM customers > WHERE customers.status = 'new'; > > As you can see, I'm using a subquery here to get the latest billing > date. My question is twofold. Is this subquery style the only way to > get one record in a one-to-many relationship, or is there a > way to do it > with a join? Also, if this is fine, is there a way to do it > that would > get the same results but be faster? It may not matter on a small > database, but if you've got millions of customers and transactions, a > subquery can get expensive. > > I'm just looking for some basic direction. I hope my fake > example makes > sense. > > I'm running PostgreSQL 8.1, on a Windows 2003 server. > > TIA > > -Jeff > This email is intended only for the individual or entity to > which it is addressed. This email may contain information > that is privileged, confidential or otherwise protected from > disclosure. Dissemination, distribution or copying of this > e-mail or any attachments by anyone other than the intended > recipient, or an employee or agent responsible for delivering > the message to the intended recipient, is prohibited. If you > are not the intended recipient of this message or the > employee or agent responsible for delivery of this email to > the intended recipient, please notify the sender by replying > to this message and then delete it from your system. Any > use, dissemination, distribution, or reproduction of this > message by unintended recipients is strictly prohibited and > may be unlawful. > > ---------------------------(end of > broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq >
sorry, missing GROUP BY and some column naming was messed up but hopefully you get the idea: SELECT c.id, c.firstname, c.lastname, a.latest_billdate FROM customers c INNER JOIN -- or LEFT if you want the NULLs ( SELECT customerid, max(billdate) as latest_billdate FROM ar GROUP BY customerid ) a USING (customerid) WHERE c.status = 'new'; > -----Original Message----- > From: pgsql-general-owner@postgresql.org > [mailto:pgsql-general-owner@postgresql.org] On Behalf Of George Pavlov > Sent: Thursday, February 01, 2007 8:53 AM > To: Demel, Jeff; pgsql-general@postgresql.org > Subject: Re: [GENERAL] Subqueries - performance and use question > > try this approach: > > SELECT > c.id, > c.firstname, > c.lastname, > a.latest_billdate > FROM > customers c > INNER JOIN -- or LEFT if you want the NULLs > ( > SELECT > customer_id, > max(billdate) as latest_billdate > FROM > ar > ) a > ON > c.customerid = a.customerid > WHERE > c.status = 'new'; > > > > > -----Original Message----- > > From: pgsql-general-owner@postgresql.org > > [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Demel, Jeff > > Sent: Thursday, February 01, 2007 8:08 AM > > To: pgsql-general@postgresql.org > > Subject: [GENERAL] Subqueries - performance and use question > > > > I need some basic advice on how to run a subquery, or if there's a > > better way. Let me set up a situation, and get some advice > > on it. This > > is my first post on this list, so I hope this kind of noob > question is > > ok. > > > > Say I have a table of customers and table of accounts receivable > > transactions, There is a one-to-many relationship between the two > > (obviously a customer can have more than one purchase/transaction). > > > > I want to run a query where I pull a set of customers based on some > > parameter like, for sake of an example, where their status > = new, and > > also pull the most recent billing date from the accounts receivable > > table. > > > > Here's what I came up with: > > > > SELECT customers.id, customers.firstname, > > customers.lastname, customers.phone number, > > (SELECT ar.billdate FROM ar > > WHERE customers.customerid = ar.customerid > > ORDER BY ar.billdate LIMIT 1) > > AS lastarbilldate > > FROM customers > > WHERE customers.status = 'new'; > > > > As you can see, I'm using a subquery here to get the latest billing > > date. My question is twofold. Is this subquery style the > only way to > > get one record in a one-to-many relationship, or is there a > > way to do it > > with a join? Also, if this is fine, is there a way to do it > > that would > > get the same results but be faster? It may not matter on a small > > database, but if you've got millions of customers and > transactions, a > > subquery can get expensive. > > > > I'm just looking for some basic direction. I hope my fake > > example makes > > sense. > > > > I'm running PostgreSQL 8.1, on a Windows 2003 server. > > > > TIA > > > > -Jeff > > This email is intended only for the individual or entity to > > which it is addressed. This email may contain information > > that is privileged, confidential or otherwise protected from > > disclosure. Dissemination, distribution or copying of this > > e-mail or any attachments by anyone other than the intended > > recipient, or an employee or agent responsible for delivering > > the message to the intended recipient, is prohibited. If you > > are not the intended recipient of this message or the > > employee or agent responsible for delivery of this email to > > the intended recipient, please notify the sender by replying > > to this message and then delete it from your system. Any > > use, dissemination, distribution, or reproduction of this > > message by unintended recipients is strictly prohibited and > > may be unlawful. > > > > ---------------------------(end of > > broadcast)--------------------------- > > TIP 3: Have you checked our extensive FAQ? > > > > http://www.postgresql.org/docs/faq > > > > ---------------------------(end of > broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org > so that your > message can get through to the mailing list cleanly >
This works, I just need to run some checks and benchmarks. What do you guys think? SELECT c.customerid, c.regionid, c.districtid, c.first_name, c.last_name, a.latest_bill_date FROM customers c INNER JOIN (SELECT max(bill_date) AS latest_bill_date, customerid, regionid, districtid FROM ar GROUP BY customerid, regionid, districtid) a ON c.customerid = a.customerid AND c.regionid = a.regionid AND c.districtid = a.districtid WHERE c.districtid = 340 ; -----Original Message----- From: George Pavlov [mailto:gpavlov@mynewplace.com] Sent: Thursday, February 01, 2007 11:09 AM To: Demel, Jeff; pgsql-general@postgresql.org Subject: RE: [GENERAL] Subqueries - performance and use question sorry, missing GROUP BY and some column naming was messed up but hopefully you get the idea: SELECT c.id, c.firstname, c.lastname, a.latest_billdate FROM customers c INNER JOIN -- or LEFT if you want the NULLs ( SELECT customerid, max(billdate) as latest_billdate FROM ar GROUP BY customerid ) a USING (customerid) WHERE c.status = 'new'; > -----Original Message----- > From: pgsql-general-owner@postgresql.org > [mailto:pgsql-general-owner@postgresql.org] On Behalf Of George Pavlov > Sent: Thursday, February 01, 2007 8:53 AM > To: Demel, Jeff; pgsql-general@postgresql.org > Subject: Re: [GENERAL] Subqueries - performance and use question > > try this approach: > > SELECT > c.id, > c.firstname, > c.lastname, > a.latest_billdate > FROM > customers c > INNER JOIN -- or LEFT if you want the NULLs ( > SELECT > customer_id, > max(billdate) as latest_billdate > FROM > ar > ) a > ON > c.customerid = a.customerid > WHERE > c.status = 'new'; > > > > > -----Original Message----- > > From: pgsql-general-owner@postgresql.org > > [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Demel, Jeff > > Sent: Thursday, February 01, 2007 8:08 AM > > To: pgsql-general@postgresql.org > > Subject: [GENERAL] Subqueries - performance and use question > > > > I need some basic advice on how to run a subquery, or if there's a > > better way. Let me set up a situation, and get some advice > > on it. This > > is my first post on this list, so I hope this kind of noob > question is > > ok. > > > > Say I have a table of customers and table of accounts receivable > > transactions, There is a one-to-many relationship between the two > > (obviously a customer can have more than one purchase/transaction). > > > > I want to run a query where I pull a set of customers based on some > > parameter like, for sake of an example, where their status > = new, and > > also pull the most recent billing date from the accounts receivable > > table. > > > > Here's what I came up with: > > > > SELECT customers.id, customers.firstname, > > customers.lastname, customers.phone number, > > (SELECT ar.billdate FROM ar > > WHERE customers.customerid = ar.customerid > > ORDER BY ar.billdate LIMIT 1) > > AS lastarbilldate > > FROM customers > > WHERE customers.status = 'new'; > > > > As you can see, I'm using a subquery here to get the latest billing > > date. My question is twofold. Is this subquery style the > only way to > > get one record in a one-to-many relationship, or is there a > > way to do it > > with a join? Also, if this is fine, is there a way to do it > > that would > > get the same results but be faster? It may not matter on a small > > database, but if you've got millions of customers and > transactions, a > > subquery can get expensive. > > > > I'm just looking for some basic direction. I hope my fake > > example makes > > sense. > > > > I'm running PostgreSQL 8.1, on a Windows 2003 server. > > > > TIA > > > > -Jeff > > This email is intended only for the individual or entity to > > which it is addressed. This email may contain information > > that is privileged, confidential or otherwise protected from > > disclosure. Dissemination, distribution or copying of this > > e-mail or any attachments by anyone other than the intended > > recipient, or an employee or agent responsible for delivering > > the message to the intended recipient, is prohibited. If you > > are not the intended recipient of this message or the > > employee or agent responsible for delivery of this email to > > the intended recipient, please notify the sender by replying > > to this message and then delete it from your system. Any > > use, dissemination, distribution, or reproduction of this > > message by unintended recipients is strictly prohibited and > > may be unlawful. > > > > ---------------------------(end of > > broadcast)--------------------------- > > TIP 3: Have you checked our extensive FAQ? > > > > http://www.postgresql.org/docs/faq > > > > ---------------------------(end of > broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org > so that your > message can get through to the mailing list cleanly > This email is intended only for the individual or entity to which it is addressed. This email may contain information thatis privileged, confidential or otherwise protected from disclosure. Dissemination, distribution or copying of this e-mailor any attachments by anyone other than the intended recipient, or an employee or agent responsible for deliveringthe message to the intended recipient, is prohibited. If you are not the intended recipient of this message orthe employee or agent responsible for delivery of this email to the intended recipient, please notify the sender by replyingto this message and then delete it from your system. Any use, dissemination, distribution, or reproduction of thismessage by unintended recipients is strictly prohibited and may be unlawful.
"Demel, Jeff" <Jeff.Demel@JavelinDirect.com> writes: > Here's what I came up with: > SELECT customers.id, customers.firstname, > customers.lastname, customers.phone number, > (SELECT ar.billdate FROM ar > WHERE customers.customerid = ar.customerid > ORDER BY ar.billdate LIMIT 1) > AS lastarbilldate > FROM customers > WHERE customers.status = 'new'; Are you expecting a whole lot of answer rows from this query, or just a few? If just a few, this way is fine, but if a lot you probably want to try to recast it as a join. As is, it's pretty much like a nestloop join of the two tables, which is good for a few result rows and not so good for a lot. The thing you need to make it fast as a subselect is an index on (customerid, billdate) in that order. Oh, you also need PG 8.1 or later, but I see you have that. Personally I'd just write (SELECT max(billdate) FROM ar WHERE customers.customerid = ar.customerid) rather than trying to get cute with ORDER BY/LIMIT --- the planner versions that are able to handle this case decently will deal with either one about as well. regards, tom lane
I just wanted to say thanks to everyone for your help. -Jeff This email is intended only for the individual or entity to which it is addressed. This email may contain information thatis privileged, confidential or otherwise protected from disclosure. Dissemination, distribution or copying of this e-mailor any attachments by anyone other than the intended recipient, or an employee or agent responsible for deliveringthe message to the intended recipient, is prohibited. If you are not the intended recipient of this message orthe employee or agent responsible for delivery of this email to the intended recipient, please notify the sender by replyingto this message and then delete it from your system. Any use, dissemination, distribution, or reproduction of thismessage by unintended recipients is strictly prohibited and may be unlawful.