|
|||
|
Claudia:
First I would replace a.ID<>b.ID with a.ID^=b.ID (or use an equivalent 'not equal' operator). The operator '<>' has a different meaning (officially) in SAS. Next, I'd replace 'GROUP BY b.ID' with 'GROUP BY a.country, a.year'. Since you are joining the tables on country and year, all rows in a that do not match a row in b on country and year will have a missing diff value (because b.size will have a missing value). You won't have a meaningful value of diff to summarize. You may still find more than one company in a matching a company in b, and more than one company in b matching a company in a. The query is joining the tables on country and year. You are constraining the yield of the join by limiting the solution to minimum differences in size within each country and year group. Any reason why a company ID in a might not match on country and year to more than one company ID in b? The HAVING condition constrains the solution after the LEFT JOIN executes and may eliminate some of the companies. Consider an alternative LEFT outer join strategy. Replace LEFT join with INNER join. Execute the query. Find all companies in a that do not match any companies in b on country and year. Select rows with those company ID's and UNION them to the INNER join solution. That will give you what you are expecting from a LEFT outer join. Finding matches in a database where key linkage fails turns out to be a difficult problem. You have just scratched the surface of fuzzy and probabilistic linkage. Sig -----Original Message----- From: owner-sas-l@listserv.uga.edu [mailto wner-sas-l@listserv.uga.edu]On Behalf Of Claudia Sent: Tuesday, September 06, 2005 10:20 AM To: sas-l@uga.edu Subject: Outer join with a MIN constraint Good morning, I have 2 different sub-samples of companies: Dataset A includes: - Cie ID - Year - Country - Size Dataset B includes: - Cie ID - Year - Country - Size What I need: Create a "matching command" so that FOR EVERY COMPANY in sample A, I will try to find a matching company from sample B (if the match is possible). This match will be made on 3 possible variables: 1. Year 2. Country 3. Size The first 2 criteria need to be set as equal for datasets A and B (the 2 companies MUST be from the same year and country). If there is still more than 1 possible match after these 2 conditions are met, then the match should be made with the company in sample B that is the closest in size (in dollars) to the company in A. I should therefore obtain only ONE match (or none) for each observation in A. Here is what I came up with: Proc sql feedback; Create table Match as Select *, abs(a.size - b.size) as diff from a left join b on a.country=b.country and a.year = b.year and a.ID<>b.ID group by b.ID having diff=min(diff); quit; However, I notice at least 2 problems with my command: 1. It matches some companies in A with more than 1 company in B 2. It doesn't return all the companies in A that didn't have a matched company in B (I need those non-matched observations as well... that's why I used an outer join) Anybody can help? Thanks |
|
|
||||
|
||||
|
|
![]() |
| Thread Tools | |
| Display Modes | |
|
|
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Re: proc sql outer join with more than two tables | toby dunn | Newsgroup comp.soft-sys.sas | 0 | 01-10-2009 07:34 PM |
| Creating an Inner Join with Outer Join-Style Syntax | Rune Runnestø | Newsgroup comp.soft-sys.sas | 2 | 12-19-2005 05:27 PM |
| Re: Creating an Inner Join with Outer Join-Style Syntax | Sigurd Hermansen | Newsgroup comp.soft-sys.sas | 0 | 12-19-2005 02:57 PM |
| Re: Outer join with a MIN constraint | nospam@HOWLES.COM (Howard Schreier | Newsgroup comp.soft-sys.sas | 1 | 09-06-2005 05:41 PM |
| Re: Outer join using SAS datastep | Crawford, Peter1 | Newsgroup comp.soft-sys.sas | 0 | 09-06-2005 02:34 PM |