Go Back   Rhinocerus > Newsgroup > Newsgroup comp.soft-sys.sas

Reply
 
Thread Tools Display Modes
  #1 (permalink)  
Old 09-06-2005, 05:50 PM
Sigurd Hermansen
Guest
 
Posts: n/a
Default Re: Outer join with a MIN constraint

Claudia:
First I would replace a.ID<>b.ID with a.ID^=b.ID (or use an equivalent
'not equal' operator). The operator '<>' has a different meaning
(officially) in SAS.

Next, I'd replace 'GROUP BY b.ID' with 'GROUP BY a.country, a.year'.
Since you are joining the tables on country and year, all rows in a that
do not match a row in b on country and year will have a missing diff
value (because b.size will have a missing value). You won't have a
meaningful value of diff to summarize.

You may still find more than one company in a matching a company in b,
and more than one company in b matching a company in a. The query is
joining the tables on country and year. You are constraining the yield
of the join by limiting the solution to minimum differences in size
within each country and year group. Any reason why a company ID in a
might not match on country and year to more than one company ID in b?

The HAVING condition constrains the solution after the LEFT JOIN
executes and may eliminate some of the companies. Consider an
alternative LEFT outer join strategy. Replace LEFT join with INNER join.
Execute the query. Find all companies in a that do not match any
companies in b on country and year. Select rows with those company ID's
and UNION them to the INNER join solution. That will give you what you
are expecting from a LEFT outer join.

Finding matches in a database where key linkage fails turns out to be a
difficult problem. You have just scratched the surface of fuzzy and
probabilistic linkage.
Sig


-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailtowner-sas-l@listserv.uga.edu]
On Behalf Of Claudia
Sent: Tuesday, September 06, 2005 10:20 AM
To: sas-l@uga.edu
Subject: Outer join with a MIN constraint


Good morning,

I have 2 different sub-samples of companies:

Dataset A includes:
- Cie ID
- Year
- Country
- Size


Dataset B includes:
- Cie ID
- Year
- Country
- Size


What I need:
Create a "matching command" so that FOR EVERY COMPANY in sample A, I
will try to find a matching company from sample B (if the match is
possible). This match will be made on 3 possible variables: 1. Year 2.
Country 3. Size

The first 2 criteria need to be set as equal for datasets A and B (the 2
companies MUST be from the same year and country). If there is still
more than 1 possible match after these 2 conditions are met, then the
match should be made with the company in sample B that is the closest in
size (in dollars) to the company in A.

I should therefore obtain only ONE match (or none) for each observation
in A.


Here is what I came up with:

Proc sql feedback;
Create table Match as
Select *, abs(a.size - b.size) as diff
from a left join b
on a.country=b.country and
a.year = b.year and
a.ID<>b.ID
group by b.ID
having diff=min(diff);
quit;


However, I notice at least 2 problems with my command:
1. It matches some companies in A with more than 1 company in B
2. It doesn't return all the companies in A that didn't have a
matched company in B (I need those non-matched observations as well...
that's why I used an outer join)


Anybody can help?

Thanks
Reply With Quote
Alt Today
Advertising
 
and become member of Rhinocerus
Standard Sponsored Links

Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are Off
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Re: proc sql outer join with more than two tables toby dunn Newsgroup comp.soft-sys.sas 0 01-10-2009 07:34 PM
Creating an Inner Join with Outer Join-Style Syntax Rune Runnestø Newsgroup comp.soft-sys.sas 2 12-19-2005 05:27 PM
Re: Creating an Inner Join with Outer Join-Style Syntax Sigurd Hermansen Newsgroup comp.soft-sys.sas 0 12-19-2005 02:57 PM
Re: Outer join with a MIN constraint nospam@HOWLES.COM (Howard Schreier Newsgroup comp.soft-sys.sas 1 09-06-2005 05:41 PM
Re: Outer join using SAS datastep Crawford, Peter1 Newsgroup comp.soft-sys.sas 0 09-06-2005 02:34 PM



All times are GMT. The time now is 05:34 PM.


Copyright ©2009

LinkBacks Enabled by vBSEO 3.3.0 RC2 © 2009, Crawlability, Inc.