Re: Outer join with a MIN constraint
First I would replace a.ID<>b.ID with a.ID^=b.ID (or use an equivalent
'not equal' operator). The operator '<>' has a different meaning
(officially) in SAS.
Next, I'd replace 'GROUP BY b.ID' with 'GROUP BY a.country, a.year'.
Since you are joining the tables on country and year, all rows in a that
do not match a row in b on country and year will have a missing diff
value (because b.size will have a missing value). You won't have a
meaningful value of diff to summarize.
You may still find more than one company in a matching a company in b,
and more than one company in b matching a company in a. The query is
joining the tables on country and year. You are constraining the yield
of the join by limiting the solution to minimum differences in size
within each country and year group. Any reason why a company ID in a
might not match on country and year to more than one company ID in b?
The HAVING condition constrains the solution after the LEFT JOIN
executes and may eliminate some of the companies. Consider an
alternative LEFT outer join strategy. Replace LEFT join with INNER join.
Execute the query. Find all companies in a that do not match any
companies in b on country and year. Select rows with those company ID's
and UNION them to the INNER join solution. That will give you what you
are expecting from a LEFT outer join.
Finding matches in a database where key linkage fails turns out to be a
difficult problem. You have just scratched the surface of fuzzy and
From: email@example.com [firstname.lastname@example.org]
On Behalf Of Claudia
Sent: Tuesday, September 06, 2005 10:20 AM
Subject: Outer join with a MIN constraint
I have 2 different sub-samples of companies:
Dataset A includes:
- Cie ID
Dataset B includes:
- Cie ID
What I need:
Create a "matching command" so that FOR EVERY COMPANY in sample A, I
will try to find a matching company from sample B (if the match is
possible). This match will be made on 3 possible variables: 1. Year 2.
Country 3. Size
The first 2 criteria need to be set as equal for datasets A and B (the 2
companies MUST be from the same year and country). If there is still
more than 1 possible match after these 2 conditions are met, then the
match should be made with the company in sample B that is the closest in
size (in dollars) to the company in A.
I should therefore obtain only ONE match (or none) for each observation
Here is what I came up with:
Proc sql feedback;
Create table Match as
Select *, abs(a.size - b.size) as diff
from a left join b
on a.country=b.country and
a.year = b.year and
group by b.ID
However, I notice at least 2 problems with my command:
1. It matches some companies in A with more than 1 company in B
2. It doesn't return all the companies in A that didn't have a
matched company in B (I need those non-matched observations as well...
that's why I used an outer join)
Anybody can help?
|Thread||Thread Starter||Forum||Replies||Last Post|
|Re: proc sql outer join with more than two tables||toby dunn||Newsgroup comp.soft-sys.sas||0||01-10-2009 07:34 PM|
|Creating an Inner Join with Outer Join-Style Syntax||Rune Runnest°||Newsgroup comp.soft-sys.sas||2||12-19-2005 05:27 PM|
|Re: Creating an Inner Join with Outer Join-Style Syntax||Sigurd Hermansen||Newsgroup comp.soft-sys.sas||0||12-19-2005 02:57 PM|
|Re: Outer join with a MIN constraint||nospam@HOWLES.COM (Howard Schreier||Newsgroup comp.soft-sys.sas||1||09-06-2005 05:41 PM|
|Re: Outer join using SAS datastep||Crawford, Peter1||Newsgroup comp.soft-sys.sas||0||09-06-2005 02:34 PM|