|
|||
|
It's taken a while, but I think I see the problem now.
You need to avoid double counting in your summary. Sample data: data values; input client $ product $ imports exports; cards; a Dairy 11 0 a Forest 0 22 b Dairy 33 0 ; The numerics are export/import values (expressed perhaps in thousands of NZ dollars). You want to aggregate those figures, both by product and overall. You also want to count clients, both by product and overall. Client a should be counted once under each product and once (not twice) in the overall line. Your 3-way classifications (exporter, importer, combined 2-way trader) are mutually exclusive. Client a is a Dairy importer, a Forest exporter, and a two-way trader in the overall line. Try this: proc sql; create table client_totals as select client, 'Total' as product, sum(imports) as imports, sum(exports) as exports from values group by client; The first statement builds client-by-client totals, ignoring the product stratification. Next: create table all as select * from values union corresponding all select * from client_totals; The totals have now been concatenated with the original data. Finally: create table report as select product, sum(imports) as imports, sum(exports) as exports, sum(imports and not exports) as importers, sum(exports and not imports) as exporters, sum(imports and exports) as combined from all group by product order by product='Total', product; This last step creates the classification indicators and then aggregates over clients. The code could be rolled up into 2 or even just 1 statement, but it's easier to follow as written. Result: product imports exports importers exporters combined Dairy 44 0 2 0 0 Forest 0 22 0 1 0 Total 44 22 1 0 1 On Sat, 12 Jun 2004 22:22:23 +1200, Dean Edwards <dean@DREAMNET.CO.NZ> wrote: >I have 115,000 observations and have to output a grand total on client >code (excluding any duplicates). In the code way down below if you are >interested, I did use proc sql but had to discard "distinct"which you can >read why. In the example >below client code 100056 exports in Dairy and imports in Forest. So for the >client code 100056 this is a (combined exporter and importer and has the >code) combine count is 1. > >I don't how to do it. But in population Census data you use something like a >loop statement and look at each dwelling to extract out the population or >persons for every dwelling. So I want to use a loop statement hopefully >someone can help me here unless someone has any other suggestions. >It would look like this >Data b; >set a; > by client_code; > retain client_exporters client_importers combined; > if first.client_code then do; > if client_exporters = 1 and client_importers = 1 then combined = 1; > end; > if last.client_code then output; >run; > > >So in my code I need hopefully someone can understand what I want to do is >to loop through every client code and output on one line the results. In >one example I had a client code repeat 8 times with different combinations >for product code. I am getting stuck here. IF you see the example below >client > > > >data a; > >input client product $ imports exports combine; > >cards; > >100056 Dairy 0 1 0 > >100056 Forest 1 0 0 > >100788 Other 0 0 1 > >100799 Meat 1 0 0 > >100899 Meat 1 0 1 > >100899 Dairy 1 1 1 > >run; > > Final ouput shouldlook like this I don't need product any longer for the >grand total. I want to find out how many >importers and exporters are combined for a count of 1. > >Client code combine >100056 1 >100788 1 >100799 0 > >Final ouput should look like this belowI don't need product any longer for >the >grand total. I want to find out how many >importers and exporters are combined for a count of 1. > >FINAL OUTPUT >Client code combine >100056 1 >100788 1 >100799 0 >100899 1 > > >The problem I have is in the duplicate for 100056 the client is an exporter >and then in another observation >an importer for a different product. This means they must be countede only >once as a combined (exporter/importer) and deleted as an importer only or >exporter only. You will see in >data set bbb below how I use the client code 100899 > >combined = 1 then client_exporters = 0; > if combined = 1 then client_importers = 0; > >so it looks like this originally >client client exporter client importer combined >100899 1 1 1 > >now resolved that they only get counted once as a combined importer/exporter >client client exporter client importer combined >100899 0 0 1 > but doesn't work when there are two observations like 100056 > > > >Thank you for the helpful information on duplicates for SQL. The table >below is >what I am trying to achieve. The number of clients exporting , importing, >combined totgether and total by product group. I got the results I wanted >except fot total. >At the moment I am trying to remove duplicates from the final grand total. >So instead of getting output for (2) which is incorrect. I want to get the >data for (1). I had to discard distinct because of the results. >For example the duplicate client code for 100000. I might not get a count of >1 as a combined exporter and importer. Sometimes I had up to 8 duplicates >on one client code. >obs client code product import exports combined >1 100000 Dairy 1 1 >1 >2 100000 Forestry 1 0 >0 >because it grabs at random observation 2 instead, having removed duplicate >observation 1.. > >I am still trying to achieve the goal of getting a grand total for (1) which >removes the duplicates. > >Any suggestions welcome. > > Product Group Exports Only Imports Only Exports and Imports Total >Clients > > Dairy 193 132 26 351 > Forestry Products 1,518 8,206 1,330 11,054 > Fruit/Veg/Beverages and Tobacco 1,148 2,182 671 4,001 > Meat 128 74 31 233 > Mechanical and Electrical Machinery 1,774 12,830 4,404 19,008 > Metals 1,280 7,294 1,785 10,359 > Other Manufactures 2,952 21,430 3,564 27,946 > Raw Materials 1,601 10,265 2,682 14,548 > Seafood 205 167 39 411 > Textiles and Apparel 955 7,222 1,455 9,632 > Total Other 879 5,585 779 7,243 > Total Trade (2) (wrong data) 12,633 75,387 16,766 104,786 > Total Trade (1) (correct data) 5,374 32,664 10,222 48,260 > > (2) Number of clients for Total Trade excludes any duplicates, for >clients in more than one product group. > (1) Number of clients for Total Trade includes any duplicates, for >clients in more than one product group. > > > The code below is what I have got so far to create the table above. > > > > > > proc sql; > > create table dean.trade as > > select distinct (client) as client_code, > > hsgroup, > > sum (exports) as export_totals, > > sum (imports) as import_totals, > > (not (exports = 0)) as client_exporters, > > (not (imports = 0)) as client_importers, > > (not (imports = 0 or exports = 0)) as combined > > from aaa > > group by > > client_code > > order by > > client_code > > ; > > quit; > > > > > > data bbb; > > set dean.trade; > > if combined = 1 then client_exporters = 0; > > if combined = 1 then client_importers = 0; > > run; > > > > proc sort; > > by hsgroup; > > run; > > > > proc summary data=bbb; > > var export_totals import_totals > > client_exporters client_importers combined ; > > by hsgroup; > > output out=ccc sum=; > > run; > > > > proc print; > > run; > > Number of Number of Number of > product Freq exports value imports value exporters Importers Combined >importer/exporters > > Dairy 370 $8,416,580,661 $1,402,933,531 198 144 28 > Forestry Products 12033 $19,505,864,706 $26,459,096,852 1662 8961 >1410 > Fruit/Veg/Beverages and Tobacco 4326 $18,034,496,180 $10,043,749,581 >1258 2390 678 > Meat 227 $5,983,556,031 $576,620,300 123 74 30 > Mechanical and Electrical Machinery 20825 $17,257,190,286 >$29,114,797,446 1985 14236 4604 > Metals 11502 $14,679,947,645 $26,190,770,552 1407 8226 1869 > Other Manufactures 30887 $16,598,180,333 $29,068,054,233 3279 23974 >3634 > Raw Materials 16117 $25,474,678,291 $34,187,461,189 1748 11486 2883 > Seafood 395 $2,767,599,529 $567,419,746 187 171 37 > Textiles and Apparel 10807 $12,482,236,529 $21,968,725,925 1081 8160 >1566 > Total Other 8142 $13,757,723,172 $19,305,817,968 989 6298 855 [snip] |
|
|
||||
|
||||
|
|
![]() |
| Thread Tools | |
| Display Modes | |
|
|
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Re: removing duplicates by ID available in a lookup table in data | Ken Borowiak | Newsgroup comp.soft-sys.sas | 0 | 08-23-2006 07:05 PM |
| Re: removing duplicates by ID available in a lookup table in data | Ai Hua Wang | Newsgroup comp.soft-sys.sas | 0 | 08-23-2006 12:44 PM |
| Re: removing duplicates by ID available in a lookup table in data | Arthur Tabachneck | Newsgroup comp.soft-sys.sas | 0 | 08-23-2006 08:54 AM |
| removing duplicates by ID available in a lookup table in data step | Ai Hua Wang | Newsgroup comp.soft-sys.sas | 0 | 08-23-2006 02:59 AM |
| Re: Fw: Removing duplicates in Proc SQL SAS | nospam@HOWLES.COM (Howard Schreier | Newsgroup comp.soft-sys.sas | 0 | 06-12-2006 03:00 AM |