Go Back   Rhinocerus > Newsgroup > Newsgroup comp.soft-sys.sas

Reply
 
Thread Tools Display Modes
  #1 (permalink)  
Old 06-06-2007, 07:45 PM
data _null_;
Guest
 
Posts: n/a
Default Re: PROC FREQ--DATA STEP--MODELING QUESTION

proc summary data=<whatever> nway missing;
class id code;
output out=work.counts(drop=_type_);
run;
proc freq data=work.counts;
tables ID * CODE/ noprint outpct out=stats;
weight _freq_ / zeros;
run;

I don't know if this will help. You may have the same problem with
PROC SUMMARY. How many levels of ID and CODE do you have.

Many levels crossed with many levels equals, a bunch. :-)



On 6/6/07, Tom White <tw2@mail.com> wrote:
> Hi Muthia,
>
> I know Matthew's solution seems the simplest.
>
> But no one has yet responded:
>
> How can I get this PROC FREQ
>
> proc FREQ data = foo;
> tables ID * CODE/ noprint outpct out=TEST1;
> run;
>
>
> to run on my real data set of 10 million records?
>
> The above code and Matthew's code do different things because his code counts ONLY inside the ID group,
> whereas mine does the cross tabulation ID * CODE.
>
> Both PROC FRECs (Matthew's and mine) are important for my problem. Except that mine does not run.
>
> The only way I suspect for my PROC FREQ to run is to convert it into a data step. But how?
>
> Thank you.
>
> tom
>
>
> ----- Original Message -----
> From: "Muthia Kachirayan"
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: PROC FREQ--DATA STEP--MODELING QUESTION
> Date: Wed, 6 Jun 2007 15:13:30 -0400
>
>
> Tom,
>
> Use of array might be little difficult for you. The solution given by
> Matthew is far simpler. In the new output using the code
>
> proc sort data=foo; by ID; run;
> proc freq data=foo;
> tables CODE/ noprint out=TEST2;
> by ID;
> run;
>
> your PCT_Row is the PERCENT in the output. To get your Percent, you need to
> add COUNT for all IDs having non-blank CODE. Let it be K. Then divide each
> COUNT by K and multiply by 100. Your PCT_Col is 100 always.
>
> Regards,
>
> Muthia Kachirayan
>
>
> On 6/6/07, Tom White wrote:
> >
> > Yes, it looks like I may have to do some kind of merging.
> >
> > Here is what I don't understand with the PROC FREQ.
> >
> > Please use the FOO data set as an example.
> >
> >
> > When I run my version of PROC FREQ (which does not run on my real data
> > set, WHY?????),
> >
> > proc FREQ data = foo;
> > tables ID * CODE/ noprint outpct out=TEST1;
> > run;
> >
> >
> > and when I run your version of PROC FREQ (which does run on my real data
> > set)
> >
> > proc sort data=foo; by ID; run;
> > proc freq data=foo;
> > tables CODE/ noprint out=TEST2;
> > by ID;
> > run;
> >
> > (1)
> > I don't get the same results for the Percent of Total Frequency (PERCENT).
> >
> > Clearly, then, the two procedures divide by different numbers.
> >
> > (2)
> > Also, in my version of PROC FREQ, how is the Percent of Row Frequency
> > (PCT_ROW) calculated?
> >
> > (3)
> > Is there a way to run my version of PROC FREQ on my real data set without
> > getting an error message?
> > Maybe some kind of a data step instead?
> >
> > I don't know--is this a difficult problem fro SAS-L gurus?
> >
> > Thank you.
> >
> > tom
> >
> >
> > ----- Original Message -----
> > From: "Zack, Matthew M. (CDC/CCHP/NCCDPHP)"
> > To: "Tom White"
> > Subject: RE: PROC FREQ--DATA STEP--MODELING QUESTION
> > Date: Wed, 6 Jun 2007 14:10:42 -0400
> >
> >
> > Since PROC FREQ will write the variables, ID and CODE, to an output SAS
> > data set, you can
> > merge this summarized data set back with your original data by these two
> > variables.
> >
> > Matthew Zack
> >
> > -----Original Message-----
> > From: Tom White [mailto:tw2@mail.com]
> > Sent: Wednesday, June 06, 2007 2:06 PM
> > To: Zack, Matthew M. (CDC/CCHP/NCCDPHP)
> > Cc: sas-l@listserv.uga.edu
> > Subject: RE: PROC FREQ--DATA STEP--MODELING QUESTION
> >
> > Thank you for your suggestion Matthew.
> >
> > I will try your attached recommendation.
> >
> > But, I don't think I can use PROC FREQ because this procedure rolls up
> > my records by ID.
> >
> > Think of the FOO data set as you see it. Then imagine other columns
> > beyond the ones I show in FOO.
> >
> > Suppose a few of these other colunns represent SALARY, AGE, WEIGHT,
> > EDUCATION_LEVEL, etc.
> >
> > Now, by running PROC FREQ as shown in your attachmnet, it will roll up
> > by observation by ID.
> >
> > In the process, what will happen to the other fields?
> >
> > Let's say, as an example, that in my real data set I have 153 obs of
> > ID=1271. Suppose that 32 of these obs have missing CODE, and the
> > remaining of the non-missing obs (=153-32 obs) take on various values of
> > CODE as shown in FOO.
> >
> > Now, by running PROC FREQ, what will happen to the other fields lke
> > SALARY, etc. I certainly don't want to roll them up the way PROC FREQ
> > does, yet, at the same time, I do want to have the counts and
> > percentages for the CODE field (by ID) as PROC FREQ shows.
> >
> > That's my problem.
> >
> > Maybe first I run PROC FREQ, then I do some kind of a merge between the
> > table the PROC FREQ produces and my original table.
> >
> > Maybe that way I will preserve my original data in its present form, yet
> > be able to have the results of PROC FREQ on CODE by ID in there as well.
> > But yet I do want to create counts and percentages and so on for my CODE
> > variable. This is a very important predictor variable and as such, we
> > want to create more predictive variables out of it like the ones
> > produced by PROC FREQ.
> >
> > Do you see my problem now?
> >
> > Any suggestions?
> >
> > Thank you.
> >
> > tom
> >
> >
> > ----- Original Message -----
> > From: "Zack, Matthew M. (CDC/CCHP/NCCDPHP)"
> > To: "Tom White"
> > Subject: RE: PROC FREQ--DATA STEP--MODELING QUESTION
> > Date: Wed, 6 Jun 2007 13:26:24 -0400
> >
> >
> > To get part of the first table, sort by ID, and tabulate the
> > CODEs
> > within each ID (cf., attachment).
> > The output data set contains the variables--ID, CODE, COUNT, and
> > PERCENT--where PERCENT corresponds
> > to your variable, PCT_ROW.
> >
> > In your table 1, I don't see any worth for the variable,
> > PCT_COL, which
> > always has the value, 100.
> > In your table 2, the value of your variable, PERCENT, equals the
> > percentage of the total number
> > of observations across all IDs and non-missing codes
> > [=(1/16)*100% =
> > 6.25%], which again has little
> > worth, because it too always has the value 6.25.
> >
> > Matthew Zack
> >
> > -----Original Message-----
> > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On
> > Behalf Of Tom
> > White
> > Sent: Wednesday, June 06, 2007 11:13 AM
> > To: SAS-L@LISTSERV.UGA.EDU
> > Subject: Re: PROC FREQ--DATA STEP--MODELING QUESTION
> >
> > Well, it looks like I can'r even get PROC FREQ to create the
> > firts
> > table, as the error message below indicates. The second table is
> > needed
> > because that's how my real data set looks like. When I apply
> > PROC FREQ
> > to the second table (well, if anyone helps me with PROC FREQ),
> > then I
> > get the firts table. PROC FREQ rolls up the rows in the second
> > table and
> > turns them into the raws in the first table. So, for my
> > purposes, I need
> > to create additianl predictive fields using counts and
> > percentages for
> > the CODE variable. This CODE variable is avery important
> > variable from a
> > business perspective, so we thought that we should use it to
> > generate
> > more predictive fields out of it like counts and percentages. I
> > thought
> > that if I could even genereate table (1) then maybe it would be
> > easier
> > to genearte table (2). Any help, anyone? Thank you. tom
> >
> > ----- Original Message -----
> > From: "Zack, Matthew M. (CDC/CCHP/NCCDPHP)"
> > To: "Tom White"
> > Subject: RE: PROC FREQ--DATA STEP--MODELING QUESTION
> > Date: Wed, 6 Jun 2007 07:57:17 -0400
> >
> >
> > Why do you need your second table to run PROC LOGISTIC?
> >
> > Matthew Zack
> >
> > -----Original Message-----
> > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On
> > Behalf Of
> > Tom
> > White
> > Sent: Tuesday, June 05, 2007 6:27 PM
> > To: SAS-L@LISTSERV.UGA.EDU
> > Subject: PROC FREQ--DATA STEP--MODELING QUESTION
> >
> > Hello SAS-L,
> >
> > The data below are already sorted (or can be sorted if need be)
> > by ID
> > (first) and CODE (second).
> >
> > The data set of interest contains about 10 mil records and about
> > 100
> > fields.
> >
> > In this example, I show 20 obs and two fields of interest.
> >
> >
> > data foo;
> > input ID $ CODE $;
> > cards;
> > 1271 .
> > 1271 201
> > 1435 .
> > 1435 842
> > 1435 842
> > 1435 307
> > 1435 307
> > 1435 307
> > 1435 309
> > 1435 .
> > 1434 .
> > 8393 070
> > 8393 070
> > 8393 070
> > 8393 070
> > 8393 070
> > 8393 070
> > 8393 070
> > 8393 070
> > 8393 070
> > ;
> > run;
> >
> >
> > I would like to produce two datasets like:
> >
> > (1)
> >
> > ID CODE COUNT PERCENT PCT_ROW PCT_COL
> > 1271 1
> > 1271 201 1 6.25 100 100
> > 1435 3
> > 1435 842 2 12.5 33.33 100
> > 1435 307 3 18.75 50 100
> > 1435 309 1 6.25 16.67 100
> > 8393 070 9 56.25 100 100
> >
> > This data set is easily created by using;
> >
> > proc freq data=foo;
> > tables ID * CODE/outpct out=stats;
> > run;
> >
> >
> > However, when I run this PROC FREQ on the entire dataset of 10
> > mil
> > obs,
> > I get a message error
> >
> >
> > ERROR: The requested table is too large to process.
> > NOTE: The SAS System stopped processing this step because of
> > errors.
> > NOTE: There were 10154176 observations read from the data set
> > WORK.FOO.
> > WARNING: The data set WORK.STATS may be incomplete. When this
> > step
> > was
> > stopped there were 0
> > observations and 6 variables.
> > WARNING: Data set WORK.STATS was not replaced because this step
> > was
> > stopped.
> > NOTE: PROCEDURE FREQ used:
> > real time 54.70 seconds
> > cpu time 53.28 seconds
> >
> >
> >
> > (2)
> >
> > The other table I would like to get is
> >
> > ID CODE COUNT PERCENT PCT_ROW PCT_COL
> > 1271
> > 1271 201 1 6.25 100 100
> > 1435 1
> > 1435 842 1 6.25 16.665 50
> > 1435 842 1 6.25 16.665 50
> > 1435 307 1 6.25 16.667 33.33
> > 1435 307 1 6.25 16.667 33.33
> > 1435 307 1 6.25 16.667 33.33
> > 1435 309 1 6.25 16.67 100
> > 1435 1
> > 1434 1
> > 8393 070 1 6.25 11.11 11.11
> > 8393 070 1 6.25 11.11 11.11
> > 8393 070 1 6.25 11.11 11.11
> > 8393 070 1 6.25 11.11 11.11
> > 8393 070 1 6.25 11.11 11.11
> > 8393 070 1 6.25 11.11 11.11
> > 8393 070 1 6.25 11.11 11.11
> > 8393 070 1 6.25 11.11 11.11
> > 8393 070 1 6.25 11.11 11.11
> >
> > This table is simply the same one as table (1) except that it is
> > not
> > rolled-up like table (1) is.
> >
> > For example, in (1) I show,
> >
> > ID CODE COUNT PERCENT PCT_ROW PCT_COL
> > 1435 3
> > 1435 842 2 12.5 33.33 100
> > 1435 307 3 18.75 50 100
> > 1435 309 1 6.25 16.67 100
> >
> > That's 9 instances of ID 1435 as shown in tbale (2) above--not
> > rolled-up.
> >
> > So I am thinking, since I will need table (2) at some point for
> > modeling, I need to keep (2) in it's original form and not like
> > in
> > form
> > (1) coming out of PROC FREQ. yet, I still need to create the new
> > modeling variables COUNT, PERCENT, PCT_ROW, and maybe PCT_COL as
> > shown
> > in (1) and populate these corect values in the form of table
> > (2).
> >
> > These new variables in (2) will become inputs to my logistic
> > model I
> > am
> > trying to build.
> >
> > So, then, since in (1) PERCENT=12.5 for ID=1435 having COUNT=2,
> > then,
> > I
> > am thinking, maybe wrongly, that if I divide 12.5 by 2, I should
> > put
> > 6.25 in above table (2).
> >
> > And so on with all the rest of the numbers, I just divide
> > numbers in
> > (1) by however many counts (COUNT) I have, and hopefully I get
> > something
> > like table (2).
> >
> > Please give me some guidance as to how to create table (1) and
> > then,
> > for
> > modeling purposes, how can I keep the original data intact, yet
> > create
> > new modeling fields like counts and percentages as shown above
> > in
> > (2).
> >
> > Thank you.
> >
> > tom
> >
> >
> >
> >
> > =
> > Pedometers as Low as $1 - Free Shipping
> > Huge Selection of Quality Brands Like, Yamax, Sportline,
> > Freestyle,
> > and
> > More. Customer Logos, Free Shipping. Fast Delivery.
> >
> >
> > http://a8-asy.a8ww.net/a8-ads/adftrc...f8bac8e902c6af
> > baf95c7b7930
> >
> > --
> > Get a free http://www.mail.com account & e-mail address today!
> > Choose from over 100 personalized domains.
> > << freqprb2.sas >>
> > << freqprb2.lst >>
> >
> >
> >
> > --
> > Get a free mail.com account
> > erm=070601&utm_content=textlink&utm_campaign=100_p ersonalized_domains>
> > & e-mail address today!
> > Choose from over 100 personalized domains.
> >
> >
> > --
> > Get a free http://www.mail.com account & e-mail address today!
> > Choose from over 100 personalized domains.
> >

>
>
> --
> Get a free http://www.mail.com account & e-mail address today!
> Choose from over 100 personalized domains.
>

Reply With Quote
Alt Today
Advertising
 
and become member of Rhinocerus
Standard Sponsored Links

Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are Off
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Re: Question about efficient data extraction Muthia Kachirayan Newsgroup comp.soft-sys.sas 0 06-05-2008 12:13 AM
Re: Why use a view? data _null_, Newsgroup comp.soft-sys.sas 0 02-21-2008 02:30 PM
Re: Automatically outputing the results of a PROC FREQ Long, Stuart Newsgroup comp.soft-sys.sas 0 11-21-2007 05:36 PM
Re: FW: PROC FREQ--DATA STEP--MODELING QUESTION: PROC FREQ memory Tom White Newsgroup comp.soft-sys.sas 0 06-07-2007 05:14 PM
Re: Data step questions Ian Whitlock Newsgroup comp.soft-sys.sas 1 09-09-2006 10:37 AM



All times are GMT. The time now is 01:45 PM.


Copyright ©2009

LinkBacks Enabled by vBSEO 3.3.0 RC2 © 2009, Crawlability, Inc.