|
|||
|
Hi guys,
I'm doing stock data research recently and came across a problem. I think there must be some simple way to solve it but since I'm quite a newbie on SAS, it's a little bit difficult to figure out the google key words on the solutions. The problem I have is: I have a dataset looks like: ---------------------------------- Date Stock Price ... 1 A 10.1 ... 2 A 10.2 ... 4 A 11.1 ... 1 B 45.1 ... 2 B 43.2 ... 3 B 42.1 ... 1 C 24.1 ... 3 C 23.2 ... 4 C 22.1 ... .... ... ---------------------------------- and you can see that there're some gaps on the data, e.g. for stock A, day 3 is missing. What I wanna do is to fill up all these gaps so I can have a data set at least looks like: ---------------------------------- Date Stock Price ... 1 A 10.1 ... 2 A 10.2 ... 3 . . ... 4 A 11.1 ... 1 B 45.1 ... 2 B 43.2 ... 3 B 42.1 ... 4 . . ... 1 C 24.1 ... 2 . . ... 3 C 23.2 ... 4 C 22.1 ... .... ... ---------------------------------- then I can fill up the "." with retain. Since I have about 300 stocks for 50 years data, this problem did drive me crazy. Do anyone of you guys has any idea on this? I'm an absolutely newbie, any suggestion would be 200% appreciated!! |
|
|
||||
|
||||
|
|
|
|||
|
How are dates actually represented? Presumably they express year, month, and day, but how?
How are weekends, holidays, and emergency closures to be handled? On Sunday, July 22, 2012 3:15:44 PM UTC-4, YANG LI wrote: > Hi guys, > > I'm doing stock data research recently and came across a problem. I think there must be some simple way to solve it but since I'm quite a newbie on SAS, it's a little bit difficult to figure out the google key words on the solutions. > > The problem I have is: > I have a dataset looks like: > ---------------------------------- > Date Stock Price ... > 1 A 10.1 ... > 2 A 10.2 ... > 4 A 11.1 ... > 1 B 45.1 ... > 2 B 43.2 ... > 3 B 42.1 ... > 1 C 24.1 ... > 3 C 23.2 ... > 4 C 22.1 ... > ... ... > ---------------------------------- > > and you can see that there're some gaps on the data, e.g. for stock A, day 3 is missing. > > What I wanna do is to fill up all these gaps so I can have a data set at least looks like: > > ---------------------------------- > Date Stock Price ... > 1 A 10.1 ... > 2 A 10.2 ... > 3 . . ... > 4 A 11.1 ... > 1 B 45.1 ... > 2 B 43.2 ... > 3 B 42.1 ... > 4 . . ... > 1 C 24.1 ... > 2 . . ... > 3 C 23.2 ... > 4 C 22.1 ... > ... ... > ---------------------------------- > > then I can fill up the "." with retain. > > Since I have about 300 stocks for 50 years data, this problem did drive me crazy. Do anyone of you guys has any idea on this? I'm an absolutely newbie, any suggestion would be 200% appreciated!! |
|
|||
|
Hi there,
Thanks for the reply. I appreciate it. Actually I'm calculating weekly returns for stocks (price of this Friday / price of last Friday), if the price of a specific Friday is missing, I willuse the last existing Friday price to substitute it. for example, for stock A, the records are: Date Stock Price 7/1/2011 A 10 <-------ASSUME IT'S FRIDAY 7/8/2011 A 10.2 <-------NEXT FRIDAY 7/15/2011 A 10.1 <-------NEXT FRIDAY 7/29/2011 A 9.9 <-------Here's a Friday missing The "date" is DMMYYYY format and I will only use Friday data. What I want to have is: Date Stock Price 7/1/2011 A 10 7/8/2011 A 10.2 7/15/2011 A 10.1 7/22/2011 A 10.1 <-------Observation Added 7/29/2011 A 9.9 The price of added observation is actually from the previous observation. I got no idea on how to add such observation, so it would be great if you can share some ideas, thanks! lol On Sunday, July 22, 2012 5:21:19 PM UTC-4, (unknown) wrote: > How are dates actually represented? Presumably they express year, month, and day, but how? > > How are weekends, holidays, and emergency closures to be handled? > > On Sunday, July 22, 2012 3:15:44 PM UTC-4, YANG LI wrote: > > Hi guys, > > > > I&#39;m doing stock data research recently and came across a problem. I think there must be some simple way to solve it but since I&#39;m quite a newbie on SAS, it&#39;s a little bit difficult to figure outthe google key words on the solutions. > > > > The problem I have is: > > I have a dataset looks like: > > ---------------------------------- > > Date Stock Price ... > > 1 A 10.1 ... > > 2 A 10.2 ... > > 4 A 11.1 ... > > 1 B 45.1 ... > > 2 B 43.2 ... > > 3 B 42.1 ... > > 1 C 24.1 ... > > 3 C 23.2 ... > > 4 C 22.1 ... > > ... ... > > ---------------------------------- > > > > and you can see that there&#39;re some gaps on the data, e.g. for stock A, day 3 is missing. > > > > What I wanna do is to fill up all these gaps so I can have a data set at least looks like: > > > > ---------------------------------- > > Date Stock Price ... > > 1 A 10.1 ... > > 2 A 10.2 ... > > 3 . . ... > > 4 A 11.1 ... > > 1 B 45.1 ... > > 2 B 43.2 ... > > 3 B 42.1 ... > > 4 . . ... > > 1 C 24.1 ... > > 2 . . ... > > 3 C 23.2 ... > > 4 C 22.1 ... > > ... ... > > ---------------------------------- > > > > then I can fill up the &quot;.&quot; with retain. > > > > Since I have about 300 stocks for 50 years data, this problem did drive me crazy. Do anyone of you guys has any idea on this? I&#39;m an absolutely newbie, any suggestion would be 200% appreciated!! |
|
|||
|
On Jul 23, 7:09*am, YANG LI <liyan...@gmail.com> wrote:
> Hi there, > > Thanks for the reply. I appreciate it. > > Actually I'm calculating weekly returns for stocks (price of this Friday / price of last Friday), if the price of a specific Friday is missing, I will use the last existing Friday price to substitute it. for example, for stock A, the records are: > > Date * * * Stock *Price > 7/1/2011 * *A * * *10 * * * * <-------ASSUME IT'S FRIDAY > 7/8/2011 * *A * * *10.2 * * * * <-------NEXT FRIDAY > 7/15/2011 * *A * * *10.1 * * * * <-------NEXT FRIDAY > 7/29/2011 * *A * * *9.9 * * * * <-------Here's a Fridaymissing > > The "date" is DMMYYYY format and I will only use Friday data. What I wantto have is: > > Date * * * Stock *Price > 7/1/2011 * *A * * *10 > 7/8/2011 * *A * * *10.2 > 7/15/2011 * *A * * *10.1 > 7/22/2011 * *A * * *10.1 * * * * * *<-------Observation Added > 7/29/2011 * *A * * *9.9 > > The price of added observation is actually from the previous observation. > > I got no idea on how to add such observation, so it would be great if youcan share some ideas, thanks! lol > > > > On Sunday, July 22, 2012 5:21:19 PM UTC-4, (unknown) wrote: > > How are dates actually represented? Presumably they express year, month, and day, but how? > > > How are weekends, holidays, and emergency closures to be handled? > > > On Sunday, July 22, 2012 3:15:44 PM UTC-4, YANG LI wrote: > > > Hi guys, > > > > > > I&#39;m doing stock data research recently and came across a problem. I think there must be some simple way to solve it but since I&#39;m quite a newbie on SAS, it&#39;s a little bit difficult to figure out the google key words on the solutions. > > > > > > The problem I have is: > > > I have a dataset looks like: > > > ---------------------------------- > > > Date * * Stock * * Price * * ... > > > 1 * * * * *A * * * *10.1 * * ... > > > 2 * * * * *A * * * *10.2 * * ... > > > 4 * * * * *A * * * *11.1 * * ... > > > 1 * * * * *B * * * *45.1 * * ... > > > 2 * * * * *B * * * *43.2 * * ... > > > 3 * * * * *B * * * *42.1 * * ... > > > 1 * * * * *C * * * *24.1 * * ... > > > 3 * * * * *C * * * *23.2 * * ... > > > 4 * * * * *C * * * *22.1 * * ... > > > ... ... > > > ---------------------------------- > > > > > > and you can see that there&#39;re some gaps on the data, e.g. for stock A, day 3 is missing. > > > > > > What I wanna do is to fill up all these gaps so I can have a data set at least looks like: > > > > > > ---------------------------------- > > > Date * * Stock * * Price * * ... > > > 1 * * * * *A * * * *10.1 * * ... > > > 2 * * * * *A * * * *10.2 * * ... > > > 3 * * * * *. * * * *. * * * *... > > > 4 * * * * *A * * * *11.1 * * ... > > > 1 * * * * *B * * * *45.1 * * ... > > > 2 * * * * *B * * * *43.2 * * ... > > > 3 * * * * *B * * * *42.1 * * ... > > > 4 * * * * *. * * * *. * * * *... > > > 1 * * * * *C * * * *24.1 * * ... > > > 2 * * * * *. * * * *. * * * *... > > > 3 * * * * *C * * * *23.2 * * ... > > > 4 * * * * *C * * * *22.1 * * ... > > > ... ... > > > ---------------------------------- > > > > > > then I can fill up the &quot;.&quot; with retain. > > > > > > Since I have about 300 stocks for 50 years data, this problem did drive me crazy. Do anyone of you guys has any idea on this? I&#39;m an absolutely newbie, any suggestion would be 200% appreciated!!- Hide quoted text - > > - Show quoted text - data have; input Date Stock $ Price; datalines; 1 A 10.1 2 A 10.2 4 A 11.1 1 B 45.1 2 B 43.2 3 B 42.1 1 C 24.1 3 C 23.2 4 C 22.1 ; proc transpose data = have out = have1; by stock; id date; var price; run; data have2 ; set have1; array _col _1 - _4 ; do over _col ; if _col = . then _col = 999 ; end; run; proc transpose data = have2 out = have3(rename = (_name_= date)); by stock; var _1 - _4 ; run; data want; set have3; if price = 999 then do ; stock = '' ; price = . ; end; date = compress(date, '_'); run; proc print data = want; var date stock price; run; Proc summary is more convinient to do it, you can try it by yourself. |
|
|||
|
Thanks! I will have a try!
On Monday, July 23, 2012 5:38:50 AM UTC-4, lchmdream wrote: > On Jul 23, 7:09*am, YANG LI <liyan...@gmail.com> wrote: > > Hi there, > > > > Thanks for the reply. I appreciate it. > > > > Actually I'm calculating weekly returns for stocks (price of this Friday / price of last Friday), if the price of a specific Friday is missing, I will use the last existing Friday price to substitute it. for example, for stock A, the records are: > > > > Date * * * Stock *Price > > 7/1/2011 * *A * * *10 * * * * <-------ASSUME IT'S FRIDAY > > 7/8/2011 * *A * * *10.2 * * * * <-------NEXT FRIDAY > > 7/15/2011 * *A * * *10.1 * * * * <-------NEXT FRIDAY > > 7/29/2011 * *A * * *9.9 * * * * <-------Here's a Friday missing > > > > The "date" is DMMYYYY format and I will only use Friday data. What I want to have is: > > > > Date * * * Stock *Price > > 7/1/2011 * *A * * *10 > > 7/8/2011 * *A * * *10.2 > > 7/15/2011 * *A * * *10.1 > > 7/22/2011 * *A * * *10.1 * * * * * *<-------Observation Added > > 7/29/2011 * *A * * *9.9 > > > > The price of added observation is actually from the previous observation. > > > > I got no idea on how to add such observation, so it would be great if you can share some ideas, thanks! lol > > > > > > > > On Sunday, July 22, 2012 5:21:19 PM UTC-4, (unknown) wrote: > > > How are dates actually represented? Presumably they express year, month, and day, but how? > > > > > How are weekends, holidays, and emergency closures to be handled? > > > > > On Sunday, July 22, 2012 3:15:44 PM UTC-4, YANG LI wrote: > > > &gt; Hi guys, > > > &gt; > > > &gt; I&amp;#39;m doing stock data research recently andcame across a problem. I think there must be some simple way to solve it but since I&amp;#39;m quite a newbie on SAS, it&amp;#39;s a little bit difficult to figure out the google key words on the solutions. > > > &gt; > > > &gt; The problem I have is: > > > &gt; I have a dataset looks like: > > > &gt; ---------------------------------- > > > &gt; Date * * Stock * * Price * * ... > > > &gt; 1 * * * * *A * * * *10.1 * * ... > > > &gt; 2 * * * * *A * * * *10.2 * * ... > > > &gt; 4 * * * * *A * * * *11.1 * * ... > > > &gt; 1 * * * * *B * * * *45.1 * * ... > > > &gt; 2 * * * * *B * * * *43.2 * * ... > > > &gt; 3 * * * * *B * * * *42.1 * * ... > > > &gt; 1 * * * * *C * * * *24.1 * * ... > > > &gt; 3 * * * * *C * * * *23.2 * * ... > > > &gt; 4 * * * * *C * * * *22.1 * * ... > > > &gt; ... ... > > > &gt; ---------------------------------- > > > &gt; > > > &gt; and you can see that there&amp;#39;re some gaps onthe data, e.g. for stock A, day 3 is missing. > > > &gt; > > > &gt; What I wanna do is to fill up all these gaps so I can have a data set at least looks like: > > > &gt; > > > &gt; ---------------------------------- > > > &gt; Date * * Stock * * Price * * ... > > > &gt; 1 * * * * *A * * * *10.1 * * ... > > > &gt; 2 * * * * *A * * * *10.2 * * ... > > > &gt; 3 * * * * *. * * * *. * * * *... > > > &gt; 4 * * * * *A * * * *11.1 * * ... > > > &gt; 1 * * * * *B * * * *45.1 * * ... > > > &gt; 2 * * * * *B * * * *43.2 * * ... > > > &gt; 3 * * * * *B * * * *42.1 * * ... > > > &gt; 4 * * * * *. * * * *. * * * *... > > > &gt; 1 * * * * *C * * * *24.1 * * ... > > > &gt; 2 * * * * *. * * * *. * * * *... > > > &gt; 3 * * * * *C * * * *23.2 * * ... > > > &gt; 4 * * * * *C * * * *22.1 * * ... > > > &gt; ... ... > > > &gt; ---------------------------------- > > > &gt; > > > &gt; then I can fill up the &amp;quot;.&amp;quot; with retain. > > > &gt; > > > &gt; Since I have about 300 stocks for 50 years data, this problem did drive me crazy. Do anyone of you guys has any idea on this? I&amp;#39;m an absolutely newbie, any suggestion would be 200% appreciated!!- Hide quoted text - > > > > - Show quoted text - > > data have; > input Date Stock $ Price; > datalines; > 1 A 10.1 > 2 A 10.2 > 4 A 11.1 > 1 B 45.1 > 2 B 43.2 > 3 B 42.1 > 1 C 24.1 > 3 C 23.2 > 4 C 22.1 > ; > > proc transpose data = have out = have1; > by stock; > id date; > var price; > run; > > data have2 ; > set have1; > array _col _1 - _4 ; > do over _col ; > if _col = . then _col = 999 ; > end; > run; > > proc transpose data = have2 out = have3(rename = (_name_= date)); > by stock; > var _1 - _4 ; > run; > > data want; > set have3; > if price = 999 then do ; > stock = '' ; price = . ; > end; > date = compress(date, '_'); > run; > > proc print data = want; > var date stock price; > run; > > > Proc summary is more convinient to do it, you can try it by yourself. |
|
|||
|
On Sunday, July 22, 2012 3:15:44 PM UTC-4, YANG LI wrote:
> Hi guys, > > I'm doing stock data research recently and came across a problem. I think there must be some simple way to solve it but since I'm quite a newbie on SAS, it's a little bit difficult to figure out the google key words on the solutions. > > The problem I have is: > I have a dataset looks like: > ---------------------------------- > Date Stock Price ... > 1 A 10.1 ... > 2 A 10.2 ... > 4 A 11.1 ... > 1 B 45.1 ... > 2 B 43.2 ... > 3 B 42.1 ... > 1 C 24.1 ... > 3 C 23.2 ... > 4 C 22.1 ... > ... ... > ---------------------------------- > > and you can see that there're some gaps on the data, e.g. for stock A, day 3 is missing. > > What I wanna do is to fill up all these gaps so I can have a data set at least looks like: > > ---------------------------------- > Date Stock Price ... > 1 A 10.1 ... > 2 A 10.2 ... > 3 . . ... > 4 A 11.1 ... > 1 B 45.1 ... > 2 B 43.2 ... > 3 B 42.1 ... > 4 . . ... > 1 C 24.1 ... > 2 . . ... > 3 C 23.2 ... > 4 C 22.1 ... > ... ... > ---------------------------------- > > then I can fill up the "." with retain. > > Since I have about 300 stocks for 50 years data, this problem did drive me crazy. Do anyone of you guys has any idea on this? I'm an absolutely newbie, any suggestion would be 200% appreciated!! I do this problem all the time on the WRDS system. It becomes even more challenging when you include dividends and accounting data. I am not going to assume your set is sorted, if it is you can ignore a bit of this. proc sort data=old_data out=new_data; by stock date; run; proc summary data=new_data min max; by stock; var date; output out=boundaries min=dmin max=dmax; run; data framework; set boundaries; by stock; do date=dmin to dmax; output; end; keep stock date; run; data complete_set; merge framework new_data; by stock date; run; |
|
|||
|
Here is one way. It uses two DATA step UPDATE statements, one to fill in missing observations and the other to do the last-obs.-carry-forward (thanks to Data Null, http://www.listserv.uga.edu/cgi-bin/...A&F=&S=&P=5140).
data have ; input Date date9. Stock $ Price ; format Date date9. ; datalines ; 27Jul2012 A 10.1 03Aug2012 A 10.2 24Aug2012 A 11.1 20Jul2012 B . 27Jul2012 B 43.2 03Aug2012 B 42.1 27Jul2012 C 24.1 ; proc sort data=have ; by stock date ; run ; proc summary data=have nway ; class stock ; var date ; output out=ends(drop = _ min=earliest max=latest ;run ; data skeleton ; set ends ; do date = earliest to latest by 7 ; output ; end ; drop earliest latest ; format Date date9. ; run ; data allobs ; update skeleton have ; by stock date ; retain constantkey 2468 ; groupfence = last.stock ; run ; data want (drop=constantkey groupfence) ; if groupfence then call missing(price) ; update allobs(obs=0 keep=constantkey) allobs ; by constantkey ; output ; run ; On Sunday, July 22, 2012 7:09:05 PM UTC-4, YANG LI wrote: > Hi there, > > Thanks for the reply. I appreciate it. > > Actually I'm calculating weekly returns for stocks (price of this Friday / price of last Friday), if the price of a specific Friday is missing, I will use the last existing Friday price to substitute it. for example, for stock A, the records are: > > Date Stock Price > 7/1/2011 A 10 <-------ASSUME IT'S FRIDAY > 7/8/2011 A 10.2 <-------NEXT FRIDAY > 7/15/2011 A 10.1 <-------NEXT FRIDAY > 7/29/2011 A 9.9 <-------Here's a Friday missing > > > The "date" is DMMYYYY format and I will only use Friday data. What I want to have is: > > Date Stock Price > 7/1/2011 A 10 > 7/8/2011 A 10.2 > 7/15/2011 A 10.1 > 7/22/2011 A 10.1 <-------Observation Added > 7/29/2011 A 9.9 > > The price of added observation is actually from the previous observation. > > I got no idea on how to add such observation, so it would be great if youcan share some ideas, thanks! lol > > > > > > On Sunday, July 22, 2012 5:21:19 PM UTC-4, (unknown) wrote: > > How are dates actually represented? Presumably they express year, month, and day, but how? > > > > How are weekends, holidays, and emergency closures to be handled? > > > > On Sunday, July 22, 2012 3:15:44 PM UTC-4, YANG LI wrote: > > &gt; Hi guys, > > &gt; > > &gt; I&amp;#39;m doing stock data research recently and cameacross a problem. I think there must be some simple way to solve it but since I&amp;#39;m quite a newbie on SAS, it&amp;#39;s a little bit difficult to figure out the google key words on the solutions. > > &gt; > > &gt; The problem I have is: > > &gt; I have a dataset looks like: > > &gt; ---------------------------------- > > &gt; Date Stock Price ... > > &gt; 1 A 10.1 ... > > &gt; 2 A 10.2 ... > > &gt; 4 A 11.1 ... > > &gt; 1 B 45.1 ... > > &gt; 2 B 43.2 ... > > &gt; 3 B 42.1 ... > > &gt; 1 C 24.1 ... > > &gt; 3 C 23.2 ... > > &gt; 4 C 22.1 ... > > &gt; ... ... > > &gt; ---------------------------------- > > &gt; > > &gt; and you can see that there&amp;#39;re some gaps on the data, e.g. for stock A, day 3 is missing. > > &gt; > > &gt; What I wanna do is to fill up all these gaps so I can have a data set at least looks like: > > &gt; > > &gt; ---------------------------------- > > &gt; Date Stock Price ... > > &gt; 1 A 10.1 ... > > &gt; 2 A 10.2 ... > > &gt; 3 . . ... > > &gt; 4 A 11.1 ... > > &gt; 1 B 45.1 ... > > &gt; 2 B 43.2 ... > > &gt; 3 B 42.1 ... > > &gt; 4 . . ... > > &gt; 1 C 24.1 ... > > &gt; 2 . . ... > > &gt; 3 C 23.2 ... > > &gt; 4 C 22.1 ... > > &gt; ... ... > > &gt; ---------------------------------- > > &gt; > > &gt; then I can fill up the &amp;quot;.&amp;quot; with retain. > > &gt; > > &gt; Since I have about 300 stocks for 50 years data, this problem did drive me crazy. Do anyone of you guys has any idea on this? I&amp;#39;m an absolutely newbie, any suggestion would be 200% appreciated!! |
|
|
![]() |
| Thread Tools | |
| Display Modes | |
|
|