Go Back   Rhinocerus > Newsgroup > Newsgroup comp.soft-sys.sas

Reply
 
Thread Tools Display Modes
  #1 (permalink)  
Old 05-05-2005, 05:01 PM
nospam@HOWLES.COM (Howard Schreier
Guest
 
Posts: n/a
Default Re: a conditional selection problem with cap

You've posted two addenda without quoting the earlier messages, which makes
things fragmented and hard to follow.

I suggest re-stating the problem. Also, why not provide a starting point by
attempting to code the solution, in psuedocode if not in SAS?

On Wed, 4 May 2005 16:30:28 -0700, Yiyu <shenyiyu@GMAIL.COM> wrote:

>Hi, here is a complicated problem in selecting observations:
>The data set is like this:
>
>Date Stock Signal Porfolio
>950501 A 0 0
>950501 B 1 1
>950501 C 0 0
>950501 D 1 1
>950501 E 0 0
>
>950502 A 1 1
>950502 B 1 1
>950502 C 1 0
>950502 D 1 1
>950502 E 0 0
>
>950503 A 1 1
>950503 B 1 1
>950503 C 1 0
>950503 D 1 1
>950503 E 1 0
>
>950504 A 0 0
>950504 B 1 1
>950504 C 0 0
>950504 D 0 0
>950504 E 1 1
>...................................
>...................................
>
>I want to build a portfolio which is essentially composed of all the
>Stocks with signal 1, but with a condition: the number of stocks in
>each date can't exceed a certain maximum, for example, 3, and more
>stocks with singal 1 can't be added to the portfolio until there's some
>vacant slots in the portfolio. So you can see at 950501, the portfolio
>has B and D in it, but at 950502, even though both A and C also become
>1, I can't put both of them into portfolio, instead I randomly select
>one stock into portfolio, suppose A is randomly selected. Now the
>portfolio is full, I don't add anthing until there's vacant in
>portfolio again. So at 950503 even though E also has a singal of 1 it
>cannot be added into the portfolio. At 950504 A and D are dropped out
>of the potfolio due to the singal 0, now since E still has a singal 1
>it's added into the portfolio.
>
>Could you please tell me how to implement this algorithm in SAS? Thank
>you very much!

Reply With Quote
Alt Today
Advertising
 
and become member of Rhinocerus
Standard Sponsored Links

  #2 (permalink)  
Old 05-05-2005, 10:14 PM
Richard A. DeVenezia
Guest
 
Posts: n/a
Default Re: Re: a conditional selection problem with cap

"Howard Schreier <hs AT dc-sug DOT org>" wrote:
> You've posted two addenda without quoting the earlier messages, which
> makes things fragmented and hard to follow.
>
> I suggest re-stating the problem. Also, why not provide a starting
> point by attempting to code the solution, in psuedocode if not in SAS?
>
> On Wed, 4 May 2005 16:30:28 -0700, Yiyu <shenyiyu@GMAIL.COM> wrote:
>
>> Hi, here is a complicated problem in selecting observations:
>> The data set is like this:
>>
>> Date Stock Signal Porfolio
>> 950501 A 0 0
>> 950501 B 1 1
>> 950501 C 0 0
>> 950501 D 1 1
>> 950501 E 0 0
>>
>> 950502 A 1 1
>> 950502 B 1 1
>> 950502 C 1 0
>> 950502 D 1 1
>> 950502 E 0 0
>>
>> 950503 A 1 1
>> 950503 B 1 1
>> 950503 C 1 0
>> 950503 D 1 1
>> 950503 E 1 0
>>
>> 950504 A 0 0
>> 950504 B 1 1
>> 950504 C 0 0
>> 950504 D 0 0
>> 950504 E 1 1
>> ...................................
>> ...................................
>>
>> I want to build a portfolio which is essentially composed of all the
>> Stocks with signal 1, but with a condition: the number of stocks in
>> each date can't exceed a certain maximum, for example, 3, and more
>> stocks with singal 1 can't be added to the portfolio until there's
>> some vacant slots in the portfolio. So you can see at 950501, the
>> portfolio has B and D in it, but at 950502, even though both A and C
>> also become 1, I can't put both of them into portfolio, instead I
>> randomly select one stock into portfolio, suppose A is randomly
>> selected. Now the portfolio is full, I don't add anthing until
>> there's vacant in portfolio again. So at 950503 even though E also
>> has a singal of 1 it cannot be added into the portfolio. At 950504 A
>> and D are dropped out of the potfolio due to the singal 0, now since
>> E still has a singal 1 it's added into the portfolio.
>>
>> Could you please tell me how to implement this algorithm in SAS?


Based on the criteria quoted by Howard I see this as a non-trivial task, not
amenable to a single pass solution in SQL.

The sample below, based on a generic interpretaion of the problem attacks it
using a slew of atypical constructs. (Slew = 2)

One. A double DOW loop
Two. Dual hashes

The solution amends the requirement by storing in the pool variable the
'age' of the item in the pool as it persists over groups.
A more fair algorithm might consider a time series of the signal as the
criteria for pool worthiness. At present, only the most current signal is
the only criteria for worthiness. (But some may stipulate that too heavy a
reliance on too long a time series can create unfair cliques of 'old boy'
networks -- also, whilst the aging active pool members may be gruntled, the
disgruntled items are those that missed the boat early on, and are
exhibiting equally long strings of signals as those in the boat)

-----------------------------
/*
* Problem statement:
* 0. A point has three attributes; itemIdentifier, signal and
groupIdentifier
* 1. Points with identical groupIdentifiers form a Group
* 2. Groups are ordered by groupIdentifier
* 3. A point is eligible for inclusion in an 'active pool' if its signal is
1
* 3b. A point is removed from the active pool if its signal is 0
* 4. The active pool persists across group iteration
* 4b. The active pool has a limit
* 5. Items in the active pool have preference over items newly eligible
(seniority)
* 6. When there are more items eligible than slots available in the active
pool
* then the eligible items enter the active pool by random selection via
k/n
*/

* this sample uses hash to perform lookup;
* if data is sensibly sized integers,
* then arrays and direct addressing could be used and would be faster;

%let seed = 0DEAFx;
%let nGroups = 1000;
%let nItems = 25;
%let POOLSIZE = 12;

data groupedItems;
do group = 1 to &nGroups;
do item = 1 to &nitems;
signal = rannor(&seed) > 0;
output;
end;
end;
run;

data itemsWithPoolIndicator ;

if _n_ = 1 then do;
declare hash AP(); * Active pool;
AP.defineKey ('item');
AP.defineData('age');
AP.defineDone();

declare hash C(); * Pool Candidates;
C.defineKey ('item');
C.defineData ('item');
C.defineDone();

declare hiter hi;
end;

* fill the groups candidate pool ;

do until (last.group);
set groupedItems (where=(not missing(item)));
by group;

* determine state;
_isActive = (AP.find() = 0);

* add items into candidate pool or remove from active pool;
if ^_isActive and signal then C.replace();
else
if _isActive and ^signal then AP.remove();
else
if _isActive and signal then AP.replace(key:item,data:age+1);
end;

* move random candidates into available pool slots;

_N = C.num_items;
_K = &POOLSIZE - AP.num_items;

if _K > 0 then do;
age = 1;
hi = _new_ hiter('C');
do while (hi.next()=0 and _N > 0);
if ranuni (&SEED) < _K / _N then do;
_K = _K - 1;
AP.add();
end;
_N = _N - 1;
end;
hi.delete();
end;

* mark items as in active pool;
do until (last.group);
set groupedItems (where=(not missing(item)));
by group;
if AP.find()=0 then pool=age;else pool=0;
OUTPUT;
end;

* clear the candidate hash;
*--- this loopage will be C.CLEAR() in v9.2;
hi = _new_ hiter('C');
_prior = .;
do while (hi.next()=0);
_rc = C.remove (key:_prior); * assignment prevents error message on
first iteration;
_prior=item;
end;
_rc = C.remove (key:_prior);
hi.delete();
*--- this loopage will be C.CLEAR() in v9.2;

drop _: ;
run;
-----------------------------

Richard A. DeVenezia -- Learn how to customize SAS Explorer
http://www.devenezia.com/downloads/sas/actions/




Reply With Quote
  #3 (permalink)  
Old 05-06-2005, 06:54 AM
Yiyu
Guest
 
Posts: n/a
Default Re: a conditional selection problem with cap

Thanks Richard.

But unfortunately the object dot syntax is only available in V 9.0,
while my SAS version is 8.2 Is there any other way to do it?

Thanks!
Yiyu

Reply With Quote
  #4 (permalink)  
Old 05-06-2005, 02:54 PM
Richard A. DeVenezia
Guest
 
Posts: n/a
Default Re: a conditional selection problem with cap

Yiyu wrote:
> Thanks Richard.
>
> But unfortunately the object dot syntax is only available in V 9.0,
> while my SAS version is 8.2 Is there any other way to do it?


Yiyu:

You can preprocess the data and assign your own numeric itemId to each item.
When the item identifier is a reasonably sized numeric, then arrays can be
used in place of hashing.

Friday puzzler:
This sample is correct except for one if statement, where the logic is
inverted. If you can fix the inversion you will have your answer *and* you
will understand the program quite well

---------------------------
/*
* 1. The are groups, sequentially ordered
* 2. A group contains items
* 3. In each group an item is eligible for inclusion in an 'active pool'
* if its signal attribute is 1
* 4. The active pool has a limit
* 5. Items active in previous group have preference over items first
* establishing eligibility (seniority if you will)
* 6. When the number of items eligible for entry into the active pool
* would cause the pool to exceed its limits, the items that enter
* the pool are randomly selected via k/n
*/

* this sample uses hash to perform lookup;
* if data is sensibly sized integers,
* then arrays and direct addressing could be used and would be faster;

%let seed = 0DEAFx;
%let nGroups = 1000;
%let nItems = 1000;
%let POOLSIZE = 500;

data groups;
do group = 1 to &nGroups;
output;
end;
run;

proc sql;
create table items (item char(6)) ;
create unique index item on items(item) ;
quit;

%let ranChar = byte(rank('A')+26*ranuni(&SEED));

data items(index=(item/unique));
do _n_ = 1 to &nItems;
item = &ranChar. || &ranChar. || &ranChar.
|| &ranChar. || &ranChar. || &ranChar.
;
output;
end;
stop;
run;

data groupedItems;
set groups;
do _i = 1 to _n;
set items nobs=_n point=_i;
signal = rannor(&SEED) > 0;
output;
end;
run;

* step 1. - create a mapping from a item to an itemId presuming there is no
prior knowledge of item values. If you have an apriori table of allowed item
values then work of that instead of the raw data ;

proc sort data=groupedItems(keep=item) nodupkey out=myItems;
by item;
run;

* use this if you have an apriori table;
* data myItems/view=myItems;
* set MASTER.ITEMS;
* run;

data myItemsFormat;
set myItems(rename=item=start);
retain fmtname '@TOID';
label+1;
run;

proc format cntlin=myItemsFormat;
run;

* step 2. - measure the data (or use your own apriori set limits);

proc sql noprint;
select count(*) into :nItems from myItems;
quit;

* step 3. - determine active pool at each group state
* (using arrays)
* presumes an item appears only once in a group
*;

data itemsWithPoolIndicator ;

retain poolsize 0;

array AP[&nItems] _temporary_;
array CP[&nItems] _temporary_;

do _i = 1 to dim(CP);
CP[_i] = 0;
end;

* fill the candidate pool;

candsize = 0;

do until (last.group);
set groupedItems (where=(not missing(item)));
by group;

itemId = input(item,TOID.);

* determine state;
_isActive = AP[itemId];

* add items into candidate pool or remove from active pool;
if ^_isActive and signal then do; CP[itemId]=1; candsize+ 1; end;
else
if _isActive and ^signal then do; AP[itemId]=0; poolsize+-1; end;
else
if _isActive and signal then do; AP[itemId]+1; end; * this is where a
dup item in a group would hurt the age accuracy;
end;

* move random candidates into available pool slots;

_N = candsize;
_K = &POOLSIZE - poolsize;

if _K > 0 then do;
do itemId = 1 to dim (CP) while (_N > 0);
if CP[itemId] then continue;

if ranuni (&SEED) < _K / _N then do;
_K = _K - 1;
AP[itemId]=1;
poolsize + 1;
end;
_N = _N - 1;
end;
end;

* mark item as in active pool;
do until (last.group);
set groupedItems (where=(not missing(item)));
by group;

itemId = input(item,TOID.);

pool = AP[itemId];

OUTPUT;
end;

drop _: itemId ;
run;
---------------------------

The 'level-up' in this type of problem is to use bit coding to track an
items membership in a set. Such tracking is only feasible if the pool is to
have no age memory. If you want to track age bitwise, then each item
requires N bits where the max age is known to be <= 2**N

--
Richard A. DeVenezia -- Learn how to customize SAS Explorer
http://www.devenezia.com/downloads/sas/actions/


Reply With Quote
  #5 (permalink)  
Old 05-06-2005, 03:03 PM
Yiyu
Guest
 
Posts: n/a
Default Re: a conditional selection problem with cap

Thanks Richard, seems it will be a good passtime for the weekend

Reply With Quote
  #6 (permalink)  
Old 05-08-2005, 10:24 PM
Yiyu
Guest
 
Posts: n/a
Default Re: a conditional selection problem with cap

Richard, I believe the inversion is here:

' if CP[itemId] then continue; '

Instead it should be if ^CP[itemId] then continue;

Correct?

Also what's the meaning of 'poolsize+-'? why both + and -?

Using the data created in your sample the code performs as expected.
But when I run it on my real data set, which has 3 million obs, the
behavior is quite strange. More specifically, 'group' here is date,
from 19600101 to 20031231, from around 1960 to 1963 the result is as
expected, but after 1963 the eligible obs selected into pool is fewer
and few, and after 1990 there 's almost only 1 obs in pool for each
day. This is obviously not correct since for each date after 1990 there
are at least 300-500 obs with hold = 1 and it's supposed for many days
pool = 100. I attach the code here:

libname yiyu '/sastemp3/yiyu';

* 1. The are groups, sequentially ordered
* 2. A group contains items
* 3. In each group an item is eligible for inclusion in an 'active
pool'
* if its signal attribute is 1
* 4. The active pool has a limit
* 5. Items active in previous group have preference over items first
* establishing eligibility (seniority if you will)
* 6. When the number of items eligible for entry into the active pool
* would cause the pool to exceed its limits, the items that enter
* the pool are randomly selected via k/n

%let seed = 0DEAFx;
%let nItems = 20000;
%let POOLSIZE = 100;

* there are about 16000 stocks, I want to build a portfolio with max
number of 100;
* permno is stock's id;
* variable Hold indicates the stock should be portfolio if no
restricitons ;


data yiyu.test99;
set yiyu.max_3;
keep date permno hold ini;
run;


proc sort data=yiyu.test99(keep=permno) nodupkey out=yiyu.myItems;
by permno;
run;

data yiyu.myItems;
set yiyu.myItems;
itemID = _N_; * create key for each stock;
run;

proc sort data=yiyu.test99;
by permno;
run;

proc sort data=yiyu.myItems;
by permno;
run;

data yiyu.test99;
merge yiyu.test99 yiyu.myItems;
by permno;
run;


* step 3. - determine active pool at each group state
* (using arrays)
* presumes an item appears only once in a group
*;

proc sort data=yiyu.test99;
by date permno;
run;

proc sort data= yiyu.test99;
by date;
run;

data yiyu.PoolIndicator ;

retain poolsize 0;

array AP[&nItems] _temporary_; * Active Pool;
array CP[&nItems] _temporary_; * Candidate Pool;

do _i = 1 to dim(CP);
CP[_i] = 0;
end;

* fill the candidate pool;

candsize = 0;

do until (last.date);
set yiyu.test99 ;
by date;


* determine state;
_isActive = AP[itemId];

* add items into candidate pool or remove from active pool;

if ^_isActive and hold then

do; CP[itemId]=1; candsize+ 1; end;
else
if _isActive and ^hold then do; AP[itemId]=0; poolsize+-1; end;
else
if _isActive and hold then do; AP[itemId]+1; end; * this is where
a
* dup item in a group would hurt the age accuracy;
end;

* move random candidates into available pool slots;

_N = candsize;
_K = &POOLSIZE - poolsize;

if _K > 0 then do;
do itemId = 1 to dim (CP) while (_N > 0);
if ^CP[itemId] then continue;

if ranuni(12345) < _K / _N then do;
_K = _K - 1;
AP[itemId]=1;
poolsize + 1;
end;
_N = _N - 1;
end;
end;

* mark item as in active pool;
do until (last.date);
set yiyu.test99 ;
by date;


pool = AP[itemId];
c_pool = CP[itemId];

OUTPUT;
end;

run;

Thanks a lot!
Yiyu

Reply With Quote
  #7 (permalink)  
Old 05-08-2005, 10:35 PM
Yiyu
Guest
 
Posts: n/a
Default Re: a conditional selection problem with cap

For code of core data step 'yiyu.PoolIndicator' I changed nothing but
the name of variables, and used merge instead of Proc Format to create
indices, it's really weird why the code performs correctly on simulated
data sets but not on my own data set.

Reply With Quote
  #8 (permalink)  
Old 05-09-2005, 01:43 AM
Richard A. DeVenezia
Guest
 
Posts: n/a
Default Re: a conditional selection problem with cap

Yiyu wrote:

> Instead it should be if ^CP[itemId] then continue;
> Correct?


Yes, correct.

> For code of core data step 'yiyu.PoolIndicator' I changed nothing but
> the name of variables, and used merge instead of Proc Format to create
> indices, it's really weird why the code performs correctly on
> simulated data sets but not on my own data set.



sort,data,sort,sort,merge,sort,sort is extraneous. A sort,data,sort would
work equally well, and you can check for dup permno within date.

proc sort data=yiyu.test99;
by permno date;
run;

data yiyu.test99;
set data yiyu.test99 end=end;
by permno date;
if first.permno then itemId + 1;
if first.date and not last.date then put permno= ' appears more than once
in ' date= ;
if end then call symput ('nItems', put(itemId,12.-L));
run;


proc sort data=yiyu.test99;
by date permno;
run;


Discussions of k/n in SAS go back at least ten years. Look for Dr. John
Whittigon threads. The Sep 15-24, 1999 thread was illuminating.
As for the problem, it is likely the slight miscoding of the k/n algorithm
is impacting you. The test ranuni() < k/n should be a <=, as in

if ranuni(&seed) <= _K/_N then do;
.....
end;

Let us know if this seems to fix the problem.

--
Richard A. DeVenezia


Reply With Quote
  #9 (permalink)  
Old 05-09-2005, 02:09 PM
Yiyu
Guest
 
Posts: n/a
Default Re: a conditional selection problem with cap

Hi, Richard, I figure out where the problem is and it's not related
with k/n, the problem is actually with the way of table-lookup. The
difference between the artificial sample in the code sample and my real
data is that in the artificial sample each group has every item, so
when we assign the key itemID to each item and look up them using
AP[itemID] we are sure we are looking at the status of exact same item.
If C is assign key 3 , then ' if ^AP[3] ' is always checking whether
C is in the pool. But in my real data each date has different groups of
permno, previous permno will often be dropped in later dates, it's
actually like this:

Date Permno Hold
19900101 10001 1
19900101 10002 0
19900101 10003 0

19900102 10001 0
19900102 10004 0
19900102 10005 1
19900102 10006 1

So following the code

_isActive = AP[itemId];

if ^_isActive

at 19900102, when ItemID = 3 we want to check if 10005 is active and
whether its Hold =1 , but actually we checked whether 10003 is active
and hold = 1, since 10003 already dropped out of sample, no wonder we
missed one obs which should have been added to pool.

And for the whole 19900102 by last.date we checked 4 obs, though we are
supposed to check 10001, 10004, 10005, 10006, we acutally checked
10001, 10002, 10003, 10004.


Since with date goes further, more earlier permno's are dropped out of
sample, no wonder the pool will be smaller and smaller, since we are
checking more and more permno's not in the group.

So instead of using if ^AP[ItemID] we actually need something like
^AP[permno], i.e. we need to directly check if the stock with the
permno is active, or hold =1, etc ..., put it another way, we need to
use permno as the key directly.

Any way to do this?

Thanks!

Yiyu

Reply With Quote
 
Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are Off
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Re: wierd selection problem nospam@HOWLES.COM (Howard Schreier Newsgroup comp.soft-sys.sas 0 05-15-2007 02:30 AM
Re: wierd selection problem Sigurd Hermansen Newsgroup comp.soft-sys.sas 0 05-14-2007 05:42 PM
Re: a sas problem bout regression David L Cassell Newsgroup comp.soft-sys.sas 0 04-23-2007 05:49 AM
Re: Usefulness of a text editor? Peter Crawford Newsgroup comp.soft-sys.sas 0 07-08-2005 03:22 PM
a conditional selection problem with cap Yiyu Newsgroup comp.soft-sys.sas 2 05-05-2005 12:45 AM



All times are GMT. The time now is 03:50 PM.


Copyright ©2009

LinkBacks Enabled by vBSEO 3.3.0 RC2 © 2009, Crawlability, Inc.