|
|||
|
On Tue, 28 Dec 2004 10:18:15 -0800, David L. Cassell
<cassell.david@EPAMAIL.EPA.GOV> wrote: >"James, Steve" <SPJ1@CDC.GOV> wrote: >> We have a problem that I'd like your help on. We have a huge data set >> which needs to be updated daily with patient records. Each record has >a >> patient ID and date of visit. Thus, there are multiple occurrences of >> both patient ID and date of visit. What we want to do is to add the >> incoming records to the existing data and then set a flag to designate >> which records are from the most recent visit. For example: .... >As I see it, the 'most recent visit' flag is your headache. It will >prevent you from finding a fast solution. No matter how you run this, >the software will have to find the relevant ID, then check the dates, >zero out the flag, and append the new data. > >Wouldn't it be easier just to drop the flag? Presumably you're then >using it as an _a_priori_ tool with a WHERE clause or IF statement, >to select out the most recent data on each value of ID. But you don't >need the flag for that. All you need is a SQL statement that picks out >the max value of DATE for each ID. Hi, At first, I thought that I cannot agree with David more. But then, I can see one instance when having the flag variable makes sense. This is when the flag is indexed, so that one can extract the most recent records for all the patients fast. Having index on id and date will not help in extracting the most recent records for all ID's, since different patients may have different most recent date. But as Steve indicated, the problem with the flag variable is that it takes whole a lot of time to update and he has to update *daily*. Well, I do agree with David, in that the flag variable method is crude -- it has to go. But then, what are the alternative designs that will keep the fast extraction speed and will improve the update speed? An alternative seems to be: First, data structure: (1) modify the big table so that it does not have the flag column. create an index on id and date. (2) create another table called mostRecent with two columns: id and date -- one record per id. Second, business rules: (1) when append, just append new records to the big table -- sas should automatically update the index. (2) but at the same time, update the other table, mostRecent, so that the date column has the most recent data for each id. With this design, updating should be fast, since you just update the smaller, mostRecent, table. For the big table, you just insert the records and sas will automatically update the index. When you extract the most recent records for each patient, just left (right) join the mostRecent table with the big table -- it will be an eqi- join, so that sas will utilize the index and should go fast, also. Just my two cents' worth. Cheers, Chang |
|
|
||||
|
||||
|
|
![]() |
| Thread Tools | |
| Display Modes | |
|
|
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Re: Update a table on itself | Sigurd Hermansen | Newsgroup comp.soft-sys.sas | 0 | 12-11-2006 02:10 PM |
| Re: Big Table / Little Table Update | Sigurd Hermansen | Newsgroup comp.soft-sys.sas | 0 | 01-06-2005 11:51 PM |
| Re: Big Table / Little Table Update | nospam@HOWLES.COM (Howard Schreier | Newsgroup comp.soft-sys.sas | 0 | 12-29-2004 01:38 AM |
| Re: Big Table / Little Table Update | James, Steve | Newsgroup comp.soft-sys.sas | 0 | 12-28-2004 10:15 PM |
| Re: Big Table / Little Table Update | Mike Rhoads | Newsgroup comp.soft-sys.sas | 0 | 12-28-2004 07:40 PM |