Hello,
Fortran 2003's new_line('a') returns a single character, namely "\n" (achar(10)). The question now is whether this should be transformed into "\r\n" when writing on Windows systems.

What should

  write(*,*) 'one'//new_line('a')//'two'

print: " one\ntwo\r\n" or " one\r\ntwo\r\n"?

Similarly for:

  open(11,file='form.dat',form='formatted',access='sequential',&
       position='rewind')
  write(11,'(a)') 'Record1'//new_line('a')//'Record2'
  close(11)
  open(11,file='stream.dat',form='formatted',access='stream',&
       position='rewind')
  write(11,'(a)') 'Record1'//new_line('a')//'Record2'
  close(11)

g95/Windows (27 August version) writes new_line as "\n", and writes "\r\n" after the record, for both sequential and stream formatted access. gfortran currently also writes "\n" for all, but will soon write "\r\n" for formatted streams. (The reasoning for formatted streams is given by Brooks Moses in http://gcc.gnu.org/ml/fortran/2006-09/msg00415.html )

What should be written for formatted sequential files? In principle I would assume "\r\n", but this would break the distinction between "one"//achar(13)//achar(10)//"two" and "one"//achar(10)//"two"; I don't know whether this is a problem or not. (For unformatted, this probably is a problem, and thus no such translation makes sense!?)

Tobias
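To see what a particular compiler actually writes in Tobias's formatted stream case, one can write the record and then reopen the file with unformatted stream access, which applies no record or newline processing on input, and look at the raw bytes. The module below is a minimal sketch under that assumption; the names newline_probe, write_probe and raw_bytes are hypothetical, newunit= needs a Fortran 2008 compiler, and the bytes you get will depend on the processor and platform.

```fortran
! Hypothetical helper module: write a string containing new_line('a')
! with formatted stream access, then read the file back as raw bytes.
module newline_probe
  implicit none
contains

  ! Write 'Record1'//new_line('a')//'Record2' in one formatted stream WRITE.
  subroutine write_probe(filename)
    character(*), intent(in) :: filename
    integer :: u
    open(newunit=u, file=filename, form='formatted', access='stream', &
         status='replace', action='write')
    write(u, '(a)') 'Record1'//new_line('a')//'Record2'
    close(u)
  end subroutine write_probe

  ! Return the file contents unchanged: unformatted stream input performs
  ! no record or newline processing.
  function raw_bytes(filename) result(bytes)
    character(*), intent(in) :: filename
    character(len=:), allocatable :: bytes
    integer :: u, n
    inquire(file=filename, size=n)   ! file storage units (bytes on most processors)
    allocate(character(len=max(n, 0)) :: bytes)
    open(newunit=u, file=filename, form='unformatted', access='stream', &
         status='old', action='read')
    if (n > 0) read(u) bytes
    close(u)
  end function raw_bytes

end module newline_probe
```

Printing iachar(b(i:i)) for each character of b = raw_bytes(...) shows the record terminators the processor chose; on a Unix-like system one would expect only 10s, while a Windows processor following the formatted stream rules may show 13 10 pairs.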
----
Tobias wrote:
(snip)
> What should be written for formatted sequential files? In principle I
> would assume "\r\n", but this will break the distinction between
> "one"//achar(13)//achar(10)//"two" and "one"//achar(10)//"two"; I don't
> know whether this is a problem or not. (For unformatted, this is
> probably a problem and thus no such translation makes sense!?)

For C text files the conversion is done, and for binary files it isn't. As many Fortran libraries use the C library to do I/O, that may or may not matter.

For achar(13)//achar(10) you would get "\r\r\n".

-- glen
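glen's "\r\r\n" remark can be checked against a small model of the output translation a C runtime applies to text-mode files on Windows: each LF is expanded to CR LF and every other byte passes through unchanged. This is a sketch of that rule only, not code taken from any actual runtime; the names textmode_model and to_text_mode are hypothetical.

```fortran
! Model of Windows text-mode output translation (illustrative only):
! each LF becomes CR LF; all other characters are copied unchanged.
module textmode_model
  implicit none
  character(len=1), parameter :: CR = achar(13), LF = achar(10)
contains

  pure function to_text_mode(s) result(t)
    character(*), intent(in) :: s
    character(len=:), allocatable :: t
    integer :: i
    t = ''
    do i = 1, len(s)
      if (s(i:i) == LF) then
        t = t // CR // LF      ! LF expands to the pair CR LF
      else
        t = t // s(i:i)        ! anything else, including CR, is copied as-is
      end if
    end do
  end function to_text_mode

end module textmode_model
```

Under this model 'one'//CR//LF//'two' becomes 'one'//CR//CR//LF//'two', while 'one'//LF//'two' becomes 'one'//CR//LF//'two', so the two inputs remain distinct in the translated output.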
----
Tobias wrote:
> Hello,
>
> Fortran 2003's new_line('a') returns a single character, namely "\n"
> (achar(10)). The question is now, whether this should be transformed
> into "\r\n" on writing on Windows systems or not.

Not. The idea that anything other than the actual characters in the I/O list be transmitted is rather odd. I can't find any justification for that in the language.

> What should
> write(*,*) 'one'//new_line//'two' print?
> " one\ntwo\r\n" or " one\r\ntwo\r\n" ?

Define "print". From the Fortran perspective, this outputs one record that happens to contain a control character (or two) within it. new_line returns achar(10) for the ASCII collating sequence, a single character. There's no justification I can think of to assume that embedded LF characters are intended to be record delimiters on platforms where LF by itself is not a record delimiter. (What would you do on VMS with its VAR-format records that don't have delimiters at all?)

I understand what you're trying to do, but it goes beyond what a Fortran processor ought to do, in my opinion. I suppose that it's too bad that the standard doesn't define NEW_LINE as returning the character sequence used for delimiting "formatted stream records", if any exists, but it doesn't.

Steve
----
Steve Lionel wrote:
> Tobias wrote:
>>Fortran 2003's new_line('a') returns a single character, namely "\n"
>>(achar(10)). The question is now, whether this should be transformed
>>into "\r\n" on writing on Windows systems or not.
>
> Not. The idea that anything other than the actual characters in the I/O
> list be transmitted is rather odd. I can't find any justification for
> that in the language.

For formatted stream I/O in the Fortran 2003 standard, it's pretty clearly there. The relevant section is 10.6.3:

"If the file is connected for stream access, the output may be split across more than one record if it contains newline characters. ... Beginning with the first character of the output field, each character that is not a newline is written to the current record in successive positions; each newline character causes file positioning at that point as if by slash editing...."

See also note 9.9, on page 175, which points out that due to things like CR/LF pairs, the file can have positions that do not correspond to characters written.

> Define "print". From the Fortran perspective, this outputs one record
> that happens to contain a control character (or two) within it.
> new_line returns achar(10) for the ASCII collating sequence, a single
> character. There's no justification I can think of to assume that
> embedded LF characters are intended to be record delimiters on
> platforms where LF by itself is not a record delimiter. (What would you
> do on VMS with its VAR-format records that don't have delimiters at
> all?)

The paragraph I quoted from section 10.6.3 goes on to say, "... (the current record is terminated at that point, a new empty record is created following the current record, this new record becomes the last and current record of the file, and the file is positioned at the beginning of this new record)." I think that's pretty clear for what should happen on VMS-like systems.

On the other hand, this is all quite clearly applied _only_ to formatted stream output, and I would agree that there hasn't been such a concept in previous Fortran standards. Thus, I would say that this sort of translation of NEW_LINE() characters into record breaks (which may be \n characters, \r\n characters, or some other non-delimiter form) should be applied to formatted stream output, and should not be applied to any other forms of output.

(One could, I suppose, argue that the relevant paragraph only says "may", and thus implementations are free to ignore it. I'd consider that an unfortunate choice, however.)

- Brooks

--
The "bmoses-nospam" address is valid; no unmunging needed.
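Brooks's reading of 10.6.3 can be tried out directly: write one output item containing new_line('a') to a formatted stream file and read it back with two '(a)' reads. If the processor treats the newline as if by slash editing, the reads return 'Record1' and 'Record2' as separate records. This is a minimal sketch assuming a compiler with Fortran 2003 formatted stream support (newunit= is Fortran 2008); the module and procedure names are hypothetical.

```fortran
! Hypothetical module: a newline inside one formatted stream output item
! should split the output into two records.
module stream_records
  implicit none
contains

  ! One output item with an embedded newline.
  subroutine write_with_newline(filename)
    character(*), intent(in) :: filename
    integer :: u
    open(newunit=u, file=filename, form='formatted', access='stream', &
         status='replace', action='write')
    write(u, '(a)') 'Record1'//new_line('a')//'Record2'
    close(u)
  end subroutine write_with_newline

  ! Read the first two records back with formatted stream input;
  ! short records are blank-padded into r1 and r2.
  subroutine read_two_records(filename, r1, r2)
    character(*), intent(in) :: filename
    character(*), intent(out) :: r1, r2
    integer :: u
    open(newunit=u, file=filename, form='formatted', access='stream', &
         status='old', action='read')
    read(u, '(a)') r1
    read(u, '(a)') r2
    close(u)
  end subroutine read_two_records

end module stream_records
```

The same two records would be expected from write(u, '(a,/,a)') 'Record1', 'Record2', which is the "as if by slash editing" equivalence the quoted text describes.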
----
Brooks Moses <bmoses-nospam@cits1.stanford.edu> wrote:
> On the other hand, the is all quite clearly applied _only_ to formatted
> stream output,

Yes. Sequential and stream have different rules here. For formatted sequential, it is processor-dependent whether control characters such as /cr and /lf are allowed at all. Thus, what happens to them is up to the processor. It could say that they are not allowed, period. It could say that allowing them is an extension that invokes special behavior (such as generating new records), or whatever. I've worked on systems where /cr, /lf, or any other character was just a normal character that had no particular relationship to record boundaries; if you wrote them in a record, you just got them in the record. That is hard to do on most current systems, which is why the standard says that processors may prohibit them.

For stream, the idea of the newline function is *NOT* to return whatever the processor uses to physically represent record ends. The processor might not use control characters at all for that purpose. Rather, the idea is to return something that can be used to indicate that the user wants a new record at that point. This flag value is converted to whatever is needed to make a new record.

For default character kind, the newline function is pretty pointless; it is essentially guaranteed to return achar(10). I'd have to look up the exact definition for "strange" character sets, but in the case of character sets that have a /lf character, that's what you will always get.

This is basically portable. You do not have to worry about what the physical representation on a particular system is. In fact, if you try to worry too much about that, you'll probably just mess things up. As noted, if you write /cr/lf on a Windows system, you'll get the wrong thing (/cr/cr/lf).

Note that this is modelled directly on C text files. That is intentional, as stream was introduced as an interoperability feature. It is true that it is one of the interoperability features that will find use even in contexts outside of interoperability, but the interoperability issues were definitely design drivers.

--
Richard Maine                     | Good judgment comes from experience;
email: my first.last at org.domain| experience comes from bad judgment.
org: nasa, domain: gov            |       -- Mark Twain
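In the spirit of Richard's advice, portable code builds multi-record text with new_line rather than a hard-coded achar(13)//achar(10) pair, and leaves the physical record terminator to the processor when the text is written to a formatted stream file. A minimal sketch; join_lines is a hypothetical helper, not an intrinsic.

```fortran
! Hypothetical helper: join trimmed lines with new_line('a') so the
! processor, not the program, decides how record ends are stored.
module join_text
  implicit none
contains

  pure function join_lines(lines) result(text)
    character(*), intent(in) :: lines(:)
    character(len=:), allocatable :: text
    integer :: i
    text = ''
    do i = 1, size(lines)
      if (i > 1) text = text // new_line('a')   ! one character, achar(10) for ASCII
      text = text // trim(lines(i))
    end do
  end function join_lines

end module join_text
```

For example, join_lines([character(len=5) :: 'one', 'two']) yields a 7-character string with a single newline between 'one' and 'two'; no CR is ever inserted by the program itself.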
----
Steve Lionel wrote:
> Tobias wrote:
>>Fortran 2003's new_line('a') returns a single character, namely "\n"
>>(achar(10)). The question is now, whether this should be transformed
>>into "\r\n" on writing on Windows systems or not.
> Not. The idea that anything other than the actual characters in the I/O
> list be transmitted is rather odd. I can't find any justification for
> that in the language.

On MVS you will get a '\n' character, maybe X'25', in the middle of the line. All 256 characters are allowed in any position on the line. I would agree, then, that one should not use '\n' when one wants a new record.

I find the idea of using characters as record marks rather odd, but it seems to be popular. Given that some systems do use in-band record marks, I don't find it obvious one way or the other what the result should be when they are actually written as characters.

-- glen
----
Richard E Maine wrote:
(snip)
> I've worked on systems where /cr, /lf, or any other character was just a
> normal character that had no particular relationship to record
> boundaries; if you wrote them in a record, you just got them in the
> record; hard to do that on most current systems, which is why the
> standard says that processors may prohibit them.

Having used systems like that for years before I saw a system using characters as record marks, the latter seemed very strange at first, and sometimes still does.

> For stream, the idea of the newline function is *NOT* to return whatever
> the processor uses to physically represent record ends. The processsor
> might not use control characters at all for that purpose. Rather, the
> idea is to return something that can be used to indicate that the user
> wants a new record at that point. This flag value is converted to
> whatever is needed to make a new record.

And if one wants to write that character on systems that don't use characters as record marks?

> For default character kind, the newline function is pretty pointless; it
> is essentially guaranteed to return achar(10). I'd have to look up the
> exact definition for "strange" character sets, but in the case of
> character sets that have a /lf character, that's what you will always
> get.

There aren't that many character sets. EBCDIC has CR, LF, and NL (carriage return, line feed, and new line) as three separate characters.

> This is basically portable. You do not have to worry about what the
> physical repreesentation on a particular system is. In fact, if you try
> to worry to much about that, you'll probably just mess things up. As
> noted, if you write /cr/lf on a Windows system, you'll get the wrong
> thing (/cr/cr/lf).
> Note that this is modelled directly on C text files. That is
> intentional, as stream was introduced as an interoperability feature. It
> is true that it is one of the interoperability features that will find
> use even in contexts outside of interoperability, but the
> interoperability issues were definitely design drivers.

C allows for two kinds of files, text and binary. For text files, '\n' is a record mark, as appropriate for the system in use. For binary files, it is a character and written as-is. C makes no guarantee on the value of '\n' (an integer constant), and one should not assume it is 10. Because of the possible conversion, C has restrictions on the use of fseek()/ftell() file positioning for text files.

-- glen
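Fortran 2003 has a counterpart to the C fseek()/ftell() restriction glen mentions: as I read the F2003 rules, in a formatted stream file a POS= value used for repositioning must be 1 or a value previously returned by INQUIRE (POS=), because record terminators such as CR LF mean positions need not match a count of characters written. The sketch below saves the start of the second record that way and later reads from it; the module and procedure names are hypothetical, and it assumes a compiler that supports POS= on formatted stream READ.

```fortran
! Hypothetical module: save a position with INQUIRE (POS=) and reuse it
! later, rather than computing positions from character counts.
module stream_positions
  implicit none
contains

  ! Write two records; return the position at which the second one starts.
  subroutine write_two(filename, pos2)
    character(*), intent(in) :: filename
    integer, intent(out) :: pos2
    integer :: u
    open(newunit=u, file=filename, form='formatted', access='stream', &
         status='replace', action='write')
    write(u, '(a)') 'first'
    inquire(unit=u, pos=pos2)   ! in file storage units; terminator size is processor-dependent
    write(u, '(a)') 'second'
    close(u)
  end subroutine write_two

  ! Reopen the file and read the record that starts at a saved position.
  function read_at(filename, pos) result(line)
    character(*), intent(in) :: filename
    integer, intent(in) :: pos
    character(len=80) :: line
    integer :: u
    open(newunit=u, file=filename, form='formatted', access='stream', &
         status='old', action='read')
    read(u, '(a)', pos=pos) line
    close(u)
  end function read_at

end module stream_positions
```

The value stored in pos2 may differ between a system that ends records with LF and one that uses CR LF, which is exactly why it is obtained from INQUIRE instead of being computed as len('first') + 2.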
----
glen herrmannsfeldt wrote:
> Richard E Maine wrote:
>
> (snip)
>
> There aren't that many character sets. EBCDIC has CR, LF, and NL
> (carriage return, line feed, and new line) as three separate characters.

as god intended. I always hated systems that interpreted LF as effectively CR+LF.

--
Gary Scott
mailto:garylscott@sbcglobal dot net

Fortran Library: http://www.fortranlib.com

Support the Original G95 Project: http://www.g95.org
-OR-
Support the GNU GFortran Project: http://gcc.gnu.org/fortran/index.html

Why are there two? God only knows.

If you want to do the impossible, don't hire an expert because he knows it can't be done. -- Henry Ford
----
Gary Scott wrote:
(snip, I wrote)
>> There aren't that many character sets. EBCDIC has CR, LF, and NL
>> (carriage return, line feed, and new line) as three separate characters.
> as god intended. I always hated systems that interpreted LF as
> effectively CR+LF.

The first terminal I ever used, before I had even heard of ASCII, was the IBM 2741, which I believe uses one character for newline. The character code actually used isn't EBCDIC but one based on the printing hardware of the IBM Selectric typewriter. Among others, there are characters for shift and unshift, which cause a 180 degree rotation of the type ball. It also has hardware tabs. That is, physical, mechanical tabs, not electronics pretending to be tabs.

I did some Fortran programming for 2741s, though probably using hex constants for special characters like newline and tab. Programming is done in EBCDIC, and the conversion is done sometime later.

-- glen
----
Gary Scott wrote:
(snip)
> I realize out it is actually defined, but the definition of unformatted
> also confuses many that think it describes a stream type of file (i.e.
> most think of it as devoid of record delimiters as well). This is the
> first I recall of someone refering to list directed files as
> "unformatted" but I can understand that confusion as well. I think the
> term "unformatted" isn't sufficiently clear.

I think the OP really meant 'UNFORMATTED' in that it is presumably possible to write ACHAR(10) as a character in an unformatted file. Later posts seem to have confused things.

As it is now October 15th in most of the world, and half the US, I will note that in IBM 704 Fortran, according to the manual dated October 15th, 1956, there are two types of I/O statements, described in terms of BCD and binary.

"The WRITE OUTPUT TAPE statement causes the object program to write BCD information on tape unit i, where i=1, 2, ..., 10. Record after record is written in accordance with the FORMAT statement until the complete list has been written."

Later:

"The WRITE TAPE statement causes the object program to write binary information on tape unit i, i=1, 2, ..., 10. Only one record is written; its length will be that of the list."

The distinction we now have between FORMATTED and UNFORMATTED was then between BCD and BINARY. BCD is a six-bit code, related to BCDIC, which was later extended to eight bits as EBCDIC.

-- glen
----
glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
> Richard E Maine wrote:
> > For stream, ... This flag value is converted to
> > whatever is needed to make a new record.
>
> And if one wants to write that character on systems that don't use
> characters as record marks?

You can't do it in formatted stream. I neglected to say "formatted stream" in the above, but I should have said it. If you are wanting to output arbitrary control characters, you probably ought to be using unformatted stream, for which no such funniness occurs.

--
Richard Maine                     | Good judgment comes from experience;
email: my first.last at org.domain| experience comes from bad judgment.
org: nasa, domain: gov            |       -- Mark Twain
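Richard's suggestion for arbitrary control characters, unformatted stream access, can be sketched as follows: the characters are transferred as-is, with no record markers added and no CR or LF translation. This assumes a processor whose file storage unit is one byte and whose default character occupies one byte (true of common compilers, but processor-dependent per the standard); the module and procedure names are hypothetical.

```fortran
! Hypothetical module: write and read control characters verbatim
! using unformatted stream access.
module raw_stream
  implicit none
contains

  ! Write s exactly as given: no record markers, no newline translation.
  subroutine write_raw(filename, s)
    character(*), intent(in) :: filename, s
    integer :: u
    open(newunit=u, file=filename, form='unformatted', access='stream', &
         status='replace', action='write')
    write(u) s
    close(u)
  end subroutine write_raw

  ! Read back the first n characters of the file, again without translation.
  function read_raw(filename, n) result(s)
    character(*), intent(in) :: filename
    integer, intent(in) :: n
    character(len=n) :: s
    integer :: u
    open(newunit=u, file=filename, form='unformatted', access='stream', &
         status='old', action='read')
    read(u) s
    close(u)
  end function read_raw

end module raw_stream
```

Writing 'one'//achar(13)//achar(10)//'two' this way should give an 8-byte file on such a processor, on Windows as well as elsewhere, and reading it back returns the same 8 characters.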
----
On Sat, 14 Oct 2006 22:35:24 -0700, glen herrmannsfeldt
<gah@ugcs.caltech.edu> wrote:

> Gary Scott wrote:
>
> (snip, I wrote)
>
> >> There aren't that many character sets. EBCDIC has CR, LF, and NL
> >> (carriage return, line feed, and new line) as three separate characters.
>
> > as god intended. I always hated systems that interpreted LF as
> > effectively CR+LF.
>
> The first terminal I ever used, and before I had even heard of ASCII,
> the IBM 2741, I believe uses one character for newline. The character
> code actually used isn't EBCDIC but one based on the printing hardware
> of the IBM selectric typewriter. Among others, there are characters
> for shift and unshift, which cause a 180 degree rotation of the type

Yep. I believe the code was PTTC; I saw a chart in a manual once. Multics made good(?) use of the shift/unshift codes: if your program ran more than some period of time without producing any output, which on a heavily loaded system happened all too often, it would send just shift+unshift to cause the typeball to 'twitch' without printing anything or changing the print position, to assure you the system hadn't crashed or been disconnected, as also happened all too often.

It didn't have an only-CR, though. If you wanted to move to the beginning of the same line, you did a LOT of backspaces. The underlying Selectric(tm) mechanism did have an only-LF, called 'index', but I don't recall if this was available from the line/remote/computer.

> ball. It also has hardware tabs. That is, physical, mechanical tabs,
> not electronics pretending to be tabs. I did some Fortran programming

APL also relied on these features, using a typeball with its special (Greek and math) symbols, and sending one tab to prompt for input, which was conventionally in IIRC column 7 or maybe 9.

> for 2741's, though probably using hex constants for special characters
> like newline and tab. Programming is done in EBCDIC, and the conversion
> is done sometime later.
>

After the program for output, and before it for input, of course. I'm pretty sure it was done in the 'communications controller' or later 'front end processor', an offloaded I/O controller. There were several models of these, but the one I remember is the 3741, because it is the one for which the (8") floppy disk was created, which had an impact hugely outweighing and outliving its origin, even more so than the 1403 printer outstripped the 1401 computer it was designed for.

- David.Thompson1 at worldnet.att.net