Given the following declarations and code fragment
-----
character(1), dimension(8) :: t
character(1), dimension(5) :: p
real, dimension(6) :: r1
real, dimension(4) :: r2
data r1/1.0, 2.0, 3.0, 4.0, 5.0, 6.0/
data r2/41.0, 42.0, 43.0, 44.0/
t = 'ABCDEFGH'
p = 'abcde'
[...]
t(1:5) = p(1:5)
t(2:6) = p
t(2:6) = p(:)

r1(1:4) = r2(1:4)
r1(3:6) = r2
-----

Is it just my compiler being weird, or is there a rationale that explains why these assignments should behave differently for the character array than they do for the corresponding real array assignments?

I observe that the "character" assignments always copy only the first character element from the source into _every_ given element of the target array, while in the real variable case multiple values are copied instead (not just the first value).

In other words, after the above t contains 'Aaaaaa' and not [maybe] the expected 'Aabcde', while r1 does contain (as expected) /41,42,41,42,43,44/ and NOT the character array equivalent of /41,41,41,41,41,41/.

FWIW, it doesn't seem to matter whether I change the declaration to some other style like one of

    character(1) p*5
    character(1), dimension(5) :: p
    character(5) :: p

--
-+-Ben-+-
> t = 'ABCDEFGH'
> p = 'abcde'

These initialize _all_ elements of t and p to "A" and "a", respectively. This follows the rules of assigning a scalar to an array.

Perhaps you wanted to say

    t = (/"A", "B", "C", "D", "E", "F", "G", "H"/)

and similarly for p?

Jan
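As a minimal sketch of the point above (the program name is made up for illustration, and the declarations are taken from the original post), the following contrasts assigning a scalar string to a character array with assigning an array constructor. The output comments assume a standard-conforming compiler.

```fortran
program scalar_vs_constructor
  implicit none
  character(1), dimension(8) :: t
  character(1), dimension(5) :: p

  ! Scalar on the right-hand side: the string is truncated to the element
  ! length (1) and that single character is stored in every element.
  t = 'ABCDEFGH'
  print '(8a1)', t        ! AAAAAAAA

  ! Array constructor: one value per element.
  t = (/ 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H' /)
  p = (/ 'a', 'b', 'c', 'd', 'e' /)

  ! Array section assignment now copies element by element,
  ! just as it does for the real arrays r1 and r2.
  t(2:6) = p
  print '(8a1)', t        ! AabcdeGH
end program scalar_vs_constructor
```

With the constructor in place, the section assignments from the original post behave the same way for the character arrays as for the real arrays.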
Ben Hetland <ben.a.hetland@sintef.no> wrote:
> character(1), dimension(8) :: t
> character(1), dimension(5) :: p
....
> t(1:5) = p(1:5)
> t(2:6) = p
> t(2:6) = p(:)

These ought to be ok and do what you think. As a wild guess, it might be that the compiler is misparsing the section subscripts as substring ranges. Hmm. Probably not that good a guess, though, because then the middle one shouldn't compile. Well, in any case, the code looks ok to me.

> FWIW, it doesn't seem to matter whether I change the declaration to some
> other style like one of
>
> character(1) p*5
> character(1), dimension(5) :: p
> character(5) :: p

These are more than just different styles; they are different in substance. If you get these things mixed up, you will have problems.

The first and third declare scalar character strings of length 5. (In the first one, the *5 overrides the (1); I don't like that style because I think it confusing, but it is allowed.) The middle one declares an array of length 1 strings. The difference here matters. If you get that confused, you'll be confused by the results.

--
Richard Maine                    | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle           |  -- Mark Twain
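To make the difference between the three declarations concrete, here is a small sketch. The variable names s1, a and s2 are invented for illustration; the three declaration forms are the ones quoted above.

```fortran
program string_vs_array
  implicit none
  character(1) :: s1*5              ! scalar string of length 5; the *5 overrides the (1)
  character(1), dimension(5) :: a   ! array of five strings, each of length 1
  character(5) :: s2                ! scalar string of length 5

  print *, len(s1), len(a), len(s2) ! lengths: 5, 1, 5
  print *, size(a)                  ! number of array elements: 5

  s2 = 'abcde'                      ! fills the whole string
  a  = 'abcde'                      ! truncated to 'a', then stored in every element
  print '(a)', s2                   ! abcde
  print '(5a1)', a                  ! aaaaa
end program string_vs_array
```

The len and size inquiries are a quick way to check which of the two things a given declaration in old code actually produces.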
Jan Vorbrüggen <jvorbrueggen@not-mediasec.de> wrote:
> > t = 'ABCDEFGH'
> > p = 'abcde'
>
> These initialize _all_ elements of t and p to "A" and "a", respectively.
> This follows the rules of assigning a scalar to an array.
>
> Perhaps you wanted to say
>
> t = (/"A", "B", "C", "D", "E", "F", "G", "H"/)
>
> and similarly for p?

Oops. I missed that, while concentrating on the declarations and the other assignments. That's probably the OP's problem.

Let me reemphasize that it is important to distinguish between arrays and strings. You'll get many things wrong (such as this) if you confuse them. In most cases, Fortran character data are more naturally declared as strings rather than arrays. If you try to make them arrays, you'll get messes like having to use the array constructor as Jan shows above. I/O of character arrays also won't be as convenient.

There are some cases where arrays of character*1 data are convenient, but they are more the exception than the rule. You ought to have a concrete reason (more than just "you like to think of them that way") to go down that path or you will find the language hindering you.

The F2003 C interop stuff has special provisions to deal with the fact that C character data is stored in arrays, C not having anything directly like Fortran character strings. Even when interoperating with C, you usually want to keep the Fortran character data in strings instead of arrays - at least if you do much of anything with it in Fortran.

--
Richard Maine
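A rough sketch of what keeping the data in a string while still talking to C can look like. It assumes an F2003 compiler and that the C library's strlen can be linked, which is the usual case; the program name and the variable str are made up for illustration. The dummy argument is an array of C characters, while the actual argument is an ordinary Fortran string.

```fortran
program c_string_demo
  use, intrinsic :: iso_c_binding, only: c_char, c_size_t, c_null_char
  implicit none

  interface
    ! Interface to the C library function size_t strlen(const char *s).
    function strlen(s) bind(C, name='strlen') result(n)
      import :: c_char, c_size_t
      character(kind=c_char), dimension(*), intent(in) :: s
      integer(c_size_t) :: n
    end function strlen
  end interface

  ! The data lives in a scalar string, not in an array of characters.
  character(kind=c_char, len=6) :: str

  str = c_char_'hello' // c_null_char   ! C expects zero termination
  print *, strlen(str)                  ! 5
end program c_string_demo
```

The explicit c_null_char is the part that is easy to forget, since Fortran strings are blank padded rather than zero terminated.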
Ben:
It works for me if I change to

    Character(8):: T
    Character(5):: p

Michael
Richard Maine wrote:
> Jan Vorbrüggen <jvorbrueggen@not-mediasec.de> wrote:
[...]
>>> t = 'ABCDEFGH'
>>> p = 'abcde'
>> These initialize _all_ elements of t and p to "A" and "a", respectively.
>> This follows the rules of assigning a scalar to an array.

Yes, further investigation in the (old) code I'm debugging proved that comment of yours quite true ...

>> Perhaps you wanted to say
>>
>> t = (/"A", "B", "C", "D", "E", "F", "G", "H"/)
>>
>> and similarly for p?
>
> Oops. I missed that, while concentrating on the declarations and the
> other assignments. That's probably the OP's problem.

Yes, that's probably what I have mixed up in my understanding of the language. In the source code I'm working with I find all these "styles" in both older and newer variants, probably somewhat mixed up there too, so I missed the fine distinction there (again, for about the Nth time...:-( ).

It probably doesn't help knowing C and C++ too well and observing that the different Fortran variants /appear/ to behave exactly the same when output with a WRITE statement.

> Let me reemphasize that it is important to distinguish between arrays
> and strings. You'll get many things wrong (such as this) if you confuse
> them.

Yes definitely! You are so right.

But to my (poor) defence, it doesn't make the situation easier when there are too many choices of declaration syntax, AND that the array indexing/range syntax is quite acceptable on a character string that ISN'T an array:

    character t3*8
    character(1) p*5
    t3(2:6) = p

;-)

> There are some cases where arrays of character*1 data are convenient,
> but they are more the exception than the rule. You ought to have a
> concrete reason (more than just "you like to think of them that way") to
> go down that path or you will find the language hindering you.

How about "I want to index/update individual characters and ranges of them"?

It seems something like that could have been the reason way back, as a single character string/array has been used to contain 2 strings and several individual characters. In modern language I'd guess a TYPE declaration with the aforementioned as members would have been a better choice ... (Or maybe done that way to avoid internal padding as these also go directly into binary files. One might speculate.)

> The F2003 C interop stuff has special provisions to deal with the fact
> that C character data is stored in arrays, C not having anything
> directly like Fortran character strings. Even when interoperating with
> C, you usually want to keep the Fortran character data in strings
> instead of arrays - at least if you do much of anything with it in
> Fortran.

Yes, and neither to C or Fortran does it seem to matter much in the way of transferring data as they both are just continuous storage of character elements anyway. There is more hassle related to keeping track of the right kind of string length determination and padding vs. zero termination. And C++ adds another complexity with its std::string class...

--
-+-Ben-+-
michael@athenavisual.com wrote:
> It works for me if I change to
>
> Character(8):: T
> Character(5):: p

Ah yes, thanks!

I just discovered that it "works" if both are declared as

    character varname*N

(N = numeric) too, but that should be equivalent if I understand Richard's comment correctly.

--
-+-Ben-+-
Ben Hetland <ben.a.hetland@sintef.no> wrote:
> But to my (poor) defence, it doesn't make the situation easier when
> there are too many choices of declaration syntax, AND that the array
> indexing/range syntax is quite acceptable on a character string that
> ISN'T an array:

There are many parts of the Fortran standard you won't understand unless you know the history of the standard(s).

CHARACTER variables and the substring operation were added in Fortran 77; array expressions were added later. It might be that things would have been done differently if they weren't added in that order.

Fortran went through a few versions before the first ANSI standard in 1966, and then four newer versions of the standard since 1966. C only has the K&R version and two ANSI versions.

-- glen
Ben Hetland <ben.a.hetland@sintef.no> wrote:
> Richard Maine wrote:
> > There are some cases where arrays of character*1 data are convenient,
> > but they are more the exception than the rule. You ought to have a
> > concrete reason (more than just "you like to think of them that way") to
> > go down that path or you will find the language hindering you.
>
> How about "I want to index/update individual characters and ranges of
> them"?

You can do that with strings as well. In fact, you could do ranges (substrings) in f77, while array slices didn't appear until f90.

One reason I've seen arrays of char*1 used is for dynamic allocation. Not until f2003 can you have allocatable string length.

> > The F2003 C interop stuff has special provisions to deal with the fact
> > that C character data is stored in arrays..

> Yes, and neither to C or Fortran does it seem to matter much in the way
> of transferring data as they both are just continuous storage of
> character elements anyway.

Well... not strictly necessarily. That's basically exactly what the special C interop provision ensures - that a string of length N is stored as a sequence of characters in the same way as an array of size N. Although I know of no contrary implementations, that is not required in all cases for Fortran. It is required for default character kind. The special provision also requires it for the C character kind. However, it is not required for all character kinds. Imagine multi-byte character sets with internal codes to switch subsets. I think that provision might have been targeting oriental character sets. I'm not aware of any compilers that take advantage of it, but then I'm not actually aware of any compilers that even have more than one character kind. This might be because I don't know much about compilers used in the orient.

--
Richard Maine
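A short sketch of both points, using scalar strings throughout. The names t and p follow the original post, buf and the program name are invented here, and the last part assumes a compiler with F2003 deferred-length allocatable strings.

```fortran
program substring_demo
  implicit none
  character(8) :: t
  character(5) :: p
  character(:), allocatable :: buf   ! deferred length (Fortran 2003)

  t = 'ABCDEFGH'
  p = 'abcde'

  t(2:6) = p            ! substring assignment, available since f77
  print '(a)', t        ! AabcdeGH

  t(1:1) = 'z'          ! update a single character
  print '(a)', t        ! zabcdeGH

  buf = t(2:6) // '!'   ! allocated on assignment with the length of the result
  print *, len(buf)     ! 6
end program substring_demo
```

Note that t(2:6) here is a substring of a scalar, not an array section, even though the syntax looks the same.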
Richard Maine wrote:
>> Yes, and neither to C or Fortran does it seem to matter much in the way
>> of transferring data as they both are just continuous storage of
>> character elements anyway.
>
> Well... not strictly necessarily.

Probably strictly right for all mentioned languages, i.e. the "not necessarily" part. The only thing C and C++ require is that if you increment a pointer to a char by 1, it must point to the next character in the array. Also sizeof(char) is exactly 1. So on an odd machine, maybe they contain some internal padding that probably isn't easily accessible.

> That's basically exactly what the special C interop provision ensures
> - that a string of length N is stored as a sequence of characters in
> the same way as an array of size N. [...] However, it
> is not required for all character kinds. Imagine multi-byte character
> sets with internal codes to switch subsets.

Regarding the multi-byte character stuff, a similar (but not exactly the same) alternative is using 16-bit entities storing UTF16 (a subset of UNICODE). C++ got the new type 'wchar_t' to handle such cases. I was wondering what the equivalent in Fortran would be.

Since Fortran has the (brilliant) feature of actually being able to specify the 'kind' (or size) along with the type specification in a declaration, one is tempted to conclude that the solution is very direct and simple:

    character(1)::asciitxt*32 ! for "old-fashioned" extended ASCII
    character(2)::utf16txt*32 ! like wchar_t in C++

and the second would hold a string of 32 16-bit elements. However, according to your previous answer in this thread, it doesn't, since the '*32' part would override the '(2)' part.

Another attempt could then be:

    character(2),dimension(32)::utf16txt

but then we would be back to the annoyances with the array vs. string handling that I originally had a problem with.

--
-+-Ben-+-
> character(1)::asciitxt*32 ! for "old-fashioned" extended ASCII
> character(2)::utf16txt*32 ! like wchar_t in C++

I'm not sure about the syntax, but it would be something like

    character (kind=ascii) :: asciitxt*32
    character (kind=utf16) :: utf16txt*32

with suitable definitions for the constants ascii and utf16.

Jan
Ben Hetland <ben.a.hetland@sintef.no> wrote:
> Richard Maine wrote:
> >> Yes, and neither to C or Fortran does it seem to matter much in the way
> >> of transferring data as they both are just continuous storage of
> >> character elements anyway.
> >
> > Well... not strictly necessarily.
>
> Probably strictly right for all mentioned languages

Ah. I started a reply to this, but then after reading the rest of the paragraph I realized that I initially misread what you said here and that you were agreeing with me instead of disagreeing.

> character(1)::asciitxt*32 ! for "old-fashioned" extended ASCII
> character(2)::utf16txt*32 ! like wchar_t in C++
>
> and the second would hold a string of 32 16-bit elements. However,
> according to your previous answer in this thread, it isn't since the
> '*32' part would override the '(2)' part.

That's just because the 1 and 2 above specify length rather than kind. There are 2 type parameters for character - kind and length. It is slightly odd that the length parameter is the first one when you specify them positionally. There are 2 reasons for that.

1. Probably the biggest actual reason. F77 had only a length type parameter (in fact, that was before the concept of type parameter had been generalized; character length was a distinct thing unlike any other). For compatibility with lots and lots of f77 code, character(n) has to be interpreted as a length of n rather than a kind of n.

2. Secondarily. It is hugely more common to specify length in practice, particularly since no compilers exist that I know of (recall previous caveat about the limits of my knowledge on this) that actually have more than one character kind. Thus you find almost no real code specifying character kind, but almost all real code specifies character length. This somewhat argues for making length the one that is syntactically simplest to specify. But this is probably just rationalization, as (1) would have been an overriding consideration anyway.

Jan already mentioned the solution, which I repeat here for completeness. I wanted to add the above explanation. The solution is to specify

    character(kind=whatever) ...

instead of

    character(whatever)

And, of course, I'd recommend using named constants instead of hardwired 1 and 2 in real codes, but I'm sure you appreciate that point and were just simplifying for rhetorical purposes.

The f2003 selected_char_kind intrinsic would probably be of use here. You can get ascii with selected_char_kind('ascii'), and ISO10646/UCS-4 with selected_char_kind('iso_10646'). F2003 doesn't define a specific name for utf16, but that's a natural extension to selected_char_kind.

--
Richard Maine                     | Good judgment comes from experience;
email: my first.last at org.domain| experience comes from bad judgment.
org: nasa, domain: gov            |  -- Mark Twain
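The kind versus length distinction can be sketched as follows. The constant and variable names are chosen for illustration, and the example assumes an F2003 compiler that supports the 'iso_10646' character set; where it does not, selected_char_kind returns -1 and the declaration of widetxt will not compile.

```fortran
program char_kind_demo
  implicit none
  ! Named kind constants instead of hardwired numbers.
  integer, parameter :: ascii = selected_char_kind('ascii')
  integer, parameter :: ucs4  = selected_char_kind('iso_10646')

  character(2) :: oldstyle*32                 ! positional (2) is a length; *32 overrides it
  character(kind=ascii, len=32) :: asciitxt   ! length 32, ASCII kind
  character(kind=ucs4,  len=32) :: widetxt    ! length 32, ISO 10646 kind

  print *, len(oldstyle), len(asciitxt), len(widetxt)       ! 32 32 32
  print *, kind(asciitxt) == ascii, kind(widetxt) == ucs4   ! T T
end program char_kind_demo
```

Spelling out kind= and len= as keywords avoids any dependence on the positional order of the two type parameters.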