  #1 (permalink)  
Old 09-18-2006, 02:38 PM
Ben Hetland
Guest
 
Posts: n/a
Default Array range assignments and characters

Given the following declarations and code fragment

-----
character(1), dimension(8) :: t
character(1), dimension(5) :: p
real, dimension(6) :: r1
real, dimension(4) :: r2
data r1/1.0, 2.0, 3.0, 4.0, 5.0, 6.0/
data r2/41.0, 42.0, 43.0, 44.0/
t = 'ABCDEFGH'
p = 'abcde'
[...]
t(1:5) = p(1:5)
t(2:6) = p
t(2:6) = p(

r1(1:4) = r2(1:4)
r1(3:6) = r2
-----

Is it just my compiler being weird, or is there a rationale that
explains why these assignments should behave differently for the
character array than they do for the corresponding real array assignments?

I observe that the "character" assignments always copy only the first
character element from the source into _every_ given element of the
target array, while in the real variable case multiple values are copied
instead (not just the first value).

In other words, after the above t contains 'Aaaaaa' and not [maybe] the
expected 'Aabcde', while r1 does contain (as expected)
/41,42,41,42,43,44/ and NOT the character array equivalent of
/41,41,41,41,41,41/.

FWIW, it doesn't seem to matter whether I change the declaration to some
other style like one of

character(1) p*5
character(1), dimension(5) :: p
character(5) :: p


--
-+-Ben-+-

  #2 (permalink)  
Old 09-18-2006, 03:35 PM
Jan Vorbrüggen
Guest
 
Posts: n/a
Default Re: Array range assignments and characters

> t = 'ABCDEFGH'
> p = 'abcde'


These initialize _all_ elements of t and p to "A" and "a", respectively.
This follows the rules of assigning a scalar to an array.

Perhaps you wanted to say

t = (/"A", "B", "C", "D", "E", "F", "G", "H"/)

and similarly for p?
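
A minimal sketch of the difference, assuming any F90-or-later compiler (the program name is made up for illustration). The scalar string is truncated to length 1 and then broadcast to every element, whereas the array constructor supplies one value per element:

```fortran
program scalar_vs_constructor
  implicit none
  character(1), dimension(8) :: t
  character(1), dimension(5) :: p

  ! Scalar assigned to an array of length-1 strings: the scalar is
  ! truncated to 'A' and copied into every element.
  t = 'ABCDEFGH'
  print '(8a1)', t        ! AAAAAAAA

  ! Array constructors give each element its own value.
  t = (/ 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H' /)
  p = (/ 'a', 'b', 'c', 'd', 'e' /)

  ! Now the section assignment copies element by element.
  t(2:6) = p
  print '(8a1)', t        ! AabcdeGH
end program scalar_vs_constructor
```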

Jan
  #3 (permalink)  
Old 09-18-2006, 03:37 PM
Richard Maine
Guest
 
Posts: n/a
Default Re: Array range assignments and characters

Ben Hetland <ben.a.hetland@sintef.no> wrote:

> character(1), dimension(8) :: t
> character(1), dimension(5) :: p

....
> t(1:5) = p(1:5)
> t(2:6) = p
> t(2:6) = p(


These ought to be ok and do what you think. As a wild guess, it might be
that the compiler is misparsing the section subscripts as substring
ranges. Hmm. Probably not that good a guess, though, because then the
middle one shouldn't compile.

Well, in any case, the code looks ok to me.

> FWIW, it doesn't seem to matter whether I change the declaration to some
> other style like one of
>
> character(1) p*5
> character(1), dimension(5) :: p
> character(5) :: p


These are more than just different styles; they are different in
substance. If you get these things mixed up, you will have problems.

The first and third declare scalar character strings of length 5. (In
the first one, the *5 overrides the (1); I don't like that style because
I think it confusing, but it is allowed). The middle one declares an
array of length 1 strings.

The difference here matters. If you get that confused, you'll be
confused by the results.
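
A small sketch of how the three declarations differ, under the assumption of an F90-or-later compiler. The names p1, p2 and p3 are invented here so that all three forms can coexist in one program:

```fortran
program string_vs_array
  implicit none
  character(1) p1*5                   ! scalar string, length 5 (*5 overrides the (1))
  character(1), dimension(5) :: p2    ! array of five length-1 strings
  character(5) :: p3                  ! scalar string, length 5

  p1 = 'abcde'
  p2 = 'abcde'     ! scalar broadcast: every element becomes 'a'
  p3 = 'abcde'

  ! len() is the string length, size() the number of array elements.
  print *, len(p1), len(p2), size(p2), len(p3)   ! values 5, 1, 5, 5

  print '(a)', p1       ! abcde
  print '(5a1)', p2     ! aaaaa
  print '(a)', p3       ! abcde
end program string_vs_array
```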

--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain
  #4 (permalink)  
Old 09-18-2006, 03:50 PM
Richard Maine
Guest
 
Posts: n/a
Default Re: Array range assignments and characters

Jan Vorbrüggen <jvorbrueggen@not-mediasec.de> wrote:

> > t = 'ABCDEFGH'
> > p = 'abcde'

>
> These initialize _all_ elements of t and p to "A" and "a", respectively.
> This follows the rules of assigning a scalar to an array.
>
> Perhaps you wanted to say
>
> t = (/"A", "B", "C", "D", "E", "F", "G", "H"/)
>
> and similarly for p?


Oops. I missed that, while concentrating on the declarations and the
other assignments. That's probably the OP's problem.

Let me reemphasize that it is important to distinguish between arrays
and strings. You'll get many things wrong (such as this) if you confuse
them. In most cases, Fortran character data are more naturally declared
as strings rather than arrays. If you try to make them arrays, you'll
get messes like having to use the array constructor as Jan shows above.
I/O of character arrays also won't be as convenient.

There are some cases where arrays of character*1 data are convenient,
but they are more the exception than the rule. You ought to have a
concrete reason (more than just "you like to think of them that way") to
go down that path or you will find the language hindering you.

The F2003 C interop stuff has special provisions to deal with the fact
that C character data is stored in arrays, C not having anything
directly like Fortran character strings. Even when interoperating with
C, you usually want to keep the Fortran character data in strings
instead of arrays - at least if you do much of anything with it in
Fortran.

--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain
  #5 (permalink)  
Old 09-18-2006, 03:51 PM
michael@athenavisual.com
Guest
 
Posts: n/a
Default Re: Array range assignments and characters

Ben:
It works for me if I change to

Character(8):: T
Character(5):: p

Michael

  #6 (permalink)  
Old 09-18-2006, 05:05 PM
Ben Hetland
Guest
 
Posts: n/a
Default Re: Array range assignments and characters

Richard Maine wrote:
> Jan Vorbrüggen <jvorbrueggen@not-mediasec.de> wrote:

[...]
>>> t = 'ABCDEFGH'
>>> p = 'abcde'

>> These initialize _all_ elements of t and p to "A" and "a", respectively.
>> This follows the rules of assigning a scalar to an array.


Yes, further investigation in the (old) code I'm debugging proved that
comment of yours quite true ...


>> Perhaps you wanted to say
>>
>> t = (/"A", "B", "C", "D", "E", "F", "G", "H"/)
>>
>> and similarly for p?

>
> Oops. I missed that, while concentrating on the declarations and the
> other assignments. That's probably the OP's problem.


Yes, that's probably what I have mixed up in my understanding of the
language. In the source code I'm working with I find all these "styles"
in both older and newer variants, probably somewhat mixed up there too,
so I missed the fine distinction there (again, for about the Nth
time...:-( ). It probably doesn't help knowing C and C++ too well and
observing that the different Fortran variants /appear/ to behave exactly
the same when output with a WRITE statement.


> Let me reemphasize that it is important to distinguish between arrays
> and strings. You'll get many things wrong (such as this) if you confuse
> them.


Yes definitely! You are so right.

But to my (poor) defence, it doesn't make the situation easier when
there are too many choices of declaration syntax, AND that the array
indexing/range syntax is quite acceptable on a character string that
ISN'T an array:

character t3*8
character(1) p*5
t3(2:6) = p

;-)


> There are some cases where arrays of character*1 data are convenient,
> but they are more the exception than the rule. You ought to have a
> concrete reason (more than just "you like to think of them that way") to
> go down that path or you will find the language hindering you.


How about "I want to index/update individual characters and ranges of
them"? It seems something like that could have been the reason way back,
as a single character string/array has been used to contain 2 strings
and several individual characters. In modern language I'd guess a TYPE
declaration with the aforementioned as members would have been a better
choice ...

(Or maybe done that way to avoid internal padding as these also go
directly into binary files. One might speculate.)


> The F2003 C interop stuff has special provisions to deal with the fact
> that C character data is stored in arrays, C not having anything
> directly like Fortran character strings. Even when interoperating with
> C, you usually want to keep the Fortran character data in strings
> instead of arrays - at least if you do much of anything with it in
> Fortran.


Yes, and to neither C nor Fortran does it seem to matter much in the way
of transferring data as they both are just continuous storage of
character elements anyway. There is more hassle related to keeping
track of the right kind of string length determination and padding vs.
zero termination. And C++ adds another complexity with its std::string
class...

--
-+-Ben-+-
  #7 (permalink)  
Old 09-18-2006, 05:05 PM
Ben Hetland
Guest
 
Posts: n/a
Default Re: Array range assignments and characters

michael@athenavisual.com wrote:
> It works for me if I change to
>
> Character(8):: T
> Character(5):: p


Ah yes, thanks! I just discovered that it "works" if both are declared
as character varname*N (n=numeric) too, but that should be equivalent if
I understand Richard's comment correctly.

--
-+-Ben-+-
  #8 (permalink)  
Old 09-18-2006, 05:34 PM
glen herrmannsfeldt
Guest
 
Posts: n/a
Default Re: Array range assignments and characters

Ben Hetland <ben.a.hetland@sintef.no> wrote:

> But to my (poor) defence, it doesn't make the situation easier when
> there are too many choices of declaration syntax, AND that the array
> indexing/range syntax is quite acceptable on a character string that
> ISN'T an array:


There are many parts of the Fortran standard you won't understand
unless you know the history of the standard(s).

CHARACTER variables and the substring operation were added in
Fortran 77, array expressions were added later. It might be
that things would have been done differently if they weren't added
in that order.

Fortran went through a few versions before the first ANSI standard
in 1966, and then four newer versions of the standard since 1966.

C only has the K&R version and two ANSI versions.

-- glen
  #9 (permalink)  
Old 09-18-2006, 05:48 PM
Richard Maine
Guest
 
Posts: n/a
Default Re: Array range assignments and characters

Ben Hetland <ben.a.hetland@sintef.no> wrote:

> Richard Maine wrote:


> > There are some cases where arrays of character*1 data are convenient,
> > but they are more the exception than the rule. You ought to have a
> > concrete reason (more than just "you like to think of them that way") to
> > go down that path or you will find the language hindering you.

>
> How about "I want to index/update individual characters and ranges of
> them"?


You can do that with strings as well. In fact, you could do ranges
(substrings) in f77, while array slices didn't appear until f90.

One reason I've seen arrays of char*1 used is for dynamic allocation.
Not until f2003 can you have allocatable string length.
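
A short sketch of both points, assuming a compiler with F2003 deferred-length character support (the variable s and the program name are illustrative only). Substring references update single characters and ranges of a scalar string, and an allocatable string takes its length from the assigned value:

```fortran
program substring_update
  implicit none
  character(8) :: t
  character(5) :: p
  character(:), allocatable :: s    ! F2003 deferred-length string

  t = 'ABCDEFGH'
  p = 'abcde'

  t(2:6) = p          ! substring assignment, valid since f77
  print '(a)', t      ! AabcdeGH

  t(1:1) = 'z'        ! a single character is a substring of length 1
  print '(a)', t      ! zabcdeGH

  s = t(2:6)          ! s is allocated on assignment with length 5
  print *, len(s)
end program substring_update
```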

> > The F2003 C interop stuff has special provisions to deal with the fact
> > that C character data is stored in arrays..


> Yes, and to neither C nor Fortran does it seem to matter much in the way
> of transferring data as they both are just continuous storage of
> character elements anyway.


Well... not strictly necessarily. That's basically exactly what the
special C interop provision ensures - that a string of length N is
stored as a sequence of characters in the same way as an array of size
N. Although I know of no contrary implementations, that is not required
in all cases for Fortran. It is required for default character kind. The
special provision also requires it for the C character kind. However, it
is not required for all character kinds. Imagine multi-byte character
sets with internal codes to switch subsets.

I think that provision might have been targeting oriental character
sets. I'm not aware of any compilers that take advantage of it, but then
I'm not actually aware of any compilers that even have more than one
character kind. This might be because I don't know much about compilers
used in the orient.

--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain
  #10 (permalink)  
Old 09-19-2006, 10:03 AM
Ben Hetland
Guest
 
Posts: n/a
Default Re: Array range assignments and characters

Richard Maine wrote:
>> Yes, and to neither C nor Fortran does it seem to matter much in the way
>> of transferring data as they both are just continuous storage of
>> character elements anyway.

>
> Well... not strictly necessarily.


Probably strictly right for all mentioned languages, i.e. the "not
necessarily" part. The only thing C and C++ require, is that if you
increment a pointer to a char by 1, it must point to the next character
in the array. Also sizeof(char) is exactly 1. So on an odd machine, they
may contain some internal padding that probably isn't easily
accessible.


> That's basically exactly what the special C interop provision ensures
> - that a string of length N is stored as a sequence of characters in
> the same way as an array of size N. [...] However, it
> is not required for all character kinds. Imagine multi-byte character
> sets with internal codes to switch subsets.


Regarding the multi-byte character stuff, a similar (but not exactly the
same) alternative is using 16-bit entities storing UTF-16 (an encoding of
Unicode). C++ got the new type 'wchar_t' to handle such cases. I was
wondering what the equivalent in Fortran would be.

Since Fortran has the (brilliant) feature of actually being able to
specify the 'kind' (or size) along with the type specification in a
declaration, one is tempted to conclude that the solution is very direct
and simple:

character(1)::asciitxt*32 ! for "old-fashioned" extended ASCII
character(2)::utf16txt*32 ! like wchar_t in C++

and the second would hold a string of 32 16-bit elements. However,
according to your previous answer in this thread, it isn't since the
'*32' part would override the '(2)' part.

Another attempt could then be:

character(2),dimension(32)::utf16txt

but then we would be back to the annoyances with the array vs. string
handling that I originally had a problem with.

--
-+-Ben-+-
  #11 (permalink)  
Old 09-19-2006, 10:16 AM
Jan Vorbrüggen
Guest
 
Posts: n/a
Default Re: Array range assignments and characters

> character(1)::asciitxt*32 ! for "old-fashioned" extended ASCII
> character(2)::utf16txt*32 ! like wchar_t in C++


I'm not sure about the syntax, but it would be something like

character (kind=ascii) :: asciitxt*32
character (kind=utf16) :: utf16txt*32

with suitable definitions for the constants ascii and utf16.

Jan
  #12 (permalink)  
Old 09-19-2006, 03:19 PM
Richard E Maine
Guest
 
Posts: n/a
Default Re: Array range assignments and characters

Ben Hetland <ben.a.hetland@sintef.no> wrote:

> Richard Maine wrote:
> >> Yes, and to neither C nor Fortran does it seem to matter much in the way
> >> of transferring data as they both are just continuous storage of
> >> character elements anyway.

> >
> > Well... not strictly necessarily.

>
> Probably strictly right for all mentioned languages


Ah. I started a reply to this, but then after reading the rest of the
paragraph I realized that I initially misread what you said here and
that you were agreeing with me instead of disagreeing.

> character(1)::asciitxt*32 ! for "old-fashioned" extended ASCII
> character(2)::utf16txt*32 ! like wchar_t in C++
>
> and the second would hold a string of 32 16-bit elements. However,
> according to your previous answer in this thread, it isn't since the
> '*32' part would override the '(2)' part.


That's just because the 1 and 2 above specify length rather than kind.
There are 2 type parameters for character - kind and length. It is
slightly odd that the length parameter is the first one when you specify
them positionally. There are 2 reasons for that.

1. Probably the biggest actual reason. F77 had only a length type
parameter (in fact, that was before the concept of type parameter had
been generalized; character length was a distinct thing unlike any
other). For compatibility with lots and lots of f77 code, character(n)
has to be interpreted as a length of n rather than a kind of n.

2. Secondarily. It is hugely more common to specify length in practice,
particularly since no compilers exist that I know of (recall previous
caveat about the limits of my knowledge on this) that actually have more
than one character kind. Thus you find almost no real code specifying
character kind, but almost all real code specifies character length.
This somewhat argues for making length the one that is syntactically
simplest to specify. But this is probably just rationalization, as (1)
would have been an overriding consideration anyway.

Jan already mentioned the solution, which I repeat here for
completeness. I wanted to add the above explanation. The solution is to
specify

character(kind=whatever) ...

instead of

character(whatever)

And, of course, I'd recommend using named constants instead of hardwired
1 and 2 in real codes, but I'm sure you appreciate that point and were
just simplifying for rhetorical purposes.

The f2003 selected_char_kind intrinsic would probably be of use here.
You can get ascii with selected_char_kind('ascii'), and ISO 10646/UCS-4
with selected_char_kind('iso_10646'). F2003 doesn't define a specific
name for utf16, but that's a natural extension to selected_char_kind.
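
As a rough sketch of the above, assuming an F2003 compiler that actually supports both kinds: selected_char_kind returns -1 for an unsupported kind, in which case the declarations below will not compile. The names ascii, ucs4 and widetxt are chosen here for illustration.

```fortran
program char_kinds
  implicit none
  ! Named constants for the kinds instead of hardwired numbers.
  integer, parameter :: ascii = selected_char_kind('ascii')
  integer, parameter :: ucs4  = selected_char_kind('iso_10646')

  ! kind= selects the character set; len= is the string length.
  character(kind=ascii, len=32) :: asciitxt
  character(kind=ucs4,  len=32) :: widetxt

  asciitxt = ascii_'plain text'

  ! Both strings hold 32 characters; only the kind differs.
  print *, 'ascii kind:', kind(asciitxt), ' length:', len(asciitxt)
  print *, 'ucs4  kind:', kind(widetxt),  ' length:', len(widetxt)
end program char_kinds
```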

--
Richard Maine | Good judgment comes from experience;
email: my first.last at org.domain| experience comes from bad judgment.
org: nasa, domain: gov | -- Mark Twain