  #76 (permalink)  
Old 02-06-2011, 03:30 PM
Johannes Schaub (litb)
Default Re: Is the aliasing rule symmetric?

Joshua Maurice wrote:

> On Jan 26, 8:38 am, Keith Thompson <ks...@mib.org> wrote:
>> Joshua Maurice <joshuamaur...@gmail.com> writes:
>> > Is one good and the other not? If so, what's the important difference,
>> > and most importantly what part of the standard, if any, can be read to
>> > describe that difference?

>>
>> As a matter of style, it's a much more verbose way of saying essentially
>> the same thing.

>
> So let me ask again, to you and anyone else. Is there any difference
> between the two programs:
>
> #include <stddef.h>
> #include <stdlib.h>
> typedef struct T1 { int x; int y; } T1;
> typedef struct T2 { int x; int y; } T2;
> int main(void)
> { T1 *p = malloc(sizeof *p);
> p->x = 1;
> p->y = 2;
> return p->y;
> }
>
> and
>
> #include <stddef.h>
> #include <stdlib.h>
> typedef struct T1 { int x; int y; } T1;
> typedef struct T2 { int x; int y; } T2;
> int main()
> {
> void* p = malloc(sizeof(T1));
> * (int*) (((char*)p) + offsetof(T1, x)) = 1;
> * (int*) (((char*)p) + offsetof(T1, y)) = 2;
> return ((T1*)p)->y;
> }
>
> Specifically, I presume that everyone agrees C and C++ needs to
> support the first program with no UB. The interesting questions I have
> concern the second. Does the "return ((T1*)p)->y;" result in UB? Why?
> What's the important difference between these two programs, and
> specifically the parts of the standards which explain the important
> differences.
>


Assuming layout and sizes are equal for structurally equal structs, and
assuming the spec is correct:

P1: In the first, no object prior to the first write has a declared type.
After the write, there are two objects that have an effective type of type
int, and the read afterwards is alright.

P2: In the second, the situation until the return statement is exactly
equal. At the return statement, you access the second object whose effective
type is 'int' by an 'int', so you go fine too (exactly like in P1). So this
is fine too.

> Also, if the second program has no UB, can we instead return "return
> ((T2*)p)->y;" for implementations which we've tested that T1 and T2
> have equivalent layout? That is, it might not be a portable program,
> but for those systems which there is no difference in layout, would
> the access through T2 have UB? Why?


Since we assume layout and size is equal, Casting p to T2 will make no
difference. You still access the second int by an int lvalue. So this is
fine too.

Let's make a different program

T1 *p = malloc(sizeof *p);
*p = (T1){ 0, 1 };

Now we have 1 effectively typed object, and that object has type T1. Given
that, the following is UB:

T2 p1 = *(T2*)p;

Because you violate the aliasing rule, accessing a T1 effectively typed
object by a T2 lvalue.
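
For comparison, a minimal sketch of getting the same bytes into a T2 without ever forming a T2 lvalue that designates the T1 object: copy the representation with memcpy into an object whose declared type is T2. The helper name read_y_as_t2 is made up for illustration, and the sketch assumes (and checks at run time) that T1 and T2 have the same size; it also relies on y sitting at the same offset in both, which the standard does not promise.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct T1 { int x; int y; } T1;
typedef struct T2 { int x; int y; } T2;

/* Hypothetical helper: returns y as seen through a T2 copy of a T1 object. */
int read_y_as_t2(void)
{
    T1 *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;

    *p = (T1){ 0, 1 };              /* the allocated object now has effective type T1 */

    /* Assumption: identical size; not guaranteed by the standard. */
    assert(sizeof(T1) == sizeof(T2));

    T2 copy;                        /* declared type T2 */
    memcpy(&copy, p, sizeof copy);  /* byte-wise copy; *p is never read as a T2 */

    free(p);
    return copy.y;
}
```

The copy is accessed only through its own declared type, so the aliasing question above does not come up for it.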

I think that is what the spec says. And I don't think it agrees with what
the committee says. The committee wants to say that in P1, there exists a T1
object in addition. According to the committee, if you insert a cast in to
T2 lvalue in P1's return statement, result is undefined. But the spec does
not say that.


  #77 (permalink)  
Old 02-06-2011, 03:55 PM
Johannes Schaub (litb)
Default Re: Is the aliasing rule symmetric?

Johannes Schaub (litb) wrote:

> Joshua Maurice wrote:
>
>> On Jan 26, 8:38 am, Keith Thompson <ks...@mib.org> wrote:
>>> Joshua Maurice <joshuamaur...@gmail.com> writes:
>>> > Is one good and the other not? If so, what's the important difference,
>>> > and most importantly what part of the standard, if any, can be read to
>>> > describe that difference?
>>>
>>> As a matter of style, it's a much more verbose way of saying essentially
>>> the same thing.

>>
>> So let me ask again, to you and anyone else. Is there any difference
>> between the two programs:
>>
>> #include <stddef.h>
>> #include <stdlib.h>
>> typedef struct T1 { int x; int y; } T1;
>> typedef struct T2 { int x; int y; } T2;
>> int main(void)
>> { T1 *p = malloc(sizeof *p);
>> p->x = 1;
>> p->y = 2;
>> return p->y;
>> }
>>
>> and
>>
>> #include <stddef.h>
>> #include <stdlib.h>
>> typedef struct T1 { int x; int y; } T1;
>> typedef struct T2 { int x; int y; } T2;
>> int main()
>> {
>> void* p = malloc(sizeof(T1));
>> * (int*) (((char*)p) + offsetof(T1, x)) = 1;
>> * (int*) (((char*)p) + offsetof(T1, y)) = 2;
>> return ((T1*)p)->y;
>> }
>>
>> Specifically, I presume that everyone agrees C and C++ needs to
>> support the first program with no UB. The interesting questions I have
>> concern the second. Does the "return ((T1*)p)->y;" result in UB? Why?
>> What's the important difference between these two programs, and
>> specifically the parts of the standards which explain the important
>> differences.
>>

>
> Assuming layout and sizes are equal for structurally equal structs, and
> assuming the spec is correct:
>
> P1: In the first, no object prior to the first write has a declared type.
> After the write, there are two objects that have an effective type of type
> int, and the read afterwards is alright.
>
> P2: In the second, the situation until the return statement is exactly
> equal. At the return statement, you access the second object whose
> effective type is 'int' by an 'int', so you go fine too (exactly like in
> P1). So this is fine too.
>
>> Also, if the second program has no UB, can we instead return "return
>> ((T2*)p)->y;" for implementations which we've tested that T1 and T2
>> have equivalent layout? That is, it might not be a portable program,
>> but for those systems which there is no difference in layout, would
>> the access through T2 have UB? Why?

>
> Since we assume layout and size is equal, Casting p to T2 will make no
> difference. You still access the second int by an int lvalue. So this is
> fine too.
>
> Let's make a different program
>
> T1 *p = malloc(sizeof *p);
> *p = (T1){ 0, 1 };
>
> Now we have 1 effectively typed object, and that object has type T1. Given
> that, the following is UB:
>
> T2 p1 = *(T2*)p;
>
> Because you violate the aliasing rule, accessing a T1 effectively typed
> object by a T2 lvalue.
>
> I think that is what the spec says. And I don't think it agrees with what
> the committee says. The committee wants to say that in P1, there exists a
> T1 object in addition. According to the committee, if you insert a cast in
> to T2 lvalue in P1's return statement, result is undefined. But the spec
> does not say that.


In particular, I think the committee intends the spec to say that a struct
or union access expression involves an access with the struct or union
lvalue.

T1 *p = malloc(sizeof *p);
p->x = 0;

In this case, I think the committee's intent is that the object pointed to
by "p" is accessed by an lvalue of type T1, and so the effective type of the
object containing the int changes to T1. So a later cast and access by an
lvalue of T2 will be undefined behavior.

  #78 (permalink)  
Old 02-06-2011, 08:19 PM
Joshua Maurice
Default Re: Is the aliasing rule symmetric?

On Feb 6, 8:55 am, "Johannes Schaub (litb)"
<schaub.johan...@googlemail.com> wrote:
> Johannes Schaub (litb) wrote:
> > Joshua Maurice wrote:

>
> >> On Jan 26, 8:38 am, Keith Thompson <ks...@mib.org> wrote:
> >>> Joshua Maurice <joshuamaur...@gmail.com> writes:
> >>> > Is one good and the other not? If so, what's the important difference,
> >>> > and most importantly what part of the standard, if any, can be read to
> >>> > describe that difference?

>
> >>> As a matter of style, it's a much more verbose way of saying essentially
> >>> the same thing.

>
> >> So let me ask again, to you and anyone else. Is there any difference
> >> between the two programs:

>
> >>   #include <stddef.h>
> >>   #include <stdlib.h>
> >>   typedef struct T1 { int x; int y; } T1;
> >>   typedef struct T2 { int x; int y; } T2;
> >>   int main(void)
> >>   { T1 *p = malloc(sizeof *p);
> >>     p->x = 1;
> >>     p->y = 2;
> >>     return p->y;
> >>   }

>
> >> and

>
> >>   #include <stddef.h>
> >>   #include <stdlib.h>
> >>   typedef struct T1 { int x; int y; } T1;
> >>   typedef struct T2 { int x; int y; } T2;
> >>   int main()
> >>   {
> >>     void* p = malloc(sizeof(T1));
> >>     * (int*) (((char*)p) + offsetof(T1, x)) = 1;
> >>     * (int*) (((char*)p) + offsetof(T1, y)) = 2;
> >>     return ((T1*)p)->y;
> >>   }

>
> >> Specifically, I presume that everyone agrees C and C++ needs to
> >> support the first program with no UB. The interesting questions I have
> >> concern the second. Does the "return ((T1*)p)->y;" result in UB? Why?
> >> What's the important difference between these two programs, and
> >> specifically the parts of the standards which explain the important
> >> differences.

>
> > Assuming layout and sizes are equal for structurally equal structs, and
> > assuming the spec is correct:

>
> > P1: In the first, no object prior to the first write has a declared type.
> > After the write, there are two objects that have an effective type of type
> > int, and the read afterwards is alright.

>
> > P2: In the second, the situation until the return statement is exactly
> > equal. At the return statement, you access the second object whose
> > effective type is 'int' by an 'int', so you go fine too (exactly like in
> > P1). So this is fine too.

>
> >> Also, if the second program has no UB, can we instead return "return
> >> ((T2*)p)->y;" for implementations which we've tested that T1 and T2
> >> have equivalent layout? That is, it might not be a portable program,
> >> but for those systems which there is no difference in layout, would
> >> the access through T2 have UB? Why?

>
> > Since we assume layout and size is equal, Casting p to T2 will make no
> > difference. You still access the second int by an int lvalue. So this is
> > fine too.

>
> > Let's make a different program

>
> >   T1 *p = malloc(sizeof *p);
> >   *p = (T1){ 0, 1 };

>
> > Now we have 1 effectively typed object, and that object has type T1. Given
> > that, the following is UB:

>
> >   T2 p1 = *(T2*)p;

>
> > Because you violate the aliasing rule, accessing a T1 effectively typed
> > object by a T2 lvalue.

>
> > I think that is what the spec says. And I don't think it agrees with what
> > the committee says. The committee wants to say that in P1, there exists a
> > T1 object in addition. According to the committee, if you insert a cast in
> > to T2 lvalue in P1's return statement, result is undefined. But the spec
> > does not say that.

>
> In particular, I think the committee intends the spec to say that a struct
> or union access expression involves an access with the struct or union
> lvalue.
>
>     T1 *p = malloc(sizeof *p);
>     p->x = 0;
>
> In this case, I think the committee's intent is that the object pointed to
> by "p" is accessed by an lvalue of type T1, and so the effective type of the
> object containing the int changes to T1. So a later cast and access by an
> lvalue of T2 will be undefined behavior.


I think this is also the only sensible interpretation of the
committee's intent. That is
p->y = 2;
is not equivalent to
* (int*) (((char*)p) + offsetof(T1, y)) = 2;
That is, the apparently only sensible way out is: the first somehow
participates in unwritten rules to make a T1 object, and the offsetof
way does not.
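
A small sketch of why this distinction is uncomfortable: both spellings yield the same int* value, so any difference between them has to live in the abstract machine rather than in the pointer. The function name same_address is made up for illustration; no member is read or written here, only addresses are formed and compared.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct T1 { int x; int y; } T1;

/* Hypothetical helper: returns 1 if both spellings give the same pointer. */
int same_address(void)
{
    T1 *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;

    int *via_member = &p->y;                                /* member-access form */
    int *via_offset = (int *)((char *)p + offsetof(T1, y)); /* offsetof form */

    int same = (via_member == via_offset);

    free(p);
    return same;
}
```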

I wonder where they want to draw the difference. Let this be the
context for the following questions:
#include <stddef.h>
#include <stdlib.h>

typedef struct T1 { int x; int y; } T1;
typedef struct T2 { int x; int y; } T2;

int main()
{
void* p = malloc(sizeof(T1));
/* ... */
}

Consider the subsequent alterations. Let's start with the simple:
T1* a = (T1*) p;
a->y = 2;
return a->y;
Now, changing it to the following shouldn't give it UB.
T1* a = (T1*) p;
T2* b = (T2*) p;
a->y = 2;
return a->y;
Let's add an explicit temporary variable as follows.
T1* a = (T1*) p;
T2* b = (T2*) p;
int* a_y = & a->y;
*a_y = 2;
return a->y;
Let's add another variable.
T1* a = (T1*) p;
T2* b = (T2*) p;
int* a_y = & a->y;
int* b_y = & b->y;
*a_y = 2;
return a->y;
Ok, up to this point, I'm pretty sure everyone would agree that we
have no UB. Now, let's take that one dubious step, and transform the
above to:
T1* a = (T1*) p;
T2* b = (T2*) p;
int* a_y = (int*) (((char*)a) + offsetof(T1, y));
int* b_y = (int*) (((char*)b) + offsetof(T2, y));
*a_y = 2;
return a->y;
Quick change, replacing some of the "a" and "b" with "p":
T1* a = (T1*) p;
T2* b = (T2*) p;
int* a_y = (int*) (((char*)p) + offsetof(T1, y));
int* b_y = (int*) (((char*)p) + offsetof(T2, y));
*a_y = 2;
return a->y;
Now we have a problem, because on any sane implementation,
offsetof(T1, y) == offsetof(T2, y), which means for most
implementations I can transform it to:
T1* a = (T1*) p;
T2* b = (T2*) p;
int* a_y = (int*) (((char*)p) + offsetof(T2, y));
int* b_y = (int*) (((char*)p) + offsetof(T2, y));
*a_y = 2;
return a->y;
and reverse the dubious step to get:
T1* a = (T1*) p;
T2* b = (T2*) p;
int* a_y = & b->y;
int* b_y = & b->y;
*a_y = 2;
return a->y;
simplify a bit:
((T2*)p)->y = 2;
return ((T1*)p)->y;
And we're done.
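
The layout assumption the derivation leans on can be checked rather than assumed. A minimal sketch, with a made-up function name layouts_match; the standard does not guarantee the result for two distinct struct types, so treat a return value of 1 as a property of the implementation being tested.

```c
#include <stddef.h>

typedef struct T1 { int x; int y; } T1;
typedef struct T2 { int x; int y; } T2;

/* Hypothetical helper: returns 1 if T1 and T2 agree on size and member offsets. */
int layouts_match(void)
{
    return sizeof(T1) == sizeof(T2)
        && offsetof(T1, x) == offsetof(T2, x)
        && offsetof(T1, y) == offsetof(T2, y);
}
```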

So, that means we need to conclude that:
int* y = & a->y;
is fundamentally different than:
int* y = (int*) (((char*)a) + offsetof(T1, y));
And I think the only way we can formalize this is to require data
dependency analysis. Let me repeat the first program fragment here:
T1* a = (T1*) p;
a->y = 2;
return a->y;
aka:
T1* a = (T1*) p;
int* y = & a->y;
*y = 2;
return *y;
Our only way out appears to be: we have a object of effective type T1
because of the int write "*y = 2;", and because that int write went
through an int lvalue "*y" / int pointer "y" which was obtained via a
data dependency from a memberof expression on a T1 type lvalue "int* y
= & a->y;".

So, earlier when I was rambling on comp.std.c++ about making memberof
expressions special, I was right in that it is the only way out. However,
this strikes me as fundamentally wrong. I don't like it.
  #79 (permalink)  
Old 02-06-2011, 08:23 PM
Joshua Maurice
Default Re: Is the aliasing rule symmetric?

On Feb 6, 7:46 am, "Johannes Schaub (litb)"
<schaub.johan...@googlemail.com> wrote:
> Joshua Maurice wrote:
> > No. Don't think about it as an aliasing rule. Think about it as a rule
> > which restricts the types of lvalues with which you can legally access
> > objects.

>
> > You can always access an object through a char or unsigned char
> > lvalue. (Or maybe it's only for POD types - there's no consensus. I
> > would only use char and unsigned char to access POD objects.)

>
> > You can always access an object through a base class lvalue, but you
> > can never do the reverse: you can never take a complete object of type
> > T and access it through a derived type of type T.

>
> You cannot access an object of derived class type and access it as a base
> class lvalue either. You always need to point to the proper base class
> subobject. If you try to directly access the complete object by a base class
> lvalue, you will be lucky if it crashes.
>
> In this sense it's the same for base/derived relationship in both
> directions. If the base-class subobject and the complete object have the
> same address, you can reinterpret_cast and if you aren't lucky you can
> read/write with the resulting lvalue. If you do the proper thing and use an
> implicit conversion or an explicit conversion (for the downcast), you have
> defined behavior. But that has nothing to do with the aliasing rule. IMO the
> respective bullet in 3.10p15 is flawed.


Indeed and agreed. Pedantic, but still important. This becomes evident
in multiple inheritance and virtual inheritance cases.
  #80 (permalink)  
Old 02-06-2011, 08:33 PM
Joshua Maurice
Default Re: Is the aliasing rule symmetric?

On Feb 6, 7:55 am, "Johannes Schaub (litb)"
<schaub.johan...@googlemail.com> wrote:
> Johannes Schaub (litb) wrote:
> > Joshua Maurice wrote:

>
> >> On Jan 21, 9:20 am, "Johannes Schaub (litb)" <schaub-johan...@web.de>
> >> wrote:
> >>> Ben Bacarisse wrote:
> >>> > "Johannes Schaub (litb)" <schaub-johan...@web.de> writes:

>
> >>> >> Would we be allowed to do this in the opposite direction, if we know
> >>> >> that the alignment is fine?

>
> >>> > What's the opposite direction?  Are you asking if changing the int
> >>> > will change the value of *b?  If so, yes (provided the new int value's
> >>> > bits do indeed affect the byte in question).

>
> >>> I mean to ask: If aliasing of an A object by an lvalue of type B is OK,
> >>> is aliasing of a B object by an lvalue of type A OK?

>
> >> No. Don't think about it as an aliasing rule. Think about it as a rule
> >> which restricts the types of lvalues with which you can legally access
> >> objects.

>
> >> You can always access an object through a char or unsigned char
> >> lvalue. (Or maybe it's only for POD types - there's no consensus. I
> >> would only use char and unsigned char to access POD objects.)

>
> >> You can always access an object through a base class lvalue, but you
> >> can never do the reverse: you can never take a complete object of type
> >> T and access it through a derived type of type T.

>
> > You cannot access an object of derived class type and access it as a base
> > class lvalue either. You always need to point to the proper base class
> > subobject. If you try to directly access the complete object by a base
> > class lvalue, you will be lucky if it crashes.

>
> > In this sense it's the same for base/derived relationship in both
> > directions. If the base-class subobject and the complete object have the
> > same address, you can reinterpret_cast and if you aren't lucky you can
> > read/write with the resulting lvalue. If you do the proper thing and use
> > an implicit conversion or an explicit conversion (for the downcast), you
> > have defined behavior. But that has nothing to do with the aliasing rule.
> > IMO the respective bullet in 3.10p15 is flawed.

>
> Having thought about this again, I think the respective bullet is NOT
> flawed. The bullet implies that you already have made a successful
> conversion and have a proper lvalue.
>
> We do actually have the reverse (access a base class object by the derived
> class type), by means of "the dynamic type of the object" (first bullet). It
> is caught by that, and to my surprise, if you turn around the bullet about
> the base-class subobject rule according to symmetry rule, you get nearly the
> same wording
>
>   - a type that is the (possibly cv-qualified) dynamic class type of
>     the type of the object


Implicit in that entire piece of standard is that you obtained that
lvalue through a "proper" explicit or implicit conversion or cast. If
you start throwing around reinterpret_casts, then it's quite easy to
break it. Consider:
struct A { int x; };
struct B : A {};
int main()
{
B b;
b.x = 1;
A* a = & b;
return a->x;
}
Now, what's left is quite pedantic, and I'm not sure of the exact
nomenclature. When I access the base class subobject a la "return a->x;",
is that considered "accessing the stored value of the [derived class]
object" according to the wording of C++03 "3.10 Lvalues and rvalues / 15"?
I presume yes. Those bullets are there just as
allowance that you /can/ access base class subobjects through base
class type lvalues and through lvalues of the member types, and you
can access the object through the dynamic type of the object. It
doesn't mention that the lvalues must have been properly obtained -
the following is an example of improperly obtaining the lvalue:
int main()
{
B b;
b.x = 1;
A* a = reinterpret_cast<A*>(&b);
return a->x;
}
Is the above UB? I don't know. Maybe? Either way you should never do
it. It definitely is UB if we have virtual or multiple inheritance.
Where is this fundamental distinction mentioned in the standard?
Nowhere that I can see.

> So I think we again see that the following rule seems to be true:
>
>     If aliasing of an A object by an lvalue of type B is OK,
>     is aliasing of a B object by an lvalue of type A OK?
>
> Please correct me If I'm misunderstanding anything.


Well, yes. If you have an A object, and you can access a sub-object of
that, or a containing object of that, through a B lvalue, then you can
definitely take that same B object, and access the corresponding A
object through an A lvalue. Are you trying to say something more?
  #81 (permalink)  
Old 02-06-2011, 09:36 PM
Johannes Schaub (litb)
Default Re: Is the aliasing rule symmetric?

Joshua Maurice wrote:

> On Feb 6, 8:55 am, "Johannes Schaub (litb)"
> <schaub.johan...@googlemail.com> wrote:
>> In particular, I think the committee intends the spec to say that a
>> struct or union access expression involves an access with the struct or
>> union lvalue.
>>
>> T1 *p = malloc(sizeof *p);
>> p->x = 0;
>>
>> In this case, I think the committee's intent is that the object pointed
>> to by "p" is accessed by an lvalue of type T1, and so the effective type
>> of the object containing the int changes to T1. So a later cast and
>> access by an lvalue of T2 will be undefined behavior.

>
> I think this is also the only sensible interpretation of the
> committee's intent. That is
> p->y = 2;
> is not equivalent to
> * (int*) (((char*)p) + offsetof(T1, y)) = 2;
> That is, the apparently only sensible way out is: the first somehow
> participates in unwritten rules to make a T1 object, and the offsetof
> way does not.
>
> I wonder where they want to draw the difference. Let this be the
> context for the following questions:
> #include <stddef.h>
> #include <stdlib.h>
>
> typedef struct T1 { int x; int y; } T1;
> typedef struct T2 { int x; int y; } T2;
>
> int main()
> {
> void* p = malloc(sizeof(T1));
> /* ... */
> }
>
> Consider the subsequent alterations. Let's start with the simple:
> T1* a = (T1*) p;
> a->y = 2;
> return a->y;
> Now, changing it to the following shouldn't give it UB.
> T1* a = (T1*) p;
> T2* b = (T2*) p;
> a->y = 2;
> return a->y;
> Let's add an explicit temporary variable as follows.
> T1* a = (T1*) p;
> T2* b = (T2*) p;
> int* a_y = & a->y;
> *a_y = 2;
> return a->y;
> Let's add another variable.
> T1* a = (T1*) p;
> T2* b = (T2*) p;
> int* a_y = & a->y;
> int* b_y = & b->y;
> *a_y = 2;
> return a->y;
> Ok, up to this point, I'm pretty sure everyone would agree that we
> have no UB. Now, let's take that one dubious step, and transform the
> above to:
> T1* a = (T1*) p;
> T2* b = (T2*) p;
> int* a_y = (int*) (((char*)a) + offsetof(T1, y));
> int* b_y = (int*) (((char*)b) + offsetof(T2, y));
> *a_y = 2;
> return a->y;
> Quick change, replacing some of the "a" and "b" with "p":
> T1* a = (T1*) p;
> T2* b = (T2*) p;
> int* a_y = (int*) (((char*)p) + offsetof(T1, y));
> int* b_y = (int*) (((char*)p) + offsetof(T2, y));
> *a_y = 2;
> return a->y;
> Now we have a problem, because on any sane implementation,
> offsetof(T1, y) == offsetof(T2, y), which means for most
> implementations I can transform it to:
> T1* a = (T1*) p;
> T2* b = (T2*) p;
> int* a_y = (int*) (((char*)p) + offsetof(T2, y));
> int* b_y = (int*) (((char*)p) + offsetof(T2, y));
> *a_y = 2;
> return a->y;
> and reverse the dubious step to get:
> T1* a = (T1*) p;
> T2* b = (T2*) p;
> int* a_y = & b->y;
> int* b_y = & b->y;
> *a_y = 2;
> return a->y;
> simplify a bit:
> ((T2*)p)->y = 2;
> return ((T1*)p)->y;
> And we're done.
>


I think I'm missing something. This last simplification does not seem to be
valid according to the intent. In the unsimplified code, before executing
the "return a->y" you have for read access to "*a_y":

object 1: address X, sizeof(int), effective type: int

for the return access you have

object 1: lvalue T1, address X, sizeof(T1), effective type: T1
object 2: lvalue int, address X, sizeof(int), effective type: int

The effective type in the access to object 1 was taken from the type of the
"lvalue" used for the access. For object 2, the effective type used was the
one set by the write in "*a_y = 2". Now for your simplification, before
executing the "return ((T1*)p)->y" you have for the preceding write:

object 1: address X, sizeof(T2), effective type: T2
object 2: address X, sizeof(int), effective type: int

Now you are doing a member access in the return statement accessing the
first object using an "lvalue" of type T1 but the object has effective type
T2, violating the aliasing rule.
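
To make the bookkeeping concrete, here is a minimal sketch under my reading of C99 6.5p6: for storage with no declared type, a store through a non-character lvalue sets the effective type for that access and for later reads. The helper name store_t2_then_t1 is made up for illustration. The sketch sidesteps the disputed member-access case by storing whole structs.

```c
#include <stdlib.h>

typedef struct T1 { int x; int y; } T1;
typedef struct T2 { int x; int y; } T2;

/* Hypothetical helper: stores as T2, then as T1, then reads as T1. */
int store_t2_then_t1(void)
{
    /* Allocate enough for either struct, in case the sizes differ. */
    void *p = malloc(sizeof(T1) > sizeof(T2) ? sizeof(T1) : sizeof(T2));
    if (p == NULL)
        return -1;

    *(T2 *)p = (T2){ 1, 2 };   /* store through a T2 lvalue: effective type T2 */
    /* Reading ((T1 *)p)->y at this point is the access argued above to be UB. */

    *(T1 *)p = (T1){ 3, 4 };   /* new whole-struct store: effective type T1 */
    int y = ((T1 *)p)->y;      /* T1 object read through a T1 lvalue */

    free(p);
    return y;
}
```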

  #82 (permalink)  
Old 02-06-2011, 10:03 PM
Joshua Maurice
Default Re: Is the aliasing rule symmetric?

On Feb 6, 2:36 pm, "Johannes Schaub (litb)"
<schaub.johan...@googlemail.com> wrote:
> Joshua Maurice wrote:
> > On Feb 6, 8:55 am, "Johannes Schaub (litb)"
> > <schaub.johan...@googlemail.com> wrote:
> >> In particular, I think the committee intends the spec to say that a
> >> struct or union access expression involves an access with the struct or
> >> union lvalue.

>
> >> T1 *p = malloc(sizeof *p);
> >> p->x = 0;

>
> >> In this case, I think the committee's intent is that the object pointed
> >> to by "p" is accessed by an lvalue of type T1, and so the effective type
> >> of the object containing the int changes to T1. So a later cast and
> >> access by an lvalue of T2 will be undefined behavior.

>
> > I think this is also the only sensible interpretation of the
> > committee's intent. That is
> >   p->y = 2;
> > is not equivalent to
> >   * (int*) (((char*)p) + offsetof(T1, y)) = 2;
> > That is, the apparently only sensible way out is: the first somehow
> > participates in unwritten rules to make a T1 object, and the offsetof
> > way does not.

>
> > I wonder where they want to draw the difference. Let this be the
> > context for the following questions:
> >   #include <stddef.h>
> >   #include <stdlib.h>

>
> >   typedef struct T1 { int x; int y; } T1;
> >   typedef struct T2 { int x; int y; } T2;

>
> >   int main()
> >   {
> >     void* p = malloc(sizeof(T1));
> >     /* ... */
> >   }

>
> > Consider the subsequent alterations. Let's start with the simple:
> >   T1* a = (T1*) p;
> >   a->y = 2;
> >   return a->y;
> > Now, changing it to the following shouldn't give it UB.
> >   T1* a = (T1*) p;
> >   T2* b = (T2*) p;
> >   a->y = 2;
> >   return a->y;
> > Let's add an explicit temporary variable as follows.
> >   T1* a = (T1*) p;
> >   T2* b = (T2*) p;
> >   int* a_y = & a->y;
> >   *a_y = 2;
> >   return a->y;
> > Let's add another variable.
> >   T1* a = (T1*) p;
> >   T2* b = (T2*) p;
> >   int* a_y = & a->y;
> >   int* b_y = & b->y;
> >   *a_y = 2;
> >   return a->y;
> > Ok, up to this point, I'm pretty sure everyone would agree that we
> > have no UB. Now, let's take that one dubious step, and transform the
> > above to:
> >   T1* a = (T1*) p;
> >   T2* b = (T2*) p;
> >   int* a_y = (int*) (((char*)a) + offsetof(T1, y));
> >   int* b_y = (int*) (((char*)b) + offsetof(T2, y));
> >   *a_y = 2;
> >   return a->y;
> > Quick change, replacing some of the "a" and "b" with "p":
> >   T1* a = (T1*) p;
> >   T2* b = (T2*) p;
> >   int* a_y = (int*) (((char*)p) + offsetof(T1, y));
> >   int* b_y = (int*) (((char*)p) + offsetof(T2, y));
> >   *a_y = 2;
> >   return a->y;
> > Now we have a problem, because on any sane implementation,
> > offsetof(T1, y) == offsetof(T2, y), which means for most
> > implementations I can transform it to:
> >   T1* a = (T1*) p;
> >   T2* b = (T2*) p;
> >   int* a_y = (int*) (((char*)p) + offsetof(T2, y));
> >   int* b_y = (int*) (((char*)p) + offsetof(T2, y));
> >   *a_y = 2;
> >   return a->y;
> > and reverse the dubious step to get:
> >   T1* a = (T1*) p;
> >   T2* b = (T2*) p;
> >   int* a_y = & b->y;
> >   int* b_y = & b->y;
> >   *a_y = 2;
> >   return a->y;
> > simplify a bit:
> >   ((T2*)p)->y = 2;
> >   return ((T1*)p)->y;
> > And we're done.

>
> I think I'm missing something. This last simplification does not seem to be
> valid according to the intent. In the unsimplified code, before executing
> the "return a->y" you have for read access to "*a_y":


I think that you need to look at it again. "a_y" originally held the
result of "& a->y ", but I slowly transformed it to hold the result of
"& b->y ".

The longer version of that one-step simplification is:
T1* a = (T1*) p;
T2* b = (T2*) p;
int* a_y = & b->y;
int* b_y = & b->y;
*a_y = 2;
return a->y;
simplifies to:
T1* a = (T1*) p;
T2* b = (T2*) p;
int* a_y = & b->y;
*a_y = 2;
return a->y;
simplifies to:
int* a_y = & ((T2*)p)->y;
*a_y = 2;
return ((T1*)p)->y;
simplifies to:
((T2*)p)->y = 2;
return ((T1*)p)->y;
  #83 (permalink)  
Old 02-06-2011, 10:34 PM
Johannes Schaub (litb)
Default Re: Is the aliasing rule symmetric?

Joshua Maurice wrote:

> On Feb 6, 2:36 pm, "Johannes Schaub (litb)"
> <schaub.johan...@googlemail.com> wrote:
>> Joshua Maurice wrote:
>> > On Feb 6, 8:55 am, "Johannes Schaub (litb)"
>> > <schaub.johan...@googlemail.com> wrote:
>> >> In particular, I think the committee intends the spec to say that a
>> >> struct or union access expression involves an access with the struct
>> >> or union lvalue.

>>
>> >> T1 *p = malloc(sizeof *p);
>> >> p->x = 0;

>>
>> >> In this case, I think the committee's intent is that the object
>> >> pointed to by "p" is accessed by an lvalue of type T1, and so the
>> >> effective type of the object containing the int changes to T1. So a
>> >> later cast and access by an lvalue of T2 will be undefined behavior.

>>
>> > I think this is also the only sensible interpretation of the
>> > committee's intent. That is
>> > p->y = 2;
>> > is not equivalent to
>> > * (int*) (((char*)p) + offsetof(T1, y)) = 2;
>> > That is, the apparently only sensible way out is: the first somehow
>> > participates in unwritten rules to make a T1 object, and the offsetof
>> > way does not.

>>
>> > I wonder where they want to draw the difference. Let this be the
>> > context for the following questions:
>> > #include <stddef.h>
>> > #include <stdlib.h>

>>
>> > typedef struct T1 { int x; int y; } T1;
>> > typedef struct T2 { int x; int y; } T2;

>>
>> > int main()
>> > {
>> > void* p = malloc(sizeof(T1));
>> > /* ... */
>> > }

>>
>> > Consider the subsequent alterations. Let's start with the simple:
>> > T1* a = (T1*) p;
>> > a->y = 2;
>> > return a->y;
>> > Now, changing it to the following shouldn't give it UB.
>> > T1* a = (T1*) p;
>> > T2* b = (T2*) p;
>> > a->y = 2;
>> > return a->y;
>> > Let's add an explicit temporary variable as follows.
>> > T1* a = (T1*) p;
>> > T2* b = (T2*) p;
>> > int* a_y = & a->y;
>> > *a_y = 2;
>> > return a->y;
>> > Let's add another variable.
>> > T1* a = (T1*) p;
>> > T2* b = (T2*) p;
>> > int* a_y = & a->y;
>> > int* b_y = & b->y;
>> > *a_y = 2;
>> > return a->y;
>> > Ok, up to this point, I'm pretty sure everyone would agree that we
>> > have no UB. Now, let's take that one dubious step, and transform the
>> > above to:
>> > T1* a = (T1*) p;
>> > T2* b = (T2*) p;
>> > int* a_y = (int*) (((char*)a) + offsetof(T1, y));
>> > int* b_y = (int*) (((char*)b) + offsetof(T2, y));
>> > *a_y = 2;
>> > return a->y;
>> > Quick change, replacing some of the "a" and "b" with "p":
>> > T1* a = (T1*) p;
>> > T2* b = (T2*) p;
>> > int* a_y = (int*) (((char*)p) + offsetof(T1, y));
>> > int* b_y = (int*) (((char*)p) + offsetof(T2, y));
>> > *a_y = 2;
>> > return a->y;
>> > Now we have a problem, because on any sane implementation,
>> > offsetof(T1, y) == offsetof(T2, y), which means for most
>> > implementations I can transform it to:
>> > T1* a = (T1*) p;
>> > T2* b = (T2*) p;
>> > int* a_y = (int*) (((char*)p) + offsetof(T2, y));
>> > int* b_y = (int*) (((char*)p) + offsetof(T2, y));
>> > *a_y = 2;
>> > return a->y;
>> > and reverse the dubious step to get:
>> > T1* a = (T1*) p;
>> > T2* b = (T2*) p;
>> > int* a_y = & b->y;
>> > int* b_y = & b->y;
>> > *a_y = 2;
>> > return a->y;
>> > simplify a bit:
>> > ((T2*)p)->y = 2;
>> > return ((T1*)p)->y;
>> > And we're done.

>>
>> I think I'm missing something. This last simplification does not seem to
>> be valid according to the intent. In the unsimplified code, before
>> executing the "return a->y" you have for read access to "*a_y":

>
> I think that you need to look at it again. "a_y" originally held the
> result of "& a->y ", but I slowly transformed it to hold the result of
> "& b->y ".
>


I think "& a->y" and "& b->y" are exactly equivalent.
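For anyone who wants to check the address claim on their own implementation, here is a minimal sketch (mine, not from the thread). The helper name same_member_address is hypothetical, and the asserts encode the thread's assumption that T1 and T2 are laid out identically, which the standard does not spell out. The sketch only forms and compares addresses; nothing is read or written through either pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct T1 { int x; int y; } T1;
typedef struct T2 { int x; int y; } T2;

/* Hypothetical helper: returns 1 if &a->y and &b->y compare equal,
   0 if they differ, -1 if the allocation failed. */
int same_member_address(void)
{
    /* Assumption from the thread, not a guarantee of the standard:
       structurally identical structs share size and member offsets. */
    assert(sizeof(T1) == sizeof(T2));
    assert(offsetof(T1, y) == offsetof(T2, y));

    void *p = malloc(sizeof(T1));
    if (p == NULL)
        return -1;

    T1 *a = p;
    T2 *b = p;

    /* Only addresses are computed here; no object is accessed. */
    int same = (&a->y == &b->y);

    free(p);
    return same;
}
```

On a typical implementation one would expect same_member_address() to return 1. That says nothing about which effective type a later store gives the object, which is the actual point of disagreement.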

> The longer version of that one-step simplification is:
> T1* a = (T1*) p;
> T2* b = (T2*) p;
> int* a_y = & b->y;
> int* b_y = & b->y;
> *a_y = 2;
> return a->y;
> simplifies to:
> T1* a = (T1*) p;
> T2* b = (T2*) p;
> int* a_y = & b->y;
> *a_y = 2;
> return a->y;
> simplifies to:
> int* a_y = & ((T2*)p)->y;
> *a_y = 2;
> return ((T1*)p)->y;
> simplifies to:
> ((T2*)p)->y = 2;
> return ((T1*)p)->y;


The last simplification in this longer version is invalid, I think. Prior to
the "return", in the second last version, you access one object and that
object only by an lvalue of type "int". In the second version prior to the
return (where you do a write access), you access two objects, the first of
which with an lvalue of type "T2" (and changes the effective type to that)
and the second of which by an lvalue of type int.
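As an aside, if the goal is just to get the value out without depending on how this dispute is resolved, one option (not proposed by anyone in the thread) is to copy the bytes into an object whose declared type is T1. A sketch, assuming T1 and T2 have the same size and layout; read_y_via_memcpy is a made-up name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct T1 { int x; int y; } T1;
typedef struct T2 { int x; int y; } T2;

/* Hypothetical helper: stores through T2, reads back through a
   declared T1 object. Returns -1 if the allocation failed. */
int read_y_via_memcpy(void)
{
    /* Assumption: identical layout for the two struct types. */
    assert(sizeof(T1) == sizeof(T2));
    assert(offsetof(T1, y) == offsetof(T2, y));

    void *p = malloc(sizeof(T2));
    if (p == NULL)
        return -1;

    /* Both members are stored through T2 lvalues. */
    ((T2 *)p)->x = 0;
    ((T2 *)p)->y = 2;

    /* 'out' has declared type T1, so out.y is an ordinary int access;
       memcpy copies the representation as bytes. */
    T1 out;
    memcpy(&out, p, sizeof out);

    free(p);
    return out.y;
}
```

This sidesteps the question rather than answering it: whether reading ((T1*)p)->y directly is defined is still what the thread is arguing about.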

  #84 (permalink)  
Old 02-06-2011, 10:54 PM
Ben Bacarisse
Default Re: Is the aliasing rule symmetric?

"Johannes Schaub (litb)" <schaub.johannes@googlemail.com> writes:

> Joshua Maurice wrote:

<snip>
>> #include <stddef.h>
>> #include <stdlib.h>
>>
>> typedef struct T1 { int x; int y; } T1;
>> typedef struct T2 { int x; int y; } T2;
>>
>> int main()
>> {
>>     void* p = malloc(sizeof(T1));
>>     /* ... */
>> }

<snip (but I think I've kept the part that matters for my comment)>

>> ((T2*)p)->y = 2;
>> return ((T1*)p)->y;

<snip>

> for the return access you have
>
> object 1: lvalue T1, address X, sizeof(T1), effective type: T1
> object 2: lvalue int, address X, sizeof(int), effective type: int
>
> The effective type in the access to object 1 was taken from the type of the
> "lvalue" used for the access.


I can't see any lvalue of type T1 in the return expression. The whole
malloced object never gets an effective type as far as I can see. I
note the "scare quotes" so maybe you have some slightly different
meaning for lvalue here.

There are only two things here that are lvalue expressions: 'p' and
'((T1*)p)->y'. One has type void * and the other has type int. Only
this second lvalue expression is used to access the object in question
(access to the pointer object 'p' is not at issue).

Just to clarify, a cast expression is not an lvalue and even if it were
the type of (T1 *)p is T1 * not T1. Also, in C, E->M is not defined to
be the same as (*E).M or there would certainly be an access via an
lvalue expression of type T1.
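The types involved can be checked mechanically. What follows is a sketch of mine, assuming a C11 compiler for _Generic (which postdates this thread); HAS_TYPE and types_match_description are hypothetical names. It confirms only the types of the expressions, not their lvalue-ness, and _Generic does not evaluate its operand, so no pointer is dereferenced.

```c
#include <stddef.h>

typedef struct T1 { int x; int y; } T1;

/* Hypothetical helper macro: yields 1 if expression e has type T, else 0.
   The controlling expression of _Generic is not evaluated. */
#define HAS_TYPE(T, e) _Generic((e), T: 1, default: 0)

/* Returns 1 if the expression types are as described in the post. */
int types_match_description(void)
{
    void *p = NULL; /* never dereferenced below */

    return HAS_TYPE(void *, p)             /* 'p' has type void *          */
        && HAS_TYPE(T1 *, (T1 *)p)         /* the cast has type T1 *       */
        && HAS_TYPE(int, ((T1 *)p)->y)     /* the member access has int    */
        && !HAS_TYPE(T1, ((T1 *)p)->y);    /* and it does not have type T1 */
}
```

That the cast is not an lvalue shows up differently: writing &(T1 *)p is rejected at compile time, so it cannot appear in a runnable example.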

<snip>
--
Ben.
  #85 (permalink)  
Old 02-07-2011, 03:17 AM
Wojtek Lerch
Default Re: Is the aliasing rule symmetric?

On 06/02/2011 6:54 PM, Ben Bacarisse wrote:
> Also, in C, E->M is not defined to
> be the same as (*E).M or there would certainly be an access via an
> lvalue expression of type T1.


Why? Are you saying that whenever an lvalue expression such as S.M is
evaluated, it counts not only as an access to the member but also an
access to the whole structure? (Except, I assume, in a context where it
does not access an object at all, such as in &S.M?)

Or do you have something more subtle in mind, maybe along the lines that
the expression S.M accesses only the member, but it accesses it "via" an
lvalue expression of the structure type, without accessing the whole
structure, because the struct lvalue is a subexpression of the lvalue
designating the object actually accessed?

  #86 (permalink)  
Old 02-07-2011, 04:22 AM
Joshua Maurice
Default Re: Is the aliasing rule symmetric?

On Feb 6, 8:17 pm, Wojtek Lerch <wojte...@yahoo.ca> wrote:
> On 06/02/2011 6:54 PM, Ben Bacarisse wrote:
>
> > Also, in C, E->M is not defined to
> > be the same as (*E).M or there would certainly be an access via an
> > lvalue expression of type T1.
>
> Why? Are you saying that whenever an lvalue expression such as S.M is
> evaluated, it counts not only as an access to the member but also an
> access to the whole structure? (Except, I assume, in a context where it
> does not access an object at all, such as in &S.M?)
>
> Or do you have something more subtle in mind, maybe along the lines that
> the expression S.M accesses only the member, but it accesses it "via" an
> lvalue expression of the structure type, without accessing the whole
> structure, because the struct lvalue is a subexpression of the lvalue
> designating the object actually accessed?


Specifically, with regard to POSIX pthreads race conditions and the
volatile rules, is there a difference between
a->x = 1;
and
(*a).x = 1;
?

That would be kind of funny if there were a difference, where one would
cause more volatile reads or writes than the other, or where one
could have a race condition but the other could not.
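To make the two spellings concrete, here is a small sketch (the function names are made up for illustration). It can only show that both stores leave the same value behind; how many volatile accesses a given compiler emits for each is not something portable C can observe, so comparing the generated code would be the way to check that part.

```c
typedef struct T1 { int x; int y; } T1;

/* Store through the arrow spelling. */
void store_arrow(volatile T1 *a)
{
    a->x = 1;
}

/* Store through the dereference-then-dot spelling. */
void store_deref_dot(volatile T1 *a)
{
    (*a).x = 1;
}

/* Hypothetical check: returns 1 if both spellings wrote x and left y alone. */
int both_spellings_store_same_value(void)
{
    volatile T1 s = { 0, 0 };
    volatile T1 t = { 0, 0 };

    store_arrow(&s);
    store_deref_dot(&t);

    return s.x == 1 && t.x == 1 && s.y == 0 && t.y == 0;
}
```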
  #87 (permalink)  
Old 02-07-2011, 05:16 AM
Joshua Maurice
Default Re: Is the aliasing rule symmetric?

On Feb 6, 3:34 pm, "Johannes Schaub (litb)"
<schaub.johan...@googlemail.com> wrote:
> Joshua Maurice wrote:
> > On Feb 6, 2:36 pm, "Johannes Schaub (litb)"
> > <schaub.johan...@googlemail.com> wrote:
> >> Joshua Maurice wrote:
> >> > On Feb 6, 8:55 am, "Johannes Schaub (litb)"
> >> > <schaub.johan...@googlemail.com> wrote:
> >> >> In particular, I think the committee intends the spec to say that a
> >> >> struct or union access expression involves an access with the struct
> >> >> or union lvalue.

>
> >> >> T1 *p = malloc(sizeof *p);
> >> >> p->x = 0;

>
> >> >> In this case, I think the committee's intent is that the object
> >> >> pointed to by "p" is accessed by an lvalue of type T1, and so the
> >> >> effective type of the object containing the int changes to T1. So a
> >> >> later cast and access by an lvalue of T2 will be undefined behavior.

>
> >> > I think this is also the only sensible interpretation of the
> >> > committee's intent. That is
> >> > p->y = 2;
> >> > is not equivalent to
> >> > * (int*) (((char*)p) + offsetof(T1, y)) = 2;
> >> > That is, the apparently only sensible way out is: the first somehow
> >> > participates in unwritten rules to make a T1 object, and the offsetof
> >> > way does not.

>
> >> > I wonder where they want to draw the difference. Let this be the
> >> > context for the following questions:
> >> > #include <stddef.h>
> >> > #include <stdlib.h>
> >> >
> >> > typedef struct T1 { int x; int y; } T1;
> >> > typedef struct T2 { int x; int y; } T2;
> >> >
> >> > int main()
> >> > {
> >> >     void* p = malloc(sizeof(T1));
> >> >     /* ... */
> >> > }
>
> >> > Consider the subsequent alterations. Let's start with the simple:
> >> > T1* a = (T1*) p;
> >> > a->y = 2;
> >> > return a->y;
> >> > Now, changing it to the following shouldn't give it UB.
> >> > T1* a = (T1*) p;
> >> > T2* b = (T2*) p;
> >> > a->y = 2;
> >> > return a->y;
> >> > Let's add an explicit temporary variable as follows.
> >> > T1* a = (T1*) p;
> >> > T2* b = (T2*) p;
> >> > int* a_y = & a->y;
> >> > *a_y = 2;
> >> > return a->y;
> >> > Let's add another variable.
> >> > T1* a = (T1*) p;
> >> > T2* b = (T2*) p;
> >> > int* a_y = & a->y;
> >> > int* b_y = & b->y;
> >> > *a_y = 2;
> >> > return a->y;
> >> > Ok, up to this point, I'm pretty sure everyone would agree that we
> >> > have no UB. Now, let's take that one dubious step, and transform the
> >> > above to:
> >> > T1* a = (T1*) p;
> >> > T2* b = (T2*) p;
> >> > int* a_y = (int*) (((char*)a) + offsetof(T1, y));
> >> > int* b_y = (int*) (((char*)b) + offsetof(T2, y));
> >> > *a_y = 2;
> >> > return a->y;
> >> > Quick change, replacing some of the "a" and "b" with "p":
> >> > T1* a = (T1*) p;
> >> > T2* b = (T2*) p;
> >> > int* a_y = (int*) (((char*)p) + offsetof(T1, y));
> >> > int* b_y = (int*) (((char*)p) + offsetof(T2, y));
> >> > *a_y = 2;
> >> > return a->y;
> >> > Now we have a problem, because on any sane implementation,
> >> > offsetof(T1, y) == offsetof(T2, y), which means for most
> >> > implementations I can transform it to:
> >> > T1* a = (T1*) p;
> >> > T2* b = (T2*) p;
> >> > int* a_y = (int*) (((char*)p) + offsetof(T2, y));
> >> > int* b_y = (int*) (((char*)p) + offsetof(T2, y));
> >> > *a_y = 2;
> >> > return a->y;
> >> > and reverse the dubious step to get:
> >> > T1* a = (T1*) p;
> >> > T2* b = (T2*) p;
> >> > int* a_y = & b->y;
> >> > int* b_y = & b->y;
> >> > *a_y = 2;
> >> > return a->y;
> >> > simplify a bit:
> >> > ((T2*)p)->y = 2;
> >> > return ((T1*)p)->y;
> >> > And we're done.

>
> >> I think I'm missing something. This last simplification does not seem to
> >> be valid according to the intent. In the unsimplified code, before
> >> executing the "return a->y" you have for read access to "*a_y":

>
> > I think that you need to look at it again. "a_y" originally held the
> > result of "& a->y ", but I slowly transformed it to hold the result of
> > "& b->y ".

>
> I think "& a->y" and "& b->y" are exactly equivalent.
>
>
>
> > The longer version of that one-step simplification is:
> >   T1* a = (T1*) p;
> >   T2* b = (T2*) p;
> >   int* a_y = & b->y;
> >   int* b_y = & b->y;
> >   *a_y = 2;
> >   return a->y;
> > simplifies to:
> >   T1* a = (T1*) p;
> >   T2* b = (T2*) p;
> >   int* a_y = & b->y;
> >   *a_y = 2;
> >   return a->y;
> > simplifies to:
> >   int* a_y = & ((T2*)p)->y;
> >   *a_y = 2;
> >   return ((T1*)p)->y;
> > simplifies to:
> >   ((T2*)p)->y = 2;
> >   return ((T1*)p)->y;

>
> The last simplification in this longer version is invalid, I think. Prior to
> the "return", in the second last version, you access one object and that
> object only by an lvalue of type "int". In the second version prior to the
> return (where you do a write access), you access two objects, the first of
> which with an lvalue of type "T2" (and changes the effective type to that)
> and the second of which by an lvalue of type int.


To be clear, you think that there's a difference between
a->x = 2;
and
int* x = & a->x;
*x = 2;
?

It would take me a long time to buy that.
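Spelled out as two functions, the comparison looks like this. It is a sketch with made-up function names, and it uses only T1 throughout, so it stays within the steps that the quoted walkthrough treats as uncontroversial. It cannot show whether the two stores differ in the effective type they give the allocated object, only that the value read back is the same.

```c
#include <stdlib.h>

typedef struct T1 { int x; int y; } T1;

/* Store directly through the member access. Returns -1 if malloc fails. */
int write_direct(void)
{
    T1 *a = malloc(sizeof *a);
    if (a == NULL)
        return -1;

    a->x = 2;

    int result = a->x;
    free(a);
    return result;
}

/* Store through an int pointer formed from the member's address. */
int write_via_pointer(void)
{
    T1 *a = malloc(sizeof *a);
    if (a == NULL)
        return -1;

    int *x = &a->x;
    *x = 2;

    int result = a->x;
    free(a);
    return result;
}
```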
  #88 (permalink)  
Old 02-07-2011, 10:10 AM
Tim Rentsch
Default Re: Is the aliasing rule symmetric?

"Johannes Schaub (litb)" <schaub.johannes@googlemail.com> writes:

>[snip]
>
> In particular, I think the committee intends the spec to say that a struct
> or union access expression involves an access with the struct or union
> lvalue.
>
> T1 *p = malloc(sizeof *p);
> p->x = 0;
>
> In this case, I think the committee's intent is that the object pointed to
> by "p" is accessed by an lvalue of type T1, and so the effective type of the
> object containing the int changes to T1. So a later cast and access by an
> lvalue of T2 will be undefined behavior.


I'm not aware of any evidence that supports this theory (i.e.,
that using '.' or '->' is also an access for the left operand).
Furthermore it seems to be in conflict with the definitions the
Standard gives for access, value, etc.

Do you have any such evidence to offer? Or are you simply
stating an unsupported opinion?
  #89 (permalink)  
Old 02-07-2011, 10:39 AM
Tim Rentsch
Default Re: Is the aliasing rule symmetric?

Joshua Maurice <joshuamaurice@gmail.com> writes:

> On Feb 6, 8:55 am, "Johannes Schaub (litb)"
>> [snip]
>>
>> In particular, I think the committee intends the spec to say that a struct
>> or union access expression involves an access with the struct or union
>> lvalue.
>>
>>     T1 *p = malloc(sizeof *p);
>>     p->x = 0;
>>
>> In this case, I think the committee's intent is that the object pointed to
>> by "p" is accessed by an lvalue of type T1, and so the effective type of the
>> object containing the int changes to T1. So a later cast and access by an
>> lvalue of T2 will be undefined behavior.

>
> I think this is also the only sensible interpretation of the
> committee's intent. [snip elaboration]


What I think you're trying to say is that this interpretation is
the only one that makes sense, and therefore must be what the
committee intended. (We don't know what the committee intended,
so there is no way to judge whether a particular interpretation
is the only sensible one, or indeed whether there is _any_
sensible meaning for what they intended.)

Regardless of what the committee might or might not have
intended, there certainly are alternative ways of reading
the standard that make as much sense as this one.
  #90 (permalink)  
Old 02-07-2011, 11:01 AM
Tim Rentsch
Default Re: Is the aliasing rule symmetric?

Ben Bacarisse <ben.usenet@bsb.me.uk> writes:

> [snip]
> Also, in C, E->M is not defined to be the same as (*E).M


It's true that they aren't defined to be the same (and as you
point out a cast expression is not an lvalue), but there is a
sequence of equivalences (using '===' to mean "equivalent"):

(&E)->MOS === E.MOS // by footnote 83
(&(*P))->MOS === (*P).MOS // substituting (*P) for E
P->MOS === (*P).MOS // 6.5.3.2p3
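The observable side of these equivalences can be checked with a few address comparisons. This is only a sketch (member_forms_agree is a hypothetical name) and it does not settle what the standard's wording says about "access"; it just shows that each pair of spellings designates the same member of the same object.

```c
typedef struct T1 { int x; int y; } T1;

/* Returns 1 if every pair of spellings designates the same int object. */
int member_forms_agree(void)
{
    T1 e = { 1, 2 };
    T1 *p = &e;

    return &(&e)->y == &e.y           /* (&E)->MOS    vs  E.MOS     */
        && &(&(*p))->y == &(*p).y     /* (&(*P))->MOS vs  (*P).MOS  */
        && &p->y == &(*p).y           /* P->MOS       vs  (*P).MOS  */
        && p->y == (*p).y;            /* and the values agree       */
}
```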

All times are GMT.