Go Back   Rhinocerus > Newsgroup > Newsgroup comp.lang.* 1 > Newsgroup comp.lang.pl1

Reply
 
Thread Tools Display Modes
  #1 (permalink)  
Old 05-15-2012, 09:18 PM
Thomas David Rivers
Guest
 
Posts: n/a
Default Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

Robert AH Prins wrote:

...

>
> TEN superfluous reloads of R1? AD 2012? How the fluffing H can you
> call this an optimizing compiler? How can someone from IBM tell you
> (i.e. me, two years ago!) that "we are at least five years ahead of
> the competition"?
>

I was going to suggest checking the OPT() option setting, but I see in
your original
post that you specified OPT(2) in the IBM V2.3.0 compiler and OPT(3) in
the V3R9 compiler.
So, that's not it... OPT(3) is about as "OPT" as you can get...

This leads me to the my next question - just what "competition" does IBM
point to in the mainframe PL/I compiler
business?

And, we actually were the first compiler vendor with a 64-bit
offering... well before IBM... (in the C/C++ world.)
So, IBM is not always the first, or the best. (Our new v1.96 compiler
compares quite favorably to
IBM's, we think.)

Lastly - would you care to post the source to your example? Or, at
least the declarations of
"rept_line" and "rept_list"... wouldn't mind playing with this one myself...

- Dave Rivers -

--
rivers@dignus.com Work: (919) 676-0847
Get your mainframe programming tools at http://www.dignus.com
Reply With Quote
Alt Today
Advertising
 
and become member of Rhinocerus
Standard Sponsored Links

  #2 (permalink)  
Old 05-15-2012, 10:07 PM
Robert AH Prins
Guest
 
Posts: n/a
Default Comparison of compiler generated code AD 1980(ish) v 2010(ish)

Can anyone skilled in the art tell me why a compiler that probably dates back to
the late 1970'ies or early 1980'ies generates the following short and sweet code
for a PL/I "BY NAME" assignment, while the not completely new (but still fairly
recent) version of Enterprise PL/I (V3R9) generates the very, very, very
long-winded code below it? Or is this (V3R9) code (that predates the OOO z196
architecture) really faster?

OS PL/I V2.3.0 - OPT(2)
343 1 2 REPT_LINE = REPT_LIST, BY NAME;

* STATEMENT NUMBER 343
002664 58 70 8 268 L 7,REPT_WORK.LINE_PTR
002668 58 60 8 030 L 6,REPT_WORK.REPT_PTR
00266C 58 F0 3 600 L 15,1536(0,3)
002670 D2 03 7 003 F B54 MVC REPT_LINE.TR(4),2900(15)
002676 DE 03 7 003 6 00C ED REPT_LINE.TR(4),REPT_LIST.TR
00267C D2 03 7 00A F B54 MVC REPT_LINE.RE(4),2900(15)
002682 DE 03 7 00A 6 00E ED REPT_LINE.RI(4),REPT_LIST.RI
002688 D2 02 7 011 6 010 MVC REPT_LINE.DA(3),REPT_LIST.DA
00268E 58 E0 3 608 L 14,1544(0,3)
002692 D2 06 4 158 E 5D4 MVC 344(7,4),1492(14)
002698 DE 06 4 158 6 014 ED 344(7,4),REPT_LIST.K+1
00269E D2 05 7 017 4 159 MVC REPT_LINE.K(6),345(4)
0026A4 D2 06 4 158 E 5D4 MVC 344(7,4),1492(14)
0026AA DE 06 4 158 6 01B ED 344(7,4),REPT_LIST.V
0026B0 D2 04 7 028 4 15A MVC REPT_LINE.V(5),346(4)
0026B6 D2 03 7 030 6 026 MVC REPT_LINE.NA(4),REPT_LIST.NA
0026BC D2 03 7 036 6 02A MVC REPT_LINE.TY(4),REPT_LIST.TY
0026C2 D2 03 7 03D 6 02E MVC REPT_LINE.CO(4),REPT_LIST.CO
0026C8 D2 00 7 04B 6 036 MVC REPT_LINE.SP(1),REPT_LIST.SP
0026CE D2 03 7 05F 6 043 MVC REPT_LINE.DATE.YEAR(4),REPT_LIST.DATE.YEAR
0026D4 D2 01 7 064 6 047 MVC REPT_LINE.DATE.MONTH(2),REPT_LIST.DATE.MONTH
0026DA D2 01 7 067 6 049 MVC REPT_LINE.DATE.DAY(2),REPT_LIST.DATE.DAY

Enterprise PL/I for z/OS V3.R9.M0 (Built:20100923) - OPT(3)
3120.0 368 1 2 rept_line = rept_list, by name;

003E40 E350 D340 0624 003120 | STG r5,#SPILL33(,r13,25408)
003E46 E320 D270 0624 003120 | STG r2,#SPILL7(,r13,25200)
003E4C E350 D8FD 0571 003120 | LAY r5,_temp9(,r13,22781)
003E52 E300 D368 0604 003120 | LG r0,#SPILL38(,r13,25448)
003E58 E340 D308 0624 003120 | STG r4,#SPILL26(,r13,25352)
003E5E E310 D4B4 0271 003119 | LAY r1,LINE(,r13,9396)
003E64 E300 D8FC 0550 003120 | STY r0,_temp9(,r13,22780)
003E6A E300 D148 0214 003120 | LGF r0,<a1:d8520:l4>(,r13,8520)
003E70 D278 1000 4D33 003119 | MVC LINE(121,r1,0),REPT_INIT(r4,3379)
003E76 4110 E00C 003120 | LA r1,_shadow21(,r14,12)
003E7A E3E0 D8FC 0571 003120 | LAY r14,_temp9(,r13,22780)
003E80 DE03 E000 1000 003120 | ED _temp9(4,r14,0),_shadow21(r1,0)
003E86 B914 00E0 003120 | LGFR r14,r0
003E8A E300 D368 0604 003120 | LG r0,#SPILL38(,r13,25448)
003E90 4110 E003 003120 | LA r1,#AddressShadow(,r14,3)
003E94 41F0 E00A 003120 | LA r15,#AddressShadow(,r14,10)
003E98 D202 1001 5000 003120 | MVC _shadow21(3,r1,1),_temp9(r5,0)
003E9E 9240 E003 003120 | MVI _shadow21(r14,3),64
003EA2 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
003EA8 E300 D984 0550 003120 | STY r0,_temp8(,r13,22916)
003EAE E350 D984 0571 003120 | LAY r5,_temp8(,r13,22916)
003EB4 4120 E017 003120 | LA r2,#AddressShadow(,r14,23)
003EB8 4110 100E 003120 | LA r1,_shadow21(,r1,14)
003EBC DE03 5000 1000 003120 | ED _temp8(4,r5,0),_shadow21(r1,0)
003EC2 E310 D985 0571 003120 | LAY r1,_temp8(,r13,22917)
003EC8 4140 E028 003120 | LA r4,#AddressShadow(,r14,40)
003ECC D202 F001 1000 003120 | MVC _shadow21(3,r15,1),_temp8(r1,0)
003ED2 9240 E00A 003120 | MVI _shadow21(r14,10),64
003ED6 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
003EDC E3F0 D974 0571 003120 | LAY r15,_temp19(,r13,22900)
003EE2 D202 E011 1010 003120 | MVC _shadow21(3,r14,17),_shadow21(r1,16)
003EE8 E310 D238 0604 003120 | LG r1,#SPILL0(,r13,25144)
003EEE D206 F000 14A4 003120 | MVC _temp19(7,r15,0),' ......'(r1,1188)
003EF4 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
003EFA D203 B95C 1013 003120 | MVC _temp15(4,r11,2396),_shadow18(r1,19)
003F00 E310 D90C 0571 003120 | LAY r1,_temp15(,r13,22796)
003F06 D202 B93C 1001 003120 | MVC _temp11(3,r11,2364),_shadow12(r1,1)
003F0C E310 D8EC 0571 003120 | LAY r1,_temp11(,r13,22764)
003F12 DE06 F000 1000 003120 | ED _temp19(7,r15,0),_temp11(r1,0)
003F18 E310 D975 0571 003120 | LAY r1,_temp19(,r13,22901)
003F1E D205 2000 1000 003120 | MVC _shadow21(6,r2,0),_temp19(r1,0)
003F24 E310 D238 0604 003120 | LG r1,#SPILL0(,r13,25144)
003F2A E320 D96C 0571 003120 | LAY r2,_temp21(,r13,22892)
003F30 D206 2000 14A4 003120 | MVC _temp21(7,r2,0),' ......'(r1,1188)
003F36 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
003F3C D202 B939 101B 003120 | MVC _temp18(3,r11,2361),_shadow12(r1,27)
003F42 D202 B936 B939 003120 | MVC _temp20(3,r11,2358),_temp18(r11,2361)
003F48 E300 D8E6 0590 003120 | LLGC r0,<a1:d22758:l1>(,r13,22758)
003F4E E300 30EE 0080 003120 | NG r0,=X'00000000 0000000F'
003F54 E310 D8E6 0571 003120 | LAY r1,_temp20(,r13,22758)
003F5A E300 D8E6 0572 003120 | STCY r0,<a1:d22758:l1>(,r13,22758)
003F60 DE06 2000 1000 003120 | ED _temp21(7,r2,0),_temp20(r1,0)
003F66 E320 D96E 0571 003120 | LAY r2,_temp21(,r13,22894)
003F6C D204 4000 2000 003120 | MVC _shadow21(5,r4,0),_temp21(r2,0)
003F72 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
003F78 E300 1026 0014 003120 | LGF r0,_shadow19(,r1,38)
003F7E 5000 E030 003120 | ST r0,_shadow19(,r14,48)
003F82 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
003F88 E300 102A 0014 003120 | LGF r0,_shadow19(,r1,42)
003F8E 5000 E036 003120 | ST r0,_shadow19(,r14,54)
003F92 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
003F98 E300 102E 0014 003120 | LGF r0,_shadow19(,r1,46)
003F9E 5000 E03D 003120 | ST r0,_shadow19(,r14,61)
003FA2 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
003FA8 4300 1036 003120 | IC r0,_shadow21(,r1,54)
003FAC 4200 E04B 003120 | STC r0,_shadow21(,r14,75)
003FB0 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
003FB6 E300 1043 0014 003120 | LGF r0,_shadow19(,r1,67)
003FBC 5000 E05F 003120 | ST r0,_shadow19(,r14,95)
003FC0 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
003FC6 E300 1047 0015 003120 | LGH r0,_shadow20(,r1,71)
003FCC 4000 E064 003120 | STH r0,_shadow20(,r14,100)
003FD0 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
003FD6 E340 D9A8 0571 003121 | LAY r4,_temp12(,r13,22952)
003FDC E320 D270 0604 000000 | LG r2,#SPILL7(,r13,25200)
003FE2 E300 1049 0015 003120 | LGH r0,_shadow20(,r1,73)
003FE8 4000 E067 003120 | STH r0,_shadow20(,r14,103)

TEN superfluous reloads of R1? AD 2012? How the fluffing H can you call this an
optimizing compiler? How can someone from IBM tell you (i.e. me, two years ago!)
that "we are at least five years ahead of the competition"?

Oh, maybe it's because Enterprise PL/I is a direct descendant from Visual Age
PL/I for OS/2, a compiler that had to work on a CPU with just a dozen available
registers? Let's see what PL/I for Windows generates?

IBM(R) PL/I for Windows 8.0 (Built:20110825)
; 3132 rept_line = rept_list, by name;
mov ecx,[ebp-03680h]; REPT_WORK
mov [ebp-05938h],ecx; _temp67
push offset FLAT:@CBE273
add ecx,03h
mov edi,offset FLAT:@CBE213
mov edx,edi
mov [ebp-05a38h],edi; @CBE390
add eax,0ch
sub esp,0ch
mov edi,dword ptr __imp__IBMPCODP
call edi
mov edx,[ebp-05a38h]; @CBE390
push offset FLAT:@CBE273
mov eax,[ebp-05938h]; _temp67
lea ecx,[eax+0ah]
mov eax,[ebp-038b8h]; REPT_WORK
add eax,0eh
sub esp,0ch
call edi
mov eax,[ebp-05938h]; _temp67
mov edx,[ebp-038b8h]; REPT_WORK
add edx,010h
mov cx,[edx]
mov dl,[edx+02h]
mov [eax+013h],dl
mov [eax+011h],cx
push offset FLAT:@CBE58
lea ecx,[eax+017h]
mov edx,offset FLAT:@CBE224
mov eax,[ebp-038b8h]; REPT_WORK
add eax,013h
sub esp,0ch
call edi
mov eax,[ebp-05938h]; _temp67
push offset FLAT:@CBE27
lea ecx,[eax+028h]
mov edx,offset FLAT:@CBE218
mov eax,[ebp-038b8h]; REPT_WORK
add eax,01bh
sub esp,0ch
call edi
mov eax,[ebp-05938h]; _temp67
mov ecx,[ebp-038b8h]; REPT_WORK
mov ecx,[ecx+026h]
mov [eax+030h],ecx
mov ecx,[ebp-038b8h]; REPT_WORK
mov ecx,[ecx+02ah]
mov [eax+036h],ecx
mov ecx,[ebp-038b8h]; REPT_WORK
mov ecx,[ecx+02eh]
mov [eax+03dh],ecx
mov ecx,[ebp-038b8h]; REPT_WORK
mov cl,[ecx+036h]
mov [eax+04bh],cl
mov ecx,[ebp-038b8h]; REPT_WORK
mov ecx,[ecx+043h]
mov [eax+05fh],ecx
mov ecx,[ebp-038b8h]; REPT_WORK
mov cx,[ecx+047h]
mov [eax+064h],cx
mov ecx,[ebp-038b8h]; REPT_WORK
mov cx,[ecx+049h]
mov [eax+067h],cx

Wow! The code ends with the same six superfluous reloads, as ECX is needlessly
overwritten - why not use EDX?

Again, I'm only the observer, it's you and your companies that are paying for
the extra(?) CPU usage, and maybe a 16-byte three-instruction sequence like

003FC0 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
003FC6 E300 1047 0015 003120 | LGH r0,_shadow20(,r1,71)
003FCC 4000 E064 003120 | STH r0,_shadow20(,r14,100)

is really faster than the simple 6-byte one-instruction sequence

0026D4 D2 01 7 064 6 047 MVC REPT_LINE.DATE.MONTH(2),REPT_LIST.DATE.MONTH

I always thought that the fastest instructions are those ones that are never
executed...

Robert
--
Robert AH Prins
robert(a)prino(d)org
Reply With Quote
  #3 (permalink)  
Old 05-16-2012, 12:47 AM
Shmuel (Seymour J.) Metz
Guest
 
Posts: n/a
Default Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

In <a1frcjFggcU1@mid.individual.net>, on 05/15/2012
at 10:07 PM, Robert AH Prins <spamtrap@prino.org> said:

>TEN superfluous reloads of R1? AD 2012?


I guess that peephole optimization is too recent.

>How the fluffing H can you call this an optimizing compiler?
>How can someone from IBM tell you (i.e. me, two years ago!)
>that "we are at least five years ahead of the competition"?


Their lips move.

Have they at least fixed it to use inline code for unaligned bit
strings with constant offsets and lengths. e.g., for SMF records?

--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to spamtrap@library.lspace.org

Reply With Quote
  #4 (permalink)  
Old 05-16-2012, 02:07 PM
robin.vowels@gmail.com
Guest
 
Posts: n/a
Default Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

On Wednesday, 16 May 2012 08:07:51 UTC+10, Robert AH Prins wrote:

> OS PL/I V2.3.0 - OPT(2)
> 343 1 2 REPT_LINE = REPT_LIST, BY NAME;
>
> * STATEMENT NUMBER 343



> Enterprise PL/I for z/OS V3.R9.M0 (Built:20100923) - OPT(3)
> 3120.0 368 1 2 rept_line = rept_list, by name;


> IBM(R) PL/I for Windows 8.0 (Built:20110825)
> ; 3132 rept_line = rept_list, by name;


They are three different programs.
Reply With Quote
  #5 (permalink)  
Old 05-16-2012, 04:37 PM
Robert AH Prins
Guest
 
Posts: n/a
Default Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

On 2012-05-16 14:07, robin.vowels@gmail.com wrote:
> On Wednesday, 16 May 2012 08:07:51 UTC+10, Robert AH Prins wrote:
>
>> OS PL/I V2.3.0 - OPT(2)
>> 343 1 2 REPT_LINE = REPT_LIST, BY NAME;
>>
>> * STATEMENT NUMBER 343

>
>
>> Enterprise PL/I for z/OS V3.R9.M0 (Built:20100923) - OPT(3)
>> 3120.0 368 1 2 rept_line = rept_list, by name;

>
>> IBM(R) PL/I for Windows 8.0 (Built:20110825)
>> ; 3132 rept_line = rept_list, by name;

>
> They are three different programs.


Those of you who have used OS PL/I, Enterprise PL/I and PL/I for Windows know that Enterprise PL/I
now bases the statement numbers of the pseudo-assembler listing on line-numbers and that "if..then"
now counts as two statements rather than one. PL/I for doze also bases its statement numbers on the
line of the source, but on z/OS a version number comment-line is added by the compile procedure, and
the z/OS compile was done with listview(afterall) whereas the doze compilation missed the
(irrelevant) extra comment line and used listview(source).

Anyway, of course this is the same program, but sadly RV seems to enjoy the board for his head too
much to actually investigate the matter, a bold "They are three different programs." is much easier.

In an off-list message I have told him what to do.

Robert
--
Robert AH Prins
robert(a)prino(d)org
Reply With Quote
  #6 (permalink)  
Old 05-16-2012, 04:49 PM
Robert AH Prins
Guest
 
Posts: n/a
Default Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

On 2012-05-16 16:37, Robert AH Prins wrote:
> On 2012-05-16 14:07, robin.vowels@gmail.com wrote:
>> On Wednesday, 16 May 2012 08:07:51 UTC+10, Robert AH Prins wrote:
>>
>>> OS PL/I V2.3.0 - OPT(2)
>>> 343 1 2 REPT_LINE = REPT_LIST, BY NAME;
>>>
>>> * STATEMENT NUMBER 343

>>
>>
>>> Enterprise PL/I for z/OS V3.R9.M0 (Built:20100923) - OPT(3)
>>> 3120.0 368 1 2 rept_line = rept_list, by name;

>>
>>> IBM(R) PL/I for Windows 8.0 (Built:20110825)
>>> ; 3132 rept_line = rept_list, by name;

>>
>> They are three different programs.

>
> Those of you who have used OS PL/I, Enterprise PL/I and PL/I for Windows know that Enterprise PL/I
> now bases the statement numbers of the pseudo-assembler listing on line-numbers and that "if..then"
> now counts as two statements rather than one. PL/I for doze also bases its statement numbers on the
> line of the source, but on z/OS a version number comment-line is added by the compile procedure, and
> the z/OS compile was done with listview(afterall) whereas the doze compilation missed the
> (irrelevant) extra comment line and used listview(source).


Piggy-backing on myself: using listview(afterall) will produce a ".cod" file where the PL/I code and
generated x86 assembler are completely mashed up, that's why I don't use this option on doze.

I don't expect this to be fixed anytime soon.

Robert
--
Robert AH Prins
robert(a)prino(d)org
Reply With Quote
  #7 (permalink)  
Old 05-17-2012, 08:38 AM
Nomen Nescio
Guest
 
Posts: n/a
Default Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

That looks very odd to me, almost as if the optimization option didn't get
accepted and it generated pure dumb code. Can you verify the banner shows
that optimization is on and actually got done (however you do that
nowadays). Do you have to specify REORDER even with OPT(x)?

I guess you don't have a real machine to benchmark this on but if you did
running that assign statement in a big loop, if you could write one that
wouldn't get optimized out somehow, might tell you if it's really faster to
execute that pile of crap or not. Whatever you do, don't extrapolate the
results you got on Hercules to what will happen on a real Z9/Z10/Z196 etc.

> TEN superfluous reloads of R1? AD 2012? How the fluffing H can you call this an
> optimizing compiler? How can someone from IBM tell you (i.e. me, two years ago!)
> that "we are at least five years ahead of the competition"?


Because nobody else sells a PL/I compiler for MVS and if they wanted to it
would take 5 years to write one? ;-) But I have to admit this is disturbing
given IBM's PL/I has always had a good reputation for optimizing. I haven't
used it for work since before ESA but the new stuff is not looking too good!

> PL/I for OS/2, a compiler that had to work on a CPU with just a dozen available
> registers? Let's see what PL/I for Windows generates?


x86 doesn't really have a dozen available registers. Many of the so-called
GPRs are reserved for important stuff. You end up with 4 or 5 usable
registers in any heavy duty x86 code. Lots of thrashing is normal in x86
code but it doesn't seem to hurt performance much for some reason.

> Wow! The code ends with the same six superfluous reloads, as ECX is needlessly
> overwritten - why not use EDX?


At least IBM was smart enough to port the code generator from one platform
to the other. Those guys are no dummies! Somebody probably got a big bonus
for that.

> Again, I'm only the observer, it's you and your companies that are paying for
> the extra(?) CPU usage, and maybe a 16-byte three-instruction sequence like
>
> 003FC0 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
> 003FC6 E300 1047 0015 003120 | LGH r0,_shadow20(,r1,71)
> 003FCC 4000 E064 003120 | STH r0,_shadow20(,r14,100)
>
> is really faster than the simple 6-byte one-instruction sequence
>
> 0026D4 D2 01 7 064 6 047 MVC REPT_LINE.DATE.MONTH(2),REPT_LIST.DATE.MONTH
>
> I always thought that the fastest instructions are those ones that are never
> executed...


I think that's still a safe bet. Thanks for posting this, it was very
enlightening.

Reply With Quote
  #8 (permalink)  
Old 05-17-2012, 04:23 PM
John W Kennedy
Guest
 
Posts: n/a
Default Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

On 2012-05-17 08:38:51 +0000, Nomen Nescio said:
> That looks very odd to me, almost as if the optimization option didn't get
> accepted and it generated pure dumb code. Can you verify the banner shows
> that optimization is on and actually got done (however you do that
> nowadays). Do you have to specify REORDER even with OPT(x)?


Yes, because REORDER breaks the language. The details of PL/I exception
handling, as the language is defined, are violently hostile to
meaningful optimization, which is a big reason that PL/I was unable to
replace FORTRAN, and the main reason that new languages intended for
compilation avoided exception handling until the 80s, when Ada and C++
introduced the try/catch model.

However, it is possible to install the Enterprise compiler in such a
way that REORDER is the default.

> x86 doesn't really have a dozen available registers. Many of the so-called
> GPRs are reserved for important stuff. You end up with 4 or 5 usable
> registers in any heavy duty x86 code. Lots of thrashing is normal in x86
> code but it doesn't seem to hurt performance much for some reason.


x64, of course, improves it massively.

--
John W Kennedy
Read the remains of Shakespeare's lost play, now annotated!
http://www.SKenSoftware.com/Double%20Falshood

Reply With Quote
  #9 (permalink)  
Old 05-18-2012, 12:10 AM
Fritz Wuehler
Guest
 
Posts: n/a
Default Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

John W Kennedy <jwkenne@attglobal.net> wrote:

> > x86 doesn't really have a dozen available registers. Many of the so-called
> > GPRs are reserved for important stuff. You end up with 4 or 5 usable
> > registers in any heavy duty x86 code. Lots of thrashing is normal in x86
> > code but it doesn't seem to hurt performance much for some reason.

>
> x64, of course, improves it massively.


Well massively as a percentage but not massively by 1964 OS/360 standards
since Intel (AMD actually) still requires you to throw 2 or 3 registers away
on stack management and other instructions that are so basic to getting
anything worthwhile done (ex. string compares/moves) have implied register
usage. But it doesn't seem like anybody coding on Intel cares. Most of them
haven't a clue and the guys who do have a clue usually haven't coded on
machines with enough registers (S/60, POWER, SPARC) to know what they're
missing. Like I said lots and lots of thrashing but they still get
acceptable performance whatever that means.

Reply With Quote
  #10 (permalink)  
Old 05-18-2012, 02:16 AM
robin.vowels@gmail.com
Guest
 
Posts: n/a
Default Re: REORDER and OPTIMISATION

On Friday, 18 May 2012 02:23:22 UTC+10, John W Kennedy wrote:
> On 2012-05-17 08:38:51 +0000, Nomen Nescio said:
> > That looks very odd to me, almost as if the optimization option didn't get
> > accepted and it generated pure dumb code. Can you verify the banner shows
> > that optimization is on and actually got done (however you do that
> > nowadays). Do you have to specify REORDER even with OPT(x)?

>
> Yes, because REORDER breaks the language.


That's nonsense.

> The details of PL/I exception
> handling, as the language is defined, are violently hostile to
> meaningful optimization,


Don't talk rubbish.
All that REORDER does is to allow the compiler to move code out of loops,
and to do things like eliminate common sub-expressions (compute them once),
in situations the same sub-expression is evaluated in two or more statements,
etc.

Things that can affect optimisation are labels (common to any language)
and presence of ON statements (e.g., code cannot be moved to a place
where it would be executed before an ON statement).

Given typical use of ON statements, they appear at or near the beginning of
a procedure, and thus do not have much influence on optimisation.

That said, it doesn't matter where the code is ; if it causes an interrupt,
the exception handler will get control, and, if appropriate, return control
to the point where the interrupt occurred.

> which is a big reason that PL/I was unable to replace FORTRAN,


It had nothing to do with whether or not PL/I replaced FORTRAN.

> and the main reason that new languages intended for
> compilation avoided exception handling until the 80s, when Ada and C++
> introduced the try/catch model.


Reply With Quote
  #11 (permalink)  
Old 05-18-2012, 02:24 AM
robin.vowels@gmail.com
Guest
 
Posts: n/a
Default Re: BIT strings

On Thursday, 17 May 2012 23:22:10 UTC+10, Robert AH Prins wrote:

> However, when working in the Netherlands in 1996, I optimized two CRC routines
> and there the savings were a measly 99.3 and 99.5% - this was pre-Enterprise
> PL/I and my change was to simply do all intermediate bit-fiddling with ALIGNED
> bits, cutting out thousands of calls to the library.


The Programmer's Guide from PL/I-F days tells us that BIT strings
are best ALIGNed for speed.
Reply With Quote
  #12 (permalink)  
Old 05-18-2012, 03:08 AM
glen herrmannsfeldt
Guest
 
Posts: n/a
Default Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

Fritz Wuehler <fritz@spamexpire-201205.rodent.frell.theremailer.net> wrote:

(snip on x64, IA32, and some others)

> Well massively as a percentage but not massively by 1964 OS/360 standards
> since Intel (AMD actually) still requires you to throw 2 or 3 registers away
> on stack management and other instructions that are so basic to getting
> anything worthwhile done (ex. string compares/moves) have implied register
> usage. But it doesn't seem like anybody coding on Intel cares. Most of them
> haven't a clue and the guys who do have a clue usually haven't coded on
> machines with enough registers (S/60, POWER, SPARC) to know what they're
> missing. Like I said lots and lots of thrashing but they still get
> acceptable performance whatever that means.


Well, S/360 requires that you not use some registers, too.

You at least need a base register, which you don't for IA32.

Register 0 has some limits on its use.

The OS/360 linkage registers, 1, 14, and 15 can be used for other
uses if one is careful. I don't know by now how much compilers do that.

-- glen
Reply With Quote
  #13 (permalink)  
Old 05-18-2012, 03:30 AM
robin.vowels@gmail.com
Guest
 
Posts: n/a
Default Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

On Friday, 18 May 2012 13:08:18 UTC+10, glen herrmannsfeldt wrote:

> Well, S/360 requires that you not use some registers, too.
>
> You at least need a base register, which you don't for IA32.


It's convenient to use a base register, but you don't have to have
one.

> Register 0 has some limits on its use.


It's the instructions that have limits on use of registers.
When zero is specified in the index field or the base field
of an instruction, no register is used.

The S/360 is archaic 1960s. Try System z.
Reply With Quote
  #14 (permalink)  
Old 05-18-2012, 03:48 AM
robin.vowels@gmail.com
Guest
 
Posts: n/a
Default Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

On Thursday, 17 May 2012 02:37:35 UTC+10, Robert AH Prins wrote:
> On 2012-05-16 14:07, robin.vow....@gmail.com wrote:
> > On Wednesday, 16 May 2012 08:07:51 UTC+10, Robert AH Prins wrote:
> >
> >> OS PL/I V2.3.0 - OPT(2)
> >> 343 1 2 REPT_LINE = REPT_LIST, BY NAME;
> >>
> >> * STATEMENT NUMBER 343

> >
> >
> >> Enterprise PL/I for z/OS V3.R9.M0 (Built:20100923) - OPT(3)
> >> 3120.0 368 1 2 rept_line = rept_list, by name;

> >
> >> IBM(R) PL/I for Windows 8.0 (Built:20110825)
> >> ; 3132 rept_line = rept_list, by name;

> >
> > They are three different programs.

>
> Those of you who have used OS PL/I, Enterprise PL/I and PL/I for Windows know that Enterprise PL/I
> now bases the statement numbers of the pseudo-assembler listing on line-numbers and that "if..then"
> now counts as two statements rather than one. PL/I for doze also bases its statement numbers on the
> line of the source, but on z/OS a version number comment-line is added by the compile procedure, and
> the z/OS compile was done with listview(afterall) whereas the doze compilation missed the
> (irrelevant) extra comment line and used listview(source).
>
> Anyway, of course this is the same program,


If it were the same program, we would see the same member names
in the code, but there aren't.
In any case, there is an obvious change, in case you hadn't noticed,
namely, the change from upper case source to lower-case.

And you haven't displayed the code as someone already asked.
Others (Tucker, Jalic, and Kewley) have made claims
about PL/I optimisation that proved to be false.

To justify you claim that the programs are the same,
you, Prins, would need to post the versions.
In view of your recent stuff-ups that is the bare minimum.

> but sadly RV seems to enjoy the board for his head too
> much to actually investigate the matter, a bold "They are three different programs." is much easier.


See above. I didn't base that conclusion on the lines shown above.
It was based on the pseudo-assembler.
I omitted the lengthy pseudo-assembler code, as there was no sense in
repeating it.
Reply With Quote
  #15 (permalink)  
Old 05-18-2012, 06:35 AM
Nomen Nescio
Guest
 
Posts: n/a
Default Re: Comparison of compiler generated code AD 1980(ish) v 2010(ish)

glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:

> Fritz Wuehler <fritz@spamexpire-201205.rodent.frell.theremailer.net> wrote:
>
> (snip on x64, IA32, and some others)
>
> > Well massively as a percentage but not massively by 1964 OS/360 standards
> > since Intel (AMD actually) still requires you to throw 2 or 3 registers away
> > on stack management and other instructions that are so basic to getting
> > anything worthwhile done (ex. string compares/moves) have implied register
> > usage. But it doesn't seem like anybody coding on Intel cares. Most of them
> > haven't a clue and the guys who do have a clue usually haven't coded on
> > machines with enough registers (S/60, POWER, SPARC) to know what they're
> > missing. Like I said lots and lots of thrashing but they still get
> > acceptable performance whatever that means.

>
> Well, S/360 requires that you not use some registers, too.


That is true. All architectures require you to use *some* registers for
*some* things but in S/360 generally you're free to pick which ones and you
can use them however you want in other contexts.

> You at least need a base register, which you don't for IA32.


True. But you did for 808X. And not just one! And you still do in x86 and
AMD64 when you start up, until you switch modes ;-)

On most Intel implementations you're going to have to use EBP and ESP for
the stack. That's two registers gone. I could write S/360 code that shares a
base register for data and instructions, that's half as many registers.

> Register 0 has some limits on its use.


True, it's not a full GPR in that it can't be used for addressing. Other
architectures have even more severe contraints on some registers. For
example GPR0 in SPARC is always zero, and writing to it is like /dev/null
At least in S/360 you can read and write GPR0 just like any other register.

> The OS/360 linkage registers, 1, 14, and 15 can be used for other
> uses if one is careful. I don't know by now how much compilers do that.


Those registers are used for linkage but when you're not actually
invoking a service you can use them however you want. If you write code on
S/360 and x86 I think you'll agree x86 feels constrained regarding
registers. I haven't written enough AMD64 to make a decision yet but I still
feel they should have specified more registers when they had a chance. And
btw if you want to talk about linkage registers, look at the AMD64 ABI for
UNIX if you haven't already. What a complicated ugly mess, like everything
Intel...

Reply With Quote
 
Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are Off
Refbacks are Off




All times are GMT. The time now is 09:14 AM.


Copyright ©2009

LinkBacks Enabled by vBSEO 3.3.0 RC2 © 2009, Crawlability, Inc.