Hello,

On Tue, 24 Nov 2009 14:41:19 +0100, mk wrote:
> As Rob pointed out (thanks):
>
>  11          31 LOAD_FAST                0 (nonevar)
>              34 JUMP_IF_FALSE            4 (to 41)
>
> I'm no good at py compiler or implementation internals and so I have no
> idea what bytecode "JUMP_IF_FALSE" is actually doing.

It tries to evaluate the top of the stack (here nonevar) in a boolean context (which theoretically involves calling __nonzero__ on the type) and then jumps if the result is False (rather than True).

You are totally right that it does /more/ than "is not None", but since it is executed as a single opcode rather than a sequence of several opcodes, the additional work it has to do is compensated (in this case) by the smaller overhead in bytecode interpretation.

As someone pointed out, the Python interpreter could grow CISC-like opcodes so as to collapse "is not None" (or generically "is not <constant>") into a single JUMP_IF_IS_NOT_CONST opcode. Actually, it is the kind of optimization wpython does (http://code.google.com/p/wpython/).

Regards

Antoine.
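To look at the opcodes under discussion, the standard `dis` module can list what the compiler emits for each form of the test. This is only a minimal sketch: `truthy_test`, `none_test` and `opnames` are illustrative names that do not appear in the thread, and the opcode names printed depend on the CPython version in use (JUMP_IF_FALSE is what the 2.x compiler quoted above emitted).

```python
import dis


def truthy_test(nonevar):
    # Truth test: the interpreter evaluates nonevar in a boolean context.
    if nonevar:
        return 1
    return 0


def none_test(nonevar):
    # Identity test against the None singleton.
    if nonevar is not None:
        return 1
    return 0


def opnames(func):
    """Return the opcode names the compiler emitted for func, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]


if __name__ == "__main__":
    for f in (truthy_test, none_test):
        print(f.__name__, opnames(f))
```

Comparing the two lists shows how many instructions each form of the test costs on a given interpreter. It says nothing by itself about how long each instruction takes.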
On 2009-11-24, Antoine Pitrou <solipsis@pitrou.net> wrote:
> It tries to evaluate the top of the stack (here nonevar) in a
> boolean context (which theoretically involves calling
> __nonzero__ on the type)

...or __bool__ in Py3K.

--
Neil Cerutti
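The hook mentioned here can be observed directly. The sketch below uses a hypothetical `Flag` class (not from the thread) that counts how often its truth value is requested. It defines `__bool__` and aliases it to `__nonzero__` so that the same class would work on both Python 3 and Python 2.

```python
class Flag:
    """Hypothetical class that records how often its truth value is requested."""

    def __init__(self, value):
        self.value = value
        self.bool_calls = 0

    def __bool__(self):
        # Called by `if obj:`, `bool(obj)` and `not obj` in Python 3.
        self.bool_calls += 1
        return self.value

    # Python 2 looks up __nonzero__ instead, so alias it for both versions.
    __nonzero__ = __bool__


flag = Flag(False)

if flag:                 # truth test: triggers __bool__
    pass
if flag is not None:     # identity test: does not trigger __bool__
    pass

print(flag.bool_calls)   # prints 1
```

Only the plain truth test reaches the method. The "is not None" form never consults the object's type.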
On 24 Nov, 16:11, Antoine Pitrou <solip...@pitrou.net> wrote:
> [JUMP_IF_FALSE]
> It tries to evaluate the top of the stack (here nonevar) in a boolean
> context (which theoretically involves calling __nonzero__ on the type)
> and then jumps if the result is False (rather than True).

[...]

> As someone pointed out, the Python interpreter could grow CISC-like
> opcodes so as to collapse "is not None" (or generically "is not
> <constant>") into a single JUMP_IF_IS_NOT_CONST opcode.

Of course, JUMP_IF_FALSE is already quite CISC-like, whereas testing if something is not None could involve some fairly RISC-like instructions: just compare the address of an operand with the address of None.

As you point out, a lot of this RISC vs. CISC analysis (and inferences drawn from Python bytecode analysis) is somewhat academic: the cost of the JUMP_IF_FALSE instruction is likely to be minimal in the context of all the activity going on to evaluate the bytecodes.

I imagine that someone (or a number of people) must have profiled the Python interpreter and shown how much time goes on the individual bytecode implementations and how much goes on the interpreter's own housekeeping activities. It would be interesting to see such figures.

Paul
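The "just compare the address" point can be illustrated from Python itself. In this small sketch the `AlwaysEqual` class is hypothetical and exists only to show that `==` dispatches to a method while `is` does not. In CPython, `id()` happens to return the object's address, so the identity test amounts to an address comparison.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True

    def __hash__(self):
        return 0


obj = AlwaysEqual()

equal_to_none = (obj == None)   # runs AlwaysEqual.__eq__
is_none = (obj is None)         # pure identity check, no method call

print(equal_to_none, is_none)   # prints: True False
```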
On Tue, 24 Nov 2009 08:58:40 -0800, Paul Boddie wrote:
> As you point out, a lot of this RISC vs. CISC analysis (and inferences
> drawn from Python bytecode analysis) is somewhat academic: the cost of
> the JUMP_IF_FALSE instruction is likely to be minimal in the context of
> all the activity going on to evaluate the bytecodes.

Sorry, I have trouble parsing your sentence. Do you mean bytecode interpretation overhead is minimal compared to the cost of actual useful work, or the contrary? (IMO both are wrong by the way)

> I imagine that someone (or a number of people) must have profiled the
> Python interpreter and shown how much time goes on the individual
> bytecode implementations and how much goes on the interpreter's own
> housekeeping activities.

Well the one problem is that it's not easy to draw a line. Another problem is that it depends on the workload. If you are compressing large data or running expensive regular expressions the answer won't be the same as if you compute a Mandelbrot set in pure Python.

One data point is that the "computed gotos" option in py3k generally makes the interpreter faster by ~15%. Another data point I've heard is that people who have tried a very crude form of Python-to-C compilation (generating the exact C code corresponding to a function or method, using Python's C API and preserving dynamicity without attempting to be clever) have apparently reached speedups of up to 50% (in other words, "twice as fast").

So you could say that the interpretation overhead is generally between 15% and 50%.
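The workload dependence is easy to see with `timeit`. The sketch below is not a measurement of interpretation overhead as such. It only contrasts a loop that goes through the bytecode evaluation loop on every iteration with the same computation done inside a C-implemented builtin. The function names and the value of `n` are arbitrary choices for illustration.

```python
import timeit


def pure_python_sum(n):
    # Every iteration goes through the bytecode evaluation loop.
    total = 0
    for i in range(n):
        total += i
    return total


def builtin_sum(n):
    # The loop runs inside the C implementation of sum().
    return sum(range(n))


if __name__ == "__main__":
    n = 100000
    for f in (pure_python_sum, builtin_sum):
        seconds = timeit.timeit(lambda: f(n), number=20)
        print("%s: %.4f s" % (f.__name__, seconds))
```

The absolute numbers vary by machine and interpreter version, so only the ratio between the two lines is of any interest.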
On 24 Nov, 19:25, Antoine Pitrou <solip...@pitrou.net> wrote:
> Sorry, I have trouble parsing your sentence. Do you mean bytecode
> interpretation overhead is minimal compared to the cost of actual useful
> work, or the contrary?
> (IMO both are wrong by the way)

I'm referring to what you're talking about at the end. The enhancements in Python 3 presumably came about after discussion of "threaded interpreters", confirming that the evaluation loop in Python 2 was not exactly optimal.

> > I imagine that someone (or a number of people) must have profiled the
> > Python interpreter and shown how much time goes on the individual
> > bytecode implementations and how much goes on the interpreter's own
> > housekeeping activities.
>
> Well the one problem is that it's not easy to draw a line. Another
> problem is that it depends on the workload. If you are compressing large
> data or running expensive regular expressions the answer won't be the
> same as if you compute a Mandelbrot set in pure Python.

You need to draw the line between work done by system and external libraries and that done by Python, but a breakdown of the time spent executing each kind of bytecode instruction could be interesting. Certainly, without such actual cost estimations, a simple counting of bytecodes should hardly give an indication of how optimal some Python code might be.

Paul
On Tue, 24 Nov 2009 22:08:19 +0000, Benjamin Peterson wrote:
>> Would it be worth in-lining the remaining part of PyObject_IsTrue in
>> ceval?
>
> Inlining by hand is prone to error and maintainability problems.

Which is why we like to do it :-))
On Tue, 24 Nov 2009 16:09:10 -0800, Paul Boddie wrote:
> I'm referring to what you're talking about at the end. The enhancements
> in Python 3 presumably came about after discussion of "threaded
> interpreters", confirming that the evaluation loop in Python 2 was not
> exactly optimal.

An optimal evaluation loop is an evaluation loop which doesn't get executed at all :-)
(which is what unladen-swallow, cython and pypy are trying to do)

> You need to draw the line between work done by system and external
> libraries and that done by Python, but a breakdown of the time spent
> executing each kind of bytecode instruction could be interesting.

When you say "executing each kind of bytecode instruction", are you talking about the overhead of bytecode dispatch and operand gathering, or the total cost including doing the useful work?

Regardless, it probably isn't easy to do such measurements. I once tried using AMD's CodeAnalyst (I have an AMD CPU) but I didn't manage to get any useful data out of it; the software felt very clumsy and it wasn't obvious how to make it take into account the source code of the Python interpreter.

Regards

Antoine.
On 25 Nov, 13:11, Antoine Pitrou <solip...@pitrou.net> wrote:
> When you say "executing each kind of bytecode instruction", are you
> talking about the overhead of bytecode dispatch and operand gathering, or
> the total cost including doing the useful work?

Strip away any overhead (dispatch, operand gathering) and just measure the cumulative time spent doing the actual work for each kind of instruction, then calculate the average "cost" by dividing by the frequency of each instruction type. So, for a whole program you'd get a table of results like this:

LOAD_CONST     <total time>  <frequency>  <time per instruction>
LOAD_NAME      <total time>  <frequency>  <time per instruction>
CALL_FUNCTION  <total time>  <frequency>  <time per instruction>
...

A comparison of the "time per instruction" column would yield the relative cost of each kind of instruction. Of course, a general profiling of the interpreter would be useful, too, but I imagine that this has been done many times before.

To go back to the CISC vs. RISC analogy, I'd expect substantial variation in relative costs, which one could argue is a CISC-like trait (although a separate matter to instruction set orthogonality).

Paul
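Part of the table described above can be sketched from Python, under clear limitations. The `static_opcode_frequency` helper below (a hypothetical name) only counts how often each opcode appears in a function's compiled bytecode, which is a static count and not the run-time execution frequency. The `time_per_instruction` helper performs the division for the last column, assuming the per-opcode total times have been measured by some other means, for example an instrumented interpreter. No timing figures are supplied here.

```python
import dis
from collections import Counter


def static_opcode_frequency(func):
    """Count how often each opcode appears in func's compiled bytecode.

    This is a static count of emitted instructions, not the number of
    times each instruction is executed at run time.
    """
    return Counter(ins.opname for ins in dis.get_instructions(func))


def time_per_instruction(total_time, frequency):
    """Divide total time by frequency for each opcode (the last column)."""
    return {
        name: total_time[name] / frequency[name]
        for name in total_time
        if frequency.get(name)  # skip opcodes that never occurred
    }


def sample(values):
    # Hypothetical workload used only to have some bytecode to count.
    result = []
    for v in values:
        if v is not None:
            result.append(v * 2)
    return result


if __name__ == "__main__":
    for name, count in static_opcode_frequency(sample).most_common():
        print("%-24s %d" % (name, count))
```

A real version of the table would need dynamic counts and timings taken inside the evaluation loop itself, which this sketch does not attempt.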