Go Back   Rhinocerus > Newsgroup > Newsgroup comp.lang.* 1 > Newsgroup comp.lang.misc

Reply
 
Thread Tools Display Modes
  #1 (permalink)  
Old 04-23-2012, 09:56 PM
BGB
Guest
 
Posts: n/a
Default misc: static+dynamic types and boxed values...

so, here is something going off recently off in my BGBScript-VM project:
I have been working on making improvements to the handling and
performance of "Boxed Value Types" (and using the little "BVT" acronym
for this).

so, what is a BVT?:
in my case, it is generally a small heap object whose sole contents are
a single value (its type-name is typically closely related to the value
contained, like "_int_t" for a boxed integer).

originally, these were the sole format for numeric values (in 2004 and
2005 versions of the VM), until later getting largely displaced by
fixnum and flonum types. the reason: BVT types were slow, as well as
being very prone to leak memory... (it was not until years later that a
mechanism was added to reclaim their memory...).

hence, they ended up largely only really used either when the type did
not fit into a fixnum, or when an explicitly higher-range type was
requested (such as double).


however, later improvements have began to address these problems:
BVT types started using pass-by-value semantics, where passing a BVT
would tend to result in it being copied, and BVT values being dropped
will result in their memory being freed immediately. this, at least,
addresses the memory-leaking problem somewhat (apart from cases where
they "fall outside the safety net" or are erroneously treated as
reference types).

recently, several other things have been done:
many BVT types now use dedicated free-lists, rather than fall back to
the general-purpose memory allocator, meaning that they can be allocated
and freed much more quickly (and, potentially, aggregated into larger
heap objects as well, for lower per-value overhead).


changes are ongoing to allow something much more drastic:
potentially switching much of the interpreter over to using static
type-checking (dynamic types will still be used/supported, but in many
cases the interpreter will use static types as well).

at this point, BVT types would seem to be at an advantage again (at
least in the statically-typed case):
they can be fairly quickly allocated/freed (via simple linked-list
operations);
they don't need any type-checking (if statically known), meaning the
higher type-checking cost (vs fixnum and flonum) is less relevant;
they don't need to be wrapped/unwrapped (unlike fixnum and flonum);
they don't have range-mismatch issues (unlike fixnum and flonum, which
alias against types which have a different "semantic range" than the
physical range for which they encode).


here is an example of what some of the "new style" logic from the
interpreter looks like (manually macro-expanded):
BSVM_ThreadOp *BSVM_ThOp_ADD_XD(BSVM_SVMState *ctx, BSVM_ThreadOp *cur)
{
f64 *u, *v;
v=(f64 *)(ctx->stack[--ctx->stackpos]);
u=(f64 *)(ctx->stack[ctx->stackpos-1]);
*u=(*u)+(*v);
*(f64 **)v=ctx->fbvt_double;
ctx->fbvt_double=v;
return(cur->next);
}


I ended up "caving in" somewhat, and for some of these new changes have
largely been doing the type-checking in the front-end, emitting bytecode
output a little more like that used in the JVM, with type-suffixes being
added to many opcodes:
LLOAD_XI, load a lexical variable as a raw integer;
LSTORE_XL, store to a lexical variable using a raw float;
ADD_XF, add two numbers using raw float values;
DUP_XD, duplicate a double-valued item.
....

the prior ideal had been to do things more like .NET/MSIL, and leave all
this stuff up to the back-end. however, doing type-checking this way in
an interpreter (as opposed to a JIT) is a bit more complex, whereas
directly indicating types for many opcodes is much simpler (if, albeit,
it adds a big pile of new opcodes...).

also, there are a bunch of new "CONV" instructions related to converting
between various types (including to/from dynamically-typed references).
these will partly overlap with the pre-existing "CAST" opcodes, except
that "CAST" is more generic (convert anything to anything, rather than,
say, "CONV_XIXD" which will only be able to convert an integer into a
double).


now, as for using BVT types vs, say, storing raw untagged values on the
stack:
mostly this was because BVT types amount to a much less drastic change
to the interpreter, and don't require completely changing how some
things work (such as function-call operations, ...).

the BVT types ending up in dynamically-typed code is merely "slightly
less efficient" rather than "fundamentally incompatible" (as would have
been the case with trying to store raw untyped values on the stack).

the added costs of needing extra pointer indirections and linked-list
operations shouldn't be all that significant though.


examples:
var a; //generic variable, will still (normally) use dynamic types
var i:int; //will now use a boxed integer
var j:long; //likewise, uses a boxed integer
var f:double; //and this uses a boxed double

var v0:vec3; //this case, uses a boxed vector
boxed vectors will currently still use dynamically-typed operations (*),
but will gain some in terms of some of the other optimizations (and so
will likely still perform a little faster).


*: I may or may not add some vector-related opcodes, still undecided. it
is more a question of if v0%v1 should generate a "MOD" instruction,
which will type-check the vectors and do a cross product, or if the
front-end should be like "hey, these are vectors" and emit, say, a
"CROSS_XV3F" opcode or similar instead.

[aside:
thing is, my opcode count is already a little "absurd" compared with
many other VMs, grr... much of this is due to a lot of "compound special
case ops", most related either to interpreter performance, language
semantics, or things which seemed more convenient to add as opcodes than
to rig up logic for "magic built-in methods" or similar (things like
"obj.clone()" and "obj.toString()" are opcodes...).

typically, this has been because the marginal cost of adding new opcodes
is fairly low, just another line in a text file, and a few more
case-statements in an already giant "switch()" block, ...). not that a
switch block cares much that it is slightly more giant (vs putting logic
directly into the execution path, which *will* result in a reduction in
performance).

maybe it just isn't really worthwhile to worry about opcode counts
though as compared with, say, the JVM, or similar...

(I have already since pretty much exhausted the single-byte opcode
space, and most new stuff now is 2-byte opcodes, but it is probably not
like I will run into the 12k limit any time soon).

]


example:
v0=#[1,0,0]; //creates a vector, stores into v0 (1)
a=v0 % #[0,1,0]; //calculates cross product and stores into a dynamic
variable (2).

1: I am still deciding on the exact semantics here. most likely
statically-typed variables will use "stable boxes", which means that the
variable will be "initialized" by allocating a boxed value of the
appropriate type, and then getting/setting the value into the box (this
will likely differ from dynamic variables, where the assigned item
"replaces" the prior value).

less obvious is if a typed variable should be assigned a "stable box" on
creation, or on first assignment. this latter form is likely to be a
little more efficient, but has a potential risk of (unchecked) crashing
the VM on loading an uninitialized variable (as it will hold UNDEFINED
rather than a boxed value). however, if checked, it does have a
side-merit: it better allows for "trap on access to uninitialized
variable" semantics (like, such an operation throws an exception or
similar). I am currently leaning more towards initialize-on-assignment
semantics.


2: there is no problem here, given a BVT is also a valid dynamic type,
and so no special handling is necessary. likewise, dynamically-typed
variables will retain their usual semantics as well (holding a BVT will
not cause a variable to "become" a static variable).

note that, when dynamic, the clone/destroy semantics are also handled,
dynamically, so nothing new here (this means they still behave as value
types, unlike, say, in Java or C#).


any thoughts / comments ?...
Reply With Quote
Alt Today
Advertising
 
and become member of Rhinocerus
Standard Sponsored Links

  #2 (permalink)  
Old 04-24-2012, 12:33 PM
BartC
Guest
 
Posts: n/a
Default Re: misc: static+dynamic types and boxed values...

"BGB" <cr88192@hotmail.com> wrote in message
news:jn4jb7$si6$1@news.albasani.net...

> var a; //generic variable, will still (normally) use dynamic types
> var i:int; //will now use a boxed integer
> var j:long; //likewise, uses a boxed integer
> var f:double; //and this uses a boxed double


These look like what I call 'type-hints' in my dynamic language. I've tried
adding these several times, and it always gets very untidy. They work
though, doubling performance usually.

But, I've also been working on a conventional compiler for a statically
typed language (for one main purpose: to implement an interpreter and
runtime for the dynamic one), and I think I will try and combine the two
languages: static and dynamic typing together. (And have the choice of
statically compiled - necessary for standalone executables - and interpreted
bytecode.)

I've attempted this a few times, and always failed (there are some other
subtle differences to sort out too). However I'm more
confident this time... The language would need to cope with the following
main types:

* Primitives (like in C)
* Fixed bound data (arrays, strings, etc) (like in C, with a few extras)
* Flex data (again, arrays, strings, etc) (not like C...)
* Variant types

(Variant types are challenging within a strictly typed language, because
they break all the rules. For example, to do P^ (dereference P), then
usually P has to have a type of 'pointer to T', and the type of P^ will be
T. But with variants, P can also be a variant type, and P^ will also have a
variant type!)

> [aside:
> thing is, my opcode count is already a little "absurd" compared with many
> other VMs, grr... much of this is due to a lot of "compound special case
> ops", most related either to interpreter performance, language semantics,
> or things which seemed more convenient to add as opcodes than to rig up
> logic for "magic built-in methods" or similar (things like "obj.clone()"
> and "obj.toString()" are opcodes...).


(I've reduced the opcode count in my bytecode, by breaking the link between
a specific bytecode, and a specific handler to execute that opcode. That
means there are only dozens of bytecodes, but hundreds of handlers (and lots
of metadata around to link them up using automatic tools).

For example, the bytecode instruction:

push p

will invoke different handlers, depending on whether p represents a static
variable, a frame variable, an integer constant, etc.

In the case of arithmetic ops, the operator code is one of the operands:

binop op

These get fixed up automatically, at loadtime, with a specific handler
('add' for example) if one exists, otherwise to a more generic one. So the
advantages of having dedicated bytecodes is retained, without needing to
manage hundreds of them.)

--
Bartc

Reply With Quote
  #3 (permalink)  
Old 04-24-2012, 04:27 PM
BGB
Guest
 
Posts: n/a
Default Re: misc: static+dynamic types and boxed values...

On 4/24/2012 5:33 AM, BartC wrote:
> "BGB" <cr88192@hotmail.com> wrote in message
> news:jn4jb7$si6$1@news.albasani.net...
>
>> var a; //generic variable, will still (normally) use dynamic types
>> var i:int; //will now use a boxed integer
>> var j:long; //likewise, uses a boxed integer
>> var f:double; //and this uses a boxed double

>
> These look like what I call 'type-hints' in my dynamic language. I've tried
> adding these several times, and it always gets very untidy. They work
> though, doubling performance usually.


I more often call them type annotations.

as-is, they are not really "hints", as a hint would imply "this variable
should only hold this", but with the language making little or no
attempt to enforce this. some earlier versions of BGBScript were more
this way (meaning that messing up with an annotation was a good way to
get strange results and/or crashes, this being the case with the
2006-era VM).


in my case, the current behavior is more like in C:
the variable *will* hold this type, and attempting to store a value into
such a variable will result in it being coerced. currently, this is
handled be the front-end, which would emit a "CAST" opcode if needed
(more because I didn't want to put the logic in the execution path).

currently though, there is no checking for the case of "well, this is
obviously wrong, so complain about it", but I could do this eventually
(the frontend currently assumes it can cast anything to anything).

with the new type logic though, it is possible I could make it a bit
more specific (warning about down-casts, and storing a non-number type
into a numeric variable, ...).


also possible would have been to go the Java route:
types don't match *exactly*, so treat it as an error.

personally I don't like this, as I don't like having to type extra casts
simply because the compiler is nit-picky.


> But, I've also been working on a conventional compiler for a statically
> typed language (for one main purpose: to implement an interpreter and
> runtime for the dynamic one), and I think I will try and combine the two
> languages: static and dynamic typing together. (And have the choice of
> statically compiled - necessary for standalone executables - and
> interpreted
> bytecode.)
>


most of the native code is currently written in C.

there are a few bits and pieces of dynamically-generated assembler in
the mix, but for the most part it is C.


most of what library code exists is written in BGBScript and just sort
of hackishly shoved into the DLL (admittedly, currently in source-form).
there is not a lot of this code, as "write a big comprehensive class
library" hasn't really been a huge priority.

in this way, the VM is sort of the inverse of the JVM: relatively little
work is currently actually done in the language itself, as it mostly
just passes most stuff off to C code.

and, "well, the standard library is currently only about 4 kloc of BS
code, much of which itself hasn't really been tested" doesn't sound all
that impressive.


> I've attempted this a few times, and always failed (there are some other
> subtle differences to sort out too). However I'm more
> confident this time... The language would need to cope with the following
> main types:
>
> * Primitives (like in C)
> * Fixed bound data (arrays, strings, etc) (like in C, with a few extras)
> * Flex data (again, arrays, strings, etc) (not like C...)
> * Variant types
>
> (Variant types are challenging within a strictly typed language, because
> they break all the rules. For example, to do P^ (dereference P), then
> usually P has to have a type of 'pointer to T', and the type of P^ will be
> T. But with variants, P can also be a variant type, and P^ will also have a
> variant type!)
>


in my case, variant is a special type case in itself, or more like, it
is "the general case", with most other things being subsets.

previously, this meant opcodes like:
ADD //add, generic
ADD_FN //add, fixnum
ADD_FL //add, flonum

these were not strict in recent VMs, due to technical reasons, and
infact more indicated the "range" of the type than the exact encoding
(IOW: "fixnum" ops would mean, "assume int32 range and optimize for the
fixnum case").

but, now with:
ADD_XI, ADD_XL, ...
it becomes a bit more strict, since these types assume a specific
representation and will not type-check.


>> [aside:
>> thing is, my opcode count is already a little "absurd" compared with many
>> other VMs, grr... much of this is due to a lot of "compound special case
>> ops", most related either to interpreter performance, language semantics,
>> or things which seemed more convenient to add as opcodes than to rig up
>> logic for "magic built-in methods" or similar (things like "obj.clone()"
>> and "obj.toString()" are opcodes...).

>
> (I've reduced the opcode count in my bytecode, by breaking the link between
> a specific bytecode, and a specific handler to execute that opcode. That
> means there are only dozens of bytecodes, but hundreds of handlers (and
> lots
> of metadata around to link them up using automatic tools).
>
> For example, the bytecode instruction:
>
> push p
>
> will invoke different handlers, depending on whether p represents a static
> variable, a frame variable, an integer constant, etc.


I had previously wanted to do it more like:
LOAD i //load int
LOAD j //load int
ADD //notes: int+int->int
DUP //notes: int->int,int
MUL //notes: int*int->int

but was feeling too lazy in the case of the interpreter to rig up the
logic for modeling type-propagation in the stack (this is something done
in most of my past JITs though, and is much more troublesome apart from
either directly executing the instructions, or using per-opcode logic
which directly simulates the stack operations).

it seemed like much less effort to just start spitting out type-specific
opcodes, since the front-end generally already knows most of this stuff.


> In the case of arithmetic ops, the operator code is one of the operands:
>
> binop op
>
> These get fixed up automatically, at loadtime, with a specific handler
> ('add' for example) if one exists, otherwise to a more generic one. So
> the advantages of having dedicated bytecodes is retained, without
> needing to manage hundreds of them.)
>


there is a similar opcode in my case ("BINARY"), but it has ended up
mostly relegated to rarely used operations.

this was partly because, until fairly recently, the interpreter used a
giant "switch()" block, and so using more very-specific (typically
compound) opcodes meant higher performance (and had a more significant
performance impact, due to needing to decode the argument, invoke a
secondary "switch()", ...).

now, compound opcodes still have an advantage, but there is a little
less pressure for compact opcodes (and many of the newer opcodes are
2-byte opcodes anyways).


I just recently added a few type-hint prefix opcodes:
PF_HINT_XI //integer hint
PF_HINT_XL //long hint
PF_HINT_XF //float hint
PF_HINT_XD //double hint
PF_HINT_S //signature hint

which may be used for many of the more "general" opcodes (such as array
loads/stores, field loads/stores, ...).

I wanted these as single-byte opcodes, and (annoyingly) ended up moving
several opcodes out of the single-byte range to make space (two for
multidimensional array access, and 4 for 3-element EXCH forms, all of
which are rarely used, whereas type-hinting is likely to be a bit more
common, so I felt it justified).

FWIW, there is also a "PF_WIDE" opcode, intended mostly to allow 32-bit
jumps (most other jumps are 16-bits). note that this is unnecessary for
index variables (variable-indices, literal-table / constant-pool
entries, ...) which use variable-length values (just, using a VLI for a
jump is "problematic").


this leaves the pile of new type-specialized opcodes at:
arithmetic ops (the bulk of the new opcodes);
type-conversion ops;
compare-ops (1);
compare-and-jump ops (1);
lexical-variable load/store ops.

the BGBScript VM does compares differently than the JVM does.


1. the JVM typically had typed-compare operations, which returned -1,0,1
depending on the results, and would execute (generic) conditional
operations based on this (using ==0, !=0, ==1, ==-1, >=0, <=0).

in my case, the compare ops checked for a specific condition (equal,
not-equal, less-than, greater-than, less-equal, greater-equal), and
would return a boolean.

the "compare-and-jump" opcodes worked similar, doing a typed compare and
jump as a single opcode.

with typed forms, this essentially results in a much larger number of
opcodes related to compares and jumps (approx 48 for the new ILFD ops,
24 for compare, and 24 more for compare-and-jump).

the VM has a number of other complex conditional jump forms which were
not type-specialized, but the possibility of using type-hint prefixes
seems possible.


example:
PF_HINT_XI //integer hint
JMP_GE_FNC 24, foo //if(value>=24) goto foo

which would require 6 bytes.

although, this does render the 'FN' part of the opcode name meaningless.
(well, it is that, or wasting an extra operation to push an integer on
the stack to use "JMP_GE_XI", or using 12 more opcodes for things like
"JMP_GE_XIC" and "JMP_LT_XLC" or similar...).


or such...
Reply With Quote
  #4 (permalink)  
Old 04-27-2012, 04:59 PM
BGB
Guest
 
Posts: n/a
Default Re: misc: static+dynamic types and boxed values...

On 4/24/2012 9:27 AM, BGB wrote:
> On 4/24/2012 5:33 AM, BartC wrote:
>> "BGB" <cr88192@hotmail.com> wrote in message
>> news:jn4jb7$si6$1@news.albasani.net...
>>
>>> var a; //generic variable, will still (normally) use dynamic types
>>> var i:int; //will now use a boxed integer
>>> var j:long; //likewise, uses a boxed integer
>>> var f:double; //and this uses a boxed double

>>
>> These look like what I call 'type-hints' in my dynamic language. I've
>> tried
>> adding these several times, and it always gets very untidy. They work
>> though, doubling performance usually.

>
> I more often call them type annotations.
>


<snip>

>
> example:
> PF_HINT_XI //integer hint
> JMP_GE_FNC 24, foo //if(value>=24) goto foo
>
> which would require 6 bytes.
>
> although, this does render the 'FN' part of the opcode name meaningless.
> (well, it is that, or wasting an extra operation to push an integer on
> the stack to use "JMP_GE_XI", or using 12 more opcodes for things like
> "JMP_GE_XIC" and "JMP_LT_XLC" or similar...).
>


minor status update:
PF_HINT_XI, ...

now work on a bunch of other opcodes (generic arithmetic opcodes,
various conditional jumps, ...).

it is generally less obvious how they will behave regarding array
operations, which involve several types:
the type of the index;
the type of the array;
the type of the returned value.

example:
variant_array[variant]->variant
variant_array[int]->variant
variant_array[variant]->int
variant_array[int]->int

int_array[variant]->variant
int_array[int]->variant
int_array[variant]->int
int_array[int]->int
(well, also there being byte/sbyte/short/ushort/char arrays as well).

I would likely need either more prefixes or more opcodes to really nail
this down.

current thinking:
PF_HINT_XI/... will only apply to the output type (for normal array ops);
secondary prefixes will encode alternate forms.

so, "PF_HINT_XI LOADINDEX" means:
variant_array[variant]->int

PF_ARRAY_XUBI byte_array[variant]->int
PF_ARRAY_XSBI sbyte_array[variant]->int
PF_ARRAY_XUWI ushort_array[variant]->int
PF_ARRAY_XSWI short_array[variant]->int
PF_ARRAY_XUII uint_array[variant]->int (1)
PF_ARRAY_XSII int_array[variant]->int
....

PF_ARRAY_XUBI_IXI/...
being similar, except to hint that the index is also int (2).


1: signed/unsigned forms are internally distinguished by operations used
in certain cases, but are most often treated as type-equivalent by the
VM for sake of simplicity.

2: although counter-intuitive, the loadindex/storeindex/... operations
may also be used for things like loading/storing object fields by-name
and similar, and in many other cases (loading/storing by a constant
index) the type of the index is already known / not-relevant.

most likely all these will be 2-byte prefix forms.



did spend several days here trying to track down some "generally severe"
memory leaks (at least when using the Fibonacci test).

discovered, among other things:
there were cases where the scope exiting would not release value-types
held in variables;
more sublty, conditional compare-and-jump operations were leaking their
arguments;
....

sadly, this problem is likely still not full from fully solved, as I am
left to realize how many operations generally fail to "go through the
correct procedures" regarding proper treatment of value types (so,
solving the problem of leaks "in general" likely isn't over yet by any
means, sadly...).


or such...
Reply With Quote
  #5 (permalink)  
Old 05-05-2012, 04:04 AM
BGB
Guest
 
Posts: n/a
Default Re: misc: static+dynamic types and boxed values...

On 4/23/2012 2:56 PM, BGB wrote:
> so, here is something going off recently off in my BGBScript-VM project:
> I have been working on making improvements to the handling and
> performance of "Boxed Value Types" (and using the little "BVT" acronym
> for this).
>


misc, put a current version of the VM spec here:
http://pastebin.com/YuZaXmkt

but, granted, it is the HTML source rather than the document proper, but
alas...

shouldn't be too hard to figure out how to look at the contained HTML
document.


some relevant information is not contained (it is in other specs).


so, as can be noted:
in its present form, most of the new "static" types are accessed mostly
via prefixes. I had added some number of "static opcodes" (not actually
defined in the spec), and whether or not they will be retained in the
future is less certain (given pretty much everything they do can be done
via more general opcodes and prefixes).

some other opcodes were relocated, and a few were dropped (mostly ones
which were no longer used, not actually implemented, or just didn't do
all that much useful).

if I were designing it clean now, I might do a few things differently,
but this bytecode format has developed incrementally over a decent chunk
of a decade.

dunno what else to comment at the moment, maybe if anyone wants to
comment on the spec, they can do so.


note: I don't currently have a functioning web-server (due mostly to
geographical distance issues), but this may change in the near future
(with regained physical contact with said server, maybe I can try to get
it back online).


or such...
Reply With Quote
 
Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are Off
Refbacks are Off




All times are GMT. The time now is 03:39 PM.


Copyright ©2009

LinkBacks Enabled by vBSEO 3.3.0 RC2 © 2009, Crawlability, Inc.