destination_passing_style [2021/12/08 10:24] – awf | destination_passing_style [2022/09/26 08:34] (current) – awf |
---|
</code> | </code> |
| |
And another version ''vadd_dps'' which takes a pre-sized result buffer | And another version ''vadd_dps'' that takes a pre-sized result buffer; the caller must ensure it is the correct size |
| |
<code c++> | <code c++> |
Vector vadd_dps(void* buf, Vector a, Vector b) | Vector vadd_dps(void* buf, Vector a, Vector b) |
{ | { |
double* out = (double*)buf; // Function body exactly as before, but no alloc in vadd. | double* out = (double*)buf; |
vadd_blas(min(a.size,b.size), a.data, b.data, out); | vadd_blas(min(a.size,b.size), a.data, b.data, out); |
std::fill(out + min(a.size, b.size), out + max(a.size, b.size), 0.0); | std::fill(out + min(a.size, b.size), out + max(a.size, b.size), 0.0); |
| |
<code c++> | <code c++> |
void vadd_dps(Vector* out, Vector a, Vector b) | Vector vadd_dps(void* buf, Vector a, Vector b) |
{ | { |
Vector tmp = vadd(a,b); | Vector tmp = vadd(a,b); |
std::copy(tmp.data, tmp.data+tmp.size, out->data); | std::copy(tmp.data, tmp.data+tmp.size, (double*)buf); |
gcdelete tmp.data; | gcdelete tmp.data; |
| return Vector(tmp.size, buf); |
} | } |
</code> | </code> |
| |
Again, the copy and allocations can be removed by DCE. So, what's the point? We seem to have just made ''vadd'' less efficient, and any gains we talk about could have been made by inlining ''vadd'' in main. | which, again, can be easily optimized. So, what's the point? We seem to have just made ''vadd'' less efficient, and any gains we talk about could have been made by inlining ''vadd'' in main. |
| |
The point is this: when compiling ''vadd'', we can easily compile and optimize ''vadd_size'' and ''vadd_dps'', without cross-module inlining. When a caller sees ''vadd'', it can observe the existence of the ''_size'' and ''_dps'' variants, and know that stack-discipline allocation will yield more efficient code. This will mean less stuff on the GC heap, and less GC overhead. With careful subsetting of the source language (and with the introduction of an explicit ''gcnew'' or ''new/delete'' in the source), we can compile many sensible programs from a functional language such as F# to non-garbage-collected C. | The point is this: when compiling ''vadd'', we can easily compile and optimize ''vadd_size'' and ''vadd_dps'', without cross-module inlining. When a caller sees ''vadd'', it can observe the existence of the ''_size'' and ''_dps'' variants, and know that stack-discipline allocation will yield more efficient code. This will mean less stuff on the GC heap, and less GC overhead. With careful subsetting of the source language (and with the introduction of an explicit ''gcnew'' or ''new/delete'' in the source), we can compile many sensible programs from a functional language such as F# to non-garbage-collected C. |
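
To make the caller's side concrete, here is a minimal sketch of how a caller might combine the ''_size'' and ''_dps'' variants. It is an illustration under assumptions, not the compiler's actual output: the ''Vector'' struct is a hypothetical non-owning view, the plain loop stands in for ''vadd_blas'', ''sum_of_vadd'' is an invented caller, and a scoped ''std::vector'' stands in for a true stack allocation (an ''alloca'' call or an arena would serve the same role).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view, standing in for the page's Vector type.
struct Vector {
    std::size_t size;
    double* data;
};

// Number of bytes the result of vadd(a, b) needs. No allocation happens here.
std::size_t vadd_size(Vector a, Vector b) {
    return std::max(a.size, b.size) * sizeof(double);
}

// Destination-passing variant: writes the result into caller-provided storage.
// The caller must ensure buf holds at least vadd_size(a, b) bytes.
Vector vadd_dps(void* buf, Vector a, Vector b) {
    double* out = static_cast<double*>(buf);
    std::size_t lo = std::min(a.size, b.size);
    std::size_t hi = std::max(a.size, b.size);
    // Plain loop standing in for vadd_blas(lo, a.data, b.data, out).
    for (std::size_t i = 0; i < lo; ++i) out[i] = a.data[i] + b.data[i];
    // Zero-fill the tail, as in the page's version.
    std::fill(out + lo, out + hi, 0.0);
    return Vector{hi, out};
}

// Hypothetical caller: the result never escapes, so its storage can follow
// stack discipline and be released on scope exit, with no GC involvement.
double sum_of_vadd(Vector a, Vector b) {
    // Assumption: a scoped std::vector models the stack-disciplined buffer.
    std::vector<double> scratch(vadd_size(a, b) / sizeof(double));
    Vector r = vadd_dps(scratch.data(), a, b);
    double total = 0.0;
    for (std::size_t i = 0; i < r.size; ++i) total += r.data[i];
    return total;  // scratch is freed here
}
```

The caller needs only the signatures of ''vadd_size'' and ''vadd_dps'' to do this, which is why no cross-module inlining of ''vadd'' is required.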