Blog

Risk Free Offer

I’m not a native English speaker, so perhaps I have a bad eye for this, but I’ve just found something on a magazine’s web site that looks strange to me. Mac Life has a section at the bottom of each page of its web site titled “Risk Free Offer”, and I can’t help but see the word Risk as a verb; this would mean the magazine is asking you to take the risk of a free offer, which is most likely not the intended meaning.

I think what they mean is “Risk-Free Offer”, where the hyphen transforms Risk Free into a compound adjective meaning an offer free of any risk. Technically speaking, you don’t always need a hyphen to make a compound adjective, but in this case there’s an ambiguity, so I’d say they should put the hyphen in.

I now have another theory, though: perhaps they’re playing on the ambiguity to get noticed. In which case it worked amazingly well on me.


C3: Comments on Memory Management

I just got wind that someone else in this world is attempting to create another “better C++” language: C3. It’s still vapourware1 at this point. What’s special is that I know who c3wife is. And here is a reply to c3wife’s post about the value of values.

It’s true that garbage collectors don’t solve all problems, but I think you’re a little too harsh on them. I don’t know of a serious garbage collector that can’t handle cyclic references correctly. Some are totally non-blocking and can run on a separate thread, though they require atomic operations and being notified of changes to pointer values (like the one in Apple’s Objective-C 2.0). Some garbage collectors can defragment memory by relocating memory blocks and adjusting the pointers that point to them (like the one in Sun’s JVM). Some don’t run until you try to allocate memory (like the one in D), which makes them no more unpredictable than calling malloc, which itself offers no guarantee about speed.

And a garbage-collected program can in some cases outperform one using RAII: since you postpone calling destructors and freeing memory until after the task is done, you finish the task faster, and only then does the garbage collector kick in. Note however that you can still mimic that with RAII by having a destructor place memory on a garbage stack to be handled later, something à la NSAutoreleasePool.
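To make the garbage-stack idea concrete, here’s a minimal C++ sketch (the names `DeferredPool` and `DeferredBuffer` are hypothetical, invented for illustration): the destructor just pushes the block onto a pool instead of freeing it, and the pool is drained at a convenient moment, much like an NSAutoreleasePool.

```cpp
#include <cstdlib>
#include <vector>

// Hypothetical pool collecting blocks whose release is deferred,
// in the spirit of NSAutoreleasePool.
class DeferredPool {
public:
    void defer(void* block) { blocks_.push_back(block); }

    // Free everything at a convenient moment, e.g. after the hot loop.
    void drain() {
        for (void* b : blocks_) std::free(b);
        blocks_.clear();
    }

    std::size_t pending() const { return blocks_.size(); }

    ~DeferredPool() { drain(); }

private:
    std::vector<void*> blocks_;
};

// RAII handle whose destructor defers the free instead of doing it now,
// so the destructor itself stays cheap (just a push_back).
class DeferredBuffer {
public:
    DeferredBuffer(std::size_t n, DeferredPool& pool)
        : data_(std::malloc(n)), pool_(pool) {}
    ~DeferredBuffer() { pool_.defer(data_); }
    void* data() const { return data_; }

private:
    void* data_;
    DeferredPool& pool_;
};
```

The destructor still runs deterministically at scope exit (keeping RAII semantics for non-memory resources), but the expensive free is batched for later.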

I agree that RAII is a very good thing in C++, and surely something to keep, since there are many situations where a garbage collector is inappropriate. For anything that holds a resource other than memory (a file handle, a transaction, a mutex lock, etc.), you generally can’t wait for the garbage collector to catch up. And if you do some real-time programming, you don’t want the GC to interfere at random times.

And it’s also true that a garbage collector doesn’t entirely protect against memory leaks, but that has nothing to do with cyclic references (except for poor collectors based on reference counting). It has to do with keeping unnecessary references where you shouldn’t (in a global list, for instance), or, for conservative garbage collectors, with having some pointer-like data which isn’t really a pointer.

So I’d say the best of both worlds is to have the two memory management techniques at our disposal.

The page about garbage collection on Wikipedia is full of interesting details on that subject.


Forcing all pointers to be smart pointers expressing ownership and copy behaviour is a good move. Plain old C pointers are an abomination in C++, and I use them only sparingly when writing C++.

The only downside is that it can easily explode into a dozen different pointer types. But I think that’s still better than having one type with ambiguous or wrong semantics.

One thing I’m wondering: are pointers to stack variables allowed? And if so, is there any provision to ensure the pointer doesn’t outlive the stack frame?
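Today’s standard C++ expresses the same philosophy with its library smart pointers; a minimal sketch of ownership-in-the-type (the `Node` type and helpers here are my own illustrative examples, not C3 code):

```cpp
#include <memory>
#include <string>

struct Node {
    explicit Node(std::string n) : name(std::move(n)) {}
    std::string name;
};

// Sole ownership: std::unique_ptr is move-only, so copying is a
// compile error and transfers of ownership are explicit moves.
std::unique_ptr<Node> make_node(const std::string& name) {
    return std::make_unique<Node>(name);
}

// Non-owning access: a raw pointer here means "observe only".
// The caller promises the Node outlives the call, which is exactly
// the stack-frame lifetime question raised above.
const std::string& node_name(const Node* observer) {
    return observer->name;
}
```

Each signature documents who owns what and what copying means, instead of leaving it to convention as plain C pointers do.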

And good luck developing C3.


  1. Vapourware in the nothing-functional-published-yet sense. I’m sure the Husband has some prototypes. ↩︎


VTable Benchmarking

In order to evaluate the impact of the various vtable modifications I proposed in my last post about D, I built a test program implementing them so I could run some benchmarks. And I got some strange and surprising results.

The Test Program

The test program performs 3 laps, each doing 400 000 000 function calls for each supported type of call. The supported types of calls are:

Static: A direct call to a function; no function pointer involved, no virtual call.
C++ / D Class: C++ or D virtual dispatch built into the language.
Standard: A replica in C++ or D code of what the compiler does when creating a virtual call.
Dynamic: Accessing the virtual table by checking the function offset in a global variable.
Dynamic Indirect: Accessing the virtual table hidden behind a second layer of indirection, checking the function offset in a global variable.
Objective-C: A dynamically dispatched call to a method of an Objective-C object instance.
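To illustrate the distinction between Standard and Dynamic, here is a hypothetical C++ sketch (my own reconstruction, not the actual test program’s code): Standard indexes the vtable with a compile-time constant slot, while Dynamic reads the slot from a global variable, the way a non-fragile ABI would let the offset be fixed up at load time.

```cpp
#include <cstddef>

typedef int (*Fn)(void*);

// A hand-rolled "object": a pointer to a vtable, followed by data.
struct VTableObj {
    const Fn* vtable;
    int value;
};

static int get_value(void* self) {
    return static_cast<VTableObj*>(self)->value;
}

static const Fn kVTable[] = { get_value };

// "Standard": the slot index is a compile-time constant, just like
// what the compiler emits for an ordinary C++ virtual call.
int call_standard(VTableObj* obj) {
    return obj->vtable[0](obj);
}

// "Dynamic": the slot index lives in a global initialized at load
// time, so the vtable layout can change without recompiling callers.
std::size_t g_get_value_offset = 0;

int call_dynamic(VTableObj* obj) {
    return obj->vtable[g_get_value_offset](obj);
}
```

The benchmark question is simply how much that extra load of `g_get_value_offset` (and, for Dynamic Indirect, one more pointer hop) costs per call.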

If you want to check my results, or just the program, you can download it from here: VTable Benchmarks.zip

Unzip it, and you’ll have a folder containing the source code and five script files named test-dmd.sh, test-dmd.bat, test-g++.sh, test-g++.bat, and test-gdc.sh. Each script compiles the C++ or D code using the compiler in its name and then runs the test (you must have the compiler available in your path). The .bat files are for Windows, the .sh files for Linux and Mac OS X. All arguments you pass to a script are passed to the compiler too, making it easy to test with various optimisation flags.

Feel free to experiment, improve the scripts and modify them to work with whatever compiler and platform that suits you, and please post your thoughts and own results in the comments.

Following are my own results.

Mac OS X (PowerPC G4)

./test-g++.sh -Os

== 3-lap average; totaling 1200000000 function calls ==
Static:           2349424 ticks; 1.000000 (base time)
C++:              5688451 ticks; 2.421211
Standard:         5687754 ticks; 2.420914
Dynamic:          6022387 ticks; 2.563346
Dynamic Indirect: 6691103 ticks; 2.847975
Objective-C:      10912837 ticks; 4.644898

Pretty much what I expected: C++ and Standard yield the same results, Dynamic is a little slower, and Dynamic Indirect even more so. Note that this is the only test in which I can compare with Objective-C’s dynamic dispatch. It’s interesting to note that method dispatching in Objective-C takes 465% the time of a static call and 192% the time of a C++ virtual call.

./test-gdc.sh -Os -frelease -finline

== 3-lap average; totaling 1200000000 function calls ==
Static:           234 ticks; 1.000000 (base time)
D class:          568 ticks; 2.423295
Standard:         568 ticks; 2.423295
Dynamic:          602 ticks; 2.565341
Dynamic Indirect: 702 ticks; 2.994318

The overall result for gdc is about the same as for g++, but the tick counts are way off. It’s almost as if you took the tick counts from the C++ test and divided them by 10000. I’m not sure why the clock function in std.c.time is lying, but there’s obviously a bug causing this strange behaviour in GDC. Because of it, we can’t compare ticks from those two tests directly. I think we can safely assume they’re just divided by 10000, because the results fit pretty well with those above if you do that, but this may not be very reliable.
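One plausible cause (an assumption on my part, not verified against GDC’s runtime) is simply a CLOCKS_PER_SEC mismatch: clock() reports ticks whose scale is CLOCKS_PER_SEC, and that constant can differ between C runtimes on the same machine. Dividing by it gives a count in seconds that is comparable across runtimes:

```cpp
#include <ctime>

// Convert raw clock() ticks into seconds. Two runtimes can report
// wildly different tick counts for the same elapsed time if their
// CLOCKS_PER_SEC values differ; the ratio below is portable.
double ticks_to_seconds(std::clock_t start, std::clock_t end) {
    return static_cast<double>(end - start) / CLOCKS_PER_SEC;
}
```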

Windows (AMD Duron)

test-g++.bat -Os

== 3-lap average; totaling 1200000000 function calls ==
Static:           2373 ticks; 1.000000 (base time)
C++:              3040 ticks; 1.281000
Standard:         3048 ticks; 1.284230
Dynamic:          3310 ticks; 1.394748
Dynamic Indirect: 3595 ticks; 1.514675

Once again, g++ gives me the results I’d expect: C++ virtual calls being almost the same as Standard, Dynamic being a little slower and Dynamic Indirect even more. It’s interesting to note that the gap between Standard and Dynamic is a lot smaller than between Dynamic and Dynamic Indirect.

test-dmd.bat -O -release -inline

== 3-lap average; totaling 1200000000 function calls ==
Static:           674 ticks; 1.000000 (base time)
D class:          6766 ticks; 10.034108
Standard:         2370 ticks; 3.514582
Dynamic:          8679 ticks; 12.871478
Dynamic Indirect: 9653 ticks; 14.315867

Ok… it seems we’ve got a few problems here. Comparing the time for a static call with the one from g++, I’d say the static call was inlined despite my efforts to make that impossible. I’d also guess that Standard was changed to a static call by the optimizer (the time is almost exactly the same as g++’s static call). So I’ll venture that the only reliable data are the tick counts for the D class, Dynamic, and Dynamic Indirect. When you compare them, you find that they’re all way higher than for g++, so I’m not sure what to think. Is DMD so bad at doing virtual calls?

I also tested DMD without the -inline switch, but it isn’t much better for comparisons.

test-dmd.bat -O -release

== 3-lap average; totaling 1200000000 function calls ==
Static:           6760 ticks; 1.000000 (base time)
D class:          7100 ticks; 1.050296
Standard:         7107 ticks; 1.051331
Dynamic:          8549 ticks; 1.264694
Dynamic Indirect: 2933 ticks; 0.433974

I’m not sure why the static call is so slow (compared to g++), nor why Dynamic Indirect is faster than them all, but at least the other results seem consistent in their ratios: D virtual dispatch is pretty much equivalent to Standard, and Dynamic is a little slower.

Linux (AMD Duron)

./test-g++.sh -Os

== 3-lap average; totaling 1200000000 function calls ==
Static:           2693333 ticks; 1.000000 (base time)
C++:              3023333 ticks; 1.122525
Standard:         3026666 ticks; 1.123762
Dynamic:          3023333 ticks; 1.122525
Dynamic Indirect: 3363333 ticks; 1.248762

The overall results look good, but there’s one oddity: Dynamic is as fast as Standard and C++ virtual dispatch. Also note that the overhead of a virtual call seems smaller than with g++ on Windows (1.122525 times the time of a static call, compared to 1.281000 on Windows).

./test-gdc.sh -Os -frelease -finline

== 3-lap average; totaling 1200000000 function calls ==
Static:           2690000 ticks; 1.000000 (base time)
D class:          3030000 ticks; 1.126394
Standard:         3023333 ticks; 1.123916
Dynamic:          3030000 ticks; 1.126394
Dynamic Indirect: 3370000 ticks; 1.252788

Pretty much the same as with g++. Dynamic keeps being as fast as Standard and C++ virtual dispatch for some unknown reason.

./test-dmd.sh -O -release -inline

== 3-lap average; totaling 1200000000 function calls ==
Static:           1010000 ticks; 1.000000 (base time)
D class:          2650000 ticks; 2.623762
Standard:         2690000 ticks; 2.663366
Dynamic:          2690000 ticks; 2.663366
Dynamic Indirect: 3590000 ticks; 3.554456

Again, we have some interesting results. First, the static call is a lot faster than with g++ or gdc, so it probably got inlined like in the results for Windows. Comparing tick counts, virtual dispatch (D, Standard, and Dynamic) seems to run at the same speed as the static call made by g++ and gdc. Perhaps the tick counts are just in a different unit from the ones in g++ and gdc. If that’s the case, then we can still compare the relative scale, which shows us that Standard and Dynamic run at the same speed (just like with g++ and gdc), but that the D virtual dispatch mechanism is a little faster for some reason.

Condensed Results

Percent of Static call time

                            Static  Virtual  Dynamic  Dyn. Indirect
Mac G4 g++                    100%     242%     256%           285%
Mac G4 gdc                    100%     243%     257%           300%
Win Duron g++                 100%     128%     139%           151%
Win Duron dmd                 100%    1004%    1288%          1432%
Win Duron dmd (no-inline)     100%     105%     126%            43%
Linux Duron g++               100%     112%     112%           125%
Linux Duron gdc               100%     113%     113%           125%
Linux Duron dmd               100%     262%     266%           355%

Percent of Virtual call time

                            Static  Virtual  Dynamic  Dyn. Indirect
Mac G4 g++                     41%     100%     106%           118%
Mac G4 gdc                     41%     100%     106%           124%
Win Duron g++                  78%     100%     109%           118%
Win Duron dmd                  10%     100%     128%           143%
Win Duron dmd (no-inline)      95%     100%     120%            41%
Linux Duron g++                89%     100%     100%           111%
Linux Duron gdc                89%     100%     100%           111%
Linux Duron dmd                38%     100%     102%           135%

Conclusions

All those tests seem to show that the overhead of the Dynamic dispatch method is pretty small when repeatedly calling the same function in a small loop. In the C++ benchmarks, it’s 6% more than virtual dispatch on a PowerPC G4, 9% more on Windows on an AMD Duron, and, strangely, exactly the same on Linux. I think that’s fast enough to justify having a non-fragile ABI for class functions in D.

But those results have some limitations. They don’t take into account the initialization overhead of populating, before any object is constructed, the static variables that hold each function’s offset. It could also be argued that repeatedly calling the same function in a tight loop isn’t realistic and gives more optimisation opportunities than you’d normally have in a real program.

It also remains to be investigated why there’s no overhead for Dynamic on Linux. My only guess up to now is that the linker performs the optimisation; that’d explain why it does the same for DMD, GDC and G++ on Linux, when the former doesn’t share the same optimiser as the others. But right now that’s only a guess.

All this calls for digging into the generated assembly code. I’ve done it on PowerPC to make sure there was no unexpected inlining, but given the results it’d be more interesting to disassemble the x86 executables.


Snow Safari

One thing I haven’t seen discussed anywhere is the level of detail that went into the new look for Safari 4 on Windows. Version 4 brings back Windows-style dialog boxes (a poor imitation of them, truth be told). But the browser window is showing something new. It looks a lot like Mac OS X controls, but they’re whiter, have subtle but nice rollover effects, and the text boxes’ corners are now rounded. I’m left wondering if this couldn’t be a glimpse of a new look for Snow Leopard.

(Note the rollover effect on the bigger A button in the above screenshot.)

It’s not as if there weren’t a precedent. When Safari 3 for Windows appeared, it used Leopard’s window style, even though Leopard wasn’t out at the time. What’s different this time is that we haven’t seen glimpses of what the Snow Leopard UI will be; we don’t even know if it will be different. But somehow, I feel they didn’t go to all this trouble defining this new style only for the Windows Safari toolbar.



  • © 2003–2025 Michel Fortin.