Subject: Re: Phoneme Advanced Questions
Date: Aug 16, 2007 11:33:21 pm
I have a few responses for you. FYI, below, I call the phoneME Advanced VM by
its other name, CVM (just so that I do less typing):
1. Which did you do: a debug build or a non-debug build? Here you say that you
are comparing a debug build of CVM against (presumably) the optimized version
of JavaSE on Linux x86:
We found that the performance of JVM in phoneme(Debug version) was slower one time than that of Release Version on linux Operation System.
Here, you say that you are building a non-debug build of CVM:
The build options are J2ME_CLASSLIB=basis, CVM_DEBUG=false, CVM_PRELOAD_LIB=true and the others is default.
So, which is it? Are you doing your benchmark with a debug build or not?
2. CVM's x86 port is not fully tuned and optimized yet. Hence, of course, you
should expect the optimized JavaSE version to be faster here. CVM is normally
targeted at embedded devices with different characteristics than the x86
PC that you are running on. The ARM port, for example, is one that is tuned and
optimized (for some configurations of the ARM).
3. You didn't build CVM with its JIT enabled. Hence, you are comparing an
interpreter run on CVM (possibly in debug mode) against the optimized JavaSE VM
with its JIT. Of course, you would expect JavaSE to be faster here too. Turn
on the JIT with CVM_JIT=true and you will see that it will run faster, though
JavaSE will still be faster.
4. Your benchmark code is a classic example of a bad benchmark for various
reasons:
a. the gcc C compiler will detect that all the arithmetic you are doing in those
loops is dead code, and just eliminate it. The net effect is that you have
functions that measure the time of doing nothing.
b. real world applications (such as the set top box ones you are hoping to
target) will not sit around in loops like these and do arithmetic that will
result in dead code that gets eliminated. Hence, for real world applications, a
C compiler will not be able to generate code that runs as fast as code that does
nothing like in your benchmark example.
c. the JavaSE JIT will also optimize away some of this code, but will not do as
thorough a job as the C compiler. The reason is not that we cannot make it
optimize this kind of code. It is that there is no point in doing so.
As mentioned above, real world applications don't do things like this. Hence,
it would be a waste of footprint and code complexity to make the JIT optimize
away code like this when it doesn't exist in real world applications.
d. CVM's JIT will do even fewer optimizations of this kind than JavaSE because
CVM is targeted towards embedded devices (like the set top box you are hoping
to port to), and embedded devices are even more sensitive to footprint and CPU
resources. Hence, it will not waste resources to optimize this kind of code
that normally doesn't appear in applications.
For all the above reasons, your benchmark comparison is not meaningful.
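To illustrate the dead-code point, here is a minimal sketch of a loop whose result is actually consumed, so a compiler has no grounds to discard the arithmetic. The class name, method names, and workload are hypothetical and are not taken from your benchmark; the sketch sticks to System.currentTimeMillis() on the assumption that the same source should also compile against a CDC class library.

```java
public class LiveLoopBenchmark {
    // Hypothetical workload: every iteration feeds the accumulator, and the
    // accumulator is returned, so the loop is not dead code.
    static long sumOfProducts(int iterations, int seed) {
        long acc = seed;
        for (int i = 0; i < iterations; i++) {
            acc += (long) i * (acc & 0xFF) + 1;
        }
        return acc;
    }

    // Runs the workload once, stores its result in resultOut[0], and returns
    // the elapsed wall-clock time in milliseconds.
    static long timeMillis(int iterations, int seed, long[] resultOut) {
        long start = System.currentTimeMillis();
        resultOut[0] = sumOfProducts(iterations, seed);
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        // The seed depends on a run-time value, so the result cannot be
        // folded into a constant at compile time.
        int seed = args.length;
        long[] result = new long[1];

        timeMillis(1000000, seed, result);                 // warm-up run
        long elapsed = timeMillis(10000000, seed, result); // measured run

        // Printing the result keeps it observable outside the loop.
        System.out.println("result=" + result[0] + " elapsed=" + elapsed + " ms");
    }
}
```

Even a loop like this only measures integer arithmetic in a tight loop. It says little about how a real STB application will behave, so treat it as a sanity check rather than a benchmark.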
5. You said:
Actually, we wish transplant the phoneME Advanced to OS20 Operation System in STB(Set Top BOX). the CPU on STB is STx5105 whose frequency is 200MHz and whose memory is 64MB DDR SDRAM(frequency:133MHz). But we found too that the performance of phoneME Advanced JVM on STB is slower 150 to 200 times than in my PC.
We bulid phoneME Advanced on PC(Intel(r) Pentium(r)4 CPU 3.2GHz, 512MB memory) and my PC's Operation System is Windows XP. We ran a linux virtual machine named VMware on Windows XP, and we ran phoneME on the linux VM. - the memory is 384MB to be allowed to use Linux OS
a. STB runs a CPU at 200MHz. PC runs at 3.2GHz. Comparing CPU clock speeds,
that's a difference of 16x.
b. PC has a front side bus @ 800MHz. STB bus speed is 133MHz. That's a
difference of 6x.
c. ST20 has 2K I-cache, 2K D-cache, 2K SRAM. PC has an 8K L1 cache, 512K L2
cache and maybe 2M of L3 cache. Let's assume the L3 is not present. That's a
rough difference of about 512K / 6K = ~85x.
NOTE: As an estimate, I'm using data about the Pentium 4 3.2GHz from this
article: http://www.pcstats.com/articleview.cfm?articleID=808. I got the ST20
cache data by googling for "ST20 cache" and made an educated guess.
Without taking all other factors into consideration (e.g. IO access speed to
load the Java VM code into memory, CPU architecture, instruction set, RAM
capacity, RAM speed, etc), the above alone gives us a hardware performance
difference of roughly 16 x 6 x 85 = 8160x to the advantage of the PC. So, why
would it be unrealistic for CVM to run around 150 to 200 times slower on the
STB than on the PC?
Of course, my calculation is very simplistic, but it certainly illustrates the
point. The real performance difference is more complex than that and is based on
too many variables to calculate with a straightforward formula like this. That's
why you are seeing a performance difference of ~200x instead of 8160x.
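For reference, the back-of-the-envelope arithmetic above can be reproduced with a few lines of Java. The class and method names are hypothetical, the inputs are the same rough figures quoted in points a to c, and multiplying the three ratios together is the same simplification I made above, not a real performance model.

```java
public class HardwareRatioEstimate {
    // Rounds the ratio of two quantities to the nearest whole multiple.
    static long ratio(long larger, long smaller) {
        return Math.round((double) larger / smaller);
    }

    // Naive combined estimate: simply multiplies the individual ratios.
    static long combined(long clock, long bus, long cache) {
        return clock * bus * cache;
    }

    public static void main(String[] args) {
        long clock = ratio(3200, 200); // CPU clock: 3.2GHz vs 200MHz
        long bus = ratio(800, 133);    // bus speed: 800MHz vs 133MHz
        long cache = ratio(512, 6);    // cache: 512K L2 vs 6K total on the ST20

        System.out.println("clock " + clock + "x, bus " + bus + "x, cache "
                + cache + "x, combined " + combined(clock, bus, cache) + "x");
        // prints: clock 16x, bus 6x, cache 85x, combined 8160x
    }
}
```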
6. You asked:
We want to know whether this situation is general. whether is this version phoneME Advanced JVM be fit for the platform?
Whether CVM is fit or not for your platform depends on your needs. If you want
to use it like a desktop to do desktop PC type work, I think you will be
disappointed for obvious reasons. But if you intend to use it as an STB to do
STB type processing, then I think it will probably be a good fit.
Mark [Message sent by forum member 'mlam' (mlam)]