Changelog for 3.0-776
Optimize JitCache::InvalidateICache by maintaining a "valid blocks" bitset
Most of the InvalidateICache calls are for a 32 bytes block: this is the
number of bytes invalidated by PowerPC dcb*/icb* instructions. Profiling
shows that a lot of CPU time is spent checking if there are any JIT blocks
covered by these 32 bytes (using std::map::lower_bound).
This patch adds a bitset containing the state of every 32 bytes block in
RAM (JIT cached/not JIT cached). Using that, a 32 bytes InvalidateICache
can check in the bitset if any JIT block might be invalidated. A bitset
check is a lot faster than an std::map::lower_bound operation, improving
performance of JitCache::InvalidateICache by more than 100%.
Some practical numbers:
* Xenoblade Chronicles (PAL)
56.04FPS -> 59.28FPS (+5.78%)
* The Last Story (PAL)
30.9FPS -> 32.83FPS (+6.25%)
* Super Mario Galaxy (PAL)
59.76FPS -> 62.46FPS (+4.52%)
This function still takes more time than it should - more optimization in
this area might be possible (specializing for 32 bytes blocks to avoid
useless memcpy, for example).