[Freeciv-Dev] Re: cachegrind output
On Mon, Apr 11, 2005 at 05:30:08PM -0400, Jason Dorje Short wrote:
> ==10227== I refs: 43,911,977,627
> ==10227== I1 misses: 31,618,906
> ==10227== L2i misses: 662,259
> ==10227== I1 miss rate: 0.7%
> ==10227== L2i miss rate: 0.0%
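This means that when the CPU fetches the next instruction, it misses in
the on-chip (L1) instruction cache well under 1% of the time, and those
misses essentially always hit in L2. (cachegrind prints 0.7%, but
31,618,906 misses out of 43,911,977,627 fetches works out to about
0.07%.)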
> ==10227== D refs: 22,042,750,698 (14,615,236,139 rd + 7,427,514,559 wr)
> ==10227== D1 misses: 346,770,522 ( 307,166,794 rd + 39,603,728 wr)
> ==10227== L2d misses: 66,183,642 ( 48,142,055 rd + 18,041,587 wr)
> ==10227== D1 miss rate: 1.5% ( 2.1% + 0.5% )
> ==10227== L2d miss rate: 0.3% ( 0.3% + 0.2% )
Similarly, when the CPU reads (rd) data from memory or writes (wr) data
to it, it misses in L1 only about 1.5% of the time (more often on reads
than on writes, which is normal); and of those L1 misses, about 80% are
found in L2 rather than going to main memory.
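As a quick sanity check, the data-side figures can be recomputed from the
raw counts in the quoted output. This is only a sketch; the variable
names are mine, not anything cachegrind defines.

```python
# Counts copied from the quoted cachegrind summary (data side only).
d_refs = 22_042_750_698      # total data reads + writes
d1_misses = 346_770_522      # data accesses that missed L1
l2d_misses = 66_183_642      # data accesses that also missed L2

# Fraction of all data accesses that miss in L1.
d1_miss_rate = d1_misses / d_refs

# Fraction of those L1 misses that are satisfied by L2.
l2_hit_share = 1 - l2d_misses / d1_misses

print(f"D1 miss rate: {d1_miss_rate:.2%}")
print(f"D1 misses served by L2: {l2_hit_share:.1%}")
```

The first figure comes out slightly above the 1.5% that cachegrind
prints, and the second at roughly 81%.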
> ==10227== L2 refs: 378,389,428 ( 338,785,700 rd + 39,603,728 wr)
> ==10227== L2 misses: 66,845,901 ( 48,804,314 rd + 18,041,587 wr)
> ==10227== L2 miss rate: 0.1% ( 0.0% + 0.2% )
This is just the sum of the data + instruction statistics for L2 (the L2
refs are exactly the I1 + D1 misses). What the output doesn't print is
that there were about 66 billion total memory accesses (I refs + D refs);
the miss rate it reports is L2 misses / total accesses, rather than the
alarming 17.7% you get by dividing L2 misses by L2 refs.
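The two ways of reading the L2 numbers can be compared directly from the
counts quoted above. A minimal sketch, with variable names of my own
choosing:

```python
# Counts copied from the quoted cachegrind summary.
i_refs = 43_911_977_627      # instruction fetches
d_refs = 22_042_750_698      # data reads + writes
i1_misses = 31_618_906
d1_misses = 346_770_522
l2_refs = 378_389_428
l2_misses = 66_845_901

# Every L1 miss (instruction or data) becomes one L2 reference.
l1_misses = i1_misses + d1_misses

# Denominator cachegrind uses for "L2 miss rate": all memory accesses.
total_refs = i_refs + d_refs
rate_vs_total = l2_misses / total_refs

# Denominator you might expect: only the accesses that reached L2.
rate_vs_l2_refs = l2_misses / l2_refs

print(f"total accesses: {total_refs:,}")
print(f"L2 misses / total accesses: {rate_vs_total:.2%}")
print(f"L2 misses / L2 refs: {rate_vs_l2_refs:.1%}")
```

The first ratio is the 0.1% in the report; the second is the scarier
local miss rate.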
A hit in L1 is about a 2-4 cycle delay, which will often be ignored
thanks to pipelining and out-of-order execution. A miss in L1 that hits
in L2 will be 10-20 cycles; often too long to work around but not a huge
deal. A miss all the way to main memory will be 50-200 cycles.
So, summarizing: we shouldn't worry about cache performance. Say the
entire program is only ever reading & writing memory (it's not), L1 hits
cost 2 cycles and L2 misses cost 200 cycles. Then, at the reported 0.1%
L2 miss rate, we spend 2 + 0.001 * 200 = 2.2 cycles per memory
reference. So by avoiding every L2 miss (impossible), we'd speed up the
program by at most 10%.
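The back-of-the-envelope estimate can be written out as a few lines of
arithmetic. This is a sketch of the same simplified model (every access
pays the L1 cost, L2 misses pay a flat penalty on top, L1 misses that
hit in L2 are ignored); the names are illustrative only.

```python
# Assumptions taken from the estimate above, not measured values.
l1_hit_cycles = 2        # cost charged to every memory reference
l2_miss_penalty = 200    # extra cycles when a reference misses L2
l2_miss_rate = 0.001     # 0.1% overall L2 miss rate from the report

# Average cycles per memory reference under this model.
avg_cycles = l1_hit_cycles + l2_miss_rate * l2_miss_penalty

# Best case: no L2 misses at all, so every reference costs an L1 hit.
best_cycles = l1_hit_cycles

# Speedup is old time / new time; time saved is the fraction removed.
speedup = avg_cycles / best_cycles
time_saved = 1 - best_cycles / avg_cycles

print(f"average cycles per reference: {avg_cycles:.2f}")
print(f"upper bound on speedup: {speedup:.2f}x")
print(f"fraction of time saved: {time_saved:.1%}")
```

Since a real program also spends cycles on work that isn't a memory
access, the actual gain would be smaller than this bound.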
-- Benoît