
[Freeciv-Dev] Re: cachegrind output

To: Jason Dorje Short <jdorje@xxxxxxxxxxxxxxxxxxxxx>
Cc: Freeciv-Dev <freeciv-dev@xxxxxxxxxxx>
Subject: [Freeciv-Dev] Re: cachegrind output
From: Benoit Hudson <bh@xxxxxxxxxxxxxxxxxxx>
Date: Mon, 11 Apr 2005 23:37:02 -0400

On Mon, Apr 11, 2005 at 05:30:08PM -0400, Jason Dorje Short wrote:
> ==10227== I   refs:      43,911,977,627
> ==10227== I1  misses:        31,618,906
> ==10227== L2i misses:           662,259
> ==10227== I1  miss rate:            0.7%
> ==10227== L2i miss rate:            0.0%

This means that when the CPU fetches the next instruction, it misses in
the on-chip L1 instruction cache only 0.7% of the time, and those misses
essentially always hit in L2.

> ==10227== D   refs:      22,042,750,698  (14,615,236,139 rd + 7,427,514,559 wr)
> ==10227== D1  misses:       346,770,522  (   307,166,794 rd +    39,603,728 wr)
> ==10227== L2d misses:        66,183,642  (    48,142,055 rd +    18,041,587 wr)
> ==10227== D1  miss rate:            1.5% (           2.1%   +           0.5%  )
> ==10227== L2d miss rate:            0.3% (           0.3%   +           0.2%  )

Similarly, when the CPU is reading (rd) or writing (wr) data, it misses
in L1 only 1.5% of the time (more often on reads than on writes, which
is normal); and of those misses, about 80% are found in L2 rather than
going to main memory.
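
The 80% figure can be checked directly from the quoted counts. A minimal
Python sketch, with the numbers copied from the Cachegrind output above
(the variable names are mine, not Cachegrind's):

```python
# Data-cache counts copied from the quoted Cachegrind output
# (reads and writes combined).
d_refs = 22_042_750_698      # D refs
d1_misses = 346_770_522      # D1 misses
l2d_misses = 66_183_642      # L2d misses

# Fraction of all data accesses that miss in L1.
d1_miss_rate = d1_misses / d_refs

# Fraction of those L1 misses that are satisfied by L2
# instead of going out to main memory.
l2_hit_share = 1 - l2d_misses / d1_misses

print(f"D1 miss rate:             {d1_miss_rate:.2%}")
print(f"D1 misses that hit in L2: {l2_hit_share:.1%}")
```

This should print roughly 1.6% and 81%, in line with the rounded-down
figures Cachegrind reports.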

> ==10227== L2 refs:          378,389,428  (   338,785,700 rd +    39,603,728 wr)
> ==10227== L2 misses:         66,845,901  (    48,804,314 rd +    18,041,587 wr)
> ==10227== L2 miss rate:             0.1% (           0.0%   +           0.2%  )

This is just the sum of the data + instruction statistics for L2.  Somehow
they forgot to say that there were about 66 billion total memory accesses
(I refs + D refs); the miss rate it reports is L2 misses / total accesses,
not L2 misses / L2 refs, which would give the alarming figure of about 18%
that you might expect from looking at those numbers.
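
To make the two possible denominators concrete, here is a short Python
sketch using the counts from the quoted output (variable names are mine).
It also checks that the L2 line really is the sum of the instruction and
data lines:

```python
# Counts copied from the quoted Cachegrind output.
i_refs = 43_911_977_627
d_refs = 22_042_750_698
i1_misses = 31_618_906
d1_misses = 346_770_522
l2i_misses = 662_259
l2d_misses = 66_183_642
l2_refs = 378_389_428
l2_misses = 66_845_901

# Every L1 miss becomes an L2 reference, so the L2 totals are
# the instruction and data figures added together.
assert l2_refs == i1_misses + d1_misses
assert l2_misses == l2i_misses + l2d_misses

total_refs = i_refs + d_refs

# What Cachegrind prints: misses relative to ALL memory accesses.
rate_vs_total = l2_misses / total_refs

# The scarier number: misses relative to accesses that reached L2.
rate_vs_l2 = l2_misses / l2_refs

print(f"total refs:             {total_refs:,}")
print(f"L2 misses / total refs: {rate_vs_total:.2%}")
print(f"L2 misses / L2 refs:    {rate_vs_l2:.1%}")
```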

A hit in L1 is about a 2-4 cycle delay, which is often hidden by
pipelining and out-of-order execution.  A miss in L1 that hits in L2
will be 10-20 cycles; often too long to work around but not a huge
deal.  A miss all the way to main memory will be 50-200 cycles.

So, summarizing: we shouldn't worry about cache performance.  Say the
entire program is only ever reading and writing memory (it's not), L1 hits
cost 2 cycles, and L2 misses cost 200 cycles.  With the 0.1% L2 miss rate
above, we then spend 2 + 0.001 * 200 = 2.2 cycles per memory reference.
So by avoiding every L2 miss (impossible), we'd save 0.2 of those 2.2
cycles and speed up the program by at most about 10%.
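
The same back-of-the-envelope estimate as a Python sketch. The cycle costs
are the rough assumptions from the paragraph above, not measurements, and
like that paragraph the model ignores the extra cost of L1 misses that hit
in L2:

```python
# Assumed costs (rough figures from the text, not measured values).
l1_hit_cycles = 2        # cost of every memory reference that hits L1
l2_miss_penalty = 200    # extra cycles when a reference goes to main memory
l2_miss_rate = 0.001     # 0.1% of all references, as Cachegrind reports

# Average cost per memory reference under this simplified model.
avg_cycles = l1_hit_cycles + l2_miss_rate * l2_miss_penalty

# Upper bound on the time saved if no reference ever missed in L2,
# assuming the program does nothing but access memory.
max_saving = 1 - l1_hit_cycles / avg_cycles

print(f"average cycles per reference: {avg_cycles:.1f}")
print(f"best-case time saved:         {max_saving:.0%}")
```

Plugging in a lower miss penalty or a real instruction mix only makes the
possible gain smaller.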

        -- Benoît


