BE ENGINEERING INSIGHTS: Where Does the Time Go?
By Rico Tudor rico@be.com
The star of this article is "cpuload", a small applet that measures CPU availability. Where I quantify results, note that my test machine is a uniprocessor 200MHz Pentium Pro, a mid-level performer these days. All concepts and results are applicable for BeOS users on PowerPC, although the numbers may vary.

Benchmarking a system from the outside is straightforward: hook up the logic analyzer and read off the numbers. However, having a system benchmark itself presents some real problems. Heisenberg's Uncertainty Principle, which applies to subatomic particles, states that the more you know about a particle's position, the less you know about its momentum (where it's going), and vice versa. For us to observe the particle, it must interact with its surroundings, and that changes the state as observed. Analogously, the more you know about where your program is executing, the less you know about how much time it's hogging. For us to observe the program's whereabouts, we must add code, and that introduces overhead that is not part of the program. In both arenas, we can attempt to gain better measurements by using indirect methods. In the case of benchmarking, we want to benchmark in ways that allow overhead to be minimized, measured, or canceled. Using only system_time(), we can do all three.

SIMPLE EXAMPLE

First, let's time system_time() itself:

    bigtime_t t = system_time( );
    t = system_time( ) - t;
    printf( "%f\n", (double)t);

This measures the time to execute one call of system_time(). For a finer measurement, amortize the cost over many calls:

    bigtime_t t = system_time( );
    for (int i=0; i<1000-1; ++i)
        system_time( );
    t = system_time( ) - t;
    printf( "%f\n", t/1000.);

I get about 0.28us... good enough for government work.

The Grind Routine

Here is the first installment of our applet.
    #include <Application.h>
    #include <math.h>
    #include <stdio.h>
    #include <string.h>

    struct pointer {
        struct pointer *p;
    };
    struct pointer futile = { &futile };
    uint z;

    void grind( uint n)
    {
        struct pointer *p = &futile;
        for (uint i=0; i<n; ++i) {
            p = p->p; p = p->p; p = p->p; p = p->p;
            p = p->p; p = p->p; p = p->p; p = p->p;
            p = p->p; p = p->p; p = p->p; p = p->p;
            p = p->p; p = p->p; p = p->p; p = p->p;
            p = p->p; p = p->p; p = p->p; p = p->p;
            p = p->p; p = p->p; p = p->p; p = p->p;
            p = p->p; p = p->p; p = p->p; p = p->p;
            p = p->p; p = p->p; p = p->p; p = p->p;
        }
        z += p == 0;
    }

"futile" is a linked list, made circular by pointing to itself.
grind() traverses this list futilely until its counter is
exhausted. In assembler, the inner loop looks like this:
    loop:
        movl (%eax),%eax
        movl (%eax),%eax
        .
        .  28 more "movl" instructions
        .
        movl (%eax),%eax
        movl (%eax),%eax
        incl %edx
        cmpl %ecx,%edx
        jb loop

The compiler cannot eliminate this code by optimization, partly
because of the dummy variable "z" (always 0). For each step
along the list, the Intel P6 core (Pentium Pro, Pentium II)
takes 3 clocks, the L1 cache latency. Loop overhead is
eliminated by speculative execution (fancy hardware). Therefore,
each iteration of the inner loop costs 32 loads at 3 clocks apiece,
or 96 clocks. The situation is largely the same with the PowerPC,
although the numbers are different (e.g. 2 clocks needed by the 604).

Calibration

Next, we need to calibrate grind() against the system clock.
To do this, we make two assumptions (uh, oh). First, that time
is allocated in a fixed quantum by the scheduler. Second, if
we run a short enough test, it will execute within a single
quantum, uninterrupted. Here is the main body of the applet.

    bigtime_t gtime;
    uint gcount;

    struct A: BApplication {
        A( ): BApplication( "application/x-cpuload")
        {
            Run( );
        }
        void ReadyToRun( )
        {
            calibrate( );
            new W( BRect( 0, 0, 100, 100));
        }
        void calibrate( )
        {
            gcount = 10;
            while (TRUE) {
                gtime = 0;
                for (uint i=0; i<100; ++i) {
                    bigtime_t t0 = system_time( );
                    grind( gcount);
                    bigtime_t t = system_time( ) - t0;
                    if (gtime==0 || t<gtime)
                        gtime = t;
                }
                if (gtime > 800)
                    break;
                gcount *= 1.1;
            }
            printf( "gtime=%Ld gcount=%d MHz=%f\n",
                gtime, gcount, gcount*32.*3/gtime);
        }
        void MessageReceived( BMessage *m)
        {
            if (m->what == 'q')
                Quit( );
        }
    };

    main( )
    {
        A a;
        return (z);
    }

The search is over when the measured test time exceeds 800us. Why? Because I claim there is a clock interrupt every 1ms, the first regular source of overhead that calibrate() will encounter. You can verify my claim by nudging the printf() into the "while" loop, extending the search to 2 seconds, and plotting the results. Here's what I get:
[Plot: effective CPU clock (195-201 MHz) versus time-slice length (100us to 1e+06us, log scale); the original ASCII graph is not recoverable.]

This is a plot of time-slice (in microseconds) against effective CPU clock frequency (in MHz). Measurements are noisy until a few hundred microseconds, after which the CPU is reliably measured at 199.5MHz, close to the manufacturer rating. At 1000us, it drops suddenly to 198MHz: this is evidence of the 1000Hz clock interrupt, when BeOS must perform housekeeping. The graph shows steadily increasing load between 1ms and 1s. This load includes Pulse events and other periodic work in the servers. Work of periodicity 1s is popular, since another decline is clearly visible at that point. By writing variations on this probe, you can map such periodic overhead in more detail.

The CPULOAD Applet

Here is the remaining code for "cpuload".

    struct L: BLooper {
        L( )
        {
            n = 0;
            Run( );
            PostMessage( 'g');
        }
        void MessageReceived( BMessage *m)
        {
            set_thread_priority( find_thread( 0), B_LOW_PRIORITY/5);
            while (TRUE) {
                grind( gcount);
                ++n;
            }
        }
        uint n;
    };

    struct C: BView {
        C( BRect r): BView( r, 0, B_FOLLOW_ALL,
                B_WILL_DRAW|B_PULSE_NEEDED)
        {
            lastt = system_time( );
            memset( lastn, 0, sizeof lastn);
            system_info si;
            get_system_info( &si);
            for (lcount=0; lcount<si.cpu_count; ++lcount)
                ltab[lcount] = new L( );
        }
        void Pulse( )
        {
            bigtime_t t = system_time( );
            float f = 0;
            for (uint i=0; i<lcount; ++i) {
                f += 100. * (ltab[i]->n-lastn[i])
                    / ((float)(t-lastt)/gtime);
                lastn[i] = ltab[i]->n;
            }
            lastt = t;
            f /= lcount;
            graph( f);
        }
        virtual void graph( float) = 0;
        bigtime_t lastt;
        uint lastn[20], lcount;
        L *ltab[20];
    };

    struct V0: C {
        V0( BRect r): C( r) { }
        void graph( float f)
        {
            printf( "%f\n", f);
        }
    };

    struct W: BWindow {
        W( BRect r): BWindow( r, "cpuload", B_TITLED_WINDOW,
                B_NOT_ZOOMABLE)
        {
            AddChild( new V0( r));
            MoveBy( 100, 100);
            SetPulseRate( 1000000);
            Show( );
        }
        bool QuitRequested( )
        {
            be_app->PostMessage( 'q');
            return (TRUE);
        }
    };

L does the grunt work of calling grind(). C starts L and calculates the load on a per-second basis. It grabs results directly from L, a different thread, which works because the datum is a single machine word. This eliminates communication overhead, which would otherwise have to be analyzed. V0 prints the results textually. W requests Pulse events for reporting purposes, 1 per second.

Observations

A simple way to swamp the CPU for a few seconds is to run a busy shell command.
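The exact one-liner did not survive the formatting of this page; any CPU-bound loop will serve. For example (my stand-in, not necessarily the author's original command):

```shell
# A CPU-bound shell loop: pure arithmetic, no I/O, so it competes with
# "cpuload" for time-slices while it runs. (Illustrative stand-in; the
# article's original command was lost.)
i=0
while [ "$i" -lt 100000 ]; do
    i=$((i+1))
done
echo "spun $i times"
```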
With such a command running, CPU availability drops from 97% to 4%. Why not 100% and 0%? 97% reflects actual BeOS time-sharing overhead, while 4% shows the impact of "cpuload" itself. While its scheduling priority of 1 is as low as possible, BeOS will nonetheless allocate "cpuload" a few time-slices per second. This anomaly still allows accurate measurements in the remaining range. A scheduling priority of 0 would cause scheduling conflicts with the idle thread, yielding peculiar results.

CPULOAD Variations

Text output has some advantages: the applet introduces less overhead, and results can be logged for later analysis. With averaging over a minute, precision of 4 digits can be expected. However, graphical output is compelling, so I provide two possible replacements for V0:

    struct V1: C {
        V1( BRect r): C( r)
        {
            x = r.right;
        }
        void graph( float f)
        {
            ScrollBy( 1, 0);
            StrokeLine( BPoint( x, 100), BPoint( x, 100-(f+.5)));
            ++x;
        }
        uint x;
    };

    struct V2: C {
        V2( BRect r): C( r)
        {
            memset( rec, 0, sizeof rec);
            nrec = 0;
        }
        void graph( float f)
        {
            if (nrec == 0)
                rec[nrec] = f;
            else
                rec[nrec] = f + rec[nrec-1];
            ++nrec;
            for (uint i=0; ; ++i) {
                uint j = 1 << i;
                if (j >= nrec)
                    break;
                float y = (rec[nrec-1]-rec[nrec-1-j]) / j;
                FillRect( BRect( i*8, 100-y, i*8+6, 100));
                SetHighColor( 255, 255, 255);
                FillRect( BRect( i*8, 0, i*8+6, 100-1-y));
                SetHighColor( 0, 0, 0);
            }
        }
        void Draw( BRect)
        {
            SetHighColor( 255, 0, 0);
            for (uint x=0; x<100; x+=20)
                FillRect( BRect( 0, x, 100, x+10-1));
        }
        float rec[33000];
        uint nrec;
    };

V1 implements a simple horizontal scrolling plot. V2 shows the load for the last 2^n seconds, for all integers "n" from 0 to 15: each bar averages twice the historical load of the previous bar. Another variation involves running grinders at other scheduling priorities. Finally, you can start multiple grinders on one CPU, and measure BeOS time-sharing efficiency. With two grinders per CPU, instead of one, I measure CPU efficiency at 99.92%. Are we having fun yet?
Conclusion

Even a probe as small as "cpuload" interacts with the scheduler, the clock interrupt, and the caches. Understanding these interactions is the crux of benchmarking in the modern environment of complex OS and hardware. The answer is to attack the benchmarking challenge with the same custom concentration that you used to write the program. And take industry benchmark figures with more than a grain of salt.
DEVELOPERS' WORKSHOP: But You Can't Make Her Think
By Doug Fulton lbj@be.com

"Developers' Workshop" is a weekly feature that provides
answers to our developers' questions, or topic requests.
To submit a question, visit http://www.be.com/developers/suggestion_box.html.
In the 1920's, a ranger at Yellowstone National Park found
himself in a psychological battle with two of the college
students the Park hired for summer work. Less than a battle,
actually, it was more a series of artful annoyances played on
the ranger by the students. They were careful about covering
their tracks, so the ranger couldn't retaliate or even reprimand
them -- although avoiding punishment wasn't their main concern.
They were more interested in increasing the ranger's frustration
by their anonymity. Finally, however, the two blew their cover
in the interest of art.
Of the hundreds of geysers in Yellowstone, a few erupt
with a dependable regularity. Bleachers are set up around
these geysers, where the park rangers lecture the tourists
for a few minutes before the eruption. Just to the side of a
tool shed near their nemesis's geyser, the two students planted
a steering column, with the wheel still attached, that they had
lifted from an abandoned car, jabbing it column end down into
the ground.
They hid behind the shed waiting for the crowd to assemble.
Seconds before the geyser erupted -- just as the ranger was
finishing his speech -- one of the students yelled to the other
something along the lines of "Okay, let her go!", and the
other fellow jumped out from behind the shed and, feigning
great effort, began to spin the wheel, apparently letting loose
the jet of steam and hot water.
This is an amusing, but misleading, UI.
It was leaked in these pages a few weeks ago that the UI
Guidelines, our Balm of Gilead for the Control what Ails You,
was back from the ghost writers and THIS CLOSE to being finished.
True-ish, but, unfortunately, this propinquity is affected by
daylight: All the words are there (and then some), but we're a
bit thin on hours in the day that can be devoted to pushing
those words into the correct order and trimming the ones
that don't do much more than sit on the page and smile
at you as if you yourself were a drooling idiot in need of
constant reminding that the user will be less confused if you
design your UI such that it is...less confusing.
Out on the documentation plantation (where we grow our own adverbs,
and at night you can hear the boasts of the rodomonts as they
stuff syllables back into their syncopes), we're also raising tech
doc, user doc, asparagus, release notes, and miscellaneous
editorialisms and articles. And of all these, the UI Guidelines,
whose topic is a religious issue, needs the most scrupulous
attention (of a certain type): We don't want to publish an errata
sheet for the bible.
So let's resynchronize our watches: We're going to try to get the
UI Guidelines -- at least the parts with the sex scenes -- on the
R4 CD. But if they don't quite make it, they'll be posted to the
web site soon thereafter. You can lead a horse to water, but don't
hold your breath.
The "Red Herring" makes interesting reading -- an insider view
of Silicon Valley wheelers and dealers. It's well-written and
opinionated, rising above the cutting and pasting of quotes from
the usual suspects and the recycling of press releases that make
some other business magazines so bland. One doesn't have to agree
with every opinion this estimable magazine offers, however.
Opinions are made to be disagreed with, and I'd like to
oblige.
In the "Red Herring" for November 98, in an article titled
DRINKING THE KOOL-AID(TM)
<http://www.redherring.com/mag/issue60/editor.html>,
Jason Pontin opines, "Steve Jobs rescued Apple --
but the point is moot." He concedes that Apple has a simpler,
more successful product line, smaller inventories, and now, for
the first time since 1995, a healthy bottom line. But, he
contends, it doesn't matter anymore.
"The whole point of the Mac was to be a better alternative to
Microsoft operating systems on Intel chips. Yet no one wants
such an alternative anymore, except graphic designers who are
used to it," he says. Pontin argues that iMac buyers aren't
thinking "different," they're just buying a fast Internet
terminal that also runs Office, probably implying a PC would
do more for less. In his mind, people who rave about Apple's
spectacular turnaround have drunk the famous Kool-Aid the
founder was accused of serving his followers in the old days.
And he concludes: "Steve Jobs has saved Apple. Good for him.
It doesn't matter."
As indicated earlier, I beg to differ.
The article notes, but fails to attach much significance to,
the fact that almost 10 percent of iMac buyers were PC users.
Keeping in mind the iMac quirks and price, the iMac success
with PC owners cannot be dismissed. Just as Apple fans were
wrong to keep on dissing the growing number of PC buyers for
not seeing that the Mac was so much better, PC aficionados
would err if they thought the many people who buy iMacs were
just drinking the Apple Kool-Aid.
Simplicity and style, exactly what Apple promoted, do matter.
And, unlike the "Red Herring," the PC industry agrees and has
reacted in its usual pragmatic way. Andy Grove was quoted in
Time magazine on the positive effect of the iMac on the
thinking of his customers, PC makers. The risk for Apple is
to see the PC industry once again adopt and improve upon its
ideas. On the bright side, Apple is in a leadership position
again, not teetering on the edge of the grave.
I don't believe any company should survive merely because it
provides "diversity" to the ecosystem, but when customers
vote with their wallets for a variety of solutions, this is
good news. Who would have predicted a year ago Linux and the
iMac would appear as legitimate choices for enterprises and
consumers?
This is ironic at the very time when the trial for the DOJ
suit against Microsoft is about to start, especially when one
considers the role Microsoft played in supporting Apple's
turnaround. In any event, for us, this is a much better world
than one where "other" platforms are unthinkable. It's much
easier for us to be a specialized platform in an ecumenical
world than off-color in a monochromatic one. Imagine what the
"Red Herring" would think of a market where customers thought
there was no need for more than one business magazine.
Copyright ©1998 Be, Inc. Be is a registered trademark, and BeOS, BeBox, BeWare, GeekPort, the Be logo and the BeOS logo are trademarks of Be, Inc. All other trademarks mentioned are the property of their respective owners.