Table of Contents
Sound and Music Wanted: BeOS game music and sound effects BeOS game company looking for a sound and music person for a game currently in development. Will need music in a low CPU intensive format, and a smattering of sound effects. Undecided on a licensing policy yet, so there may or may not be money involved. Assume "not." Plenty of info is available to the interested. For more information contact: Mike Pelletier at: mike@compar.com
BE ENGINEERING INSIGHTS: "Threads Don't Like Playing Ping Pong," Part I The main purpose of the engineering articles is to deliver simple and accurate technical information that's meaningful to as many people as possible. Thus, simple articles are better than complex ones -- a rule I follow as often as I can. But not today. With apologies to the average developer who is never going to deal with the subtleties I'm about to discuss, I just can't stop myself! After many general information articles on various subjects, I felt an irresistible urge to write some strong technical content. You've been warned, now continue at your own risk. At the heart of the BeOS graphic system lies the application server. It's been the theater of continuous cleaning, reorganization and optimisation during the last 12 months, and more will happen, at least until Release 6. Recently, we discovered a nasty interaction between the app server and the kernel's thread scheduler. This problem had very serious consequences, and solving it has brought a significant performance improvement to Release 4. It also pointed out an interesting class of problems, that some of you could encounter in the future (if not already). So in this article, we are going to study different mechanisms you can use to synchronize threads, and try to describe their behavior under well-defined stress conditions. For that we used one or two "worker" threads, executing the same reference task again and again. They had to use a synchronization mechanism to guarantee that only one thread executed the task at a time (from now on, the reference task will be called the "critical section"). To simulate the CPU noise generated by other unrelated threads, we sometimes added an additional dummy thread. The dummy thread didn't vie for the critical section -- in other words, it didn't interfere directly with the other two threads. All measurements were done on a Pentium II 350MHz in conditions as stable and similar as possible, with most of the servers, applications and services disabled, but with some daemons left running. In this environment, all code that we executed was always available in the level 2 cache (if not in the level 1!). In the real world, of course, this isn't a realistic assumption, but as you will see, studying the perfect case is hard enough already... TEST 1 -- An Ideal World In this test, one B_NORMAL_PRIORITY thread repeatedly executed a 6.1us ("us" = "microsecond") task that's protected by a simple semaphore: while (true) { acquire_sem(g_sem); do_critical_section(6.1); release_sem(g_sem); } We ran the loop for about two seconds and recorded the latency (time consumption) for each lock and for each unlock. We then plotted the results on separate graphs (i.e. one for locking, another for unlocking). These graphs represent a very rich source of information, and we're going to be studying them throughout most of this article. The graph's horizontal axis is latency in microseconds; the vertical axis is the hit count, or the number of locks/unlocks at a particular latency. Both axes are exponential and elided where necessary (to fit the format of an email, and reduce the length of this article). All graphs are supposed to be viewed with a monospaced font. This first graph shows the locking latency: -------------One thread, semaphore, locking------------- 32768| A | AA 16384| AA | AA 8192| AA | AA 4096| AA | AAA . ... | AAA 128| AAA | AAA B 64| AAA BB | AAA BB 32| AAA BBBB | AAA BBBB 16| AAA BBBB | AAA BBBBB C 8| AAA BBBBB C | AAAA BBBBB C 4| AAAA BBBBB CC | AAAA BBBBB CC 2| AAAA BBBBB # CCC # # | AAAABBBBBB ## ## CCC # # ### # # +!-------!-------!-------!-------!-------!-------!-------!--...--!-- 0 8 16 32 64 128 256 512 2048
Here's the graph for unlocking : -------------One thread, semaphore, unlocking------------------ 32768| A | AA 16384| AA | AA 8192| AA | AA 4096| AA | AAA . ... | AAA 256| AAA | AAA B 128| AAA BB | AAA BBB 64| AAA BBBB | AAA BBBBB 32| AAA BBBBB | AAA BBBBB 16| AAA BBBBB C | AAA BBBBB C 8| AAA BBBBB CC | AAA BBBBBB CC 4| AAAA BBBBBB # CCC | AAAA BBBBBB ## CCCC 2| AAAABBBBBBB ## CCCC ## # # # | AAAABBBBBBB### CCCCC ## ### # # # +!-------!-------!-------!-------!-------!-------!-----...--!----- 0 8 16 32 64 128 256 2048
We replaced the semaphore with a benaphore. You can look at one of the previous Engineering Insights articles for more detailed information about the benaphore: http://www.be.com/aboutbe/benewsletter/Issue26.html. while (true) { if (atomic_add(&g_lock, 1) > 0) acquire_sem(g_ben); do_critical_section(6.1); if (atomic_add(&g_lock, -1) > 1) release_sem(g_ben); } -------------One thread, benaphore, locking------------- 65536|A |A 32768|AA |AA ... |AA 16|AA B |AA BBB 8|AA BBB |AAA BBB 4|AAA BBB |AAA BBBBB C 2|AAA BBBBB CC |AAA BBBBB CC # +!-------!-------!-------!-------!-------!-- 0 8 16 32 64 128 As expected, the probability and timing are much improved:
-------------One thread, benaphore, unlocking------------- 65536|A |AA ... |AA 256|AAA BB |AAA BBB 128|AAA BBB |AAA BBB 64|AAA BBBB |AAA BBBB 32|AAA BBBB |AAA BBBBB C 16|AAA BBBBB CC |AAA BBBBB CC 8|AAA BBBBB CC |AAA BBBBBB CC 4|AAA BBBBBB CCC |AAA BBBBBBB CCC # 2|AAA BBBBBBBBB CCC ## ## # # # |AAA BBBBBBBBBBB ## # CCCC ## ## ### ## # # +!-------!-------!-------!-------!-------!-------!-----...--!----- 0 8 16 32 64 128 256 2048
Round-trip: Greater than 99% probability of a 6.7us total round-trip -- pretty good for a 6.1us critical section. But, then again, this is an ideal -- and so unrealistic -- test. NOTE: In all our tests, the unlocking statistics are mostly similar to one or the other of the two already shown. Unlocking never blocks so, as you might expect, it doesn't really affect overall system performance. So as there isn't too much to be learned from those, we won't show any unlocking graphs for any of the other tests, but will only make some comments when necessary. TEST 3 -- A Small Dose of Reality Let's see what happens when we introduce a second B_NORMAL_PRIORITY thread. We'll have two equivalent threads fighting over the lock. First, we use a semaphore: -------------Two threads, semaphore, locking------------- 16384| A . . | A 2048| AA | B AA 1024| BB AA . .. .. 256| BB AA C | BB AA CC 128| BBB AA CC | BBB AA CC 64| BBB AA CC . ... ..... 16| BBB AACCC D | BBB AACCCC D 8| BBB AACCCC DD # # | BBB AACCCC DD # # 4| BBB # AACCCC DDD # # # | BBB # AACCCC DDDD # # # # 2| BBB # # AACCCCC DDDDD # ## # # | BBB # ## ##AAAACCCCCDDDDDD ## ## # ## # # +!-------!-------!-------!-------!-------!-------!-----...--!---- 0 8 16 32 64 128 256 2048 NOTE: This is a graph of only one of the threads. The profile of the other thread is very similar, as expected, with the same spikes in term of both probability and latency: (A) : 24.138us (85.677%) against 24.168us (85.285%) (B) : 3.545us (11.741%) against 3.543us (12.159%) (C) : 29.059us ( 2.303%) against 28.878us ( 2.236%) (D) : 52.718us ( 0.137%) against 51.378us ( 0.157%) So from now on, we won't comment about the second thread at all.
Round-trip: 24.1 + 6.1 + 5.0 = 35.1us for 7/8 of the iterations. The "ping pong" clearly affected the performance, but it's not a total disaster compared to the 13.1 * 2 (don't forget, now we have 2 threads working in parallel!) = 26.2us of the semaphore. TEST 4 -- The Benaphore Curiosity Mostly out of curiosity, let's see what happens when we use a benaphore in the two threads context. -------------Two threads, benaphore, locking------------- 16384| A . . | A 2048|B AA |B AA 1024|BB AA |BB AAA 512|BB C AAA |BB C AAA 256|BB CC AAA D ... .. ... . 32|BB CC AAADD |BBCCC AAADD 16|BBCCC AAADD E |BBCCC AAADDD E 8|BBCCC AAADDD # E |BBCCC AAADDD # EE 4|BBCCC AAADDD #EEE |BBCCC # AAADDD #EEE # # 2|BBCCCC # AAAADDD #EEE # ### # |BBCCCC ## AAAAADDDD #EEE ####### # # ## ## +!-------!-------!-------!-------!-------!-------!-----...--!---- 0 8 16 32 64 128 256 2048
As expected, in ping pong mode the benaphore is slightly slower than the semaphore (24.7us vs. 24.1us), but the non-blocking case is still much faster (0.31us vs. 3.5us). The unlocking test (not shown) also satisfies this expectation. Round-trip: A little bit better than the semaphore. The improvement in non-blocking performance makes up for the degraded ping pong. In real life, it's not clear that we would get a similar behavior, but in any case, even if the benaphore gets slower than the semaphore, it won't be by much. First part conclusion: When two threads have to fight for the ownership of a shared critical section, performances can be significantly reduced. The snappy little benaphore degenerates into a semaphore, and the CPU efficiency is reduced by a small factor -- the two-thread tests are 1.4 to 2.5 times less efficient than the one-thread tests. This would seem bad enough already, but who knows what's going to happen once we move those tests from a ideal and isolated environment into a more realistic one. Clearly a story for next week...
DEVELOPERS' WORKSHOP: Mucking With Menus "Developers' Workshop" is a weekly feature that provides
answers to our developers' questions, or topic requests.
To submit a question, visit
http://www.be.com/developers/suggestion_box.html.
Foreword
Harlan Ellison once noted that first impressions are deadly,
dangerous, and deceiving. Since this is my debut Newsletter
article, I'm aware of what's at stake. I've had my hair cut.
I've shaved my face raw and put a styptic pen in my pocket
for the nicks. I'm wearing that same collared shirt, spiffy
tie, and cotton slacks that almost lost me the job when I
interviewed here. All this so that you'll walk away from
this column with that warm, fuzzy, BeOS-coding kind of
feeling. So much for introductions.
Delicatessens, Laminates, and American Consumers
An odd phenomenon occurs after midnight on summer weekends
here in the land of suburban sprawl. In eerie emulation of
the coastal grunion runs, hordes of Hollywood zombies issue
forth from their movie theater coffins. At that time of
night activities are scarce -- absent certain chemical and
hormonal addictions, and/or a zoot suit and the
irrepressible urge to dance.
For the rest of us there's the 24-hour delicatessen. Some go
for the food; some go to avoid the grunion. Me, I head to
the deli to marvel at one thing: a laminated, folded sheet
of cardboard stock which, when unfolded, stands two feet
high and three feet wide. This item is not just a menu --
it's a symbol of the bewildering diversity of American
culture, of the consumer's RIGHT TO CHOOSE. Confronted by
this formidable selection, I forego exercising my birthright
and always get the corned beef Reuben and the vanilla Coke.
So what has that got to do with the BeOS, you ask? The
connection is menus. These are not just the cornerstone of
the contemporary GUI; they also show off BMessages and
object- oriented design to good effect. And, they provide a
safe, wholesome springboard to the wonderful world of the
Interface Kit (see the foreword on the Importance of First
Impressions).
To whet your appetite for menus, I've concocted an example
of an application that tweaks menus a bit more than the
average Be application. Appropriately enough, it's called
MenuWorld. Go ahead, download it:
ftp://ftp.be.com/pub/samples/interface_kit/MenuWorld.zip
The following notes boil down the essence of this example.
The Four Corners
Ninety percent of menu behavior is defined in four classes:
http://www.be.com/documentation/be_book/The%20Application%20Kit/Message.html.
A romantic relationship awaits you.
Primping and Preening
There are three different kinds of menus as far as drawing
goes: column-drawn (standard), row-drawn (e.g., menu bars),
and custom-drawn (called "matrix," though there's no
restriction on how the items are arranged). You specify
which one you're using when you create the menu. MenuWorld
proudly presents all three methods.
If you're arranging menu items in a column, things couldn't
be easier. The menu works with your menu items to figure out
how big to make the menu, and how to arrange the items
within the menu. The menu also takes care of drawing submenu
triangles and shortcut symbols. All you need to do is create
your menu items and add them to the menu. Working with
row-by-row organization is similar.
If you're doing a custom arrangement of items, things get
more interesting. Here are some points to consider:
ZPG and Other Population Paradigms
Stuffing your menus with items is easy, and can be done in
several places:
Magick Mother Invocation
Since BMenuItems inherit from BInvoker, you can pull many of
the same tricks with them as you can with controls and other
invokers. In particular, your messages can be arbitrarily
complex, and you don't need to dispatch them to the main
window. For instance, the File->About... menu item
dispatches to the global application object. Also note that
each user item contains the same message ID, so they all get
handled in the same place (a separate field identifies the
actual name of the item which sent the message); the test
items are set up in a similar manner. Again, try achieving
that same kind of flexibility with MFC!
There is a price to this flexibility, however. Assuming
MenuWorld were a full-featured menu editor, let's say we
wanted to save the menus we've created and use them in our
application, ˆ la Visual C++ menu resource editing. The Be
way to do this is to archive the Menu Bar and its
constituents into a BMessage, then write the flattened
BMessage data to a BResources object.
The problem, which ties in to a recent missive on BeDevTalk,
is that the BMenuItems do not store their targets -- indeed,
they have no inherent way to identify a target once they've
been archived and reconstituted. One thing you can do pretty
easily is to set the target of each menu item to the main
window when you instantiate them. However, if you want to
set special BMenuItem targets in MenuWorld, coming up with
an archiving mechanism is non-trivial. Some third-party apps
on BeWare, such as AppSketcher and Interface Elements,
address this problem:
http://www.be.com/beware/Development.html#Cat_Interface%20Creation
That's it for this week. I leave you with a German blessing:
"Froehlich sei das Codeverbessern; guten Appetit!"
BeDevTalk is a discussion group that's a forum for the
exchange of technical information, suggestions, questions,
mythology, and suspicions. In this column, we summarize some
of the active threads, listed by their subject lines as they
appear, verbatim, in the group.
To subscribe to BeDevTalk, visit the mailing list page on
our Web site: http://www.be.com/aboutbe/mailinglists.html
Subject: Random not so random?
The opening parry regarded the proper seeding of the random
number generator: Call srandom() (or srand()) only once, and
before any random() (rand()) calls. The safest bet is to set
the seed in main().
Other details popped out:
Subject: objects in areas
How do you put an object in a shared area so more than one
application can use it? For example, how do you share a
BBitmap? As asked by Ingo Weinhold:
"The only way I see is to archive the BBitmap object and
flatten this archive to the area. [However,] each
application has to unflatten and reinstantiate the object,
which won't save any memory...
"So let's assume it's possible...other applications could
clone the area, but the logical addresses may differ. An
object that has stored pointers to address its data will get
into trouble if its logical address changes. Correct?"
First things first: No, you can't share an object (not
safely and predictably, anyway). As for creating an object
at a specific address, Jon Watte reminds us that C++
provides a "placement" allocation.
Subject: (Sigh) BeOS on PPC
Announcing a grass roots movement to petition Apple to
release G3 specs to Be. The devil's advocate response from
Geoffrey Clements (which, read literally, depicts a bizarre
anatomy):
"Who cares, really? Yeah a G3 running the Mac OS will kick a
Pentium II running Windows's ass. But a dual Pentium II
running the BeOS will Kick the G3's ass."
Yes, but with a proviso:
"BeOS is a faster operating system, so long as it doesn't
get klunkier in each release. Its been rather noticeable
that BeBoxen that could zip speedily along in DR8.2 (BeOS in
1996) are almost unusably slow in R3. I'm speculating that
one of these slowdowns is due to the MIME typing file
handling system."
The proposed solution: Make MIME-typing optional. "What?!?"
quoth Chris Herborth:
"How could the filesystem's MIME file attributes slow down
the display and applications in R3? The filesystem in R3 is
_orders_of_magnitude_ faster than the one in DR8. And it
wasn't the only thing that changed between DR8 and R3.
"If the file typing is made optional, we'll have the same
useless filesystem as Windows and UNIX; our files won't know
what they are, and we'll have to rely on stupid file
extensions for everything."
Subject: Performance followup
A spawn of the BeOS on G3 thread, listeners discussed system
performance. Is it degrading? Is it the app server's fault?
MIME types? Anti-aliased fonts? After some discussion
THE BE LINE: (From George Hoffman).
The graphics team is well aware of the state of app_server
performance. Not all of the perceived slowdown from PR2 to
R3 is the fault of the app_server, but there are hard
numbers which do indicate a slight slowdown in graphics
performance. ... One of the big priorities we've had for R4
is performance, and I'm pleased to say that the R4
development app_server is quite a bit faster than R3, and,
more importantly, _feels_ faster.
(Also, see Pierre Raynaud-Richard's article, above.)
Subject: Broken assembly debugger on Intel R3.1?
Delesley S. Hutchins is having trouble with Be's (ahem)
streamlined debugger:
"...I can't get the debugger to work... I type "help", and
nothing happens... it doesn't even go back to the debugger
prompt."
From the scept'red isle, that other Eden, demi-paradise,
royal throne of kings and Tinky Winky, Simon Clarke
suggests, moddishly:
"Do you have
set in your user setup environment file? I had the same
problem if I went into the debugger from the alert that a
crashed app shows. However, with the debugger on all the
time it works fine."
Subject: Tips on maniacal code optimization
What's the first step along the path to optimization?
"Profile it first!"
1997 Be Newsletters | 1995 & 1996 Be Newsletters Copyright ©1998 Be, Inc. Be is a registered trademark, and BeOS, BeBox, BeWare, GeekPort, the Be logo and the BeOS logo are trademarks of Be, Inc. All other trademarks mentioned are the property of their respective owners. Comments about this site? Please write us at webmaster@be.com. |