Fundamentals of Multithreaded Programming

Proceedings of the May Be Developer Conference

Fundamentals of Multithreaded Programming

PETER POTREBIC: Welcome to the talk on the fundamentals of multithreading. Just this hour I think one more hour of Q and A, so we're getting close to the end of the weekend, so hopefully we can still keep your attention.

My name's Peter Potrebic and this is Benoit Schillings and we're going to be sharing this presentation.

For a lot of you or some of you this might be your first experience, the BeOS might be your first experience with programming in a multithreaded environment. What we're going to do in this session is give you kind of a brief overview of some of the theory and issues that come up in a multithreaded environment. Then we're going to talk about how we do things in the BeOS, multithreading the BeOS way.

Benoit is going to talk about the multithreaded issues between our client and server architecture. We heavily rely on client server architectures as a method of taking advantage of our multithreaded environment. Then I'm going to come back and talk about the multithreading on the client side.

And I'd like to put in a plug for kind of having a good foundation. If this is your first time dealing in a multithreaded environment or you're coming back to it since college, picking up a good OS book, you know, dusting it off your shelf or buying it at a book store, it's a good idea to start there for some of the theories.
First, I'd like to point out we're not professors up here, this isn't going to be a very rigorous explanation of what the issues are. It's basically theory coming from an engineering standpoint. We've been implementing this stuff, the whole team, for several -- Benoit for many years. It's just basically our experience in these areas.

Some of the problems you have in a multithreaded environment, I list three of them up here, the first three points. One of the issues is controlling access to critical sections of the code.

Imagine a memory allocator in a multithreaded environment. You couldn't have two threads of execution running through some critical sections of the memory allocator at the same time, because while the allocator is allocate -- is setting up some memory, it might have some tables or some pointers that are in an inconsistent state, so the memory manager has to take care to protect access to those critical regions of code and this is a problem. You don't have any access in a single-threaded environment.

We also have to worry about limited system resources. Every system has limits. There's limits on file space, there's limits in physical memory, there's limits of virtual memory, there's limits on other resources that the system can create such as threads, ports, things like that. In a single-threaded environment it's easy for the one thread to kind of understand and manage those resources. The one thread can say okay, I know I'm never going to allocate more than ten file descriptors, so I know I'm never going to run out.

In a multithreaded environment you can't make those assumptions, because you don't know what the other applications are doing in the system, so you kind of have to know to understand or deal with the problems when you might hit the limits of some of those resources.

Finally, a third issue is dealing with flow control. And by this I mean in a multithreaded environment you often have a producer of some data and a consumer of that data and they're running in separate threads. If the producer is producing data faster than the consumer is handling that data, you have to have some buffering of that data at some level.

Well, if the consumer is generating it faster than the producer, that buffer is going to grow and grow and grow and potentially consuming up all of that resource, whatever that data is. If that data were file descriptors, that cue of file descriptors might exhaust the total number of file descriptors that the system can have. So you want to have some type of flow control to manage that, so that you don't have one producer/consumer pair taking up all of the system resources. So one solution or probably most solutions to these problems at some level comes down to locking, using some form of locking. But just saying that you're going to lock doesn't solve all your problems. You don't say, okay, I'm just going to throw some semaphores at the problem or some condition variables or whatever and you don't solve your problems.

You have to worry about things like the order in which locks are acquired. The canonical textbook example is if two threads both want to acquire two locks, both those threads have to acquire the locks in the same order. If the two threads were to acquire the locks in the reciprocal orders, you could easily end up in a deadlock, where the first thread acquires, say, lock B, and then tries to -- tries to acquire the second lock and the other thread acquires in a different order and they end up waiting on each other and you have a deadlock.

Then there's other issues that I mention here with race conditions and worrying about how to manage stale data in a multithreaded environment. You might have a pointer, if some other thread disposes of that pointer it's now invalid and how do you maintain consistency between your threads so that you don't have another thread accessing this now invalid pointer.

All of those issues are very pertinent to BeOS because we're a top-to-bottom multithreaded environment and multithreaded architecture. As I had mentioned before, we use a client server architecture many places. Our AppServer, which Benoit is going to focus on in his talk, is very multithreaded. All of our other servers are also multithreaded. Our network server and our audioserver are each multithreaded.

We also do threading on the client side and one example of that is that each window you open up on the client side is actually one thread of execution, and that goes along with in the AppServer each window also has a thread. So even in the simplest application that just has one window, you actually end up with a minimum of four threads.

So those -- these multithreaded issues definitely come into play. Now I think Benoit is going to talk about multithreading inside the AppServer.

(Speech by Benoit Schillings excerpted out.)

PETER POTREBIC: Now we're back on the client side and I wanted to talk about threading on the client side.

There's 4 classes and people are probably already familiar with those or already programming with them and the threading on the client side comes from the BLooper class, which is an abstraction of a thread and an associated EventLoop, and the BMessage and BMessenger class work with that thread and that's how we get our event model.

So I'm going to talk about some of the design issues that came into play as we designed this architecture, and some of the things that a developer has to keep in mind while they're programming the BeOS and how they apply to some of those issues that I mentioned in the opening that come into play in a multithreaded environment.

In C++ pointers and multithreading are a kind of tenuous -- are tenuous bedfellows. As I had mentioned before, if you have a pointer and a thread, another thread does something with that pointer to like delete it, other threads will now, if they have that pointer, will have a pointer to some stale data. And I have a little sample code here that shows a manifestation of that in the BeOS. You simply create a window and you tell it to show. Well, in the BeOS you've just created a separate thread of execution, this window. So it's running a separate thread and once it's visible on the screen, if it has a close box the user at any point can just click on the close box and that window goes away, or perhaps using our messaging/scripting architecture something has told that window to go away.

Well, if you've kept the pointer to W and later on in the code you try to call some function and access something in that window, in this case I'm calling a virtual function which at a minimum is going to access the vtable inside that pointer. Well, you've just made an unsafe call because if that window no longer exists, you're now accessing and looking for a vtable where you just have garbage and you're going to end up with undefined results.

So one of the solutions is to use the locking that we've implemented and so before you ever access a looper or in this case a window pointer, have rewritten the code where you would first lock the window and then -- well, this code should be indented. And once you've safely locked the window you know that that window pointer is still valid and it will remain valid while the window is locked. If lock returns false, that means that the window no longer exists.

So the call to lock will block until it can acquire an underlying semaphore or until it fails to and that implies that the window is no longer around and then it's safe to call the virtual function.

A SPEAKER: Question: If a window's gone away, how do you call a lock on it?

PETER POTREBIC: The question was if the window's already gone away, how do you call lock on it. Well, one of the keys is that lock isn't virtual, so basically the code here that -- that's calling lock, that's just like calling a function. It doesn't have anything to do really with this window pointer. And then I've -- we've implemented a mechanism inside the kits that safely determines if that is still a valid object.

So there's some kind of, you know, there's magic going on. I see someone shaking their head. On the next slide I will give a solution that is 100 percent correct, as there is cases where this isn't 100 percent correct that have been talked about on Bedevtalk, in newsletters, things like that. So to find more information about these issues you should look on our web site, read some of the old newsletter articles. So a completely safe alternative is to use this object called a BMessenger and what a BMessenger basically is is instead of having a pointer to something which isn't safe, it's a token and this token is a safe reference to an object.

So I've rewritten that first example and again, I create the window and I instantiate a messenger to that window and once -- when a window is first created it is in a locked state until it is -- until you call Show(), so it's safe at this point to create this messenger object.

Now, once it's visible, again, it can go away at any time, but now I use the messenger to communicate with the window and I do that through the messenger and the message API can now safely send messages to that window. And -- and the messaging system also enables multithreading to work in a much better fashion. You're no longer synchronously asking the window to do something by calling a function, but you've kind of -- instead of your application being serial by sending messages asynchronously, you break up that and that helps take advantage of the multithreading architecture.

Like one of the new features in this current release is that when you do have a messenger to something, you can extract out the handler and looper pointers, if they're in your own address base. A previous session I talked about how messengers work across address bases, which is another one of their advantages over a pointer.

A couple more design issues, and this relates to the flow control. I mentioned briefly that flow control was an issue in multithreaded environment. Might have a producer producing data faster than a consumer can consume data. So what our system does is eventually the producer will be blocked behind, because some cue has reached the maximum capacity and to prevent it from swamping the system, at some point it will be blocked and so no longer be allowed to produce this data. And that's true in the looper and messaging architecture. Behind a looper, which remember, is the event queue, it's a thread abstraction running EventLoop and it's reading messages out of a cue, so this cue has a certain fixed size, which is a good thing, because that's how we implement the flow control. If that cue fills up, anyone attempting to write a message to that cue would block.

So we had a couple decisions to make and one of the decisions we made is that if you put the PostMessage API actually it doesn't block; instead it just immediately returns an error, telling you that the port was full so it couldn't successfully post that message. So this is something you have to keep in mind as you're programming the BeOS is that the PostMessage call will never block if the port is full and this helped prevent deadlocks just within an application. Otherwise you'd have to be -- you have to be wary of deadlocks, even just having two windows posting messages to each other. Because if those two windows, if their ports were full, the two PostMessages could block, leading to deadlock. Now you get an error which you then have to handle in an appropriate fashion.

The other API for sending a message, SendMessage, does block, so these are just a few of the issues you have to be concerned with. Now, as Benoit mentioned, I wanted to go back and talk about benaphores a little bit. And I guess I didn't stress enough, "Benoit" mentioned the "benaphores."

So what a Benaphore is is we're using an atomic counter around a call to acquire a semaphore and this atomic counter will allow us to avoid making kernel calls in the comment case, and here's the little kind of pseudo implementation of what locking a Benaphore would look like.

There's a call here to atomicadd, which will increment this atomic counter and in most cases that atomic counter is zero, so when one is -- the old -- the value is zero, so you're going to skip over this call of acquiresem and it's only the next guy that's going to attempt to acquire this lock that is going to be blocked.

So the common case is you don't have contention on a lock. You just have one thread tries to acquire it; it acquires it, avoiding the kernel call. It does its thing, then it unlocks it. Then later on you have another thread and it also can avoid the kernel call. It's the only way you have contention, two threads trying to lock the same semaphore in overlapping time, where you actually have to call in to the kernel. And one thing I wanted to point out that the benaphore is a super performance boost. It can degrade to a plain semaphore if you use kill thread and it's a little bit subtle. I'm not sure people, you know, people can ask questions about this, but imagine someone tries to acquire the lock; they have incremented the semaphore and they have to call acquiresem, so now they're blocked. If that thread is killed, the -- the atomic count will never be decrimented to match the atomicadd, so the counter will be artificially inflated, basically rendering it useless, because then every attempt to lock will think oh, okay, the counter is greater than zero so I have to call acquiresem.

So it's just something to keep in mind when you're trying to tune the performance of your system. You might notice some problems if you are using kill thread. The reason I discovered this is I ran into that exact problem. Yeah, but Benoit never calls kill thread and I happened to be calling it someplace and I noticed, you know.

So now I'm going to go into some sample code and one of the common questions people have is how do I design -- is how do I design a simple app that has one window and maybe a secondary thread drawing into that same window.

So you have the window as a master and you have the slave thread and so I'm going to step through some sample codes, showing how that might work. So here's the slave thread and all it's doing is it's in this while loop locking the window, in here is doing some drawing and then it's unlocking the window. And now the window thread, here's the quit code for the window and it sets this flag to false, attempting to tell the slave thread that it should have stopped doing its thing and then the window thread wait makes this kernel call which will wait for this thread to die, wait for the slave thread to die.

Well, as a comment indicates down here, that this is going to lead to a deadlock. So this isn't the way to do the master/slave architecture. And there's a couple problems in this example and one of the things is that this code didn't check for the return value of lock and it's something you always have to do because this lock will return false if the window is no longer valid. So the solution in the simple case, of course, the code is now checking for lock and what I mean by the simple case is that there's no cleanup in this -- this slave thread does not have to do any cleanup when it's finished. It doesn't have any memory it needs to delete, it doesn't have any file descriptors that it wants to close.

In the simple case you don't have to do anything special in the quit code. All the window does is it quits. And as soon as it quits this lock will return false and the slave thread will break out of the loop and the slave thread will die, so in this simple case the solution is fairly simple. Now, the more complex case would be that there is some cleanup code required in the slave thread. So this is kind of back to the first example where the window in the quit sets this flag to false and calls a WaitForThread because, it wants to wait for this cleanup code to execute before the window really quits.

But this doesn't work because you have a deadlock where the window is waiting on the thread, but you could potentially have the thread in this line of code waiting for the window to become unlocked so it can lock it and this is your classic example of a deadlock. So there's two -- there's kind of two solutions to this problem. The first solution is to eliminate the dependency. I just described kind of a dual dependency, where the window is waiting on the slave thread to die and the slave thread is waiting on the window to be unlocked. So you have to eliminate one of those dependencies. And one way to do that is to move the cleanup code out of the slave thread and move it into the window thread, so now you don't have the window thread -- the window thread no longer has to wait for the slave thread to die, thus breaking the deadlock.

And the second solution is to get a third party, some other thread, to kind of arbitrate that relationship and that is also a way where you can dead -- you can avoid the deadlock and I'll show you that code. So here the cleanup code stayed in the slave thread, but now the window again does nothing, so now you have some other thread and just in this example I'm using main, where it waits for the slave thread. So this allows the slave thread to close any file descriptors or do whatever cleanup code it requires before it exits and you've broken the deadlock kind of in the other direction, where now the, you know, the window isn't waiting on the slave thread so you've eliminated the deadlock.

And I'm going to show you another piece of sample code. Yeah. Benoit said we're running out of time, so that's kind of a brief overview of some of the multithreading issues in the BeOS. Trying to provide sample code as fast as we can, helping you out with some of these issues. BeBook goes into a lot of detail, explaining these things and as I mentioned before, our newsletter article, newsletter articles in the past have talked about these issues and will continue to in the future and we have about 15 minutes, I think, to answer any questions.

A SPEAKER: Are you going to have a lock manager to arbitrate locks and look for deadlock detection? Sometimes there's not enough information within a particular context to know what the right lock to do is.

PETER POTREBIC: I mean we've thought about those issues. It's a thing that we've worked with at the kernel level as well, outside of -- I was talking about the kits and Benoit with the AppServer. It's something we're thinking about. We don't really have anything like that today.

BENOIT SCHILLINGS: So your question is what kind of tool we have today to debug threads, that's your question? Well, if I speak for myself, I use PS and printf, I found them to work pretty well for me. As far as I know, the mid thread debugger is about to let you switch between tracks at the debugger level and to see what a debugger is, so you can use that to debug your threading problems.

Another thing we have, we've got something I think it's called syslog, which let's you put some time stamped events into a log. That's often a very powerful way to figure out what went on in the middle of those 40 threads. So these are three tools you can use, but I prefer printf.

A SPEAKER: If you want to create a thread that really doesn't need to interact with the system, like nothing would ever send it messages or requests, is there any lower-level thread?

PETER POTREBIC: The question was is there a lower-level thread concept than a BLooper and yes, there is. The kernel has an API for creating a thread called SpawnThread and you can delete threads and things like that, so if you're just creating some lower-level thing you can use the kernel API for thread.

A SPEAKER: Benoit, you said that it's good to allocate plenty of threads, but in the session this morning I think Jeff said that each thread has 128K stat associated with it. Does that put a limit on the number of threads?

BENOIT SCHILLINGS: Don't believe them. No, this is true. Threads in DR9 have a fixed stat of 256K. The current version of DR9 is one of the things that needs quite some memory, because every window in it needs 5K of memory. This is something we are very aware of and every day I go annoy the kernel people to tell them we need to go back to the kernel thread.

A SPEAKER: Only virtual.

BENOIT SCHILLINGS: Yes. It's only virtual so even if it's preallocated it doesn't really use that space, so you shouldn't worry about it.

No more questions?

A SPEAKER: I'm a little confused about the benaphores there. Is it the second thread that actually has to acquire the semaphore, the first one doesn't have to?

PETER POTREBIC: Correct.

BENOIT SCHILLINGS: The semaphore is used in the inverted way, so the semaphore is taken by defolding the benaphore. There was something which was waiting for it. You need to look at the code and that's the best way. Those things are painful to explain in English and even worse in French.

PETER POTREBIC: And there's sample code, there's sample classes I've seen called BBenaphore, TBenaphores that will do this kind of thing.

(Inaudible question.)

PETER POTREBIC: The question was, to repeat it, is just a further explanation of benaphores and doesn't the second guy acquire the semaphore, when the names are called and there's also a newsletter article that explains.

A SPEAKER: All right. I just have a follow-up question on that. Does the atomic block add -- actually do the appropriate architectural flushes on the --

BENOIT SCHILLINGS: Oh, yes, that works the way it's supposed to be and everything, otherwise the machine would last for about five seconds.

A SPEAKER: Is atomicadd a kernel call or how does it work?

BENOIT SCHILLINGS: So your question is is the atomicadd a kernel call. No, obviously not, otherwise this would take quite sometime, but there is an instruction and sequence of instruction which is in the BeOS assembly instruction which will let you do that. So it's a very small library call. If I remember, on the 66 megahertz machine it's about 1.5 microseconds.

A SPEAKER: Does the atomicadd on single CPU machines, not --

BENOIT SCHILLINGS: No, as far as I know, it will still flush the cache, but you might want to ask Dominic behind you, he probably knows that better than I do.

DOMINIC GIAMPAOLO: On the 603 there's actually a bug that requires that the cache be flushed...

PETER POTREBIC: Did you hear the question, Dominic? The question is it's not going to flush the cache on the 604, is it?

DOMINIC GIAMPAOLO: Yes, on the 604 it won't do that. Right now it does, but in the future it won't.

A SPEAKER: You said that you use printf in multithreaded environment. What kind of problems does that present?

PETER POTREBIC: The question was using printf in a multithreaded environment. The C library is not multithread safe --- it is not reentrant. If you use our debugging macros in debug.H they have a semaphore that serializes that so you can get readable printouts if you're getting a lot of debugging code. So I'd suggest for this type of stuff to use those macros.

BENOIT SCHILLINGS: The AppServer, for instance, they've got sprintf instead of printf and there's some semaphoring around it and that works fine and it goes to the serial port. Debugging the AppServer using the AppServer is a very bad idea, so I cannot get anything in the screen.

A SPEAKER: Are you guys going to support pool threads so you don't have to dynamically allocate them?

PETER POTREBIC: The question was are you going to support pools of threads so you don't have to dynamically allocate them, and the second piece of sample code, which if you're interested I can show you afterwards, was basically talking about how you would use this BLooper architecture to have a kind of a worker helper class and you could dynamically create them or dynamically have some, you know, a number of them kind of sitting around, ready to be used in whatever purpose. So it's something -- it's an abstraction that can be easily defined by a developer.

A SPEAKER: Yeah. Follow-up question on the question before that. What about STL and I/O streams in C++, are those not --

PETER POTREBIC: The question was is the standard template libraries in I/O stream, are those threads safe. And my answer is I don't know.

DOMINIC GIAMPAOLO: STL is thread-safe.

PETER POTREBIC: STL is thread-safe Dominic says. And he believes the I/O stream -- he believes I/O streams are as well. If you want, bring that up at our general Q and A and John Watte, I believe, will be there and he'll be able to answer that question.

A SPEAKER: I was just starting to read about this the week before the conference. Some of the kernel thread things, with the interest in writing a thread class that behaves like the thread class in Java, in particular, can you suspend it and resume and can be asynchronously killed by another thread? It looked to me like that would require using the SpawnThread and so forth and that BLooper did not have the ability to do that. Is that correct or am I missing something.

BENOIT SCHILLINGS: Dominic, come here.

PETER POTREBIC: This is good, then you have threading at all levels, the kit, the server and the kernel.

DOMINIC GIAMPAOLO: What was the question?

BENOIT SCHILLINGS: Could you repeat the question?

A SPEAKER: I don't think so. I was trying to figure out how to write the equivalent of the Java's thread class to create a simpler class, that would run a method and could be suspended and resumed and in particular, could be asynchronously stopped in the middle of something, no matter what else it was doing.

DOMINIC GIAMPAOLO: We fully support all that.

A SPEAKER: Okay. Does BLooper?

DOMINIC GIAMPAOLO: No, that's not a BLooper. With the kernel threads, yeah, you can stop a thread, you can kill it off asynchronously, all of that stuff. It's -- yeah.

A SPEAKER: Okay. That was my question, is there some way of using BLooper --

DOMINIC GIAMPAOLO: The kernel chapter and the signals is the way to do that.

PETER POTREBIC: Anymore questions?

A SPEAKER: Can we see that code that you were going to show us?

PETER POTREBIC: Let me just show it at the end, make sure we get done.

A SPEAKER: With your AppServer model is it possible to like, say, have your AppServer on one machine, where you're actually viewing the information on the screen from another machine?

BENOIT SCHILLINGS: So your question is it possible with the model of the AppServer to get the application running on one machine and displaying it on another machine, which is what X does. The answer is theoretically, yes. This was actually one of the constraints I had initially when writing the AppServer. That's why everything goes through that session layer which could go through CPIV or type time or 1200 baud rate.

There is one exception to that which is the use of BitMaps. When you create a BitMap for many purpose you need to be able to access directly that BitMap for performance issue. We actually have a function which is SetBase, which will transport the data through the port if types and server are both on the same machine. But to be honest, right now we see no need for that ability that we never tried it, but as far as I know, all the function is to be in place to be able to do it in the future.

PETER POTREBIC: The question is do we have a thread-safe atomicadd, which to subtract you just pass and get a negative number and do we have a thread-safe swap.

DOMINIC GIAMPAOLO: There is no comparent swap.

PETER POTREBIC: That's the answer.

A SPEAKER: Is there some way to get a load average so you don't overthrow the system?

PETER POTREBIC: The question was is there some way to get a load average so you don't swamp the system?

DOMINIC GIAMPAOLO: Boy, this is turning into a kernel Q and A. Yes, GetSystemInfo. That's a function call. That's a system call.

A SPEAKER: So you guys got BLooper class for basically preemptive threads, you lock, you unlock. So what about the opposite, another class for cooperative threads so that another way of managing currency is with preempted points?

DOMINIC GIAMPAOLO: So the question was that the looper model abstraction is good for -- I forget, what word did you use?

A SPEAKER: Preempted?

DOMINIC GIAMPAOLO: -- preempted threading. Is there an extraction for a cooperative threading model and currently our framer doesn't have that built, that concept built into it.

A SPEAKER: There's no need for it, from a preemptive world.

DOMINIC GIAMPAOLO: Again, just a different way of managing concurrent access to a resource. If when you're, you know, having a yield call. That would effectively be a semaphore in that case. A yield is releasing a semaphore for all intents and purposes.

PETER POTREBIC: Got a question?

A SPEAKER: When's the last chance that a window has before it goes away?

PETER POTREBIC: The question was what's the last chance before the window goes away.

A SPEAKER: Kind of goes back to the lock call.

DOMINIC GIAMPAOLO: The destructor. I mean if you have a window subclass, you can define a destructor and that's code that's running while the window's still valid.

PETER POTREBIC: Any other questions?

BENOIT SCHILLINGS: Wow, we answered every question about multithreading.

Transcription provided by:

160 West Santa Clara St.
San Jose, California
408.280.1252

Copyright ©1997 Be, Inc. Be is a registered trademark, and BeOS, BeBox, BeWare, GeekPort, the Be logo and the BeOS logo are trademarks of Be, Inc.
All other trademarks mentioned are the property of their respective owners.
Comments about this site? Please write us at webmaster@be.com.
Icons used herein are the property of Be Inc. All rights reserved.