Extending BDirectWindow

March '98 Be Developers' Conference


Pierre Raynaud-Richard: Hi, my name is Pierre Raynaud-Richard. I'm going to present an extended session. We just switched between the two rooms. Some people are confused. You have one last chance to switch or it's too late. I am going to speak about a new API named BDirectWindow, which has been designed to allow screen-intensive applications to get very high bandwidth.

In more detail, we are going to give a technical definition of what BDirectWindow is, and I will give some typical uses that BDirectWindow is very good at. After that we'll look in more detail at what the limitations of BDirectWindow are, what sort of things are going to make BDirectWindow difficult to use, and what the other things are that make BDirectWindow so interesting. I'm hoping that, even taking the first things I'll be speaking about into account, some of you will still be interested in using it.

So next slide. Here is a description of BDirectWindow. The first point to remember about BDirectWindow, it's -- it's just a simple BWindow class. It's a fully functional BWindow class. If you took your current application, replaced BWindow with BDirectWindow everywhere and recompiled, it would just work exactly the same. It has the same API, fully functional.

What does that mean practically speaking? Here is a diagram of the standard communication protocol of a BWindow with the application server. When you create a BWindow, you get two threads, one on the client side, one on the server side. They're basically here to handle messages going back and forth, and that's what makes the Be interface very responsive. It's always listening to the tasks going on.

If you want to draw using a BWindow, you usually create another thread that will share that communication protocol with the thread C*. Through the lock, you can send commands or requests for services too, and then on the other side the server thread will handle the requests. And some will go deeper inside the core of the app server to access and draw on the screen. The clients will draw to the screen using the BWindow. All those mechanisms are fully functional.

So at the same time BDirectWindow is a BWindow, it's also something completely different. It's a description of a simple buffer. It's giving you the possibility to access directly the parts of the screen that correspond to the visible portion of your window. Next slide please -- This context is described the following way.

So it's based -- this is the description. You will get the dimension, origin, position of the buffer, based on the frame of the window where it is in the current screen space. You'll get the format information, color encoding, endianness. More useful, you will get the base address in your memory space, direct access just as if it was in your application.

If you want to do DMA, you get another base pointer in PCI space so you can control your DMA directly, from a PCI card into the screen buffer. The next one is very important. It's the clipping region. If you get the description of the full buffer, you're not allowed to draw everywhere. Imagine there's another window in front of the window you're using, so part of your window is covered. You're not allowed to draw there. So you'll get the clipping region that will describe what part of the buffer you can use. And the last part is the protocol that will enable and disable that write privilege to the buffer, and the modification flags. It will give you extra information about what happened during the change of privilege.

For example, when the user changes the depth of the screen, when this is done, we have to suspend your access privilege for some time. So you will lose your privilege. Later you will be called back and get your privilege again. And during that change of access right, the buffer has been reset. That's one of the flags. Or if the window is moving, we have to stop you and blit the window for you and start you again, and we'll tell you the window moved, which means that what was in the window is still there, not lost, it just moved.

In the case where you change the depth of the screen, the buffer is erased. You have to reset everything. So you get that sort of information and a couple other flags that allow you to improve -- to optimize your drawing code as much as possible.
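The buffer description above can be pictured as a small structure. This is only a sketch under stated assumptions: `Rect`, `BufferFlags`, `BufferInfo`, `PixelAddress` and `NeedsFullRedraw` are hypothetical stand-ins invented for illustration and do not reproduce the layout of the real Be headers. It shows how a base address, a row stride and a pixel depth are enough to locate a pixel, and how a "reset" flag differs from a "moved" flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; not the real Be headers.
struct Rect { int left, top, right, bottom; };  // inclusive edges (assumed)

enum BufferFlags : uint32_t {   // assumed flag names
    kBufferMoved = 1u << 0,     // contents preserved, the window just moved
    kBufferReset = 1u << 1,     // contents lost, everything must be redrawn
};

struct BufferInfo {
    uint8_t*          bits;           // base address in the app's memory space
    uintptr_t         pci_bits;       // base address in PCI space, for DMA
    int32_t           bytes_per_row;  // distance between two scan lines
    int32_t           bits_per_pixel; // color depth of the screen
    Rect              window_bounds;  // window frame in screen coordinates
    std::vector<Rect> clip_list;      // parts of the buffer you may draw in
    uint32_t          flags;          // what happened since the last call
};

// Address of screen pixel (x, y) inside the frame buffer.
inline uint8_t* PixelAddress(const BufferInfo& info, int x, int y) {
    assert(info.bits_per_pixel % 8 == 0);  // whole-byte pixels only
    return info.bits
         + static_cast<std::ptrdiff_t>(y) * info.bytes_per_row
         + static_cast<std::ptrdiff_t>(x) * (info.bits_per_pixel / 8);
}

// A full redraw is needed only when the contents were lost, not when moved.
inline bool NeedsFullRedraw(const BufferInfo& info) {
    return (info.flags & kBufferReset) != 0;
}
```

Keeping "moved" and "reset" as separate flags is what lets the drawing code skip a full repaint when the window was only blitted to a new position.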

It's time to go and look at some short sample code. This is an example of a window doing DMA using BDirectWindow. So you create a new window, then you will activate everything by doing the show. By showing the window, you make it appear and then your DirectConnected will -- the function that is used to describe all the changes in the buffer properties will call you one first time to start the thing, and here we go.

At the end of it, when you want to stop your window, you will just quit it, which will hide the window, and DirectConnected will be called to stop the connection. So I should probably first give some information here. The way this direct access context management is done, everything is done through one function named DirectConnected. This new thread, C', is created on the client's side and it gets an extremely fast, direct connection with the window manager in the core of the app server. And it knows every time your buffer is modified, or part of your window is modified, or when the depth of the screen changes. And he calls you, and you get called by DirectConnected and you get all the new properties described in their new state, or the fact that the privileges have been suspended or modified. And then you are free to implement that direct access context the way you want, completely on your side, which gives you a lot of freedom.

The other thing you can do from there, you can reduce it to only part of the window because you only want to use it on some BView part of it, or you could be using it and share it between multiple threads, or do whatever else you want. Another big advantage is that it's a lazy protocol. If nothing changed in the screen configuration, DirectConnected is never calling you. You're just here and you can access your screen and keep going. You don't have any overhead. You don't -- that's a big difference with what I know of the current DirectX API where you can get direct access to the screen, but you have to lock access to the screen and unlock it every time for every frame. Here it's a lazy protocol. So let's go back to the sample code.

So here is your classic configuration, very basic. Here is the constructor of the class. So when your window is created, you could have -- depending what you're doing, you will call init_private_context, and initialise the DMA controller. And then you will set connected to false, which tells us the connection is off, disabled for now, and this other flag, connection_disabled, will be used later to tell the protocol you don't want to restart the connection anymore. The next time you disconnect, you will be disconnected for real. And you will be shut down after that.

So what happens in DirectConnected: you create the window, you do the show, the show creates the window, makes it appear, and immediately DirectConnected is called with the B_DIRECT_START flag. Then you know that now you're connected and you can init your direct screen access context from the information you get from DirectConnected, and then you can start your DMA. And it will keep going. If you do nothing else, nothing else will happen for some time. Then you will move the window and that will change the visible region of your window. You will get a B_DIRECT_MODIFY, and you will get the new state of your region. You will be able to update your direct screen access and modify your DMA on the fly if you want. And it will keep going.

If for some reason it has to be stopped later, you will get B_DIRECT_STOP and then you have to stop DMA and you will remember that you're not connected anymore. To quit, you set connection_disabled, and this time it's for real. You will hide and sync, too; that way you're sure that you will be called as disconnected, connected will turn to false, and as connected is false and this one is true, you will never accept any connection request again, to be sure that you will stay disconnected and avoid any problem. Once that is done, you can close your DMA, and you still have to free your private context. That's the way the protocol works.
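The start / modify / stop sequence described above can be sketched as a small state machine. The flag names `connected` and `connection_disabled` and the three notifications come from the talk; everything else here (the `DirectState` enum, the `DmaWindow` class, `PrepareToQuit`, the string log standing in for real DMA control) is a hypothetical simplification, not the actual Be API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-ins for the B_DIRECT_START / B_DIRECT_MODIFY /
// B_DIRECT_STOP notifications named in the talk (names assumed).
enum DirectState { kDirectStart, kDirectModify, kDirectStop };

class DmaWindow {
public:
    // Called each time the buffer properties or the access rights change.
    void DirectConnected(DirectState state) {
        if (!connected && connection_disabled)
            return;                        // quitting: refuse any reconnection
        switch (state) {
        case kDirectStart:
            connected = true;
            log.push_back("start DMA");
            break;
        case kDirectModify:
            assert(connected);             // a modify only follows a start
            log.push_back("update DMA");   // new clipping region, same session
            break;
        case kDirectStop:
            log.push_back("stop DMA");
            connected = false;
            break;
        }
    }

    // Quit path: forbid future connections first. In the talk, hiding the
    // window then triggers the final stop notification.
    void PrepareToQuit() { connection_disabled = true; }

    bool connected = false;
    bool connection_disabled = false;
    std::vector<std::string> log;          // stands in for real DMA control
};
```

The order matters on the quit path: the disable flag is set before the last stop arrives, so a late start request can never re-enable DMA on a window that is being torn down.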

So now that we have a basic description, let's see what it's used for. Two applications are -- first, using DMA video in a window, as you saw that demonstrated in the general session already once. Let me show you the same thing again. Very interesting video. And this is 640x480, 30 frames per second, and you can resize it. You can move it, you can do all this stuff.

A Speaker: Presidential footage.

Pierre Raynaud-Richard: It's all done in DMA, using DirectWindow protocol. So that's one interesting application.

A Speaker: Don't close it.

Pierre Raynaud-Richard: Yes, that's really one interesting application.

A Speaker: Do you have that on floppy?

Pierre Raynaud-Richard: A bit more technical is to not do video but to do what I would call smooth and fast animation, because as you have direct access to the frame buffer, you can draw to the screen way faster and smoother than before. So I will show you a quick sample of that. I will come back to that in more detail later. Here is a very simple one, fully resizable, fully movable and so on, and as you see, just looking at it, it's doing a star field animation and using approximately no CPU -- it's only plotting and erasing the pixels that it needs to. It's doing that at 60 frames per second.

So, going along, all right, thank you, let's go to the bad side of BDirectWindow. You get direct access to the screen frame buffer, but that also means you're on your own to deal with it now. You're not yet capable of using a lot of nice libraries and a lot of different rendering. That will come, later. But for now, dealing with DirectWindow, it's dealing completely on your own. You have to be capable of dealing with multiple formats on your own. So if you do any rendering, you have to rewrite, get your own rasterisation package. It can be a huge task. For some specific applications, it's not a big deal. For some others, it's clearly not worth using BDirectWindow for now.

Another constraint is the fact that the clipping is absolutely mandatory. You have to respect it because you get direct access to the screen frame buffer, which means we're not going to stop you if you want to draw anywhere. So you have to respect the clipping. You have to be careful and write your clipping code with care. You have to include the clipping code in your rendering code. So it's even more work. It's a good thing to consider seriously before deciding to use BDirectWindow.
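What "including the clipping code in your rendering code" can look like is sketched below, under assumptions: a 32-bit frame buffer, a clip region given as a list of non-overlapping rectangles with inclusive edges, and a hypothetical helper named `FillClipped`. None of these names come from the Be API; the point is only that every write is first intersected with the clip list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rect { int left, top, right, bottom; };  // inclusive edges (assumed)

// Fill `area` with `color`, touching only the pixels that fall inside one of
// the clip rectangles. Returns the number of pixels actually written.
// Assumes the clip rectangles do not overlap each other.
int FillClipped(uint32_t* bits, int pixels_per_row,
                const std::vector<Rect>& clip_list, Rect area, uint32_t color) {
    assert(bits != nullptr && pixels_per_row > 0);
    int written = 0;
    for (const Rect& clip : clip_list) {
        // Intersect the requested area with this clip rectangle.
        const int left   = std::max(area.left, clip.left);
        const int top    = std::max(area.top, clip.top);
        const int right  = std::min(area.right, clip.right);
        const int bottom = std::min(area.bottom, clip.bottom);
        if (left > right || top > bottom)
            continue;                      // nothing visible in this rectangle
        for (int y = top; y <= bottom; ++y) {
            uint32_t* row = bits + static_cast<std::ptrdiff_t>(y) * pixels_per_row;
            for (int x = left; x <= right; ++x) {
                row[x] = color;
                ++written;
            }
        }
    }
    return written;
}
```

For example, filling the area (2,2)-(7,7) of an 8x8 buffer against the clip rectangles (0,0)-(3,3) and (6,6)-(7,7) writes only the two 2x2 corners that overlap them and leaves the covered middle untouched.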

And the third point, currently we don't support any acceleration API from the DirectWindow point of view. If you need direct acceleration, you have to go to the standard API. That will change in the long-term future. For now it's like that. And you still have a little freedom in using the acceleration because, as you saw in the previous slides, you can draw directly using the red arrow and also go through the blue arrow to the application server. So you can use both protocols at the same time. But it's a tricky operation because of the fourth point. You will get some fairly interesting synchronization issues. So it concerns only some pretty advanced stuff, I would say, using both protocols at the same time.

Here is an example of what can happen to you if you try to do that and you're not careful. Here you have the client thread, doing both direct access through the red arrow, and the blue to the app server through the window. As a typical BDirectWindow application, it's looping and doing frame after frame of some animation of some thing, some rendering. So here I will represent one frame, and you have the time line and the app is looping on that time line again and again and again.

During the red part, it's allowing itself to use direct frame buffer access. That means that during that period it cannot accept a change in the configuration of the frame buffer. So the DirectConnected code can be executed only during the yellow part and this is enforced by your own synchronization. You will block direct connected as long as you're not ready to change.

As you're drawing in your red part, you also do some blue requests as represented on this diagram. What can happen is, as you're trying to do your blue requests, it goes through the window, to the server, and the server is also doing a lot of other things for a lot of people, like another window can be moving in front of you, and you don't even know. He's calling you back at the same time with DirectConnected and so he's waiting on your synchronization protocol.

So let me summarize in a much clearer way. You're waiting for the app server and he's waiting for you. We call that a deadlock. It's very bad and you shouldn't do that. So it's very simple. The app server will wait for two seconds, three seconds, and after that say, "Okay, this guy is not going to answer my DirectConnected request, so I suppose in that case I can't keep the graphic system blocked waiting for him, so I have to kill him." So he's going to kill your team and unblock itself. That's the result. So if you want to avoid that problem, there's an easy way out. Just be careful and never call a blue transaction during your red period. It seems very, very simple when you say it that way, but you will be amazed, if you try to use that, to see what a blue transaction can be.
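One possible way to follow the "no blue transaction during the red period" rule is to postpone server requests until direct access has ended. The sketch below is an assumption about how an application might organise this, not something shown in the talk: `FrameScheduler` and its methods are hypothetical names, and the "blue" requests are plain callbacks standing in for anything that may wait on the app server.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

class FrameScheduler {
public:
    // Enter the "red" period: direct frame buffer access is in progress.
    void BeginDirectAccess() {
        assert(!in_red_period_);
        in_red_period_ = true;
    }

    // Leave the red period first, then run everything that was postponed.
    void EndDirectAccess() {
        in_red_period_ = false;
        std::vector<std::function<void()>> pending;
        pending.swap(deferred_);
        for (auto& call : pending)
            call();
    }

    // A "blue" request: anything that may block on the app server.
    // During the red period it is queued instead of being executed.
    void ServerRequest(std::function<void()> call) {
        if (in_red_period_)
            deferred_.push_back(std::move(call));
        else
            call();
    }

private:
    bool in_red_period_ = false;
    std::vector<std::function<void()>> deferred_;
};
```

Because the queued calls run only after the red period is over, the thread is never blocked on the server while it still holds back DirectConnected, which is the cycle that produces the deadlock described above.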

So let's continue and let's see, time to look now at the good sides of BDirectWindow. The first thing that is the most interesting for us as we are pushing media so much, you will get much better video streaming performance. It's easy to understand. The next slide, please.

On this slide you can see three ways of sending a video stream from a video source to the screen. The first one, start out using DMA from the video source to the main memory and then clip that because you don't want to display everything, and then blit whatever you need to display using the CPU. For people who don't know, DMA is a way to have two devices use a direct and fast protocol to transfer a lot of data much quicker than you could using the CPU, and also it doesn't use any CPU, it's faster and saves a lot of time. It's a very good thing to do. Also here, it goes through the main memory so you're taking part of the main memory bus bandwidth.

In the future we'll improve DrawBitmap to be able to do the second part also using a DMA. So you DMA a full stream to the memory and then you clip it and DMA it again to the screen. What BDirectWindow allows you to do is clip at the source and use DMA directly on the screen. So I think it's clear enough like that.

The second interesting advantage is the one like I showed you, the star application. It gives you much smoother animation in some cases.

So I have a better sample here I'm going to show you. It's named Chart. It's inspired from a shareware. It's just a star field and it can turn around and -- whoops. Oh, big mistake. Nobody noticed anything. So here what I am doing is drawing my star field in an offscreen and blitting it on the screen using DrawBitmap, one frame after another, for every frame. So it's a way to do the thing. And with that, you see I can do 60 frames per second.

Here I have a view meter that gives me one bar for 12 frames per second. The red bars are the goal I'm trying to reach, and so what I am doing, and then the green ones are telling me how much more I could do if I was not synchronized at 60 frames, using the CPU power available. So it's doing okay at 16 bits per pixel. But I am picky, I want very nice stars, I will switch to 32 bit, and it's still doing it, but it's using ... 72 -- 73 percent of the CPU. And a little bit more, let's set a nicer star field, the galaxy. And when it's really useful, it's in full screen so you can see much more detail. And now you can see I am doing only 29 frames per second, using all the CPU. And it's not looking bad, but we want 60 frames per second. We can't say it's bad performance. A full frame at 32 bits is like 3 megabytes. 29 times a second.

And also we have to do the processing of the star animation. So it's good performance, but not good enough. So as you can imagine, as we only need to draw and erase stars, not the whole screen, it could be smarter to erase just what we need directly. So you can watch the view meter.
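The "3 megabytes, 29 times a second" figure can be checked with a little arithmetic. The screen resolution is not stated in the talk; 1024x768 is assumed here only because a 32-bit frame at that size is exactly 3 MiB. `BytesPerFrame` and `BytesPerSecond` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of one full frame at the given resolution and depth.
constexpr int64_t BytesPerFrame(int width, int height, int bits_per_pixel) {
    return static_cast<int64_t>(width) * height * (bits_per_pixel / 8);
}

// Bytes that must be copied every second to refresh the whole frame.
constexpr int64_t BytesPerSecond(int64_t bytes_per_frame, int frames_per_second) {
    return bytes_per_frame * frames_per_second;
}

// Assumed resolution: 1024x768 at 32 bits per pixel.
constexpr int64_t kFrame    = BytesPerFrame(1024, 768, 32);  // 3,145,728 bytes
constexpr int64_t kAt29Fps  = BytesPerSecond(kFrame, 29);    // 91,226,112 bytes/s
constexpr int64_t kAt60Fps  = BytesPerSecond(kFrame, 60);    // 188,743,680 bytes/s
```

Under that assumption, the full-frame blit moves roughly 87 MiB per second at 29 frames per second and would need about 180 MiB per second at 60, which is why touching only the star pixels is so much cheaper.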

If it goes all the way, we get 10 times more than what we need. So perhaps we can try using that menu here and set it back to what I set accidentally at the beginning, DirectWindow mode. So now we are getting 60 frames per second and using approximately 10 percent of the CPU. It looks a bit better, also, and it's doing exactly the same thing.

Another interesting feature is it's still a BDirectWindow, you know. So you can switch on the fly. No problem. You can resize it and move it. And -- whoops. And so on. And as you can see, it's fully integrated with the Interface Kit, also using the standard API. And from here, I can set the background color, and when I do that, I'm not drawing the whole buffer myself, because I am just going to use the standard API, which can use the accelerator and do it way faster than I can. So if I go back to a more reasonable mode, here, I am -- I can change the color of space. Black is better usually for space.

But what is that trick that we "could" do so many more frames if we wanted? Do you believe me? I can ask it to do it. And now, it's trying to do as much as it can. It's -- you don't really see the difference, that's true. But that display is too slow. Too bad. That's it for this demo. Thank you.

Third point, a very strong advantage of BDirectWindow is its very efficient parallelism. So let's illustrate that on a practical problem. Here we have two applications altogether using three windows. That's the configuration. Now we want to draw to the screen a lot. We have a four-CPU machine. We get a lot of threads talking to the graphic system. Surprise, the three window threads will have to go through the same lock, and all of the communication overhead, and when they arrive in the core part of the app server accessing the screen, they will also have to be synchronised with the requests coming from the fourth thread. So you have the problem of the communication overhead, and when you draw to the screen, you're single-threaded.

So one solution is to put two direct screen contexts through two direct windows to work with all those threads. As you're implementing your direct screen access yourself, you're free to come up with whatever protocol you need to be able to have the three threads working together. And you won't have to synchronize those with the fourth thread because they will be working in completely different parts of the screen, as the regions are always completely distinct. Then you can have -- next slide -- those four threads directly accessing the screen, using the four CPUs directly. And also if you want to deal with synchronization issues, you can still use blue arrows and do a couple of things using the app server and driver acceleration, for example.

Another way of using the parallelism: if you're using DMA as I showed at the beginning, you don't need threads. You just control it -- have your DMA controller controlled by DirectConnected. This gives you three direct controls, with only one DirectConnected function. You have all your DMA going in parallel and controlled lazily by the changes of the window manager. So I have a demo for that. You can imagine what. So let's go back to 32 bit. 16 is no fun.

So Chart, full screen, go to here, DirectWindow, make it turn and here we go. And I can imagine what you're thinking. He's going to click on two threads. It's going to be almost twice as fast, and what's the point? Because we already have almost ten times the CPU we need.

Let me show what you can do when you have a lot of CPU power. You can increase the number of stars in this, which was only 2,000, to, let's say, 20,000. And now, whew, I don't have enough CPU to get my 60 frames per second. Ah, better.

You could be wondering what is the purpose of that application. I think I have some answer for you.

Could I have some sound, please, on this machine?

BeOS machine : "Captain James T. Kirk: Space, the final frontier. These are the voyages of the starship Enterprise. Its five-year mission: to explore strange new worlds, to seek out new life and new civilizations, to boldly go where no man has gone before."

Pierre Raynaud-Richard: And you can still do the same thing, as usual... So let's go to the last point, last two points I will demonstrate together. Another big advantage is it's fully transparent and scales pretty damned well. Fully transparent, what does that mean? You're using it, but the user is not going to notice. It's not going to penalize you in any way because you are doing it. And what does scalability mean? I'll show you.

Let's go back. Perfect. Let's launch that -- you clearly have something on there. You probably remember that demo doing live video mixing between two different sources. Here we have two live video sources -- oops, at least one. Here's the second. And you can mix them in that big window.

VIDEO: If the Bat wants to play, we'll play.

Pierre Raynaud-Richard: Let's do some crazy heavy effects. Okay. Here we are. We have the background. And here and here. Now let's launch Chart.

VIDEO: The real game begins.

Pierre Raynaud-Richard: You can go here, pick another one. We need two, I think it's clear.

VIDEO: Batman Forever.

Pierre Raynaud-Richard: And another star field. It looks better. And I forget this one. Okay. So now we have already five direct windows going. Not bad. So let's launch a couple, two, three, four, five other star fields here and there. I will resize this one to full size and background and another -- where can I put this? So here we go. We have ten direct windows, two doing DMA and one doing live video with effects, and so those two seem to be active. So we are still doing the 60 per second star field animation.

A Speaker: And fire.

Pierre Raynaud-Richard: And if I move windows around, all those windows, like this guy, DirectConnected is reprogramming the video DMA on the fly, and you're moving it and move the window in front of it, and everybody has to do that for all windows, real time. But it's still doing it, interactive. That's what I mean by transparency. And here you can see the CPU load, using them both. The video page flipping filter is pretty heavy, all software. And I can turn off one CPU, and it's still going, a bit slower. Not too bad. Here we are. Thanks for coming.

Yes?

A Speaker: A question for you. Does that direct connection limit you to the frame buffer?

Pierre Raynaud-Richard: What do you mean exactly?

A Speaker: Is it giving you free access to anything on it, or is it restricting your memory access as to the frame buffer?

Pierre Raynaud-Richard: In the current implementation it's designed to be used with PCI graphic card, frame buffer access. It could be extended to anything later if needed.

A Speaker: What, I'm wondering, is the potential security hole of having a user's code --

Pierre Raynaud-Richard: Uh-huh.

A Speaker: -- writing to PCI? What I'm asking, does this direct do anything to prevent the user from writing to an arbitrary area?

Pierre Raynaud-Richard: We can do that. In the way it's implemented now it's not exactly it, but yes, we can do that.

A Speaker: You can write to the PCI card but nothing else on the PCI bus.

A Speaker: Thank you.

Pierre Raynaud-Richard: Is there another question?

A Speaker: What's for next year?

Pierre Raynaud-Richard: No idea. I will find something.

A Speaker: You had ten directs, direct windows up all at once. To what extent were they cheating by only rendering bits and pieces?

Pierre Raynaud-Richard: That's one of the advantages of DirectWindow. It's an easy way to implement lazy drawing, certainly.

A Speaker: You used video. Can you also take input from the hard drive?

Pierre Raynaud-Richard: As far as I know, can the hard drive DMA directly into the screen?

A Speaker: Yes, but --

A Speaker: A VCR can --

Pierre Raynaud-Richard: I'm not sure what's going through on that screen back there in the current implementation, so --

A Speaker: You could --

Pierre Raynaud-Richard: As far as -- any PCI device, DMA, can do it, can control anything. What you get is the ownership of the part of the screen. After that you access it the way you want, DMA or direct drawing. So as long as you can do the DMA, then this protocol will allow you to control it.

Yes?

A Speaker: You were saying at the beginning one of the limitations of this was that there isn't much rendering support. Is that something that's on the drawing board?

Pierre Raynaud-Richard: That's a good question. Again, imagine in the current graphic system everything you need to draw in that sort of direct frame access, with all the different modes and clipping, region, it's in the app server. We use it to draw. What we need is to clean it, restructure it, make it more like a shared library so that later it will be directly accessible by the client in a different context. So like draw a line, direct, whatever, draw a stream and do it through a DirectWindow but on the client's side. But currently it's not available and won't be for some time, because before releasing a new API of that importance, we want to clean things up. Because once it's done, it's done. We have to stay compatible with it for a long time.

A Speaker: Does the DMA have to support scatter-gather, or is it basically a physically contiguous memory space that your DMA is?

Pierre Raynaud-Richard: What you get -- it's not physically contiguous. It's a buffer with a rowbyte. You have -- you get the description of a list of scan lines in a specific rowbyte space. The buffer is linear but the clipped area is not linear. It's only scanlines.
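The "linear buffer, non-linear clipped area" answer can be made concrete: a visible rectangle becomes one byte span per scan line, each separated by the row stride. This is a sketch under assumptions; `Rect`, `Span` and `SpansForRect` are hypothetical names, edges are taken as inclusive, and the offsets are relative to the buffer base address.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Rect { int left, top, right, bottom; };            // inclusive edges (assumed)
struct Span { std::size_t offset; std::size_t length; };  // bytes from buffer base

// Describe a rectangle of the frame buffer as one contiguous span per scan
// line, which is the form a per-line DMA transfer would need.
std::vector<Span> SpansForRect(Rect r, int bytes_per_row, int bytes_per_pixel) {
    assert(r.left >= 0 && r.top >= 0 && r.left <= r.right && r.top <= r.bottom);
    std::vector<Span> spans;
    const std::size_t length =
        static_cast<std::size_t>(r.right - r.left + 1) * bytes_per_pixel;
    for (int y = r.top; y <= r.bottom; ++y) {
        const std::size_t offset =
            static_cast<std::size_t>(y) * bytes_per_row +
            static_cast<std::size_t>(r.left) * bytes_per_pixel;
        spans.push_back({offset, length});
    }
    return spans;
}
```

For a 4x3 pixel rectangle at (2,1) in a buffer with 64 bytes per row and 4 bytes per pixel, this yields three 16-byte spans starting at offsets 72, 136 and 200.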

A Speaker: What about issues of window alignment? Are you allowed to use that window as a user on any arbitrary pixel?

Pierre Raynaud-Richard: Very good question. The problem depends on your specific DMA capabilities. There is a new API that will allow you to enforce an alignment on the window. That alignment can be defined not in pixels; instead you'll tell the window, "I don't want this window to be anything other than 4-byte aligned in the frame buffer," whatever position it is. And that gets you out of the problem.

A Speaker: That's still a problem for windows, with overlapping view if your clipping is --

Pierre Raynaud-Richard: In this case you have to deal with it in any case, depending on what your DMA protocol can do, the one you are using. The Bt848 is a very common chip set. We were able to deal with all problems. The base address of the buffer used to do the DMA has to be aligned on a 4-byte boundary, but after that you can control the clipping at the pixel level so you can deal with everything.
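The 4-byte constraint translates into a pixel step that depends on the color depth. The helper below is a hypothetical illustration, not the alignment API mentioned in the answer: `AlignLeft` snaps a horizontal position down to the nearest pixel whose byte offset within the row is a multiple of the required alignment, assuming each row itself starts on an aligned address.

```cpp
#include <cassert>

// Greatest common divisor, written out to avoid depending on a C++17 header.
static int Gcd(int a, int b) {
    while (b != 0) {
        const int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Largest x' <= x such that x' * bytes_per_pixel is a multiple of
// alignment_bytes. Works for negative x as well.
int AlignLeft(int x, int bytes_per_pixel, int alignment_bytes) {
    assert(bytes_per_pixel > 0 && alignment_bytes > 0);
    // Number of pixels between two aligned positions at this depth.
    const int step = alignment_bytes / Gcd(alignment_bytes, bytes_per_pixel);
    const int remainder = ((x % step) + step) % step;
    return x - remainder;
}
```

With a 4-byte requirement, a 32-bit screen accepts any pixel position, a 16-bit screen accepts every second pixel, and an 8-bit screen every fourth; pixel-level clipping then trims whatever lies to the left of the window's real edge.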

Any other question?

A Speaker: Will this same technology be expanded to other areas? I know we don't have the new media yet, but with other media when they are available?

Pierre Raynaud-Richard: I imagine we could. I don't see a clear application just now. Do you have any clear idea what you're thinking about?

A Speaker: For example, for audio that -- DMA directly to the buffers that the audio hardware uses.

A Speaker: There's a way to do that.

A Speaker: I didn't hear that.

A Speaker: Our audio driver already does that.

A Speaker: But the clients don't, do they?

A Speaker: Yes.

A Speaker: They do?

Pierre Raynaud-Richard: Next question? So I think it's time to let you go in that case. Thank you.