Be Developers' Conference
Preview of the Media Kit
Steve Sakoman
Mr. Sakoman: This session is a preview of the new Media Kit. It is an overview -- we won't be going into details on the API.
What we're going to do first is take another look at some of the demos that you saw in the opening session, and go into a little bit more detail on the functionality in those programs. We'll talk a bit about the hardware we're using, and the overall techniques used in the demos.
Then we'll take a look at the Media Kit architecture from 30,000 ft., and after that describe what Be is going to do to fill in some of the holes in that architectural plan. We'll finish up by giving you a schedule for the alpha and beta releases of the Media Kit.
Our goal is to be the media OS. We hope to provide you with an operating system and a set of kits that let you do the kinds of media-rich applications that we were demonstrating yesterday. Alex likes to call it "what you see and hear is what you get *now*", not later on after you've rendered and played back the finished product.
The reason we did those demos is multi-faceted. First, we wanted to set the bar for the kinds of performance and API functionality that we aim to provide in the Media Kit. Second, we wanted to inspire folks like you to take another look at the way things have traditionally been done and see that there may be another way to do things.
We'll start with Benoit's mixer program. I'll turn it over to Benoit at this point, and he can tell you a little bit about what's going on in this app.
Mr. Schillings: Can you hear me? The Benoit mixer is a program that, as Steve said, we wrote to show what could be done. The first thought was "why don't you write a cool mixer?" So I started with the traditional approach of doing sliders. I like to write show-off demos, but nobody was excited about my demo with sliders. So I took a bunch of people from Be and locked them in a room and we did a little brainstorming, which went very, very well, and people came up with a few ideas which led to this idea for a 3-D mixer. I was excited. We came out with that first version.
So let's run it, and I'll show you some aspects of the interface, some aspects of how the implementation works, what's good on the BeOS and that kind of thing. Let's start with the first one.
The first thing you can see is that you can do transparent controls inside another window, and that way you can float your UI without hiding what's behind it.
The idea here is very simple. You've got your audio channels, and you can control the intensity of a channel and the spatial position of a channel just by moving objects in the 3-D space. So let's, for instance, take the vocal track and solo it, so we only listen to that one. You see that one, and the other channels disappear.
When I take it and move it left or right -- I don't know if you can hear that in the room -- you hear the sound panning from left to right.
In the same way, if I want to add reverb to an object, I can take the object and shape the reverb and get some flanging of the object. So you have very easy, direct manipulation of the different objects.
Let's say I want to take those four tracks and play only them, so I only play those four objects, and I can silence them or bring back some sound just by moving them.
There is also a demo mode, which will rotate the thing to make it look better when you're not busy playing with it. So that's an interface that works better with the computer, because with the slider interface you've got only one mouse, so how do you move two sliders at the same time? Here it's very easy to take multiple objects, move them around, select them, add some reverb, and move that sound in space.
In the same way, I can magnify my timeline, I can move this around, or I can get some info on the instrument channel. You can see that, if things are working the way they should.
Another thing you can do is move things around. This was something that was not finished; it was more to be able to show the waves.
For instance, I take three channels and I go into wave mode. Now I can see information about those three waves. I can change the time scale, and the nice thing is, if I want to see this in detail, I can change the size of that guy like that. So you can see that you should not feel restricted to one particular way of doing things.
Inside the window it's your program. You do whatever you want.
So let's go back to the mixer aspect. That's the first version, which was done for regular stereo speakers.
We also did another version which was designed for surround sound, Dolby ProLogic. For that we show you this space as a circular space. You are at the center, and I don't have the Dolby system here, but you can reposition the sound channel around you the way you want and get the sound rotated around you.
Actually, this version has another feature, which is recording the course of motion I'm doing here. So I'm moving some objects here, and moving them like that there. If I go back to time zero, you see that the objects replay the motion that I did.
I can take another one at the same time, and I can have multiple objects moving at the same time, like that.
Let's go back to time zero. And now we'll see the combination with motion. You see, by the way, the name labels are trying to avoid colliding here. They are fighting very hard to do that. So it starts spinning very quickly.
This program was written around Christmas, so I decided to do a Christmas version of the program, which was spewing snow according to the intensity of the sound. In this version here I'm not using DirectWindows; I've chosen another approach. And we put it in demo mode, and you've got your sound and everything around you, with some snow.
So a few quick things about the way the program is structured. Since the idea was to use multi-threading and to use the BeOS well, it's a simple program. We'll put the source code in the public domain on the web site, as soon as I have added comments to my code. In French.
The way the threading is done, you've got one I/O thread per channel. We have a channel object, and each of those channels, which reads its data from the file, is handled by a thread. So I multi-thread, and that channel object knows how to always be one I/O ahead.
So you've got those 15 threads for the channels. On top of the channel objects, you've got one mixer thread, which is responsible for taking the samples from all those threads, mixing them, and putting them into a sound buffer.
There is a thread which is responsible for feeding those mixed buffers to the Media Kit. Then there are two threads which are responsible for the 3-D animation and the rendering of that animation, and then you've got all the parasitic threads of the server.
So we use 20 or 22 threads in this application. It's very simple; there is no science in that program. The only cool trick is how to do fast transparencies. I can show that trick to people who are interested, if you want. Voila. That's it.
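To make that structure concrete, here is a rough C++ sketch of the thread layout Benoit describes, using the BeOS kernel threading and semaphore calls. All the names are illustrative; this is not the actual mixer source, which will be on the web site.

```cpp
#include <OS.h>

// Illustrative sketch only; names and details are not from the real mixer.
// Each channel object owns a reader thread (started with spawn_thread()
// and resume_thread()) that stays one disk read ahead of the mixer; one
// mixer thread sums all the channels into an output buffer.
struct Channel {
    sem_id  buffer_ready;   // signaled when the read-ahead buffer is full
    int16  *next_buffer;    // samples read ahead by the reader thread
    // ... file handle, playback position, gain and pan from the 3-D view
};

static int32
reader_thread(void *data)
{
    Channel *ch = (Channel *)data;
    for (;;) {
        // ... read the next block of samples from the channel's file
        release_sem(ch->buffer_ready);  // tell the mixer a buffer is ready
    }
    return 0;
}

static int32
mixer_thread(void *data)
{
    Channel *channels = (Channel *)data;
    for (;;) {
        for (int32 c = 0; c < 15; c++) {
            acquire_sem(channels[c].buffer_ready);
            // ... scale by the channel's gain and pan, add into the mix buffer
        }
        // ... hand the mixed buffer to the thread that feeds the Media Kit
    }
    return 0;
}
```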
Mr. Sakoman: This next application was written for Dominic, who wanted to watch a South Park tape in his office. So I sat down and did a quick version of this program for him. The idea was to provide a live, resizable video window with image controls, and in typical Be fashion we wanted these video windows to be just like any other windows, so that the user doesn't feel that there is anything special about them. They resize; they drag; they're workspace aware. They feel completely natural. So let's move to another workspace.
A few words about the hardware that we're running on: This is a dual Pentium II, 266 system. We have two video capture cards. I think one is a model 401 from Hauppauge. The other is an Avermedia card. Both cards are based upon a Brooktree 848 video capture chip.
I think one has a Temic tuner, the other has a Philips tuner. In one of my weekly newsletter articles, there is a list of the brands of cards that we support, and the list is growing: IX Micro, Miro, Intel, Avermedia, 3Com, Hauppauge . . .
We're DMA'ing directly to the screen into a BDirectWindow. The DMA program is actually handling clipping so that you can move other windows on top of the video window with clipping handled properly. There are no strange artifacts on the screen as you move your window around.
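For reference, here is a minimal sketch of the BDirectWindow side of that arrangement. The class and its comments are mine; where they say "reprogram the DMA engine," the real application talks to the capture driver.

```cpp
#include <DirectWindow.h>

// Hypothetical video window: the app_server calls DirectConnected() with
// the frame buffer address and a clip list whenever the window moves, is
// resized, or is partially covered.
class VideoWindow : public BDirectWindow {
public:
    VideoWindow(BRect frame)
        : BDirectWindow(frame, "TV", B_TITLED_WINDOW, 0) {}

    virtual void DirectConnected(direct_buffer_info *info)
    {
        switch (info->buffer_state & B_DIRECT_MODE_MASK) {
            case B_DIRECT_START:
            case B_DIRECT_MODIFY:
                // info->bits is the frame buffer; info->clip_list holds
                // info->clip_list_count visible rectangles. Reprogram the
                // DMA engine so it writes only inside those rectangles.
                break;
            case B_DIRECT_STOP:
                // Stop the DMA before the connection goes away.
                break;
        }
    }
};
```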
There are controls here for all the standard TV features like brightness, contrast, and saturation. We support multiple types of input: generated color bars, S-Video, tuner input for receiving TV or cable signals, as well as the composite video signal that we're using right now. We support all the most popular video standards, including SECAM. Just to annoy Jean-Louis, though, we don't currently support French TV tuning.
There are also some controls over here on the right for things like enabling and disabling comb filters, chroma filters and error diffusion in 16 bit modes. Currently we're running in 16 bit mode with this window.
We could do something like grab the window using our workspace tool and move it over to a 32 bit workspace, and as we pop over there, the DMA engine is reprogrammed using the BDirectWindow protocol to switch into a 32 bit DMA mode.
Let's take a little look under the hood. When the window size is less than CIF, which is 320 by 240, you are seeing 30 fields per second DMA'd directly to the screen. When we grow the window bigger than a CIF image, we interlace the even and odd fields. So 60 times a second, half of the scan lines are redrawn, and the other half retain the previous field.
What we'll do next is take a look at the VCR demo. The goal here was to capture some South Park clips for special use. This program is basically a CIF video recorder. It captures uncompressed 16 bit per pixel video at 30 frames per second, as well as the stereo audio tracks at 44.1 kHz.
You'll notice the same control panel that we had in the TV demo, with the addition of another tab view for the VCR type controls. So let's record some of Alex's chick flick. We'll wait a few seconds. And then push the stop button.
So for purposes of illustration I included the audio data in the video buffer. It is hiding just off the bottom edge of the window. I've always heard people say silence is golden, but what I've noticed with this program is that it's actually pink!
In this version of the program we're DMA'ing the F1 fields to main memory at CIF resolution, and then capturing that data to disk. We're DMA'ing the F2 fields to a preview BDirectWindow.
We're capturing audio at 16 bits, 44.1 kilohertz, combining that into the video buffer, and writing the buffer to disk through the file system. The combined data rate is about 4.8 megabytes per second. There is no special buffering going on here. We just capture each frame and issue a write command to the Be file system. We're using a $200, 5.2 gigabyte drive that we bought from Fry's.
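(As a sanity check on that figure: 320 × 240 pixels × 2 bytes × 30 frames per second is about 4.6 MB/s of video, and 44,100 samples per second × 2 channels × 2 bytes adds about 0.18 MB/s of audio, which together give roughly the 4.8 MB/s quoted.)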
There is nothing special about playback. It's not even a BDirectWindow; it's just a normal BWindow. We're reading directly into the bits portion of a BBitmap and then doing a DrawBitmap.
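A minimal sketch of that playback path, assuming the demo's 16-bit CIF frames; the exact color space constant, locking style, and file layout here are my assumptions, not the demo's actual code.

```cpp
#include <stdio.h>
#include <Bitmap.h>
#include <View.h>
#include <Window.h>

// Hypothetical playback loop: read each captured frame straight into a
// BBitmap's pixel storage, then blit it with DrawBitmap from a normal
// BView (which is assumed to be attached to a window).
void
play_frames(BView *view, FILE *capture_file)
{
    BRect bounds(0, 0, 319, 239);       // CIF: 320 x 240
    BBitmap bitmap(bounds, B_RGB16);    // 16 bits per pixel, as captured

    while (fread(bitmap.Bits(), 1, bitmap.BitsLength(), capture_file)
            == (size_t)bitmap.BitsLength()) {
        if (view->Window()->Lock()) {
            view->DrawBitmap(&bitmap, bounds, view->Bounds());
            view->Sync();               // finish the blit before reusing the bits
            view->Window()->Unlock();
        }
    }
}
```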
The video effects demo was the grand finale of the video demos yesterday. The intent of this program was to demonstrate a two source video switch with various overlay and transition effects.
The program features two live, movable, resizable CIF preview windows, one for each source, and an output window that can be 640x480 or 320x240. The output window displays overlays, dissolves, and other transition effects.
These are the two live source windows. They're resizable, and they handle clipping properly. These are DMA'd BDirectWindows.
The output window is resizable to full 640x480, and it is also a BDirectWindow. The processor is doing the effects in the output window, and is handling the clipping with software. We have a range of effects starting with a bug, or logo overlay like you see in the corner of the screen on CNN.
The next effect is a simple cut transition, selecting between the two sources. We can also do wipe transitions in real time.
The vertical slide transition is a more complicated effect where we're actually using one image to push the other image out of the way. All of these effects are controllable with one of our new BSliders or with an automated button that sends messages to the effects code to change the point at which the transition occurs.
Moving to something more computationally intensive, we have a sparkle dissolve.
The checkers transition uses the slider control to decide how many blocks to divide the screen into, and the effect code basically chooses which image to use in each particular segment.
The classic dissolve is one of the more computationally intensive effects. All these effects are written in C code. There is no assembly language, no MMX, nothing fancy here. Just straight C code.
The chroma key is the next effect, but unfortunately we don't have Alex's bunny person mini studio. Instead I'll use the color bars as the video source. You can see that the blue bar in the color image is passing the video. The code for this effect is looking at the ratio of the blue intensity to the sum of the red and the green and comparing it to the value of the slider controls.
You can see as I make the threshold for the blue ratio less stringent, we can pick up a key for other colors that have some blue content to them.
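In plain C++, the per-pixel test he describes might look something like this; the 32-bit pixel format and all the names are mine, not the demo's actual code, which worked on 16-bit pixels.

```cpp
#include <SupportDefs.h>

// Hypothetical per-pixel key test: is this pixel "blue enough" to be
// replaced by the other video source? 'threshold' comes from the slider;
// lowering it makes the key less stringent, so colors with only some
// blue content start to key as well.
static inline bool
is_keyed(uint8 r, uint8 g, uint8 b, float threshold)
{
    // Compare blue against the sum of red and green, as described above.
    return (float)b > threshold * (float)(r + g);
}

// Replace keyed pixels of the key source with pixels of the fill source.
void
chroma_key(const uint32 *key_src, const uint32 *fill_src, uint32 *dst,
           int32 count, float threshold)
{
    for (int32 i = 0; i < count; i++) {
        uint32 p = key_src[i];
        uint8 r = (uint8)((p >> 16) & 0xff);
        uint8 g = (uint8)((p >> 8) & 0xff);
        uint8 b = (uint8)(p & 0xff);
        dst[i] = is_keyed(r, g, b, threshold) ? fill_src[i] : p;
    }
}
```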
And now for a couple of effects that Pierre helped to develop. First, his classic page turn from the 3D book demo. To remind you again, this is all being done in software, straight C code! Even with this effect we are getting 30 frames per second on the output.
And finally we have a combination chroma key page flip effect. This is probably the most computationally intensive effect we've shown today. We're both chroma keying and page flipping simultaneously. If you look at the CPU monitor, you'll see that we're pretty much utilizing 100% of both of the CPUs. If we disable one of them, the remaining CPU is definitely totally pegged!
If you look very carefully you'll see that we're dropping a few frames on the output window, but we're still doing quite well. I think that with the next generation of CPU, when we hit 300 or 350 megahertz, all these effects and probably even more complex ones can be done completely in software.
So moving back to a little bit more of the details again, here we're doing the DMA of the F1 fields to main memory. We're DMA'ing these odd fields at 640 by 240 resolution. So we're sort of stretching the image out horizontally in preparation for vertical line doubling later in the effects engine.
We're capturing the even fields and using those for the preview BDirectWindows. There's one thing that we didn't mention in the general session. I often get questions inquiring about how lightweight Be threads are. I attempted to answer that question somewhat dramatically with this app. To do so, I was extremely gratuitous with my use of threads. In fact, for every frame I spawn two threads, one for the even scan lines, another for the odd scan lines. I then allow the threads to compute the effect and then die.
So we're creating and destroying 60 threads per second in this application. That's probably something no sane developer would do, but it manages to communicate the fact that the use of threads isn't all that expensive. Even for a real time video application.
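For the curious, the per-frame thread dance looks roughly like this with the BeOS kernel calls; 'EffectArgs' and 'compute_effect' are stand-ins for the real effect code, not names from the demo.

```cpp
#include <OS.h>

// Hypothetical per-frame worker setup: spawn one thread for the even
// scan lines and one for the odd, wait for both, then move on. The
// threads compute the effect and die; this happens 30 times a second.
struct EffectArgs {
    int32 first_line;   // 0 for even scan lines, 1 for odd scan lines
    int32 line_step;    // 2: process every other scan line
    // ... pointers to the source and destination buffers would go here
};

static int32
compute_effect(void *data)
{
    EffectArgs *args = (EffectArgs *)data;
    // ... compute the effect for args->first_line, first_line + 2, ...
    return 0;
}

void
render_frame(EffectArgs *even, EffectArgs *odd)
{
    thread_id t1 = spawn_thread(compute_effect, "even lines",
                                B_DISPLAY_PRIORITY, even);
    thread_id t2 = spawn_thread(compute_effect, "odd lines",
                                B_DISPLAY_PRIORITY, odd);
    resume_thread(t1);
    resume_thread(t2);

    status_t result;
    wait_for_thread(t1, &result);   // let both workers finish, then the
    wait_for_thread(t2, &result);   // threads die; repeat for the next frame
}
```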
Since we capture at 640 by 240 but want a 640 by 480 output, I do software line doubling. In this demo I use a simple line duplication.
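Line duplication is the simplest possible approach; a sketch, assuming 16 bits per pixel as in the demo, so each 640-pixel row is 1,280 bytes.

```cpp
#include <string.h>
#include <SupportDefs.h>

// Write each captured 640x240 row twice into the 640x480 output.
void
line_double(const uint8 *src, uint8 *dst, int32 src_rows, int32 row_bytes)
{
    for (int32 y = 0; y < src_rows; y++) {
        const uint8 *row = src + y * row_bytes;
        memcpy(dst + (2 * y)     * row_bytes, row, row_bytes);  // output line 2y
        memcpy(dst + (2 * y + 1) * row_bytes, row, row_bytes);  // duplicate on 2y+1
    }
}
```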
Let's move on and talk a little bit about what's happening in the media space for BeOS. First I'll talk about the OS in general. You don't get this type of media performance by just implementing a set of media APIs.
We really needed to put a lot of effort system wide into supporting these types of applications, starting with the kernel and moving all the way up through the Media Kit.
You may have noticed that I mentioned our threads are very lightweight. You can do some pretty insane things with threads in video-type apps like this. When Jon talks about the Media Kit architecture, you'll see that minimizing buffer copies through use of kernel features is also incredibly important. Our very high performance file system allows us to approach raw drive speeds, so that we can use very inexpensive disks to record uncompressed audio and video. A 64-bit file system comes in handy when you're recording combined audio and video at five megabytes per second. And that is just CIF resolution!
And of course there is also the graphics system. I imagine that all of you have been as impressed as I have with Pierre's work on BDirectWindow. I think Be is really the first system I've seen where video windows are nothing special to the user. They act just like any other window, and you get very live response.
With a lot of the foundation work completed, we have now turned our attention to the Media Kit. We've done a lot of thinking about media support at the very lowest levels of the system, and a lot of thinking about App capabilities that we want and UI capabilities that we want. The meeting point for all these desires is the Media Kit API.
What I'd like to do now is have Jon Watte come up to give you an overview of the Media Kit from 30,000 feet.
Mr. Watte: Hi. My name is Jon Watte. I'm doing the Media Kit. The goal with the Media Kit was, of course, to make it possible for you application developers to easily take advantage of all of this raw power that the BeOS provides, especially on top of this newer generation of PC hardware.
What this block diagram shows you is that your application, while being the part of the system most visible to the user, doesn't actually have to do the major part of the work. The application should focus on whatever value you add, and we at Be should take care of providing a framework that handles all of the mundane, kind of gory technical details of shuffling data through the system without maxing out any system resources.
Indeed, one goal is to take advantage of the protected memory that's there in the BeOS. If one application crashes, other applications should not be affected. That's what this white bar in the middle signifies. That's the protected memory. The Media Kit library is the only thing that allows your application to talk to other applications without itself breaking the protected memory barrier.
So how would the user interact with the Media Kit? Well, mainly through your application. Your application talks to the Media Kit library, which in turn talks to all the other components. So you don't need to deal with whether you have a TV camera or a Hi8 camera or whatever kind of camera you have. You don't need to worry about that. You just ask for the camera, and the system will provide it to you.
When the user buys his camera, he gets, from Be or ultimately from the camera manufacturer, a floppy with the actual driver and add-on that the Media Kit needs to talk to this camera.
So if you're really interested in all the gory technical details -- what the names of the system classes are and how it's all supposed to fit together and synchronize in time -- you should come to the media talk after lunch, which is going to be a very open, feedback-oriented design overview of the Media Kit.
If all you wanted to do is see these nice demos, then this was the right session for you. If what you want is just to know the name of the one call you need to make to play a QuickTime movie on screen, then unfortunately there is not a session for you at this BeDC, because not all the details of the API have been finalized. We're looking for input now, before it's too late.
So that's kind of the 30,000-foot overview.
Mr. Sakoman: There have been many questions about drivers for media devices and file format support. So I thought I would take this opportunity to give you an idea of what our plans are. We have limited resources at Be, so we don't have a prayer of supporting every piece of hardware that's ever been made, nor of even keeping up with all new devices as they come out. So we're going to have to be very careful in how we target our resources.
Our top priority is going to be working with a set of PC OEMs who are bundling the OS with their systems. We will make sure that we have drivers to support the hardware that they want to ship.
We'll also develop or acquire drivers for the most popular set of plug-in cards in the media space. And for the specialty and niche type drivers, we're going to rely on hardware vendors and interested developers who are putting together a solution for those particular spaces.
So I guess the general answer to the questions is that it's going to be the entire range of possibilities for driver support. We will do some; hardware vendors will do some, and we're hoping that a lot of them will come from the folks that are in the room today.
Let's talk about codecs for audio and video. We're going to concentrate on providing some of the most popular codecs.
We really want to focus on doing absolutely the highest performing codecs that we can either develop or license. We're shooting for real time or near real time software codecs. Our targets for R4 include Indeo, Cinepak, MPEG-1, and MJPEG. For R5, we're hoping to add DV and MPEG-2 codecs.
As you heard Claude mention in his keynote talk, by around this time next year it appears that we will be able to do real time DV and MPEG-2 encode and decode. These are the codecs that we hope to bundle into the OS.
We will rely on ISVs and hardware vendors to provide alternative software codecs, as well as codecs that rely on particular hardware support.
File format support is another type of Media Kit plug-in. We're going to concentrate on the popular ones: AVI, QuickTime. We're going to rely on third parties to do a lot of the others.
And now, the question that's on everyone's mind. The current target for the alpha release of the Media Kit is April 20th. I'm looking at Jon's face and waiting for the grimace. He just raised his eyebrows and dropped them a few times. Not a grimace, so we must be on track. For Beta 1, we're looking at early June, and for Beta 2, early July. We're planning our Golden Master for around mid-August.
So that's the end of the scripted portion of our presentation, and at this point we'd like to just open it up for questions. If they are of a detailed nature about the API, I'll ask you to save them for Jon's session this afternoon.
A Speaker: Are those releases a public alpha and a public beta, or are those just limited?
Mr. Sakoman: I think the early ones will probably go to selected developers, people who are actively working full time on a serious application. We have limited ability to support massive numbers of people, so we really want to target ourselves to the very serious developers.
A Speaker: When you talked about hardware targets, going with the bundled systems first, does the support for the hardware on those motherboards include the Universal Serial Bus and FireWire?
Mr. Sakoman: The question was on bundled hardware, does that include things like Universal Serial Bus and 1394. The answer is yes. To expand on that a little bit, it would also include the audio cards that would be bundled, the video cards that would be bundled, the capture cards that would be bundled.
A Speaker: What's the road map like for sound support on this application?
Mr. Sakoman: The question is, what's the road map for sound support? As I mentioned yesterday, currently it's the SoundBlaster AWE 64. This is the one that we thought would enable most people to get some audio support as quickly as possible. Where we're moving in the next release is support for AC 97 hardware. We're doing some drivers, and there are some third parties who are writing drivers for their own hardware. I think around R4 time, you'll see support for motherboard-based AC 97 hardware, as well as add-in cards. As to which specific vendors, they need to announce their own products. We can't do that for them. Other questions? Yes.
A Speaker: Does the current Media Kit get replaced with the new one?
Mr. Sakoman: The question I believe was will the old Media Kit be replaced with the new one?
A Speaker: Yes.
Mr. Sakoman: Do you want to get that one, Jon?
Mr. Watte: Sure. Typically we would prefer for you to use the new Media Kit as soon as it becomes available. Obviously we don't want to break existing applications.
Mr. Watte: So the functional parts of the current Media Kit API will remain, and those applications will keep running, but there is no room for expansion in the current Media Kit API, so it's not going to move forward from where it is.
A Speaker: You mentioned the API. How about the server?
Mr. Watte: The audio server?
A Speaker: You currently have an audio server.
Mr. Watte: Yes, the audio server will kind of go away, and there will be media servers which handle both audio and video and whatever other kinds of media you want to handle.
So what servers are or are not available on your system may change. But since you talk to these servers through our shared libraries, we can change the shared libraries and everything keeps working. So yes, the audio server as you know it will go away.
A Speaker: Thank you.
Mr. Sakoman: Any other questions? In the way back.
A Speaker: At what point will we see communication with external devices: slaving to external time sources, machine control over serial ports, the standard device controls, and also things like slaving internal time code to external time code?
Mr. Sakoman: I think the question is when will we see support for -- could you go through the list again? I'll repeat it as you go. Slaving to external time sources. Video machine control, and device control in general.
I know that there are some developers who are working on supporting those things. I think there will be some chicken and egg there, though. We first need the people who are doing the hardware to go public with that, and the third-party software people who are working on those kinds of functionality to go public with their stuff.
Be is not implementing any of this ourselves. We're providing the mechanisms to allow others to do that.
Mr. Watte: You might also want to ask me that after lunch.
Mr. Sakoman: Okay. You might also want to ask Jon that after lunch at his session.
A Speaker: Are there any plans to support cable modems?
Mr. Sakoman: The question is, are there any plans to support cable modems? And the answer is, currently, no.
A Speaker: When will we see SCSI support?
Mr. Sakoman: The question is when will we see SCSI support. That's currently planned for R4. We're supporting the Adaptec controllers.
A Speaker: Anything with DVD support?
Mr. Sakoman: The question is, anything with DVD support? I guess my question back is, by that do you mean DVD ROM or playback of DVD media?
A Speaker: Both.
Mr. Sakoman: Both. Okay. For DVD ROM, the current driver that I have on my machine with R3 seems to be working with my DVD ROM drive. I think we need to take a look at file system issues, and at actually looking at commercial DVD media itself.
Then there are encryption issues. So we don't have any plans currently around actually playing back DVD movies that you can buy at the store. That will be something that you can expect a third party might want to do. But as far as supporting the DVD drive hardware and file system, those are in our plans.
A Speaker: Another issue just sort of came up through a mishearing, but what about 3-D hardware acceleration cards? A lot of people are in the game market -- I realize that that's away from where we are currently, but those people really do need those cards. At least, they're out there.
Mr. Sakoman: On a road map that we put up yesterday, the plan for 3-D acceleration is OpenGL® 3-D acceleration, and that's targeted for R5.
A Speaker: Has there been any discussion recently of adding support for auxiliary video devices, meaning second monitors, something like that? So we can put the video output on one, full screen, and still have our main screen.
Mr. Sakoman: The question is support for multiple monitors. Which release do we have that for?
Mr. Giampaolo: It's not for R4. It will be in R5.
Mr. Sakoman: So the answer is don't expect it in R4, but possibly in R5.
Mr. Sakoman: The question is, is the driver in R3 capable of supporting DVD ROM drives? And it's my experience that I've been able to read standard CD's in my DVD ROM drive, but I don't have any DVD ROM material that I've actually looked at so I can't say for certain what would happen there.
Mr. Giampaolo: We mounted one.
Mr. Sakoman: Dominic says that we have mounted a DVD ROM media.
Mr. Giampaolo: But you couldn't do much with the data.
A Speaker: Do you have a specific 3D chip set that's being targeted for R5?
Mr. Sakoman: No, I don't think we yet have a specific chip set. I think that would fall under the general rule: if some PC OEM comes to us, wants to put together a BeOS bundle, and wants to include a particular 3-D card, then that card would probably move to the top of the list pretty quickly.
A Speaker: But you've already got the Glide; right?
Mr. Sakoman: We have some Glide libraries that were done for PR2. When we say we're doing OpenGL hardware acceleration support, what we really mean is that in our graphics driver model we'll be putting in hooks and API for general purpose support, rather than just a card-specific library.
Mr. Sakoman: Other questions? Okay. I guess no more. Thank you for coming. I encourage you to go to Jon's session this afternoon.