Be Developer Conference, April 1999

Media Application Track: Intro

Christopher Tate: I'd like to try to get started in a timely fashion so we can tell you what the Media Kit is all about. I would like to suggest that perhaps the people sitting in the back might want to move up a little bit. I couldn't see the slides from the back during the morning session, so just a friendly warning.

Welcome to the Media Application Track. This is where you will learn nothing at all about how nodes work internally. Well, I -- okay. You'll learn about this much about how nodes work internally. Just what you need to know in order to actually cause the data that your application wishes to process to move through a node chain and do something at the other end that your application also wishes to have happen. Respecting the application's wishes is very important.

My name is Christopher Tate. I'm one of Stephen Beaulieu's DTS engineers. The other two I think you have already met. Owen is running my slides for me and Jeff will be speaking in a moment.

Welcome back. I hope that lunch hasn't put you all to sleep.

Today's talk is split up into two major sections. In the first part we will go into great, gory, and excruciating detail about how applications behave and what their responsibilities are with respect to the Media Kit. There are a lot of things that are up to the application to get right in order to have the whole Media Kit work properly. All of the work is not up to the nodes.

Also, since applications having user interfaces is a good thing, we're going to talk about some of the special facilities that the Media Kit presents in order to allow your application to provide a user interface for a node that someone else may have implemented. It's a little black box, as far as your application is concerned. Nevertheless, there is a communications mechanism set up so that your app can present a useful, coherent interface for that node.

We're going to talk about the fun stuff first.

There are a few basic things that applications need to do. They need to bind all of the nodes that they want to use and none of the nodes that they don't want to use, connect them all in the proper order with the proper behaviors; start up the chain, shut it down. If there is a file that they want to skip around in, you have to do seek operations. And then finally, shut everything down; shut it down in a coherent fashion, disconnect, dispose of any temporary resources that may have been allocated in such a way that it doesn't affect the operation of the rest of the system.

As we were putting together this talk and going over all of the various sorts of things that people will be doing with the Media Kit, we discovered that there are four basic applications that get mixed and matched in any given real world product. Those four have somewhat different behaviors, and we're going to present them all since your applications will probably either just fit into one of the categories naturally, or be able to be broken down so that this part here is like live playback, and this part here is like recording, or whatever the circumstances might be.

In a little more detail, the four basic application types are:

Playback from disk. That's where you have a QuickTime file, and you set up something to read the file, and spit audio and video through in a synchronized fashion and play it on the screen and at the sound card at the appropriate synchronized point in time.

Capture to disk, where you have say a microphone that is listening, or perhaps a video camera with both video and audio input, being recorded to disk. Getting it onto disk isn't necessarily timely, but the system at least has to capture all of the data being fed to it, and it has to record it in a way that the timeliness of the production -- when the person was speaking or when the video camera was turned on -- can be reproduced when it comes time to play it back later.

Offline processing is the case where say you're doing MPEG video encoding with audio which can't currently be done with software in real time with anything more than just trivial encodings. So what you would then have to do is set up your raw video files and your raw audio files and an MPEG encoder, pipe it all through, go and have coffee and come back when it was done.

In this case, unlike all the other cases, there are no real time constraints, either on how fast the data is being input into the system -- that's coming off the disk, however fast the disk can deliver it or however fast the encoder can take it -- or on how fast the data is being consumed. It doesn't really matter whether it takes two minutes or four minutes or 50 minutes to pull it off the disk. Similarly, on the consuming end of it, writing the encoded file to disk, it doesn't matter whether it takes 30 milliseconds or 50 seconds or all day, as long as all of the data is processed.

Finally, there is something of a special case called live playback, when live media are being captured and immediately routed to live performance on some other device. As we'll see, that is mostly like a combination of recording and playback, but there are some subtleties that you need to be aware of.

The first part of the talk we're going to discuss playback from disk, and as he is the person who has had the most experience with this, Jeff Bush is going to come up and present an example of an application that does this in all of its wondrous glory.

Jeff Bush: Welcome, everybody. I'm Jeff Bush. Normally I work on UI stuff, but lately I've been working on media stuff.

I wrote the media player app that is going to be in the final release form of Genki. I don't think you're going to be getting it on the CDs that you have that they're going to be giving you, but it will be in the one that goes out finally. It will be cool.

So I'd basically like to take some time and go over how the media player basically does its thing, and issues that I ran into, and you'll probably run into while you're writing media applications.

So we start out basically with what we want to do. We want to play a file. We start out having a video card and a monitor, an audio card, and a QuickTime file. So that basically boils it down about as far as you can.

So I'll give a brief introduction to how QuickTime works. With QuickTime you have -- it's not an encoding -- it's an encapsulation format. So you have these atoms that contain data, but the QuickTime container itself is not actually an encoding. The actual encoding -- like the video -- may be stored in some other encoded form that needs to be decoded separately.

So this brings an interesting problem, and you'll see there is a solution to that that's kind of interesting. So what we need to do is, we need to -- we have two separate tasks which are independent. One task is splitting out the QuickTime format and getting out the actual streams, and the second is actually decoding those streams once we have them. And for many reasons it's advantageous to split those out, so we actually use separate nodes for those.

So basically, this is the diagram we'll be using to describe what's going on. Here is your file. Here's the system mixer. We always connect to the system mixer to make sure our audio doesn't interfere with the audio from other running applications.

And this. The window is actually a node that is created within the application and registered with the Media Server. The node basically knows how to get bitmaps and draw them on to a view.

So we create this within our application, as I said, and despite the fact that it's created in our application, it behaves exactly the same as every other node. We use the roster to access it, and make it do its thing.
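To make that concrete, a minimal sketch of registering an application-created node might look like the following. "VideoWindowNode" is only a placeholder name for whatever consumer node class your application implements; the roster calls shown are the BMediaRoster ones discussed in this talk.

    #include <MediaRoster.h>

    // Hypothetical consumer node implemented by the application; it takes
    // raw video buffers and draws them into a view.
    VideoWindowNode *windowNode = new VideoWindowNode("video window");

    // Register it with the Media Server so it can participate like any
    // other node in the system.
    BMediaRoster *roster = BMediaRoster::Roster();
    status_t err = roster->RegisterNode(windowNode);
    if (err == B_OK) {
        // From now on we talk to it through its media_node handle, just as
        // we would for a node living in another team.
        media_node windowHandle = windowNode->Node();
    }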

So the first thing we ought to do is set up the file reader node. So there are basically three steps that are involved with this.

So we have to find first of all a node that can handle a specific file, and this is probably the first question I would have looking at this. It's like, "how would I know which node to use?" It seems kind of vague.

So there is a function that does this for you called SniffRef. It's a function of the Media Roster. Most of these will be introduced as I talk about them. What it does basically is this: nodes can register the types of files that they handle, and then once you ask about a specific file, the Media Server can actually ask those nodes specifically, "okay, do you handle this file?" and step through all the nodes to find the one you want.

Okay. So basically this is just some code to kind of give you an idea. It's real simple. entry_ref, as you know, is the way that we refer to files in BeOS. dormant_node_info is the structure that we use to describe a node that hasn't been instantiated yet. So basically we just pass it in and we say SniffRef, the ref, the zero is another parameter that is not real important in this case, and then the dormant_node_info. So if the return is okay, then that means it has given us the information on the node for the file.
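As a rough sketch, the SniffRef step just described might look like this; the file path is only a placeholder.

    #include <Entry.h>
    #include <MediaAddOn.h>     // dormant_node_info
    #include <MediaRoster.h>

    entry_ref ref;
    get_ref_for_path("/boot/home/movie.mov", &ref);   // placeholder path

    dormant_node_info fileNodeInfo;
    status_t err = BMediaRoster::Roster()->SniffRef(ref, 0, &fileNodeInfo);
    if (err == B_OK) {
        // fileNodeInfo now describes a dormant node that claims to be able
        // to handle this file.
    }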

The next step is we have to instantiate this node. And instantiating means what you think. You actually create an instance of this.

As we mentioned earlier, it doesn't really matter to you where it exists. That's the point. So in many cases, you know, this node could actually be instantiated right within your application so it lives within your address space and does all this stuff there. It also could be instantiated in the Media Add-On Server.

In either case, like I said, it behaves exactly the same. So we start out with our dormant_node_info structure that we had before, which is information about this file handler node. Then we take one of the media_node structures.

media_node is the handle that you use for any time you have a node, you always use this handle to refer to it. So basically what we want to do is create a node and get our handle to it so we can set it up. So basically it's the same thing. Instantiate dormant node, boom, we've got a node. We've got an ID so we can use it.
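A minimal sketch of that instantiation step, continuing from the dormant_node_info obtained by SniffRef:

    media_node fileNode;    // our handle to the node once it exists
    status_t err = BMediaRoster::Roster()->InstantiateDormantNode(fileNodeInfo, &fileNode);
    if (err == B_OK) {
        // fileNode now refers to a live node, whether it was loaded into our
        // own address space or into the Media Add-On Server.
    }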

Now, it's important to mention also that this node comes from an add-on, as I think it was mentioned earlier, so basically the nice thing about this is that, if we decide to support something like ASF or some new whiz-bang format, as they tend to do fairly frequently, basically you can just load this node, this add-on, put it into a directory and every media application can use it transparently, without having to know about it, and that's the cool thing about using the whole add-on architecture. So basically in this case it's kind of like the Translation Kit except for media.

So once we have this file, we need to tell it which actual file on the disk it should use. So we use another roster function, which is SetRefFor. We use our node ID that we had before.

Because this node is actually -- I'll get a little bit into the node. It is actually the file interface type, it inherits from BFileInterface, this is like a special hard coded type of node, and you can set that.

Once we've done that, basically what it's going to do when we set the ref, is it's going to open the file, it's going to read through it and initialize itself and figure out what's in it and it's going to be ready for us to start doing some more interesting things with it.
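A sketch of the SetRefFor call being described, assuming the file node handle and entry_ref from the earlier steps; the roster also hands back the length of the file once the node has parsed it.

    bigtime_t duration = 0;   // filled in with the file's length, in microseconds
    status_t err = BMediaRoster::Roster()->SetRefFor(fileNode, ref,
        false,        // we're reading an existing file, not creating one
        &duration);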

So the next step now that we've got this node is making connections. And making connections is where it starts to actually get a little bit interesting.

So as I mentioned earlier, we have to find the inputs and outputs, and inputs and outputs are important because they not only give us some type of handle to what we use in our connections, but they also give us format information, which is very important. We'll determine the format and decide how we're going to connect them, and then we'll go ahead and connect them.

So the first step that we start out with our file reader and we find outputs from this node. So in this case I've got my handle again, which I got earlier.

Now, the roster has a function that's called GetFreeOutputsFor. It takes quite a few parameters here. Pass in the node ID, of course. We pass in a pointer to an output. I didn't put that on the slide, but this type is called media_output. It's another structure. It looks sort of like media_node.

Christopher Tate: Two lines up.

Jeff Bush: Oh, yeah, it's hidden in there. Okay. No, I'm not really confused. So we pass in a pointer. This is actually a pointer to an array. We pass in the number of outputs that we want to get, and then this last parameter right here tells us how many we got.

Now, it's important to note this right here. We're asking for outputs of type "raw audio". It's also important to point out that we pass in the number of outputs we want. If I wanted, I could make this an array of, like, ten outputs, if for some reason I thought there might be more outputs there, and it will actually tell me "okay, I found however many outputs you have in there."

So there's two basic things that are going on here. The first thing is we can look for an output of a specific basic type. Raw audio or raw video or encoded audio or encoded video, or really any media format at all.

The next thing is these outputs might have more specific information about what type of output they have, or it may be that the file has multiple streams encoded -- it has multi track data. So in this case, as you can see, I've only made one, and I've passed in the address of it, and I've only asked for one. So I'm assuming that there is only one audio output, which is a pretty good assumption in this case.

So I'm going to check the outputs. If it's zero, that means there are no audio outputs, which can happen. Or it also can mean there are no raw audio outputs, which is an important distinction to make as well.

If it returns one, then I know that I have a valid output and I can continue using that. Similarly, I'll also go and check for raw video, which is not very common to find in BeOS, depending on what you're working on. I'll look for an encoded audio potentially or an encoded video. So generally like in the QuickTime file you're going to have raw audio.

So actually what I end up doing is kind of an iterative process. It sounds a little tedious, but we are working on convenience classes to do this, and this is where you get the flexibility: I can start out looking for raw audio, and if it doesn't have it, I can call the same thing again, except asking for encoded audio. If that succeeds, then I've got an encoded audio output, and I can plug it into something that can use that.
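Putting those pieces together, a sketch of that iterative search might look like this, assuming the fileNode handle from earlier: ask for a free raw audio output first, and fall back to encoded audio if there isn't one.

    media_output audioOutput;
    int32 count = 0;
    BMediaRoster *roster = BMediaRoster::Roster();

    // First try for raw audio...
    status_t err = roster->GetFreeOutputsFor(fileNode, &audioOutput, 1,
        &count, B_MEDIA_RAW_AUDIO);
    if (err != B_OK || count == 0) {
        // ...and if there isn't any, look for encoded audio that we could
        // route through a decoder node instead.
        err = roster->GetFreeOutputsFor(fileNode, &audioOutput, 1,
            &count, B_MEDIA_ENCODED_AUDIO);
    }
    if (err == B_OK && count > 0) {
        // audioOutput describes one free audio output on the file node.
    }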

So now comes the interesting part. It's connecting the nodes. I'll talk a little -- well, maybe I'll talk about format.

So basically in this case I'm going to assume that I've obtained an input to the mixer in the same way that I've obtained an output from the file that I want for this step.
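For completeness, a sketch of how that mixer input might be obtained, using the roster's GetAudioMixer and GetFreeInputsFor calls:

    media_node mixerNode;
    media_input mixerInput;
    int32 inputCount = 0;
    BMediaRoster *roster = BMediaRoster::Roster();

    roster->GetAudioMixer(&mixerNode);                   // the system mixer
    roster->GetFreeInputsFor(mixerNode, &mixerInput, 1, &inputCount,
        B_MEDIA_RAW_AUDIO);                              // one free raw audio input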

So now I have a media output and a media input, and I'll just restate again that an output is different from just a source. An output encapsulates all this information about this output. So what is its format? Potentially what is its destination? So I'll make this call Connect -- very simple -- and I'll start. I notice there is some confusion. Here is where the distinction between sources and destination and outputs and inputs comes in.

In this case when I call Connect, I start with a source. A source is a raw end point. And I also use this destination. The source and destination contain basic information about how to talk to a node. So basically it's a way of finding it within the whole space of nodes.

So in this case these outputs and inputs are actually incomplete. Because first of all, the output that it gave me is like a token. It's like, okay, here's an output that you could have, but there are no guarantees made about that.

So I think somebody mentioned earlier there is kind of a race condition there. I don't know if it's necessarily like a race condition. It's like you've got this output. You can't really guarantee that it's going to be there so you have to keep that in mind. It's basically said this is like an output that I have. So Connect is actually the stage where you go through and you do the negotiation and you finalize that.

Another key point here is the format. So the format is actually, and I'll touch on this again in a little bit, but the format is where the bulk of the work that the application has to do falls in.

It's actually negotiating the format, and it does fall squarely within the realm of the application to actually figure out which formats are usable. It's not really that complicated and it's actually a little bit better that the application knows what it's dealing with.

Finally you also notice that I've passed back in these structures, which seems a little weird. Why am I passing them back again? What it will do is, as I've said before, these are incomplete, so this output doesn't have a destination on it because it hasn't been connected yet. What it will do is, it will fill in these structures with the proper information when it's finished. So if that returns B_OK, then I know that they're connected. If for some reason there is a failure, a bad format, or something, it will return the appropriate error, and in that case I can try again, or try to find another way of doing it.
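So the audio connection, as a sketch, might look like this, reusing the audioOutput and mixerInput found earlier; the format, output, and input are all passed by pointer so the roster can fill in the finished connection.

    // Propose the format the file node advertised on its output.
    media_format audioFormat = audioOutput.format;

    status_t err = BMediaRoster::Roster()->Connect(
        audioOutput.source,            // raw endpoint the buffers come from
        mixerInput.destination,        // raw endpoint they are going to
        &audioFormat,                  // negotiated in place
        &audioOutput, &mixerInput);    // completed connection info comes back here
    if (err != B_OK) {
        // Bad format or refused connection -- pick another format or node.
    }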

So in this case now I've connected things up here. I've got -- basically what's going to happen here is the QuickTime reader is going to read out the file, and it's going to output buffers that contain raw audio data to the mixer, which will mix it with whatever else is going on.

You also notice there is a little clock on the sound output icon on this slide. The sound output there is a DAC on the sound card, and what it actually does is publish the time. So whenever one of these wants to check "what time is it now?" they'll actually refer to something that's published by the sound output. It actually uses shared memory, so there is no overhead there, but it's important to note that the sound output is the time source.

When I create these nodes I actually will call SetTimeSource. I will call SetTimeSource on these nodes to tell them whenever you check the time for when you've got to do something, check this time source, and that's when you're going to do it.

So not to delve too deeply, but basically what they'll do is when he's processing, he's like, "okay, when do I have to produce my next buffer?" What time is it now? They'll do little conversions and he'll say -- or she -- I want to sleep this amount of time and send a buffer. So basically all their timing calculations are based on this time. So it's important that you kind of understand why you need to set a time source there.
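A sketch of that step, assuming the node handles from earlier; GetTimeSource returns the preferred (DAC) time source and SetTimeSourceFor slaves a node to it by ID.

    media_node timeSourceNode;
    BMediaRoster *roster = BMediaRoster::Roster();
    roster->GetTimeSource(&timeSourceNode);     // the sound output's time source

    // Slave the file reader (and, later, the decoder and video window) to it.
    roster->SetTimeSourceFor(fileNode.node, timeSourceNode.node);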

So the next step. We need to actually back up. We need to find a way to get from here [the QuickTime reader] to here [the Video Window]. Now, what we have at this point, we have an output from here and we have an input from here.

The input from here [the Video Window] is neat because we know what this takes. It takes raw video of some type.

The output from here [the QuickTime reader] is going to be encoded video of some type. So what we need to do is find a decoder that can handle it. In this case generally it's QuickTime. It's going to be like Cinepak, it could be Sorenson or some other type of encoding. So we need to find another node to do that decoding.

So we start out with this. This media_format was actually returned to us. It exists within the output that we got from the file node. This is one of the members of this format. So we assign it to this structure, videoOutputFormat. Then what we do -

Audience Member: It should be named "videoInputFormat."

Jeff Bush: I'm sorry. It was very early.

Christopher Tate: Since it's Gobe, I can change it right here.

Jeff Bush: No, no, no. That's okay. I'll explain it. It will all make sense, I promise.

So basically let's skip that for a minute. We have a media format that describes the encoded output of the file reader -- which is what goes into the codec. So this was found in our output, like I said. It's in the output. The output is a structure, and one of its members is its format, which is a media_format. What we want to do here then basically is we want to set its type to be -

Christopher Tate: That's right.

Jeff Bush: Okay, I'm sorry. I'm sorry. This is what the output of the codec is going to be. What type do we want the output to be? There are things that we know about the output, and things that we don't know. We know we do want it to be raw video. We don't know what the dimensions are because that's all encoded in the file. We don't know any of that type of stuff. So we actually set the entire structure up to be a wild card.

So basically now this says this format structure represents any type of raw video. This right here represents the actual encoded format that is output. And the reason we're setting these up now is we're setting these up because we're going to pass both of them in and say okay, I've got this. I want this. How are you going to get me there? And we start out here's our friend dormant_node_info again. This is the codec info. So we want to get information about the codec.

Then we call FindDormantNode. Like the outputs, it can find multiple nodes. We start out by passing in the codec node info, which is this guy. We pass in the number of nodes that we want -- in this case there's only one -- then the video input format, and the video output format. So basically we've passed in both formats that we want, and it's going to walk through all of its nodes and find one that can handle this as an input and this as an output, or one that at least says it does. We also say we want this to be of type B_BUFFER_CONSUMER and B_BUFFER_PRODUCER. This is not necessarily as important because obviously if it says it can do these things, it's probably both.

So basically the Media Server is actually taking care of most of the work for us, finding nodes that can handle these formats. Basically what I should have here now is a node that knows how to do these formats. So what the input format it's looking for is actually going to have in it is, okay, this is Cinepak encoding -- you know, whatever type of format it is. So then this guy here is our node.
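As a sketch of that lookup: the transcript calls it FindDormantNode; the roster call shown below is GetDormantNodes, used here on the assumption that it is the same lookup. The encoded format comes from the file node's video output (obtained with GetFreeOutputsFor and B_MEDIA_ENCODED_VIDEO, here called videoOutput), and the desired output is a raw video wildcard.

    // What goes in: the encoded video format from the file reader's output.
    media_format videoInputFormat = videoOutput.format;

    // What should come out: any kind of raw video.
    media_format videoOutputFormat;
    videoOutputFormat.type = B_MEDIA_RAW_VIDEO;
    videoOutputFormat.u.raw_video = media_raw_video_format::wildcard;

    dormant_node_info codecInfo;
    int32 codecCount = 1;    // we only need one candidate
    status_t err = BMediaRoster::Roster()->GetDormantNodes(&codecInfo, &codecCount,
        &videoInputFormat,      // must accept this as an input
        &videoOutputFormat,     // must produce this as an output
        NULL,                   // any name
        B_BUFFER_CONSUMER | B_BUFFER_PRODUCER);
    if (err == B_OK && codecCount > 0) {
        // codecInfo describes a decoder we can instantiate just like the
        // file node earlier.
    }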

So after that basically once we have that node, we can connect them as before. And actually I'll talk about this again even though we kind of did it already. So in the connection process and -- actually, could you go back to that frame with the connections? Connecting nodes.

So you'll notice here that we have a format. What we will do when we actually connect those nodes together is we'll take the format from the output of the file reader. We'll take that format and pass it into Connect and say this is the format I want you to connect with and they will actually negotiate, and hopefully they will connect.

We have two connections to make. So the first one we make is from the file reader to the codec, which shouldn't be a big deal. It's important though that we pass the file reader output's format. So we found this format using B_MEDIA_ENCODED_VIDEO, and then we pass in this format. And then basically that shouldn't be a problem.

Then what we do is we connect the output of the decoder to the input of the video. So in this case we know the format from the output, and we actually will attempt to grab the output from the decoder after we've connected the file reader to it, because in that case then it knows what the actual dimensions are, and the view that actually displays the data can actually look for stuff in that format, and then we connect those. I hope I didn't go through that too fast.
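As a sketch, those two connections, made in order, might look like this; codecNode is the instantiated decoder and windowInput is the video window's free input, both assumed from the earlier steps.

    media_input codecInput;
    media_output codecOutput;
    int32 n = 0;
    BMediaRoster *roster = BMediaRoster::Roster();

    // 1. File reader -> codec, using the encoded format found on the file output.
    roster->GetFreeInputsFor(codecNode, &codecInput, 1, &n, B_MEDIA_ENCODED_VIDEO);
    media_format encodedFormat = videoOutput.format;
    roster->Connect(videoOutput.source, codecInput.destination,
        &encodedFormat, &videoOutput, &codecInput);

    // 2. Codec -> video window.  Only after the first connection does the
    //    codec know the real dimensions, so fetch its output now.
    roster->GetFreeOutputsFor(codecNode, &codecOutput, 1, &n, B_MEDIA_RAW_VIDEO);
    media_format rawFormat = codecOutput.format;
    roster->Connect(codecOutput.source, windowInput.destination,
        &rawFormat, &codecOutput, &windowInput);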

So basically then what we have here is we have two pools of buffers between these guys right here. This will read some of the file. It will find some audio. It will package it up in a buffer and send it over to the system mixer. Then periodically, like every 30 milliseconds, it will say okay, time to send a frame of video; it will find some video and send a buffer of encoded video to this guy.

This guy here is a Cinepak decoder. Whenever it gets a buffer, it will look through it, decode it and then copy it into another buffer, which gets sent over to this guy here, which gets it onto the screen at the right time.

So all the time this guy here is maintaining the timing information and what happens that way. Maybe I should break for questions now because I can see I kind of skimmed over going through this stuff. Any questions?

Audience Member: Could you go back?

Now, is there any instance in which you would be using a different structure for the first two parameters and different structures for the pointers? For instance, you have a file audio source --

Jeff Bush: Yeah, you could potentially.

Audience Member: What reason would you have to do that?

Jeff Bush: You might have saved them from before. And take advantage of the fact that the mixer will just give you a new connection every time you connect.

Audience Member: Okay, thanks.

Jeff Bush: Bear in mind that these are sources and destinations. So if you've got the source and destination, you've got those nodes. You've got a handle. In many cases you might not want to have to look for that again and then you can get outputs and inputs. You can make multiple connections here without necessarily having to continually ask it for outputs. That's the node. And it will just fail when it's wrong.

Audience Member: I'm sorry, can you repeat that answer?

Jeff Bush: Sure. You could actually ask for an output of a node, and you've got an output and an input; right? You could call Connect multiple times just using the source and destination from those outputs that you've got, without repeatedly saying give me another output. Give me another output. Give me another output, because an output guarantees you nothing. Connect is the one that says, okay, either you're connected or you're not.

So in this case you can fill in these structures with a different one every time to maintain whatever structures you have. It is an important distinction and it's kind of a subtle detail. In many cases you won't need to deal with that. It will look exactly like this. It looks a little redundant because you're passing the same structure twice, but if you need it, it gives you the flexibility to deal with it, deal with it in a way that you don't always have to ask for an output.

Any other questions?

Audience Member: Why did you pick entry_refs versus BPositionIOs for the sniffing and SetRef things? Because you get excluded from certain network sources that you could hide behind those.

Jeff Bush: Well, I mean the first problem is that when you do a SetRef, you aren't guaranteed that the node is going to be in your team. So you can't pass a BPositionIO, because the address of this object doesn't necessarily mean anything to it.

The second problem with BPositionIO is that there is a copy that's done. Every time you do a read, it's going to do a copy and you would like it -- what you really want to do in a file reader is, you want to do 64K or larger reads, so you're totally bypassing the cache, read this big chunk and then work on that and pass that off and try to copy as little as possible. I don't want to get too deep into that. It's actually possible to construct a file reader that extracts a file without doing any copies, because it can actually find positions of that file with data chunks and set those in the buffers and send it off like that.

Audience Member: File position I/O.

Jon Wätte: The problem is going to be BPositionIO is not a file so you might want to get the attributes like file type or there are special attributes like media formats and since the file handler deals with the file, then it can actually get at those attributes. If you have a BPositionIO, you cannot do that.

Jeff Bush: So in the case where let's say for some reason I was -- this file handler exists in another team, like the Media Add-On Server, which we provide the flexibility to do. I take an address of this BPositionIO and pass it to it. Within that address space, it means nothing.

In a way it would be nice to have that kind of flexibility, you know, where you could say I want to pass it anything. But unfortunately at some point you've got to kind of cut it off and make things a little bit hard coded for the sake of efficiency.

One alternative to that, however, is you could make the QuickTime node actually take a multiplexed stream in. So the file reader would be dumb. All it does is take raw file data, pack it into a buffer and then send it. And then the multistream decoder will go through it and decode it and everything. So that way you've got the flexibility that you can pass buffers from anywhere, like a network stream, into there.

Kind of the disadvantage of that is that your file reader does not understand the concept of time. So any time that you try to make a node be time driven that has no concept of time, you get more complicated, and we had gone down this road several times, and you end up doing things like sending notifications back and forth, and it gets really complicated.

So yeah, at some point you just have to cut it off and say, this is the way it will work. And I think this is a pretty flexible way of doing it. I mean, there's nothing to prevent you also from using BPositionIOs internally, however you plan to do this within your node. I mean, you could make a node actually do this code, if that was the way you wanted to do it.

Any other questions?

So on to playing and seeking.

Basic playback is pretty simple as I said before. The nodes actually take care of most of this for you.

When you connect a chain, the first node is kind of the leader because it produces buffers, and everybody else in a way sort of acts passively because they receive buffers and they send them on.

So the way I kind of think about it, and you can think about it in different ways: they have threads, they actually do their own thing, but when you connect a chain like this, the first node is always kind of the leader, and everybody else is kind of passive. They take buffers and they pass them along.

So all operations I do, I do on the file reader. Start, Stop, Seek, and they actually will negotiate with themselves. The file reader will say, "oh, I don't have anything else to send you," you know, to further nodes downstream so they don't say, "where are my buffers?" So that's relatively straightforward.

Seeking gets interesting. On the MediaPlayer, you know, we wanted it to be as groovy as possible so we've got basically there's a position slider that shows you where you are on the file. You can grab it. You can scrub. You can fast forward. You can rewind. You can grab end points and move it in so it will loop around certain parts of the file.

So this brought up a couple of important issues. How we seek things in the file. How we determine our time.

At this point you kind of have to understand, you know, the concept, the difference between media time and performance time a little bit. What's actually going on.

From the application's side we really don't have to worry that much about what the performance time is or what these buffers are doing, and actually what's going to happen is that the file reader and there are other complications here, but the file reader basically does the translation between media time and performance time.

That's a little bit of an over-simplification, because like with encoding you've got other things, but for now let's just assume it hasn't changed and does that. We tell it, "go to this position." We're telling it a media time that we want it to go to and the performance times, it may still be playing it, but the performance time of the buffer has continued to increase. So it's taking care of all that for you. You just have to tell it that. So it's really a lot simpler to do it from the applications side.

So the first thing is fast forward and rewind. One way we could have done this, which we have not done yet -- it would be nice, and probably the coolest way to do it -- would be to actually tell it, okay, play your stuff faster or play your stuff slower, and it will actually fast forward the right way. But the one way you can do it that is guaranteed to work on nodes that do not or cannot support different speeds of playback is this: we'll wake up periodically, let's say every hundred milliseconds, and we'll seek ahead by a small amount. So it will actually skip, skip, skip, skip, skip, which is something you'll notice that a lot of these things tend to do.

So that involves basically calling the Seek function on the media roster. And passing in the node and passing it a new time. The way that you keep track of time in this case, on the file reader is that you know when you started playing this clip. You can find out from the time source how long you've been playing, and you can actually extrapolate where am I in the file?

Keep in mind that the file node is giving you no feedback about what it's doing. It's not telling you, okay, I'm at this time or at that time. You can't ask it, because obviously you can't be sending it messages and polling it and saying, okay, where are you now? There's overhead in sending it messages like that. So I actually keep track of it indirectly, just by keeping relative times. So what I'll do is, knowing the position in the file, I'll just let it stream, and when I seek, it will actually start playing from that time and everyone else down the chain will follow along.
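A sketch of that skip-ahead style of fast forward, assuming the fileNode handle from before; fastForwarding and currentMediaTime are placeholder application state, and the step sizes are arbitrary.

    #include <OS.h>             // snooze()
    #include <TimeSource.h>

    BMediaRoster *roster = BMediaRoster::Roster();
    BTimeSource *timeSource = roster->MakeTimeSourceFor(fileNode);

    const bigtime_t kMediaStep = 500000;       // jump half a second of media per tick
    while (fastForwarding) {
        currentMediaTime += kMediaStep;
        // Ask for the seek a little in the future so the node has time to react.
        roster->SeekNode(fileNode, currentMediaTime, timeSource->Now() + 10000);
        snooze(100000);                        // wake up every 100 ms
    }
    timeSource->Release();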

Audience Member: In performance time?

Jeff Bush: Yes.

Audience Member: Is there an enumeration that specifies for us, when we do the application, are we talking about frames? Are we talking about milliseconds or seconds? Are we talking about SMPTE? How do we specify what we want to forward to?

Jeff Bush: Time is always in microseconds.

Audience Member: Always?

Jeff Bush: Always. As far as these functions are concerned, we take a bigtime_t. It may be relative to a different time source, which may have a different concept of time, but we generally don't think in terms of higher level units like frames and stuff like that.

Jon Wätte: It is in logical microseconds relative to the time source that you're driving off of, which is typically your sound card, and which may not necessarily be the same concept of microseconds as system time or an atomic clock would return. So if you have a sound card, you know it's supposed to be playing at 48 kilohertz, or 48,000 samples a second. Then it will publish time: every 48,000 samples it plays, it will publish one logical million microseconds, which may not necessarily be exactly a million microseconds in the real world, but that is the time it is derived from.

So if you have a Layla that would be the SMPTE and you wanted to derive the SMPTE from the time it has published, you can easily do that by using the timing class, because the timing that is published is the real media time, the underlying media time, and not the real world time. So if you wanted to derive a SMPTE time code, you just convert the microseconds to the linear time code and then you're done.

Jeff Bush: Which brings up a good point and kind of brings me up into the next. So that's good.

Audience Member: How would you, or could you access a particular frame? I mean, I guess you said they don't store in terms of frames.

Jeff Bush: Right. That's an excellent question. That's exactly what I'm about to say.

So that was the next problem that I ran into. You know, it would be nice to be able to scrub and be able to go to a specific frame. To be able to go frame by frame. There are issues involved, you know, with, you know, having a video encoding, you know, where you have key frames and stuff, but let's ignore that for a moment and say let's assume that everybody takes care of that.

Basically I want to go a frame at a time so what I do is I actually will just look at the format and it will have a field rate which in this case corresponds to frame rate and I'll determine the period of a single frame.

When I want to display one frame, what I will do is I will call Start, and then I will call Stop with a time that is exactly one frame beyond that. And the kind of neat thing about the way that the nodes work is that they will queue events. And this is where you start to have to worry about performance time. Seeking makes things a little more complicated.

So what I will do is I'll take my time source and I'll say okay, what is now? And then I'll give it a little bit of a fudge factor because I know that there's some latency. There's some amount of time that it will take to do this. And then I'll use that as the performance time of the stop.

So whenever you do starts and stops you always pass in a performance time. The performance time says when you should do this. I could say "okay, in one hour I want you to seek back to the beginning" and I could pass "an hour from now" as the performance time. It won't execute that command until it gets to that point.

In this case we're dealing with smaller amounts of time. So what I'll do is I'll call Start with that time. And then I'll add the frame period and I'll immediately call Stop, but with that time. So I've queued up a stop and what it will do is it will start playing, as soon as it gets to that frame, it will stop and play one frame, assuming everything is written correctly in the nodes. But generally that's a very easy, simple way and that works very well for just showing a single frame.
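A sketch of that single-frame step, assuming the fileNode handle and the connection's raw video format from earlier; the latency fudge is an arbitrary pad.

    BMediaRoster *roster = BMediaRoster::Roster();
    BTimeSource *timeSource = roster->MakeTimeSourceFor(fileNode);

    // Period of one frame, from the negotiated raw video format.
    bigtime_t framePeriod =
        (bigtime_t)(1000000.0 / codecOutput.format.u.raw_video.field_rate);

    const bigtime_t kLatencyFudge = 50000;       // give the chain 50 ms of head start
    bigtime_t startTime = timeSource->Now() + kLatencyFudge;

    roster->StartNode(fileNode, startTime);
    roster->StopNode(fileNode, startTime + framePeriod);   // queued stop, one frame later

    timeSource->Release();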

Audience Member: Suppose during normal playback I wanted to keep it, say, a video and audio stream slaved to audio, yet say every minute or so I want to make sure that the -- that everything gets resynchronized to, say, some particular video frame into the video stream? Is there a way to do that?

Jeff Bush: Why do you want to resynchronize that?

Audience Member: I don't know why offhand, but is it possible basically to have a secondary master time source?

Jeff Bush: Well, I suppose anything is possible. It sounds like it would be a little more complicated. I mean, you have a concept of time already by this time source. Your video and your audio both know about this.

Jon Wätte: Okay. This is kind of intricate. You can't do that without skipping the audio. So what you have to do is you have to look at the time source that everyone else is looking at, and when it's not where you want it to be, you either seek the media or you seek the time source, which is when you will get time warps. The newsletter might go into excruciating detail about these and be updated again to talk more about it so, yes, it can be done and the details are to be worked on.

Audience Member: When is that going to be available?

Jon Wätte: I was under the impression that the Be Book was on the beta CD.

Jeff Bush: It is online now. There is some documentation now. I'm sure it will be updated again. But the stuff that's there will give you a lot of information and hopefully we'll continue to send this out.

Needless to say, that's an inherently complex thing when you try to do two things, and there are issues that are involved there, but yes, it is possible. It would probably involve making your own time source I would guess, but, okay, so any other questions? Does seeking kind of make sense?

So yeah, it's pretty straightforward, I guess.

Cleaning up. This is kind of the nitty-gritty details. It's important to be a good citizen and clean up after your nodes. We're improving this.

The original release of the Media Kit didn't deal as well when you didn't clean up things properly. We fixed a lot of that. You know, make watchdog threads, and there are a lot of details. But it still is always good to clean up your things properly, because if you don't, what's going to happen is things are going to have to time out, and it's better just to do it the right way.

So the first step is we must stop all the nodes in the chain that we've started. So I should mention at this point that when we first set up this chain, we called Start on the nodes in it. Now for the ones that were downstream from the file reader, the Start didn't really do anything to them. It just said, okay, start. Get ready to receive some buffers, and whenever a buffer comes in, process it -- as I said before, it's kind of passive. But we did call Start on those nodes, and that's important.

The first thing we must do is stop them. Starting from producer to consumer, and the reason is because they're sending buffers to each other. If you just start disconnecting them, you know, buffers are going to be en route and everybody is going to ask, "where did it go?" So we stop a node. We stop synchronously, which basically means that this call to Stop will not return until it's totally stopped.

So this gives us a guarantee that it will not send me any more buffers, assuming nothing is terribly wrong, after we call Stop. So we call Stop on the file producer. After Stop returns we can guarantee there are no more buffers. It won't send out any more buffers. And that's important because we want to make sure that there aren't buffers still en route -- that's bad. And then we continue going from producer to consumer so that basically the buffer queues, I guess you'd call them, sort of just empty out.

The next step is to Disconnect all the connections we've made. Once again we Disconnect in order from producer to consumer. The reason has to do with the way buffer ownership works. At certain points a node downstream will call the node upstream of it and tell it, okay, here, use these buffers. They're mine, but I'm letting you borrow them -- like in the case of bitmaps, where we'll use bitmaps to display stuff. So the upstream node will be using buffers that belong to someone downstream.

So once again we Disconnect from producer to consumer to make sure that they negotiate that and get all their buffers back the way that they need to.

Finally, we call Release on all the nodes that we have instantiated. And that basically tells us, whatever you need to do, Release. Just do it, whatever you have to do, as far as we're concerned.

An important note here: the mixer is a node. You shouldn't Release it. You shouldn't stop it or seek it. Just be very nice to the mixer. You can do bad things -- I mean, generally you'll affect other people. So there's not really a notion of security in the Media Kit. There may be a point where, you know, that can happen, but generally you have to be a good citizen.
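Pulling the teardown together, a sketch of the whole sequence might look like this, assuming the node handles and connection structures from the earlier sketches; stops are synchronous and run from producer to consumer, and the mixer is left alone, per the rule just described.

    BMediaRoster *roster = BMediaRoster::Roster();

    // Stop everything we started, producer first, synchronously.
    roster->StopNode(fileNode, 0, true);            // true = stop now, don't just queue it
    roster->StopNode(codecNode, 0, true);
    roster->StopNode(windowNode->Node(), 0, true);

    // Break the connections, again in the direction the data flowed.
    roster->Disconnect(videoOutput.node.node, videoOutput.source,
        codecInput.node.node, codecInput.destination);
    roster->Disconnect(codecOutput.node.node, codecOutput.source,
        windowInput.node.node, windowInput.destination);
    roster->Disconnect(audioOutput.node.node, audioOutput.source,
        mixerInput.node.node, mixerInput.destination);

    // Release the nodes we instantiated -- but not the system mixer.
    roster->ReleaseNode(fileNode);
    roster->ReleaseNode(codecNode);
    roster->ReleaseNode(windowNode->Node());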

New Information: In future releases of the BeOS, calling Release() on system-owned nodes like the mixer will have no ill effects. The new rule is this: When you are done with a node -- any node -- Release it.

So that basically covers playback. And a lot of these things seem like very little details. You know, hopefully we'll have the classes to cover those. But those are the basic things you'll have to deal with in any application is just putting nodes together. A lot of this stuff is done for your convenience. So that's basically an overview of how the media application works. Any questions?

Audience Member: I don't know what Be's policy is on this kind of stuff, but it seems like it would be really helpful if applications like MediaPlayer were available as source.

Jeff Bush: Yeah, absolutely. I'm not sure exactly what the plans are, but I know we are definitely working very hard to provide a lot of source. I mean, examples definitely help, and you know, we don't have a lot of really good ones out there. So we definitely are working towards that and hope to provide some source that can --

Christopher Tate: For example?

Audience Member: I was going to say as another reason to do it, especially if you want all your applications and nodes to be good citizens, having clear examples of how you are a good citizen is always helpful, in addition to the good documentation, of course.

Jeff Bush: Definitely. Absolutely.

Christopher Tate: The first thing that some of us in DTS are going to do after this conference is revisit all of the sample code and make sure that it's up to date and works and presents an appropriate example for exactly the right way to do all of these things.

In the case of playing media from a file, there was a newsletter article that covers this topic a while back by Eric Shepherd called "Simple Player." That code is available. It might be a little out of date. We haven't had a chance to look at it again in light of the changes for Genki, but it's probably basically sound.

Jeff Bush: Well, it's a pretty good citizen. But yeah, we'll get some --

Christopher Tate: One thing that we should restate is that when you stopped all the nodes, you started with the producers, and then you let the buffers flush downstream so you stop in the direction that data is moving in the stream. When you started the chain running, you start at the end of the chain and work backwards, generally. You say okay, get ready to receive some data and it's going to be coming.

If you don't do that, in some cases you might start a producer, which will then start generating buffers and send them along and then you'll start something downstream, and those buffers won't have really gotten -- they'll have backed up a little bit and so you'll be kind of scrambling to catch up with all this data.

So you can run into some jitter in your timing before things really get smoothed out to properly accounting for the latency.

Jeff Bush: Well, maybe two issues which I kind of -- it's important to first of all, connect the nodes before you start them. I think I said that. Obviously if they don't have somewhere to send to they get a little cranky. Yeah.

In general one thing you -- the kind of subtle detail of this is that you have to keep in mind that when a node starts, it potentially reads some data and it learns some things about it. For example, the video decoder doesn't know what the aspect ratio or the color depth or frame rate or anything of the video is until it actually reads some of the stuff. That's why you connect them in order, and you start them in order.

The other thing is, I'll also quickly mention what Preroll is. Preroll is basically an optional thing, but it's generally good to call it. You can call it on a node and say okay, get ready to do whatever you need to do.

Generally when a node starts, there is some time consuming stuff that has to be done. It may have to read some headers or build some stuff up. If you always did that when you got the Start for the first time, then you would be late right off the bat. Generally your first frame would be late and you'd be playing this catch up game until you got caught up. And that's generally not a good way to do this. So you call Preroll and that gives it a chance to do whatever processing it needs to do before it gets started. So we generally always call Preroll on the file nodes before we call Start.
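A sketch of the startup sequence with Preroll, assuming the same node handles as before; downstream nodes are started before the producer, all at a performance time slightly in the future.

    BMediaRoster *roster = BMediaRoster::Roster();
    BTimeSource *timeSource = roster->MakeTimeSourceFor(fileNode);

    roster->PrerollNode(fileNode);     // let it read headers and get ready up front

    bigtime_t when = timeSource->Now() + 100000;     // ~100 ms of slack (arbitrary)
    roster->StartNode(windowNode->Node(), when);     // consumers first...
    roster->StartNode(codecNode, when);
    roster->StartNode(fileNode, when);               // ...the producer last

    timeSource->Release();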

Audience Member: I have two questions.

Jeff Bush: Sure.

Audience Member: You mentioned you need to stop each node, and you said you also have to wait for the buffers to flush out. Do you have to wait or can you just sequentially go through and say stop, stop, stop, stop, and if you have to wait, how do you wait?

Jeff Bush: You don't have to wait. Well, first of all, it's important to mention that generally you never -- when you're in an application you don't wait on time. You just pass performance times.

If you were going to wait, you'd pass a performance time for that. You don't have to worry about that. You stop synchronously from the start of the chain to the end, and the stop, like I said, guarantees that the node is completely stopped before you go on to the next one. So you're safe.

And what will happen is when the node gets disconnected, it should be smart enough to recycle all buffers before cleaning up. So as long as you're not interrupting a stream it's playing, it will be happy.

Christopher Tate: Let me go into a little more detail there.

Audience Member: Is there a guarantee you won't interrupt?

Jeff Bush: Well, you start from the beginning.

Christopher Tate: Hang on a second. Starting, stopping are all as far as the nodes are concerned, those are the events that it needs to deal with just like receiving a buffer is an event that it needs to deal with.

When your application tells a node to start, what you're really doing is you're saying, there is an extra parameter that is the performance time to start or to stop, at which that operation should be performed. And that's again in terms of the time source that that node is slaved to.

So your application can say "start at some point 150 milliseconds in the future and then stop again a little bit after that" and you go 150 milliseconds and okay, start... and now I'm stopped.

When you are shutting down the stream of data and you want to get rid of the nodes, you probably want to just stop, guarantee that it is stopped and therefore, safe to be deleted, and you won't affect anything else about the stream after it's deleted and released from the system.

The synchronous stop is: you tell the Media Roster stop this node and don't return from that function call until the node is stopped.

Jeff Bush: No. There is a good point though. There's a port, and it can fill up with buffers. It may be possible that the first node has sent several buffers and they're waiting in that port. So when you stop the first one, you know that it's stopped, but you don't know that there are not still buffers in the port. What happens is, when you call Stop on the second node of the chain, it will actually read all of the buffers out of its port and recycle them and clean up after them.

Audience Member: If you stop them in order, will that already have happened by the time the second one stops, since you stop them synchronously?

Christopher Tate: Yes, depending on how fast buffers are formed.

Jon Wätte: No -- because the first guy stopped synchronously, there is not a race: any buffers that were already queued will be ahead of your Stop in the queue to the second node.

Christopher Tate: You had a second question.

Jon Wätte: Maybe you didn't hear that in the back. There are two ways to stop. There is asynchronous stop which is like any event, you queue it: "At this point in the future please go ahead and stop yourself."

And there's the synchronous stop, which is "when you receive this message, stop dammit." And don't return until you promise to not produce any more buffers.

And if you use the synchronous stop and you stop the topmost producer first and then each node down the chain in kind of head first order, then you will guarantee that nothing bad will happen to buffers, because the first node will be stopped and not producing buffers before that stop call returns to you and you stop the next node.

Therefore, your next stop command will come after any buffers that the topmost node may have already queued to the next guy.

Therefore, your synchronous stop will come after the last buffer that the previous guy sends, and therefore there is not a race in the tear down. Again, this is kind of internal that if you just do what we tell you, then you don't need to worry about it, but on occasion I understand that it's important to know why we tell you these things.

Christopher Tate: You had a second question, and then we can take a little break and let people go to the bathroom.

Audience Member: I'd like to stop and ask another question. A lot of file filters or file readers may not support future enhancements to that file format. For instance, new color space types. How can we appropriately find the appropriate node when we haven't gotten any information back from the file reader and we're looking for a node to handle this file type?

Jeff Bush: You're saying you get a format --

Audience Member: Let's suppose format X and there is X version A, B and C, and some nodes may read all of them, and some nodes may read only version A and B.

Jeff Bush: Like you could --

Audience Member: Would we have to go through the sequence until we finally connect and get a failure and have to go back and search for another node that supports version A?

Jon Wätte: If you're a node and you say I can handle this file -- or if you're a codec, then you will be told what form this is encapsulated in -- because there is the file format, like QuickTime or AVI or ASF, and there is the Indeo encoder or the Cinepak video encoder, and both of these come in different versions. But all the file splitter or producer needs to know is: what is the version of the file format? Is this a version of QuickTime or AVI that I can read? And if it can handle it, it says, I can handle it, and then it handles it. Otherwise, it's promising something it can't fulfill, which is a bad thing.

Inside this file, there may be an Indeo 5 or an Indeo 4 or an Indeo 3 stream, and when the codec is registered on the system -- this is a node specific detail, you can just assume it works -- it registers itself with the specific version of the format. So again, a codec will not promise "I can handle Indeo" and then only handle Indeo 5 when the file is in Indeo 3. It says "I can handle Indeo 5," and there has to be some commonality in language between the file producer and the decoder so that they know what Indeo is and what version 5 is, but that negotiation is per file format.

Like AVI has this way of identifying codecs. QuickTime has this way of identifying codecs. And we didn't want to define the Be way of identifying codecs just to add more confusion to the fray. We just say that the Be way of defining it is a big unit, and then you say which scheme this is for -- the AVI way of identifying or the QuickTime way of identifying. So the codec can register: I handle the Cinepak codec version 3 for QuickTime, and also Cinepak version 2.5 for AVI, for instance. So the system is there to make it work, so you shouldn't get into the situation that you describe, where someone says I do Indeo but then couldn't do this version of Indeo. Does that answer your question? Good.

Jeff Bush: Any other questions? Either everyone totally gets it or you're --

Audience Member: How is seeking the file handled if the file format doesn't support seeking to an arbitrary position in the file? For example, MPEG can only seek through certain positions, or if the format requires that you decode a certain amount of the file before the position that you want to seek to?

Jeff Bush: Right. Right. Sure, that's a good question. The first thing is, you can return an error from Seek. It would be kind of lame, but you could. Basically and generally in most formats like MPEG, you can position yourself in the file and look for a start code. So you can fake out a seek. That's the best way to do it. So I guess you should handle the case where a node says I can't seek and then you can't seek. But in general you can. So --

Jon Wätte: There is enough back information from the codec to the file format so that they can collaborate and actually figure out where to seek. For instance, if you can only seek every ten frames and you're seeking somewhere in the middle of the performance, there is enough of a back channel that you can make that happen -- run through those intervening frames before it's time to show the next one. Again, whether that works, or whether the decoder just gives you bad data until the next key frame is found, that is up to the file format and codec.

Jeff Bush: So yeah. I mean, in some cases, if it doesn't handle it right, you might see little screen boogers, like you'll see if it seeks into the middle of a group of frames. But there are ways that the nodes can negotiate where the key frames in a file are and seek to those. In general, as an application, you can assume it can seek, and if it returns an error, then assume that it can't seek.

Christopher Tate: Any other questions? It's about ten after 2:00. We're going to take about a 15 minute break. Lunch was only an hour ago. So try to be back here about 2:25 or so, so that we can kick off and do more excruciatingly wonderful detail.

