Be Developer Conference, April 1999

Introduction to the Media Kit: Media Roster

Stephen Beaulieu: We'll start back up again. Thanks for your patience. Sorry we got a little ahead of schedule this morning. We made the plan that we probably wouldn't start exactly at 9 o'clock, and then when we did, it kind of threw us off. Hopefully we are now back on schedule and can slow things down and make sure we cover all the issues.

What we've looked at so far is a basic overview of the Media Kit. We've looked at the fundamental pieces everybody uses to actually build a media application and the pieces the nodes use to talk to each other.

What we'll talk about now is putting them all together. For the rest of this morning's session I'm going to get into the meat of the concepts: how things work and why they work the way they do. We're going to finish up the morning section with application-node interaction. So when you say "Media Roster, start this node," what does it mean from the node side? That way everyone goes into the afternoon, regardless of whether they're going to the app track or the node track, with an idea of: okay, this is what I do, and when I connect these two nodes this is what the nodes do in the background. The app people will have an understanding, and the node people will understand: okay, when this function is called on me, what am I in the middle of?

Let's actually move on. Putting them together. Putting the building blocks together.

So first thing we're going to do is talk about time and the Media Kit. As we said, time is exceedingly important. It is the fundamental thing that drives the Media Kit. And what drives sending buffers downstream is timing.

We're going to talk a little bit about the differences between real time, media time, performance time. We're going to talk about latency, what it is, how nodes use it, how apps use it.

Then we'll get into the Media Roster: what every application uses for actually doing media work, and the sorts of things the Media Roster allows an application to do with nodes. And then, like I said, on to the interaction after that.

Time and the Media Kit.

So as I said, kinds of time. Media time, real time, performance time and the differences between them.

Media time is real simple. You have a file, a video file. It is five minutes long. Media time is how far into the file you are in terms of time. Are you a minute into a file? Are you 30 seconds into a file or whatever? It is a basic concept of where you are.

The people who really care about media time are the endpoints, the people either reading stuff out of a file or writing it to a file at the end of the chain, and it's used mainly for seek operations. You are playing information out of a file at time one minute, and suddenly the user wants you to play it at three minutes. You use media time to do that.

Real time. Also a fairly easy concept. Real time is the time as far as the computer is concerned. The rest of the Be Operating System goes off the CPU's clock. So the CPU says it is Thursday, it is 10:30 today. And that gets translated into some number of microseconds since system start. It is what everything else in the system uses.
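
To put that in code terms: real time is just what the kernel's clock reports in microseconds, a signed 64-bit value. A minimal sketch using the standard system_time() call:

    #include <OS.h>        // system_time(), snooze(), bigtime_t
    #include <stdio.h>

    int main()
    {
        // Real time: microseconds since the machine booted.
        bigtime_t before = system_time();
        snooze(1000);                          // sleep for roughly one millisecond
        bigtime_t after = system_time();
        printf("elapsed: %ld microseconds\n", (long)(after - before));
        return 0;
    }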

Real time is what has to be used by media applications when they want to interact with the rest of the system: when they want to send a message to someone and have to write to a port, or when they are reading from a port and only want to wait a certain amount of time before they have to handle their next event, for example, before they have to start or stop. When you pass that time, it has to be in real time so that the system can understand it.

So let's get to the big one. Performance time. You need to understand this concept. What we're going to do after I get through these next two slides, if you don't understand performance time, is stay here on performance time until you get it. The Media Kit does not work unless you understand performance time. If we do nothing else today, you have to know that or you won't be able to write the nodes or the apps.

Performance time is the time an event is supposed to be performed. Think of it again from the audio example. Performance time is the time that the sound comes out the speaker. Performance time is the time that the video frame shows up on the screen. The time it's performed.

Performance time is going to be reported by some sort of time source. Some sort of clock that says it's now so the data that is supposed to be performed now should be performing now, which is very clear I'm sure. Let's step back.

You've got a clock, and the clock says it is time 50, and the clock that keeps track of performance time is generally at the end of your stream.

So, again, in the audio example, inside the actual audio card there is a digital-to-analog converter, a DAC, that the card uses as its clock for determining what sound to send out, to actually write the data to the hardware so it goes out to the speakers.

So in many, many cases, the time source for your application, for your application's node chain, is going to be the audio time source. It says: I'm supposed to be playing the data that is stamped to play at time 50, and if there is no data, I'm going to play silence. It's at the end of the stream, generally, at the point where things have to be performed.

We have this concept of a time source with which we can make that clock accessible to applications and nodes, so they can see what time it is and make sure they can do all the work they need to do so the data reaches the place it is performed at or before the proper performance time.

Generally, there is one clock for a set of nodes, one clock that says here's what time it is.

So let's move on to what the heck the difference between performance and real time is.

So for starters, the reason why performance time and system time can differ is that a different piece of hardware is actually the clock that says what time it is.

So the system, which keeps track of real time, the CPU, says here's what time it is, it is time 50. It has been 50 minutes since the system started up.

The audio card is also keeping track of time in the same sort of way, in microseconds. But its clock gets started a little bit later than the CPU's. And its concept of what a microsecond or a minute is could be slightly different because it runs at a slightly different rate. So it might think it's 49 minutes and 59 seconds as opposed to 50 minutes.

So generally you can get a drift between performance time and real time, either because the hardware runs at a slightly different rate or because the clocks start at a slightly different offset.

Does that make sense, hopefully, to people? Good.

That is one reason why performance time is going to differ. That is when you look at the very end of the chain. The other reason that performance time is different is because of latency.

So what is latency? Latency in general is the time it takes to do something. I have a buffer of audio that I need to change in some way. It takes some amount of CPU effort and some length of time passes to actually perform the manipulation on that data. That is processing latency inside a node. It takes me a millisecond to perform whatever action I need to perform.

There is also a concept of downstream latency. Say you're the upstream node, the ultimate producer: you're reading audio out of a file. You have some sort of latency that accounts for having to hit the disk, read the information out of the file, and put it into buffers to get it ready to send downstream.

Then there is also the concept that I've got a filter node, the mixer, and the audio output card downstream of me that I need to send this information to.

So the filter is going to take some amount of time to process things. The mixer is going to take some amount of time to process things. The sound card will take some amount of time to actually get the data from the buffer to the point where it plays. And transferring the information from one node to another is not instantaneous. They're different threads. You're going to have some context switches, some sort of basic overhead, so the buffer does not magically appear. Work has to be done. That is the downstream latency. Anything past your node is downstream.

This all combines, so it can be a large number, for some value of large. It could take, you know, 50 milliseconds from when the file producer is ready to send something downstream to when it is played. It takes time to process it and time to transfer it. So it's important.

So let's show an example. Look at the clock: it's performance time 40. You're in the first node, and you have a buffer that has been designated to perform at time 50.

The first node has 3 ms of internal latency to produce the actual buffer. Three units of time. We'll call them milliseconds, even though that is not a realistic figure, it would not really take 2 ms to transfer a buffer, but milliseconds give us something better than an "um."

The time is 40 ms. The buffer is supposed to be played at 50 ms, and there is a total latency for the chain of 10 ms; therefore, the performance time stamped on the buffer is 50.

The performance time at the output is 40, which for our purposes is pretty much equivalent to real time; that's the time everything is at. It's close enough for the downstream end.

But this node, Node 1, has to start doing its work 10 ms before then to make sure the buffer gets delivered on time and is performed at the proper performance time.

So let's go ahead and move the buffer. Some time has passed, five milliseconds, and the buffer has finally gotten to the next node.

The performance time of the buffer still hasn't changed. We have five more milliseconds of total latency left: two inside this node and another three downstream of it. So it needs to get that buffer and start processing five milliseconds early.

So it gets it. It gets it on time. Node 1 has behaved properly. It has delivered on time and it's there. So now Node 2 starts its processing and passes it downstream.

Now, there is one millisecond of internal latency in the final node. It receives the buffer at time 49, for a buffer that is supposed to be played at time 50. It does its millisecond's worth of work and finally...

...you're done. It's performed. Everything works fine. This is the way the system is supposed to work.
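
To make that arithmetic concrete: a node can ask its time source at what real time it has to start on a buffer, given the buffer's performance time and the total latency it accounts for. A minimal sketch, assuming the BTimeSource::RealTimeFor() call described in the Be Book; the numbers are the ones from the slide:

    #include <TimeSource.h>

    // Hypothetical helper: when, in real time, must we begin working on a buffer?
    bigtime_t WhenToStartWorking(BTimeSource* timeSource,
                                 bigtime_t bufferPerformanceTime,   // e.g. 50000 (50 ms)
                                 bigtime_t myProcessingLatency,     // e.g. 3000  (3 ms)
                                 bigtime_t downstreamLatency)       // e.g. 7000  (7 ms)
    {
        // RealTimeFor() converts a performance time into the real (system) time
        // at which work must begin, given the total latency we account for.
        return timeSource->RealTimeFor(bufferPerformanceTime,
                                       myProcessingLatency + downstreamLatency);
    }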

Now, it does work when the nodes behave properly. The key to this is that every node, when it says how much is my processing latency, how much time does it take me to do something, needs to report the maximum time it could take to do that processing. Because we don't want a system where a node says I can generally do my processing in a millisecond, but it could take as long as four. If you don't know what the data is beforehand, you need to report your internal latency as four so you can make sure everything arrives and is performed on time.

Is this clear? Does this make sense to people, the difference between performance and real time? Question?

Audience Member: What I find a little bit difficult to understand is how are they related. Time isn't an absolute quantity, it is relative always. So what does performance time mean? I mean, you have a performance time 50, but what does it mean? When will it appear on the output?

Stephen Beaulieu: Okay. When it will appear on the output will depend upon the output's time source.

Like I said, if the driver for the sound card starts up a millisecond after the system starts up, its concept of time will say it's time zero, but that's actually a millisecond in, as far as the computer's time is concerned. And it might say, for every 1,000 milliseconds that go by on the system clock, I only report 999.9 of those milliseconds.

Audience Member: Clock drift.

Stephen Beaulieu: You have a clock drift.

There is this idea of a time source object in the system, a kind of node we talked about earlier. It understands the relationship between that clock's sense of time and real time.

What it does is periodically say: this is my concept of now, this is the performance time, this is the actual real time in the system, and here's how much I seem to be drifting from that, for calculating into the future.

So each of the nodes and the application has a time source set for it. It can ask what time it is. It can also ask: given this performance time, I'm supposed to perform something at 50, well, what does that mean?

It can ask for the real time for that performance time, given its reported latency, and the time source will give that information back: at the current rate, to actually deliver the buffer at time 50, you need to start at real time "whatever," which is what is then used for doing the system interaction.

We have classes for this. That's the time source class and node. Both applications and nodes can get access to that and figure out what time it is. Does that answer the question? Great.
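
Under the hood the time source is essentially publishing a reference pair plus a drift factor, and the conversion between the two kinds of time is a linear mapping through those values. A rough sketch of the idea; in a real application you just call BTimeSource::PerformanceTimeFor() and RealTimeFor() and let the kit do this for you:

    #include <SupportDefs.h>   // bigtime_t

    // The time source periodically publishes: "at real time R my performance
    // time was P, and I am drifting at this rate relative to the system clock."
    struct published_time {
        bigtime_t performance_time;   // P
        bigtime_t real_time;          // R
        float     drift;              // performance microseconds per real microsecond
    };

    bigtime_t PerformanceTimeFor(const published_time& t, bigtime_t realTime)
    {
        return t.performance_time + (bigtime_t)((realTime - t.real_time) * t.drift);
    }

    bigtime_t RealTimeFor(const published_time& t, bigtime_t performanceTime)
    {
        return t.real_time + (bigtime_t)((performanceTime - t.performance_time) / t.drift);
    }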

Audience Member: The number system for denoting time, is it a circular number system or is it a very long number?

Stephen Beaulieu: It is a signed 64-bit integer.

Audience Member: Is there a convention for a short form, a circular number system, perhaps 16 bits, which is compact that you handle inside small DSPs?

Stephen Beaulieu: If you had a DSP, someone would have to translate. There is not a convention for doing it right now. Time in BeOS in general is handled inside of 64-bit integers. There are the normal C calls that can get you a 32-bit integer. I don't know of a call that will get it down to 16.

Audience Member: The least significant bits and the proper conventions for handling rollover of a circular numbering system can keep this a little more efficient in terms of handling.

Stephen Beaulieu: Yeah.

Jeff Bush: You're talking -- it is a nightmare.

Stephen Beaulieu: What I can say now, and I'm sorry, I don't have all of the answers: what I do know is that time is 64 bits, and I do not know of any conventions for keeping accurate time in less than that.

That is kind of the issue. We deal in microseconds since start. To be able to manage our low latency, you might need to wait for only 500 microseconds. You can't get that sort of accuracy with smaller numbers unless you do the rollover, like you said. But I don't know of any provisions to do that. That is again something we would need to talk to the engineers about, but I don't have that answer. Sorry.

Audience Member: Are there functions to figure out, like, how fast your disk is running or how long it would take to just do a memcpy? You want a real accurate and sort of maximum value for your processing time, but I don't know how to do that. I don't know how to ask a disk how long could it possibly take you to seek to this point.

Stephen Beaulieu: There are a couple of key issues here, which we'll get to in a little bit: run modes. Things like reading from disk, that is an off-line sort of process. You can't get any guarantees about how fast that is going to go, which means a file reader needs to be intelligent enough to read ahead before the information is needed.

But you can get into a bottleneck where there is just so much processing going on, or you're trying to produce so much media, that, if it is a file writer, for example, the buffers are coming in faster than they can be written to disk. That's just a limitation of the system and the hardware you are dealing with.

What can be done is that the actual writing of the information can be taken out of the loop. The file writer, for example, reports some sort of latency for accepting buffers, has a big circular buffer chain, and passes off the responsibility of actually writing to disk to a separate thread, so it can keep the latency, as far as incoming buffers are concerned, low.

You will still run into the problem that if you can't write to disk fast enough, you will start getting stalls and glitches if you are trying to do it in real time. But if you are writing to a disk and you want all the bits in the file, you're not trying to do it in real time. So it depends.
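
One way to sketch that "hand the disk writes to another thread" idea with ordinary BeOS threading primitives; this is not the Media Kit's own file writer, just an illustration of the pattern, and it omits the overflow handling real code would need:

    #include <OS.h>
    #include <stdio.h>
    #include <string.h>

    // Hypothetical ring of pending writes; a real node would recycle BBuffers.
    static const int    kSlots    = 16;
    static const size_t kSlotSize = 4096;
    static char   gRing[kSlots][kSlotSize];
    static int32  gHead = 0, gTail = 0;
    static sem_id gFilled;               // counts slots waiting to be written
    static FILE*  gFile;

    static int32 writer_thread(void*)
    {
        // Drain slots as they fill; disk speed no longer adds to buffer latency.
        while (acquire_sem(gFilled) == B_OK) {
            fwrite(gRing[gTail], 1, kSlotSize, gFile);
            gTail = (gTail + 1) % kSlots;
        }
        return 0;
    }

    void StartWriter(const char* path)
    {
        gFile   = fopen(path, "wb");
        gFilled = create_sem(0, "pending disk writes");
        resume_thread(spawn_thread(writer_thread, "disk writer",
                                   B_NORMAL_PRIORITY, NULL));
    }

    // Called from the node's buffer-handling path: copy, signal, return quickly.
    void QueueForDisk(const void* data, size_t size)
    {
        memcpy(gRing[gHead], data, size < kSlotSize ? size : kSlotSize);
        gHead = (gHead + 1) % kSlots;
        release_sem(gFilled);
    }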

In terms of what's internal to a node, one of the things we suggest nodes do is take advantage of the fact that when they get connected they know what their format is. They know in general what needs to happen.

They can do a test run to determine: here's my average case, or this is the worst it can be for me. The node creates a buffer of data, runs its processing loop, and says, on this machine this is how long it takes me to do this in the worst case; that's what I will report. So that's what I would suggest doing.
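
A rough sketch of that kind of preflight, timing your own processing over a few scratch buffers of the negotiated size and keeping the worst case; ProcessBuffer() here stands in for whatever work your node actually does:

    #include <OS.h>        // system_time(), bigtime_t
    #include <stdlib.h>

    // Hypothetical hook: the per-buffer work the node performs once running.
    void ProcessBuffer(void* data, size_t size);

    bigtime_t MeasureProcessingLatency(size_t bufferSize, int trials)
    {
        void* scratch = malloc(bufferSize);
        bigtime_t worst = 0;
        for (int i = 0; i < trials; i++) {
            bigtime_t start = system_time();
            ProcessBuffer(scratch, bufferSize);
            bigtime_t elapsed = system_time() - start;
            if (elapsed > worst)
                worst = elapsed;
        }
        free(scratch);
        return worst;      // report this (or a little more) as your internal latency
    }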

Audience Member: If I have multiple streams, is one of the streams designated the master performance time clock so that everything will be sync'd up to that, or would each stream have its own performance time?

Stephen Beaulieu: I'll dive into this throughout the day.

If they're completely independent streams, you could set them up with different time sources. However, if you wanted to keep them in sync, you would pick one time source, either one of the members of your stream or some external time source. Even if you're not doing audio, just multiple video streams that you're working with, you can still use the audio time source as your sense of time.

You can set all of the nodes in the chain to pay attention to that one clock, and that's what we suggest people do to keep things in sync. But there might be some circumstances where you don't have to do that. If you don't need things in sync, say you're just doing file writing, you don't have to. But that's what you should do in the other cases.

Audience Member: If you have a node that's essentially configurable, like a video processor, in one mode the latency is like a millisecond, in another mode it's like 40 milliseconds. In general, I think you would want to report it as one millisecond if it's in the first mode.

But the question is, is there a way of notifying the upstream members that the latency is changed if it is reconfigured?

Stephen Beaulieu: Yes, there is a way to do that. And we'll go into it in the abstract. You send a notice upstream saying you're late.

Now, they've been producing everything on time, but you send the notice upstream and say: you're now late by 39 milliseconds, it's not your fault, but I need my information 39 milliseconds earlier. They should then go and correct everything appropriately, depending on their run mode.

Again, that's in the afternoon sessions on both sides. We'll be touching on exactly those sorts of issues: how nodes need to deal with that, and how apps should set things up to make sure nodes behave the way they expect.
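
For reference, the notice itself travels through the latency hooks on the connection; a sketch from the consumer's side, assuming the BBufferConsumer::SendLatencyChange() call roughly as documented in the Be Book (MyConsumer and its members are hypothetical, and the required node hooks are omitted):

    #include <BufferConsumer.h>

    // Hypothetical consumer node; only the part relevant to the latency notice
    // is shown, the required BBufferConsumer hooks are omitted.
    class MyConsumer : public BBufferConsumer {
    public:
        // Call this when the node is reconfigured into a slower mode.
        void InternalLatencyChanged(bigtime_t newLatency)
        {
            // Tell the producer feeding us that we now need buffers earlier;
            // a well-behaved producer reacts in its LatencyChanged() hook.
            SendLatencyChange(fUpstreamSource, fMyInput, newLatency);
        }

    private:
        media_source      fUpstreamSource;  // remembered when we got connected
        media_destination fMyInput;
    };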

Audience Member: Well, I think you went pretty far to describe it. What happens if I am late, something beyond my control delays my ability -- or say I misguess, and all of a sudden I'm outrageously late. What are the consequences to the system and how do I recover from it?

Stephen Beaulieu: It depends. We will get to this this afternoon. But basically you can tell a node to run in certain modes. So you can say: you're in a mode where if you are late you should drop data until you can catch up. Don't send me those buffers; just keep skipping ahead of where you are supposed to be until you can catch up.

Or you can say, well, I really do want every buffer. I'm writing to a disk; how fast or how slowly I do it doesn't matter. So I set everything to an offline mode, in which case you don't actually care about time, you just care about producing the buffers and writing them to disk. Each piece can go as fast or as slow as it likes because you're not constrained by some sort of live performance that you need to keep each bit in sync with.

So the application can specify for every node what sort of mode it should be in and how it should behave in case it gets late.

Like I said, we have guidelines saying if you are writing that sort of app, here's how you should set the nodes up. And then the nodes need to be able to handle being in that mode and do the right thing.
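
From the application side that policy is just a run mode set per node through the roster; a small sketch, assuming the BMediaRoster::SetRunModeFor() call and the BMediaNode run_mode constants (the node variables are placeholders for nodes obtained elsewhere):

    #include <MediaRoster.h>
    #include <MediaNode.h>

    void SetModes(const media_node& liveProducer, const media_node& fileWriter)
    {
        BMediaRoster* roster = BMediaRoster::Roster();

        // A node performing live: if it gets late, skip ahead rather than stall.
        roster->SetRunModeFor(liveProducer, BMediaNode::B_DROP_DATA);

        // A node rendering to disk: ignore the clock and keep every buffer.
        roster->SetRunModeFor(fileWriter, BMediaNode::B_OFFLINE);
    }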

Audience Member: If I'm a node and I do a little preflight of my process before show time and I come out with a number, once I'm in the pool now with like 10 other nodes, they're all working at the same time, can I expect the same performance I got in my preflight?

Stephen Beaulieu: It might differ some. It will honestly depend on how many things are up and running in the system. You can get to a point where you become CPU-bound; the processor just can't handle it.

Audience Member: That's why I asked the question. How do you know ahead of time?

Stephen Beaulieu: What you can do is a best-guess estimate. If you discover it changes, we have mechanisms so that you can report that information upstream, to basically say it's taking me this much longer, and if all the nodes are well-behaved and do that, then they can sync themselves up.

Audience Member: Is there an almost-late, which is to say, fix it before it fails?

Stephen Beaulieu: An almost-late? I guess I'm not --

Audience Member: It sounds like the late mechanism is quite elegant in the way it can retune the pipeline. Is there an almost-late mechanism, which is to say if you have a guard zone buffer upstream, you actually want to retune the pipeline before this buffer runs dry?

Stephen Beaulieu: Yes.

Audience Member: There is an almost-late construct that allows the pipeline to retune before you have a gap in your audio stream?

Stephen Beaulieu: No, there is no actual construct. Each node would have to basically say I know I'm supposed to produce this data at this time. If it is a filter in the middle of a chain, it still has the concept of the format and it knows how often it should be getting things.

If it finds that buffers are arriving increasingly close to when they're needed, it can basically bump things up beforehand. But I don't think we have a construct for that.

Audience Member: In the output stage just before you make it to your D-to-A converter, it's late in the sense of going into that guard zone, not actually being late for the DAC. So if you are running short on time, not yet out of time, you retune the pipeline.

Stephen Beaulieu: Right. That would be a node implementation. We don't have anything built into the system that does that automatically. The node writer would have to do that in the node if they thought it was a good thing to do.

Audience Member: I would figure that the I/O structures would take on the responsibility of the almost-late, and that there would be guidelines in your API for implementing an almost-late.

Jeff Bush: That is in there. You can implement it.

Audience Member: When you jump in these nodes --

Stephen Beaulieu: Yes.

Audience Member: You described performance time primarily as a playback function. What does it look like in a capture process, saving to disk or sending over a network?

Stephen Beaulieu: I will do it quickly. That is also coming up in the slides this afternoon in the two different tracks.

Basically, we have a run mode called recording mode. Nodes in recording mode, the actual capture nodes, don't stamp the buffer with the time it is supposed to be played on the outside; they stamp the buffer with the time it was captured.

That's useful if what you are trying to do is keep things in sync, like a live chain capturing audio and video to process them, where I need to make sure the right frames of audio go with the right frames of video out of the stream.

What we have is the ability to capture in recording mode so that everything starts out with the same time base, and we do a transformation inside the kit to real performance time on the outside.

We do have some mechanisms for doing that. That is a little complicated to get into now. We'll touch on that later in the day. And if we haven't quite explained it, ask us after we have done the first pass with the slides.

Audience Member: I just wanted to point out, when you are doing measurement of your own performance, you might be using some artificial data that is small, so everything gets into the cache and your performance looks better than it really is. When you are testing your own performance, I think it would help to allocate a big buffer and slam it all through the cache and measure that, so you are also measuring the time it takes to load the data into the cache.

Stephen Beaulieu: The way this works is, you don't do that. This happens at connection time, but before you are started. If everyone is connected and then you start everyone, we've got a notion, which we will get to in a little bit, of a preroll: do your preflight, do whatever you have to do to be able to start on time.

But you have the format that you connected with. You know the size of the buffers that could potentially be sent to you. You would use one of those buffers and test accordingly. You wouldn't want to run a little test program on unrelated data, because that is meaningless.

You wouldn't want to take something small that fits in a cache. The question is not how good can you possibly perform; it's how well will you perform under the set of circumstances you actually have to perform in. And you want to run your preflight test on what you actually have to work with.

Audience Member: I'm a bit confused about performance time. If I'm doing a video record node, I've already got time within my SMPTE code. How is it correlated?

Stephen Beaulieu: For starters, if you have the right hardware, you can sync every node in the chain to the SMPTE. So it is the same. Our time source design allows you to have an external time source say here's what time it is.

But the SMPTE time -- I don't know enough about SMPTE and I don't know enough about the video. Chris Tate, one of our DTS engineers, will be able to.

Jeff Bush: Let me speak very briefly about that. In general, and we'll be discussing this a little more, you would prefer to drop video data rather than audio data, because people will notice dropped audio more.

If you have multiple kinds of data coming in, you've got video that's SMPTE time-coded and audio from some other source that is not SMPTE-synchronized, you need to decide on a single canonical time source so that you are looking at the time stamps on both of those kinds of data in the same time space.

So the video capture node would be instructed that instead of stamping buffers with the SMPTE time, it should stamp the buffers with the audio's time source.

And that node is responsible for converting from its internal clock source to as close to that same representation as it can get with the other time source, which allows the downstream nodes that are handling the data to try to synchronize as closely as possible.

They will drift apart. They will not line up perfectly. You will get data loss in one direction or the other, or you will have to interpolate, depending on what your application is.

It is tricky to synchronize sources that are not externally synchronized. And if you really don't want to lose data and want to keep things in sync, you have to use an external clock. Thank you.

Stephen Beaulieu: Because we've started to get into some of the issues, we're going to move on to talk about the media roster and the interaction because we have an hour left now before lunch.

So the Media Roster. We're talking about what it is and the sorts of things you can do with it: specifically, finding and creating nodes, connecting them, controlling them, and telling them to display a user interface.

What is the Media Roster? It's an object that manages an application's communication with nodes. Every media application needs one of them, and every media application gets only one of them. You have one common interface.

The way you get it is just a static call that returns your Media Roster. Again, it is important to note that applications deal with the Media Roster; nodes don't. Applications only deal with the Media Roster. The Media Roster is how you do everything.
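
That static call looks roughly like this; checking for NULL is the usual pattern, since the roster is only available when the media server is up:

    #include <MediaRoster.h>
    #include <stdio.h>

    int main()
    {
        // Every media application shares this single roster instance.
        BMediaRoster* roster = BMediaRoster::Roster();
        if (roster == NULL) {
            fprintf(stderr, "media server not available\n");
            return 1;
        }
        // ... use the roster to find, connect, and control nodes ...
        return 0;
    }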

Let's move on. You can find and create nodes. There are a couple of ways you can do that. We have shortcuts to common nodes, and we can browse all the available nodes and create them as necessary.

The shortcuts are really easy. You have one system mixer, one node that the BeOS creates that is hooked up to the default sound output and is the mixer for that output.

We don't make you have to browse through all the potential nodes to find it. We give you a simple roster call: GetAudioMixer. We also do that for the default video input and output and the audio input and output, the really common nodes that everyone in some case or another needs to use if they're capturing or performing.

We give you the shortcuts. There is no need for you to do the work. That also means that, through our new Media preferences panel, if you have multiple video capture cards, you can decide which one is the default input, for example.
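
The shortcuts are single roster calls; a sketch assuming the GetAudioMixer() and GetVideoInput() calls as documented in the Be Book:

    #include <MediaRoster.h>

    void FindDefaultNodes()
    {
        BMediaRoster* roster = BMediaRoster::Roster();

        // The system mixer feeding the default sound output.
        media_node mixer;
        if (roster->GetAudioMixer(&mixer) == B_OK) {
            // ... connect your producer's output to the mixer here ...
            roster->ReleaseNode(mixer);
        }

        // Whatever the user chose as the default video input in the Media panel.
        media_node videoIn;
        if (roster->GetVideoInput(&videoIn) == B_OK) {
            // ... hook the capture node into your chain here ...
            roster->ReleaseNode(videoIn);
        }
    }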

We have a different concept when you are dealing with something that is not a default node: an application that wants to find out all of the nodes available in the system. What could I possibly do with everything? What are the capabilities of this installation of the BeOS, with these apps, on this machine?

It can go through and find dormant nodes. Dormant nodes are nodes that live inside of add-ons but that may not have been created yet. (We also have ways of figuring out what nodes are already up and running in the system.)

But if you want to figure out the full capabilities of the system, you need to go through the dormant nodes. Each dormant node is a handle to a flavor of a node. A flavor is a full description of what that node is: what its name is, what sorts of formats it deals with.

You can find out what sorts of file formats this particular node from this particular add-on can handle.

So this is again fairly straightforward. The application talks to the roster and basically says: I want to search through all the live nodes or the dormant nodes. What the roster does is return a dormant node structure that basically says here's something that is available, which you can then ask about: okay, I'd like more information. It is an iterative process.

You can start walking through all of the available nodes. For each of those that come back, you can walk through all the flavors so you can see the entire system.

You can then ask the Media Roster to create one for you. You have found your Cinepak decoding node, the one that does what you need. You can ask the roster: make it for me and return a handle to me, a media_node construct that I can use to talk to it now and later.

That is how the system works. Again we'll go into much more detail this afternoon how the apps should do that, and from the node side, how you should report that information back.
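
Browsing and instantiating is a two-step conversation with the roster; a sketch assuming the GetDormantNodes() and InstantiateDormantNode() calls roughly as in the Be Book (this one just asks for anything that can output raw audio):

    #include <MediaRoster.h>
    #include <MediaDefs.h>
    #include <stdio.h>

    void ListRawAudioProducers()
    {
        BMediaRoster* roster = BMediaRoster::Roster();

        // Only flavors whose output format is raw audio.
        media_format outputFormat;
        outputFormat.type = B_MEDIA_RAW_AUDIO;
        outputFormat.u.raw_audio = media_raw_audio_format::wildcard;

        dormant_node_info info[32];
        int32 count = 32;
        if (roster->GetDormantNodes(info, &count, NULL, &outputFormat) != B_OK)
            return;

        for (int32 i = 0; i < count; i++) {
            printf("flavor: %s\n", info[i].name);

            // Ask the roster to create the first one and hand back a media_node.
            if (i == 0) {
                media_node node;
                if (roster->InstantiateDormantNode(info[i], &node) == B_OK) {
                    // ... connect it, start it, and eventually release it ...
                    roster->ReleaseNode(node);
                }
            }
        }
    }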

Another thing you can do with the Media Roster: for any node that's been created, you can browse all of its available connections. You can browse for all of the connections that are actually hooked up to somebody, and you can also browse for how many potential connections could be set up.

Then, after you have determined what connections are available, an application can tell the Media Roster: break this connection between these two nodes, or create a connection between these two nodes. Again, we'll go into detail on the actual functions a little bit later.
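
Wiring two nodes together follows the same shape: find a free output on one and a free input on the other, then ask the roster to connect them with a format they negotiate. A sketch assuming the GetFreeOutputsFor()/GetFreeInputsFor()/Connect() calls as in the Be Book:

    #include <MediaRoster.h>
    #include <MediaDefs.h>

    // Hypothetical: connect the first free raw-audio output of 'producer'
    // to the first free raw-audio input of 'consumer'.
    status_t ConnectAudio(const media_node& producer, const media_node& consumer)
    {
        BMediaRoster* roster = BMediaRoster::Roster();

        media_output out;
        media_input  in;
        int32 count = 0;

        status_t err = roster->GetFreeOutputsFor(producer, &out, 1, &count,
                                                 B_MEDIA_RAW_AUDIO);
        if (err != B_OK)
            return err;
        if (count < 1)
            return B_ERROR;

        err = roster->GetFreeInputsFor(consumer, &in, 1, &count,
                                       B_MEDIA_RAW_AUDIO);
        if (err != B_OK)
            return err;
        if (count < 1)
            return B_ERROR;

        // Start from a wildcard format and let the two nodes negotiate.
        media_format format;
        format.type = B_MEDIA_RAW_AUDIO;
        format.u.raw_audio = media_raw_audio_format::wildcard;

        return roster->Connect(out.source, in.destination, &format, &out, &in);
    }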

You also use the Media Roster to control nodes. You can tell them to do a preroll, to actually prepare to get started. You can tell them to start, and tell them to stop.

You can tell any node to seek, but what seeking means differs from node to node. For a file reader, you tell it to seek to a new position in the file. You can also set a time source for a node. And you can set its run mode. And you can set its play rate.

Nodes can support, you know, playing at half speed or something like that. That's extra work they have to do, and we cannot guarantee that every node will be able to do it correctly, but you can certainly tell it to. It just might not behave that way.

Hopefully every node -- you guys are all the node writers. Make sure your nodes can support play rate because people will occasionally ask you to do that.
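
The transport calls are all one-liners on the roster; a sketch of a preroll/start/seek/stop sequence, assuming the PrerollNode(), StartNode(), SeekNode(), StopNode(), SetTimeSourceFor() and MakeTimeSourceFor() calls as in the Be Book:

    #include <MediaRoster.h>
    #include <TimeSource.h>

    void RunForAWhile(const media_node& producer, const media_node& clock)
    {
        BMediaRoster* roster = BMediaRoster::Roster();

        // Slave the producer to the chain's clock.
        roster->SetTimeSourceFor(producer.node, clock.node);

        // Let the node get its buffers and threads ready before the deadline.
        roster->PrerollNode(producer);

        // Start a little way in the future so every node has time to spin up.
        BTimeSource* ts = roster->MakeTimeSourceFor(producer);
        bigtime_t when = ts->Now() + 50000;              // 50 ms from "now"
        roster->StartNode(producer, when);

        // Seek to one minute of media time, taking effect half a second later.
        roster->SeekNode(producer, 60 * 1000000LL, when + 500000);

        // And stop two seconds after we started.
        roster->StopNode(producer, when + 2000000);

        ts->Release();
    }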

The Media Roster can also get you access to a user interface for a node. We talked earlier about a node kind called a controllable. Basically, a controllable node can present its configurable aspects, the parameters that you can set and change. And the roster has a mechanism for getting that parameter web, a list of all of the parameters that a node has, and actually creating a user interface for it.

You would use this, for example, if you have an audio-editing application and you would like to control the default gain on the audio mixer. You know, does sound come out the speakers or not. You would like a little slider that controls that.

You can use the Media Roster to ask that node: give me all your parameters, and either build a complete user interface from that or just pick one particular control out of it to use. And, again, we'll cover how you do that in the application section and also how nodes would implement it.
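
In code that goes through the parameter web; a sketch assuming GetParameterWebFor() and the default theme's static ViewFor() call as I recall them from the Be Book, with ordinary Interface Kit window code around it:

    #include <MediaRoster.h>
    #include <ParameterWeb.h>
    #include <MediaTheme.h>
    #include <Window.h>
    #include <View.h>

    // Hypothetical: open a window showing all of a node's controls.
    void ShowControlsFor(const media_node& node)
    {
        BMediaRoster* roster = BMediaRoster::Roster();

        BParameterWeb* web = NULL;
        if (roster->GetParameterWebFor(node, &web) != B_OK || web == NULL)
            return;

        // The theme builds a ready-made view for the whole web.
        BView* view = BMediaTheme::ViewFor(web);

        BWindow* window = new BWindow(view->Frame(), "Node Controls",
                                      B_TITLED_WINDOW, B_ASYNC_CONTROLS);
        window->AddChild(view);
        window->Show();
    }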


