Be Developer Conference, April 1999

Media Application Track: Live Processing

Christopher Tate: Let's talk about the last remaining case, the last basic application or node chain architecture that you're liable to run into: simultaneous capture and playback. Things don't get quite as strange as they do in offline mode, but there are, of course, some issues that you need to be aware of, and some special handling that needs to happen on the application's side.

So we've got live media input and we're routing it to live output. This might be simple audio capture to audio output. This might be audio and video in sync. But the general principles are the same.

I'm going to skip over how you would go about finding the node to handle video input and finding the node to handle video output. We talked about that earlier on.

ChromeVideo was a piece of sample code that was supplied with a newsletter article a few weeks ago. As of today it is proof-of-concept code. That means that you probably won't actually be able to get it to run. It really, honestly, I swear to you all, has been seen running in the Be offices. Really.

Owen Smith: It's really cool, too.

Christopher Tate: We really wanted to show it to you, but like I said, you probably won't be able to get it to run, and we usually can't get it to run. It will be much, much better and will ship as part of the fairly substantial sample code and explanation package that we hope to get out to you all in a couple of weeks.

Audience Member: When you say ship, can we download it from the FTP site?

Christopher Tate: This will all be downloadable from the FTP site just like all the sample code.

In general, what ChromeVideo does is do something drastic to each buffer of video data, in a way that lets you know that something is obviously happening. I'll leave exactly what it does as a surprise.

So you figure, well, we're recording input and we're playing back output, so let's just hook up the nodes. The input is recording, we're doing things in real time, and there is real-time output.

On this slide I say playback run mode. By playback run mode I mean the three "drop data," "increase latency," and "decrease precision" modes: the non-recording, non-offline run modes.
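By way of illustration, those modes are constants on BMediaNode, and a node is put into one of them through the Media Roster. A minimal sketch, assuming the standard BMediaRoster::SetRunModeFor() call; the helper name is made up:

    #include <MediaNode.h>
    #include <MediaRoster.h>

    // Put a node into the "drop data" playback run mode. The other
    // playback modes are B_INCREASE_LATENCY and B_DECREASE_PRECISION;
    // B_OFFLINE and B_RECORDING are the two non-playback modes.
    status_t MakeNodeDropLateBuffers(const media_node& node)
    {
        return BMediaRoster::Roster()->SetRunModeFor(node,
            BMediaNode::B_DROP_DATA);
    }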

What happens if you do this? Nothing. Nothing at all. You start all the nodes and you get nothing at all on the screen. It probably doesn't even paint a blue screen, or a white screen, or a black screen, or anything. It leaves whatever bits were there when the window was created.

That's because as far as the producers are concerned, all of the data is late and needs to be thrown away. Filter nodes, nodes that will be forwarding things, may try to increase latency and tell the producer to produce earlier, which doesn't mean anything in recording mode. But the ultimate consumer will probably choose to simply discard data. Bye-bye, it's gone, nothing happens.

As far as the consumer can tell, it should have already been displayed quite some time ago. It's obviously out of date and displaying it now would be wrong.

In Genki there is now a way to accommodate this particular circumstance without having to put in other nodes whose job it is to restamp the buffers to say, "okay, well, it's not actually late anymore; you should play it in the future." We have built that facility into recordable producers, such that when a buffer is stamped, instead of marking it with the capture time, it is marked with the capture time plus an application-specified downstream latency. In essence, this translates from capture time to playback time.

So you still run your producers in B_RECORDING mode. You set a delay long enough to account for the downstream latency of all of the chains whose buffers you are going to be playing in a synchronized fashion. I'll show you this on a slide here.

You might have multiple chains operating independently, a video chain and an audio chain, for example, with radically different latencies.

In a simple case you simply call SetProducerRunModeDelay on the video input node to account for its downstream latency. You can inquire of a node what its downstream latency is through the Media Roster, set the delay and suddenly you get output in the window and you get to see all the weird black and white effects that ChromeVideo does.
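As a rough sketch of that sequence, assuming the BMediaRoster calls described in this session (the node variable and helper name are hypothetical):

    #include <MediaNode.h>
    #include <MediaRoster.h>

    // videoInput is the capture node at the head of the chain.
    status_t SetupLiveVideoDelay(const media_node& videoInput)
    {
        BMediaRoster* roster = BMediaRoster::Roster();

        // Ask the roster how long buffers take to travel from this
        // producer to the end of its chain.
        bigtime_t downstream = 0;
        status_t err = roster->GetLatencyFor(videoInput, &downstream);
        if (err != B_OK)
            return err;

        // Stamp captured buffers with capture time plus the downstream
        // latency, so they arrive "on time" instead of already late.
        // B_RECORDING is the run mode this delay applies to.
        return roster->SetProducerRunModeDelay(videoInput, downstream,
            BMediaNode::B_RECORDING);
    }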

Things get slightly more complicated when there is more than one output node, more than one output stream that needs to be synchronized. The producers should still be recording, and they should also share the same time source. If you want data to remain synchronized through your system, then all of the producers that are generating that data from first principles need to agree on that time source. Otherwise, things will drift apart and you will get glitches. In the case of routing recorded data to playback, the producers need to account for the downstream latency of whatever the longest chain is.

For example, if we're trying to synchronize audio and video here, they're completely independent chains. The only time source that we really have available that we need to slave something to is the audio output. The audio chain needs to be slaved to the audio output in order to make sure that the sound doesn't glitch.

But these two chains might have vastly different latencies. The latency of the audio chain might be 8 milliseconds. The latency of the video chain might be more like 70. If you have, for example, a VisioWave card with compressed data, you've probably got 30 to 50 milliseconds latency inside the card on input and there's nothing you can do about that, except account for it downstream.

So the way that you keep the playback synchronized is that you set the delay to the latency of the longest chain for all producers that need to stay in sync.
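For the two-chain case, a sketch under the same assumptions: both producers get the downstream latency of the slower chain as their delay (the node names are hypothetical):

    #include <MediaNode.h>
    #include <MediaRoster.h>
    #include <SupportDefs.h>

    status_t SyncCaptureChains(const media_node& audioInput,
        const media_node& videoInput)
    {
        BMediaRoster* roster = BMediaRoster::Roster();

        bigtime_t audioLatency = 0;
        bigtime_t videoLatency = 0;
        roster->GetLatencyFor(audioInput, &audioLatency);
        roster->GetLatencyFor(videoInput, &videoLatency);

        // Use the latency of the longest chain for every producer that
        // has to stay in sync.
        bigtime_t delay = max_c(audioLatency, videoLatency);

        status_t err = roster->SetProducerRunModeDelay(audioInput, delay);
        if (err == B_OK)
            err = roster->SetProducerRunModeDelay(videoInput, delay);
        return err;
    }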

This is yet another summary of what I just told you: SetProducerRunModeDelay. Currently SetProducerRunModeDelay only works for the B_RECORDING run mode. The run mode the delay applies to is passed as a parameter to the function, but it is not currently implemented for anything other than B_RECORDING.

Audience Member: Can you say that again?

Christopher Tate: When you call SetProducerRunModeDelay, you tell it what the delay is, and one of the parameters tells it what run mode this delay should be used in. You can currently only apply the delay to B_RECORDING mode. If you pass another run mode, you get an error.

Jon Wätte: Just to clarify, it will actually also set the run mode of the node to B_RECORDING when you call it like that. So you don't need to first set the run mode and then call SetProducerRunModeDelay. You can just call SetProducerRunModeDelay with the delay and the recording run mode, which happens to be the default value for that parameter, and the node will be in recording mode.

Christopher Tate: I want to talk about time sources here just briefly. If you want to keep everything in sync, everything needs to be on the same time source. You should use audio time source because drift between the hardware time sources and the Media Kit time source will eventually show up as dropped or doubled data. If you drop or double audio samples, you will get audible glitching.

If you drop or double video frames, you won't notice until the problem gets really bad. A single frame here and there is not detectable by eye, unless you're slow stepping through it, in which case it's not live any more.
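Slaving the whole graph to one time source is a single roster call per node. A minimal sketch, assuming GetTimeSource() hands back the default, audio-derived time source; the helper and node array are hypothetical:

    #include <MediaRoster.h>

    // Slave every node in both chains to the default (audio output)
    // time source so they all agree on what "now" means.
    status_t SlaveToAudioTimeSource(const media_node* nodes, int32 count)
    {
        BMediaRoster* roster = BMediaRoster::Roster();

        media_node timeSource;
        status_t err = roster->GetTimeSource(&timeSource);
        if (err != B_OK)
            return err;

        for (int32 i = 0; i < count; i++) {
            err = roster->SetTimeSourceFor(nodes[i].node, timeSource.node);
            if (err != B_OK)
                return err;
        }

        // In real code, release the time source clone with ReleaseNode()
        // when you are done with it.
        return B_OK;
    }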

So in this case everything would be slaved to the sound output time source for its playback time. This is somewhat tangential and you may not be aware of it, but with some audio hardware you might think, "well, I can just capture audio and play it back out of the same card, and everything will be fine." But there is some audio hardware, like SonicVibes cards, that uses different clocks for input and output, and they are not synchronized and cannot be synchronized. If you try to loop captured input data to output on that kind of audio card, you will get glitching no matter what. Buy a better card. [audience laughter]

Audience Member: Is that the [unintelligible] card?

Christopher Tate: Excuse me?

Audience Member: [Turtle Beach] Daytona?

Jon Wätte: Yes.

Christopher Tate: On somewhat better cards, like SoundBlaster Live! cards, you can choose to slave all of the internal clocks to the same crystal, and everything is lovely. Really good cards like the Sonorus Studio or the Layla can take external clocking in a myriad of formats.

Ideally in this situation what you would slave to for all nodes everywhere is an external clock source, and you would use a Layla on the audio output and input and you use a VisioWave card and just slave everything to black burst and it's all synchronized all the time, and the only thing you have to do is account for the latency.

I don't know how much the VisioWave cards are. The Layla lists in the $800 range right now. I'm sure that there are cheaper cards that can slave to external time sources. But the Layla is really cool.... [audience laughter]

Jon Wätte: We have drivers for the Layla.

Christopher Tate: Yes. We like the Echo guys, and they like us. Do you have any questions about any of this? If you want to go back to the beginning, this is the time to do it because we're doing different topics next.

Audience Member: All right. I've got a better version of my original question. Suppose that I need my video and audio to be slaved to some time so that my audio/visual segment length needs to be 6.5 seconds according to clock time.

Christopher Tate: Wall clock time?

Audience Member: Wall clock time. And furthermore, suppose that I have not the $800 audio card but one that runs slightly fast, so that when it has played its 44.1 kilosamples it has actually only taken the time of, say, 43 kilosamples -- it takes slightly less time to actually play than the clock says. Are the ultimate consumers able to compensate for too much data?

Christopher Tate: I'm going to let Jon give you the really technical answer, after I give you the shorthand answer. The shorthand answer is, first, you will lose audio data. And second, the amount of drift you're going to get between zero and six and a half seconds, or even between zero and an hour or whatever, is not going to be that large. Your hardware, even though it's cheap, is not going to be off by so much that you'll care about being off by two seconds at the end, unless you're in a real pro audio production situation. If you really do care, then frankly you should be able to afford a Sonorus Studio card. Now, Jon might have more to say about this.

Jon Wätte: There is another time source. There is the default time source you get from GetTimeSource, which is derived from the audio output. And there is another time source known as the system time source, not because the system is using it, but because it's derived from system time, the system call that returns wall clock time in microseconds. So if you set all your nodes to slave to the system time source instead of the default time source, then they will all derive their notion of time from wall clock time.

Now, if the card is fast or slow, you will get dropped or double buffers of audio, but if your 6.5 second time is more important than the quality of your audio, because there is a trade-off there, then you can do that and get the effect you want.
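The same shape works for the system time source Jon mentions; a sketch assuming BMediaRoster::GetSystemTimeSource(), again with a hypothetical helper and node list:

    #include <MediaRoster.h>

    // If hitting a wall clock duration matters more than glitch-free
    // audio, slave the nodes to the system time source instead.
    status_t SlaveToSystemTimeSource(const media_node* nodes, int32 count)
    {
        BMediaRoster* roster = BMediaRoster::Roster();

        media_node sysTime;
        status_t err = roster->GetSystemTimeSource(&sysTime);
        if (err != B_OK)
            return err;

        for (int32 i = 0; i < count; i++) {
            err = roster->SetTimeSourceFor(nodes[i].node, sysTime.node);
            if (err != B_OK)
                return err;
        }
        return B_OK;
    }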

Christopher Tate: Any more questions?

Audience Member: Can I see the double chain slide?

So you said that you have to put the same latency on the video output and the audio output, but how would you account for latency in the outputs themselves? The sound output might have maybe two milliseconds of latency, and the video output might have half a millisecond of latency.

Christopher Tate: That should be accounted for in the hardware nodes. The drivers for the physical outputs should report the appropriate latency from their nodes, because what you really synchronize is not when things are rendered to physical media. What you really synchronize in the Media Kit is when things need to reach a physical output device, at the edge of the node.

So if the rendering time is non-trivial, then that node should report the difference between when it receives things and when they're rendered to hardware as its internal latency, and then that would get accounted for in this scheme.
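From the node writer's side, that reporting happens in the node's latency hook. This is a fragment only (the surrounding BBufferConsumer subclass is omitted), assuming the BBufferConsumer::GetLatencyFor() hook; SoundOutputNode, fInput and fHardwareLatency are made-up names:

    // Report the time between a buffer arriving at this input and the
    // data actually reaching the physical output, plus the time source
    // this node schedules against.
    status_t
    SoundOutputNode::GetLatencyFor(const media_destination& forWhom,
        bigtime_t* outLatency, media_node_id* outTimesource)
    {
        if (forWhom != fInput.destination)
            return B_MEDIA_BAD_DESTINATION;

        *outLatency = fHardwareLatency;     // e.g. roughly 2000 microseconds
        *outTimesource = TimeSource()->ID();
        return B_OK;
    }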

Audience Member: How are your window nodes interacting with the video driver? Is there any latency there when you produce a bitmap and step it into a window and where it actually --

Christopher Tate: The question is whether there is any latency in the video window case, between when it receives and when it renders into the window.

Currently there is "a lot" of latency. I'll define what that means in a second. And we know that we want to implement a particular solution where there is "a lot less" latency.

"A lot of latency" means that there is copying that happens when that node receives the buffer. There is a blit to the screen and then the app_server renders the screen. Ideally, the way this will happen is that there is a video window and the BBuffer itself actually has a hardware representation in the frame buffer, so that the upstream ChromeVideo node is rendering directly into the frame buffer and there is essentially zero latency on the video window.

Audience Member: Say you were working on a video application and you wanted to give your user a range of values to choose from, from "real time display" through to "show me every single frame." Now, from what I've seen, you have some values you can pass to say one or the other, but nothing in the middle. Or would there be some parameter you pass for how much latency you would allow?

Christopher Tate: The real distinction is whether or not things need to happen in real time. If they don't need to happen in real time, then in a sense it doesn't matter. You just render in offline or you start things and then stop them and then you don't worry whether they're late. You just sort of render into the window in offline mode -- look, there is a frame! Look, there's another frame that might be 70 milliseconds late!

Audience Member: So you can only do one or the other? You can't do some halfway?

Christopher Tate: The Media Kit is designed to draw that particular distinction, real time versus non-real time, and in the real time run modes it tries very hard to keep things within the desired time constraints.

Audience Member: Can you set those constraints? Can you say I will allow a latency -

Christopher Tate: If you would like to request, as a feature, tolerances or other ways of parameterizing the real time constraints under which the Media Kit operates, Jon Wätte is sitting right here. I'm sure he can give you his e-mail address.

Jon Wätte: He already has it. What you're talking about, though, is more "jitter" than "latency." It's the same thing in the video display case. The big problem in video display is not copying a buffer of some hundred kilobytes; that's still a millisecond or less on typical memory subsystems.

The big cause of latency in video display on computer displays is vertical retrace, which happens every 50 milliseconds or so. If you write a pixel and the electron beam has just passed the pixel you wrote, it's going to take 50 milliseconds for it to display; if you write it right before the beam reaches that part of the buffer, it's going to take no time at all. So it's both a source of latency and a source of jitter. And, video display adapters being what they are, there is not a whole lot we can do about that.

If the user has a card like a VisioWave card, which actually does hardware compression and video input and output, a node could be written to derive time from the actual video signal and have a defined latency with no jitter.

So we have no good measurement of jitter, unfortunately. A measurement of jitter may appear alongside the measurement of latency, and someone may end up working on it, depending on the amount of feature requests we get from you guys. You know, the Media Kit is still young. We have considerable flexibility in how we move it forward, and we also have, you know, a ways to go to move it forward, so you guys' input really helps.

Christopher Tate: We have a lot of things that we know that we would like to add to the Media Kit. One of the things that we need to know from you is what's important. Obviously we want to do the important things first and not just our little pet projects that nobody in the real world is going to care about. So feature requests are a great thing for us to get in our mailbox.

Audience Member: Maybe I can get you to expound on how an application like Gobe Productive -- here's our plug -- could use the Media Kit to play back movies with general transformations and arbitrary cover regions that might be covering something. I believe, from listening to the various sessions, that it would probably have to make some sort of filter node that receives the frame buffers for a movie from down the node chain, effectively does a transformation -- the translation, rotation, and that sort of thing for the movie -- then masks it, and then blits the bitmap to the screen.

Am I on the right track as far as what we would have to do in order to accomplish that sort of thing? Because it's a rather general kind of transformation that could happen to the movie while it's being played, if we implement things in an ideal way.

Jon Wätte: The problem is that Gobe is great because it can rotate everything. I don't know how you guys did that. I wouldn't want to foist that requirement, handling arbitrary rotations, on every node writer.

Audience Member: Right. I wasn't even vaguely suggesting that. I think we have to write our own filter nodes, from what I gather.

Jon Wätte: So you're definitely on the right track. What I would do is, for each buffer I get in, do the rotation and the copy to screen in one step. You might want to use BDirectWindow so you can do that.

Audience Member: That's what I was going to get to next. Is that a practical thing at this point, or should we hold off and just say, well, it's going to take that extra bitmap in order to get it on the screen? Because we actually have another problem, which is a rather complex clipping path where the movie might be intersected by the cover region on top of it.

Jon Wätte: The nodes do support clipping, most of them, hopefully, and you will know when they don't, so you have to do the clipping for them. But in your case, you know, if you're familiar with the BDirectWindow API, just do it. It's your work. If you want to get up and running quickly, then rendering into the bitmap and taking the extra copy may be worth it just to be done sooner.

Audience Member: I will ask more detailed questions offline later.

Christopher Tate: The basic answer is yes, you're on the right track. Imposing clipping, especially after arbitrary transformations, is not something that nodes earlier in the chain are going to be aware of, and in particular you might not necessarily have a trivial inverse transformation to tell them what they should clip to.
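A skeletal sketch of the BDirectWindow route Jon suggests, assuming the standard Game Kit DirectConnected() protocol; the class, the DrawFrame() helper and the transform itself are hypothetical, and the actual rotation and clipping are left as stubs:

    #include <Autolock.h>
    #include <DirectWindow.h>
    #include <Locker.h>

    class MovieDirectWindow : public BDirectWindow {
    public:
        MovieDirectWindow(BRect frame)
            : BDirectWindow(frame, "movie", B_TITLED_WINDOW, 0),
              fConnected(false), fBits(NULL), fBytesPerRow(0) {}

        virtual void DirectConnected(direct_buffer_info* info)
        {
            BAutolock lock(fLock);
            switch (info->buffer_state & B_DIRECT_MODE_MASK) {
                case B_DIRECT_START:
                case B_DIRECT_MODIFY:
                    // Remember where the frame buffer lives; the clip
                    // list in 'info' says which rectangles we may touch.
                    fBits = (uint8*)info->bits;
                    fBytesPerRow = info->bytes_per_row;
                    fConnected = true;
                    break;
                case B_DIRECT_STOP:
                    fConnected = false;
                    break;
            }
        }

        // Called from wherever the filter node hands over a finished
        // frame: transform it and write straight into the frame buffer,
        // honoring the clip rectangles.
        void DrawFrame(const uint8* frame, int32 width, int32 height)
        {
            BAutolock lock(fLock);
            if (!fConnected)
                return;
            // ... rotate/translate 'frame' and copy into fBits, clipped ...
        }

    private:
        BLocker fLock;
        bool    fConnected;
        uint8*  fBits;
        int32   fBytesPerRow;
    };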

Audience Member: Every so often we've been talking about nodes in one application being used by another application. Now, for instance, if I was using Gobe Productive and it had a final output that was published as a node, is there any way I could read the data from that node in another application? It's not just a node, I guess; you have a complete connection over there, and you want to add on to it and have a final consumer in your application. Is that a feasible thing to do?

Christopher Tate: If you know about the existence of that node.

Audience Member: Just that node? It may have other nodes connected to it.

Christopher Tate: Let's assume that there is a node that you want to get output from, whatever it is. If you know about the existence of the node and you determine that it, in fact, does have free outputs, then yes, you can connect to it and it will start sending you buffers. As I said earlier, there is no cross-application security in the Media Kit. Nodes are nodes and they're global.

Audience Member: How do you start all the nodes if you only see the last one and it's a chain?

Christopher Tate: The caveat there is you might not necessarily get anything meaningful out of it unless there is a higher level protocol between you and Gobe Productive that lets you instrument Gobe in such a way that you can instruct it to start negotiating -

Audience Member: It's really outside of the scope of the Media Kit then?

Christopher Tate: That is outside the scope of the Media Kit, absolutely.

Jon Wätte: As an aside, in Gobe there is supposedly some UI to start this thing so you might want to just take advantage of it and just tell the user "press start in Gobe" to get the stuff going.

[Recess taken]

