Proceedings of the May Be Developer Conference
Working with Video and Audio




STEVE SAKOMAN: Welcome to the working with audio and video session. Notice we didn't say approaching, because this is going to be work. We are going to warn you ahead of time! What we are going to talk about this afternoon, rather than the specifics that we have been talking about in the other sessions, is our general direction with media. And I'm going to give you lots of pointers to more information.

We are going to be taking a fairly pioneer-like path here into the world of digital media. I'll give you some perspective on what we are up against in the OS as far as dealing with the data rates, the high bandwidth of audio and video, an overview of the new technologies that we are going to be putting into the system, and I'll end up on a lighter note with what you can do today with video.

We are going where we feel the future is, the world of "bits in, bits out." So no more analog signals going in or out of the box for media. And in particular we believe that 1394, or FireWire, as it is known under the Apple trademark, is going to be the media interface of choice in the coming years. And in fact, today as you were coming in and out of all the sessions in this room, the music that you heard was actually being delivered in a way that I don't think has ever been used in an event like this before. It was all coming to you over 1394, through a 20-bit external set of surround sound DACs. So it's here today. Also if you looked in the back of the room in some of these sessions you'll notice we hired someone to come and film them. The camera is a Sony DV camera with a 1394 interface. So digital is coming to the world of audio and video and electronic music, as we will discuss a little bit later.

We realize that the world isn't there yet, so we are also going to give you some support for "analog in, bits out." For audio this will be through the standard PC audio hardware, and for video through a video capture card for broadcast and composite video.

We have been laying the foundation, really, for dealing with media streams with DR9. Some of the most visible changes are in the file system: the increased size parameters for the file system (to allow these large video and audio files) and lots of performance improvements. In particular, DMA for IDE has proven to be a real big win for us. We have also put some support in the memory management system for dealing with intelligent DMA devices. Getting these bit streams into the machine is difficult enough that you don't want to have the processor touching the data as it comes in. So we really need intelligent DMA devices, and the system needs to know about allocating large blocks of contiguous memory, and it needs to be notified when these things are going to change behind its back.

Some other foundational work: BBitmaps have been reworked a little bit and are now areas. They start on page boundaries and are sort of primed and ready for these intelligent DMA devices. You'll be able to display directly from the DMA buffer without having to touch the data.
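To make the page-alignment point concrete, here is a small sketch in plain C++. It uses `posix_memalign` purely as a stand-in for the BeOS area API (an assumption on my part; the real kit allocates areas, not malloc'd blocks), to show the property a DMA engine actually needs: a buffer that starts on a page boundary.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// A DMA-friendly buffer must start on a page boundary so the hardware can
// fill whole pages without the CPU ever touching the data. posix_memalign
// stands in here for the BeOS area API mentioned above (illustrative only).
void* alloc_dma_buffer(size_t bytes, size_t page_size) {
    void* p = nullptr;
    if (posix_memalign(&p, page_size, bytes) != 0) return nullptr;
    return p;  // caller frees with free()
}
```

With an area-backed BBitmap starting on a page boundary like this, the capture hardware can DMA straight into the pixels and the app can display the same buffer with no copy in between.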

I wanted to run through some sobering numbers real quickly just to give you some perspective. A lot of people, when thinking about video, don't realize that each frame of NTSC (square pixels) requires almost a megabyte of storage. Each second you need to find a place to put 27.6 megabytes of data, and if you are going to do anything with it, you need to find some cycles to do that.

Rectangular pixels make the situation even worse. The numbers for PAL are quite similar.
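Those figures are easy to check. A quick arithmetic sketch, assuming square-pixel NTSC at 640 by 480, 24-bit color, 30 frames per second (the specific resolution is my assumption; the slide only gives the totals):

```cpp
#include <cassert>

// Uncompressed video rate: width x height x bytes-per-pixel x frames-per-second.
long bytes_per_frame(long w, long h, long bytes_pp) {
    return w * h * bytes_pp;
}

long bytes_per_second(long w, long h, long bytes_pp, long fps) {
    return bytes_per_frame(w, h, bytes_pp) * fps;  // one second of frames
}
```

640 x 480 x 3 bytes is 921,600 bytes per frame, almost a megabyte, and at 30 frames per second that is 27,648,000 bytes every second, matching the 27.6 megabytes quoted above.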

Whoa. I was going to race through a lot of slides but not quite that fast.

So let's see. Just to put some perspective on it, let's look at the kind of hardware we have on our desktops right now, in comparison to the roughly 30 megabytes per second of video streams. We have 132 theoretical megabytes per second on the PCI bus, but that's really a burst rate. Sustained rates, on the kind of machines that we run on right now, are between 15 and 85 megabytes per second.

Memcpy rates, if you remember one of our newsletters from a few months back, are between 20 to 40 megabytes per second, and if you are blitting to video memory over the PCI bus, you are probably more in the 15 to 30 megabytes per second range.

Some quick numbers, too, on hard disk rates, to give you even more perspective. One to five megabytes per second on IDE. For SCSI, two and a half to four and a half megabytes per second. When we have future support for ultra wide SCSI, then we will be in the four and a half to eight megabyte per second range.
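Putting the stream rate and the disk rates side by side tells you how much compression you need before raw video fits on a disk. A sketch (the specific rates plugged in below are the ballpark figures from the talk, not measurements):

```cpp
#include <cassert>

// The compression ratio required before a raw stream fits a given
// sustained disk transfer rate.
double required_ratio(double stream_mb_per_sec, double disk_mb_per_sec) {
    return stream_mb_per_sec / disk_mb_per_sec;
}
```

Against a 3 MB/sec IDE disk, a 27.6 MB/sec NTSC stream needs roughly 9:1 compression; even against an 8 MB/sec ultra wide SCSI disk you still need about 3.5:1.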

So you see we have some real challenges here. And as I mentioned before, it's clear that we really need intelligent hardware.

There are some real common quick and dirty ways of dealing with video data rates. If you have worked with video at all you will recognize these. The easiest method is to recognize that our eye is less sensitive to color information than it is to intensity information, hence subsampling the color information results in some savings. There are two sampling standards that are quite popular, 4:2:2 and 4:1:1; the latter will get you down to about 12 bits per pixel.

Another common method of reducing this huge flood of data is to reduce the image size; probably the most common is the CIF format, which is 320 by 240 pixels, and that gets you down near hard disk rates. It is also very common to just drop fields, so as you store the data you deal with 30 fields instead of the full 60 per second, or if you are working with PAL, 25 instead of 50.
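Both tricks are simple arithmetic, so here is a sketch of the savings they buy (a Y:Cb:Cr ratio averaged over the sample group, assuming 8 bits per sample, then the resulting stream rate for the 320-by-240, field-dropped case):

```cpp
#include <cassert>

// Average bits per pixel for a Y:Cb:Cr subsampling ratio at 8 bits/sample:
// for every `y` luma samples there are `cb` + `cr` chroma samples.
int avg_bits_per_pixel(int y, int cb, int cr) {
    return (y + cb + cr) * 8 / y;
}

// Stream rate in bytes/sec after shrinking the image and dropping fields.
long rate_bytes_per_sec(int w, int h, int bits_pp, int fps) {
    return (long)w * h * bits_pp / 8 * fps;
}
```

4:2:2 averages 16 bits per pixel and 4:1:1 averages 12; combine 4:1:1 with a 320-by-240 image at 30 fields per second and you are at about 3.5 MB/sec, which is exactly hard disk territory.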

There is also a world of compression standards for video. I'm not going to really go into what all these standards are, but I thought I would highlight some of the ones that we are going to put our attention into.

We will be providing both software and hardware solutions for compression, but really the last one, Consumer Digital Video (DV) is where we feel that we need to put most of our attention. This is really the first affordable digital recording device for video. As you can see by the camera in the back of the room, it's being accepted already, people are using it to record events like this. There is a substantial quality improvement over the existing "prosumer" type videotapes. It records in a 4:1:1 format for NTSC and gets about a five to one video compression.

So you can see from the numbers on this slide, especially the 3.1 megabytes/sec data rate, that we are in good shape here as far as the capabilities of our machine to deal with DV. Part of the format is also sync'd digital audio information. You have your choice between two 16-bit channels, at either a 44.1 kilohertz or 48 kilohertz sample rate, or if you want some sort of cheesy surround sound, you can get four channels of 12 bits at a 32 kilohertz sample rate. You will see that some early camcorders only support the lower quality sound.
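The 3.1 MB/sec figure falls straight out of the format parameters. A sketch, assuming the 720-by-480 NTSC DV raster (an assumption on my part; the slide only states the final rate and the roughly 5:1 ratio):

```cpp
#include <cassert>

// DV video: 720x480 NTSC, 4:1:1 sampling (12 bits = 1.5 bytes per pixel),
// 30 frames/sec, compressed roughly 5:1.
double dv_video_mb_per_sec() {
    double raw_bytes = 720.0 * 480.0 * 1.5 * 30.0;
    return raw_bytes / 5.0 / 1e6;  // ~3.1 MB/sec after compression
}

// The high-quality DV audio option: 2 channels x 16 bits x 48 kHz.
int dv_audio_bytes_per_sec() {
    return 2 * 2 * 48000;  // 192,000 bytes/sec -- tiny next to the video
}
```

So the video lands at about 3.1 MB/sec, and even the best-quality stereo audio adds less than 0.2 MB/sec on top, well within the disk and bus numbers from the earlier slides.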

There are also digital data subcodes that are available with this format: titles, table of contents. If you are looking for some more information you will see a web address on a lot of slides. I want to give you pointers to where you can go for more information because this is a lot of new stuff that you probably aren't familiar with. Trying to write it down here will be tough, so this will be on the web in a few days. I put a lot of information on the slides, but I will blow through them pretty quickly.

Adaptec in particular has some pretty nice introductory information on DV and 1394. Sony is probably the leader in the digital video realm right now, they have three cameras on the market. The DCR-VX700 is the one that's been on the market the longest. You can buy it at Fry's today, as well as the three CCD version of it, which has higher quality images, the VX-1000. They've recently introduced the DCR-PC7, it's a very tiny camera, and I believe it's the one they have been using the last couple days to record this event. Matsushita has also got a digital video camcorder, there is a pointer to their site on the web where you can get the details. These are the cameras we plan to work with as we develop our support.

I guess I neglected to mention, too, on that last slide, Sony also has a digital video deck for editing purposes, and the way you interconnect the cameras to the decks is via 1394. It's a very low cost interconnect. The cable itself is from a very high volume consumer product, I believe it's one of the Sega video games. It's very high performance. Currently 200 megabits per second is the data rate for the products that are on the market. And future enhancements to 1394 are outlined all the way up to over a gigabit per second.

As I said, 1394 uses a variant of the GameBoy connector. It's a type of connection where many products don't even need their own power supply. A little bit later on we will be demoing this tiny camera, a video teleconferencing camera from Sony. As you can see, there is no power cord, there is just a FireWire connector on it. The bus provides up to 60 watts of power.

Devices are hot pluggable, so that while your machine is running you can come in and plug your camcorder in, work with it for a while, realize you need the deck, plug the deck in, the system will automatically recognize all this. Data transfers that are in process will be interrupted for a short period as the bus reconfigures itself, but will resume automatically.

Another neat thing is that the computer is optional with the 1394 interface, so when you are off in the field, and you need to work with your camera and your deck, you really don't need a computer with you.

There are a lot of really good technical overviews on the web. If you look at a lot of these pages, you will get good information on where products can be had, who is working on what. Let's see, not on the web is the IEEE1394 specification. You are going to need to spend some money for that, from IEEE. There is also a trade organization that is very active and you can learn about that, too, on the web. All these addresses are good places to start.

To get to a little bit more specifics on the BeOS 1394 support, we are supporting the Adaptec 1394 interface card. It's a 200 megabit per second PCI card, and we have it running in BeBoxes and in a variety of Macintoshes. They are available for developers from Adaptec right now. They are, I believe, $595. The package includes lots of extra stuff; it's really intended for people who are developing applications for Windows NT and Windows 95 machines, so you get a bunch of beta Windows software with it, some cables, a copy of the specification, stuff like that. They just announced an AHA-8945 host adapter which combines this FireWire capability and an Ultra Wide SCSI on the same board, and we also plan to support that.

A little bit further out than that, we are exploring yet another adapter card that combines these capabilities and a hardware codec for the digital video compression standard.

So where are we? We have a pre-alpha driver, you heard some of the results from that driver as you walked in and out of the room today. We have a small wrapper around the HAL code that we licensed from Adaptec, this demo is alpha code, just to let you know sort of where things are in the 1394 world. I would say that everyone is pretty much alpha with their 1394 support.

Adaptec released their beta code a couple weeks ago and we are in the process of integrating that into DR9. There is very simple library support right now for asynchronous and isochronous transfers. And we have some simple demos that will be available in the next few weeks on the web, that will show how to do some video and audio. They are basically technology demos, they don't reflect where we are going to go with the media kit. But for those really adventurous souls who learn best by getting their feet and hands really dirty, this is some good stuff to play with.

The next big step for us is really learning from the sample apps that we are writing; there are a lot of new products and technologies here, and we need to figure out the right way to structure the media kit. We are looking for your help, so those adventurous souls who want to dive in right away, I'm the guy to get in touch with, and I will help you in any way that I can in the hopes that you will help me, too, figure out where we want to go.

Here's some more stuff to go learn, there are a lot of protocols involved with 1394. Some of the specifications are available on the web. The first one is the digital video camera specification, which is a specification for the class of camera that we are going to do our demo with, if the demo gods are with us. It's an uncompressed video camera, it sends data in either YUV or RGB formats, and is available now through Sony distributors, Sony professional distributors, the part number is here on the slide. There are some specifications for the camera on the web. For those who are really interested in digital video format, if you have 50,000 yen in your pocket, which is around $500 I guess, there is contact information here for a very nice fellow at Matsushita who will ship you what's called a Blue Book. It's a big, fat binder that has specifications for everything from the tape format, the oxide on the tape, up to 1394 commands, digital video compression, packet format information, how the audio is shipped, the whole deal. So, more stuff to go look up!

Next, audio. In addition to the DV audio specification, which is just a stereo signal, Yamaha has proposed to the 1394 Trade Association a standard that they call MLAN. Their intent here is to provide a single audio transport format that you can use to encapsulate many streams of sampled audio, and they also hope to extend it in the future.

Now, analog video capture, for people who want to capture video right now. Some of the things that were really important to us: we want to have a single product that we supported that would work across all of the wide variety of platforms that we are running on right now. We want the capability for people to take their video either from a TV tuner built onto the capture hardware, from composite video, or from S-video, and we want to deal with some of the data rate problems. We wanted this solution to be smart enough to automatically scale video in hardware, to do the interpolation, to have the filters on board, so we didn't really need to touch video data with software. And we wanted that scaling to be continuous, basically on a pixel basis: you can pick an X and a Y resolution and the hardware will automatically scale the video to that resolution.

Output formats are also important. You don't really want to have to be touching every single pixel of data in order to convert formats. So our solution supports multiple RGB and YCrCb formats, everything from RGB 32 down to RGB 8.

If you look at the driver that's posted on the web right now, you can see all the various formats we support. We support the whole raft of RGB, YCrCb, and monochrome formats. There is support for complex clipping in hardware. You can generate an arbitrary clip mask, show every other pixel if you wanted to, I guess. And again, intelligent DMA of the captured video.

It's becoming clear that in the broadcast area people are really interested in putting a lot of nonvideo information in the vertical blanking interval. Certainly closed captioning is something that's here today, teletext is popular in Europe, and something Intel is pushing really hard in this country is called Intercast, which is basically broadcasting HTML web pages in the VBI. That's here today also. If you tune into CNN, NBC, MTV, among others, they actually are broadcasting HTML information that is related to the show that's currently playing.

The BeOS video capture driver supports cards that are based on the Brooktree Bt848 video capture chip. And if those cards have a Philips or a Temic tuner, or a Philips stereo or second audio program decoder chip, they will work with our driver.

So we are going to blow through I guess some slides here giving you details on supported cards. In the U.S., actually around the world, the ubiquitous vendor is Hauppauge Computer Works. And to celebrate the developer conference, Fry's has these on sale this weekend for $127, so these are very affordable video capture cards. There are several models, the most available ones seem to be the model 400 and 401. They have just announced a 402, which also includes an FM tuner. We don't have support in the driver for that yet but should very soon. There are also products from a company called STB Systems, and of course Miro Computer Products. We haven't yet tested the driver with these two, so if anyone has these and wants to give them a try, I personally would love to hear how they work, especially if there are problems.

In Western Europe, Hauppauge also has a number of models that work there. We have reports from several countries that the driver seems to be working with the model 405, and I haven't heard anything yet about the 416 or the 418. Again, if you live in countries that use these standards, we would love to hear from you.

And in Great Britain and Ireland, they have their own set of products, so you can hit the web, our web pages, in a few days, to get these slides.

The DR9 driver is available now. As I mentioned before, these are the models that it's been tested with. Features aren't complete yet: FM tuner real soon now, also Japanese and Eastern European tuner support are about half done. After that we will tackle the VBI data capture.

Okay. At this point I'm going to turn it over to William. And I believe he may have a special guest for us a little bit later, too.



WILLIAM ADAMS: Hello. Can we turn up the lights a little bit? A little bit more. Okay.

What Steve was talking about up here, FireWire 1394 and all that: this is just an application that we have called Video Center, which allows us to display video in a common sort of environment that people understand. It looks like a TV screen, and it's intended to act like that. The FireWire camera doesn't have a tuner, so there is no point in changing channels. But as you can see, it's basically displaying 320 by 240 at 30 frames a second. Part of the demo normally shows the ability to zoom and focus in software, but we ran out of time. Still, with this particular camera you can zoom and focus, zoom out, and you can also do all of this through software. So that's basically the current FireWire driver driving this little inexpensive camera.

Now the second piece is, the TV tuner is basically the exact same familiar piece of software again. I forgot to connect the audio. You can hear the speaker.

Basically this is the Bt848 driver, in a familiar window here. The only difference is I actually have control of the channel changers, so I can change the channels on the TV. So as you can see --

A SPEAKER: Jordan! Get the Bulls!

WILLIAM ADAMS: So the general idea, and I think you get it, is that it's not very hard to integrate these different types of video sources into a common sort of architecture. The idea with Video Center is that it's a ready little playground for you to plug things into. What I'm going to do is go through a little bit of the source, just so you see what's going on behind the scenes with all this stuff.

I will bring up the TV tuner. You will see the first thing is -- I still don't have this quite right. I will bring up a different one. CompViewer. Okay. The first thing we see is that I learned how to make the font bigger.

(Applause.)

WILLIAM ADAMS: Don't ask me to change it.

What I have in this particular application is a whole bunch of stuff that is kind of irrelevant. The salient points are I create a background image, basically this is just an image that I had a buddy of mine create. And I use it as the background, that's when you see that TV with the little buttons all over it. That's all that thing is. Then down here I do a whole bunch of things to set up the locations of the buttons and so that when I click on them I can actually perform some actions, so that's all that's going on here.

Then right here I create a view, a VideoView. There is really not much interesting about it; it's just a view that you can display a bitmap into, with one interesting feature. This is a multithreaded application, so what happens in this VideoView is that once it's attached to a window, it creates a thread, and this thread calls this particular function here, called drawer, and I pass it myself as a parameter.

If we look at the bottom here, drawer basically goes through a loop where it says: while it's not time to quit, if the view that you passed me has a video source, and that video source is running, and I can get a lock on the window, get the current bitmap from that video source. If I actually got a bitmap -- I don't like this kind of interface, but anyway -- go ahead and draw the bitmap into the view, unlock the window, and go on. So this is just running in a loop over and over again: get the next bitmap, display it, get the next one, display it.

This thing doesn't really know anything about what the video source is; the value returned here is a bitmap stream. And we can look at that bitmap stream object. It's just a header file; it's basically a protocol, there is no implementation in here. It's just a bunch of calls that say a bitmap stream should conform to a whole bunch of things: in particular, you should be able to start it, stop it, get the next frame, check if it's running, set brightness, channel, audio source, all these sorts of things. So this is just a protocol. There is no functionality in here.
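The protocol and the drawer loop described above can be sketched in plain C++. All of the names below are reconstructions, not the shipping BeOS API, and the window locking and actual drawing are reduced to comments; the FakeSource class is a stand-in for a real Bt848 or FireWire source.

```cpp
#include <cassert>

// Minimal stand-in for a frame of video.
struct Bitmap { int width, height; };

// The "protocol" header William describes: pure virtuals only, no
// implementation -- every video source subclasses this.
class BitmapStream {
public:
    virtual ~BitmapStream() {}
    virtual void Start() = 0;
    virtual void Stop() = 0;
    virtual bool IsRunning() const = 0;
    virtual Bitmap* GetNextFrame() = 0;
    // The real protocol also declares SetBrightness, SetChannel,
    // audio source selection, and so on.
};

// Illustrative source; a real one wraps /dev/bt848 or a 1394 channel.
class FakeSource : public BitmapStream {
    bool running = false;
    Bitmap frame{320, 240};
public:
    void Start() override { running = true; }
    void Stop() override { running = false; }
    bool IsRunning() const override { return running; }
    Bitmap* GetNextFrame() override { return running ? &frame : nullptr; }
};

// The drawer loop: pull the next frame and "draw" it, over and over.
// A frame budget stands in for the real "until it's time to quit" check.
int drawer(BitmapStream* source, int frames_to_show) {
    int drawn = 0;
    while (drawn < frames_to_show) {
        if (source && source->IsRunning()) {
            Bitmap* bm = source->GetNextFrame();
            if (bm) ++drawn;  // real code locks the window, draws bm, unlocks
        } else {
            break;  // no source, or source stopped: give up
        }
    }
    return drawn;
}
```

The point of the design is that `drawer` only ever sees the abstract protocol, so swapping the TV tuner for the FireWire camera is a one-line change in the application.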

But what I have implemented is two different subclasses of that particular object. In this particular case we are using the Bt848 source, so you can see the implementation. Forget the init for now: we have the start method, this is how you start the Bt848 driver, this is how you stop it, you can set the locale for the tuner, you can change channels, all that sort of stuff.

If we go and look at the FireWire driver, the FireWire subclass, similarly it has a huge initialization section which we'll cover a little bit, the zooming and focus of the camera, start, stop, get next frame, and all these other things that it doesn't implement, or at least aren't implemented now. You can't change channels, there is no audio source, there is no locale. And that's about it.

And then in my main application, back here after I create the background image that's going to be displayed, put that VideoView in my window, I basically pick whichever video source I want to use, either a Bt848 source or a FireWire source, and I tell the VideoView that this is the source and away it goes.

So there is not much to it, it's pretty easy programming. The complexity comes primarily in these classes, the initialization of these sources.

In the case of the Bt848, you basically have to open a driver, which is /dev/Bt848, and you have to create a bit map that you are going to be drawing stuff into. I create a separate buffer that gets the video DMA into it directly, then I set up a bunch of configurations to say whether I'm using composite or tuner, what format, locale, brand of tuner, size, brightness, all these sorts of things, and then in the start method I do an I/O control that's specific to that driver: I initialize it with the parameters I set on it and then I tell it to start.

So it's pretty straightforward, but you have to know for the particular driver that you are using what this exact sequence of initialization and start and stop are.
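The open-configure-start shape looks roughly like the sketch below. To be clear, the config struct and the ioctl opcodes here are hypothetical, they only illustrate the sequence; the real driver's interface is documented with the driver itself.

```cpp
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical configuration block -- the real driver's struct differs.
struct capture_config {
    int source;       // composite, tuner, or S-video
    int format;       // e.g. an RGB32 or YCrCb 4:2:2 format code
    int width, height;
    int brightness;
};

// Open the capture device; returns the file descriptor, or -1 if the
// device isn't present.
int open_capture(const char* path) {
    int fd = open(path, O_RDWR);
    if (fd < 0) return -1;
    // In the real start method you would now push the configuration and
    // kick off DMA capture, via driver-specific I/O controls:
    //   ioctl(fd, CONFIG_OPCODE, &cfg);  // hypothetical opcode
    //   ioctl(fd, START_OPCODE);         // hypothetical opcode
    return fd;
}
```

Everything after the open is driver-specific, which is exactly William's point: you have to know the particular initialization, start, and stop sequence for each driver, and the source-class hides that knowledge from the rest of the app.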

For the FireWire driver, it's a bit more complex. But it's the same general idea. And in the initialization, this one is kind of huge, in FireWire you create, you basically create channels of bandwidth. You say I want so much bandwidth and this is isochronous transfers, which means they are going to come on a time pulse. In this case I'm setting up and saying I want so much bandwidth, I'm going to have 30 frames a second of this size of packets, and that's what all these little things are doing.

So you set up various control channels, you are basically talking to pieces of memory or pieces of address space and saying, okay, I'm going to set this value in this particular location in this address space and that means change the contrast, you know, or whatever the particular thing is. So you do all this setup.
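The bandwidth reservation William mentions comes down to per-cycle arithmetic: 1394 isochronous traffic sends one packet per 125-microsecond bus cycle, 8000 cycles per second, so a steady stream reserves a fixed payload per cycle. A sketch (the 320x240, 16-bits-per-pixel, 30 fps figures below are my assumption, matching the demo camera's display size):

```cpp
#include <cassert>

// Payload bytes that must be reserved in each 125-microsecond isochronous
// cycle (8000 cycles/sec) to carry a steady video stream.
int payload_bytes_per_cycle(int w, int h, int bytes_pp, int fps) {
    long bytes_per_sec = (long)w * h * bytes_pp * fps;
    return (int)(bytes_per_sec / 8000);
}
```

A 320-by-240 stream at 2 bytes per pixel and 30 frames per second works out to 576 bytes of payload in every bus cycle, which is the kind of number the setup code above is computing when it asks for "so much bandwidth."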

And then way down here we have the same sort of routine. The way get next frame works in this particular one is: I wait for an isochronous transfer to complete; all of the mumbo-jumbo down here is doing color space conversion from YUV to RGB, painful but necessary; then I fire off a new request and just return the bitmap, and the view draws it.
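The per-pixel heart of that mumbo-jumbo looks like this. The sketch uses the common ITU-R BT.601 coefficients; the camera's exact matrix may differ, so treat the constants as an assumption.

```cpp
#include <algorithm>
#include <cassert>

struct RGB { int r, g, b; };

// Round to nearest and clamp to the 0..255 range of an 8-bit channel.
static int clamp8(double v) {
    return (int)std::min(255.0, std::max(0.0, v + 0.5));
}

// One pixel of YUV-to-RGB conversion, BT.601 coefficients, with U and V
// centered on 128.
RGB yuv_to_rgb(int y, int u, int v) {
    double du = u - 128.0, dv = v - 128.0;
    return { clamp8(y + 1.402 * dv),
             clamp8(y - 0.344 * du - 0.714 * dv),
             clamp8(y + 1.772 * du) };
}
```

Doing this in software for every pixel of every frame is exactly why Steve's capture hardware, which delivers RGB directly, is such a win.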

So very similar functionality but the implementation is totally different. From a high level standpoint you, as a developer, you are probably going to just grab one of these objects and use it, you don't really need to implement all this for these particular, these two particular sources, but the code is there and you can change it however is necessary for your particular application. So I think that's --

A SPEAKER: How is performance --

WILLIAM ADAMS: Say that again, sorry.

A SPEAKER: Right now you are just displaying on the screen; how is the capability and performance of taking it and putting it on the hard drive so you can edit it later?

WILLIAM ADAMS: It's totally dependent on your implementation of buffering. Oh, sorry, let me back up. The question is: it displays on the screen, that's nice, but how about saving it to a hard disk? The answer is, it depends on your hard disk performance, of course, on how many buffers you allocate in memory, so on the amount of streaming that you are doing and whether you can keep up. If you have a super fast hard disk, then you are not going to have a problem; you write it out as soon as you get it.

If you have a slower hard disk, you have to worry about, okay, I have to preallocate 20 megabytes in RAM so I can trickle it out to the hard disk. There is no solid answer on that, other than it depends on your configuration.
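That sizing question is simple arithmetic: the buffer has to absorb the gap between the capture rate and the sustained disk rate for the length of the recording. A sketch (all rates and durations below are illustrative, not measurements of any particular configuration):

```cpp
#include <cassert>

// Megabytes of RAM needed to absorb the shortfall between capture rate
// and sustained disk rate over a recording of a given length.
double buffer_mb_needed(double capture_mb_s, double disk_mb_s, double seconds) {
    double gap = capture_mb_s - disk_mb_s;
    return gap > 0.0 ? gap * seconds : 0.0;  // fast enough disk needs none
}
```

For example, capturing at 3.5 MB/sec onto a disk that sustains 3.0 MB/sec, a 40-second clip needs a 20 MB buffer; if the disk keeps up, no extra buffering is needed at all, which is the "it depends on your configuration" answer in numbers.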

STEVE SAKOMAN: It also depends very heavily upon the parameters you give to the hardware scaling and decimation. If you give it the full NTSC 640 by 480, probably the only way the machine is going to be able to deal with that is if you DMA directly to the screen. Once you start scaling it down and capturing only fields instead of full frames, then you get into the realm where you can theoretically write it to the disk. And we are going through the exercise right now of doing all these things to find out where the bottlenecks are.

Yes?

A SPEAKER: Do you take advantage of hardware overlay to display it directly to the graphics card?

STEVE SAKOMAN: I ask Scott this every day. He sits across the aisle from my cube. We don't have overlay support yet. Certainly with the driver as it is right now we have incredible flexibility for where you can send the DMA data, so in fact you can send each scan line to a different place if you wanted to, but probably the most rational use I have seen is people who want to send one field to the screen so you can see what you are capturing, and the other field to a buffer where they want to store or process from that other buffer.

A SPEAKER: If I want to write a driver so people can use my hardware card, how do I go about that? Do you have a standard API -- say, for a JPEG compression card --

STEVE SAKOMAN: If you want to learn about writing drivers --

A SPEAKER: I also want it so everybody else can use it, too.

STEVE SAKOMAN: Exactly, yes. So we will take first things first. To learn to write a driver, I think there are some sample drivers on the release that we gave you now. You will probably have questions. So at that point I'd say, you know, you will probably write to these guys, who will then probably have you talk to me.

So then there is the next question, which is how do you glue it into the upcoming media kit architecture that sort of pulls all this stuff together? You want to talk to me about that. I think I'm in the information collecting mode right now where I really want to support what you guys want to do. A lot of the underlying stuff is new, and currently running on much more expensive platforms. We want to figure out how to bring it down to affordable stuff like this, using the latest technology. So we would love your help.

A SPEAKER: Are you doing any kind of thinking or preparation in terms of getting ready for the arrival of HDTV, will FireWire handle the bandwidth?

STEVE SAKOMAN: FireWire has incredible bandwidth; as I mentioned, it's going to be up in the 1.2 gigabit per second range in the next few years. So yes, I think the capability is there. But given the whole HDTV standardization process, and how long it takes to adopt, we will probably dive in less quickly there than we will with the digital video stuff.

A SPEAKER: What's the status of multi-channel audio support?

STEVE SAKOMAN: Well, we are laboring on that. As I mentioned, we plan to follow the MLAN specification, which does talk about multiple channels. One of the things we hoped to do for you here today was to demonstrate DTS encoded surround sound being transported over 1394. We got real close. But we have dropouts, so we decided not to do that today. But we have it in mind, and we are working both on delivering those kinds of streams, encoded streams out FireWire as well as talking both to Dolby and DTS about how we can become an authoring platform at some point in the future for doing the encoding, also.

A SPEAKER: Are there any current support plans for live --

STEVE SAKOMAN: I'm sorry, I missed the question.

A SPEAKER: Is there any current support, or plans for support, for a live alpha channel on audio or video, with compositing?

STEVE SAKOMAN: The question is, are there any current plans or future plans for live alpha support for the video?

Honestly, it's not something we really tackled deciding yet. So it's up in the air. If you have input on that, steves@be.com is the right place to send it.

WILLIAM ADAMS: Any other questions? I guess that's it.

Thank you.

(Applause.)



Transcription provided by:
Pulone & Stromberg
160 West Santa Clara St.
San Jose, California
408.280.1252



Copyright ©1997 Be, Inc. Be is a registered trademark, and BeOS, BeBox, BeWare, GeekPort, the Be logo and the BeOS logo are trademarks of Be, Inc.
All other trademarks mentioned are the property of their respective owners.