August 1997 BeDC: Writing Drivers (Extending Track)

ROBERT HEROLD: Hi, I'm Bob Herold, and I'm going to talk about writing device drivers for the BeOS. So if you're here to hear something else, you're in the wrong place, but if you're here to hear this, please stay.

So, why write a device driver? Device driver writing is not something that is really for the faint of heart. It can be a bit more difficult than writing an application, just because you are actually writing something that is part of the kernel. But so why would you want to undertake such a thing? A device driver is really something that is, like I said, part of the kernel, so, as such, it provides a service that is available to all applications. The main reason for writing a device driver is to interface hardware to the system and to make that hardware available for use by all applications.

The function of a device driver is to manage that hardware in an orderly way so that many clients can use it, many applications can use it, and that the hardware actually works in a correct way and works right. So the idea is, make your hardware work with the operating system.

In the BeOS, there are several different types of device drivers. I'm only going to talk about one of these today. There are just plain regular device drivers for things like PCI cards that you would plug into your Mac or your Intel machine or your BeBox. There's a specialized kind of device driver, which we call a SIM, which is something that is specifically for managing SCSI controllers.

So if you have, say, an extra-fast-and-wide SCSI controller, you would not be writing a regular device driver, you would be writing a SIM, which would plug into a layer of the kernel that we call the CAM, the Common Access Method. Those are very specialized, just for managing SCSI controllers, but a lot of the same concepts that I'm going to be talking about today would apply in writing a SCSI SIM.

There are also graphics drivers which are not actually part of the kernel but are practically just as complex to write as a normal device driver. Those are specifically just for managing graphics controllers and providing functionality for the application server. But I'm not going to be talking about those today. I don't know if that's the topic of a separate talk, but it certainly could be if it's not. So I'm going to be talking about just writing regular device drivers today.

So, first, just how are device drivers used from client programs? It's a very simple interface; it's the same interface that is used for the low-level access to the file system, and that's basically POSIX calls: open, read, write, lseek and ioctl. So it's very much like UNIX in that way, in that you just make POSIX calls to open and access your device and close it later.

In terms of what a device driver actually is, like I said, it's an extension to the kernel. We use the notion of an add-on in the kernel to basically have a separately compiled object file that gets loaded at run time and linked with the kernel. The loading and unloading happen at run time, so it's a very dynamic system, in that you can add device drivers to the BeOS, and they get picked up and recognized and can be used right away once you have actually added them to the file system.

So it's a very dynamic system that way. And the fact that they're loaded and unloaded at run time means that they're only actually consuming resources in the kernel when they're needed, and when they're not needed, all the resources that they used are freed up for use by the rest of the system.

And as for how they're actually accessed, the POSIX calls which clients use to call device drivers go through a file-system-independent layer in the kernel, and then once the kernel decides, 'Oh, this is a call for a device driver,' it gets filtered down to the device file system, which in turn vectors all the calls off to the particular driver that you have written.

The other way that they're called from the kernel is, of course, when interrupts actually occur. Drivers have the ability to install interrupt handlers, so that they can field interrupts from your hardware devices, and the kernel will of course just dispatch those to your driver once it determines where the interrupt is actually coming from.

There are a couple of places where the kernel looks for device drivers whose device names it should publish. You can provide a device driver on a floppy. You can also put it in your home directory; and that's typically where, if you are developing device drivers, you would be putting them. And then device drivers which we've written we put into the BeOS directory on the boot volume. And the BeOS directory is primarily intended as sort of a read-only directory of stuff that Be has provided, and then any additional functionality which you as developers would be writing would go into your home directory. So there are three places where it looks.

Often there is some confusion about the difference in terminology of a device versus a driver. You often hear them referred to as "device drivers," but in the way that we talk about them, the driver is actually the object file, the PEF container with all the code. And a device is a name which is published by the driver and which clients actually reference to get the functionality of the driver. So the kernel loads the driver and asks the driver to publish a bunch of names for devices that the driver can support, and then the client program would just go and use the open call with the device name to actually open the driver.

In the BeOS, the structure of a device driver is basically five global names that the kernel looks for when it loads the driver. Two of them are required. The first one is Publish Devices, which is the means by which the kernel finds out the names of the devices that the driver can support. And then the Find Device call is a call that the kernel uses to get information about a particular device that somebody is trying to open. Optional entry points include an INIT Hardware call which is called one time during the initialization of the kernel, and it gives the driver an opportunity to put the hardware into a quiescent state and do any kind of one-time initializations on the actual hardware device.

And then the two calls, INIT Driver and UNINIT Driver, are a matched pair. Every time the kernel loads the driver, it will call INIT Driver and give the driver a chance to do any sort of per-load initialization or setup of data structures or anything like that that it needs. And the corresponding UNINIT Driver gives it a chance to clean up any of the resources that it's consumed while it's been loaded.

FROM THE AUDIENCE: Question. When you call the INIT Hardware, does that mean the driver is loaded?

ROBERT HEROLD: I'm sorry, I didn't hear the question.

FROM THE AUDIENCE: Is the driver loaded or unloaded when the INIT Hardware is called?

ROBERT HEROLD: The driver is loaded. The first time it is loaded, INIT Hardware is called, and then INIT Driver is called. And every subsequent time that the driver might be loaded, only the INIT Driver is called. So the idea is that INIT Hardware is a one-time initialization. Basically, since drivers get loaded and unloaded, and they actually can be loaded and unloaded quite often, there is no state that gets saved after a driver is unloaded, so the driver has no way of knowing whether this is the first time it has been loaded. So the idea of the INIT Hardware call is saying, this is the first time, and here is a chance to go and put your hardware in a quiescent state.
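The call order described in this answer can be sketched as a small simulation. This is only an illustration under stated assumptions: the `kernel_load_driver` and `kernel_unload_driver` functions and the `seen_before` flag are hypothetical stand-ins for whatever bookkeeping the kernel does, and the entry points use plain `int` return values rather than the real BeOS types.

```c
#include <stdbool.h>

/* Counters so the call order can be observed. */
int hardware_inits, driver_inits, driver_uninits;

/* Driver side: the three optional entry points from the talk. */
int  init_hardware(void) { hardware_inits++; return 0; }
int  init_driver(void)   { driver_inits++;   return 0; }
void uninit_driver(void) { driver_uninits++; }

/* Hypothetical kernel-side bookkeeping: the loader, not the driver,
   remembers whether this driver has ever been loaded before. */
bool seen_before;

void kernel_load_driver(void)
{
    if (!seen_before) {
        init_hardware();      /* first load only */
        seen_before = true;
    }
    init_driver();            /* every load */
}

void kernel_unload_driver(void)
{
    uninit_driver();          /* every unload */
}
```

Loading and unloading this sketch twice runs the hardware initialization once and the driver init/uninit pair twice, which is the behavior the answer describes.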

When the kernel goes to -- when it goes to load a driver and when it goes actually -- when it uses the Find Device call, the Find Device call returns a pointer to a structure which has a bunch of hooks for that particular -- for that particular driver -- for that particular device, excuse me. And the hooks that get returned are pointers to functions that will handle Open, Read, Write and Close, and those are fairly -- just sort of straight-through mappings from the POSIX calls.

And there's this other call called Free which is actually called when the last close has been done and after all I/O that has been occurring in the device is complete. It's a fairly subtle distinction that normally doesn't matter, but as we'll see in one of the sample drivers that I'll show you a little later, it actually can be quite useful to have the notion of separating the operation of closing the device from the operation of actually being finished with all I/O operations on the device.

For instance, you might have a serial driver which is opened and then the file descriptor is duplicated, and you could have one thread which actually goes and closes the file descriptor and another thread which still has an active read going on and is waiting for that read to complete. So the Free call will only be called by the kernel once both the close is finished and once the actual outstanding read is complete. So it's a way of ensuring that all operations which are active in the driver have actually completed before disposing of all the driver's resources.
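The distinction between Close and Free can be sketched with a little bookkeeping. This is a simulation of the rule as stated in the talk, not the kernel's actual implementation; the `open_state` structure and all function names here are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical per-open bookkeeping, as a kernel might keep it. */
struct open_state {
    int  pending_io;     /* read/write calls still inside the driver */
    bool close_called;   /* the last close has happened              */
    bool free_called;    /* the driver's Free hook has been invoked  */
};

/* Free runs only when the device is closed AND no I/O is outstanding. */
void maybe_free(struct open_state *s)
{
    if (s->close_called && s->pending_io == 0 && !s->free_called)
        s->free_called = true;   /* stands in for calling the Free hook */
}

void io_begin(struct open_state *s) { s->pending_io++; }

void io_end(struct open_state *s)
{
    s->pending_io--;
    maybe_free(s);
}

void last_close(struct open_state *s)
{
    s->close_called = true;      /* stands in for calling the Close hook */
    maybe_free(s);
}
```

In the serial driver scenario above, one thread's `last_close` arrives while another thread's read is still pending, so Free is deferred until that read finishes.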

Now, for the way that a single driver can handle multiple devices: let's say you write a serial driver, you have four serial ports, and you want a single driver to manage all four of those serial ports with the same code. So we have this notion of a cookie. When the kernel calls Open, the Open call basically hands a pointer back to the kernel, and the kernel in turn passes that pointer to all the other hooks for that device, and that's the way that the driver is able to distinguish between all the different devices that it might support.

So it's just a private data structure that's created by the driver, passed back to the kernel, and then in turn passed back to all the other hooks. The kernel doesn't attach any semantics to it, it doesn't touch it or manipulate it in any way, but it's just a way for a driver to actually manage sessions with each of the devices that it might support.
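A minimal sketch of the cookie idea, using the four-port serial example. The hook signatures here are simplified and hypothetical (the real ones come from the BeOS driver headers), and `port_state`, `port_open` and `port_read` are names made up for illustration.

```c
#include <stddef.h>

#define NUM_PORTS 4

/* Private per-device state; the kernel never looks inside it. */
struct port_state {
    int           port_number;
    unsigned long bytes_read;
};

struct port_state ports[NUM_PORTS];

/* Open hook: pick the state for this port and hand it back as the cookie. */
int port_open(int port_number, void **cookie)
{
    if (port_number < 0 || port_number >= NUM_PORTS)
        return -1;
    ports[port_number].port_number = port_number;
    *cookie = &ports[port_number];
    return 0;
}

/* Read hook: the kernel passes the same cookie back unchanged. */
int port_read(void *cookie, void *buf, size_t len)
{
    struct port_state *p = cookie;
    (void)buf;               /* a real driver would fill the buffer */
    p->bytes_read += len;    /* only this port's state is touched   */
    return 0;
}
```

The same `port_read` code serves every port; the cookie is the only thing that tells it which session it is working on.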

So we might as well just dive into an example right here. This is going to be a very trivial example. There's actually an implementation of this up on the FTP site as well, but it's just a very nice, simple driver. This is a driver that all it does is, when you call Read, it just basically zeros the buffer that you pass to read into. It doesn't support write, it doesn't really do anything else, it's just basically a source of zero data. So we'll go through the process of actually -- well, I'll show you the driver, I'll show you how to build the driver, and I'll show you how to install the driver and how to test it. So it's just a real tutorial on how to go about doing that.

So here I have the actual source for the driver file. Let's go to the top. First of all, we have to, of course, include all the standard includes that -- I'll show you where stuff is. Most of the defines that you need for actually writing the driver are in Kernel Export and in Driver.H. Driver.H contains the data structure which the driver needs to export, and Kernel Export contains definitions for all the stuff in the kernel that a driver can use in its implementation.

So as we go down a little bit, here are forward declarations for all the hook functions that the driver will be telling the kernel about. And here is the actual structure for the hooks that the driver will be passing back to the kernel. You can see you have Open, Close and the free cookie call, and the calls for doing Control and Read and Write.

This is a very simple driver. It only has one device name that it is going to publish, and we're going to call it the MyZero device. So it will be available as /dev/MyZero, and we'll see it when we actually test this driver.

So here is the first exported call from this driver, it's the Publish Devices call, and it just returns a pointer to that list of device names which we saw up here. When the kernel is interested in opening a particular device, it calls Find Device on this driver to see if that particular name is one that is supported by this driver.

So the implementation of this just checks to see if the name that the kernel is interested in opening, which is passed in as an argument, actually is on the list of the devices that we export or that we implement. And if so, we actually return a pointer to that hook structure telling the kernel that this is in fact a device that we know about. Otherwise we just return NULL saying, "No, this isn't a device that we know about, this must be somebody else's."
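The two required entry points just described might look roughly like this. It is a sketch, not the source shown in the talk: `device_hooks_sketch` is a cut-down stand-in for the real hook structure, and terminating the name list with NULL is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the hook table; the real layout comes from the driver headers. */
typedef struct {
    int (*open)(const char *name, unsigned flags, void **cookie);
    /* ... close, free, control, read, write would follow ... */
} device_hooks_sketch;

device_hooks_sketch my_hooks = { NULL };

/* List of names this driver publishes (NULL-terminated by assumption). */
const char *device_names[] = { "MyZero", NULL };

/* The kernel calls this to learn which device names the driver supports. */
const char **publish_devices(void)
{
    return device_names;
}

/* The kernel calls this when somebody tries to open a device by name. */
device_hooks_sketch *find_device(const char *name)
{
    for (size_t i = 0; device_names[i] != NULL; i++)
        if (strcmp(name, device_names[i]) == 0)
            return &my_hooks;   /* one of ours */
    return NULL;                /* somebody else's device */
}
```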

So this is the first hook function, and this is the Open call, and basically, we make sure here that the client attempted to open this as a read-only device. Because it is in fact a read-only device, writing to this doesn't really make sense. So this is just the way we check it out.

And Close, we're not going to do much here. Basically, there's not much to do for this device in terms of closing it, it's sort of a virtual device, it doesn't actually correspond to any real hardware. Similarly, for the Free function, it doesn't correspond to any hardware. There are no resources that we've used from the kernel in terms of areas or semaphores or anything like that, so there's really nothing to do here.

There are not really many control calls you can make to this device. There's only one really interesting call, which is the Read call, and all this does is zero the buffer that's been passed in, and we just -- one interesting thing to see here is that -- there are two interesting things. We ignore the position parameter, just because it doesn't really serve any use to what we're trying to do here, which is just zero the passed-in buffer.

And the other thing we notice is that we use the Memset function, which is a standard C library function. When you write a device driver, you actually are linking against the kernel when you go to actually build the driver. The kernel exports both a bunch of services that it provides as well as most of the standard C library. So all of the standard C library functions are available to you when you actually go and write your device driver. Memset is certainly a useful one here, so we just go ahead and use it.

And then for the Write call, again, since we checked in the Open call whether this was being opened read-only, the kernel will honor that protection and never call Write. But just in case it does, we have actually gone and implemented it as well.
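Putting the Open, Read and Write hooks together, a sketch along these lines would behave as described. The signatures are simplified guesses, not the real BeOS hook prototypes, and negative `errno` values stand in for the BeOS error codes.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Open: refuse anything other than read-only access. */
int zero_open(const char *name, unsigned flags, void **cookie)
{
    (void)name;
    if ((flags & O_ACCMODE) != O_RDONLY)
        return -EPERM;
    *cookie = NULL;          /* no per-session state needed */
    return 0;
}

/* Read: ignore the position and zero the caller's buffer. */
int zero_read(void *cookie, off_t pos, void *buf, size_t *len)
{
    (void)cookie;
    (void)pos;
    memset(buf, 0, *len);
    return 0;
}

/* Write: should never be reached, but fail cleanly if it is. */
int zero_write(void *cookie, off_t pos, const void *buf, size_t *len)
{
    (void)cookie;
    (void)pos;
    (void)buf;
    *len = 0;                /* nothing was written */
    return -EPERM;
}
```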

So now we've written that, let's go ahead and try to actually build a project with it. And the reason I'm going to go through this is that it's a little bit trickier than just building your standard application, there isn't yet a template under Metrowerks for creating device drivers. So I'll just go through the steps here fairly quickly and just show you the different things that you need to do.

First of all, you need to make a project, and let's put it in the right place. And this is our device directory, here is our file, and let's call it zero.proj. Okay. There's our project file. And we're not building an application, so one of the first things we have to do is get rid of these libraries which normal applications would link against. We're going to be linking against the kernel, so we have to get rid of these guys. Then we have to go and add our source file to the project. So there we have our source file. Now we have to actually add the kernel. And here is where farther down the line I think Metrowerks will be able to provide a nice template for doing this, but for now it's a little bit tricky what we have to do.

So let's go find a kernel to add. So. We're running on a Mac, so here is the Mac kernel, it's in boot/beos/system. So we'll actually just copy this over to the directory that we're working in. So there we go. Close that. When you write a driver, you link against the kernel, but you don't link against the actual kernel, you link against an abstract kernel name, _kernel_. That allows us to have differently named kernels on different architectures: We have one for the BeBox; we have one for the Mac; we have one for the Intel platform.

But when you write a driver, you're just linking against this abstract name, and the kernel automatically resolves that abstract name into whatever the kernel happens to be running. So what we have to do is, you actually have to link against the abstract name. So we copy the kernel into our directory and call it _kernel. This is just sort of a work-around for a slight problem that we have now where you can't create a link here to the actual kernel, that just happens to not work, so this is the way we do it right now. Eventually once we have sort of a project stationery for doing device drivers, all this will be set up for you, you won't have to worry about it. But for now that's what we have.

So now we have to actually go and add the kernel to our project. So we'll just go ahead and add it. Now, at this point you might say, "Oh, let's just go ahead and make it and see what happens." And we see that it's still trying to look for some of the calls that an application -- that are typically for an application developer. So what we need to do is go and tune up our settings to tell it that we're really developing a device driver.

So a driver is really just like a shared library, it's just an add-on. And when you say it's a shared library, that basically says, it's not going to have a main function, it's not going to really try to look just like an application, it's just a piece of code that's going to be linked to something else. And we'll call the file name our zero_driver. We need to change the name there and there.

Now, we need to go and tell it that -- it doesn't really have any entry points, so we'll just get rid of all these. And we also need to say that -- we need to export all the globals that are in the file. The reason you need to export them is that -- I'll get to you in just a second -- the way the kernel decides that a particular file is a driver is that it looks for one of those five calls: INIT Hardware, INIT Driver, UNINIT Driver, et cetera. And the only way that it can find out about those is if they're explicitly exported from the file. There's another way to do this actually in your source file using the export pragma, but for purposes of speed, I'll just do it here and export everything.

What was your question?

FROM THE AUDIENCE: When you change the file type, the MIME up at the top for the preferences --

ROBERT HEROLD: Can I?

FROM THE AUDIENCE: No, when you just did, change it to a shared library application, do you want to do that, the application?

ROBERT HEROLD: I probably don't, but I'm not an expert on Linux, so I really wouldn't even know what to change it to. I can find out for you if you like. This works, though.

Okay, so I think we're ready to go ahead and link it again and see if it works. Yes, it seemed to work.

Now, let's go back and try to install our driver. So here we have our driver, and due to the slightly buggy icon handling in the system, it happens to have gotten the same icon as our kernel, but we'll ignore that for now. And to install it, we'll put it in one of those directories that I told you about earlier. So that was in like the home directory, config, add-ons, kernel, drivers. So we'll copy it there.

Now, let's go see if it actually showed up. So I have over here -- in another work space I have a terminal window. And look, there it is, our zero_driver is right there. So our driver, just by virtue of doing an LS in the /dev directory, which is the sort of virtual file system where drivers are accessible, just by virtue of looking in there, the kernel has gone and looked through all the possible places the drivers could be, found this driver, and actually installed it into that hierarchy. So there we have it.

Now, let's go and see if we can actually use this driver. So go back over here. I've written a little test program which we'll go look at here. Can you guys see that? No? I'll embark on a little BeIDE adventure here to try to change the font. There we go. How is that? Is that readable? Okay.

So this is a little program that's going to just go and test that our driver works. So what it's going to try to do is, first it's got a large buffer, and it just initializes that buffer to 0xff. It tries to open the device; if that fails, it prints an error message. Then it tries to read a bunch of zeros from the device. If that fails, it prints an error message. And then it just goes and checks to make sure that it actually worked, that it actually zeroed the buffer. So there it is. Let's go ahead and make it. And so we'll close this up and go back to our terminal window.
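A test along the lines just described could be sketched like this. It is not the program from the demo; the function name `check_zero_device` and the 4096-byte buffer size are choices made for illustration. On the BeOS the path would be /dev/MyZero.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Returns 0 if a full buffer of zeros can be read from the device, -1 otherwise. */
int check_zero_device(const char *path)
{
    static unsigned char buffer[4096];

    /* Pre-fill with 0xff so leftover data cannot pass for zeros. */
    memset(buffer, 0xff, sizeof buffer);

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    ssize_t got = read(fd, buffer, sizeof buffer);
    close(fd);
    if (got != (ssize_t)sizeof buffer) {
        fprintf(stderr, "read failed or came up short\n");
        return -1;
    }

    /* Every byte must have been zeroed by the driver. */
    for (size_t i = 0; i < sizeof buffer; i++) {
        if (buffer[i] != 0) {
            fprintf(stderr, "byte %lu was not zeroed\n", (unsigned long)i);
            return -1;
        }
    }
    return 0;
}
```

A `main` that calls `check_zero_device("/dev/MyZero")` and prints the result would complete the test program.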

Now, so there's the application which we just built, and let's go and try to run it. And here, lo and behold, it worked.

So that's a fairly simple device driver. It just shows you the call -- it just uses the sort of minimal calls, Publish Devices and Find Device, and it just shows you how to walk through the process of actually building a device driver. So you can see, it's not like the simplest thing to do at this point, because you have to do all the manipulations on the project file, but it certainly is possible to do from the BeIDE. So now let's go back to here.

Now, most drivers won't be that simple. Typically for the hardware that we're interested in, we'll be writing device drivers for devices on the PCI bus. So there are a couple of tricky things to handle there. One is just actually finding your device, sort of how to browse the PCI bus and look to see if your device actually is out there.

Often PCI devices will have base registers that map large chunks of physical address space or that define large chunks of physical address space that actually manage what the device is trying to do. And what you need to do is actually map those into the virtual memory system so that they're accessible from your driver and also from anybody who is calling your driver. Often a driver for a PCI card will have several logical devices that it's handling. For instance, it's a serial port card, and it has four serial ports that it's handling, but it's a single driver.

Also, in a more complex driver, you will have synchronization issues, and you'll start using semaphores to go and manage the interaction between the actual client call to read and the actual interrupt handler that shows that a read has been completed. Then you'll actually have to go and set up interrupt handlers to do all these things. So I have a sample driver that we'll go through here and we'll actually see how a bunch of these things are done.

So let's go back to here. So let's go and -- this font is not visible. There we go. Make it really big. Okay. This driver is for an imaginary PCI device, the idea here being to demonstrate the concepts you need for actually writing a device driver and not getting lost in the actual details of real hardware. Because typically real hardware requires lots and lots of complex stuff which tends to get in the way of actually making a presentation of the structure of device drivers. However, this is a fully functional device driver for this imaginary device, I've tested it and actually run it, and it does all the things correctly, so it's a good demonstration vehicle.

Let's see. It actually supports up to four logical devices, and the idea is, it's actually a bus-master PCI device. So this imaginary device is able to read and write your system memory. When you actually are doing a write to the device, the actual hardware will go and read your system memory and write it out to whatever piece of hardware it's controlling. And when you're doing a read, it's going to get the data from the hardware and actually plunk it in the system memory.

So you're going to have to deal with issues of physical memory in your system memory that needs to be locked down; i.e., you have to tell the virtual memory system that this memory needs to be resident, so that when MyDevice goes and takes the data from memory, the memory is there, the actual data is there. So we need to deal with the locking issues of the virtual memory system.

And also like a lot of hardware out there, it's a bit funky, in that you need to do three things to actually get the device to do something. So you actually have to -- you write a register to tell it what to do, then you write the same register to tell it an address, then you write the same register to tell it the number of bytes.

Now, the reason for constructing that kind of convoluted example is to show the notion of being able to synchronize access to that particular piece of hardware. So that what you'll actually be doing there is showing the synchronization primitives for actually locking access to the hardware, doing the three transactions, and then later unlocking access so other people can get at it.
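The lock, three writes, unlock pattern can be sketched as follows. Everything here is a stand-in: `write_register` just logs values instead of touching hardware, and a C11 `atomic_flag` spin lock takes the place of whatever kernel primitive the real sample uses (a driver would more likely block on a semaphore).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for the device's single register: logs the last three writes. */
uint32_t fake_register_log[3];
int      fake_register_writes;

void write_register(uint32_t value)
{
    fake_register_log[fake_register_writes % 3] = value;
    fake_register_writes++;
}

/* Guards the three-write sequence so two clients cannot interleave. */
atomic_flag hw_lock = ATOMIC_FLAG_INIT;

void start_transfer(uint32_t command, uint32_t address, uint32_t byte_count)
{
    while (atomic_flag_test_and_set(&hw_lock))
        ;                            /* spin until the lock is ours */

    write_register(command);         /* 1: what to do      */
    write_register(address);         /* 2: where the data is */
    write_register(byte_count);      /* 3: how many bytes  */

    atomic_flag_clear(&hw_lock);     /* let other clients in */
}
```

Without the lock, a second caller could slip its command in between the first caller's address and byte count, and the device would see a garbled sequence.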

So the first thing we want to do is actually find our device. So let's go and look at the code that actually goes and finds our device on the PCI bus. Okay. Here is the utility routine which actually we're going to be calling in several places. This is just a sort of generic "go find a device on the PCI bus." And the key call to look at here is getting PCI info, which basically iterates through all the PCI devices one by one and passes back a large data structure, PCI info structure, which tells you everything you need to know about the device.

So you go, one by one, and iterate through all the PCI devices looking for yours. Here for this utility routine we pass in the PCI Vendor ID and the Device ID. Then we just go and check to see if this particular one is ours, and if it is, we break out and return true. If ever we get to the end of the PCI devices and we still haven't found it, we return false. So that's how you go and find your PCI device.
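The search loop can be sketched against a simulated bus. The `pci_info_sketch` structure, the `fake_bus` table and `get_nth_pci_info_sketch` are all hypothetical stand-ins for the kernel's PCI info structure and its iteration call; the vendor and device IDs are made up.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for the kernel's PCI info structure. */
typedef struct {
    uint16_t vendor_id;
    uint16_t device_id;
    uint32_t base_register;   /* physical address assigned at boot */
} pci_info_sketch;

/* Simulated bus contents, for illustration only. */
static const pci_info_sketch fake_bus[] = {
    { 0x1111, 0x0001, 0x80000000u },
    { 0x1234, 0x0002, 0x80100000u },
    { 0x2222, 0x0003, 0x80200000u },
};

/* Stand-in for the kernel call: false once the index runs past the last device. */
bool get_nth_pci_info_sketch(size_t index, pci_info_sketch *info)
{
    if (index >= sizeof fake_bus / sizeof fake_bus[0])
        return false;
    *info = fake_bus[index];
    return true;
}

/* Walk the bus one device at a time until the IDs match. */
bool find_pci_device(uint16_t vendor, uint16_t device, pci_info_sketch *info)
{
    for (size_t i = 0; get_nth_pci_info_sketch(i, info); i++)
        if (info->vendor_id == vendor && info->device_id == device)
            return true;      /* found ours; info is filled in */
    return false;             /* reached the end of the bus */
}
```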

Then once you have your PCI device, you need to actually map the base registers, which have been defined for your PCI device. Actually, I should ask here, how many of you are familiar with PCI at all? Okay. So this talk of base registers might be confusing to you. So just a quick overview.

PCI is, of course, a bus that's actually used on a Power Macintosh and on Intel platforms, and it defines for every device -- every device has the ability to define a set of -- basically, it has the ability to make a request for a bunch of address space that clients can use to actually manipulate the device. So it has a register called a base register which says, "I need 1 megabyte of address space to actually do the operations on my device." And the BIOS, when it starts up the computer, actually goes and looks at all the PCI devices, finds all the space allocation requests in the base registers, and then goes and lines up all those base registers in the physical address space so that each one of them has a unique physical address. And that's how they essentially do plug-and-play on PCI devices.

Later when the kernel comes into play it actually has to go and use the MMU to start actually accessing all that physical memory. It goes through all the base registers and finds out where they all are, then it goes about and maps them. So one of the jobs of the device driver is to actually go and set up that mapping for that physical address space request.

So let's go and look and see how that's done. Like I said before, the INIT Driver call is the one that gets called when the driver is loaded, and here is where we're going to actually set up our mapping for the base register. So the first thing we do in an INIT driver is just make sure that -- we look and make sure that the hardware is actually there. It's a bit of a redundant call here, since we actually did that in the INIT Hardware routine, which I scrolled by but didn't talk about. Yet nonetheless we'll do it here, for safety's sake, just to make sure that we're dealing with a PCI device that we know about.

So once we found our PCI device, in this data structure that gets returned from looking for the PCI device are the base register requests which have been made by the device and set up by the BIOS. So we go and find the physical address that the BIOS has assigned to our base register, and that's the address that we're going to use in setting up the mapping. What we have to do when we set up the mapping is make sure that the physical address is page aligned, so the first thing we do is actually round it down to the nearest page and then clean up the size based on that.

Then we have to make sure that the size of the chunk of physical memory that the device has requested is actually an integral number of pages, because that's the unit of allocation that we can do with our mapping operation. So we round it up to the nearest page boundary. Then we actually go and do the mapping.
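The rounding arithmetic from the last two paragraphs, as a sketch. The 4096-byte page size and the name `page_align_region` are assumptions; a real driver would use the page size constant from the kernel headers.

```c
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096u   /* assumed page size */

/* Expand [phys, phys + size) outward to whole pages. */
void page_align_region(uint32_t phys, uint32_t size,
                       uint32_t *aligned_base, uint32_t *aligned_size)
{
    /* How far the address sits past the start of its page. */
    uint32_t offset = phys & (PAGE_SIZE_SKETCH - 1);

    /* Round the base down to the page boundary... */
    *aligned_base = phys - offset;

    /* ...and grow the size by the same amount so the end does not move. */
    size += offset;

    /* Round the size up to a whole number of pages. */
    *aligned_size = (size + PAGE_SIZE_SKETCH - 1) & ~(PAGE_SIZE_SKETCH - 1);
}
```

For example, a 0x100-byte region at 0x80001234 becomes one full page starting at 0x80001000, while a region that is already page aligned comes back unchanged.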

So here's the call to the kernel that says, "Here is a chunk of physical address space. I know it's just used for memory-mapped hardware, so just map it for me and tell me where you put it." And the kernel returns the base address of the virtual mapping for that physical area in this parameter here. And so once we have completed this call, then that base register is fully accessible by the rest of the driver.

But here in our INIT Driver call, we have actually created an area for accessing that base register. So that's a kernel resource that we've consumed here, and we do it every single time the driver is loaded. One thing that's very polite to do is to not sort of wantonly consume resources and not return them to the kernel when we're done with them. So whenever the driver is unloaded, we actually go and delete that area and thereby make that address space available for other drivers. So that's what we do in the UNINIT Driver call.

So that's a very fast description of a pretty complex topic of how to go and map physical memory for your driver. But it's something that if you're writing PCI device drivers you're going to have to do every single time.

Now, how do we go and handle multiple devices in our driver? Well, as we saw before in that very simple driver that we did, the Publish Devices call is what the kernel calls to ask the driver the names of the devices that it supports. So we just return our pointer to a list of names. Now let's go and see what that list of names is. It's up here.

So here we've decided that we're going to put these devices in a subdirectory called Fake, and we're going to name them device 1, device 2, device 3 and device 4. Just by virtue of having this path in the name, the kernel will actually create a subdirectory under /dev called Fake, and it will actually put all these names inside there. So client programs will just be opening /dev/fake/device 1, and so on. So it's a very simple way of actually exporting lots of functionality for four devices just by virtue of naming them like that.

FROM THE AUDIENCE: How does the system handle the possibility of conflicting device names?

ROBERT HEROLD: Right now it doesn't.

FROM THE AUDIENCE: Okay.

ROBERT HEROLD: What it does is actually the second -- I mean, it's first come, first served. So the guy who gets the name first is the one who's going to actually stay.

So we're down here, Publish Devices. Now, when the kernel wants to actually go and open one of these devices, it's going to be calling the Lookup Device name, just to see if this is a device that is supported by this driver. And here the code gets a bit more complex than the earlier sample. Here we actually just iterate through the entire list to see if the name that the kernel has passed in is one that we support. If it is -- oh, wait, this is a different one, sorry.

This is an internal routine. It's a work routine that's actually used by a Find Device, so we here see that Find Device calls the routine to see if it is one we support. If it is, it returns the pointer to the hook structure for that device; otherwise it just returns null saying it's not one that we support. As we'll see later, we provide this as a work routine, because this will actually be useful in distinguishing between the different devices that are supported by this driver.

So let's go back to here for a minute. Handling multiple devices, we're still talking about that. This is a single driver, so when the driver gets opened, the name of the device that's being opened gets passed to the Open call. So you could get passed fake/device 1, or you could get passed fake/device 2. And the way to distinguish which piece of hardware you're actually managing there is to look at the name and, based on the name, make a mapping from the name to the actual piece of hardware.

So here in the Open call we do exactly that. Here is the name, it gets passed in. We go and look it up. And you recall that that Lookup routine sort of went through the list of names and said: Is it the first one? Is it the second one? The third one? The fourth one? And it returned a number depending on which one it is. And that's how we figure out which one it is.
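The lookup described here can be sketched in a few lines of C. This is only an illustration, not the sample driver's actual code: the name strings and the function name `lookup_device_name` are assumptions made for the example, and a miss is reported as -1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; the talk's sample publishes four devices under fake/. */
static const char *device_names[] = {
    "fake/device1",
    "fake/device2",
    "fake/device3",
    "fake/device4",
    NULL                /* the list is NULL-terminated */
};

/* Return the index of `name` in the published list, or -1 if it is not ours. */
static int lookup_device_name(const char *name)
{
    assert(name != NULL);
    for (int i = 0; device_names[i] != NULL; i++) {
        if (strcmp(name, device_names[i]) == 0)
            return i;
    }
    return -1;
}
```

The returned index is what lets the Open call map a name onto one particular piece of hardware.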

So now we know that we're managing, say, device 2. I have made the assumption that only one client will be actually using device 2 at a time. That's a simplifying assumption. So I'll just enforce that by having a bit field of devices that have been opened and just make sure that this is the only one that's opening it.

This is another interesting call that's exported by the kernel, which is an Atomic Or function. Basically, it ensures that you can take any two values, OR them together, and it does so atomically and returns the previous value of one of them. Here it is, Atomic Or. You pass in a pointer to the thing you're going to OR into and the value you're actually going to OR, and it returns the old value of this thing before the OR occurs. So it's a nice way of doing synchronization primitives, like this one where you want to basically set a flag atomically and see what the old value of the flag was.
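As a rough sketch of that set-a-flag-and-check-the-old-value pattern, the following uses the C11 `atomic_fetch_or` as a stand-in for the kernel's Atomic Or call. The names `open_mask`, `claim_device` and `release_device` are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit n set means device n is currently open. Static storage starts at zero. */
static atomic_uint open_mask;

/* Try to claim device `index`; returns true only for the first opener. */
static bool claim_device(int index)
{
    assert(index >= 0 && index < 32);
    unsigned bit = 1u << index;
    /* Atomically OR the bit in and get the value from before the OR. */
    unsigned old = atomic_fetch_or(&open_mask, bit);
    return (old & bit) == 0;
}

/* Clear the bit again when the device is closed. */
static void release_device(int index)
{
    assert(index >= 0 && index < 32);
    atomic_fetch_and(&open_mask, ~(1u << index));
}
```

Because the OR and the read of the old value happen as one step, two clients racing to open the same device cannot both see the bit as clear.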

Now, here we are going to actually allocate the cookie which is going to be passed back to the kernel, and that's the data structure which is going to be used to manage the operation of this device. So here we go and set it up. We allocate it. We go once again and find the device just to get the PCI information for this device. And we do a little bit more setup on the actual device record, the cookie.

And here we're going to go and create some semaphores which we're actually using to manage the device. This one, the first semaphore, is going to enforce exclusive access to the hardware. Because, remember, this has funky hardware, and we have to do three accesses to a single register. And you want to do these three accesses atomically. So we create a lock, such that you get the lock, you do the three accesses, then you release the lock. So here we go and create a semaphore to actually do that. We give it a nice name that actually describes which device it is and what it's for and we create the semaphore.

Here is another important thing here, which is to actually set the owner of the semaphore. Typically, when you call Open, you're calling it from an application. So the thread and the team that is calling Open is the application thread. And when you go and create a semaphore by default, its owner is set to the calling team. However, that's not good for a device driver, because you want that device driver to persist beyond when that team actually goes away.

So if like the application quits, you still want the device to be there just in case there are other Opens for it. So you have to take all the resources that would normally be assigned to the application and actually transfer their ownership over to the kernel. And that's what this call does right here; it basically takes that semaphore and says, "This doesn't belong to the application, this belongs to the kernel. So when the application quits, don't go and destroy all the semaphores that were attributed to that application."

Then we're going to have another semaphore which is basically going to be used to manage the interaction between reads and the interrupt handler which actually signals the completion of a read. Here we're actually creating the semaphore with an initial value of zero, such that when you're actually doing a read, you block right away, and then the interrupt handler is the one that releases the semaphore, signaling that the I/O is completed, and allowing the read to continue because it's finished.

Here we go and install an interrupt handler for this device, so that's what this call is for. The one thing to know here is that for a PCI device, the actual number of the interrupt to use in installing the interrupt handler is passed to you in the PCI information record. So when we go and look up the PCI device, in the record of information that was passed back is the actual interrupt number that gets passed in to install the interrupt handler. Once we have installed it, we're set to go and tell the kernel, "Okay, enable this interrupt to start happening on this particular interrupt line."

So we've just covered how to set up semaphores. We haven't actually covered how they're actually used in implementing a device driver. So let's go down and look at the Read call.

So what I've done here is I've factored both the Read call and the Write call into a common subroutine. First of all, in the common subroutine it's going to be called View Device I/O. But we're going to pass an operation that says Go and Do Read or Go and Do Write. If we go and look at that, the code which actually does the I/O operation -- here we go. Okay.

One thing to note here. Debugging a device driver can be kind of tricky, because you can't just use printf and get output on your terminal window and things like that. This is something that's actually part of the kernel and, as such, most of those services just aren't available. What we do have and what we use a lot for debugging device drivers is the ability to actually generate serial output out of one of the serial ports, and you can either take that and loop it back to the computer and view the output there, or you could actually set up another machine to sort of capture it all and view it there.

So here we've defined a macro for one of the kernel calls, which is actually dprintf, and we define a macro for it so we can actually get rid of these things when we're finished debugging our driver. But you actually go and print out things where you're printing out what parameters were passed in and things like that. And this would show up as a line of output on the serial port.
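A compile-out debug macro along these lines might look like the sketch below. It is not the driver's actual macro: `ddprintf`, `DEBUG`, `fake_dprintf` and `debug_log` are hypothetical names, and `fake_dprintf` writes into a buffer purely as a stand-in for the kernel's serial-port print routine so the example can run outside a kernel.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define DEBUG 1     /* set to 0 to compile the trace calls out */

static char debug_log[256];     /* stand-in for the serial port */

/* Stand-in for the kernel's serial print call: appends to debug_log. */
static void fake_dprintf(const char *fmt, ...)
{
    assert(fmt != NULL);
    size_t used = strlen(debug_log);
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(debug_log + used, sizeof debug_log - used, fmt, ap);
    va_end(ap);
}

#if DEBUG
#define ddprintf(...) fake_dprintf(__VA_ARGS__)
#else
#define ddprintf(...) ((void)0)     /* expands to nothing useful at all */
#endif
```

A call such as `ddprintf("read: pos=%d len=%d\n", 0, 512);` then costs nothing once `DEBUG` is set to 0.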

So the first thing we do is check to see if the client made a bonehead call and didn't actually want to do anything. So if it said that the size that it wanted to read was zero, then why bother doing anything?
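To make the factoring concrete, here is a minimal sketch of a Read and a Write that share one worker routine, including the early return for a zero-length request. Everything in it is assumed for illustration: the names `device_io`, `my_read`, `my_write`, the `io_op` codes, and the byte array that stands in for the hardware. The real hooks take different parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum io_op { DO_READ, DO_WRITE };       /* hypothetical operation codes */

static unsigned char fake_device[64];   /* stands in for the hardware */

/* Shared worker. Returns 0 on success, -1 on a bad position.
   On return, *len holds the number of bytes actually transferred. */
static int device_io(enum io_op op, size_t pos, void *buf, size_t *len)
{
    assert(buf != NULL && len != NULL);
    if (*len == 0)                      /* nothing requested, nothing to do */
        return 0;
    if (pos >= sizeof fake_device) {
        *len = 0;
        return -1;
    }
    if (*len > sizeof fake_device - pos)
        *len = sizeof fake_device - pos;
    if (op == DO_READ)
        memcpy(buf, fake_device + pos, *len);
    else
        memcpy(fake_device + pos, buf, *len);
    return 0;
}

static int my_read(size_t pos, void *buf, size_t *len)
{
    return device_io(DO_READ, pos, buf, len);
}

static int my_write(size_t pos, const void *buf, size_t *len)
{
    /* The worker only reads from buf on the write path. */
    return device_io(DO_WRITE, pos, (void *)buf, len);
}
```

Keeping the two hooks as thin wrappers means the locking and chunking logic only has to be written once.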

So now here it gets a little bit tricky. Remember that this is a DMA device, it's a bus master on PCI, so it can actually read directly out of system memory. What we have to do is take the buffer that was passed into the read and lock it down. So here is the call where you lock stuff in. What this does is it actually ensures that the entire buffer is resident in physical memory, that none of it has been paged out to disk by the VM system. This is, of course, important when the device actually goes and starts doing DMA accesses.

A couple of flags that we pass in of note. One is to tell it that this is intended as an I/O, so we have to tell it we're doing I/O. And the other thing we have to tell it is that we intend to be reading from the device and that the device is going to be actually taking the data and writing it into memory. The reason we need to do that is the VM system keeps track of what pages have been modified in physical memory, and it uses that in managing virtual memory. And if there's some device that's actually going in and writing in data behind the processor's back, it has no way of knowing that those pages have been dirtied, as it were, written into. So we actually go and tell the kernel, "We're intending to read from the device and write into memory and treat these pages as dirty." So that's what that flag is for.

So we go and we lock the memory. Then we go and use that semaphore for the hardware lock to get exclusive access to that funky little register where we do three things. Then we tell it -- and we note that we've not done any I/O yet. Then since we've locked it all down, we have to actually find out where in physical memory all the pages of the buffer are. So we go and use the kernel to get a memory map which actually goes and finds all those physical pages which can be scattered throughout physical memory and returns you a table, which basically gives you the address of every physical page and how many contiguous physical pages there are at that address.
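The shape of that table can be illustrated with a small standalone sketch. It does not call the kernel; instead it takes a made-up list of per-page physical addresses and coalesces neighbors into runs, which is roughly the kind of table described above. The type `phys_run`, the function `build_run_table` and the 4096-byte `FAKE_PAGE_SIZE` are all assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_PAGE_SIZE 4096u    /* assumed page size for the illustration */

typedef struct {
    uintptr_t address;  /* physical start of the run */
    size_t    size;     /* length of the run in bytes */
} phys_run;

/* Coalesce per-page physical addresses into contiguous runs.
   Returns the number of runs written, or -1 if the table is too small. */
static int build_run_table(const uintptr_t *pages, size_t npages,
                           phys_run *table, size_t max_runs)
{
    assert(pages != NULL && table != NULL);
    size_t nruns = 0;
    for (size_t i = 0; i < npages; i++) {
        if (nruns > 0 &&
            table[nruns - 1].address + table[nruns - 1].size == pages[i]) {
            /* This page directly follows the previous run: extend it. */
            table[nruns - 1].size += FAKE_PAGE_SIZE;
        } else {
            if (nruns == max_runs)
                return -1;
            table[nruns].address = pages[i];
            table[nruns].size = FAKE_PAGE_SIZE;
            nruns++;
        }
    }
    return (int)nruns;
}
```

A buffer that looks contiguous in virtual memory can therefore turn into several separate transfers, one per run.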

So ignore the debugging stuff for now, that was just for getting the driver working. Now here we go, and here we just go and call this routine Start I/O. I can actually go through the mechanics of how the I/O actually occurs, but I see we only have four minutes left, so I think I'll skip through some of this.

Suffice it to say that in Start I/O we decide what part of the buffer we're going to write, and we go and set the three registers, which is actually telling the device, "Go and Read or Write from this particular location."

Another thing to point out here is this instruction. One thing you have to worry about in device drivers is that many of these processors, the 604 especially, feel free to sort of reorder the accesses that they actually put out on the processor bus based on efficiency. So it can decide, if you do a Read and a Read and a Write, it can say, "Oh, well, just based on the state of the pipeline in the processor, it's more convenient to do the Write first and do the Read later," which for memory-mapped hardware is not usually what you want to do.

So there's an instruction for the PowerPC called Enforce In-order Execution of I/O, or EIEIO. That's really the name. So what that does is basically place a barrier that says, Everything that I've done before this point shall happen before everything that happens after this point. So after every memory-mapped hardware access that we do, we insert those EIEIO calls, and it enforces that everything happens in order and that the hardware is actually accessed in the correct way.

Now, what's going to happen is that we're going to go and write this, and then at some point later an interrupt is going to say, "Okay, the operation is complete." Let's go and look at what the interrupt handler does. So the interrupt handler is actually very simple. It gets passed the cookie, which we passed back to the kernel, so that's the way the interrupt handler knows which device is actually being managed.

Then it just basically -- since we've had this table of physical pages and we can only do I/O on one physical page at a time or one set of contiguous physical memory at a time, we may actually need to break up the I/O into several distinct parts, each one doing a little chunk of physical memory. So the interrupt handler actually just goes and tries to start I/O in the next little chunk of physical memory, and if it discovers in trying to start it that it's actually finished, it then goes and releases the semaphore, which tells the reader that the I/O is complete and just returns.

So that's how the interrupt handler actually communicates back to the client Read or Write call that the I/O has been completed. And this is the call which will actually unblock the client, and then it can go and return saying, "Oh, the I/O is complete."
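The chunk-at-a-time handoff between Start I/O and the interrupt handler can be modeled with a short simulation. Nothing here touches hardware or a real semaphore: `device_cookie`, its fields, and the integer `done_sem` that stands in for the completion semaphore's count are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t run_sizes[8];    /* sizes of the contiguous physical runs */
    size_t nruns;           /* how many runs the buffer was split into */
    size_t next_run;        /* next run to hand to the device */
    size_t bytes_started;   /* total bytes handed to the device so far */
    int    done_sem;        /* stand-in semaphore count; 0 = reader blocked */
} device_cookie;

/* Hand the next run to the device. Returns false if nothing is left. */
static bool start_io(device_cookie *dev)
{
    if (dev->next_run >= dev->nruns)
        return false;
    dev->bytes_started += dev->run_sizes[dev->next_run];
    dev->next_run++;
    return true;
}

/* Called once per completion interrupt with the cookie from Open. */
static void interrupt_handler(void *cookie)
{
    device_cookie *dev = cookie;
    assert(dev != NULL);
    if (!start_io(dev))
        dev->done_sem++;    /* "release" the semaphore: the I/O is complete */
}
```

In this model the Read call starts the first run itself and then blocks; with three runs, the third interrupt is the one that finds nothing left to start and wakes the reader.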

So that's pretty much it. Let's see. We talked about those guys, about locking memory and unlocking memory and actually getting the mapping of the physical memory. And we're out of time, so I think I'm going to skip these two and open it up for questions. Any questions? Yes.

FROM THE AUDIENCE: Because you linked the Mac kernel with the hardware, would the same driver work on the same hardware?

ROBERT HEROLD: It will actually work on any hardware. What we did is we linked against the abstract name Kernel, and you saw it actually copied the Mac kernel. I could have easily copied the BeBox kernel as well or the Intel kernel. The idea there is that when you're writing a driver, you want to only get an abstract name, you don't really want to have specific hard-coded names in there. So we invented this abstract name. I just happened to pick Mac kernel because that's the one we're running on, I could have picked the other one as well. All the kernels actually export the same set of names for that very reason. So they're all the same.

FROM THE AUDIENCE: Another question. Is there any problem of opening a driver within a driver, so that one driver opens up a serial driver?

ROBERT HEROLD: Yes, you can do that.

FROM THE AUDIENCE: What about the EIEIO like on Intel? (Inaudible)

ROBERT HEROLD: What we've done for Intel is we've taken the EIEIO and mapped it to the appropriate thing on Intel. I don't believe Intel has the same problem, but if they do, they have the same kind of ordering instructions. It might have been nice to think ahead and say, "Oh, we're going to be on any processor" and call it a nicer name, but for now we're using the PowerPC name for all the processors.