BE ENGINEERING INSIGHTS: BeOS Kernel Programming Part IV: Bus Managers
By Brian Swetland swetland@be.com
This week we will once again delve into the mysteries of writing code that operates in the realm of the BeOS kernel. Recently, Ficus expounded on kernel modules in his article "Programmink Ze Quernelle, Part 3: Le Module" <http://www.classic.be.com/aboutbe/benewsletter/volume_III/Issue26.html>. We'll dig a bit deeper into this subject by looking at a specific class of modules -- bus managers.

Bus managers provide an abstraction layer to drivers. Instead of burdening drivers with the knowledge of how to interact with devices on a bus (e.g., PCI, ISA, SCSI, USB), a common interface to each class of bus is provided.

The PCI bus manager allows a driver to locate and interact with devices on the PCI bus (consult your local header file /boot/develop/headers/be/drivers/PCI.h for gory details). A driver, once it obtains a pci_module_info pointer from get_module(B_PCI_MODULE_NAME, (module_info **) &pci), may use provided functions like get_nth_pci_info() to iterate over devices on the bus. Once a device is located, functions like read_io_8(), write_io_32(), and friends allow iospace access (on x86 platforms where there is a separate iospace).

PCI and ISA are fairly boring bus managers. Where bus managers get interesting is when they have two sides. On the "top" side, an interface is provided to drivers. On the "bottom" side, an interface is provided to busses. A driver using a bus manager like this needs to know nothing about how the underlying bus works -- it only needs to know about the interface provided by the bus manager.

Brian Talks About SCSI -- No Surprises Here

Let's take a concrete example -- SCSI. The SCSI standard defines a number of things, but the most interesting to a driver writer is a command protocol. To interact with a SCSI device, a 6, 10, or 12 byte command is sent to the device, some data is sent or received (depending on the command), and a response code is returned.
The specification defines the gritty details of the electrical and signaling properties of the bus, arbitration, and other stuff that someone writing a tape driver doesn't want to worry about.

SCSI devices all accept the same basic commands and adhere to the electrical and signaling standards defined in the specification. SCSI controller cards all use totally different mechanisms for issuing these commands (ranging from twiddling the bits almost directly to passing commands to an intelligent controller using an elegant shared-memory mailbox mechanism).

The SCSI bus manager provides an interface to drivers that adheres to the Common Access Method (ANSI X3.232-1996).** It uses three function calls -- one to allocate a command block, one to issue a command, and one to release the command block afterwards. The command block is filled with the SCSI command, information about the data to send or receive, and some other useful flags. The command block refers to the SCSI device using three numbers: the "path" (which maps to a specific hardware controller), the "target" (the SCSI ID of a device on the bus), and the "lun" (the SCSI Logical Unit Number, which is 0 for most devices). The driver using the SCSI bus manager need not concern itself with what hardware is actually associated with a particular path.

    /* from /boot/develop/headers/be/drivers/CAM.h */
    struct cam_for_driver_module_info {
        bus_manager_info minfo;
        CCB_HEADER * (*xpt_ccb_alloc)(void);
        void (*xpt_ccb_free)(void *ccb);
        long (*xpt_action)(CCB_HEADER *ccbh);
    };

    #define B_CAM_FOR_DRIVER_MODULE_NAME \
        "bus_managers/scsi/driver/v1"

The SCSI bus manager doesn't know how to talk to any specific host controllers. It relies on a number of modules (living in /system/add-ons/kernel/busses/scsi) to actually speak to the hardware. The bus manager is a bookkeeper -- it loads the SCSI bus modules and lets them search for supported hardware.
If they find any, they inform the bus manager with the xpt_bus_register() call and are kept loaded. If not, they're unloaded. Registered busses are assigned a number by which drivers can address them.

    /* from /boot/develop/headers/be/drivers/CAM.h */
    struct cam_for_sim_module_info {
        bus_manager_info minfo;
        long (*xpt_bus_register) (CAM_SIM_ENTRY *sim);
        long (*xpt_bus_deregister) (long path);
    };

    #define B_CAM_FOR_SIM_MODULE_NAME \
        "bus_managers/scsi/sim/v1"

The SCSI bus manager itself exists as a module which lives at /system/add-ons/kernel/bus_managers/scsi. It exports two distinct modules (bus_managers/scsi/driver/v1 and bus_managers/scsi/sim/v1) from one binary (this is a neat feature of the kernel module system).

A Module Of Many Names

How does get_module(B_CAM_FOR_DRIVER_MODULE_NAME, &cam) actually get the right module, you may ask? The module manager looks for modules by prepending first the user config directory for kernel add-ons, and then the system config directory for kernel add-ons, to the module name and looking for a binary. If that doesn't exist, subpaths are sliced off the end until a match is found (or there are no more subpaths). So in the case of this get_module() call, the following items are attempted:

    /boot/home/config/add-ons/kernel/bus_managers/scsi/driver/v1
    /boot/home/config/add-ons/kernel/bus_managers/scsi/driver
    /boot/home/config/add-ons/kernel/bus_managers/scsi
    /boot/home/config/add-ons/kernel/bus_managers
    /boot/beos/system/add-ons/kernel/bus_managers/scsi/driver/v1
    /boot/beos/system/add-ons/kernel/bus_managers/scsi/driver
    /boot/beos/system/add-ons/kernel/bus_managers/scsi

Here it stops because a file is found.
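The path slicing described above can be sketched in ordinary user-space C. This is only an illustration of the search order, not the module manager's actual code; candidate_paths() and MAX_PATH_LEN are names made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PATH_LEN 256   /* hypothetical limit for this sketch */

/* Hypothetical helper: fills out[] with the candidate binary paths for one
 * add-on root, longest first, slicing one trailing subpath off per step.
 * Returns the number of candidates written. */
static int
candidate_paths(const char *root, const char *module_name,
                char out[][MAX_PATH_LEN], int max_out)
{
    char name[MAX_PATH_LEN];
    int count = 0;

    assert(root != NULL && module_name != NULL);
    snprintf(name, sizeof(name), "%s", module_name);

    while (count < max_out && name[0] != '\0') {
        char *slash;

        snprintf(out[count], MAX_PATH_LEN, "%s/%s", root, name);
        count++;

        /* Drop the last subpath; stop when none is left. */
        slash = strrchr(name, '/');
        if (slash == NULL)
            break;
        *slash = '\0';
    }
    return count;
}
```

Calling it once with the user add-on directory and once with the system add-on directory yields the same order as the list above. A real lookup would stop at the first candidate that names an existing file.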
Within that binary is a symbol called "modules" that contains the list of modules which exist in the binary:

    module_info *modules[] = {
        (module_info *) &cam_for_sim_module,
        (module_info *) &cam_for_driver_module,
        NULL
    };

Should the correct module not exist in this list, the module manager continues to look until it exhausts all the possible binary names. At that point it must report failure. The module_info structures referred to here include their full name (e.g., "bus_managers/scsi/driver/v1"), which is used to determine which module is desired.

This feature allows a single module binary to support two different interfaces (like the SCSI bus manager's driver and bus interfaces) or different versions of the same interface. What if we wanted to provide a new version of the driver interface ("bus_managers/scsi/driver/v2")? We could include it in the modules list alongside the old interface (kept to provide backward compatibility). Old drivers would get the old version, new drivers would get the new one. Everyone would be happy.

More About Finding Modules

get_module() is well and good if you happen to know the name of the module you're looking for (which is the case for a driver hunting for the SCSI or PCI bus manager). What can you do if you don't know the name? The SCSI bus manager needs to try to load all available SCSI busses, but looking for them can be tricky. Luckily the kernel provides some handy tools:

    /* somewhere in the bowels of scsi_cam.c */
    void *ml;
    size_t sz;
    char name[B_PATH_NAME_LENGTH];

    ml = open_module_list("busses/scsi/");
    /* reset sz to the buffer size before each read */
    while((sz = B_PATH_NAME_LENGTH) &&
          (read_next_module_name(ml,name,&sz) == B_OK)){
        cam_load_module(name);
    }
    close_module_list(ml);

This snippet of code allows the SCSI bus manager to iterate over all available modules that have names starting with "busses/scsi". The function cam_load_module() actually does a get_module() on the current module and sees if it registers itself with the SCSI bus manager or not.
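To make the name matching inside a binary's "modules" list concrete, here is a small stand-alone sketch. It assumes nothing about the real module_info beyond the fact that it carries the module's full name; sketch_module_info, find_module() and the v2 entry are all invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's module_info: only the name matters
 * for this lookup. The real structure has more members. */
typedef struct {
    const char *name;
} sketch_module_info;

static sketch_module_info sim_v1    = { "bus_managers/scsi/sim/v1" };
static sketch_module_info driver_v1 = { "bus_managers/scsi/driver/v1" };
static sketch_module_info driver_v2 = { "bus_managers/scsi/driver/v2" }; /* hypothetical */

/* NULL-terminated, like the "modules" symbol in a module binary. */
static sketch_module_info *sketch_modules[] = {
    &sim_v1, &driver_v1, &driver_v2, NULL
};

/* Return the entry whose full name matches, or NULL if the binary does
 * not export it (the search would then move on to the next binary). */
static sketch_module_info *
find_module(sketch_module_info **list, const char *wanted)
{
    assert(list != NULL && wanted != NULL);
    for (; *list != NULL; list++) {
        if (strcmp((*list)->name, wanted) == 0)
            return *list;
    }
    return NULL;
}
```

Because the match is on the full name, an old driver asking for ".../driver/v1" and a new one asking for ".../driver/v2" can both be served from the same list.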
** BeOS CAM Non-compliance (for nit-pickers)

When I said that the SCSI bus manager adhered to CAM, I lied. Our implementation diverges from the spec in several important ways:
Erratum for Be Engineering Insights: Device Drivers
By Rico Tudor rico@be.com
The close function in the sample driver that appeared in my article last week contains an error. In addition to providing a correction, this erratum includes a full explanation of the device open/close protocol.

When user code invokes system call open(), a new "session" begins with a call to device open. This results in the creation of a cookie (if you desire), and a file descriptor with reference count 1. A subsequent call to system call dup() increments that ref count. Creating a new team with system call load_image() or fork() duplicates all the parent's file descriptors for use by the child, raising the ref counts accordingly.

The ref count of a file descriptor is decremented on the explicit call of system call close(), or the implicit system call close() when a team exits. A system call close() of a file descriptor with ref count 1 will trigger the device close.

For a team with two or more threads, that final system call close() can create a subtle situation: one thread is blocked or running in one of the device read/write/ioctl functions, when the other thread calls device close. The sole purpose of device close is to encourage blocked calls -- of the same session -- to unblock and go home.

Once the driver is free of all other callers for a given session, driver shutdown can proceed to the final stage: disabling the hardware, removing interrupt handlers, and freeing kernel resources. Your driver performs these chores in the device free function. The kernel guarantees that the call is single-threaded with respect to this session. Of course, you must remain vigilant for the threads of other sessions, and protect shared data appropriately: an example is "ocsem" protecting "nopen".

The original device write function had a race condition between acquire_sem_etc() and has_signals_pending(), causing "wbsem" to miscount. has_signals_pending() is a function internal to the kernel, hence it is not documented and it should not be used.
Instead, the revised code uses the return status of acquire_sem_etc(): now the semaphore is unchanged should an error occur.

Not shown here is a mechanism to unblock I/O threads: you'll want one for drivers with slow, blocking I/O. Even fast devices would benefit if you expect lost interrupts or other erroneous hardware behavior.

    static status_t
    qq_close( void *v)
    {
        /* nothing to unblock in this sample; shutdown happens in qq_free() */
        return (B_OK);
    }

    static status_t
    qq_free( void *v)
    {
        struct client *c = v;
        struct device *d = c->d;

        /* ocsem protects nopen against other sessions */
        acquire_sem( d->ocsem);
        if (--d->nopen == 0) {
            /* last session gone: quiet the hardware, drop the handler */
            (*isa->write_io_8)( d->ioport+MCR, 0);
            (*isa->write_io_8)( d->ioport+IER, 0);
            remove_io_interrupt_handler( d->irq, qq_int, d);
        }
        release_sem( d->ocsem);
        free( v);
        return (B_OK);
    }

    static status_t
    qq_write( void *v, off_t o, const void *buf, size_t *nbyte)
    {
        cpu_status cs;
        struct client *c = v;
        struct device *d = c->d;
        uint n = 0;

        while (n < *nbyte) {
            status_t s = acquire_sem_etc( d->wbsem, 1, B_CAN_INTERRUPT, 0);
            if (s < B_OK) {
                /* wbsem was not acquired: report progress and bail out */
                *nbyte = n;
                return (s);
            }
            d->wcur = 0;
            d->wmax = min( *nbyte-n, sizeof( d->wbuf));
            memcpy( d->wbuf, (uchar *)buf+n, d->wmax);
            (*isa->write_io_8)( d->ioport+IER, IER_THRE);
            acquire_sem( d->wfsem);
            n += d->wmax;
            release_sem( d->wbsem);
        }
        return (B_OK);
    }
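The open/close/free ordering described in this erratum can be modeled with a few counters. The following is a toy user-space model, not kernel or driver code: sketch_session and every function name in it are invented for the illustration, and it assumes one session with no real threads.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one session. */
typedef struct {
    int  ref_count;     /* file descriptor references (open, dup, fork) */
    int  in_flight;     /* callers inside read/write/ioctl */
    bool close_called;  /* device close has run */
    bool free_called;   /* device free has run */
} sketch_session;

/* Device free runs once, after device close, when no caller remains. */
static void
maybe_free(sketch_session *s)
{
    if (s->close_called && s->in_flight == 0 && !s->free_called)
        s->free_called = true;
}

static void
session_open(sketch_session *s)
{
    *s = (sketch_session){ 1, 0, false, false };
}

static void
session_dup(sketch_session *s)
{
    assert(s->ref_count > 0);
    s->ref_count++;
}

static void
session_close_fd(sketch_session *s)
{
    assert(s->ref_count > 0);
    if (--s->ref_count == 0) {
        /* Final close(): device close asks blocked calls to go home. */
        s->close_called = true;
        maybe_free(s);
    }
}

static void
io_enter(sketch_session *s)
{
    assert(s->ref_count > 0 && !s->close_called);
    s->in_flight++;
}

static void
io_leave(sketch_session *s)
{
    assert(s->in_flight > 0);
    s->in_flight--;
    maybe_free(s);
}
```

With one open(), one dup() and a write in progress, the first close() changes nothing, the second triggers device close, and device free waits until the writer leaves.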
DEVELOPERS' WORKSHOP
"Developers' Workshop" is a weekly feature that provides
answers to our developers' questions, or topic requests.
To submit a question, visit
http://www.be.com/developers/suggestion_box.html.
In my last column, I provided a simple function called
CopyFile(). Unsurprisingly, this function copies a file
under the BeOS, including not only the "ordinary" data of
the file but also any attributes that the file may include.
This week, I'll extend the function to attempt to discern
whether there is enough space on the destination volume for
the file before actually performing the copy. The source
code for this extended version of CopyFile() is available on
the Be FTP site at this URL:
<ftp://ftp.be.com/pub/samples/storage_kit/CopyFile.zip>
Let's look at the new function prototype first:
There are two new arguments since last time: "preflight" and
"createIndices". The first of these specifies whether or not
to analyze the source file to determine whether it will fit
on the destination volume; the second indicates whether the
copy routine should ensure that file attributes which are
indexed on the source volume are also indexed on the
destination volume.
I'll discuss indices a little later; first let's look at
preflighting. In CopyFile.cpp there's a function called
preflight_file_size() which estimates the storage required
for a given file. I'll walk through its implementation
briefly, starting with its prototype:
First off, note that two of the arguments are fs_info
structures. These structures, obtained through the Storage
Kit's fs_stat_dev() function, describe whole file systems.
Not all BFS volumes are the same; in particular, BFS
supports a variety of fundamental block sizes. Storage on a
disk isn't continuous; it's divided into discrete units
called "blocks." Everything on a disk -- file data,
attribute data, and file system control structures --
occupies an integral number of blocks. A given file's data
will therefore consume more or fewer disk blocks depending
on the destination volume's block size. For this
reason, the preflight operation calculates the file's
storage requirements in terms of blocks, not bytes.
Calculating the number of blocks consumed for the file's
ordinary data is easy:
BFile::GetStat() returns a variety of useful information
about a file; in this case we only care about its data size.
We then count how many blocks the data will require,
rounding up. The extra 1 is because there is a one-block
file system control structure called an "inode" for every
file. The inode is where the file system stores various
information about the file such as when it was last
modified, what user "owns" it, whether it's write-protected,
etc.
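As a rough illustration of the arithmetic just described (this is
not the code from CopyFile.cpp), the block count for a file's
ordinary data can be written as a single expression. The function
name data_blocks_needed() is made up for this sketch, and plain C
integers stand in for the Storage Kit types.

```c
#include <assert.h>
#include <stdint.h>

/* Blocks needed for a file's ordinary data on a volume with the given
 * block size: the data rounded up to whole blocks, plus one block for
 * the file's inode. */
static int64_t
data_blocks_needed(int64_t data_size, int64_t block_size)
{
    assert(data_size >= 0 && block_size > 0);
    return (data_size + block_size - 1) / block_size + 1;
}
```

For example, a 5000-byte file needs three data blocks plus the
inode on a volume with 2048-byte blocks, but five data blocks plus
the inode on a volume with 1024-byte blocks.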
In the case of a destination file system that does not
support attributes, we're done counting blocks. However, the
usual case of copying files onto BFS volumes is more
interesting; attribute storage under BFS is complex.
Proceeding through the code we see that
preflight_file_size() checks whether the destination volume
supports attributes by verifying that the B_FS_HAS_ATTR flag
is set in its fs_info structure, then enters the following
loop:
The principle of iterating over the source file's attributes
is one we saw last time, but the rest of the code deserves
some explanation. BFS uses two different schemes for storing
attributes: a "fast" scheme in which attributes are actually
stored in leftover space within the file's main inode, and a
slower scheme once that space fills up. The above code
calculates how much fast-area storage would be required to
hold the attribute, then checks to see whether there's
enough fast-area space available. If so, the attribute's
storage requirements are added to the fast-area tally,
otherwise a separate attribute inode and attribute data
blocks will be used for it. In that case, the storage
requirement calculation is the same as for the file's data
portion.
The nine-byte FAST_ATTR_OVERHEAD constant is calculated
based on the scheme that BFS uses for storing attributes in
the fast area. The overhead is 4 bytes for the attribute's
type, 2 bytes for a name-length indicator, 2 bytes for the
attribute's data length, plus 1 byte for a trailing NUL at
the end of the attribute's name. Similarly, the current size
of the BFS inode structure is 232 bytes; whatever is left
over in the inode block is available for fast attribute
storage.
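The fast-area bookkeeping can be sketched as follows. This is a
stand-alone approximation of the logic described above, not the
loop from CopyFile.cpp: sketch_attr, SKETCH_INODE_SIZE and
attr_blocks_needed() are invented names, and a plain array
replaces the attribute iteration done through the Storage Kit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FAST_ATTR_OVERHEAD 9    /* 4 type + 2 name length + 2 data length + 1 NUL */
#define SKETCH_INODE_SIZE  232  /* inode structure size quoted in the text */

/* Hypothetical stand-in for what the attribute iteration reports. */
typedef struct {
    const char *name;
    int64_t     size;   /* attribute data size in bytes */
} sketch_attr;

/* Returns the extra blocks needed for attributes that do not fit in the
 * fast area left over in the file's inode block. */
static int64_t
attr_blocks_needed(const sketch_attr *attrs, int count, int64_t block_size)
{
    int64_t fast_left;
    int64_t blocks = 0;
    int i;

    assert(attrs != NULL || count == 0);
    assert(block_size > SKETCH_INODE_SIZE);
    fast_left = block_size - SKETCH_INODE_SIZE;

    for (i = 0; i < count; i++) {
        int64_t fast_need = FAST_ATTR_OVERHEAD
            + (int64_t) strlen(attrs[i].name) + attrs[i].size;

        if (fast_need <= fast_left) {
            /* Fits in the inode's leftover space: no extra blocks. */
            fast_left -= fast_need;
        } else {
            /* Needs its own attribute inode plus data blocks. */
            blocks += (attrs[i].size + block_size - 1) / block_size + 1;
        }
    }
    return blocks;
}
```

On a volume with 1024-byte blocks this leaves 792 bytes of fast
area, so a handful of small attributes cost nothing extra, while
one 2000-byte attribute costs three blocks by itself.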
There is one more feature of the Be file system that
complicates storage requirement estimation, and that's the
concept of indexed attributes. BFS can be instructed to
maintain an index of all files that contain a particular
named attribute; this index then allows the file system to
search for files whose attributes match various criteria by
scanning the indices rather than having to scan all files.
This is the secret of the Tracker's blazingly fast "Find..."
capability. The cost of this feature is disk space: the file
system has to duplicate the attribute data within the index
structures on disk.
To estimate the amount of extra storage needed for indexed
attributes, the preflight routine keeps track of the
cumulative size of all indexed attributes within the
attribute-scanning loop:
then, after all attributes have been examined:
Indexed attribute data is duplicated within the volume's
indices, up to a maximum of 256 bytes per attribute. We keep
track of the total amount of data that will be added to
indices by the copy operation, then estimate its storage
cost as twice the number of blocks necessary to hold the
data contiguously, plus a small amount to account for
indexing of non-attribute data such as file names and
modification times. The exact number of blocks required
cannot be determined accurately because it depends on the
state of the indices prior to the copy operation. Double the
minimum is an educated guess based on the observation that,
on average, the blocks within a given index structure under
BFS tend to be about half-full.
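The index estimate can be sketched in the same way. The text does
not say how large the "small amount" for non-attribute indexing
is, so this sketch takes it as a parameter (slack_blocks) rather
than guessing; the function names are also made up for the
example.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_INDEXED_BYTES 256   /* per-attribute cap quoted in the text */

/* Bytes that one indexed attribute contributes to the volume's indices. */
static int64_t
indexed_bytes(int64_t attr_size)
{
    assert(attr_size >= 0);
    return attr_size < MAX_INDEXED_BYTES ? attr_size : MAX_INDEXED_BYTES;
}

/* Twice the blocks needed to hold the indexed data contiguously, plus a
 * caller-chosen allowance for name and modification-time indexing. */
static int64_t
index_blocks_estimate(int64_t total_indexed, int64_t block_size,
                      int64_t slack_blocks)
{
    assert(total_indexed >= 0 && block_size > 0 && slack_blocks >= 0);
    return 2 * ((total_indexed + block_size - 1) / block_size) + slack_blocks;
}
```

A caller would add indexed_bytes() for each indexed attribute
while scanning, then call index_blocks_estimate() once with the
running total.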
It bears repeating that these are *estimates*, not precise
calculations. This code may overestimate by a few blocks the
amount of storage that will actually be consumed by the copy
operation. That's as close an estimate as possible, since
certain aspects of the file system -- particularly directory
and index management -- are somewhat nondeterministic.
Because conservative estimates are safe, this is an
acceptable inaccuracy for our purposes.
Now that I've shown you what's involved in predetermining
the amount of storage required for a file, a word of caution
is in order. Especially for files with large numbers of
attributes, this preflighting function will be SLOW. One of
the slowest operations in any file system is the "stat"
function, examining a file's vital statistics. Under BFS,
because attributes are practically files in themselves,
getting attribute information via BNode::GetAttrInfo() is
just as bad as BFile::GetStat() for performance. Note that
the Tracker itself doesn't use this intricate procedure to
preflight disk requirements for a copy operation; it uses a
simpler, more conservative heuristic instead -- one which
doesn't involve getting info on all attributes. But if you
know that you're copying files with large amounts of
attribute data, or large numbers of files (such as an
installer program might do), the more accurate estimation
that I've presented here might be worth the trouble.
Copyright © 1999 by Be, Inc. All rights reserved.