Be Newsletter, Volume IV, Issue 5

Volume IV, Issue 5; February 2, 2000

Table of Contents

BE ENGINEERING INSIGHTS: The BeOS Networking Environment, By Howard Berkey
DEVELOPERS' WORKSHOP: Optimize Anytime with cputime, By Marco Nelissen
From the Mailbox, By Jean-Louis Gassée

BE ENGINEERING INSIGHTS: The BeOS Networking Environment
By Howard Berkey howard@be.com

There have been a fair number of questions recently in BeDevTalk and BeUserTalk regarding the networking rewrite. I thought I'd use my shift in the newsletter sweatshop to describe the new architecture and give some details on its status.

This article roughly describes the stack internals and gives developers who might wish to create new protocols or NIC drivers some tasty tidbits of info. For sane, normal developers who have no desire to work on networking internals, be assured that the sockets API will be just as it is on your favorite BSD clone, and skip down to the $ales Pitch.

Overview

BeOS networking is being completely replaced by a new architecture, called the BeOS Networking Environment, or BONE. None of the existing R4.x networking will survive the change; it's either being ported over to the new architecture (in the case of drivers), or discarded completely (in the case of the net server, net kit, netdev kit/net server add-on architecture, PPP, Netscript, etc.). The new architecture focuses on performance, scalability, maintainability, and extensibility, in no specific order. It is simpler than the current net_server, yet far more flexible.

The BONE architecture is a modular design that allows for easy removal or replacement of any of its individual parts, by users or by Be. In this regard, BONE is an API spec for a networking architecture, and a description of how those modules interoperate. The implementation Be will ship can have parts replaced by users at will if they so desire, provided that they adhere to the specification.

Obligatory ASCII Diagram

                          _______________
                         |               |
                         | libsocket.so  |
                         |_______________|
user land                        |
- - - - - - - - - - - - - - - - -+- - - - - - - - - - - - -
kernel land                      |
                         ________|________
                        |                 |
    ______________      |  net api driver |
   |              |     |_________________|
   |   bone_util  |              |
   |______________|      ________|________
                        |                 |
transport layer         | protocol module | (e.g., udp)
                        |_________________|
                                 |
                         ________|________
                        |                 |
network layer           | protocol module | (e.g., ipv4)
                        |_________________|
                                 |
                         ________|________
                        |                 | (contains
data link layer         | datalink module |  routing, ARP,
                        |_________________|  etc.)
                           /           \
             _____________/___         _\_______________
            |     loopback    |       |     802.3       |
            | framing module  |       |  framing module |
            |_________________|       |_________________|
                     |                         |
physical layer       |                         |
             ________|________         ________|________
            |                 |       |                 |
            | loopback driver |       | ethernet driver |
            |_________________|       |_________________|

As you can see from the diagram, BONE consists of a library in user space which is linked with user programs, and a driver and several modules in kernel space which implement the networking services. There are several networking protocol and framing modules, as seen in the diagram, which are structures that extend the module_info structure to provide the standard API used by each module type. Put another way, each module in the above diagram is a concrete instance of an abstract C class representing each networking module type.
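The "abstract C class" idea can be sketched roughly as follows. This is only an illustration, not the BONE headers: demo_module_info is a stand-in that models nothing but a name field, and every demo_ identifier is hypothetical. The point is the layout: the base structure comes first, so a pointer to the base can be converted to the extended module type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel's module_info; only a name is modeled here. */
typedef struct demo_module_info {
    const char *name;
} demo_module_info;

/* Hypothetical protocol module type: base struct first, then the API. */
typedef struct demo_proto_module {
    demo_module_info base;           /* must be the first member */
    size_t (*header_size)(void);     /* one per-protocol operation */
} demo_proto_module;

static size_t demo_udp_header_size(void)  { return 8; }
static size_t demo_ipv4_header_size(void) { return 20; }

static demo_proto_module demo_udp  = { { "demo/proto/udp" },  demo_udp_header_size };
static demo_proto_module demo_ipv4 = { { "demo/proto/ipv4" }, demo_ipv4_header_size };

/* The registry only knows about the base type. */
static demo_module_info *demo_registry[] = { &demo_udp.base, &demo_ipv4.base };

/* Look a module up by name; returns NULL if it is not registered. */
demo_module_info *demo_get_module(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof demo_registry / sizeof demo_registry[0]; i++)
        if (strcmp(demo_registry[i]->name, name) == 0)
            return demo_registry[i];
    return NULL;
}

/* Downcast is valid because base is the first member of the module type. */
size_t demo_proto_header_size(const char *name)
{
    demo_module_info *m = demo_get_module(name);
    assert(m != NULL);
    return ((demo_proto_module *)m)->header_size();
}
```

A caller that asks for "demo/proto/udp" gets the udp instance's operations without knowing which concrete module sits behind the name.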

Let's look at each driver and module type in the architecture.

* libsocket/Net API driver/kernel sockets module

All networking functionality visible to user programs is provided by libsocket, which is a very thin library that opens a driver (which provides the socket "file descriptor") and communicates with it via ioctls to provide the networking API. The net API driver instantiates the internal data structures associated with a socket (the bone_endpoint_t), sets up the protocol stack for each socket, and handles all communication between the socket and the stack.

Other networking APIs besides the BSD sockets interface could be implemented to talk to the net API driver using the same ioctls that libsocket does.

BONE also provides a libnet emulation library which allows programs linked against the R4.x libnet.so to continue to function, ensuring binary compatibility.

And finally, for the truly ambitious amongst you who are developing networked file systems and such, you'll be happy to hear that there is a kernel module interface to the sockets API so you'll be able to use networking from kernel land.
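From user land, the practical consequence of the socket being a file descriptor can be sketched with ordinary BSD-style calls. The example below is written against the POSIX sockets API as found on a BSD clone; it assumes BONE behaves the same way, and the function name demo_udp_echo_self is made up for illustration. It sends a UDP datagram to itself over loopback and reads it back with plain read().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send msg to ourselves over UDP loopback; return bytes read back, or -1. */
ssize_t demo_udp_echo_self(const char *msg, char *out, size_t out_size)
{
    struct sockaddr_in addr;
    socklen_t addr_len = sizeof addr;
    ssize_t got = -1;
    int fd;

    assert(msg != NULL && out != NULL);
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* let the stack pick a port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        getsockname(fd, (struct sockaddr *)&addr, &addr_len) == 0 &&
        sendto(fd, msg, strlen(msg), 0,
               (struct sockaddr *)&addr, sizeof addr) == (ssize_t)strlen(msg)) {
        /* Plain read() works because the socket is a file descriptor. */
        got = read(fd, out, out_size);
    }

    close(fd);                              /* ordinary close() as well */
    return got;
}
```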

* BONE utilities module

The bone_util module contains functionality that the other modules need and/or that doesn't fit elsewhere. It provides bone_data manipulation (see below), benaphores, FIFOs, masked data copying, and other "generic" utilities. All parts of the BONE system use this important module. It defines operations for several data types.

A bone_data_t is a data type that is used in BONE as a container for transient networking data. While it fulfills the same requirements as mbufs do under a BSD networking architecture, bone_data_t structures are quite different from mbufs and suffer from none of mbufs' limitations or problems.

Central to the efficiency of a networking stack is reducing the number of data copies. Unlike mbufs, bone_data_t structures are containers of lists of iovecs. A bone_data_t contains two such lists: a "freelist," which contains pointers to actual memory addresses that need to be freed, and a "datalist," which contains a virtual "view" of networking memory that can not only be very efficiently accessed, but also easily modified.

Consider the following scenario. A user calls "sendto" with a buffer containing a udp datagram that is 2000 bytes long. This results in a bone_data_t with the following layout:

bone_data_t {
    datalist: {iov_base = &buffer, iov_len = 2000}
    freelist: {&buffer, 2000}*
}

(* actually this wouldn't be here in this case since on datagram sends BONE is zero-copy and would pass the user's buffer directly to the NIC driver rather than allocating a new buffer that would later need freeing. But we'll leave it here for demonstration purposes.)

The udp layer would then add a header to the data. This is easily done by simply adding an iovec to the chain:

bone_data_t {
    datalist: {&udp_header, 8}, {&buffer, 2000}
    freelist: {&udp_header, 8}, {&buffer, 2000}
}

(Again, the udp_header would not *really* be added to the free list, since the udp layer would be using a local buffer for it that would not need freeing, but we'll use it as an example, as with the IP header below.)
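The header step can be sketched with the standard struct iovec. This is a minimal model under stated assumptions, not the bone_util API: demo_data and its functions are hypothetical, only the datalist is modeled, and the list is a small fixed array. Prepending a header shifts a few iovec entries; the 2000 payload bytes are never touched.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define DEMO_MAX_IOV 8

/* Hypothetical stand-in for the datalist half of a bone_data_t. */
typedef struct demo_data {
    struct iovec datalist[DEMO_MAX_IOV];   /* logical view of the packet */
    int          data_count;
} demo_data;

/* Start with one iovec that views the caller's buffer (no copy). */
void demo_data_init(demo_data *d, void *buf, size_t len)
{
    d->datalist[0].iov_base = buf;
    d->datalist[0].iov_len = len;
    d->data_count = 1;
}

/* Prepend a header by shifting iovec entries; payload bytes never move. */
void demo_data_prepend(demo_data *d, void *hdr, size_t len)
{
    int i;
    assert(d->data_count < DEMO_MAX_IOV);
    for (i = d->data_count; i > 0; i--)
        d->datalist[i] = d->datalist[i - 1];
    d->datalist[0].iov_base = hdr;
    d->datalist[0].iov_len = len;
    d->data_count++;
}

/* Total bytes visible through the datalist. */
size_t demo_data_length(const demo_data *d)
{
    size_t total = 0;
    int i;
    for (i = 0; i < d->data_count; i++)
        total += d->datalist[i].iov_len;
    return total;
}
```

With a 2000-byte buffer and an 8-byte header, the datalist holds two entries and reports 2008 bytes, matching the layout shown above.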

Now, suppose the interface it's being sent on has an MTU of 1500 bytes. IP would need to fragment the data and add an IP header to each frag.

On other systems (especially BSD-based systems that use mbufs), multiple copies would need to be done here. BONE simply manipulates iovecs in their lists:

bone_data_t {
    datalist: {&ip_header, 20}, {&udp_header, 8}, {&buffer, 1472},
              {&ip_header_2, 20}, {&buffer + 1472, 528}
    freelist: {&ip_header_2, 20}, {&ip_header, 20}, {&udp_header, 8},
              {&buffer, 2000}
}

By manipulating the logical view of the data rather than copying, BONE will see a big scalability and performance win when using large datagrams (such as during bulk data transfer of things like large image files).
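The fragmentation step can be sketched the same way. The function below is an illustration only: demo_fragment and DEMO_IP_HDR are hypothetical names, the IP headers are left unfilled (no offsets or flags), and each fragment's data is rounded down to a multiple of 8 bytes, as IP fragment offsets require. It walks an input iovec list and emits a new list that interleaves IP header iovecs with views into the original memory.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define DEMO_IP_HDR 20

/*
 * Build a fragmented view of the data described by in[0..in_count-1].
 * ip_hdrs supplies one DEMO_IP_HDR-byte header buffer per fragment.
 * Returns the number of iovecs written to out, or -1 if something
 * does not fit. No payload byte is copied.
 */
int demo_fragment(const struct iovec *in, int in_count,
                  char (*ip_hdrs)[DEMO_IP_HDR], int max_frags,
                  size_t mtu, struct iovec *out, int out_max)
{
    size_t room_per_frag, room = 0;   /* room: bytes left in this fragment */
    int frags = 0, n = 0, i;

    assert(in != NULL && out != NULL && ip_hdrs != NULL);
    if (mtu < DEMO_IP_HDR + 8)
        return -1;
    room_per_frag = ((mtu - DEMO_IP_HDR) / 8) * 8;

    for (i = 0; i < in_count; i++) {
        char *p = in[i].iov_base;
        size_t left = in[i].iov_len;

        while (left > 0) {
            size_t take;
            if (room == 0) {                  /* start a new fragment */
                if (frags == max_frags || n == out_max)
                    return -1;
                out[n].iov_base = ip_hdrs[frags++];
                out[n++].iov_len = DEMO_IP_HDR;
                room = room_per_frag;
            }
            if (n == out_max)
                return -1;
            take = left < room ? left : room;
            out[n].iov_base = p;              /* a view, not a copy */
            out[n++].iov_len = take;
            p += take;
            left -= take;
            room -= take;
        }
    }
    return n;
}
```

Fed the 8-byte udp header and the 2000-byte buffer with an MTU of 1500, this produces the five-entry datalist shown above: the first fragment is exactly 1500 bytes, and the second views the buffer at offset 1472 for 528 bytes.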

* bone_proto_info_t

All the protocols are implemented as instances of bone_proto_info_t. These are chained together as appropriate in structures called bone_proto_node_t for each networking endpoint instance when it is created. A driver_settings configuration file specifies which protocols to put in a socket's stack when the socket is created.

(If you're afraid of looking at a config file for efficiency reasons, don't be; the bone_util module contains optimized functions for reading the BONE settings. On average, opening a socket under BONE takes on the order of 300 usec (microseconds)).

When networking operations occur, the net api driver calls the appropriate function in the bone_proto_info_t module on top of its protocol stack. The protocol then performs all necessary protocol-specific operations and calls the next protocol in the chain, on down to the network layer protocol, which passes the final data on to the datalink layer.

To add a new protocol to BONE, one essentially creates a bone_proto_info "subclass" for the protocol, and adds entries for it to the BONE configuration file. It will be loaded at runtime by either the API driver (for new sockets) or the datalink layer (for inbound data).
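The chaining described in this section might look something like the sketch below. None of these declarations come from bone_proto.h: the demo_ types, the single send hook, and the use of a byte count in place of real data are all simplifying assumptions. Each protocol does its own work (here, just accounting for its header) and hands the result to the next node; the end of the chain stands in for the datalink layer.

```c
#include <assert.h>
#include <stddef.h>

typedef struct demo_proto_node demo_proto_node;

/* Hypothetical protocol "class": a name and one outbound hook. */
typedef struct demo_proto_info {
    const char *name;
    /* Process len outbound bytes, then hand them to the next layer. */
    size_t (*send)(demo_proto_node *self, size_t len);
} demo_proto_info;

/* One link in a socket's protocol stack. */
struct demo_proto_node {
    const demo_proto_info *proto;
    demo_proto_node       *next;    /* NULL: hand off to the datalink layer */
};

static size_t demo_pass_down(demo_proto_node *self, size_t len)
{
    if (self->next != NULL)
        return self->next->proto->send(self->next, len);
    return len;    /* stand-in for datalink: report the bytes it was given */
}

static size_t demo_udp_send(demo_proto_node *self, size_t len)
{
    return demo_pass_down(self, len + 8);     /* account for the udp header */
}

static size_t demo_ipv4_send(demo_proto_node *self, size_t len)
{
    return demo_pass_down(self, len + 20);    /* account for the IP header */
}

const demo_proto_info demo_udp_proto  = { "udp",  demo_udp_send };
const demo_proto_info demo_ipv4_proto = { "ipv4", demo_ipv4_send };

/* What the API driver does in this model: call the top of the stack. */
size_t demo_endpoint_send(demo_proto_node *top, size_t payload_len)
{
    assert(top != NULL);
    return top->proto->send(top, payload_len);
}
```

In this model, adding a protocol means writing one more demo_proto_info instance and linking a node for it into the chain.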

* bone_datalink

The datalink module is the center of the BONE architecture.

The datalink module handles things like routing, ARP, interface management and link-level framing. The first thing the datalink module does is load the network interface driver modules. Each of them then scans hardware and does its hoodoo magic, and calls back into the datalink module to register an ifnet_t structure for each instance of the networking card that it finds. The modules can reregister whenever they need to, responding to things like new cardbus cards being inserted, new USB interfaces being logically added, etc.

Each time an interface is brought up (via ifconfig, etc.), the datalink module spawns off a thread which blocks on the interface module's receive method. When new data arrives on the interface, it's read by that thread, demuxed, and pushed up the appropriate protocol stack to the receive queue of the appropriate bone_endpoint_t.

The fact that each interface has its own reader thread associated with it, in addition to the fact that multiple user-level threads will be pushing data simultaneously through the system, should provide BONE with greater scalability than other systems, particularly in the area of stack latency. Multiple-interface BeOS systems perform quite well under BONE.
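The reader-thread-per-interface pattern can be sketched on a POSIX system as follows. This is not how BONE is written: it uses pthreads rather than BeOS kernel threads, a pipe stands in for the interface's receive method, and a byte counter stands in for demuxing and queuing. All demo_ names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical interface: a pipe stands in for the NIC's receive path. */
typedef struct demo_iface {
    int             rx_fd;            /* the reader thread blocks on this */
    int             tx_fd;            /* "the wire": write here to inject data */
    pthread_t       reader;
    pthread_mutex_t lock;
    size_t          bytes_received;
} demo_iface;

/* One reader per interface: block in read(), then account for the data. */
static void *demo_reader(void *arg)
{
    demo_iface *ifp = arg;
    char buf[256];
    ssize_t n;

    while ((n = read(ifp->rx_fd, buf, sizeof buf)) > 0) {
        pthread_mutex_lock(&ifp->lock);
        ifp->bytes_received += (size_t)n;   /* stand-in for demux + enqueue */
        pthread_mutex_unlock(&ifp->lock);
    }
    return NULL;                            /* EOF: the interface went down */
}

/* Bring the interface up: create the pipe and spawn its reader thread. */
int demo_iface_up(demo_iface *ifp)
{
    int fds[2];

    assert(ifp != NULL);
    if (pipe(fds) != 0)
        return -1;
    ifp->rx_fd = fds[0];
    ifp->tx_fd = fds[1];
    ifp->bytes_received = 0;
    pthread_mutex_init(&ifp->lock, NULL);
    if (pthread_create(&ifp->reader, NULL, demo_reader, ifp) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    return 0;
}

/* Bring it down: close the wire, wait for the reader, return the total. */
size_t demo_iface_down(demo_iface *ifp)
{
    close(ifp->tx_fd);                      /* reader sees EOF and exits */
    pthread_join(ifp->reader, NULL);
    close(ifp->rx_fd);
    pthread_mutex_destroy(&ifp->lock);
    return ifp->bytes_received;
}
```

Because each interface has its own blocked reader, traffic arriving on one interface never waits behind a read on another.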

Network interfaces are represented using the traditional BSD struct ifnet data structure, modified for BeOS. This structure contains much info about an interface, including the various addresses associated with it, volatile statistics, the bone_interface_info_t module to use for the interface, and the bone_frame_info_t module to use for framing the data.

* bone_frame_info_t

Since many different interfaces use the same link-level framing types, these were isolated out into modules to facilitate reuse. For example, any number of ethernet card driver modules can load the single bone_802.3 module for their framing needs.

Similarly, by decoupling framing from the rest of the link layer, a single NIC driver module can use different types of framing. For example, a HiPPI interface could be configured to use either the HiPPI physical-layer framing or its logical-layer framing. Another example would be an ethernet interface that wants to send jumbograms rather than 1500-byte ethernet frames.
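A rough sketch of this decoupling, with made-up names throughout: a framing "module" is a small table of operations, and an interface simply points at whichever table it was configured with. The ethernet-style header modeled here is the common 14-byte layout with a 2-byte type field; whether the real bone_802.3 module uses a type or a length field there is not something this article states.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical framing module: header size plus a routine to fill it in. */
typedef struct demo_frame_info {
    const char *name;
    size_t      header_len;
    void      (*build_header)(unsigned char *hdr, const unsigned char *dst,
                              const unsigned char *src, unsigned short type);
} demo_frame_info;

/* Ethernet-style header: 6-byte destination, 6-byte source, 2-byte type. */
static void demo_ether_build(unsigned char *hdr, const unsigned char *dst,
                             const unsigned char *src, unsigned short type)
{
    memcpy(hdr, dst, 6);
    memcpy(hdr + 6, src, 6);
    hdr[12] = (unsigned char)(type >> 8);     /* network byte order */
    hdr[13] = (unsigned char)(type & 0xff);
}

/* Loopback needs no link-level header at all. */
static void demo_loop_build(unsigned char *hdr, const unsigned char *dst,
                            const unsigned char *src, unsigned short type)
{
    (void)hdr; (void)dst; (void)src; (void)type;
}

const demo_frame_info demo_frame_ether    = { "demo_ether",    14, demo_ether_build };
const demo_frame_info demo_frame_loopback = { "demo_loopback", 0,  demo_loop_build };

/* An interface only holds a pointer to the framing module it was given. */
typedef struct demo_nic {
    const char            *name;
    const demo_frame_info *framing;
} demo_nic;

/* Frame an outbound packet; returns the header length written into hdr. */
size_t demo_nic_frame(const demo_nic *nic, unsigned char *hdr,
                      const unsigned char *dst, const unsigned char *src,
                      unsigned short type)
{
    assert(nic != NULL && nic->framing != NULL);
    nic->framing->build_header(hdr, dst, src, type);
    return nic->framing->header_len;
}
```

Two different card drivers can share demo_frame_ether, and switching one interface to another framing type is a matter of pointing it at a different table.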

* bone_interface_info_t

A networking-oriented interface to device drivers is added in BONE, to be used in writing NIC drivers. If desired, a traditional device driver can also export a bone_interface_info_t module interface, which makes porting existing drivers easy.

Sample Code

In the way of sample code, I have included the current snapshot of bone_proto.h and bone_interface.h, the two headers most useful to the majority of you who will be writing BONE modules. I have also included a snapshot of the bone_util.h BONE utilities header file, since the other files use it so much. Finally, I've included the source code to the BONE loopback interface module to illustrate how to write a network interface module. To get the code: <ftp://ftp.be.com/pub/samples/bone/bone.zip>.

Note that these files should be considered alpha-level software. They are likely to change in the future. The loopback module is (purposely) nonoptimized and provided as an illustration; real loopback operations are heavily optimized in BONE and bypass this module entirely.

While these files aren't everything you need to start developing for BONE, they should give you an idea of the directions you should be heading in.

Massively Cool Features (the $ales Pitch)

In addition to the traditional BeOS GUI-based tools, all of your favorite UNIX networking utilities are either already ported or will port readily. Examples include:

BIND 8.2 tools: addr, dnsquery, irpd, named-bootconf, nslookup, dig, host, mkservdb, named-xfer, nsupdate, dnskeygen, named, ndc
Configuration Tools: route, ifconfig, etc.
Utilities: telnet, ping, ftp, traceroute, tcpdump, libpcap, etc.
and many more.

Almost every feature that BeOS net developers have been asking for is there; sockets are file descriptors, the sockets API is much more compliant, raw sockets are there, it's relatively easy to add new protocols, there is a kernel networking interface, and so on.

Net performance has improved massively; there are no hard numbers (and we haven't finished optimizing) but our benchmarks are putting BONE around twenty times (2000%) the speed of the current net_server; BONE is in the same league as Linux and FreeBSD, though not fully competitive with their speed yet. Yet. :-)

Schedule

OK, I realize that the biggest question all of you are asking is "when?" In traditional Be style, I can only say "soon." The new stack is almost ready for beta. And that's all I can say for now.

DEVELOPERS' WORKSHOP: Optimize Anytime with cputime
By Marco Nelissen marcone@be.com

"Developers' Workshop" is a weekly feature that provides answers to developers' questions or topic requests. To submit a question or suggestion, visit: <http://www.be.com/developers/suggestion_box.html>

As one of the new guys at Be, I've been able to avoid the task of writing a newsletter article for a while. Today though, my fellow DTS engineers finally located my secret hideout, and promptly assigned me the task of writing this week's Engineering Insights article. So, as we say in the Netherlands: "here it is."

In this article I'll present a small application, written a long time ago when I wanted to measure cpu usage of the SoundPlay mp3 player (you may have heard of it). Specifically, I wanted to compare its cpu usage with that of the other players available at the time: I wanted to see which threads used the most cpu and when they used it. The resulting app is (not surprisingly) called cputime. You can download the source code at <ftp://ftp.be.com/pub/samples/kernel_kit/cputime.zip>.

You can start from the command line this way:

cputime application arguments

where "application" is the application you want to measure and "arguments" are the arguments (if any) you want to pass to the application. Cputime will launch the specified application with the given arguments. While the application is running, cputime will continuously monitor its cpu usage, either until the application exits or cputime's sampling buffer is full. When either of these happens, cputime will open a window and graphically display the cpu-usage over time of each thread of the application.

Because of the way cputime works, threads that are created and destroyed within one sampling interval are "lost" and don't show up in the display. If the application you want to monitor rapidly spawns new threads that run for only a very short time and then die, you may want to modify cputime to use a higher sampling rate. This is left as an exercise for the reader.

Cputime consists of two main parts: data-gathering and data-presentation. The data-gathering part uses the kernel kit function get_next_thread_info() to iterate over all the threads in the target application's team, getting a thread_info structure for each thread. Relevant data from this structure is then stored in an array for future reference. The data-presentation part reads the array and graphically displays the cpu usage of all the threads in a window. You can zoom in and out and pan the display to find the exact time slot you're interested in, and you can disable the display of threads you're not interested in. cputime has allowed me to identify the parts of my applications that used the most cpu, so I could optimize them. I hope you will find it equally useful.
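The arithmetic between the two parts can be sketched in portable C. This is not taken from the cputime source: demo_sample is a hypothetical record, and it assumes each sample stores a wall-clock timestamp plus the thread's cumulative user and kernel CPU time in microseconds, which is the kind of data a thread_info snapshot would supply on BeOS. Usage over an interval is the growth in CPU time divided by the elapsed wall-clock time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sample: cumulative CPU time of one thread at one instant. */
typedef struct demo_sample {
    long long when_usec;      /* wall-clock time of the sample */
    long long user_usec;      /* cumulative user-mode CPU time */
    long long kernel_usec;    /* cumulative kernel-mode CPU time */
} demo_sample;

/* CPU usage, as a percentage of one CPU, between two samples. */
double demo_cpu_percent(const demo_sample *prev, const demo_sample *cur)
{
    long long wall, cpu;

    assert(prev != NULL && cur != NULL);
    wall = cur->when_usec - prev->when_usec;
    cpu  = (cur->user_usec + cur->kernel_usec)
         - (prev->user_usec + prev->kernel_usec);
    if (wall <= 0)
        return 0.0;
    return 100.0 * (double)cpu / (double)wall;
}

/* Fill out[i] with the usage over samples i..i+1; returns intervals written. */
size_t demo_usage_series(const demo_sample *samples, size_t count, double *out)
{
    size_t i;

    if (count < 2)
        return 0;
    for (i = 0; i + 1 < count; i++)
        out[i] = demo_cpu_percent(&samples[i], &samples[i + 1]);
    return count - 1;
}
```

A thread that is born and dies between two samples never appears in the array at all, which is why a shorter sampling interval helps with short-lived threads.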

From the Mailbox
By Jean-Louis Gassée

Not surprisingly, last week's column brought more questions and one or two arguments to my BeMail mailbox. I appreciate these responses -- they help me understand where we need to make ourselves "even" clearer. More than one reader noted that I've become more cautious, shall we say, a little less "colorful" in my observations. It's not clear whether this is seen as regrettable or, au contraire, a welcome change.

And, yes, my column is vetted by our legal counsel for statements that would in fact or appearance cause trouble with the powers that oversee the stock market. It's one thing to believe in emerging opportunities, but one's optimism can too easily be interpreted as promoting not the product, but the company's securities (notwithstanding the potential entendre-doubling the latter noun might suggest). The fact that our company's stock is publicly traded impacts what I can and cannot say.

Second, several readers proclaimed they don't care for Internet Appliances, don't need them, and feel we're misguided for getting into them. This suggests that we haven't made a good enough case yet for this emerging genre of connected devices -- for their benefits in general and for the role our technology can play in this emerging area. Readers object that "my PC does everything I need, so I have no use for any of these Internet devices." But this is not a PC versus IA (Internet Appliance) debate. It's PC and IA, not PC versus IA.

In theory, a PC is capable of infinite mutability through software and hardware add-ons that can simulate any experience. I'm exaggerating, of course, but the idea that the PC is a simulation engine is fundamentally correct. So is the idea that dedicated, specialized devices always appear in a prosperous ecological niche and coexist with the general purpose ones.

Let's put Swiss Army knives and screwdrivers aside and look at automobiles. The last 20 years have brought us specialized vehicles such as minivans and SUVs. The same kind of specialization is appearing in the digital realm, with telephones, Palm VIIs, TVs, stereos, alarm systems, WebPads, Tivo video recorders, pagers, Web Minitels, and similar information appliances and, yes, PCs. And for everything that we can see or imagine now, there are many other kinds of devices our inevitably derivative thinking prevents us from intuiting in the present moment.

As for myself, whether or not I can make good guesses as to which Internet Appliances will survive beyond the concept stage, I know we're onto something. Go buy a Tivo hard disk video recorder. Today, it connects to the Net through a slow telephone line. Watch your children use it and quickly forget that there was a time when Tivo didn't exist. Now let's muster our available derivative thinking and "picture" what will happen when the Net connection is always open, say via cable modem or DSL. We'll be able to program the recorder from anywhere in the house, or in the world, through a browser, with no need to download the TV schedule at night as the Tivo box does today.

No, it's not video on demand -- that will come later with a beefier Net infrastructure. And no, we don't really need these appliances. Nor do we really need PCs, or cars, or chocolate. Personally, I'd rather give up PCs than books. But just as we don't seem to want to live without personal transportation, there's a good chance that we'll find these appliances exciting or liberating enough to want them, regardless of what we call need.

Next week, I'll describe how one of us uses PCs, appliances, and wireless technology for home entertainment.
