Approaching the File System I

Proceedings of the May Be Developer Conference

DOMINIC GIAMPAOLO: I'm Dominic. That's my name. I'll go first and then Cyril will talk. I'm going to talk about the actual file system implementation. Not so much the details of how you can use it, which he wrote and which allows you to plug in file systems and things of that sort. My slides aren't exactly great, but...

What the heck is going on? Okay.

Okay. So back on line. What I was going to talk about, this is just a brief overview of the overview, what the file system is, what are attributes and queries, what is journaling and how do you get performance and then a quick summary of everything.

The Be File System is a 64-bit file system. That's not just the size of the whole file system, but 64-bit files as well. If you have a 9-gig drive, you can create a 9-gig file. I hear there are 22-gig drives available. You can create a 22-gig file if you want. So 32 bits is just not enough. Two or three minutes of uncompressed video is greater than 4-gig, so clearly you need larger files, as well as large file systems.
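
A rough check of the "two or three minutes" figure can be sketched as below. The video format (640x480, 3 bytes per pixel, 30 frames per second) is an assumption chosen for illustration; the talk does not say which format was meant.

```python
# Assumed format, not from the talk: 640x480 pixels, 3 bytes per pixel, 30 frames/s.
WIDTH, HEIGHT, BYTES_PER_PIXEL, FPS = 640, 480, 3, 30

bytes_per_second = WIDTH * HEIGHT * BYTES_PER_PIXEL * FPS
limit_32bit = 2 ** 32   # largest file size a 32-bit byte offset can describe (4 GiB)


def minutes_of_video(limit_bytes):
    """How many minutes of the assumed video stream fit in limit_bytes."""
    return limit_bytes / bytes_per_second / 60


print(f"{minutes_of_video(limit_32bit):.1f} minutes fit under the 32-bit limit")
```

Under these assumptions the 4-gig limit is reached in well under three minutes, which is in line with the estimate given in the talk.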

Just the other day I had about 17-gig hooked up to a BeBox. I mean it was just three or four drives actually, a 9-gig, an 8-gig, a 4-gig and 2.9, so it's very easy to have more than 2^32 bytes. As for the future, just a quick thought: the big thing right now is uncompressed video and how long that is, and the way the file system is designed right now I think it will max out at about nine hundred hours of uncompressed digital video. So that will last us a little while.

The file system also supports extended file attributes. I'll talk a little bit more about what those are and how you can use them. Basically, it's a way to tag information or associate information with the file. You can index attributes, which is kind of cool, so you can do queries on what attributes there are.

We use B-trees for a lot of things, actually. We use them for directories, so if you have ten thousand files in a directory and you open file two, it's very efficient to do that. You don't have to do a linear scan to find out that it's not there or that it is there.
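
The difference between a linear scan and a keyed lookup can be sketched as follows. This is not a B-tree: a binary search over a sorted list stands in for it, since both find a name in a logarithmic number of steps. The file names and both helper functions are hypothetical.

```python
# 10,000 hypothetical directory entries, kept sorted by name.
names = sorted(f"file{i:05d}" for i in range(10_000))


def linear_lookup(names, target):
    """Scan every entry in turn; returns (found, entries examined)."""
    for examined, name in enumerate(names, start=1):
        if name == target:
            return True, examined
    return False, len(names)


def sorted_lookup(names, target):
    """Binary search over sorted names; returns (found, probes made)."""
    lo, hi, probes = 0, len(names), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if names[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(names) and names[lo] == target
    return found, probes


print(linear_lookup(names, "missing"))   # has to look at every entry
print(sorted_lookup(names, "missing"))   # needs only a handful of probes
```

The point is the same one made in the talk: proving that a name is absent costs a full pass with a linear scan, but only a few comparisons with an ordered structure.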

Indices are also stored using the same B-trees, so it's a nice use of the code. The file system is also journaled, which helps both boot-up time, as was demo'd earlier in the day, and the integrity of the file system data. A lot of effort has been spent on sustaining high-bandwidth I/O, since that's very important for the sorts of applications that we're targeting.

Oh, one thing I should have said. In terms of the high bandwidth I/O, it's been my goal to maintain at least 85 to 95 percent of the raw disk bandwidth which is actually something we've been able to achieve. One of our engineers, Guillaume, spent a lot of time tweaking the IDE driver, which now is significantly faster than the SCSI driver, which will probably change too.

So he was able to get a sustained 6 megabytes a second from a pretty average IDE drive using DMA and all of that. Running it on the raw device doesn't get much more. I think he got 6.2 out of 6.5 megabytes per second. That's writing a 28-megabyte file, so it's not just like oh, I wrote 50K, it was that fast.

Okay. Attributes. Basically an attribute is a name-value pair. The name is a short, descriptive string. The value is any one of a string, an integer, a floating point, a double or just raw data. So for example, comment equals "this is cool" is an attribute. The name is "comment," the value is "this is cool." And you can have any number of attributes that you want per file; it's not limited to, like, 64K of extended attributes, as in HPFS, the OS/2 file system. And you can have any size of data that you want. And, in fact, actually, it's an implementation detail, but the attributes are stored exactly as files are, so it's the same data structure.
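
The name-value model can be sketched in a few lines. This is an in-memory illustration only, not the Be API: the class, its method names and the type tags are all hypothetical.

```python
import struct


class FileWithAttrs:
    """Hypothetical model: file data plus any number of named, typed attributes."""

    def __init__(self, data=b""):
        self.data = data
        self.attrs = {}   # attribute name -> (type tag, raw bytes)

    def write_attr(self, name, type_tag, raw):
        self.attrs[name] = (type_tag, bytes(raw))

    def read_attr(self, name):
        return self.attrs[name]


image = FileWithAttrs(b"...image bytes...")
image.write_attr("comment", "string", "this is cool".encode("utf-8"))
image.write_attr("width", "int32", struct.pack("<i", 640))

type_tag, raw = image.read_attr("comment")
print(type_tag, raw.decode("utf-8"))
```

Note that writing attributes never touches `image.data`, which is the property the talk relies on: the metadata rides alongside the file without changing its format.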

You actually get roughly the same bandwidth reading or writing an attribute as you do with a file, and it can be 64-bit in size, should you so desire. I can't really imagine a use for a 4-gigabyte attribute, but should it really be necessary you could actually do it.

So what can you do with attributes? Like I said, storing meta information about a file; for example, if you have an image that you've captured from somewhere, you could store the image size, the X and Y coordinates, width and height, comments about it.

Most image file formats don't necessarily have support for a comment field or for the author of it, for copyright information. You can also store this in the attributes of a file, which don't clutter the actual file format.

For example, TIFF does have support for storing the author, but it's something that has to be added to the spec and not every file format has that. This is the way that we can do it.

Both Tracker and StyleEdit actually use attributes. Tracker uses them for recording the position of an icon on the screen, the window position, where your current position was in the window, et cetera.

StyleEdit, which I think you saw in the last demo or in various other demos that have been displaying code, has colored text and all of that. What's nice is that StyleEdit stores the style runs separately, as an attribute of the file, which allows you to, say, have multifont editing for source code and yet still compile it. It's not like a word processor format.

So the UTF-8 text is on one side and the attributes are stored separately. You can also build record structures around the file; for example, you can have people, you can have a phone number attribute, fax number attribute, addresses, E-mail, et cetera.

And all of this is accessible, so if one program is looking for just a phone number attribute, it doesn't have to store all the same -- or, you know, you don't have to have a phone number database and another person database for E-mail. They can all just be tagged onto the same person structure.

You can also -- this is something I really want to see done, this is more of a plug -- record file type information. Well, we do that, but also where a file came from. For example, a number of times I download something in Netscape or save some text from a URL, but later on I don't know where I got it from; I try to remember and I can't figure it out. So NetPositive is probably going to record the URL of the file that you got, so that you can ask where did this come from and it will tell you. Or if you FTP'd it, it will record that onto the file.

Okay. Indexing of attributes equals queries. Having attributes about files is kind of nice, but just by themselves they wouldn't be super useful. So you can create an index for an attribute; that is, an attribute of a given name, such as comment or keyword or the from field of an E-mail message, can be indexed and then you can do queries on that.

If you're doing a query for a specific name, such as E-mail from a certain E-mail address, it's extremely efficient; that is, lookups are instantaneous. I've built indices with 12 or 13,000 E-mail messages and it was no problem at all.

You can also do searching for substrings if you want. Of course, for the integer, floating point and double values you can do greater than or less than, or define a range of values. You have all the standard comparison operators. I plan on adding a few more, perhaps more interesting, ones and, although it's not done in the release that's on the CD, case-insensitive searching will be there, because most people want that, so that when you're not sure what the name of a file is you can simply do a case-insensitive search instead of having to know the specific case of a file name.
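
How an index turns attribute values into fast equality and range queries can be sketched like this. A sorted list searched with `bisect` stands in for the on-disk B-tree; the class, its methods and the sample files are hypothetical.

```python
import bisect


class AttrIndex:
    """Hypothetical index for one attribute name: sorted (value, filename) pairs."""

    def __init__(self):
        self.entries = []

    def add(self, value, filename):
        bisect.insort(self.entries, (value, filename))

    def equal(self, value):
        """All files whose indexed attribute equals value."""
        return self.in_range(value, value)

    def in_range(self, low, high):
        """All files with low <= attribute value <= high, in index order."""
        # ("", the empty string, sorts before any real file name.)
        start = bisect.bisect_left(self.entries, (low, ""))
        matches = []
        for value, filename in self.entries[start:]:
            if value > high:
                break          # entries are sorted, so nothing further can match
            matches.append(filename)
        return matches


size_index = AttrIndex()
for size, name in [(1200, "a.txt"), (50, "b.txt"), (700, "c.txt"), (700, "d.txt")]:
    size_index.add(size, name)

print(size_index.equal(700))
print(size_index.in_range(100, 1000))
```

Both queries jump straight to the first candidate entry and stop at the first non-match, so the cost depends on the number of results rather than on the number of files indexed.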

One thing, though, I do want to point out, is that the indexing and the attributes do not turn the Be File System into a relational database. There's no intent to try to make it into SQL or -- well, you could do that, I guess. It's not supposed to be an Oracle type of database. That's not the intent, but it does allow you to do indexing so you can do efficient lookups and iterate through all the different files that match a given query.

Journaling. Okay. Journaling is a way to preserve the integrity of the file system while not compromising performance. The idea here is that normal file systems have to do everything synchronously. If they modify more than one block as part of a transaction, such as creating a file or deleting a file, they have to be sure that the whole operation completed, and when you reboot, if the file system's not clean, you have to go through and verify all the inodes, match all the files that exist in the directories, check path name consistency and things like that. Journaling buffers the transaction in memory until it's complete, writes the completed transaction contiguously to one area of the disk and then allows the real blocks to be written asynchronously, so they can be flushed out whenever needed.

What this buys you is that you're just doing one contiguous write. Furthermore, we actually collapse multiple transactions into one, so that that way if you're just adding a whole bunch of files in one directory, all those transactions will be collapsed into one disk write which will then be flushed and that buys you a lot, because you're basically doing the transactions in memory while preserving the integrity of the disk.

User data is not logged. That's actually a bit of confusion that comes up. People say well, how can you do logging and still maintain any kind of performance, because every time I write to a file you're writing into two places, but that's actually not the way journaling is. Journaling is not for user data, it's only for file system data structures, so that if a power failure ever happens or a crash, God forbid, should occur, you actually won't corrupt the file system structure. You may not have all the files or all the data that was written just before the crash, but you will still have all of your old stuff.

Transactions happen at most once, which is kind of important. If a transaction -- if you crash before the transaction has completed, it's as though it never happened. If you crash after the transaction is completed it is replayed at boot time. That's the journaling playback, log playback. And if you crash after it's been written to the log and if some of the blocks have been played back -- I mean if some of the blocks have been written to the correct place, again, the log playback takes care of all that.

The log playback is extremely simple. It simply says this block goes here, this block goes here and it plays them back. It's about 50 lines of code, and it's also very fast because it's simply reading a disk block and writing it back to another place.
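
The transaction and playback behavior described here can be sketched as a toy model. It is an illustration under simplifying assumptions, not the BFS code: the "disk" is a dictionary, the log is a list, the asynchronous flushing of real blocks is left out, and every name is hypothetical.

```python
class JournaledDisk:
    """Toy model of metadata journaling with log playback."""

    def __init__(self):
        self.blocks = {}      # block number -> bytes at its real location
        self.log = []         # completed transactions, appended contiguously
        self.pending = None   # transaction currently being built in memory

    def begin(self):
        self.pending = {}

    def modify(self, block_no, data):
        self.pending[block_no] = data          # buffered in memory only

    def commit(self):
        self.log.append(dict(self.pending))    # one contiguous write to the log
        self.pending = None

    def crash(self):
        self.pending = None                    # memory contents are lost

    def replay(self):
        """Log playback: 'this block goes here, this block goes here'."""
        for transaction in self.log:
            for block_no, data in transaction.items():
                self.blocks[block_no] = data
        self.log.clear()                       # log space can now be reused


disk = JournaledDisk()

disk.begin()                       # a complete transaction: create a file
disk.modify(1, b"new inode")
disk.modify(2, b"directory entry")
disk.commit()

disk.begin()                       # an incomplete one, interrupted by a crash
disk.modify(3, b"half-finished update")
disk.crash()

disk.replay()                      # what would happen at the next boot
print(sorted(disk.blocks))
```

The committed transaction is applied in full and the interrupted one leaves no trace, which is the "at most once" behavior described above. Replaying a block that had already reached its real location simply writes the same bytes again, which is why playback can stay so simple.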

Okay. Performance. That's what most of you probably care about. How do you get performance? What are the things you can do to ensure good performance? As always, big long writes are good and help maintain file contiguity. The file system also does preallocation of files, so if you write 1K at a time it says well, chances are he's going to keep writing 1K at a time and it will allocate a big chunk for you, and so then you'll actually get contiguous files on a disk, which is kind of important, especially if you happen to have interleaved writes. If you have two people writing to the disk at the same time you'll still manage to get reasonable chunks of data that are contiguous on disk.

Sometimes what's a problem in UNIX and NT is that everything has to go through the buffer cache and that can actually not be what you want. If you're streaming video to disk, putting it through the buffer cache is a waste, because you copy it to memory and then you copy it to disk immediately and that's not desirable.

So in this case what we do, and there's some fine-tuning that has to go on here, but when you do a write larger than 64K, it's basically useless to put it into the buffer cache. Simply write it directly to disk. It's DMA directly from your buffer straight to disk or write it from disk straight into your buffer on a read. This is most of the time what you want. Like I said, there's a few cases where some fine-tuning is required.
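
The cache-bypass rule can be sketched as a simple size test. The 64K threshold is the figure given in the talk; the class and the two lists standing in for the buffer cache and the device are hypothetical, and the fine-tuning mentioned above is not modeled.

```python
DIRECT_IO_THRESHOLD = 64 * 1024   # from the talk: writes larger than 64K skip the cache


class Volume:
    """Hypothetical volume that routes each write to the cache or straight to disk."""

    def __init__(self):
        self.cache = []   # writes staged in the buffer cache
        self.disk = []    # writes transferred directly from the caller's buffer

    def write(self, data):
        if len(data) > DIRECT_IO_THRESHOLD:
            self.disk.append(data)     # large transfer: no extra copy through the cache
            return "direct"
        self.cache.append(data)        # small transfer: let the cache absorb it
        return "cached"


volume = Volume()
print(volume.write(b"x" * 1024))           # small write
print(volume.write(b"x" * (128 * 1024)))   # large streaming write
```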

I/Os that are multiples of the file system block size are good. You want to generally do 1K or larger I/Os if you're writing raw -- if you're writing unbuffered I/O.

If you're using the POSIX fopen API, it's handled for you. If you're using the BFile API, it's important to make sure that you write in larger chunks, because it doesn't actually do any buffering. Same thing with the standard UNIX open system call. When you do things that are smaller than a block size, of course, something has to be read and copied in and obviously it's less efficient than doing a big write.
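
The effect of buffering on the number of writes that reach the file system can be sketched as follows. Python's `io.BufferedWriter` stands in for the buffering that fopen-style I/O provides, and `CountingRaw` is a hypothetical stand-in for an unbuffered file; neither is the Be API.

```python
import io


class CountingRaw(io.RawIOBase):
    """Stand-in for an unbuffered file: counts the writes that reach it."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.total = 0

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        self.total += len(b)
        return len(b)


def write_records(stream, count=1024, size=64):
    """Write many small records, the pattern the talk warns about."""
    for _ in range(count):
        stream.write(b"x" * size)
    stream.flush()


unbuffered = CountingRaw()
write_records(unbuffered)            # every small record hits the file directly

raw = CountingRaw()
buffered = io.BufferedWriter(raw, buffer_size=64 * 1024)
write_records(buffered)              # records are gathered into large chunks first

print(unbuffered.calls, raw.calls)
```

The same bytes arrive either way, but the buffered path hands them over in a few large writes instead of a thousand small ones. With an unbuffered interface, the application has to do that gathering itself.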

I guess that's it. Next one. Quick summary: Attributes are cool. Journaling is cool. Indexing is cool. 64-Bit numbers. 64-bit is bigger than 32 and it's cool. Cool is cool. You know, we've spent a lot of time to make sure that the file system is robust. Testing has been a very important part of the development. A couple of guys are -- John and William Bull and Robert Chin are around. They have done a lot -- Baron especially, has done a lot to make sure it's something that you can depend on.

There's still issues that need to be dealt with, but for the most part it takes a fair bit to do things wrong. We haven't actually seen very many corrupted disks, where you can't actually recover data. There's enough checks in the file system that prevent bad data from being written to disk so you can be reasonably sure.

What else did I want to say? There's one thing. I guess not. Okay. I guess Cyril's up and he's going to talk about the file system independent layer.

CYRIL MEURILLON: My name is Cyril, as shown on the slide, and you can even have a picture of me. It's on the CD. If you choose print -- the printer page setup for the HP Printer, you'll see my picture.

Okay. I'm going to talk about file systems in general, so not the file system as Dominic described, but the open file system architecture that we have in DR9. So as this slide says, things have changed. Under DR8 there was no support for foreign file systems, so we got rid of that architecture and I rewrote the file system abstraction layer, which now allows the plug-in of foreign file systems.

So I have to introduce a simple notion. The file system handler is a kernel add-on, a little bit like a driver, and it is code that provides files and directories. Of course, an obvious example is the Be File System, that file system among others, but of course it's privileged. But with DR9 you will see we have HFS, for now read only, but hopefully for the final DR9 it will be read and write, and to appear later DOS and NFS and, you know, we can think of more.

Another notion: you have the notion of a volume, which everybody knows, I guess. It's a set of files and directories that is located on the volume and that is served by a particular file system handler. Everybody knows about mounting and unmounting volumes; it's no more complicated than that. I'm going to show that in action.

So I'm going to select DriveSetup here, which is a very nice little application written by Robert Polic, employed by Be. It's essentially a volume manager, so it's scanning the devices now. It shows me the available devices. I see here there is a hard disk that contains both the BFS partition, which we booted from, and also an HFS partition, which I'm going to mount now. So it's mounted, it says. And if I go to the disk -- under the Tracker I see it appears and I can browse through it.

So you see that the icon support for HFS is a bit weak now, everything appeared with a little application icon, but we'll polish that later. Similarly, if I go into the shell, for you UNIX people in here, I can go there. It's under Power HD and ls works. It's there.

And all that is accessible to programs through a single API. Whether you're dealing with a BFS volume or with an HFS volume, it's the same API, and I'm going to show that, too. It's slow because it's scanning the SCSI buses. And here you see that it's been seen by DriveSetup and I will mount it. It's mounted and on it there is a picture, one of my favorite pictures actually.

A SPEAKER: Cyril, drag into the apps. Find Rraster in the apps.

CYRIL MEURILLON: So I'm dragging the file to the app and here's a picture of Baron, our favorite QA engineer. He's not dead. So that was an illustration of how -- you know, Rraster is an application that views pictures; it just reads a file, and without any special code it could read from an HFS floppy.

Dominic talked about certain advanced features which are related, like attributes and queries. There is also strong MIME typing: each file has a MIME type, which is encoded using a string. Well, not all file systems necessarily provide those features and it is up to each file system to decide which features it will provide. For now HFS does not know about attributes and does not know about queries. You cannot do a query on an HFS volume, but we'll extend that later. The important point here is that support for those advanced features varies among file systems.
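
The idea of one API in front of several handlers, with optional features that vary per handler, can be sketched as below. This is not the real file system protocol, which had not been published at the time of the talk; the classes, the mount table and the /hfs mount point are all hypothetical.

```python
class Handler:
    """Hypothetical handler with only the basic hooks (think: read-only HFS)."""

    supports_attributes = False

    def __init__(self, files):
        self.files = files                      # relative path -> file contents

    def read(self, path):
        return self.files[path]

    def read_attr(self, path, name):
        raise NotImplementedError("this file system does not provide attributes")


class AttrHandler(Handler):
    """Hypothetical handler that also provides the optional attribute hooks."""

    supports_attributes = True

    def __init__(self, files, attrs):
        super().__init__(files)
        self.attrs = attrs                      # (relative path, name) -> value

    def read_attr(self, path, name):
        return self.attrs[(path, name)]


# Mount table: each volume is served by one handler.
mounts = {
    "/boot": AttrHandler({"Fred": b"hello"}, {("Fred", "comment"): "this is cool"}),
    "/hfs": Handler({"picture": b"raw picture data"}),
}


def resolve(path):
    """Find the volume that owns path; return (handler, path relative to it)."""
    for mount_point, handler in mounts.items():
        if path.startswith(mount_point + "/"):
            return handler, path[len(mount_point) + 1:]
    raise FileNotFoundError(path)


def read_file(path):
    """The single entry point a program uses, whatever the volume type."""
    handler, relative = resolve(path)
    return handler.read(relative)


print(read_file("/boot/Fred"), read_file("/hfs/picture"))
```

A program like the picture viewer in the demo only ever calls the equivalent of `read_file`, so it works on either volume. Code that wants attributes has to be prepared for a volume that does not offer them.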

Now I've talked about what I would call persistent file systems, file systems that deal with data on file system storage. We also have virtual file system handlers and unlike the previous type, they don't deal with real files on disk. Okay. Now you may ask me what is that.

Actually, no -- yeah, next slide. Okay. So what is that? Well, it's mostly a very powerful API available to a programmer that I can show you in action. /dev has been demo'd too, I think, a bit earlier.

In /dev you'll find all the devices available in the system, and /dev is a file system, so you can open /dev/disk/floppy. It acts like a file but it's not a file. It's not a real file in the same sense as /boot/Fred. Fred is a file in my boot volume.

So, for example, here under the shell I'm in /dev, and under disk/scsi I see the SCSI disks that are available. And if I go further, to SCSI ID zero, I see all the partitions available on that device.

So that gives you an idea of the kind of services that I can get from /dev, but we'll have more. We have also /pipe, which is a file service. With this one, for example, I can have two terminals, two shells: one is writing to a pipe, a named pipe, and the other is reading from that named pipe. So you see that when I write in this window it appears here.

So what's important here, it's not very useful under the shell, but it shows you that virtual file systems can be a powerful API. It's not intended to be used by the user, but mostly by the programmer. There is /proc too, which is not implemented in DR9 but will be later. Under /proc you'll find a list of all the running teams and threads and you can list the resources. So you see that virtual file systems are just a very powerful way of listing resources available in the system, so it's -- it's in there.
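
What makes a handler "virtual" can be sketched with a directory whose entries are computed on demand instead of being read from disk. The example lists the threads of the running Python program as a loose analogy to listing teams and threads; the class and its methods are hypothetical.

```python
import threading


class ThreadListing:
    """Hypothetical virtual directory: one entry per live thread, nothing stored on disk."""

    def listdir(self):
        # The "directory contents" are generated from live system state.
        return sorted(t.name for t in threading.enumerate())

    def read(self, name):
        # The "file contents" are generated at read time as well.
        for t in threading.enumerate():
            if t.name == name:
                return f"alive={t.is_alive()} daemon={t.daemon}\n"
        raise FileNotFoundError(name)


listing = ThreadListing()
for name in listing.listdir():
    print(name, listing.read(name), end="")
```

Because the handler answers the same list and read requests as a disk-based one, any program that can walk a directory can inspect these resources without a dedicated API.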

So let me try to sum up a little bit what the new file system abstraction layer brings. Well, now we have foreign file systems. This is new. The support for them is a bit limited now, we don't have DOS, we don't have NFS, but that will come soon.

And second, we have also virtual file systems, which is very powerful. We intend sometime later to have the sockets implemented using the virtual file systems. So there's a lot that we can think of and it's just now a matter of time to implement it.

DOMINIC GIAMPAOLO: If there's any questions I guess we'll take them now, both of us, for either layer. Trey?

A SPEAKER: Yeah, what about support for prompting for passwords and stuff, if we're going to implement NFS or, I guess, passwords or something like pent files --

DOMINIC GIAMPAOLO: Oh, password authentication?

A SPEAKER: Mostly I want to know -- certainly can't -- going to have this discussion, but certainly can't pop up a window from the kernel driver, trying to make a connection to the NT server for a volume.

Is there an API for being able to put up a password entry screen or something like that and pass it off to the disk mounter or --

CYRIL MEURILLON: The file system protocol, that's what we call the API between the kernel and the add-ons, is purely low level; there is no UI involved. Now, when it comes to higher-level issues like, you know, my file system needs extra parameters, in mount I need to know if it's read only, if there's a password, all that should be dealt with at a higher level.

DOMINIC GIAMPAOLO: A simple user interface program would pass those parameters to the file system in its mount call. There's an extra parameter that can specify arbitrary information and it would be passed that way.

A SPEAKER: Your discussion of attributes reminds me of the Macintosh resource fork in their files and that's an absolute nightmare to deal with if you ever have to exchange files with non-Macintosh file systems. What happens with the files if you try to store them on a UNIX or a --

DOMINIC GIAMPAOLO: Well, again, if the file system doesn't support storing attributes, then those operations wouldn't work. Yes, it can be actually a bit of a nightmare to transfer files and part of that is being addressed in the way of providing an archive format, which actually does support attributes.

Zip, it turns out, has some rudimentary support but has other limitations, so there's been some discussion on the BeDevTalk mailing list and Jon Watte has actually implemented some code to create a new archive format which does support attributes and is portable and works on all systems.

So yes, you can't store a Be file with attributes on an NT file system. Well, actually, it might be possible on NT because they do support some attributes. It's a matter of what features you need, and if you absolutely need attributes then you have to use a Be file system.

Portability of files. There will be a way to flatten files easily and to unflatten them as well. That's just something that it's a simple thing to do really, which is sort of related to the archiving issue.

In the back.

A SPEAKER: Do you have support for resource forks on Macintosh products?

CYRIL MEURILLON: No, we don't yet have support for that, but hopefully when HFS is more complete we'll actually be able to read the resource fork, perhaps as an attribute. We don't know yet.

DOMINIC GIAMPAOLO: The code that we're actually using is from an HFS library that's on the Net. It's simply wrapped with a driver's interface, or some glue code, to interface to our API. So we'll be making that code available. I mean it's copylefted so we have to, and that code does have support for reading the resource fork, so there will probably be a way to do that fairly soon.

Right now we've kept it read only and just the data file, because we wanted to avoid trashing anybody's hard disk, things like that.

In the back?

A SPEAKER: Couple of questions. First of all, I assume the API is already published, already available, I can buy it today; is that right?

CYRIL MEURILLON: No, actually, and it won't be until DR10. We are waiting for things to settle down. It's not easy. It's more complex than writing a driver. To give you an idea, a driver is about five or six hook functions and a file system handler is --

DOMINIC GIAMPAOLO: 67

CYRIL MEURILLON: Well, if you want to be complete, if you want to support attributes and database operations, it's close to 70. Otherwise, if you're just straight POSIX compatible, it's 30 or so, so it's not easy.

DOMINIC GIAMPAOLO: Some operations, just as a side note, can be done not as a file system but as a device. For example, Mark Elrod from PGP wanted to do an encrypted file system, which, you know, I'm not going to do myself, but he's done it simply as a device: it sits on top of a real disk device, but underneath the file system, and so he'll have an encrypted file system in a couple of days.

A SPEAKER: What permissions model do you support? I saw that you had UNIX stuff --

DOMINIC GIAMPAOLO: UID/GID.

A SPEAKER: So you don't have an ACL?

DOMINIC GIAMPAOLO: No, we had seven months to do it, the file system.

In the back?

A SPEAKER: How do you handle virtual memory? Do you have it on all of the time? Do you use file mapping?

CYRIL MEURILLON: Next question, please?

No, no, seriously. Well, generally for now it only works with the BFS file system. We'll try to clean up the interface between the VM and the file system protocol later.

DOMINIC GIAMPAOLO: VM is always on, you can't turn it off. This is a real operating system, basically. We also plan on integrating the buffer cache into the VM, which will significantly improve the speed of things, I hope, in the next release.

A SPEAKER: You have 64 bits so you could have a one-terabyte file system --

DOMINIC GIAMPAOLO: Much more than one terabyte, actually.

A SPEAKER: -- do you have any logical volume management technology so that you can span physical disks?

DOMINIC GIAMPAOLO: No, that sort of assumes RAID and things like that. No, we don't have anything for that right now. I'm very interested in it because it makes me look good when the file system is really fast. It's just a matter of time; there's nothing that prevents it. In fact, the same mechanism that Mark Elrod used to do the PGP disk could be used to do logical volume management; in fact, that is the only way you would do it.

CYRIL MEURILLON: Something related to this is that we do have partitions now, as you've seen. That's new too in DR9.

A SPEAKER: Is the journal written to the middle of disk?

DOMINIC GIAMPAOLO: The position of the journal, which can affect the performance, is currently written at the beginning of the disk, but its position is stored as part of the super block and so it could be anywhere on the disk and eventually it even could be on another disk. I'll have support for that later. I didn't have time to experiment with it.

A SPEAKER: With regards to removable drives, is it possible to mount more than one file -- let's say I have a Mac floppy and let's say a DOS floppy and I was swapping between them, what about mounting more than one file system per device; is there a way to auto sense?

DOMINIC GIAMPAOLO: No, it doesn't work like the Amiga.

A SPEAKER: Are we going to see mirroring or striped volumes any time soon?

DOMINIC GIAMPAOLO: That's the other question. Yeah, it requires a bit of support from the SCSI device model in terms of asynchronous transfers and disconnect and that, and once you get that, the first thing I want to see done is a striped volume.

A SPEAKER: What kind of optimizations are you supporting for lots of small files? You talked about the optimizations for really big files.

DOMINIC GIAMPAOLO: We've done, you know, 50, 60,000 files in one directory. I mean, you know, you have one block for the inode and one block for the data, assuming it's less than the block size of the file system, which is about 1K.

Oh, that's the one thing I wanted to mention. There's a lot of confusion about the block size of a file system and the size of the disk that it's on. In the BFile system those two are completely independent. You can have a 9-gigabyte disk with 1K blocks, 2K blocks, 4K blocks, it doesn't matter. You can have a floppy with 4K blocks if you want. There's no business that oh, my disk is bigger than 2GB so I have to have 32K clusters or something like that, so that's important to keep in mind.
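
The "32K clusters" remark can be made concrete with a little arithmetic. The sketch below assumes the speaker is alluding to file systems that address clusters with a 16-bit number, so that a bigger disk forces a bigger cluster; that reading is an assumption, and both functions are hypothetical helpers.

```python
GIB = 2 ** 30


def min_cluster_size(disk_bytes, max_clusters, smallest=512):
    """Smallest power-of-two cluster size that covers the disk with at most max_clusters."""
    size = smallest
    while size * max_clusters < disk_bytes:
        size *= 2
    return size


def block_count(disk_bytes, block_size):
    """Number of blocks when the block size is chosen freely, as described in the talk."""
    return disk_bytes // block_size


# With only 2**16 addressable clusters, a 2 GiB disk is pushed to large clusters.
print(min_cluster_size(2 * GIB, 2 ** 16))

# With block size independent of disk size, a 9 GiB disk can keep 1K blocks.
print(block_count(9 * GIB, 1024))
```

The second case only works because the file system can count far more blocks than a 16-bit number allows, which is where the 64-bit design pays off for small files on large disks.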

A SPEAKER: That's determined at initialization time?

DOMINIC GIAMPAOLO: It's determined at initialization time. It's fixed for the life of the file system.

A SPEAKER: So you have 50 or 60K files, you said, in the directory; how fast is that?

DOMINIC GIAMPAOLO: To do what?

A SPEAKER: You said you've optimized for the large files, but what about if you're dealing with a lot of small files, you know, does that slow your system down dramatically?

DOMINIC GIAMPAOLO: No. I mean when I said there was 50,000 files in the directory, if you want to do a complete listing of them, like opening a tracker window, that would take a significant amount of time, because there's a significant amount of data. Has to read at least 50,000 disk blocks. Now, there's some optimizations done with the placement of the file control block so that many of them can be read at once and you get read ahead benefits there, but as with anything, there are worst-case scenarios which can be pretty slow.

A SPEAKER: Do you support creating and accessing raw partitions?

DOMINIC GIAMPAOLO: You can access the raw device, sure. You can just cat it if you want. Seriously.

CYRIL MEURILLON: Actually, under the /dev virtual file system you can create pseudo devices that map a range of sectors of your partition.

DOMINIC GIAMPAOLO: There's always a raw device for every SCSI or IDE disk, and you can also virtualize partitions on top of that. That's actually not done in the kernel, it's done outside of the kernel. There are partition handlers that are part of -- right now we have Apple and Intel style partition maps and you can have other ones if you want as well.

Loren?

A SPEAKER: You mentioned that attributes are being used by Tracker and StyleEdit, by Be, and that's fine until the second guy decides to use an attribute called lyte or something totally different. Is there going to be any sort of registry of used attributes and what they're supposed to mean so that people won't trample all over each other?

DOMINIC GIAMPAOLO: That's a higher-level issue. No, it's a very serious issue, though. The file system doesn't care. Many things can be implemented at a higher level; in fact, some of the table stuff that exists in DR8 can be, you know, abstracted and implemented at a higher level where it's registered. Internally, names that aren't supposed to be overwritten are prepended with an underscore, just as in the C library underscore names are reserved by the library, and it would be a similar sort of thing there.

A SPEAKER: You said that attributes are stored in the same way as files. Does that mean that every attribute takes up a block, like a file?

DOMINIC GIAMPAOLO: I was sort of lying. Small attributes are actually stored -- there's a space of about 760 bytes as part of a file for a 1K-block file system, so that's a fast attribute area. So a number of small attributes can fit in there and are basically zero access time. Ones that spill out of there, yes, are stored as a file, yes. So at a minimum they would require one block for the file control block and one for the data. I'll actually reduce that back down to one for attributes that are between, say, a couple hundred bytes, they'll fit in the Inode of the attribute. It gets complicated but we'll try to minimize the impact of that.
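
The split between the fast attribute area and spilled attributes can be sketched as a simple packing rule. The 760-byte figure is the approximate one given above for a 1K-block file system; the first-fit policy, the function name and the decision to ignore any per-attribute header overhead are assumptions made for illustration.

```python
FAST_AREA_BYTES = 760   # approximate figure from the talk for a 1K-block file system


def place_attributes(attrs):
    """attrs: list of (name, size in bytes). Returns (inline names, spilled names)."""
    inline, spilled, used = [], [], 0
    for name, size in attrs:
        if used + size <= FAST_AREA_BYTES:
            inline.append(name)      # fits in the fast area next to the file's inode
            used += size
        else:
            spilled.append(name)     # stored separately, the way a file would be
    return inline, spilled


inline, spilled = place_attributes([("comment", 12), ("icon", 1024), ("width", 4)])
print(inline, spilled)
```

Small attributes such as a comment or a width land in the fast area and cost no extra disk access, while a large one such as an icon is stored on its own.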

A SPEAKER: Do you have a page size and do you break all the requests into many small requests that go into the page size?

DOMINIC GIAMPAOLO: Requests such as a disk write?

A SPEAKER: Right. Like suppose you've got an 8K page size and I want to read 800K bytes, do I see on the SCSI bus 100 --

DOMINIC GIAMPAOLO: No. In fact, if you read up to one megabyte that will be done in one chunk, you'll see a request for one megabyte.

A SPEAKER: You claim that the file system is high performance. Have you done any comparison against other high performance file systems, for example, Linux ext2?

DOMINIC GIAMPAOLO: Lunix X2 there's a number of bench marks for creating and deleting files and for bandwidth and whatnot. Well, bandwidth is difficult because it depends on the underlying device that you're using and the implementation of the SCSI buses.

What I was more concerned with was maintaining a high percentage of the vault disk band. So there's a program called I/O Zone which simply reads and then writes a whole bunch -- well, writes and then reads back a bunch of data. With that, like I said, we're able to achieve between 85 and 95 percent of the raw disk bandwidth. So as the underlying disk mechanism, for example, our SCSI device as it gets disconnects, synchronous transfers, RAIDs, striped volumes, things like that, the file system will scale with that.

As for creating files and deleting files, we actually maintain pretty respectable numbers for a journaled file system. The Linux ext2 file system, which we receive a lot of -- well, internally a number of people have beat me up about this. Linux ext2 is not safe. It does everything in memory, so depending on the size of your buffer cache, the benchmark that you're running may run entirely in memory and it will be a simple matter of how fast can you memcpy in and out of the kernel. Should you crash, however, you're hosed. So with journaling and maintaining indices, we can still see reasonable throughput. I've seen about half of ext2 performance on a reasonably configured Mac or BeBox.

A SPEAKER: How about against BSDI?

DOMINIC GIAMPAOLO: I don't know about BSDI's file system. I know we looked at NEC's stuff for a little while and we're pretty much crushing them. Same thing with SunOS. We had an UltraSPARC and the numbers I just saw the other day -- actually, I haven't tested them, because there's some machines with 64-meg of memory which I haven't played with. We were creating 400 files a second, deleting 2000 a second. For comparison, some friends of mine at SGI played with a 29-processor, 4.8-gig main memory machine. It was doing 250 creates and 300-and-some deletes.

The Solaris machine that we have, the UltraSPARC is doing about 50 and 100 across the board so...
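Files-per-second figures like the ones quoted here come from a very simple kind of test. The sketch below shows the general shape of such a measurement; it is not the benchmark used at Be, the function name is hypothetical, and the results depend heavily on caching and on whether the file system commits metadata synchronously.

```python
import os
import tempfile
import time


def create_delete_rates(directory, count):
    """Create then delete `count` empty files; return (creates/sec, deletes/sec)."""
    paths = [os.path.join(directory, f"f{i:06d}") for i in range(count)]

    start = time.perf_counter()
    for p in paths:
        with open(p, "wb"):
            pass                      # an empty file is enough to exercise creation
    create_secs = time.perf_counter() - start

    start = time.perf_counter()
    for p in paths:
        os.remove(p)
    delete_secs = time.perf_counter() - start

    return count / create_secs, count / delete_secs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        creates, deletes = create_delete_rates(d, 1000)
        print(f"{creates:.0f} creates/s, {deletes:.0f} deletes/s")
```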

Yeah?

A SPEAKER: Is there any chance of seeing attributes in the Tracker window?

DOMINIC GIAMPAOLO: Actually, we already have some, we have kind BApplication. Let's see, over here we can -- oh, that's an HFS file, can't do it. Sorry.

What's something that's more interesting. So here, kind. Here we have a whole bunch of different files. There's MIME-sniffing code, which is essentially the same, so here we have some things that are text files and things that are -- they're actually supposed to be marked as menu files, but that got a little munched just before the release.

This sniffing code looks at the contents of a file; failing that, it looks at the extension and guesses what the file type is. It's basically the same as the UNIX file command. It stores the result as a MIME type, and the browser supports displaying that.

Additionally, there's some work being done to say files of this MIME type support these other attributes, and the Tracker will have that information available to it and allow it to display it. Steve Horowitz, who's up next, will probably talk a little bit more about that.
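The two-stage guess described in this answer (contents first, extension as a fallback) can be illustrated with a short sketch. The signature and extension tables below are small, well-known examples chosen for illustration; they are not the rules the BeOS sniffer actually uses, and the function name is hypothetical.

```python
import os

# A few widely documented file signatures (illustrative, far from complete).
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]

# Fallback table keyed by lowercase extension (also illustrative).
EXTENSIONS = {
    ".txt": "text/plain",
    ".html": "text/html",
    ".jpg": "image/jpeg",
}


def sniff(name, head):
    """Guess a MIME type from the first bytes of a file, then from its name."""
    for magic, mime in MAGIC:
        if head.startswith(magic):
            return mime               # the contents win over the name
    ext = os.path.splitext(name)[1].lower()
    return EXTENSIONS.get(ext, "application/octet-stream")


if __name__ == "__main__":
    print(sniff("picture.txt", b"GIF89a\x01\x00"))
    print(sniff("notes.TXT", b"hello world"))
    print(sniff("mystery", b""))
```

In the scheme described in the talk, the resulting string would then be stored as an attribute on the file so that it does not have to be recomputed each time the file is displayed.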

Tom?

A SPEAKER: Another stupid floppy question: Is there a favorite or native Be floppy format, or are we just going to say that floppies are a thing of the past and Zip disks are the new floppy?

DOMINIC GIAMPAOLO: I want to say Zip disks, because it's too sloppy, but you can actually create a Be File System on a floppy. It's not particularly efficient, given that there's a 512K log on it. Well, I mean tar works pretty good on floppies, if you just want to store files. To be honest, hardly anyone is using floppies around Be and it hasn't seemed like that big of a deal lately, between that and the Net, so...

A SPEAKER: I have this problem. What file system type do you have to put on a floppy if you want the kernel to pick a driver off of there? With DR8 you could build a floppy and put, you know, in the system add-ons, AppServer, your new video driver and have that one take precedence over the one on the disk. In DR9 that doesn't work, or I wasn't able to make it work.

CYRIL MEURILLON: Well, the kernel is file system agnostic, so if you put DR8 -- oh, by the way, in DR9 you can still read DR8 volumes because we have a file system handler for that.

DOMINIC GIAMPAOLO: Read only.

CYRIL MEURILLON: Read only.

DOMINIC GIAMPAOLO: Got to move forward.

CYRIL MEURILLON: So that works and I don't know what kind of problem you've run into but --

A SPEAKER: In the old processor you could pick up a new driver.

CYRIL MEURILLON: We do that at Be.

DOMINIC GIAMPAOLO: He's talking about AppServer add-ons. It's probably just something that has broken in the AppServer add-ons.

A SPEAKER: That works for kernel drivers?

DOMINIC GIAMPAOLO: Yeah, and it works for kernels as well when you're developing it.

One more question. Make it good.

A SPEAKER: Okay. In one of the big --

DOMINIC GIAMPAOLO: Oh, sorry, I was actually pointing to the guy behind you.

A SPEAKER: What are your plans for CDs and DVD drives?

DOMINIC GIAMPAOLO: CD-ROMs? ISO 9660? It's just a matter of time to take source code that exists on the Net and make it available as some sample source code. ISO 9660 is pretty easy because it's read only and it doesn't have to do any of the indexing or actually be inserting that stuff. We just didn't have time for this release.

A SPEAKER: DVD drives?

DOMINIC GIAMPAOLO: Steve Sakoman will probably talk a little bit more about that. He's actually doing some stuff right now, I think, with DVD drives. Again, the architecture is there and we encourage people to work on it. We'll work with you, we'll give you sample source code, we'll help you out and we want to see this done so...

Just one other question.

A SPEAKER: One of the big problems with the Mac, with the resource fork, is that you couldn't download it from web pages and so on because the Internet formats didn't support it, but with BeOS we never had that problem because the resources are in the executable file. Did that change for DR9?

CYRIL MEURILLON: No, it has not.

DOMINIC GIAMPAOLO: Executables are still the same way.

A SPEAKER: You don't have that same kind of problem, you can still do self-extracting from it?

DOMINIC GIAMPAOLO: Four or five had extended attributes. For applications this is true. For files that have extended attributes you would have to have an archive format that supported it, which is something that we're actively working on. It's partly being developed jointly with the Be development community on BeDevTalk, with Jon Watte. And personally I want to see that pushed because it's a useful thing and it's necessary to be able to portably move files around and not lose information.

All right. Thanks a lot.

Transcription provided by:

160 West Santa Clara St.
San Jose, California
408.280.1252

Copyright ©1997 Be, Inc. Be is a registered trademark, and BeOS, BeBox, BeWare, GeekPort, the Be logo and the BeOS logo are trademarks of Be, Inc. All other trademarks mentioned are the property of their respective owners.