Unicode and International Issues

Proceedings of the May Be Developer Conference

HIROSHI LOCKHEIMER: I'd like to start this. We are going to be talking about Unicode, UTF-8 and BeOS. I'm Hiroshi, this is Pierre, he did a lot of the server work, he did the font engine, I'm on the kit level, so I do a lot of the APIs.

So first I'd like to talk about Unicode. I am going to assume that you know Unicode but I would like to just go over the basics with you.

Unicode: obviously they call it Unicode for one reason, one code for all languages. It's an international character encoding standard. This is in sharp contrast to, you know, if you are a Mac developer you know WorldScript, where you use Shift-JIS for Japanese and another encoding for each country, so you have to learn a different encoding per country, or per script, as they call it. Here with Unicode you use one encoding. It's fixed width, which means it's 16 bits, basically. Just as in the old world you assume that one byte is one character, here you assume 16 bits is one character. And it includes characters for the major scripts of the world: Japanese, obviously, Chinese, Korean, English.

There are a lot of weird characters, OCR characters, anything, I have heard that Klingon is included, though I haven't seen it in the specs.

There aren't any escape sequences or control codes, so in Unicode you are always in Unicode; you don't have to detect anything. It depends how you want to define escape sequences, there is one character they are talking about in the Unicode standard, but for all practical purposes there are no escape sequences. In JIS you have to look for certain things. Here you do not.

Next slide. Code space. As I said, it's 16 bits, which means there are more than 65,000 code positions, 65,000 unique characters you can have. Roughly 20,000 are unassigned. I don't think we, or rather the Unicode Consortium, will be using all of those up any time soon. But if we were to run out of space, you can have about one million additional characters by using two Unicode characters as one character. In other words, that's 32 bits to describe one character. That's called the Surrogate Extension Mechanism. We support that, although I haven't seen the Surrogate Extension Mechanism being used yet, obviously because part of the 65,000 code area is still unassigned.
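To make the "two Unicode characters as one character" arithmetic concrete, here is a minimal sketch of how a surrogate pair combines into a single code point. It is not code from the talk or from the BeOS kits; the function names are hypothetical, and the ranges are the ones the Unicode standard assigns to surrogates.

```cpp
#include <cassert>
#include <cstdint>

// A high (first) surrogate lies in 0xD800..0xDBFF.
inline bool IsHighSurrogate(std::uint16_t unit)
{
    return unit >= 0xD800 && unit <= 0xDBFF;
}

// A low (second) surrogate lies in 0xDC00..0xDFFF.
inline bool IsLowSurrogate(std::uint16_t unit)
{
    return unit >= 0xDC00 && unit <= 0xDFFF;
}

// Combine two 16-bit surrogates into one code point in 0x10000..0x10FFFF.
// Each half contributes 10 bits, which gives 1024 * 1024 extra characters.
inline std::uint32_t CombineSurrogates(std::uint16_t high, std::uint16_t low)
{
    assert(IsHighSurrogate(high) && IsLowSurrogate(low));
    return 0x10000u
        + ((std::uint32_t(high) - 0xD800u) << 10)
        + (std::uint32_t(low) - 0xDC00u);
}
```

The 10 bits per half are where the figure of roughly one million additional characters comes from.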

Now, I'd like to talk about UTF-8. UTF-8, that's what we do in the BeOS, this is Unicode, it's a transformation of Unicode into a multi-byte format. I will let Pierre explain what UTF-8 is.

PIERRE RAYNAUD-RICHARD: As you know, if you have been trained to use any other encoding, it's usually pretty bad: you have to test a lot of special codes and so on. And the problem often is that there are different encodings, so you have to support all the different implementations just to process text, to be able to find characters, to be able to identify all the special characters you want. By moving to Unicode we are solving that. But at the same time we introduce something that seems to be not very good: 16-bit character encoding. The problem with that is that it breaks compatibility with ASCII, so you just can't look for 0 as the terminator of a string, because 0 can be just one of the two bytes of a 16-bit character. And also the API has to be different because it's not a char* stream, it's a wchar stream, which is completely different. You get that same problem with Windows NT, where all the APIs have been duplicated: there is one for 8-bit and one for 16-bit.

The other solution is to use UTF-8, which is sort of char*, and which is a very fast translation of the 16-bit encoding into a sort of variable-length encoding. We are going from one byte to three bytes, and potentially four bytes when using surrogates.
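The one-to-four-byte translation can be sketched as follows. This is an illustration of the standard UTF-8 bit layout, not the BeOS implementation; `EncodeUtf8` is a hypothetical name, and the function assumes its input is a valid code point that is not itself a surrogate half.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode one Unicode code point as a UTF-8 byte sequence.
// Precondition (assumed): cp <= 0x10FFFF and cp is not a surrogate half.
inline std::string EncodeUtf8(std::uint32_t cp)
{
    assert(cp <= 0x10FFFFu && !(cp >= 0xD800u && cp <= 0xDFFFu));
    std::string out;
    if (cp < 0x80u) {
        // 0xxxxxxx: plain 7-bit ASCII, one byte.
        out += static_cast<char>(cp);
    } else if (cp < 0x800u) {
        // 110xxxxx 10xxxxxx: two bytes.
        out += static_cast<char>(0xC0u | (cp >> 6));
        out += static_cast<char>(0x80u | (cp & 0x3Fu));
    } else if (cp < 0x10000u) {
        // 1110xxxx 10xxxxxx 10xxxxxx: three bytes, covers the 16-bit range.
        out += static_cast<char>(0xE0u | (cp >> 12));
        out += static_cast<char>(0x80u | ((cp >> 6) & 0x3Fu));
        out += static_cast<char>(0x80u | (cp & 0x3Fu));
    } else {
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: four bytes, the surrogate range.
        out += static_cast<char>(0xF0u | (cp >> 18));
        out += static_cast<char>(0x80u | ((cp >> 12) & 0x3Fu));
        out += static_cast<char>(0x80u | ((cp >> 6) & 0x3Fu));
        out += static_cast<char>(0x80u | (cp & 0x3Fu));
    }
    return out;
}
```

For example, a Latin letter stays one byte, an accented Latin-1 letter becomes two, a Kanji becomes three, and only characters beyond the 16-bit range need four.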

So the first advantage of UTF-8 is that it is fully 7-bit ASCII compatible. So all the standard characters you are looking for, especially the terminator of a C string, are completely protected. There cannot be a fake one as part of a two-byte character, one of the extensions of Unicode. So if you don't care about implementing extended characters, there is absolutely no change compared to the old ASCII encoding. The extended characters will just be mystical meaningless values, more complicated than before, but as you don't care about processing them, it will just be the same.
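A small sketch of what that protection buys you: ordinary C string functions keep working on UTF-8 text as long as you only look for ASCII bytes. The helper name `FindAsciiByte` is hypothetical and exists only for this illustration.

```cpp
#include <cassert>
#include <cstring>

// Return the byte offset of the first occurrence of an ASCII character in a
// UTF-8 string, or -1 if it is absent. Plain strchr is safe here because
// every byte of a multi-byte UTF-8 sequence has its high bit set, so it can
// never be mistaken for an ASCII byte or for the terminating 0.
inline long FindAsciiByte(const char* utf8, char ascii)
{
    assert(utf8 != nullptr);
    assert(ascii != '\0' && static_cast<unsigned char>(ascii) < 0x80);
    const char* hit = std::strchr(utf8, ascii);
    return hit ? static_cast<long>(hit - utf8) : -1;
}
```

Note that the result, like the result of `strlen`, is counted in bytes, not in characters: two Kanji followed by a colon put the colon at offset 6.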

So to go into more detail, you can see here in the UTF-8 encoding that only this one, the 7-bit ASCII, uses 0 as the first bit of the byte, and every other possible byte, used for anything else, starts with a 1. So there is no possible confusion between 7-bit ASCII and any extended part of UTF-8. And this is also a very nice code: as you can see, any extension byte begins with 10, and the first byte of a multi-byte character begins with a run of ones giving the number of bytes, two ones, three ones or four ones. So it's very easy to go through a char* and synchronize again, to find the first byte of a character. If you don't need that, you just look for a byte smaller than 128 and it's just ASCII.
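The bit patterns described above can be checked with a few masks. The following is a minimal sketch with hypothetical helper names; it classifies bytes by their leading bits only and does not validate the sequence any further.

```cpp
#include <cassert>
#include <cstddef>

// Number of bytes announced by a first byte, or 0 if the byte is an
// extension byte (10xxxxxx) or has more than four leading ones.
inline int SequenceLength(unsigned char lead)
{
    if ((lead & 0x80) == 0x00) return 1;  // 0xxxxxxx: 7-bit ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx: two ones
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx: three ones
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx: four ones
    return 0;
}

// Extension bytes always begin with the bits 10.
inline bool IsExtensionByte(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}

// Starting from any byte offset, back up to the first byte of the character
// that contains it. This is the "synchronize again" step.
inline std::size_t SyncToCharStart(const char* text, std::size_t pos)
{
    assert(text != nullptr);
    while (pos > 0 && IsExtensionByte(static_cast<unsigned char>(text[pos])))
        --pos;
    return pos;
}
```

Because no first byte ever looks like an extension byte, resynchronizing never needs to back up more than three bytes in well-formed text.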

So, Hiroshi?

HIROSHI LOCKHEIMER: This might seem a little complicated at first, but really I have found that you don't need to worry about that. We use the same type, as Pierre mentioned: we use characters, character pointers. The one major change from DR8 to DR9 in the API, and I think really this was about the only change, was in KeyDown, which is where you receive key down messages in the view. It used to take a ulong; that doesn't really apply anymore, so we take a pointer to an array of characters, which is going to be up to four bytes, and numBytes, the ulong there, will tell you how many bytes are in that pointer.

So usually, if you are just typing Roman characters, for example, you only get one byte. You really don't have to worry about looking at these bits. In fact I converted sort of the entire BeOS source tree to the new KeyDown. Except for the text engine, obviously, where I do have to worry about these things, there was nowhere else where I cared about the other bytes.

So this is what you probably will be doing quite often, unless you are writing a text engine. Say you are just interested in the tab character. You just look at the first item, the first index, of the bytes that are passed to you; if it's a tab character then you do whatever you have to do. Of course you could do the same thing for arrow keys, up arrow, down arrow, whatever, you just add another case in here for those. If you don't want to handle it, you just pass the whole thing back to the inherited.
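The slide itself is not reproduced in the transcript, so the following is only a sketch of the pattern being described, written as a free function so that it stands alone outside the BeOS headers. The function name, the `KeyAction` enum, and the arrow key values are all assumptions made for the illustration; in a real view the last case would call the inherited KeyDown instead of returning a value.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative key codes. Tab is plain ASCII; the arrow values are
// placeholders standing in for whatever the system headers define.
const char kTabKey = 0x09;
const char kUpArrowKey = 0x1E;
const char kDownArrowKey = 0x1F;

enum class KeyAction { Indent, MoveUp, MoveDown, PassToInherited };

// Decide what to do with a key press delivered as UTF-8 bytes.
// Only the first byte is inspected; anything this code does not recognize,
// including every multi-byte character, is passed on untouched.
inline KeyAction ClassifyKey(const char* bytes, std::int32_t numBytes)
{
    assert(bytes != nullptr);
    if (numBytes < 1)
        return KeyAction::PassToInherited;

    switch (bytes[0]) {
        case kTabKey:
            return KeyAction::Indent;
        case kUpArrowKey:
            return KeyAction::MoveUp;
        case kDownArrowKey:
            return KeyAction::MoveDown;
        default:
            return KeyAction::PassToInherited;
    }
}
```

Since the first byte of a multi-byte character always has its high bit set, it can never collide with the ASCII control codes tested here, which is why the other bytes can be ignored.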

Now, you might be wondering how, for example, if you have a menu item, a lot of times you want to use the ellipsis character for Open dot dot dot. That's the ellipsis character that does those three periods. You can of course type it in as three periods, but we recommend you use the ellipsis character, as it looks nicer. In UTF-8 the ellipsis is three bytes. In InterfaceDefs.h, which is a header file on the CDs that you have, we have defined a constant here, which is the UTF-8 sequence for the ellipsis character.

So if you are going to create a menu item with an ellipsis character, let's say you have an open menu item, you just simply have to type in your string, open, and have the compiler concatenate these bytes with the open, so the C compiler concatenates two strings, this is a compiler feature, and this works.

So you just create a new BMenuItem with kOpenText, and that will be Open with an ellipsis character right after it.
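The concatenation trick looks roughly like this. The macro name `UTF8_ELLIPSIS` is a stand-in defined here so that the fragment compiles on its own; it is not necessarily the name of the constant in the system header. The three bytes are the UTF-8 encoding of U+2026, the horizontal ellipsis.

```cpp
// Stand-in for the constant described in the talk (name assumed):
// the UTF-8 byte sequence of the ellipsis character U+2026.
#define UTF8_ELLIPSIS "\xE2\x80\xA6"

// Adjacent string literals are joined by the compiler, so this is the
// single string "Open" followed by the three ellipsis bytes.
constexpr char kOpenText[] = "Open" UTF8_ELLIPSIS;

// Four ASCII bytes, three ellipsis bytes, and the terminating 0.
static_assert(sizeof(kOpenText) == 8, "unexpected label length");
```

Keeping the ellipsis in its own literal also avoids a classic pitfall: a hex escape followed directly by a letter such as `a` to `f` inside the same literal would be read as part of the escape.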

Now, Pierre will talk about something else, fonts.

PIERRE RAYNAUD-RICHARD: So, we support only fonts that can be mapped to Unicode. So the first format we support is TrueType, using the Unicode cmap 3 1, for people who know more details about font files.

And in the next release we will support Type 1 fonts. We also used to support a bitmap font file format; it was used everywhere in the interface and it was pretty bad, it was not compatible at all with Unicode. So that bitmap support is completely gone. We have been working on improving the readability of small point sizes for scalable fonts, but we will also support one bitmap format in the preview release, which will allow us, and also you later, to hand-tune specific point sizes of specific fonts.

The idea is, we want to limit problems of compatibility of metrics for printing and all that stuff. And it's pretty damn difficult if you just use basic bitmap font files, and it's especially difficult to find Unicode fonts in bitmap form. So what we are proposing for small fonts, as we have a limited number of glyphs, is that you just choose your TrueType file and generate the font in the size you want, and then you can use an editor to hand-tune that specific size in our format, which is designed to ensure compatibility of metrics when you change the size on the screen or when printing.

And basically that's all on the font side.

HIROSHI LOCKHEIMER: Okay. At the kit level there is a BFont object, this is new in DR9, that describes a font. We have three globals for that: the plain, bold and fixed. These fonts are global to your application; each application has a copy of them. Each one describes a font of a certain size and certain characteristics: shear, no shear in this case, usually, unless the user wants a sheared font as their default. We encourage you to use these instead of hard coding font names in your source. Hard coding font names is a bad thing, you don't want to do that; it locks people into certain languages. If you hard code, for example, Courier, for whatever use you need it for, Courier might not have Japanese characters in it. And if there is a Japanese user of that software and they want to use Japanese, well, it's not going to work.

So if you use be_fixed_font the user will be in control of what font they will be using and you don't have to care what the name of that font is. So it's a nice abstraction there. I think I can give you a quick demo of that.

There is a new preference app called Font Panel. These three items here, plain font, bold font, and fixed font, correspond to be_plain_font, be_bold_font and be_fixed_font. The user can set them; right now I have it set to NowGothic, which is a Japanese font. But the user can set it and these will affect how your app will run. So let's set this, well, I'll show you. And since the plain and bold fonts are NowGothic, it is using NowGothic for its window title, its menus, basically all of our interface elements except for, actually -- okay. Let's go into detail here.

Window title, they use Be bold font. So I'll quit this. This is per app, so you can't change it while your app is running. You don't have to worry about the fonts changing on you while your app is running.

Let's change this to Baskerville, for example. Apply it. And you notice that the window title is now in Baskerville, and also the menu here, for which we have another app. This kind of gets confusing here, but the menu has its own setting; people like certain menu looks which might not correspond, so we have another sort of item for menus where you can set the font. But basically the user can change the font quite easily.

Let's go back to NowGothic before I show you Japanese right here. So we are back to NowGothic, and I will show you, this app uses the standard text engine object that's called BTextView, so this is a BTextView in here right now. BTextView is UTF-8 aware, that means you can open a UTF-8 document with Kanji characters, Hebrew, whatever. Here is a Kanji document right here. As you can see, this is not English. Now, I opened it with the wrong app. Don't do that. I will open it from here. Short way of doing things.

So this is just the English version of the application, and still, since you are using BTextView, it can display non-English characters. Your app doesn't have to be, you know, a Japanized version of itself in order to take advantage of this. Now, as I showed you inadvertently here, I opened it with Sokyo Tubway, which is a localized version of style, don't ask me what Sokyo Tubway is, I'm not going to tell you.

Here you have menu items that have been localized into Japanese. If you were in the general session yesterday, you have seen this already. It works just like the English version, except it's in Japanese. All our interface elements, buttons, check boxes, radio buttons, everything, has been UTF-8, converted to UTF-8, they can handle UTF-8. I will show you some buttons here, alerts, everything is in Japanese.

Now this says cancel, so I will cancel. Trust me. Okay, now I heard someone asking a question just a while ago. How do you input Kanji? For example -- kanji is the Japanese sort of alphabet. How do you input it with a keyboard that only has a few hundred or so buttons on it? Kanji, there are about 10,000 characters normally in a Kanji font. Now you don't want a keyboard that has 10,000 buttons on it, that's generally a bad idea. So what you use is what's called an input method. Now input method, what it does is converts input from the keyboard into some other form, in this case, basically it maps it to characters that are not defined in the keyboard.

So I will show you, in the case of Japanese, you input things phonetically. I will open a new document here so you can see what's going on. Notice that the window tab here is also in Kanji. That's something I didn't show yesterday.

So in Japanese you input things phonetically. The word in Japanese for Japanese is Nihongo, you can't see this probably, that's N-i-h-o, whoops. You press -- this is implementation specific. But in the case of this input method, which I sort of whipped up over the weekend for this demo, is press the space key to convert it to Kanji. Now you can see, one, Nihongo, the sound Nihongo might map to many different representations, different glyphs. So this is Katakana. There are three alphabets in Japanese, this is Katakana, this is Hiragana, this is Kanji. They all say Nihongo, they all say the same thing. Usually you want this. You press enter and that enters it. I want that to be bigger. So it enters it over here. This is what's called bottom line input.

The more advanced input methods can actually allow you input in the document over here. Unfortunately I ran out of time this weekend to do that. But that is something definitely that we will need. And, if you use BTextView, I'm going to make BTextView not for this release probably, but I will make it input method aware, so if you use BTextView you can take advantage of this without really knowing what's going on. It will do it for you. So that's BTextView.

So you must be wondering, or you should be wondering, how I got these characters for the menus. Well, as Greg Galanos was saying, the IDE now handles Unicode, so we will make a Japanese version of Hello World. So where is it? That's not the file. Hello View. First I will show you this file, it will be easier. You can change the font; this is the Japanese font again, NowGothic, they are all NowGothic, so all of what you enter can be in Japanese. As you can see, we have Japanese comments. The kit handles UTF-8 and this is in UTF-8 right here, so you just have to say DrawString. DrawString doesn't care if it's Roman or whatever, it just cares that it's UTF-8. It just says DrawString, Hello World, and that's it. You just compile it and there you go.

Now, you notice here the window title, same thing with window titles, you just pass, again, UTF-8 string, which happens to be Japanese in this case, into the constructor of BWindow and it will do the right thing for you.

A SPEAKER: Are there input methods in the IDE?

HIROSHI LOCKHEIMER: No, input methods should be a system level thing. Currently in this release I haven't had time to make the API for that. We will be working on that, obviously. Once it's a system level thing, applications that handle Unicode will be able to handle characters through those input methods transparently. So for this demo, I guess I will reveal my secrets, what I did was -- yes, copy, paste or drag and drop, same thing. General idea, you got it.

Anything else to show? Pierre, do you think -- is that it?

A SPEAKER: Question. Wouldn't it be possible to put in an interface where the user can, if he wants to define his own IME? In other words, define his own input method if the user should want to.

HIROSHI LOCKHEIMER: Yes, we will be putting that -- I think -- are you a Macintosh developer, is that the keyboard menu you are talking about on the right top?

A SPEAKER: I'm a cross-platform developer.

HIROSHI LOCKHEIMER: Have you developed for the Mac? I don't know if I understand your question.

A SPEAKER: In other words, what you are saying, you can be building an IME in at the system level.

HIROSHI LOCKHEIMER: An IME API right.

A SPEAKER: An IME API. My question is, do I have to provide an option so that a user can specify the user's own IME? Because that would be very useful not only for Japanese, it would be useful for other scripts as well.

HIROSHI LOCKHEIMER: In other words, to be able to switch at run time, the user to be able to switch?

A SPEAKER: Yes, if there is a strange alphabet, that one little country in Africa has, this capability would allow an input method to be defined for that particular group of users.

HIROSHI LOCKHEIMER: Right, that's the plan. If you want, you can have a Japanese input method with a Pig Latin input method if you wanted at the same time, the user could select which one they wanted to --

A SPEAKER: I'm saying the user could specify the input method, not only select it, but actually define it.

HIROSHI LOCKHEIMER: Define the input method?

A SPEAKER: Yes.

HIROSHI LOCKHEIMER: I'm sorry?

A SPEAKER: The capability so a user can create a custom input method.

HIROSHI LOCKHEIMER: A lot of the time it's called programming. If someone wants to, you know, write that sort of input method that can parse a text file or whatever, or put up a dialog box, that's great. You know, that will be on top of the input method API that I provide. So sure, if someone wants to write that, if you want to write that, that would work.

A SPEAKER: I think that should be part of the operating system.

HIROSHI LOCKHEIMER: Well, I think that the system should handle the API, we take a pretty minimalist approach in our kits. I think the system should provide the necessary tools for people who want to write these things to do it themselves. So if you want to write that, that would work.

A SPEAKER: Perhaps some sort of a system preferences for input editors or something like that, if it isn't an invented one where you are simply typing in line.

HIROSHI LOCKHEIMER: Right.

A SPEAKER: You have a system font so you can have a system input method --

HIROSHI LOCKHEIMER: Right. That is something we could do, definitely.

Another question? Over there.

A SPEAKER: Yes, just a quick question. Will your text editor up here be publicly available any time soon so I can stop using my Hiragata TrueType fonts?

HIROSHI LOCKHEIMER: This text editor --

A SPEAKER: I'm trying to build a Japanese savvy application using TrueType fonts, and trying to get their ASCII codes is driving me nuts. If I could just copy and paste, it would save me a lot of trouble. You already said the IDE doesn't support any input methods.

PIERRE RAYNAUD-RICHARD: So you are saying you want this input method.

A SPEAKER: Right.

HIROSHI LOCKHEIMER: You are talking about Sokyo Tubway, in other words.

A SPEAKER: Yes.

HIROSHI LOCKHEIMER: Okay. You know, I'm not really sure. There have been public domain -- in fact, this input method I'm using here, I didn't whip up an input method, I just ported one. This is Kata. It was quite trivial to port. In fact there has been a Japanese developer who has ported Kata already, he has a bunch of apps that are implemented, I'm sure once he gets to DR9 he will port it over to DR9, and you can use his text editor. This is a quick, I'm not really comfortable with giving this out.

A SPEAKER: Fair enough.

A SPEAKER: How about support for things like Furigana and typing from top to bottom or left to right?

HIROSHI LOCKHEIMER: Support for things like Furigana?

A SPEAKER: And typing -- having your lines go vertically or right to left.

HIROSHI LOCKHEIMER: Okay, Pierre will handle half of the question. Furigana actually handles the right to left part right now.

PIERRE RAYNAUD-RICHARD: So what are we doing for that?

HIROSHI LOCKHEIMER: The BFont object.

PIERRE RAYNAUD-RICHARD: Using the BFont object you can get information on whether the font you are using wants to be drawn one way or the other. And then when calling DrawString, the drawing will be done in the right direction. But between different DrawString calls, you are responsible for moving the pen in the right direction. We just always move the pen from left to right. So that is what we will do in the API. Currently this is not implemented in DR9, but even later we don't plan to support much more than that.

HIROSHI LOCKHEIMER: For Furigana, I don't know, I'm not really sure, if that's something we want to provide, conceptually you could do it with a small, you know, imagine this as Furigana just small and draw on top of it yourself. I don't think we want to support that right now.

A SPEAKER: Text objects which embed Furigana in the background, Ruby it's called?

HIROSHI LOCKHEIMER: Right. We haven't implemented any of that.

A SPEAKER: Nobody has.

A SPEAKER: Getting back to some of the stuff you were talking about at the beginning, the actual Unicode, are there any plans for an API to take 27 Latin and ISO 8859, convert APIs where you can pass in a character string and get back a Unicode string?

HIROSHI LOCKHEIMER: The question was are we planning on providing an API that converts different encodings into UTF-8. Is that a fair rendition?

A SPEAKER: The MIME and the other charts, the definitions for the ISO 8859.

HIROSHI LOCKHEIMER: So you are saying that the encoding will be part of MIME, and by looking at that MIME the text will be converted to UTF-8. I don't know if we will go that high level. The plan is to provide an API for conversion utilities. In fact there is no API yet, but I will be working on that. There is, where is Terminal, I did put in a quick tool in bin called XTOU, which converts some encoding X, either Macintosh or Roman or one of these, into UTF-8. Obviously we will move this over into the API.
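To give an idea of what such a conversion involves, here is a minimal sketch for the simplest case, ISO Latin 1 to UTF-8. It is not the XTOU tool and not a BeOS API; `Latin1ToUtf8` is a hypothetical name. Latin 1 is easy because its 256 byte values are exactly the first 256 Unicode code points, so each byte becomes either one or two UTF-8 bytes. Encodings such as Shift-JIS or the Macintosh sets need lookup tables instead.

```cpp
#include <cassert>
#include <string>

// Convert ISO Latin 1 text to UTF-8.
// Bytes below 0x80 are ASCII and are copied unchanged; bytes from 0x80 to
// 0xFF become the two-byte form 110000xx 10xxxxxx.
inline std::string Latin1ToUtf8(const std::string& latin1)
{
    std::string out;
    out.reserve(latin1.size() * 2);
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    // The UTF-8 form is never shorter than the Latin 1 input.
    assert(out.size() >= latin1.size());
    return out;
}
```

Pure ASCII input comes out byte for byte identical, which is the compatibility property discussed earlier in the session.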

A SPEAKER: Fantastic.

A SPEAKER: Overall system nationalization, you are shipping something to Germany, you would suddenly like everything to be in German on the system. Are you looking at some sort of resources or some way where text for everything can be localized and you just plug in the appropriate menu items for that language, or does everybody have to roll their own?

PIERRE RAYNAUD-RICHARD: The question was are we looking at resources for localization.

A SPEAKER: Or something similar.

HIROSHI LOCKHEIMER: Or something similar, right. BView has a new archiving scheme to it where basically you tell it to archive itself, it archives itself into a BMessage, and BMessages can be flattened.

So we already do have the foundations for resources. That will probably be our, you know, this is still not implemented. The archiving stuff is implemented in BView, but we still haven't formally adopted that as our localization scheme or resourcing scheme. There is a pretty fair chance, since it's there already, that we will use that. We will still need to do a little, you know, preparation work for it, but definitely we need to support resources in that sense.

A SPEAKER: So do you have CID font encoded for TrueType fonts to support larger number of characters?

PIERRE RAYNAUD-RICHARD: The question was CID, in other words do we do CID?

A SPEAKER: Are you doing CID encoded fonts?

PIERRE RAYNAUD-RICHARD: I don't think so. And we are not sure if we understand the question.

A SPEAKER: It's the high character count TrueType fonts. It's basically to get a TrueType font that has more than like 100 characters, you have to either go to some specialized encoding format or you have to --

PIERRE RAYNAUD-RICHARD: The only --

A SPEAKER: It's a CID map.

PIERRE RAYNAUD-RICHARD: The only cmap we support is a Unicode cmap and nothing else, so we don't support it.

A SPEAKER: Type 1, a cmap --

A SPEAKER: I think it's basically the same thing.

A SPEAKER: They support type 1.

A SPEAKER: Close.

PIERRE RAYNAUD-RICHARD: I am not clear on this point. I don't know exactly that encoding. So I don't know if it can be supported. Sorry.

A SPEAKER: This is in relation to a previous question on localization. When localizing applications for German or for Japanese, whatever, have you figured out translating the strings? When the strings are hard coded they are very hard to -- have you thought about an abstraction, possibly?

HIROSHI LOCKHEIMER: Yes, that is the resource scheme we were talking about. I wanted to avoid saying this right now because we don't do it ourselves, so don't do what we do, do what we say, but once we have a resource system, obviously we would want people not to hard code strings.

A SPEAKER: Right.

A SPEAKER: Do you anti-alias all the fonts, even small point Japanese fonts on the screen?

PIERRE RAYNAUD-RICHARD: That's a good question. As you know, Kanji are so complicated that there is no way to just reduce them and get something readable, because there can be nine lines, one over the other, and with the white space you will need at least 18 points. You will never get it readable if you are smaller than 18 points. So it's not a simple thing to do, basically removing lines to keep them readable.

So as I told you, we have a bitmap format, we will be working on some tool to integrate bitmap Kanji, we will probably take black and white bitmap and convert them to our own format. When that will be ready, we will release it with the source for people who want to do the same thing.

A SPEAKER: What's the native one byte encoding on the BeOS?

HIROSHI LOCKHEIMER: The default encoding of BeOS is UTF-8.

A SPEAKER: So that means you can't get typographic quotes -- well, no -- okay.

HIROSHI LOCKHEIMER: They are just more than one byte. There is, I didn't really want to say this because I'm not really a fan of this feature, but we do have -- you can change the encoding on a view programmatically, it's not user settable, but programmatically you can set the encoding of a font to something other than UTF-8, but we strongly encourage you to use UTF-8 because all our interface elements, for example, use UTF-8 and it would be good if it used that in the back.

A SPEAKER: If you have a private font, one of the problems I found in DR8 you try to -- and it -- the problem was the character 31 -- using UTF-8 can you move that font?

PIERRE RAYNAUD-RICHARD: I couldn't hear most of the question, sorry.

A SPEAKER: All right. We have our own private font that encoding is font specific. Do you -- and under DR8.2 what I tried to do was lie and say it was ISO Latin 1, basically, but unfortunately the characters 0 to 32 aren't allowed to be drawn.

PIERRE RAYNAUD-RICHARD: So the first problem I can see with that -- can you repeat it?

HIROSHI LOCKHEIMER: The question is, you have a private font, and I assume the characters that you wanted to draw were in the 0 to 31 or whatever range, and in DR8 that wasn't being drawn properly; is that correct? And you are wondering if that happens with DR9.

A SPEAKER: Right.

PIERRE RAYNAUD-RICHARD: The way we are handling the encoding now, we are using the Unicode cmap for everything, and only that, so if your private font does not support a Unicode cmap, you will not be allowed to do anything at all. You have to support the Unicode cmap, and then we support any character in the Unicode cmap. So the only requirement is to map the character in the Unicode cmap, even to the wrong character. I think any value is supported properly, I think so.

A SPEAKER: So then can you also comment on the font -- on how fonts caching is done, it's kind of a black box. If you are using an international document, say Japanese and our other languages type faces, you know, the fonts aren't properly cached.

HIROSHI LOCKHEIMER: So basically the question is are fonts cached?

A SPEAKER: How are they cached?

HIROSHI LOCKHEIMER: How are they cached?

PIERRE RAYNAUD-RICHARD: What do you mean exactly, how are they cached?

A SPEAKER: How do you decide which fonts to keep around? Currently basically you set the font, draw the font --

PIERRE RAYNAUD-RICHARD: So --

A SPEAKER: If you have one view that is drawn with multiple fonts, then, you know, the performance issues really like to be able to keep -- it's really better for the application, which fonts to keep around because it has a better idea than the fonts -- font, system font mechanisms decided which fonts are useful.

PIERRE RAYNAUD-RICHARD: So the first thing, the font cache. There are two APIs to control the settings of the font cache, one for the user, one for applications. The only parameter that the user can control directly is the global memory allocation of the font cache, for example to go from a Roman setting, 256 KB, to a Kanji setting, like 2 MB, or if you just have a lot of memory and want to handle a lot of different fonts at the same time. After that each application has a call to handle more advanced settings, so you can set the priority at which we cache fonts bigger than a specific size threshold, you can define the threshold size itself, and you can also define specific priorities for rotated fonts. And I think there is another parameter.

Basically the three priorities are normal, low and null. And after that, any application can change the setting: if you run an application that needs a big font cache to run at all, you can ask to override the user setting, just as if you had been using that memory for your application. Your application has the right to ask for as much memory as it can use. But you can have the cache using it for you, so you will boost your setting, and as long as your application is alive and running the font cache will use a bigger setting.

And as for something even more flexible, there is nothing implemented for now, but the API is clearly open. If you have ideas about some sort of advanced area that would improve the way the font cache decides to cache or trash fonts, I would be really interested to talk in private with you later if you want.

A SPEAKER: Translation methods, you use UTF-8 internally, which is fine, but if I bring up a web browser or e-mail program, it's probably ISO Latin 1, windows, drawing lifts or Mac lifts or something, is there a handy method I can say this is Latin 1, make it reasonable internal UTF-8?

HIROSHI LOCKHEIMER: Right. As I explained earlier, it's not there yet, but we will be adding that in the API.

A SPEAKER: It will probably go in Be fonts or something?

HIROSHI LOCKHEIMER: I'm not sure.

A SPEAKER: Two questions. First is sort of related to that one, that is once that support is there, will Net Positive support and display Japanese pages?

HIROSHI LOCKHEIMER: Will Net Positive support Japanese pages?

A SPEAKER: Yes.

HIROSHI LOCKHEIMER: That's a question we should ask Peter Barrett, who is an author of Net Positive. I've been pushing for that but there are other things we obviously have to work on.

A SPEAKER: Once the input method API is completed, do you intend to ship the system with a bare bones input method so it will be useable for Japanese?

HIROSHI LOCKHEIMER: So once we have input methods API will we ship bare bones API input method with the system? Not sure, that's more of a marketing decision than an engineering one, I'm not qualified to answer.

Yes?

A SPEAKER: I have a specific question about this feature that you hate. So I have an application on DR8 that allows the user to, on a per view basis, change the mapping between a 16-bit value and the glyph. And of course this is very hard to do under DR8, since it didn't support 16-bit characters at all. Is it -- under DR9 is this very easy, can I throw all the code away and just make it work?

HIROSHI LOCKHEIMER: The question was you have some mapping that you do on a per view basis for character codes, you were wondering if the new feature in DR9 will allow you to throw away the code.

PIERRE RAYNAUD-RICHARD: I think the answer is no. UTF-8 is the only encoding for any system call, any window type call. The only exception is a subset that you can use to draw in a view and to get your key down event in a view. But that's limited, so I don't think that can fit your problem.

A SPEAKER: Are input methods always going to be in a separate window? There are simple cases like extended sets which could be built into the text view, so --

HIROSHI LOCKHEIMER: So are simple input methods going to be in a separate window? No. That was just a quick demo to show people what input methods are. But definitely we will have to do inline input for Kanji as well as simple things like accented characters, definitely. That will be part of the API as well, you know.

A SPEAKER: Will you support vertical writing?

HIROSHI LOCKHEIMER: Excuse me?

A SPEAKER: Vertical writing.

HIROSHI LOCKHEIMER: Will we support vertical draw string?

PIERRE RAYNAUD-RICHARD: I think the answer is no.

A SPEAKER: Will not?

PIERRE RAYNAUD-RICHARD: For now we are not planning to support that.

A SPEAKER: One more question, then. It's not related to Unicode, but is all that in gray scale always?

PIERRE RAYNAUD-RICHARD: Yes. We are still working on the font engine to improve the readability of characters using anti-aliasing, and we will keep working on the engine to improve the quality. One option we would like to add is some way of controlling the anti-aliasing: being able to say I want it real strong, or I want to disable it, or to do something intermediate. Or to say that for small sizes, smaller than 15 point, I don't want anti-aliasing. But that's not supported in the current font engine. It can be supported later but not for now. It's a technical problem on the font engine side.

A SPEAKER: I heard some rumors that bitmap fonts weren't supported in DR9.

Rumors that bitmap fonts aren't supported in DR9?

PIERRE RAYNAUD-RICHARD: As I said a little before, all the bitmap fonts are completely gone. To get around the metric problem when you want to change size or when you want to print, we want to use only fonts for which we have a scalable equivalent. We will provide tools to import fonts that are provided in both bitmap and scalable format, but we will not support bitmap-only fonts anymore.

HIROSHI LOCKHEIMER: Anything else?

A SPEAKER: There are alternate --

HIROSHI LOCKHEIMER: I can't hear you.

A SPEAKER: There are alternate forms of numbers, will there be methods to recognize alternative --

HIROSHI LOCKHEIMER: Alternate forms of numbers, you mean how the comma is used to separate decimals? Is that how it works, Pierre? I don't know.

A SPEAKER: Is there a multi-byte number that's different than a single byte?

HIROSHI LOCKHEIMER: Multi-byte numbers are different from single byte numbers? I don't know.

A SPEAKER: For example, if you want a string.

HIROSHI LOCKHEIMER: I'm not understanding the question.

A SPEAKER: Kanji glyphs representing Roman numbers perhaps.

HIROSHI LOCKHEIMER: Do you mean Kanji glyphs that represent numbers?

A SPEAKER: Yes.

HIROSHI LOCKHEIMER: I'm not sure about that. I don't know if I want to put that into the API. Not sure.

Yes?

A SPEAKER: Where can I get the document written for your input method API? Do you think it's easy to --

HIROSHI LOCKHEIMER: Where can you get the input method API and --

A SPEAKER: Information for the input method.

HIROSHI LOCKHEIMER: Information for the input method, where can you get that? It's not there yet. Once, you know, we actually start working full time on it, you know, right now there is no information, so you can't get it, basically.

A SPEAKER: And do you think it's easy to support TSM --

HIROSHI LOCKHEIMER: That depends on how we shape our input method API, obviously. I'm not, you know, I have never written an input method, I am familiar with TSM but I have never written one, I can't answer that.

A SPEAKER: Do you have any plan to support wide character in API instead of the UTF-8 --

HIROSHI LOCKHEIMER: The what character?

A SPEAKER: Wide character.

A SPEAKER: Wide.

HIROSHI LOCKHEIMER: Oh, the type wide character, you mean. No, I don't think so.

PIERRE RAYNAUD-RICHARD: No.

HIROSHI LOCKHEIMER: That was the whole point of UTF-8, we can use the same old character pointers instead of different types. So I don't think we will support that.

A SPEAKER: Will a POSIX library pass UTF-8 through unchanged, et cetera --

HIROSHI LOCKHEIMER: That's the advantage of UTF-8, since the null character is the null character, they work. Now, there are, you know, little details like string length, it counts the number of bytes instead of the number of characters.

A SPEAKER: You mean UTF-8 string?

HIROSHI LOCKHEIMER: That probably will not be POSIX.
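
The byte-versus-character distinction in that answer can be sketched as follows. This is an illustration only: `Utf8CharCount` is a hypothetical helper, not a POSIX or BeOS function, and it assumes its input is well-formed UTF-8. It counts every byte that is not a continuation byte (bit pattern 10xxxxxx), which gives the number of encoded characters, whereas `strlen` reports the number of bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: counts characters (code points) in a
// null-terminated string, assuming it is well-formed UTF-8.
std::size_t Utf8CharCount(const char* s)
{
    assert(s != nullptr);
    std::size_t count = 0;
    for (; *s != '\0'; ++s) {
        // Continuation bytes look like 10xxxxxx; every other byte
        // starts a new character.
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For a string holding a single three-byte Kanji, `strlen` returns 3 while this helper returns 1. Because no byte of a multi-byte UTF-8 sequence is ever zero, the usual null-terminated string functions still find the correct end of the string.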

A SPEAKER: I was wondering, if you print in a terminal application, will Kanji --

HIROSHI LOCKHEIMER: It should but Terminal hasn't been revised to handle UTF-8 yet, but I have been talking to Rico, the author of Terminal and he wants to do UTF-8 at some point.

A SPEAKER: How about localization support for number, time zone settings?

HIROSHI LOCKHEIMER: So date formats, time formats, how far? Do we support them right now? The API for that isn't in there. I'm not sure, you know, there seem to be some locale model API in the standard and in POSIX, apparently. I haven't really looked into that yet, so I'm not sure. But that is something we will have to think about.

PIERRE RAYNAUD-RICHARD: It's time.

HIROSHI LOCKHEIMER: One last question, I guess.

A SPEAKER: Can you have a local font for an application?

HIROSHI LOCKHEIMER: Can you have a local font for an application?

A SPEAKER: An application supplied font.

HIROSHI LOCKHEIMER: Do you mean embedded into the application?

A SPEAKER: Right. Or something like not system installed. Not use your font render to render my font.

PIERRE RAYNAUD-RICHARD: That's several questions. We are beginning to implement something that will be much more flexible than what was in DR8. We are just beginning currently right now to write the API to do that. But we will certainly provide something later.

A SPEAKER: So you do want to go that direction?

PIERRE RAYNAUD-RICHARD: Excuse me?

A SPEAKER: I mean what I understand of how printing works, because you don't actually put the fonts into spools to print documents, that -- like I mean unless you hold on to the font, my application is gone.

PIERRE RAYNAUD-RICHARD: I would say that if you try to be looking for problems, you will find problems.

A SPEAKER: I just want --

PIERRE RAYNAUD-RICHARD: If you want the ability there, it will be there, it's reasonable. If you want to be using things, then removing them when you need them, I think it will effectively confuse printing.

Okay. I think that was the last question. It's time to break.

HIROSHI LOCKHEIMER: Thank you.

PIERRE RAYNAUD-RICHARD: Thank you.

Transcription provided by:

160 West Santa Clara St.
San Jose, California
408.280.1252

Copyright ©1997 Be, Inc. Be is a registered trademark, and BeOS, BeBox, BeWare, GeekPort, the Be logo and the BeOS logo are trademarks of Be, Inc.
All other trademarks mentioned are the property of their respective owners.