BE ENGINEERING INSIGHTS: Using the New MALLOC_DEBUG
By Scott Barta - sbarta@be.com
Like many BeOS developers, I experience daily frustration at
the lack of debugging tools on the BeOS. Unlike most
developers, though, I'm in a position to do more than just
rant (which still occasionally happens), so I've done some
work on MALLOC_DEBUG to try to help developers fix some of
the types of memory problems I've seen during NetPositive
development.
This code will be built into BeOS R4, but you can use it now
if you download the R3.1 debugging libroot from
ftp://ftp.be.com/pub/experimental/tools/libroot_debug.zip.
That archive also contains a more detailed description of
MALLOC_DEBUG and how to use it; you should look it up if
you'd like to take full advantage of MALLOC_DEBUG.
If you want a primer on the original MALLOC_DEBUG mechanism,
see Dominic's article at
http://www.be.com/aboutbe/benewsletter/Issue83.html. To
summarize, MALLOC_DEBUG works by hooking into the C
library's malloc() and free() calls to catch some common
memory violations: reading from uninitialized blocks,
reading blocks after they've been freed, freeing blocks
twice, or writing off the boundary of a block. Provided that
you don't override new and delete and do your own
suballocation, MALLOC_DEBUG works for both malloc-allocated
blocks and C++ class instances.
MALLOC_DEBUG fills each block with garbage as soon as it's
allocated, to make sure your program doesn't depend on the
block being in a certain initial state. It trashes the block
again when it's freed so you don't depend on the data still
being there afterward. The garbage consists of odd-numbered
values, so your application faults immediately if it tries to
dereference a pointer read from the block. It also adds
padding before and after the block and checks the padding
when the block is freed to make sure you haven't written past
either end of the block.
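As a rough illustration of the padding and trashing described
above, here is a minimal sketch of a wrapper around malloc()
and free(). The names debug_alloc, debug_free, and
padding_intact, the padding size, and the fill bytes are all
made up for this example; the values MALLOC_DEBUG really uses
are described in the libroot archive, not here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// All constants below are illustrative assumptions, not MALLOC_DEBUG's values.
constexpr std::size_t kPadBytes = alignof(std::max_align_t);
constexpr unsigned char kPadByte = 0xA5;    // guard pattern around the block
constexpr unsigned char kAllocFill = 0xCD;  // odd garbage for new blocks
constexpr unsigned char kFreeFill = 0xDD;   // odd garbage for freed blocks

// Bookkeeping stored in front of the leading padding.
struct alignas(std::max_align_t) Header {
    std::size_t size;
};

static Header* header_of(void* user) {
    return reinterpret_cast<Header*>(
        static_cast<unsigned char*>(user) - kPadBytes - sizeof(Header));
}

// Allocates size bytes, surrounded by padding and filled with garbage.
void* debug_alloc(std::size_t size) {
    unsigned char* raw = static_cast<unsigned char*>(
        std::malloc(sizeof(Header) + 2 * kPadBytes + size));
    if (raw == nullptr) return nullptr;
    new (raw) Header{size};
    unsigned char* user = raw + sizeof(Header) + kPadBytes;
    std::memset(user - kPadBytes, kPadByte, kPadBytes);
    std::memset(user, kAllocFill, size);
    std::memset(user + size, kPadByte, kPadBytes);
    return user;
}

// Returns true if neither padding area has been overwritten.
bool padding_intact(void* ptr) {
    assert(ptr != nullptr);
    unsigned char* user = static_cast<unsigned char*>(ptr);
    unsigned char* front = user - kPadBytes;
    const std::size_t size = header_of(ptr)->size;
    for (std::size_t i = 0; i < kPadBytes; ++i) {
        if (front[i] != kPadByte) return false;
        if (user[size + i] != kPadByte) return false;
    }
    return true;
}

// Checks the padding, trashes the block, and releases it.
// Returns false if a boundary violation was detected.
bool debug_free(void* ptr) {
    if (ptr == nullptr) return true;
    const bool ok = padding_intact(ptr);
    Header* header = header_of(ptr);
    std::memset(ptr, kFreeFill, header->size);
    std::free(header);
    return ok;
}
```

Writing one byte past the end of a block returned by
debug_alloc() lands in the trailing padding, so debug_free()
reports the violation when the block is released, which is
the same "detected at free time" behavior described above.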
You turn on MALLOC_DEBUG by setting some environment
variables before starting your program. If your application
doesn't do anything illegal with memory, MALLOC_DEBUG won't
adversely affect your program's operation, other than to
slow it down slightly and use a bit more memory to store its
extra information.
What's New
The best new feature is that MALLOC_DEBUG now records in
every block the last seven levels of the call stack where
the block allocation took place. When the old MALLOC_DEBUG
detected an error, it just told you the error type and gave
you the block's address. If the block's identity wasn't
obvious from its content, it was difficult to figure out
what the problem was. Now, you can find out how the block
was allocated and immediately see what it is.
If your program trips up MALLOC_DEBUG, it prints out the
call stack in the debugger message shown when the debugger
is invoked. (This information is no longer printed to stdout
but appears in the debugger instead.) The call stack
consists of seven return addresses; convert them into
symbolic names using the wh command in the debugger.
Debugging Levels
The new MALLOC_DEBUG mechanism adds levels of debugging
instead of the previous simple on/off switch. The debugging
level sets how strict you want it to be and how much runtime
overhead you're willing to incur. You can assign a debugging
level from 1 to 10 through the MALLOC_DEBUG environment
variable. This is the same environment variable you used
before to turn on the old MALLOC_DEBUG; if you set
MALLOC_DEBUG=true as you did before, it defines the
debugging level to its lowest value, 1, which gives you
equivalent functionality.
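To make the mapping concrete, the sketch below shows one way
the text of the MALLOC_DEBUG variable could be turned into a
level. The helper names are hypothetical, and treating an
unset or unrecognized value as "off" (level 0) is an
assumption of this example rather than documented behavior.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: maps the text of MALLOC_DEBUG to a debugging level.
// "true" selects level 1, as described above; numbers 1 through 10 select
// that level. Anything else returns 0 (off), which is an assumption.
int parse_debug_level(const char* value) {
    if (value == nullptr) return 0;
    if (std::strcmp(value, "true") == 0) return 1;
    char* end = nullptr;
    const long level = std::strtol(value, &end, 10);
    if (end == value || *end != '\0') return 0;  // not a plain number
    if (level < 1 || level > 10) return 0;       // outside the 1..10 range
    return static_cast<int>(level);
}

// Reads the level from the environment of the current process.
int current_debug_level() {
    return parse_debug_level(std::getenv("MALLOC_DEBUG"));
}
```

With this mapping, MALLOC_DEBUG=true and MALLOC_DEBUG=1 are
interchangeable, which matches the compatibility promise
above.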
Right now, only three levels of debugging are defined: 1, 5,
and 10, leaving room to add future features. Level 1 is
equivalent to the old MALLOC_DEBUG mechanism: it fills the
block with garbage upon allocation and after it is freed,
and checks to see if the block is freed twice, or its
boundaries are violated.
Level 5 does all the Level 1 checks, and adds an extra step
to do a better job of catching blocks after they're freed:
when you call free() on a block, the block is trashed and
placed on a "purgatory" list instead of being returned
immediately to the heap. The block stays on this list until
enough other blocks are freed; then it's pushed off of the
list and recycled. It does this to catch cases where your
program writes to or reads from a block after it has been
freed.
As an example, let's say there's a bug in your program where
you have an instance of a class. You maintain an old pointer
to it and write to it after it's been deleted (which is easy
to do in heavily threaded applications with poorly managed
object lifetimes). Sometimes, the memory the instance used
to occupy is free memory, and your error will likely go
undetected. Sometimes, though, the memory has been recycled
and is now occupied by another instance of the same class,
or a different class. An illegal memory write now affects
data in a different data structure. An illegal memory read
reads data from a different class instance. With no means of
detecting errors like these, you'll probably spend a lot of
time looking in the wrong place for the problem.
However, if the freed block is trashed, placed on a
purgatory list, and stays there awhile, it gives your
program ample opportunity to try to read from the block (and
see trashed data) or write to it. After some time, hopefully
after all the dangling pointers have gone away, the block
falls off the list, where it is checked to make sure you
haven't written to it, and then it is recycled.
You can set the size of the purgatory list through the
MALLOC_DEBUG_FREE_LIST_SIZE environment variable (the
default is 1000 blocks). The value is block based, not byte
based; if you have a high turnover rate of large blocks,
this chews up memory pretty fast. Adjust the value up or
down to determine the amount of time blocks spend in
purgatory and to tune memory usage during debugging.
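The purgatory idea can be sketched in a few lines. The class
below is a toy model, not the libroot implementation: the
Purgatory name, the retire() function, and the 0xDD fill byte
are invented for the example, and blocks are assumed to come
from new unsigned char[size].

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

constexpr unsigned char kFreeFill = 0xDD;  // hypothetical trash value

class Purgatory {
public:
    explicit Purgatory(std::size_t capacity) : capacity_(capacity) {}
    Purgatory(const Purgatory&) = delete;
    Purgatory& operator=(const Purgatory&) = delete;
    ~Purgatory() {
        for (const Parked& p : parked_) delete[] p.data;
    }

    // Stands in for free(): takes ownership of a block allocated with
    // new unsigned char[size], trashes it, and parks it on the list.
    // Returns how many blocks pushed off the list by this call were
    // found modified while they were parked.
    int retire(unsigned char* data, std::size_t size) {
        assert(data != nullptr);
        std::fill(data, data + size, kFreeFill);
        parked_.push_back(Parked{data, size});
        int violations = 0;
        while (parked_.size() > capacity_) {
            const Parked oldest = parked_.front();
            parked_.pop_front();
            if (!still_trashed(oldest)) ++violations;
            delete[] oldest.data;  // only now is the memory recycled
        }
        return violations;
    }

private:
    struct Parked {
        unsigned char* data;
        std::size_t size;
    };

    // A parked block must still contain nothing but the trash value.
    static bool still_trashed(const Parked& p) {
        return std::all_of(p.data, p.data + p.size,
                           [](unsigned char b) { return b == kFreeFill; });
    }

    std::size_t capacity_;
    std::deque<Parked> parked_;
};
```

Because the list is counted in blocks, its memory cost scales
with block size: 1000 parked blocks of 64 KB each hold about
62.5 MB that the heap cannot reuse until they fall off the
list.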
Normally, MALLOC_DEBUG only performs its checks when blocks
are allocated, realloced, freed or when they drop off the
purgatory list and are recycled. This means that
MALLOC_DEBUG usually only catches a memory violation a long
time after it has occurred; in fact, if your program never
frees a block, it will never be checked at all.
You can prevent this by turning the debugging level all the
way up to 10. At the highest level, MALLOC_DEBUG performs
all the checks of lower levels. It also periodically checks
every currently allocated block and every purgatory block to
make sure that nothing illegal has happened. The
MALLOC_DEBUG_CHECK_FREQUENCY environment variable determines
how often this full check occurs; by default, it takes place
every 1000 calls to malloc()/realloc()/free(). (Individual
blocks are still always checked when freed or recycled, as
they were before, regardless of this setting.)
As you can imagine, this can be a pretty time-consuming
operation; with the period at 1000, the impact on
performance is small, but the latency between an illegal
operation and its detection is fairly large. If you're
having trouble tracking down where a problem happens, you
can crank this value down to something smaller, even all the
way down to 1, which performs a heap consistency check
*every* time, but is excruciatingly slow. When MALLOC_DEBUG
does detect the error, keep in mind that the memory call
where the error is detected may still be some distance away
(and possibly in a different thread) from the bug that
actually caused the problem; all you can hope to do is
minimize the distance.
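The trade-off between check frequency and detection latency
can be modeled with a simple counter. This is only a sketch:
check_period() and CheckScheduler are hypothetical names, and
falling back to 1000 for an unparsable value is an assumption.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: reads the check period from the text of
// MALLOC_DEBUG_CHECK_FREQUENCY, falling back to the default of 1000
// when the variable is unset or not a positive number (assumption).
unsigned long check_period(const char* value) {
    const unsigned long kDefault = 1000;
    if (value == nullptr) return kDefault;
    char* end = nullptr;
    const long parsed = std::strtol(value, &end, 10);
    if (end == value || *end != '\0' || parsed < 1) return kDefault;
    return static_cast<unsigned long>(parsed);
}

// Decides when a full sweep of all allocated and purgatory blocks is due.
class CheckScheduler {
public:
    explicit CheckScheduler(unsigned long period) : period_(period) {
        assert(period >= 1);
    }

    // Call once per malloc()/realloc()/free(). Returns true on every
    // period-th call, which is when the full heap check would run.
    bool on_heap_call() {
        if (++calls_since_check_ < period_) return false;
        calls_since_check_ = 0;
        return true;
    }

private:
    unsigned long period_;
    unsigned long calls_since_check_ = 0;
};
```

With a period of 1000, a corrupted block can sit unnoticed
for up to roughly 1000 heap calls; with a period of 1, every
heap call triggers a full sweep, which is why that setting is
so slow.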
One problem to be aware of is that the new MALLOC_DEBUG
exposes a bug in the Interface Kit of R3.x. You'll get a
"Block written to after being freed" exception on a BView
that has BScrollBars targeting it when you delete the window
that contains the view. This will be fixed in R4, but until
then, you can work around it by removing the scroll bars
from the window and deleting them before deleting the
targeted BView.
That's all I have time for, because The Man is beckoning me
to crawl back in my cage and fix some bugs, so I can't tell
you about the values MALLOC_DEBUG trashes blocks with or
other interesting technical details. Look in the debugging
libroot archive for more information. I hope that
MALLOC_DEBUG helps you find some bugs. If you have ideas
about how we could make it better, let us know.
BE ENGINEERING INSIGHTS: Changes in the OpenGL World
By Jason Sams - rjs@be.com
Ready for another article about 3D on the BeOS? The BeOS
Release 4 OpenGL implementation has been heavily modified
from the previous R3.1 version. We've added support for
single-buffer rendering, reduced memory usage, and fixed
some bugs.
Single Buffering
Single buffering is perhaps the greatest improvement for R4.
OpenGL now uses the BDirectWindow protocol to provide single
buffering. It still works with regular BWindows, but at a
substantial performance penalty. To provide this
functionality, two new member functions have been added to
BGLView:
    BGLView::DirectConnected( direct_buffer_info *info );
    BGLView::EnableDirectMode( bool enabled );
DirectConnected must be called from the BDirectWindow hook
function with the same name. This keeps the BGLView in sync
with the current direct window information. It's as simple
as adding the following function to your code:
    void
    myDirectWindow::DirectConnected( direct_buffer_info *info )
    {
        if( m_glview )
            m_glview->DirectConnected( info );
    }
EnableDirectMode is present to allow your application to
enable and disable direct window drawing without having to
modify the direct_buffer_info information. By default,
direct mode is disabled.
Performance
We've put a lot of effort into making our OpenGL
implementation perform well. Two factors limit OpenGL performance.
The first is the geometry processing (triangle) rate. This
is the rate at which incoming vertex data can be processed
and sent to the triangle, line, or point drawing hardware or
software. The performance of this portion is generally
independent of the size of the primitives sent to OpenGL. It
depends primarily on the number of primitives and on
per-vertex calculations such as lighting or texture
coordinate generation.
The second factor is the fill rate -- the number of pixels
that can be drawn in a given period of time, usually a
second. This depends almost entirely on the state of the GL
pixel pipeline. For software rendering, disabling most of
the pipeline and rendering only flat-shaded triangles
generally gives the best performance. Smooth shading,
texturing, fogging, depth testing, stenciling, blending, and
alpha testing each reduce performance somewhat.
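To get a feel for fill-rate numbers, the following sketch
computes how many pixels per second a renderer must fill to
sustain a given frame rate. The function name and the overdraw
parameter (the average number of times each pixel is drawn per
frame) are assumptions introduced for this example.

```cpp
#include <cassert>

// Pixels that must be filled each second to sustain a frame rate.
// overdraw is the average number of times each pixel is drawn per frame;
// 1.0 means every pixel is drawn exactly once (an assumed parameter).
double required_fill_rate(int width, int height, double overdraw,
                          double frames_per_second) {
    assert(width > 0 && height > 0);
    return static_cast<double>(width) * height * overdraw * frames_per_second;
}
```

For example, a 640 x 480 view redrawn 30 times a second with
each pixel drawn once needs 640 * 480 * 30 = 9,216,000 pixels
per second, regardless of how many triangles produce those
pixels.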
Most of the R4 effort has gone into geometry processing
optimizations. The processing speed should be greatly
increased from R3. The greatest improvement is in the
specular lighting code; specular lights should now have much
less impact on performance. Another big gain is in quick
clipping of primitives that are completely off screen.
We utilized some advantages of Intel processors and didn't
ignore the drawing code either. We now have a shiny new MMX
filler and some PII-specific depth testing code. For those
with other processors, don't worry -- OpenGL detects your
CPU and uses the right code. Those who've used our prior
OpenGL implementations may be wondering why most of the
effort went into the geometry portion and not the primitive
rendering code that takes most of the processor time. That
can be summed up in one word:
Hardware
While hardware support is not in R4, it's still on schedule
for R5. Our implementation of OpenGL now has the hooks to
support hardware acceleration. Continued incremental
improvements to the software engine will never approach the
performance provided by even a $50 3D-video card. All the
geometry improvements will become much more visible once
hardware acceleration is available. Can you say 200+ fps for
GLTeapot?
What can you expect once hardware acceleration is available?
Some existing OpenGL functions that were good for
performance will suddenly become very bad:
BGLView::CopyPixelsOut
BGLView::CopyPixelsIn
These functions will not be the ideal way to move data into
or out of a GLView. A better solution is to use glReadPixels
and glDrawPixels, which can be pipelined by the accelerator.
The CopyPixelsOut function forces a pipeline flush.
CopyPixelsIn may not force the flush but must push and pop
the entire pixel state to get the correct behavior. Because
your application knows the current GL state, you can save and
restore only the needed portion of the state and call
glDrawPixels.
BGLView::EmbeddedView
This function will always return NULL starting with R4. All
drawing in a GLView should be done with GL commands. Mixing
AppServer and GL is extremely bad for the performance of
both. This function is mostly used for displaying text in a
GLView. Below is an example of displaying text using only GL
commands.
Text in GL
One way to draw text in OpenGL is to create the font as a
texture and then draw it using standard GL quads. The
example below uses GL to draw the letter B. It uses the
app_server to create the font and GL to draw it.
    /* Keep only the three most significant bits of the value */
    int ObjectView::round( int in )
    {
        int tempCount = 0;
        while( in > 7 )
        {
            in >>= 1;
            tempCount ++;
        }
        return in << tempCount;
    }

    void ObjectView::makeFontMipmap( int maxSize, char c )
    {
        /* Get a fixed font */
        BFont font( be_fixed_font );
        int size = maxSize;
        int level = 0;
        float fontSize = maxSize;
        font_height fh;

        /* Calculate the largest font which will fit */
        /* into the specified size */
        do
        {
            fontSize /= 1.05;
            font.SetSize( fontSize );
            font.GetHeight( &fh );
        } while ( fh.leading >= size );

        float x = size / 4;
        /* Round Y to ensure all but the last 3 mipmaps land on */
        /* integer values */
        float y = round( size - (fh.descent + fh.leading * 0.05) );

        /* Reduce the size of the font until it fits the new */
        /* location */
        do
        {
            fontSize /= 1.05;
            font.SetSize( fontSize );
            font.GetHeight( &fh );
        } while ( y < fh.ascent );

        /* Create each mipmap for the font */
        while( size >= 1 )
        {
            font.SetSize( fontSize );
            makeFontLevel( size, level, &font, x, y, c );
            size /= 2;
            level ++;
            x /= 2.0;
            y /= 2.0;
            fontSize /= 2;
        }
    }

    void ObjectView::makeFontLevel( int size, int level,
        BFont *font, float x, float y, char c )
    {
        /* Create a bounding rect for the bitmap */
        BRect boundingRect( 0, 0, size-1, size-1 );
        GLubyte *bits;

        /* Create a gray scale bitmap to hold the font */
        BBitmap bitmap( boundingRect, B_CMAP8, true, false );

        /* Create an embedded view */
        BView view( boundingRect, "Font view", B_FOLLOW_NONE, 0);
        bitmap.Lock();
        bitmap.AddChild( &view );

        /* Set the background to bright white */
        /* Could be done with app server call */
        bits = (GLubyte *)bitmap.Bits();
        for( int ct=0; ct<size*size; ct++ )
            bits[ct] = 255;

        /* Draw the character into the bitmap at the specified */
        /* location */
        view.SetFont( font );
        view.DrawChar( c, BPoint( x, y ) );
        view.Sync();

        /* Invert the bitmap to make an intensity map where the */
        /* text is intense and the background is not. */
        for( int ct=0; ct<size*size; ct++ )
            bits[ct] = 255 - bits[ct];

        /* Load the intensity map into GL */
        glTexImage2D( GL_TEXTURE_2D, level, GL_INTENSITY4, size,
            size, 0, GL_LUMINANCE, GL_UNSIGNED_BYTE, bitmap.Bits() );

        /* Clean up */
        bitmap.RemoveChild( &view );
        bitmap.Unlock();
    }

    void ObjectView::DrawFrame(bool noPause)
    {
        if( initCount < 1 )
            return;

        LockGL();

        /* Enable texturing */
        glEnable( GL_TEXTURE_2D );

        /* Set texturing to clamp to prevent repeating the */
        /* texture if invalid texture coordinates were given */
        glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_WRAP_S,
            GL_CLAMP );
        glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_WRAP_T,
            GL_CLAMP );

        /* Set filters. This configures for trilinear filtering */
        glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER,
            GL_LINEAR );
        glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER,
            GL_LINEAR_MIPMAP_LINEAR );

        /* Colored text is created with GL_MODULATE. */
        /* The intensity map determines the brightness and the */
        /* vertexes specify the color */
        glTexEnvi( GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE,
            GL_MODULATE );

        /* Make the character texture */
        makeFontMipmap( 128, 'B' );

        /* Draw the texture */
        glBegin( GL_QUADS );
        glColor3f( 1.0, 0.0, 0.0 );
        glTexCoord2f( 0.0, 0.0 );
        glVertex2f( -1.0, 1.0 );
        glColor3f( 1.0, 0.5, 0.0 );
        glTexCoord2f( 1.0, 0.0 );
        glVertex2f( 1.0, 1.0 );
        glColor3f( 0.0, 0.0, 1.0 );
        glTexCoord2f( 1.0, 1.0 );
        glVertex2f( 1.0, -1.0 );
        glColor3f( 0.0, 0.5, 1.0 );
        glTexCoord2f( 0.0, 1.0 );
        glVertex2f( -1.0, -1.0 );
        glEnd();

        UnlockGL();
    }
This could be improved by creating the fonts in advance and
binding them to texture objects using glBindTexture.
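For readers who want to see what the round() helper and the
mipmap loop in the listing do numerically, here is a small
standalone sketch. The function names keep_top_three_bits and
mipmap_sizes are invented for this example; the logic is
copied from the listing and does not call any BeOS or GL
functions.

```cpp
#include <cassert>
#include <vector>

// Same logic as ObjectView::round() in the listing: keeps the three most
// significant bits of a positive value and clears the rest.
int keep_top_three_bits(int in) {
    int shifts = 0;
    while (in > 7) {
        in >>= 1;
        ++shifts;
    }
    return in << shifts;
}

// Texture sizes visited by the mipmap loop: maxSize, maxSize/2, ..., 1.
std::vector<int> mipmap_sizes(int maxSize) {
    std::vector<int> sizes;
    for (int size = maxSize; size >= 1; size /= 2) {
        sizes.push_back(size);
    }
    return sizes;
}
```

For makeFontMipmap( 128, 'B' ) the loop produces eight levels,
from 128 down to 1. A baseline that rounds to 96 halves
cleanly through 96, 48, 24, 12, 6, and 3, and only becomes
fractional for the two smallest levels, where sub-pixel
placement matters least.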
Conclusion
Much has already changed in the OpenGL world and much
remains to be done. We are now very close to tremendous
performance gains through hardware acceleration and BeOS
Release 4 has paved the way.
Thanks, and let's create those great looking 3D apps.
DEVELOPERS' WORKSHOP: Fal Parsi
By Doug Fulton - lbj@be.com
"Developers' Workshop" is a weekly feature that provides
answers to our developers' questions, or topic requests.
To submit a question, visit
http://www.be.com/developers/suggestion_box.html.
"Now that DR12 (excuse me, R4) is more than a twinkle in
Eddie's eye, can I ask about new features without
cramming my interlocutant down the Bocca de la Verita?"
-- Amfortas,
Monsalvat, Spain
Good of you to write, Mr. Amfortas (where'd you find Shroud
of Turin stationery?). To answer your only question first
(without actually answering it), here you go:
* Between the dum and dee of finding a handler's looper
and locking the fellow, there lies a race. Consider the
mayhem were the handler removed from the looper between
the two calls. Rare? You bet, but the best bugs are just
so. Solve the problem with BHandler's new LockLooper()
function. In a single call the looper is cornered and
quartered. So, where you now have this (to examplicate the
commonest):
    window = view->Window();
    if (window->Lock()) {
        ...
        window->Unlock();
    }
...you will, in R4, type thus:
    if (view->LockLooper()) {
        ...
        view->UnlockLooper();
    }
* Are you jealously interested in the other apps that the
user is sneaking about with when you're not looking? To
get this information, in R3, you had to pester the roster
like a five-year-old in the back seat on his way to
grandma's. Now, roster will pester you: BRoster's
StartWatching() and StopWatching() functions will let you
register for notifications of application launchings,
activations, and deaths. All just gossip, in my book.
* As Pulse() is the apian genua, the new BMessageRunner is
a feline huzzah. Mr. Message Runner sends a message, and
then sends it again some moments later, and again and
again and again, automatically, continuously, obsessively.
* You like BFile, but you miss being able to close() when
you're done. The lack of a proper goodbye feels as caddish
as leaving a twenty on the table and lying about calling,
you blackguard. Now we give you your cupcake, and yet
another, so you can eat one and have the remainder. Look
for BNode::Dup(), the call with the po-po-posixy name. It
duplicates the node's file descriptor so you can have your
way with it and properly close() it when you're done. It
doesn't actually affect the BNode's descriptor, so you may
still feel a bit roguish, but at least you can go through
the motions.
* BResources, the Flying Dutchman of the Storage Kit, has
shifted its sails once again. What started out as a
happy-go-lucky structure that could tack into nearly any
file was trimmed to make way for attributes a few releases
back. In R4 we'll trim again: You can use a BResources
object to *read* an application's resources, but you
mustn't *write* the data. Writing resources (signatures,
icons, etc.) is the job of professionals, such as
FileTypes, IconThingummy (what *do* we call it these
days?), and the new xres tool (which I'm not going to talk
about).
* How many times has mounting a volume evoked that feeling
of presque vu? Turn that "almost" into a certainty by
examining the volume's new "be:volume_id" attribute. 64
bits is better than fingerprints and costs less than DNA.
* The BGLView class, your interface to OpenGL, has learned
the BDirectWindow secret handshake. Also, OpenGL no longer
speaks pig latin when asked to back-cull. It's a z-axis
world; live in it.
That should help. By the way, wasn't Kundry an intern?
Beta Season
By Jean-Louis Gassée
It's almost here. We'll soon begin rounds of beta testing
for the upcoming Release 4 of the BeOS. And that seems like
a good opportunity to state our position or intentions on
the topic of release classifications.
First, an explanation of the terms. It used to be that
"alpha" meant something that occasionally worked and
represented what you wanted the product to do. "Beta" meant
"feature complete," including undocumented features --
a.k.a. bugs.
Cynics say that rounds of beta testing are used to
progressively approximate a commercial product, one that the
customer will pay for and not return in bankrupting numbers.
As with any language artifacts, "alpha" and "beta" have lost
some of their categorical meanings as they've evolved. Beta
testing is now an opportunity to add and delete features as
the product moves toward commercial completion. Some
features prove too problematic to fix in reasonable time.
Others that seemed like a good idea might be rejected by
real users. Missing functions in an earlier beta are now
feasible or clamored for.
With the Web, and the Software Valet client in the BeOS, we
have ideal tools for a more fluid beta testing process. I
mentioned "real users" and the clamor for certain features.
In an ideal world, we have a perfect QA organization with
testing programs that probe every tendril in our software
and take it where no human would dare tread. The more
mundane reality is that QA engineers are too sophisticated
and know too much, including unconscious knowledge. As a
result, they, or their programs, don't tread where normal
human beings naturally go. How did you do that, and why? I
don't know, replies the customer, already annoyed.
I know about this. Because of an apparently innate ability
to misuse software and washer-dryers, I'm used to being on
the receiving end of such questions. For example, on a
certain legacy operating system, the number of bytes
remaining on disk is displayed in a window title. I once
"managed" to replace the comma separator in the number with
a J. My hard disk was promptly confiscated. I promise we
won't do this at Be. We might, though, just beg to borrow
your system to make sure we can reproduce a problem we were
unable to create unaided.
Regarding the clamor for features, we're a little nervous.
Hopefully, the BeOS Release 4 will show that we've been
listening to assertive software developers and users. On the
other hand, with a much larger feature set, we're likely to
get even more vigorous feedback -- some of which will feed
the next round of fixes or improvements.
We like this, especially when we don't like it. The pain
means the critics have touched something important that we'd
better attend to.
Copyright ©1998 Be, Inc.
Be is a registered trademark, and BeOS, BeBox, BeWare, GeekPort, the Be logo and the BeOS logo
are trademarks of Be, Inc. All other trademarks mentioned are the property of their respective owners.