Be Newsletter
Issue 85, August 6, 1997
BE ENGINEERING INSIGHTS: So You Wanna Play With a Frame Buffer
By Benoît Schillings
I've spent way too much time over the last four years
reading and writing frame buffer memory. Admitting that I
have this problem is, for me, cleansing. Almost spiritual.
Let me share my enlightenment. And if you're doing
rendering using the Game Kit, or working with off-screen
bitmaps, you might even find my confessions interesting.
The first thing you realize when you use a fast machine
(like my beloved 225 MHz PowerPC) is that PCI is a pain.
The bus is so slow compared to the CPU that thinking a
little about the way you access memory can pay off nicely.
For the next few examples, let's assume that the frame
buffer is in 32-bit mode (24-bit color stored in 4 bytes
per pixel) and is 800 pixels wide. The easiest way to draw
a square 128 pixels on a side is:
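#define BUFFER_BASE 0x14000020   // frame buffer address (machine-specific)
#define ROWBYTE (800*4)          // bytes per row: 800 pixels * 4 bytes

void test1()
{
    ulong *p;
    ulong my_color;
    int x;
    int y;

    my_color = 0x00ff0000; // a nice red color.
    for (y = 0; y < 128; y++) {
        // Start of row y in the frame buffer.
        p = (ulong*)(BUFFER_BASE + (y * ROWBYTE));
        for (x = 0; x < 128; x++) {
            *p++ = my_color;   // one 32-bit write per pixel
        }
    }
}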
On my machine with a Twin Turbo video card, this piece of
code runs in 3550 usecs. This is a bandwidth of about 18
MB/sec. If we run the same test using conventional memory,
the code now runs in only 730 usecs -- about 5 times
faster!! Excuse me, I have to get the phone.
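The bandwidth figures are simply bytes moved divided by elapsed time: the 128 x 128 square is 128 * 128 * 4 = 65,536 bytes. Here is a small sketch of that arithmetic. It assumes MB means 10^6 bytes; with 2^20-byte megabytes the PCI figure comes out nearer 17.6 MB/sec. `mb_per_sec` and `print_bandwidths` are illustrative names, not part of any Be API:

```c
#include <stdio.h>

// Bytes written by the 128 x 128 test: 4 bytes (one 32-bit word) per pixel.
#define SQUARE_BYTES (128.0 * 128.0 * 4.0)

// Throughput in MB/s, assuming 1 MB = 1,000,000 bytes:
// bytes per microsecond is numerically the same as MB per second.
double mb_per_sec(double bytes, double usecs)
{
    return bytes / usecs;
}

void print_bandwidths(void)
{
    double pci = mb_per_sec(SQUARE_BYTES, 3550.0);  // frame buffer over PCI
    double ram = mb_per_sec(SQUARE_BYTES, 730.0);   // conventional memory

    printf("PCI frame buffer: %.1f MB/s\n", pci);
    printf("main memory:      %.1f MB/s\n", ram);
    printf("ratio:            %.2fx\n", ram / pci);
}
```

Calling `print_bandwidths()` prints about 18.5 MB/s for the frame buffer, 89.8 MB/s for main memory, and a ratio of 4.86.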
I'm back. Moral #1: When you have a LOT of rendering to do,
it's faster to render into an off-screen bitmap and then
blit the final result to the screen. This is even more
apparent if you try to READ from the frame buffer. If we
change the previous code to invert each pixel in place:
for (y = 0; y < 128; y++) {
    p = (ulong*)(BUFFER_BASE + (y * ROWBYTE));
    for (x = 0; x < 128; x++) {
        *p++ ^= 0xffffffff;
    }
}
The execution time now goes up to a whopping 15100
microseconds, more than 4 times slower than the previous
write-only case. Needless to say, reading the contents of a
PCI frame buffer is a bad idea! In main memory, the same
loop takes only 800 microseconds. As you can see, the
off-screen method is the clear winner.
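Here is a minimal sketch of Moral #1 in portable C, assuming 32 bits per pixel. Every read-modify-write (here, the same XOR inversion) happens in an ordinary memory buffer. The destination is only ever written, one row at a time, honoring its row pitch. `invert_offscreen` and `blit_rows` are hypothetical names. In real use, `dst` would be the frame buffer rather than another memory buffer:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Invert every pixel of a width x height image held in main memory.
// Reading here is cheap; reading the same pixels over PCI is not.
void invert_offscreen(uint32_t *pixels, int width, int height)
{
    for (long i = 0; i < (long)width * height; i++)
        pixels[i] ^= 0xffffffffu;
}

// Copy a tightly packed off-screen image to a destination whose rows are
// dst_rowbytes apart. The destination is only written, never read.
void blit_rows(uint8_t *dst, size_t dst_rowbytes,
               const uint32_t *src, int width, int height)
{
    size_t row_bytes = (size_t)width * sizeof(uint32_t);

    assert(dst_rowbytes >= row_bytes);
    for (int y = 0; y < height; y++)
        memcpy(dst + (size_t)y * dst_rowbytes, src + (size_t)y * width, row_bytes);
}
```

The design choice is the one the timings argue for: the slow bus sees one sequential stream of writes per row and no reads at all.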
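Back to the simple writing case. It turns out that writing
doubles into the frame buffer improves the performance of
the PCI transactions. On a 32-bit PowerPC, storing a double
moves 8 bytes in one instruction, while an integer store
moves only 4: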
void test2()
{
    double *p;
    double temp_double;
    ulong my_color;
    int x;
    int y;

    my_color = 0x00ff0000; // the same red color.
    // Pack two copies of the pixel into one 8-byte double.
    *((ulong*)&temp_double) = my_color;
    *(1 + (ulong*)&temp_double) = my_color;
    for (y = 0; y < 128; y++) {
        p = (double*)(BUFFER_BASE + (y * ROWBYTE));
        for (x = 0; x < 128/2; x++) {
            *p++ = temp_double;   // two pixels per write
        }
    }
}
This one runs in 1970 usecs, about 45% less time than the
version using 32-bit transfers (roughly 33 MB/sec instead of
18)!
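The double is a way to get 8-byte stores out of a 32-bit PowerPC. Its integer registers are only 4 bytes wide, but its floating-point unit can store a full 8-byte double. On today's 64-bit CPUs, a 64-bit integer does the same job. Below is a sketch of the two-pixels-per-store loop using `uint64_t` instead of `double`. This is a swapped-in technique, not the code timed above, and `fill_row64` is a hypothetical name. It also replaces the pointer casts with 8-byte `memcpy` calls, which modern compilers may otherwise treat as aliasing violations. When optimizing, compilers typically turn each of those calls into a single 64-bit store:

```c
#include <stdint.h>
#include <string.h>

// Fill `pixels` 32-bit pixels at dst with `color`, two pixels per store.
void fill_row64(uint32_t *dst, int pixels, uint32_t color)
{
    uint32_t pair[2] = { color, color };
    uint64_t two_pixels;
    unsigned char *p = (unsigned char *)dst;

    // Same bit pattern as two adjacent pixels, built without a pointer cast.
    memcpy(&two_pixels, pair, sizeof two_pixels);

    // Each 8-byte memcpy is typically compiled to one 64-bit store.
    for (int x = 0; x < pixels / 2; x++, p += 8)
        memcpy(p, &two_pixels, sizeof two_pixels);

    // Odd pixel count: finish with one 32-bit store.
    if (pixels & 1)
        memcpy(p, &color, sizeof color);
}
```

For one 128-pixel row of the square, `fill_row64(row, 128, 0x00ff0000)` issues 64 eight-byte stores instead of 128 four-byte ones.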
If we unroll the loop...
for (x = 0; x < 128/8; x++) {
    *p++ = temp_double;
    *p++ = temp_double;
    *p++ = temp_double;
    *p++ = temp_double;
}
...we don't actually gain anything. This runs in exactly the
same time as the non-unrolled version. This is because any
overhead between the write instructions is hidden by the
time taken to do the write. If you do some computation
between writes, you may find that it is free. For example:
for (x = 0; x < 128/2; x++) {
    *p++ = temp_double;
    my_color += (x<<24) | (x << 8) ^ x;
    *((ulong*)&temp_double) = my_color;
    my_color += (x<<24) | (x << 8) | x;
    *(1 + (ulong*)&temp_double) = my_color;
}
It looks busy. It is busy. But this thing still runs in
EXACTLY the same time -- so much for optimization! (By the
way, this random piece of code looks very nice if you run it
a few thousand times; try it!)
Note that although a double write only needs to be aligned
on a 4-byte boundary, you should stick with 8-byte
alignment: a double that is only 4-byte aligned carries an
80% performance penalty.
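Sometimes a span does not start on an 8-byte boundary. One way to follow that advice is to write a single 32-bit pixel first, continue with 8-byte stores from the now-aligned address, and finish any leftover pixel with one more 32-bit store. This sketch assumes pixels are 4-byte aligned. `fill_span_aligned` is a hypothetical helper, and the `uintptr_t` cast is used only to test the address:

```c
#include <stdint.h>
#include <string.h>

// Fill n 32-bit pixels starting at dst, issuing 8-byte stores only at
// 8-byte-aligned addresses. dst must be at least 4-byte aligned.
void fill_span_aligned(uint32_t *dst, int n, uint32_t color)
{
    uint32_t pair[2] = { color, color };
    uint64_t two_pixels;

    memcpy(&two_pixels, pair, sizeof two_pixels);

    // Leading pixel: bring dst up to an 8-byte boundary.
    if (n > 0 && ((uintptr_t)dst & 7) != 0) {
        *dst++ = color;
        n--;
    }
    // Aligned middle: two pixels per 8-byte store.
    for (; n >= 2; n -= 2, dst += 2)
        memcpy(dst, &two_pixels, sizeof two_pixels);
    // Trailing pixel, if any.
    if (n > 0)
        *dst = color;
}
```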
A few nice tricks
When doing graphics-intensive operations in 32-bit mode, you
may find some of these functions useful. They're a
collection of tricks to speed up common blending operations.
Have fun decoding them!
The first one blends two colors 50/50, averaging each of the
four 8-bit channels.
The trivial implementation would be:
ulong calc_blend(ulong color1, ulong color2)
{
    ulong result;

    result  = ((1 + ((color1 >> 24) & 0xff) + ((color2 >> 24) & 0xff)) >> 1) << 24;
    result |= ((1 + ((color1 >> 16) & 0xff) + ((color2 >> 16) & 0xff)) >> 1) << 16;
    result |= ((1 + ((color1 >>  8) & 0xff) + ((color2 >>  8) & 0xff)) >> 1) << 8;
    result |= ((1 + ( color1        & 0xff) + ( color2        & 0xff)) >> 1);
    return result;
}
The fast version is:
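ulong calc_blend(ulong color1, ulong color2)
{
    // Clear each channel's low bit before shifting so nothing leaks
    // into the neighboring channel, then add back the rounding bit
    // (OR rounds halves up, matching the trivial version).
    return ((color1 & 0xFEFEFEFE) >> 1) +
           ((color2 & 0xFEFEFEFE) >> 1) +
           ((color1 | color2) & 0x01010101L);
}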
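The two versions agree only if the rounding bit matches. Write each channel as a = 2p + x and b = 2q + y, with x and y in {0, 1}. Then (1 + a + b) >> 1 equals p + q + (x | y), so the correction term must OR the low bits to round up like the trivial version. AND-ing them instead gives (a + b) >> 1, which rounds down. Below is a small exhaustive check of the OR form against a channel-by-channel reference. It uses `uint32_t` so the width is 32 bits on any compiler, and the function names are illustrative:

```c
#include <stdint.h>

// Reference: per-channel (1 + a + b) >> 1, as in the trivial version.
uint32_t blend_ref(uint32_t c1, uint32_t c2)
{
    uint32_t result = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t a = (c1 >> shift) & 0xff, b = (c2 >> shift) & 0xff;
        result |= ((1 + a + b) >> 1) << shift;
    }
    return result;
}

// SWAR version: halve each channel, then add back the rounding bit.
uint32_t blend_fast(uint32_t c1, uint32_t c2)
{
    return ((c1 & 0xFEFEFEFEu) >> 1) + ((c2 & 0xFEFEFEFEu) >> 1)
         + ((c1 | c2) & 0x01010101u);
}

// Count mismatches over all 256 x 256 channel pairs.
long blend_mismatches(void)
{
    long bad = 0;
    for (uint32_t a = 0; a < 256; a++)
        for (uint32_t b = 0; b < 256; b++) {
            // Different values in neighboring channels, so any leak would show.
            uint32_t c1 = a | (b << 8) | ((255 - a) << 16) | ((a ^ 0x55u) << 24);
            uint32_t c2 = b | (a << 8) | ((255 - b) << 16) | ((b ^ 0xAAu) << 24);
            bad += blend_fast(c1, c2) != blend_ref(c1, c2);
        }
    return bad;
}
```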
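A fast color addition, clipping each channel to 0xff:
ulong calc_add(ulong c1, ulong c2)
{
    // Add the low 7 bits of each channel: at most 0xfe, so no
    // carry can cross into the next channel.
    ulong low = (c1 & 0x7F7F7F7FL) + (c2 & 0x7F7F7F7FL);
    // Carry out of bit 7 = majority of the two top bits and the
    // carry out of the low 7 bits (bit 7 of low).
    ulong carry = ((c1 & c2) | ((c1 | c2) & low)) & 0x80808080L;
    // Restore the true bit 7, then force overflowed channels to 0xff.
    return (low ^ ((c1 ^ c2) & 0x80808080L)) | ((carry >> 7) * 0xFF);
}
Now subtraction, clipping each channel to 0. Within a
channel, 255 - x equals ~x, so complementing turns the
clipped addition into a clipped subtraction:
ulong calc_sub(ulong c1, ulong c2)
{
    // ~calc_add(~c1, c2) = 255 - min(255, 255 - c1 + c2)
    //                    = max(0, c1 - c2), per channel
    return ~calc_add(~c1, c2);
}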
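Carry tricks like these are easy to get subtly wrong, since a carry must never spill from one channel into its neighbor. An exhaustive comparison against a plain channel-by-channel reference is cheap insurance. Here is a sketch with `uint32_t` standing in for `ulong` and illustrative names. `sat_add` adds the low 7 bits of each channel separately, so no carry can cross a channel boundary, then repairs bit 7 and clamps overflowed channels to 0xff. `sat_sub` relies on 255 - x being ~x within a byte, so ~sat_add(~a, b) = max(a - b, 0):

```c
#include <stdint.h>

// Per-channel saturating add: min(a + b, 255) in each of the four bytes.
uint32_t sat_add(uint32_t c1, uint32_t c2)
{
    uint32_t low = (c1 & 0x7F7F7F7Fu) + (c2 & 0x7F7F7F7Fu);        // no cross-byte carry
    uint32_t carry = ((c1 & c2) | ((c1 | c2) & low)) & 0x80808080u; // carry out of bit 7
    uint32_t sum = low ^ ((c1 ^ c2) & 0x80808080u);                 // restore bit 7
    return sum | ((carry >> 7) * 0xFFu);                            // clamp to 0xff
}

// Per-channel saturating subtract: max(a - b, 0).
uint32_t sat_sub(uint32_t c1, uint32_t c2)
{
    return ~sat_add(~c1, c2);
}

// Reference implementation, one channel at a time.
uint32_t ref_op(uint32_t c1, uint32_t c2, int subtract)
{
    uint32_t result = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        int a = (c1 >> shift) & 0xff, b = (c2 >> shift) & 0xff;
        int v = subtract ? a - b : a + b;
        if (v < 0) v = 0;
        if (v > 255) v = 255;
        result |= (uint32_t)v << shift;
    }
    return result;
}

// Exhaustive over channel pairs, with different values in neighboring channels.
long saturate_mismatches(void)
{
    long bad = 0;
    for (uint32_t a = 0; a < 256; a++)
        for (uint32_t b = 0; b < 256; b++) {
            uint32_t c1 = a | (b << 8) | ((255 - a) << 16) | ((a ^ 0x55u) << 24);
            uint32_t c2 = b | (a << 8) | ((255 - b) << 16) | ((b ^ 0xAAu) << 24);
            bad += sat_add(c1, c2) != ref_op(c1, c2, 0);
            bad += sat_sub(c1, c2) != ref_op(c1, c2, 1);
        }
    return bad;
}
```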
By the way, thanks to Pierre for some of these tricks!
News from the Front
By William Adams
You know how you read magazine articles that start, "By
the time you read this..."?
Well, it's the Thursday before our developers' conference,
and I'm sitting here making up things that will sound OK by
the time you read this next week. How about starting with
the things we do know?
We now have accelerated 3D support for OpenGL®!! How the
heck? Well, start with the Diamond Monster3D board, add the
current 2.4 Glide library support from 3Dfx, and mash on the
freely available Mesa OpenGL® implementation, and there you
have it. I'm telling you, you haven't done 3D until you've
done 3D with a nice cheap hardware accelerator.
ftp://ftp.be.com/pub/samples/graphics/obsolete/glide.zip
ftp://ftp.be.com/pub/samples/open_gl/obsolete/Mesa-2.3_src.zip
This will work with any of the 3Dfx Voodoo-based graphics
cards, though not with Voodoo Rush at the moment. Go buy one
of these cards (~$200), plug it into your Mac, and have at
it. If you are at the developers' conference, you'll see
this in action; if not, you'll just have to wait.
Speaking of the conference, what else will you see? Well,
since you won't see this until next week anyway, you'll see
a whole bunch of other nifty graphics stuff, as well as a
glimpse of what I fondly call WetTV. Not to dwell on my
favorite Bt848 subject, but I'm programming and I can't
stop!! Have you ever watched television where channel
changes have movie transitions? You will need a towel to run
this application, because after you do, you will find that
you have peed your pants with excitement. Bold statements?
Yes, of course. This stuff is kicking, and you won't see it
on XYZ operating systems because those programmers simply
aren't as motivated. They're too busy pooh-poohing how little
chance we have of succeeding.
My brother has this favorite little statement, "We're going
to go eat from the big dog's bowl while he's not looking."
So here we are. Some software available. Some interesting
hardware support, including a new processor, and hungry,
agile programmers who are willing to take advantage of
superior technology. Thanks for all your support.
So what kind of support do you get from us? Let me tell you
a story. We were at the local sandwich shop waiting for our
food. Geoff had just come from the local hardware store
where he had bought some stain or some such. He was trying
to balance the can on his head, but lacking enough hair, it
didn't stay too well and ended up on the floor. Boy that
stuff spreads fast!! He and one of the other customers
quickly mopped it up while Brian and I quickly got our food
and distanced ourselves from the scene. To redeem himself,
he did the 3Dfx support.
That's the kind of tech support personnel we have around
here! A little foolhardy, but we can program up a storm
when the need arises. I hope you all benefit from the fruits
of our labors.
BeOS on Intel Hardware
By Jean-Louis Gassée
Six months ago, we exited the hardware business. We loved
our BeBox very much, but we love to create opportunities for
BeOS developers even more. Our software running on Power Mac
compatibles was warmly received. As a result, BeOS
developers could see a much broader installed base than the
one provided by BeBoxen. Apple, Power Computing, Motorola
and Umax were very helpful in making this possible.
Once it became clear we were in the business of adding value
to popular hardware, the next logical step didn't require
much thought. We claimed and proved the portability of our
OS; porting it to Intel Architecture systems made sense. At
a time when our engineering resources were stretched by our
work on the major improvements in the Preview Release, Intel
helped us to get started by providing engineers who, for a
while, came to work in our cramped cubicles. Their
contribution to this important project is gratefully
acknowledged: We wouldn't be this far along without their
excellent work.
Now, such a move raises many new questions. I'll answer a
few today, leaving the rest to other columns -- or BeWeek
columnists.
First, does this put us more squarely in competition with
Microsoft? In other words, are we even crazier than
previously perceived?
Crazy, perhaps, but not suicidal. Actually, as one of our
co-founders, Steve Sakoman, remarks, one must be crazy to do
something original, as opposed to derivative. But not all
craziness is productive. What do we have to gain by
competing more directly with Microsoft?
Let's start by noting that, more and more, when you write a
line of C++ or Java code, you could be competing with
Microsoft, whose strategy could be summarized by one word:
Everything.
But universality has its drawbacks. Windows 95 is an
excellent general purpose desktop OS. Windows NT is a holy
terror in the enterprise market. Are we going to be
flattened by these two steamrollers? For us, the idea is to
exist to the left or right of them, not in their path. Put
another way, our focus is the digital media content creation
space.
The situation pits a dedicated tool, the BeOS, against
respected general-purpose platforms. Some developers and
users will prefer the benefits of specialization, others
will pick the general-purpose platform. Historically, this
leads to 75/25 or 80/20 splits.
Let's continue by noting many advanced PC users already run
more than one OS. Popular software tools called boot
managers provide for such coexistence: Windows NT offers its
own, and there are the extremely successful System Commander
and LILO, a very nice Linux utility. We're proud of our work,
and we see incredible potential in our OS, but the logical
consequence of specialization is coexistence with general
purpose products, as opposed to attempting to displace them
with an as-yet unproven OS such as ours.
Second, does this mean we are abandoning the PowerPC? Again,
no. Why should we do this at the very moment our OS could
run on most personal computers? As we have said in the past,
we are processor agnostic.
Agnostic, and hopeful. Comparing the performance of
Intel-based PCs and PowerPC systems, we see unrealized
potential in the PowerPC space. In the Intel market,
competitive forces have honed many parts of the system:
chipsets, buses, memory, disks, graphics accelerators...
As a result, with roughly equivalent Pentium and PowerPC
processors, system performance tends to be superior on the
Intel Architecture side. It's not always pretty, but
advances such as USB and FireWire are about to remove many
scars from the past -- and it is fast and inexpensive. A
system based on an Intel dual Pentium Pro motherboard, with
high-speed SCSI storage, Ethernet, sound, nice video, etc.,
can be had for about $2,500, monitor included.
On the other hand, until recently, the Mac market has been
deprived of the competitive forces that make hardware
subsystems more efficient and less expensive. This is where
we see an opportunity for the PowerPC. There is a chance the
much-awaited CHRP will finally become a reality. If it does,
an active Mac clone industry will finally actualize the
"power" in PowerPC.
I wrote above that we were processor agnostic and hopeful.
We aren't blind either. Apple is still struggling with its
licensing dilemma. Like everyone else, I've read the New
York Times story reporting the Apple Board's statement to
clone makers that they were, in essence, no longer wanted. I
hope the NYT was misinformed, but I'm struggling with the
knowledge that John Markoff, the reporter, is well connected
and very careful.
We'll see. In the meantime, we have to take care of our
business, which is to expand opportunities for BeOS
developers. That's what we are doing with the Intel
Architecture version.