Tuesday, 13 July 2010

Massive blobs CPC

It has been a long time since my last post here. Well, I have been coding more stuff than blogging about it. Although I do like teasers and previews (I am a demo consumer), what would be the point of posting a teaser for everything I am doing? Not to mention the time it would take. But once in a while I might throw something out.

In my last CPC demo, Chunky Chan, I displayed 5 big blobs (2D metaballs) of size 24*24, in a linear chunky mode of one byte per pixel (a "chunky pixel", as I think they call it: here a 4*4 dithered pixel block, although with scanlines removed because speed still sucks). Anyway, I wrote the code for this part in a very short time while traveling to Amstrad Expo, so it wasn't optimized; it was straightforward. Read the value from the blob bitmap, add it to the background, check for clamping (if the sum overflows above 255, keep it at 255, using a jump on the carry flag) and then write the result back to the background.
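To make the straightforward version concrete, here is a rough model of it in Python rather than Z80, so it can be run and checked. The function name, the row-major buffer layout and the parameters are my assumptions for illustration, not the demo's actual code; the logic is the one described above: read the blob value, add the background, clamp on overflow, write back.

```python
def add_blob_straightforward(background, bg_width, blob, blob_size, x, y):
    """Add a square blob bitmap onto a chunky buffer, clamping at 255.

    Hypothetical model: background is a bytearray with one byte per chunky
    pixel, stored row by row, bg_width bytes per row; blob is a flat
    sequence of blob_size*blob_size values; (x, y) is the upper-left pixel.
    """
    for by in range(blob_size):
        row = (y + by) * bg_width + x
        for bx in range(blob_size):
            # Read the blob value and add the background pixel.
            value = blob[by * blob_size + bx] + background[row + bx]
            if value > 255:
                # On the Z80 this is the carry out of the 8-bit add.
                value = 255
            # Write the (possibly clamped) result back.
            background[row + bx] = value
```

For example, adding the 2*2 blob `[100, 200, 0, 255]` at (1, 1) onto a 4*4 buffer filled with 100 gives 200, 255, 100 and 255 under the blob and leaves every other byte at 100.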

I already knew of a more optimized version with unrolled code for every pixel of the blob; I just didn't have time to write it back then, so I wrote the straightforward code. But now, just out of curiosity, I tried an experiment and actually wrote that unrolled code. So there is now a 7*7 blob with unrolled code for all 49 pixels. It receives in HL the address in the chunky buffer where the upper-left pixel should be written, and then works along like this:

For a regular blob value from 1 to 254, the code is:

LD A,nn ; the blob pixel value, loaded directly into A as an immediate
ADD A,(HL) ; add the background pixel at (HL) to A
JR NC,P11 ; no carry (no overflow): skip the clamp, go to P11
LD A,255 ; carry: clamp to 255
P11: LD (HL),A ; write the result, clamped or not, back to the background
INC L ; move to the next X position

As you can see, we don't need to read the blob value from a separate buffer, increase its pointer, etc. Since the code is unrolled for every blob pixel, it can carry the data directly inside the instructions. That gains many cycles and frees one more register (as if we would ever need it here :)
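Writing 49 copies of this sequence by hand is tedious, so unrolled code like this is usually generated. Here is a minimal sketch of such a generator in Python; the post does not say how the code was produced, so the `emit_row` helper and the `P<row>_<col>` label naming are hypothetical, and stepping HL to the next row is left out because it isn't described here.

```python
def emit_row(values, row):
    """Emit the general-case unrolled Z80 code for one row of blob values.

    Each pixel gets its own copy of the sequence with its blob value baked in
    as an immediate, plus a unique skip label P<row>_<col> (naming assumed).
    Moving HL to the next row is not generated here.
    """
    lines = []
    for col, value in enumerate(values):
        label = f"P{row}_{col}"
        lines += [
            f"    LD A,{value}",    # blob pixel value as an immediate operand
            "    ADD A,(HL)",       # add the background pixel
            f"    JR NC,{label}",   # no carry: the sum fits in 8 bits
            "    LD A,255",         # carry: clamp to 255
            f"{label}:",
            "    LD (HL),A",        # write the result back
            "    INC L",            # next X position
        ]
    return lines
```

Calling `print("\n".join(emit_row([12, 40, 12], 3)))` prints three copies of the sequence, each with its own value and label, ready to paste into the assembler source.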

Now, the nicer stuff. Since we write the unrolled code for every pixel of the blob, we know in advance which blob pixels are 0 and which are 255. So for those we write:

If the blob value is 0 (usually near the four corners of the blob bitmap):

INC L ; nothing to write, just move to the next X position

If the blob value is 255 (close to the center of the blob):

LD (HL),255 ; no comparison needed: adding 255 always gives exactly 255 or an overflow that clamps to 255
INC L ; move to the next X position
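A quick way to convince yourself these two shortcuts are safe is to check them against the clamped add for every possible background byte. A small Python sketch of that check, where `special_case_code` is a hypothetical helper returning the shortcut sequence for 0 or 255 and `None` when the general sequence is needed:

```python
def special_case_code(value):
    """Shortcut Z80 sequence for a blob pixel of 0 or 255, else None."""
    if value == 0:
        return ["    INC L"]                      # adding 0 changes nothing
    if value == 255:
        return ["    LD (HL),255", "    INC L"]   # result is always 255
    return None                                   # needs the general sequence


def clamped_add(background, blob_value):
    """What the general sequence computes: an 8-bit add clamped at 255."""
    return min(background + blob_value, 255)


def shortcuts_are_safe():
    """Check both shortcuts against the clamped add for every background byte."""
    return all(
        clamped_add(bg, 0) == bg and clamped_add(bg, 255) == 255
        for bg in range(256)
    )
```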

So, imagine a big blob, the size I had in my demo or larger. A large share of its pixels would be 255 and some would be 0. These two cases take 1 and 4 cycles respectively (counted in NOPs, i.e. microseconds, as usual on the CPC). In my actual code the 255 case takes 3: I put an LD B,255 at the very beginning, before rendering each blob, and then use LD (HL),B in every 255 case. I also replace LD A,255 in the clamping branch with LD A,B to gain 1 cycle. I just left these out here to make the code easier to read. The general-case sequence takes 10 cycles per pixel (with LD A,B). On average, a blob could drop to 3 to 5 cycles per pixel, depending on how much of it is white or black. With the 7*7 blob there aren't many such pixels, though (just 13 out of 49).
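Using the per-pixel costs above (1 for a 0 pixel, 3 for a 255 pixel with LD (HL),B, 10 for the general case), a rough cost estimate for a whole blob is easy to compute. This Python sketch ignores row stepping and per-blob setup such as the LD B,255, and the `blob_cost` name is hypothetical:

```python
# Per-pixel cost in NOPs (CPC cycles), as given in the post.
COST_ZERO = 1        # INC L
COST_FULL = 3        # LD (HL),B ; INC L
COST_GENERAL = 10    # LD A,n ; ADD A,(HL) ; JR NC ; [LD A,B] ; LD (HL),A ; INC L


def blob_cost(blob):
    """Return (total NOPs, average NOPs per pixel) for a flat list of blob values.

    Row stepping and per-blob setup (e.g. LD B,255) are not included.
    """
    total = 0
    for value in blob:
        if value == 0:
            total += COST_ZERO
        elif value == 255:
            total += COST_FULL
        else:
            total += COST_GENERAL
    return total, total / len(blob)
```

For the 7*7 blob, where 36 of the 49 pixels still need the general sequence, this gives between 373 and 399 NOPs depending on how the 13 special pixels split between 0 and 255, against 490 if every pixel used the general sequence.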

I wrote this code to test something that I think will look very impressive. At the moment I just want to render a lot of blobs (96 in the picture) with this new engine and stare at the result. But what about using them to render particles, a small, slowly rotating galaxy and other fancy stuff? Moving them around? Drawing the 64*36-tile chunky buffer to the screen now takes 2 VBLs (I wish it could be faster), and I can render 38 of the 7*7 blobs in one VBL; I will spend a bit more time moving them around or making them explode. It won't come in under 3 or 4 VBLs, but I think that will be enough (for those who don't know the CPC terminology: divide 50 fps by the number of VBLs per frame to get the frame rate :).
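The frame-rate arithmetic can be sketched in Python like this, assuming blob cost scales linearly from the 38-per-VBL figure and that each frame is rounded up to a whole number of VBLs. The helper names and the `extra_vbls` allowance for movement code are hypothetical:

```python
import math

FRAME_HZ = 50        # CPC frame (VBL) rate
CHUNKY_VBLS = 2      # 64*36 chunky buffer to screen, per the post
BLOBS_PER_VBL = 38   # 7*7 blobs rendered in one VBL, per the post


def vbls_per_frame(blob_count, extra_vbls=0.0):
    """Whole VBLs per frame, assuming linear blob cost and waiting for the next VBL.

    extra_vbls is a hypothetical allowance for moving/exploding the blobs.
    """
    return math.ceil(CHUNKY_VBLS + blob_count / BLOBS_PER_VBL + extra_vbls)


def frame_rate(vbls):
    """Frames per second: 50 divided by VBLs per frame."""
    return FRAME_HZ / vbls
```

Under these assumptions, 32 blobs fit in 3 VBLs (about 16.7 fps), while 96 blobs need 5 VBLs (10 fps), before any time is spent moving them.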

And maybe I don't need to render 96 or 200: an explosion with 32 will look cool enough, or a comet moving around with, say, 50. Of course I want full particles: velocity, acceleration, life fade. I also have unrolled code for smaller blobs (5*5 and 3*3 at the moment) which also darken their color, just to fade out the particles as they die. I have done particle animation on 8-bit machines before with a few sprites or dots, using 8.8 fixed point in 16-bit registers and ADD HL,DE/BC, so moving a small number of them won't take many cycles.
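For the particle side, here is a Python sketch of 8.8 fixed-point motion in 16-bit values, where each add with wraparound plays the role of ADD HL,DE. The `Particle` class, the `life` counter and the thresholds for switching from 7*7 to 5*5 to 3*3 blobs are assumptions for illustration:

```python
MASK16 = 0xFFFF  # 16-bit register width


def to_fixed(value):
    """Convert a number to 8.8 fixed point, wrapped to 16 bits (two's complement)."""
    return int(round(value * 256)) & MASK16


class Particle:
    def __init__(self, x, y, vx, vy, ax=0.0, ay=0.0, life=48):
        self.x, self.y = to_fixed(x), to_fixed(y)
        self.vx, self.vy = to_fixed(vx), to_fixed(vy)
        self.ax, self.ay = to_fixed(ax), to_fixed(ay)
        self.life = life  # hypothetical frame counter

    def step(self):
        # Velocity first, then position: each is one 16-bit add with
        # wraparound, like ADD HL,DE on the Z80.
        self.vx = (self.vx + self.ax) & MASK16
        self.vy = (self.vy + self.ay) & MASK16
        self.x = (self.x + self.vx) & MASK16
        self.y = (self.y + self.vy) & MASK16
        self.life -= 1

    def pixel(self):
        # The high byte (H in HL) is the integer chunky-buffer coordinate.
        return self.x >> 8, self.y >> 8

    def blob_size(self):
        # Hypothetical life fade: smaller, darker blobs as the particle dies.
        if self.life > 32:
            return 7
        if self.life > 16:
            return 5
        return 3
```

Negative velocities need no special handling: -0.5 becomes 0xFF80, and the wrapping add subtracts half a pixel, just as the Z80 would.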

Anyway, cool! I've got to finish this little code experiment. I would really love to see it working and, who knows, put it in a demo one day. It would hopefully look great on the CPC!