{"id":38,"date":"2015-01-17T16:06:58","date_gmt":"2015-01-17T16:06:58","guid":{"rendered":"http:\/\/blog.nothinguntoward.eu\/?p=38"},"modified":"2015-01-17T18:51:07","modified_gmt":"2015-01-17T18:51:07","slug":"an-88-byte-bare-metal-mandelbrot-generator","status":"publish","type":"post","link":"https:\/\/nothinguntoward.eu\/?p=38","title":{"rendered":"An 88 Byte Bare-Metal Mandelbrot Generator"},"content":{"rendered":"<p>A while back, I was playing around with a bit of 8086 assembly. I&#8217;m not much of a programmer, but I&#8217;ve done some C{,++}, Python, PHP (sorry) and bits of a few others in the past. All of these are quite high level though, and I felt like going back to basics a bit. I chose to start with MS-DOS, since I thought (perhaps incorrectly?) that it would be easier to program. And I figured it would probably be a nice bit of computing history to learn anyway.<br \/>\nI soon moved on from MS-DOS and decided to set myself a programming challenge of a bare-metal Mandelbrot generator. In this post, I&#8217;ll give a bit of background about the Mandelbrot set, and explain my code.<\/p>\n<h2>MS-DOS<\/h2>\n<p>I started off with this <a title=\"MS-DOS assembly tutorial\" href=\"http:\/\/www.skynet.ie\/~darkstar\/assembler\/\" target=\"_blank\">MS-DOS assembly tutorial<\/a>. It seems to be broadly unchanged since it was first crawled by the <a href=\"https:\/\/web.archive.org\/web\/20021214132803\/http:\/\/www.skynet.ie\/~darkstar\/assembler\/\" target=\"_blank\">Wayback Machine<\/a> in 2002. Nice. I chose to use <a href=\"http:\/\/www.nasm.us\/\" target=\"_blank\">NASM<\/a> instead of <a href=\"https:\/\/en.wikipedia.org\/wiki\/A86_%28software%29\" target=\"_blank\">A86<\/a> but otherwise the tutorial&#8217;s pretty good. If you&#8217;re lazy, here&#8217;s a <a href=\"http:\/\/blog.nothinguntoward.eu\/wp-content\/uploads\/2015\/01\/msdos.ova\">virtualbox image<\/a>, with MSDOS 6.22 and NASM installed. My code is all in C:\\PROGRAMM, in case anyone wants to use it as a reference.<\/p>\n<p>This was fun for a while, but eventually I got bored of DOS programming and decided to try some bare metal stuff &#8211; and somehow, perhaps vaguely influenced by <a href=\"https:\/\/en.wikipedia.org\/wiki\/Demoscene\" target=\"_blank\">demoscene<\/a> style demos, I ended up setting myself the challenge of writing a bare-metal Mandelbrot generator within a 512-byte bootsector.<\/p>\n<h2>The Mandelbrot Set<\/h2>\n<p><img decoding=\"async\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/2\/21\/Mandel_zoom_00_mandelbrot_set.jpg\/320px-Mandel_zoom_00_mandelbrot_set.jpg\" style=\"float:right\" \/><br \/>\nThe <a href=\"https:\/\/en.wikipedia.org\/wiki\/Mandelbrot_set\" target=\"_blank\">Mandelbrot Set<\/a> is the set of points, c, on the complex plane such that the sequence z<sub>n+1<\/sub>=z<sub>n<\/sub><sup>2<\/sup>+c (where z<sub>0<\/sub>=0) remains bounded. To clarify, here&#8217;s an example:<\/p>\n<ul>\n<li>Let&#8217;s check if the point (-0.5+0.5j) is in the set. Set c=(-0.5+0.5j)<\/li>\n<li>Now let&#8217;s perform the  first iteration. (-0.5+0.5j)\u00b2+(-0.5+0.5j)=-0.5<\/li>\n<li>And another iteration: (-0.5)\u00b2+(-0.5+0.5j)=-0.25+0.5j<\/li>\n<li>And another: (0.25+0.5j)\u00b2+(-0.5+0.5j)=-0.6875+0.25j<\/li>\n<li>We can keep on iterating again and again like this, and we find that the numbers stay small. Eventually, we might conclude that the sequence is bounded, and the point is therefore in the set.<\/li>\n<\/ul>\n<p>In fact, if we plot each iteration, we can see that it&#8217;s going around in a strange, almost spirograph-like shape:<br \/>\n<a href=\"http:\/\/blog.nothinguntoward.eu\/wp-content\/uploads\/2015\/01\/mandelbrotiter.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.nothinguntoward.eu\/wp-content\/uploads\/2015\/01\/mandelbrotiter-300x225.png\" title=\"z1 through to z100\" width=\"300\" height=\"225\" class=\"aligncenter size-medium wp-image-47\" srcset=\"https:\/\/nothinguntoward.eu\/wp-content\/uploads\/2015\/01\/mandelbrotiter-300x225.png 300w, https:\/\/nothinguntoward.eu\/wp-content\/uploads\/2015\/01\/mandelbrotiter-1024x768.png 1024w, https:\/\/nothinguntoward.eu\/wp-content\/uploads\/2015\/01\/mandelbrotiter.png 1200w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><br \/>\nBut overall, the general trend seems to be going around in a closed circle, rather than shooting off to infinity, so it would probably be fairly safe to say here that the sequence is bounded. If you&#8217;re not convinced, here&#8217;s a plot of the magnitude at each iteration:<br \/>\n<a href=\"http:\/\/blog.nothinguntoward.eu\/wp-content\/uploads\/2015\/01\/mandelbrotiter-mag.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.nothinguntoward.eu\/wp-content\/uploads\/2015\/01\/mandelbrotiter-mag-300x225.png\" title=\"all of the indicies are out by 1 here. whoops\" width=\"300\" height=\"225\" class=\"aligncenter size-medium wp-image-49\" srcset=\"https:\/\/nothinguntoward.eu\/wp-content\/uploads\/2015\/01\/mandelbrotiter-mag-300x225.png 300w, https:\/\/nothinguntoward.eu\/wp-content\/uploads\/2015\/01\/mandelbrotiter-mag-1024x768.png 1024w, https:\/\/nothinguntoward.eu\/wp-content\/uploads\/2015\/01\/mandelbrotiter-mag.png 1200w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>To take another example, let&#8217;s try it with the point (0.5,0.5). The iterations go:<\/p>\n<ul>\n<li>0.5+1j<\/li>\n<li>-0.25+1.5j<\/li>\n<li>-1.69-0.25j<\/li>\n<li>3.28+1.34j<\/li>\n<li>&#8230;<\/li>\n<li>990294279-77569977j<\/li>\n<p>So, this looks like it&#8217;s shooting off to infinity&#8230; and fast.<\/p>\n<p>But this raises an interesting question: at what point do we decide whether the sequence is or isn&#8217;t bounded? In general we&#8217;d have to continue iterating to infinity to know for sure. Perhaps we should set a threshold, beyond which we declare it to be unbounded? But where should it be? And how do we know that sequences which exceed that threshold aren&#8217;t coming back?<\/p>\n<p>Well, as it turns out, it can be <a href=\"http:\/\/www.softlab.ntua.gr\/Miscellaneous\/faq\/fractal\/faq-doc-6.html\">proven<\/a> (see A6d) that if any point leaves a circle of radius 2, (ie |z<sub>n<\/sub>|>2), it will never return. So a reasonable approach is to compute a fixed number of iterations, and then check whether the result is less than 2 in absolute value.<\/p>\n<p>Here are two simple implementations I wrote just to demonstrate the idea. First, this is for the C++-minded people:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">#include &lt;iostream&gt;\r\n#include &lt;complex&gt;\r\nusing namespace std;\r\nint main(){\r\n\tcout &lt;&lt; &quot;P1 800 800 &quot;;\r\n\tfor(int i=0;i&lt;800;i++){\r\n\t\tfor(int j=0;j&lt;800;j++){\r\n\t\t\tcomplex&lt;float&gt; c((j-400)\/200.0,(i-400)\/200.0); \/\/scale the indicies to floats in the range -2..2\r\n\t\t\tcomplex&lt;float&gt; z(0,0); \/\/ initialise z to 0\r\n\t\t\tfor (int iter=0;iter&lt;64;iter++) \/\/iterate 64 times\r\n\t\t\t\tz=z*z+c;\r\n\t\t\tcout &lt;&lt; (abs(z)&lt;2)?&quot;1 &quot;:&quot;0 &quot;; \/\/output a black pixel if in the set, white otherwise\r\n\t\t\t}\r\n\t\t}\r\n\tcout&lt;&lt;endl;\r\n\treturn 0;\r\n\t}<\/pre>\n<p>This goes through each pixel one-by-one and for each one runs 64 iterations. The output is a PBM file to stdout, so you&#8217;ll have to pipe it and open it in a compatible viewer (GIMP works) or convert it to a more common format.<\/p>\n<p>Or, if you&#8217;re more of a MATLAB fan:<\/p>\n<pre class=\"brush: matlabkey; title: ; notranslate\" title=\"\">c=ones(800,1)*linspace(-2,2,800);\r\nc=complex(c,c');\r\nz=zeros(800,800);\r\nfor i=1:50\r\n\tz=z.^2+c;\r\nend\r\nimshow(abs(z)&lt;2)<\/pre>\n<p>The same idea as the C++ example, but using vectorisation rather than iterating through pixels, and using MATLAB&#8217;s internal imshow command to display the image.<\/p>\n<h2>The Bare-Metal Version<\/h2>\n<p>With all that setting the scene done, let&#8217;s get down to how I wrote an 88-byte 8086 bare-metal mandelbrot generator.<\/p>\n<h3>A note on arithmetic<\/h3>\n<p>The general structure of the C++ example can be carried over, but the arithmetic gets difficult. Firstly, the complex number datatype in C++ is an abstraction. The processor doesn&#8217;t understand complex numbers (not an 8086 at least). So we&#8217;ll have to treat them as two reals. Addition and subtraction are trivial. For multiplication we just have to remember that:<br \/>\n<img src='https:\/\/s0.wp.com\/latex.php?latex=%28a%2Bbj%29%28c%2Bdj%29%3D%28ac-bd%29%2B%28ad%2Bbc%29j&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(a+bj)(c+dj)=(ac-bd)+(ad+bc)j' title='(a+bj)(c+dj)=(ac-bd)+(ad+bc)j' class='latex' \/><br \/>\nand specifically:<br \/>\n<img src='https:\/\/s0.wp.com\/latex.php?latex=%28a%2Bbj%29%5E2%3D%28a%5E2-b%5E2%29%2B2abj&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(a+bj)^2=(a^2-b^2)+2abj' title='(a+bj)^2=(a^2-b^2)+2abj' class='latex' \/><\/p>\n<p>This is conceptually not too difficult, but it does mean that complex operations take up a lot of program space.<\/p>\n<p>The second thing to consider is that floating point operations don&#8217;t exist in 16-bit intel. We&#8217;re going to have to work in integer instructions instead. With a little bit of trickery though, we can create a very simple fixed-point system. Essentially, we just represent every number as a fixed multiple of its true value. Say, this could be 10. Addition (and by extension, subtraction) is consistent:<br \/>\n<img src='https:\/\/s0.wp.com\/latex.php?latex=%2810a%29%2B%2810b%29%3D10%28a%2Bb%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(10a)+(10b)=10(a+b)' title='(10a)+(10b)=10(a+b)' class='latex' \/><\/p>\n<p>Multiplication works too, with the caveat that we have to divide by the scale factor after:<br \/>\n<img src='https:\/\/s0.wp.com\/latex.php?latex=%5Cfrac%7B%2810a%29%2810b%29%7D%7B10%7D%3D10ab&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\\frac{(10a)(10b)}{10}=10ab' title='\\frac{(10a)(10b)}{10}=10ab' class='latex' \/><\/p>\n<p>It&#8217;s for this reason that it&#8217;s best for that multiple to be a power of two: this reduces the division to a right shift.<br \/>\nFor example, here is how we might multiply two numbers with a scaling of 1\/32:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">mov al,73 ; move the number 2.28 (73\/32) into al\r\nmov bl,-193 ;move -6.03 (-193\/32) into bl\r\nimul bl ; multiply bl (implicitly, with al). stores result in ax.\r\nsar ax,5 ; shift right by 5 places to divide by 2^5=32<\/pre>\n<p>Note the use of sar rather than shr to preserve the sign.<\/p>\n<h3>The code<\/h3>\n<p>So, without further ado, let&#8217;s start deconstructing my code, a section at a time:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">&#x5B;bits 16]\r\n&#x5B;org 7c00h]\r\nmov ax,0fh\r\nint 10h<\/pre>\n<p>First a bit of initialisation &#8211; instructions to the assembler that the code should be 16-bit, and will be loaded into memory at location 0x7c00.<br \/>\nThen, we call <a href=\"https:\/\/en.wikipedia.org\/wiki\/INT_10H\" target=\"_blank\">BIOS interrupt 10h<\/a> with AH=0 (change video mode) and AL=0fh (<a href=\"http:\/\/www.columbia.edu\/~em36\/wpdos\/videomodes.txt\" target=\"_blank\">640&#215;350 mono<\/a> resolution).<\/p>\n<p>This video mode will fit a nice big 512&#215;256 canvas on the screen.<\/p>\n<p>Total assembled size so far: 5 bytes<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nmov dx,-128 ;start on the first line\r\nyloop:\r\nmov di,-256 ;go to the beginning of the line\r\nxloop:\r\n;set z=c initially - start on iteration 1\r\nmov si,di\r\nmov bp,dx\r\nxor ax,ax ; set colour to black (al=0) and iteration counter to zero (ah=0)<\/pre>\n<p>So, here we initialise the two variables for our c value. DX will contain the y value and DI the x value. I&#8217;m starting at y=-1 and x=-2, using a scaling factor of 1\/128. My canvas is to be y from -1 to 1 and x from -2 to 2. The z variable is stored with its real part in SI  and its imaginary part in BP. We skip out the zeroth iteration here by initialising z=c.<\/p>\n<p>Later, the interrupt to write a pixel to the screen will read the pixel value from AL. This wants to be initialised to zero (black). If the pixel is in the set, this will be changed to one (white) later. Otherwise it will be unchanged.<br \/>\nAt the same time, I want to set an iteration counter to zero. Sneakily, I&#8217;ve chosen to use the other byte of AX for this, so that I can set both the pixel value and the iteration counter to zero in a single instruction. This single xor saves two bytes vs two 8-bit literal moves.<\/p>\n<p>This block: 12 bytes, So far: 17 bytes.<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">iterate:\r\n;calculate Im(z^2)=2xy\r\nmov cx,bp\r\nimul cx,si\r\njo overflow\r\nsar cx,6 ;maybe mov 6 into a register first?\r\n;cx contains Im(z^2)<\/pre>\n<p>At each iteration step we need to calculate <img src='https:\/\/s0.wp.com\/latex.php?latex=z_%7Bn%2B1%7D%3Dz_n%5E2%2Bc&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='z_{n+1}=z_n^2+c' title='z_{n+1}=z_n^2+c' class='latex' \/>. To do this we need to calculate the real and imaginary parts of <img src='https:\/\/s0.wp.com\/latex.php?latex=z_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='z_n' title='z_n' class='latex' \/>, then add these to the real and imaginary parts of c.<\/p>\n<p>As the comment suggests, here we&#8217;re calculating <img src='https:\/\/s0.wp.com\/latex.php?latex=Im%28z_n%5E2%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Im(z_n^2)' title='Im(z_n^2)' class='latex' \/>, which as I showed previously is <img src='https:\/\/s0.wp.com\/latex.php?latex=2Re%28z_n%29Im%28z_n%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='2Re(z_n)Im(z_n)' title='2Re(z_n)Im(z_n)' class='latex' \/>. We copy <img src='https:\/\/s0.wp.com\/latex.php?latex=Im%28z_n%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Im(z_n)' title='Im(z_n)' class='latex' \/> into CX, since we&#8217;ll need it again later and we don&#8217;t want to clobber it. Then we multiply this by SI (<img src='https:\/\/s0.wp.com\/latex.php?latex=Re%28z_n%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Re(z_n)' title='Re(z_n)' class='latex' \/>) and store the result in CX. The value in this register should now be <img src='https:\/\/s0.wp.com\/latex.php?latex=128Re%28z_n%29Im%28z_n%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='128Re(z_n)Im(z_n)' title='128Re(z_n)Im(z_n)' class='latex' \/> (note what I said about fixed-point multiplication earlier). Ignoring the conditional jump for now, we would then want to divide by 128 to get <img src='https:\/\/s0.wp.com\/latex.php?latex=Re%28z_n%29Im%28z_n%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Re(z_n)Im(z_n)' title='Re(z_n)Im(z_n)' class='latex' \/>. Except what we want is actually <img src='https:\/\/s0.wp.com\/latex.php?latex=2Re%28z_n%29Im%28z_n%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='2Re(z_n)Im(z_n)' title='2Re(z_n)Im(z_n)' class='latex' \/>, so we can save ourselves a multiplication by just dividing by 64 instead (right shift by 6).<\/p>\n<p>Now, back to the jump. This is how we catch the sequences which go off to infinity &#8211; we&#8217;ll see where it goes later. In the C++ example, we used abs(z) to check whether we were inside an r=2 circle. But calculating an abs value is expensive. It requires us to square two numbers and sum them. But consider this: any point which leaves the r=2 circle will eventually go off to infinity. If I wanted to I could use an r=3 circle, or an r=1,000,000 circle &#8211; it would just potentially take more iterations. Any region which completely contains the r=2 circle will do the job. So how about a rectangle?<\/p>\n<p>My jump is conditional on <img src='https:\/\/s0.wp.com\/latex.php?latex=128Re%28z_n%29Im%28z_n%29%3D64Im%28z_n%5E2%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='128Re(z_n)Im(z_n)=64Im(z_n^2)' title='128Re(z_n)Im(z_n)=64Im(z_n^2)' class='latex' \/> being greater than 32768\/128 in magnitude (the signed 16-bit max, after scaling). That is, <img src='https:\/\/s0.wp.com\/latex.php?latex=%7CIm%28z_n%5E2%29%7C%3E8&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='|Im(z_n^2)|&gt;8' title='|Im(z_n^2)|&gt;8' class='latex' \/><\/p>\n<p>So, provided that |Im(c)|<6 - which it always will be since we're only working with y in the range of -1 to 1 - this will contain the whole of the r=2 circle. Perfect.\n\nThis block: 10 bytes, So far: 27 bytes\n\n[code];calculate Re(z^2)=x^2-y^2\nmov bx,si ;we will work in the BX register for now\nadd bx,bp ;bx contains x+y\nsub si,bp ;si contains x-y\nimul si,bx\njo overflow\nsar si,7 ;si contains Re(z^2)[\/code]\n\n<a href=\"http:\/\/blog.nothinguntoward.eu\/wp-content\/uploads\/2015\/01\/areas.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.nothinguntoward.eu\/wp-content\/uploads\/2015\/01\/areas-180x300.png\" alt=\"areas\" width=\"180\" height=\"300\" class=\"aligncenter size-medium wp-image-77\" style=\"float:right\" srcset=\"https:\/\/nothinguntoward.eu\/wp-content\/uploads\/2015\/01\/areas-180x300.png 180w, https:\/\/nothinguntoward.eu\/wp-content\/uploads\/2015\/01\/areas.png 393w\" sizes=\"auto, (max-width: 180px) 100vw, 180px\" \/><\/a>And now to calculate <img src='https:\/\/s0.wp.com\/latex.php?latex=Re%28z_n%5E2%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Re(z_n^2)' title='Re(z_n^2)' class='latex' \/>. As noted previously, this is given by <img src='https:\/\/s0.wp.com\/latex.php?latex=Re%28z_n%29%5E2-Im%28z_n%29%5E2&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Re(z_n)^2-Im(z_n)^2' title='Re(z_n)^2-Im(z_n)^2' class='latex' \/>. Earlier versions of my code calculated exactly this &#8211; multiplying each part by itself, then subtracting one from the other. Then I realised I was missing a trick here &#8211; I could factorise <img src='https:\/\/s0.wp.com\/latex.php?latex=a%5E2-b%5E2%3D%28a%2Bb%29%28a-b%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='a^2-b^2=(a+b)(a-b)' title='a^2-b^2=(a+b)(a-b)' class='latex' \/>, thus gaining an ADD (+2 bytes), but losing an IMUL (-3 bytes). A saving of 1 byte &#8211; score!<\/p>\n<p>After we&#8217;ve calculated <img src='https:\/\/s0.wp.com\/latex.php?latex=Re%28z_n%29%2BIm%28z_n%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Re(z_n)+Im(z_n)' title='Re(z_n)+Im(z_n)' class='latex' \/> with the add bx,bp, we no longer need <img src='https:\/\/s0.wp.com\/latex.php?latex=Re%28z_n%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Re(z_n)' title='Re(z_n)' class='latex' \/> (SI), so we save ourselves an MOV by clobbering it in the SUB instruction.<\/p>\n<p>Again, we check for overflow and SAR 7 to take out the factor of 128. Here, the overflow checks if <img src='https:\/\/s0.wp.com\/latex.php?latex=%7C128Re%28z_n%5E2%29%7C%3E%5Cfrac%7B32768%7D%7B128%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='|128Re(z_n^2)|&gt;\\frac{32768}{128}' title='|128Re(z_n^2)|&gt;\\frac{32768}{128}' class='latex' \/>, ie <img src='https:\/\/s0.wp.com\/latex.php?latex=%7CRe%28z_n%5E2%29%7C%3E4&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='|Re(z_n^2)|&gt;4' title='|Re(z_n^2)|&gt;4' class='latex' \/>. Over the range of |Re(c)|<2 - exactly the space we're working in - this completely contains the r=2 circle. In fact, at the far left and right edges of the canvas, they just touch. An illustration is shown to the right: including the $latex abs(z^2)<2$ criterion in grey and the rectangular criterion we're using here in blue. The rectangle moves around relative to the axes depending on the point being calculated.\n\nThis block: 14 bytes, So far: 41 bytes\n\n[code];calculate z'=z^2+c\nadd si,di\nadd cx,dx\nmov bp,cx\n\n;do another iteration\ninc ah\njno iterate[\/code]\n\nHere are the final steps to calculate $latex z_{n+1}$. We then increment the iteration counter, AH, and unless it overflows (passes 255), we go round again. Whee! 255 iterations is thoroughly excessive, but it allows us to just check overflow, rather than doing a CMP with an immediate value, which would cost 3 bytes.\n\nThis block: 10 bytes, So far: 51 bytes\n\n[code];iterations are over \ninc al ; if we've gotten all this way, set the colour to white \noverflow:[\/code]\n\nIf we get here just by the natural progression of the program, that will be because we went through 255 iterations without overflowing. Therefore the pixel is in the set, so we set it to white - a value of 1. Since I know its value before is zero, I can use an INC (2 bytes) instead of a MOV immediate (3 bytes).\n\nIf we get here via one of the jump on overflows, we skip out of this instruction and the pixel remains black.\n\nThis block: 2 bytes, So far: 53 bytes\n\n[code];now write a pixel\nmov ah,0ch ; write pixel interrupt\nxor bh,bh \nmov cx,di\nadd cx,320\nadd dx,175\nint 10h\nsub dx,175[\/code]\nHere we use another INT 10h interrupt to write the pixel to the screen. We set AH=0Ch for the \"write a pixel\" command, set BH=0 for page zero (using XOR- no size advantage over MOV immediate here, but faster), and stick the coordinates in CX and DX (in fact, I chose to use DX to store the y coordinate anyway, to save a MOV instruction here). Since our coordinate system was centred around zero, we add 320 and 175 to the x and y coordinates respectively, to centre the image on the screen. After the interrupt, we have to take the 175 back off DX, since it serves as our loop counter so we can't clobber it.\n\nAll of these ADD and SUB immediates are VERY expensive instructions - each one takes 4 bytes. I haven't thought of a way to optimise this though.\n\nThis block: 20 bytes, So far: 73 bytes\n\n[code];loop around, do the next pixel\ninc di\ncmp di,255\njne xloop\n;or if we've gotten to here, draw the next row\ninc dx\ncmp dx,127\njne yloop[\/code]\nAnd some loop logic. We go onto the next pixel in the line (inc DI), and loop around unless the pixel value's 255 (end of the line). This CMP literal is expensive (4 bytes) but not easily avoided - since it's a signed type (and we're only using the lower 9 bits anyway), we can't just test for overflow. Outside of this, we have the same structure to loop through lines.\n\nThis block: 13 bytes, So far: 86 bytes\n[code]cli\nhlt\ndb $-$$ ;tell us the length of the code \n\ntimes 510 - ($ - $$) db 0\ndw 0xAA55[\/code]\nAnd finally a little bit of boilerplate: clear BIOS interrupts and halt. The db $-$$ line is a little hack to make it easier to see the size of the program - it just inserts a byte of value (this memory location-memory location of program start). Then I just look at the code in xxd and the last byte before the long run of zeros is the size of the program.\n\nIt needs to be padded up to 512 bytes (the size of a boot sector), with the magic word AA55h at the end to indicate that it's bootable.\n\nAnd so that's a final 2 bytes of executable code (I'm not counting the padding, of course), bringing us up to 88 bytes!\n\nThe complete code is here: <a href=\"http:\/\/blog.nothinguntoward.eu\/wp-content\/uploads\/2015\/01\/mandelbrot-bootsector.asm\">mandelbrot-bootsector.asm<\/a><\/p>\n<h3>The result<\/h3>\n<p>The code is compiled with nasm:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\"> $ nasm -o mandelbrot mandelbrot-bootsector.asm<\/pre>\n<p>And run in qemu:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\"> $ qemu-system-i386 -video cirrus -fda mandelbrot -net none<\/pre>\n<p>Here&#8217;s a screenshot and a video of it running:<br \/>\n<a href=\"http:\/\/blog.nothinguntoward.eu\/wp-content\/uploads\/2015\/01\/mandelbrotscreenshot.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.nothinguntoward.eu\/wp-content\/uploads\/2015\/01\/mandelbrotscreenshot-300x176.png\" alt=\"mandelbrotscreenshot\" width=\"300\" height=\"176\" class=\"aligncenter size-medium wp-image-74\" srcset=\"https:\/\/nothinguntoward.eu\/wp-content\/uploads\/2015\/01\/mandelbrotscreenshot-300x176.png 300w, https:\/\/nothinguntoward.eu\/wp-content\/uploads\/2015\/01\/mandelbrotscreenshot.png 642w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><br \/>\n<iframe loading=\"lazy\" width=\"420\" height=\"315\" src=\"\/\/www.youtube.com\/embed\/hJiaqFE5Wao\" frameborder=\"0\" allowfullscreen><\/iframe><\/p>\n<h3>Possible Optimisations<\/h3>\n<p>I&#8217;ve no doubt this code could be made smaller by someone more skilled than I am. I can see a few areas for potential optimisation, but can&#8217;t think how to implement them without breaking other things. For example, when writing the pixels to the screen, there are 3 ADD\/SUB immediate instructions. These take up 4 bytes each &#8211; if I could have the immediate values in a register instead, that would save two bytes per instruction. But alas, I&#8217;m already scraping the bottom of the barrel for registers (I tried using SP, but since BIOS interrupts apparently use the stack, that breaks things).<\/p>\n<p>Similarly, the CMP literals for the loop logic are pretty expensive. Avoiding literals is always desirable.<\/p>\n<p>Perhaps it&#8217;s possible to save operations by foregoing the BIOS interrupt to write pixels and just write to video memory instead? I haven&#8217;t looked into this at all.<\/p>\n<h3>Dissolve Animation<\/h3>\n<p>I made another version of my code which makes the set &#8220;dissolve&#8221; in. Essentially it uses an LFSR to cycle through pixel values, rather than a loop. I think it gives a pretty cool visual effect. I was going to write more on it, but this post is getting long enough already, so I&#8217;ll just leave the code and a video here if anyone&#8217;s interested. It&#8217;s not as heavily optimised as the code above.<br \/>\n<a href=\"http:\/\/blog.nothinguntoward.eu\/wp-content\/uploads\/2015\/01\/mandelbrot2-ver4.asm\">mandelbrot2-ver4.asm<\/a><br \/>\n<iframe loading=\"lazy\" width=\"420\" height=\"315\" src=\"\/\/www.youtube.com\/embed\/YqEX7v8hW2E\" frameborder=\"0\" allowfullscreen><\/iframe><\/p>\n<h2>Conclusion<\/h2>\n<p>This has turned out to be a very long blog post (my word count is saying over 2,700 words) but hey, I guess I had a lot to say. I had great fun learning assembly with this programming challenge, and I hope you&#8217;ve found my explanation interesting interesting too.  If you can optimise it any more, I&#8217;d be interested to see your results, so do let me know in the comments! <\/p>\n","protected":false},"excerpt":{"rendered":"<p>A while back, I was playing around with a bit of 8086 assembly. I&#8217;m not much of a programmer, but I&#8217;ve done some C{,++}, Python, PHP (sorry) and bits of a few others in the past. All of these are quite high level though, and I felt like going back to basics a bit. I [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-38","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/nothinguntoward.eu\/index.php?rest_route=\/wp\/v2\/posts\/38","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nothinguntoward.eu\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nothinguntoward.eu\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nothinguntoward.eu\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nothinguntoward.eu\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=38"}],"version-history":[{"count":39,"href":"https:\/\/nothinguntoward.eu\/index.php?rest_route=\/wp\/v2\/posts\/38\/revisions"}],"predecessor-version":[{"id":84,"href":"https:\/\/nothinguntoward.eu\/index.php?rest_route=\/wp\/v2\/posts\/38\/revisions\/84"}],"wp:attachment":[{"href":"https:\/\/nothinguntoward.eu\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=38"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nothinguntoward.eu\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=38"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nothinguntoward.eu\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=38"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}