[ros-dev] [ros-diffs] [tkreuzer] 42353: asm version of DIB_32BPP_ColorFill: - Add frame pointer - Get rid of algin_draw, 32bpp surfaces must be DWORD aligned - Optimize the loop - Add comments
On 4-Aug-09, at 12:36 PM, Timo Kreuzer wrote:
> Alex Ionescu wrote:
>> I will provide some code and timings but I love how you ignored my
>> main points:
>
> And you ignored my main points:
> 1) The optimization "around" the function is not important, as the
> function is not called that often, the loop is much more important.
If the function is not called often, then you using ASM as an
optimization is what we call "premature optimization".
You should spend your time profiling the codebase and identifying real
bottlenecks.
> 2) It doesn't matter if the function performs differently on
> different machines as long as it's always faster than the portable
> code.
Which is an assumption you're making. My point is that it won't be.
> 3) No one is forced to write optimized versions, we have a C version
> for all other architectures.
So now we have a huge performance delta (if it were to exist) between
different architectures, as well as two code bases (and possibly more,
as people write more ASM versions), and eventually there are 10
versions of the same function, with different bugs.
Great!
(Please don't write code in a real company's product, kthx).
> 4) I'm not worried about the loop, the loop is fine the way I wrote
> it ;-)
I don't get it? You just claimed the loop is "90%", and that yours is
better because it's in ASM and uses REP MOVSD. So take the C version,
and make an inline REP MOVSD instead of a memset, and you now have
your exact code, but written in C.
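To make that concrete, here is a minimal sketch of the idea, not the actual ReactOS code. The `Rect` and `Surface32` types and the names `fill_dwords` and `color_fill_32bpp` are hypothetical stand-ins. Note that a fill maps to the store form of the string instruction (REP STOSD, which is what `__stosd()` emits) rather than the copy form. The inline asm is only used with GCC-compatible compilers on x86, everything else falls back to a plain loop:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real surface and rectangle types. */
typedef struct { int32_t left, top, right, bottom; } Rect;
typedef struct { uint32_t *bits; ptrdiff_t stride_bytes; } Surface32;

/* Store `count` copies of `value` starting at `dest`. */
static inline void fill_dwords(uint32_t *dest, uint32_t value, size_t count)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* REP STOSD: EAX = value, (E/R)DI = destination, (E/R)CX = count. */
    __asm__ volatile ("rep stosl"
                      : "+D" (dest), "+c" (count)
                      : "a" (value)
                      : "memory");
#else
    /* Portable fallback for other compilers and architectures. */
    while (count--)
        *dest++ = value;
#endif
}

/* Fill `rect` (right/bottom exclusive) with `color`.
 * Returns 0 for an empty rectangle, 1 otherwise. */
static int color_fill_32bpp(Surface32 *surface, const Rect *rect, uint32_t color)
{
    assert(surface != NULL && rect != NULL);

    int32_t width = rect->right - rect->left;
    int32_t height = rect->bottom - rect->top;
    if (width <= 0 || height <= 0)
        return 0;

    /* Address of the first pixel of the first row to fill. */
    unsigned char *row = (unsigned char *)surface->bits
                       + (ptrdiff_t)rect->top * surface->stride_bytes
                       + (ptrdiff_t)rect->left * 4;

    for (int32_t y = 0; y < height; y++) {
        fill_dwords((uint32_t *)row, color, (size_t)width);
        row += surface->stride_bytes;
    }
    return 1;
}
```

Only `fill_dwords` is architecture-specific here, so the rectangle math stays in one place and the compiler is still free to inline and optimize the callers.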
> I just claimed that you couldn't provide a faster C version. Faster
> in terms of real life usage. And I'm yet waiting for you to prove me
> wrong.
>
>> 1) The optimizations of the code *around* the function (ie: the
>> callers), which Michael also pointed out, cannot be done in ASM.
>> 2) The fact that if you try this code on a Core 2, Pentium 4, Pentium 1
>> and Nehalem you will get totally different results with your ASM code,
>> while the compilers will generate the best possible code.
>> 3) The fact that someone will now have to write optimized versions
>> for each other architecture
>> 4) The fact that if the loop is what you're truly worried about, you
>> can optimize it by hand with __builtinia32_rep_movsd (and MSVC has a
>> similar intrinsic), and still keep the rest of the function
>> portable C.
>>
>> Also, gcc does support profiling, another fact you don't seem to
>> know.
>> However, with linker optimizations, you do not need a profiler, the
>> linker will do the static analysis.
>>
>> Also, to everyone saying things like "I was able to save a <operand
>> name here>", I hope you understand that smaller != faster.
>>
>> On 4-Aug-09, at 10:13 AM, Timo Kreuzer wrote:
>>
>>> Michael Steil wrote:
>>>
>>>> I wonder, has either of you, Alex or Timo actually *benchmarked*
>>>> the code on some sort of native i386 CPU before you argue whether it
>>>> should be a stosb or a stosd? If not, writing assembly would be a
>>>> clear case of "premature optimization".
>>>
>>> I did. On an Athlon X2 64, I called the function a bunch of times
>>> with a 100x100 rect, measuring time with rdtsc. The results were quite
>>> random, but roughly
>>> asm: ~580
>>> gcc 4.2 -march=k8 -fexpensive-optimizations -O3: ~1800
>>> WDK: /GL /Oi /Ot /O2 : ~2600
>>> MSVC 2008 express: /GL /Oi /Ot /O2 ~1800
>>>
>>> Using a 50x50 rect shifts the advantage slightly in the direction of the
>>> asm implementations.
>>>
>>> I added volatile to the pointer to prevent the loop from being optimized
>>> away.
>>> Using memset was a bit slower than a normal loop.
>>> This is what msvc produced with the above settings
>>>
>>> _DIB_32BPP_ColorFill:
>>>     push ebx
>>>     mov ebx, [eax+8]
>>>     sub ebx, [eax]
>>>     test ebx, ebx
>>>     jg short label1
>>>     xor al, al
>>>     pop ebx
>>>     retn
>>>
>>> label1:
>>>     mov ecx, [eax+4]
>>>     push esi
>>>     mov esi, [eax+0Ch]
>>>     sub esi, ecx
>>>     test esi, esi
>>>     jg short label2
>>>     pop esi
>>>     xor al, al
>>>     pop ebx
>>>     retn
>>>
>>> label2:
>>>     mov eax, [edx+4]
>>>     imul ecx, eax
>>>     add ecx, [edx]
>>>     cdq
>>>     and edx, 3
>>>     add eax, edx
>>>     sar eax, 2
>>>     add eax, eax
>>>     push edi
>>>     mov edi, ecx
>>>     add eax, eax
>>>     jmp short label3
>>>
>>>     align 10h
>>> label3:
>>>     mov ecx, edi
>>>     mov edx, ebx
>>>
>>> label4:
>>>     mov dword ptr [ecx], 3039h
>>>     add ecx, 4
>>>     sub edx, 1
>>>     jnz short label4
>>>
>>>     dec esi
>>>     add edi, eax
>>>     test esi, esi
>>>     jg short label3
>>>
>>>     pop edi
>>>     pop esi
>>>     mov al, 1
>>>     pop ebx
>>>     retn
>>>
>>> I thought myself I did something wrong. For me no compiler was able
>>> to generate code as fast as the asm code.
>>> I don't know how Alex managed to get better optimizations, maybe he
>>> knows a secret ninja /Oxxx switch, or maybe express and wdk version
>>> both suck at optimizing or maybe I'm just too stupid... ;-)
>>>
>>>> See above: If all you want to optimize is the loop, then have C
>>>> code with asm("rep movsd") in it, or fix the static inline memcpy() to
>>>> be more efficient (if it isn't efficient in the first place).
>>>
>>> I tried __stosd() which actually resulted in a faster function. With
>>> ~610 gcc was almost as fast as the asm implementation, msvc
>>> actually won with 590. But that was not using pure portable code. It's the
>>> best solution, it seems, although it will probably still be slower unless
>>> we set our optimization to max.
>>>
>>> Btw, I already thought about rewriting our dib code some time ago,
>>> using inline functions instead of a code generator. The idea is to make it
>>> fully portable, optimizable through inline asm functions where useful,
>>> and easier to maintain than the current stuff. It's on my list...
>>>
>>> Timo
>>>
>>> _______________________________________________
>>> Ros-dev mailing list
>>> Ros-dev at reactos.org
>>> http://www.reactos.org/mailman/listinfo/ros-dev
>>
>> Best regards,
>> Alex Ionescu
Best regards,
Alex Ionescu