[ros-dev] [ros-diffs] [tkreuzer] 42353: asm version of DIB_32BPP_ColorFill: - Add frame pointer - Get rid of algin_draw, 32bpp surfaces must be DWORD aligned - Optimize the loop - Add comments
Note to everyone else: I just spent some time to do the calculations
and have data proving C code can be faster -- I will post tonight from
home.
Now to get to your argument, Jose..
Best regards,
Alex Ionescu
On Tue, Aug 4, 2009 at 2:19 PM, Jose Catena<jc1 at diwaves.com> wrote:
> With all respect Alex, although I agree with you in the core, that this does
> not deserve the disadvantages of asm for a tiny performance difference if
> any (portability, readability, etc), I don't agree with many of your arguments.
Also keep in mind Timo admitted "This code is not called often",
making ASM optimization useless.
>> -->
> 1) The optimizations of the code *around* the function (ie: the
> callers), which Michael also pointed out, cannot be done in ASM.
>> <--
> Yes, it can. I could always outperform or match a C compiler at that, and
> did many times (I'm the author of an original PC BIOS, performance
> libraries, mission critical systems, etc).
> I very often used regs for calling params, local storage through SP instead
> of BP, good use and reuse of registers, etc.
An optimizing compiler will do this too.
> In fact, the loop the compiler generated was identical to the asm source
> except for the two instructions the compiler added (that serve for no
> purpose, it is a msvc issue).
Really? Here's sample code from my faster C version:
.text:004013E0 lea eax, [esi+eax*4]
.text:004013E3 lea esi, ds:0[edi*4]
.text:004013EA lea eax, [ebp+eax+0]
.text:004013EE db 66h
.text:004013EE nop
99 percent of people on this list (and you, probably) will tell me
"this is a GCC issue" or that this is "useless code".
Guess what, I compiled with mtune=core2 and this code sequence is
specifically generated before the loop.
Neither Timo nor, I admit, I myself would think of adding this kind of
code. But once I asked some experts what this does, I understood why
it's there.
To quote Michael "if you think the compiler is generating useless
code, try to find out what the code is doing." In most cases, your
thinking that it is "wrong" or "useless" is probably wrong itself.
As a challenge, can you tell me the point of this code? Why is it
written this way? If I build for 486 (which is what ALL OF YOU SEEM TO
BE STUCK ON!!!), I get code that looks like Timo's.
> It is actually in the calling overhead and local initialization and storage
> where I could easily beat the compiler, since it complies with rules that I
> can safely break.
That doesn't make any sense. You are AGREEING with me. My point is
that a compiler will break normal calling rules while the assembly
code will have to respect at least some rules, because you won't know
a priori all your callers (you might in a BIOS, but not in a giant code
base like win32k). The compiler on the other hand, DOES know all the
callers, and will happily clobber registers, change the calling
convention, etc. Please re-read Michael's email.
> Furthermore, in most cases a compiler won't change calling convention unless
> the source specifies it
Completely not true. Compilers will do this. This is not 1994 anymore.
> , and in any case the register based calling used by
> compilers is way restricted compared with what can be done in asm which can
> always use more efficient methods (more extensive and intelligent register
> allocation).
Again, simply NOT true. Today's compilers will be able to do things
like "All callers of foo must have param 8 in ECX", and will write the
code that way, not to save/restore ECX, and to use it as a parameter.
You CANNOT do this in assembly unless you have a very small number of
callers that you know nobody else will touch. As soon as someone else
adds a caller, YOU have to do all the work to make ECX work that way.
You seem to have a very 1990s understanding of how compilers work
(respecting calling conventions, saving/restoring registers, not
touching ebp, etc). Probably because you worked on BIOSes, which yes,
at that time, worked that way.
Please read a bit into technologies such as LLVM or Microsoft's
link-time code generation (LTCG).
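A minimal sketch of the idea, assuming a hypothetical helper scale_add()
that is not part of win32k: because the helper is static, the compiler
can see every caller in the translation unit and is free to inline it or
pass its arguments in whatever registers suit those callers. Nothing
here forces that outcome; it only gives the optimizer the freedom
described above.

```c
#include <assert.h>

/* Hypothetical helper (illustration only). Internal linkage means no
 * code outside this file can call it, so the compiler is not bound to
 * the platform calling convention for it. */
static int scale_add(int base, int index, int scale)
{
    return base + index * scale;
}

/* Externally visible entry point: this one must follow the normal ABI. */
int row_offset(int left, int row, int pitch)
{
    assert(pitch > 0);
    return scale_add(left, row, pitch);
}
```

Comparing the output of gcc -O2 -S with and without the static keyword
is one way to see what the compiler actually decided for a given target.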
> In any case, the most important optimizations are equally done in C and
> assembly when the programmer knows how to write optimum code and does not
> have to comply with a prototype.
Again, NO. Unless you control all your call sites and are willing to
update the code every time a call site gets added, the compiler
WILL beat you. LLVM and LTCG can even go 2-3 call sites away, such
that callers of foo which call bar which call baz have some sort of
stack frame or register content that will make bar and baz faster.
> For example passing arguments as a pointer
> to a struct is always more efficient.
>
It actually depends, and again the compiler can make this choice.
> -->
> 2) The fact if you try this code on a Core 2, Pentium 4, Pentium 1 and
> Nehalem you will get totally different results with your ASM code,
> while the compilers will generate the best possible code.
>> <--
> There are very few and specific cases where the optimum code for different
> processors is different, and this is not the case.
False. I got radically different ASM when building for K8, I7, Core2
and Pentium.
> If gcc generates different code for this function and different CPUs, it is
> not for a good reason.
Excuse me?
> There is only a meaningful exception for this function: if the inner loop
> can use a 64 bit rep stos instead of 32. And in this case it can be done in
> asm, while I don't know any compiler that would use a 64 bit rep stos
> instruction for a 32 bit target regardless of the CPU having 64 bit
> registers.
Again, this is full of assumptions. You seem to be saying "GCC is
stupid, I know better". Yet you don't even understand WHY gcc will
generate different code for different CPUs.
Please read into the topics of "pipelines" and "caches" and
"micro-operations" as a good starting point.
>> -->
> 4) The fact that if the loop is what you're truly worried about, you
> can optimize it by hand with __builtinia32_rep_movsd (and MSVC has a
> similar intrinsic), and still keep the rest of the function portable C.
>> <--
> It is not necessary to use a built in function like you mention,
> because any optimizing compiler will use rep movsd anyway, with better
> register allocation if any different.
Ummm, if you think "rep movsd" is what an optimizing compiler will
use, then I'm sorry but you don't have the credentials to be in this
argument, and I'm wasting my time. Rep Movsd is the SLOWEST way to
achieve this loop on modern CPUs. On my core2 build, for example, gcc
used "mov" and a loop instead. Only when building for Pentium 1, did
it use a rep movsd.
Please stop thinking that 1 line of ASM is faster than 12 lines,
because 12 > 1. On modern CPUs, a "manual" loop will be faster than a
rep movsd, nearly ALWAYS.
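For reference, the portable version of such a loop might look like the
sketch below. fill_rect32() and its parameters are hypothetical and do
not match the real DIB_32BPP_ColorFill signature; the point is only that
the inner loop is plain C, which leaves the choice between a rep string
instruction, a mov loop, or vector stores to the compiler and its
-mtune setting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (illustration only): fill a width x height
 * rectangle in a 32bpp surface. stride_px is the row pitch in pixels. */
static void fill_rect32(uint32_t *bits, size_t stride_px,
                        size_t left, size_t top,
                        size_t width, size_t height, uint32_t color)
{
    assert(bits != NULL);
    assert(left + width <= stride_px);

    for (size_t y = 0; y < height; y++) {
        /* Start of the destination span on this scanline. */
        uint32_t *row = bits + (top + y) * stride_px + left;

        /* Plain store loop; the compiler picks the instruction sequence. */
        for (size_t x = 0; x < width; x++)
            row[x] = color;
    }
}
```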
> If inline asm is used instead, optimizations for the whole function are
> disabled, as the compiler does not analyze what's done in inline assembly.
LOL??? Again, maybe true in the 1990s, but first of all:
1) Built-ins are not "inline asm", and will be optimized
2) GCC and MSVC both will optimize the inline assembler according to
the function the inline is present in. The old mantra that "inline asm
disables optimizations" hasn't been true since about 2001...
In fact, when assembly is *required* (for something like a trap save),
it is ALWAYS better to use an inline __asm__ block within the C
function, than to call the external function in an .S or .ASM file,
because compilers like gcc will be able to fine-tune the assembly you
wrote, and modify it to work better with the C code. LTCG will, in
some cases, optimize the ASM you wrote by hand in the external .ASM
file as well.
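As a sketch of what this looks like with GCC's extended asm syntax
(fill32() is a hypothetical name, and the non-x86 fallback is an
assumption added for portability): the constraints tell the compiler
which registers the instruction needs and what it modifies, so the
compiler can allocate registers and inline the function around the asm
statement instead of treating it as an opaque external call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (illustration only): store `value` into `count`
 * consecutive 32-bit slots starting at `dst`. */
static inline void fill32(uint32_t *dst, uint32_t value, size_t count)
{
    assert(dst != NULL || count == 0);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "+D" and "+c" pin dst and count to (E/R)DI and (E/R)CX and mark
     * them as modified; "a" puts value in EAX; "memory" tells the
     * compiler that memory it may have cached has changed. */
    __asm__ volatile ("rep stosl"
                      : "+D"(dst), "+c"(count)
                      : "a"(value)
                      : "memory");
#else
    /* Fallback for other compilers and architectures. */
    while (count--)
        *dst++ = value;
#endif
}
```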
>> -->
> Also, gcc does support profiling, another fact you don't seem to know.
> However, with linker optimizations, you do not need a profiler, the
> linker will do the static analysis.
>> <--
> Function level linking and profiling based optimization are very different
> things, the linker in no way can perform a similar statistical analysis.
But it can perform static analysis.
>> -->
> Also, to everyone saying things like "I was able to save a <operand
> name here>", I hope you understand that smaller != faster.
>> <--
> The save of these two instructions improve both the speed and size. Note
> that the loop the compiler generated was exactly the same as the original
> assembly, only with those two instructions added. I discern where I save
> speed, size, both, or none, in either C or assembly.
> I wrote this not to be argumentative or confrontational, but just because I
> don't like to read arguments that are not true, and I hope you all take this
> as constructive knowledge.
> BTW, I hardly support the use of assembly except in very specific cases, and
> this is not one. I disagreed with Alex in the arguments, not in the core.
Thanks Jose, but unfortunately you are wrong. If we were having this
argument in:
1) 1986
2) on a 486
3) about BIOS code (which is small and rarely extended, with all calls
"Controlled")
I would bow down and give you my hat in an instant, but times have changed.
I don't want to waste more time on these arguments, because I know I'm
right and I've asked several people who all agree with me -- people
that work closely with Intel, compiler technology and assembly. I
cannot convince people that don't even have the basic knowledge to be
able to UNDERSTAND the arguments. Do some reading, then come back.
I will post numbers and charts when I'm home, at least they will
provide some "visual" confirmation of what I'm saying, but I doubt
that will be enough.
> Jose Catena
> DIGIWAVES S.L.