'memcpy' : intrinsic function, cannot be defined - gaaaaaa
 
Archive Notice: This thread is old and no longer active. It is here for reference purposes. This thread was created on an older version of the flipcode forums, before the site closed in 2005. Please keep that in mind as you view this thread, as many of the topics and opinions may be outdated.
 
pixelpajas

February 21, 2005, 06:40 PM

Ok. I'm not using any default libs (/NODEFAULTLIB) in the project.

I defined my own memcpy replacement function.

void *memcpy(void *s1, const void *s2, size_t n)
{
CopyMemory(s1, s2, n);
return s1;
}

And I get: error C2169: 'memcpy' : intrinsic function, cannot be defined

Can anyone help me out here?
Why is the compiler complaining about
a function that has not been defined anywhere else?
I am using VC++ 2005, btw.

I can use #pragma function(memcpy),
but I kind of want the function to be inlined when I'm optimizing later.
I also get: warning C4717: 'memcpy' : recursive on all control paths, function will cause runtime stack overflow
Should I ignore this warning, as it seems bogus?



cheers
//iman


 
Bart

February 21, 2005, 07:11 PM

I would guess that the warning "warning C4717: 'memcpy' : recursive on all control paths, function will cause runtime stack overflow" is trying to tell you that CopyMemory() calls memcpy().

In which case, it would be recursive in a bad way.
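To make the recursion concrete: in the Windows SDK headers, CopyMemory is a macro for RtlCopyMemory, which in turn expands to memcpy, so the replacement above ends up calling itself. A minimal sketch of a non-recursive variant follows. The name `tiny_memcpy` is hypothetical (chosen so the sketch also builds outside VC++); the guarded `#pragma function(memcpy)` line is what would allow a function actually named memcpy to be defined under VC++.

```cpp
#include <cstddef>
#include <cstring>

#ifdef _MSC_VER
// VC++ only: request real calls to memcpy instead of the intrinsic form.
// Without this, defining a function named memcpy triggers error C2169.
#pragma function(memcpy)
#endif

// Hypothetical name. Under /NODEFAULTLIB with VC++, the same body could be
// used for memcpy itself. It must not call CopyMemory, which expands to memcpy.
void* tiny_memcpy(void* dest, const void* src, std::size_t count)
{
    unsigned char* d = static_cast<unsigned char*>(dest);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    while (count--)
        *d++ = *s++;
    return dest;
}
```

Calling `tiny_memcpy(dst, src, n)` copies n bytes and returns dst, like the standard function. The trade-off is the one raised in the original question: with the pragma in effect, calls to memcpy are real calls rather than inline rep movs sequences.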

 
Dr. Necessiter

February 21, 2005, 09:22 PM

IIRC, when "intrinsic" is used, this means the compiler is directly using a processor instruction to handle the function. In this case, rep movs et al. Maybe there is a pragma or flag to disable this intrinsic.

 
Per Vognsen

February 22, 2005, 12:15 AM

Dr. Necessiter wrote: IIRC, when "intrinsic" is used, this means the compiler is directly using a processor instruction to handle the function. In this case, rep movs et al. Maybe there is a pragma or flag to disable this intrinsic.


Uh, yeah. Go to the project settings -> C/C++ -> Optimization -> Enable Intrinsic Functions.

 
IronChefC++

February 24, 2005, 08:22 PM

I think the better choice is to just use the intrinsic function. I'm fairly certain the intrinsics will both perform better and compile smaller than any hand-written C function. Besides memcpy, what about throwing away all the other intrinsics: sin, cos, etc.? I would just sit back and let the compiler use intrinsics on this one. But that might be just me.

 
Gerald Knizia (cgk)

February 25, 2005, 12:57 AM

If he's not using default libs, he probably wants to create a very small executable. The standard memcpy is a pretty large function, however, because of its optimizations.

His best bet for a memcpy function would probably be something along the lines of this (or the equivalent using the [] operator, which I've just learned performs equally well or better on modern compilers):

  void* memcpy( void *dest, const void *src, size_t count )
  {
     // Plain byte-by-byte copy: small code, no alignment handling or unrolling.
     const char
       *itSrc = static_cast<const char*>( src );
     char
       *itDest = static_cast<char*>( dest ),
       *itEnd = static_cast<char*>( dest ) + count;
     for ( ; itDest != itEnd; ++itDest, ++itSrc )
        *itDest = *itSrc;
     return dest;
  }

 
pixelpajas

February 25, 2005, 03:31 AM

Actually, I wrote an MMX memcpy instead.
Way better than the rep movsd that is
generated by most compilers.

Also, the standard memcpy in most
compilers is very small, like a couple of opcodes.


cheers
//iman

 
Fabian 'ryg' Giesen

February 25, 2005, 04:37 AM

The intrinsic memcpy is just rep movsd/rep movsb (depending on size) and does not result in a call at all.

 
Fabian 'ryg' Giesen

February 25, 2005, 04:40 AM

"Way better than the rep movsd that is generated by most compilers."
Uh, that is so for relatively big blocks (a few kilobytes and up), but for relatively small copies (which are quite common) you're really best off with just stupid rep movsd.

 
Dr. Necessiter

February 25, 2005, 07:57 AM

Here's a question. I haven't looked lately, but the mem copy/move routines used to call some code that handled alignment issues, making sure to use the doubleword moves as much as possible. Does the intrinsic instruction only get generated if the move is aligned?

 
Gerald Knizia (cgk)

February 25, 2005, 08:24 AM

Oh, okay. Sorry for the confusion I've caused, then. I assumed something like the real memcpy function would be generated, and I had already wondered why anyone would want to use that as an intrinsic.

 
Gerald Knizia (cgk)

February 25, 2005, 09:00 AM

Side note: I just tested "my function" on VC 2k2, and the compiler more or less detected that this was supposed to be a memcpy. I.e., my function (inlined) and the intrinsic memcpy produced identical assembly output.

Further side note: The generated code was not always a rep movsb. Up to a (constant) size of 19 bytes, it completely unrolled the loops when using the standard optimization settings (favor neither speed nor size). When set for small code generation, it produced a sequence of several movsd/movsb instructions for small constant sizes, always resulting in very small code.
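For reference, the kind of source that hands the compiler a constant size can be sketched as below. `copy_fixed`, `copy_vec4` and `Vec4` are hypothetical names; whether a particular compiler unrolls the copy, emits a few movsd/movsb instructions, or falls back to rep movs for a given size depends on the compiler version and optimization settings.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: the byte count is a template parameter, so it is a
// compile-time constant at every call site and the compiler is free to
// expand the copy inline instead of calling a library routine.
template <std::size_t N>
inline void copy_fixed(void* dest, const void* src)
{
    std::memcpy(dest, src, N);
}

// Hypothetical 16-byte type; sizeof(Vec4) is known at compile time.
struct Vec4 { float x, y, z, w; };

inline void copy_vec4(Vec4& dst, const Vec4& src)
{
    copy_fixed<sizeof(Vec4)>(&dst, &src);
}
```

Looking at the assembly listing for `copy_vec4` under different optimization settings is an easy way to reproduce the observation for your own compiler.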

Compilers have become really clever in the meantime.

 
pixelpajas

February 25, 2005, 03:47 PM


Sorry. I don't know why I didn't write that. I was thinking it, but it never made it to the keyboard. =P

Yep, you're right.

//iman

 
Fabian 'ryg' Giesen

February 26, 2005, 05:07 AM

You mean, compilers have rather big pattern matching tables for instruction selection by now ;)

This is especially obvious with VC++ - it has loads of optimized code snippets for certain patterns that appear regularly in C(++) code (memory copies, several special usages of operator ?:, some string operations), but the low-level organization of generated code is sometimes rather brain-damaged even with maximum optimization; VC++ regularly loads values in registers and then doesn't use them, stores (non-volatile) integer variables in memory just to load them again in the next instruction, etc. A simple peephole optimizer that just makes another pass over the generated code to clean up such mess would work wonders I think ;)

 
El Pinto Grande

February 26, 2005, 08:53 AM

>A simple peephole optimizer that just makes another pass over the generated code to clean up such mess would work wonders I think ;)
I know you've put smileys, but don't you think you're oversimplifying?
Modern compilers already use dozens and dozens of different passes.

Anyway, msvc(2k3) isn't worth comparing against, as it never was too hot regarding codegen.

 
El Pinto Grande

February 26, 2005, 08:58 AM

Horrible.

A compiler using a builtin memcpy could (read: will) make more assumptions about the surrounding code/arguments or function usage, though standard compliance may call for pessimization.

 
Chris

February 27, 2005, 04:16 AM

It isn't always possible to decide at compile time whether a move is aligned. For example, if the destination is specified by a pointer variable, the compiler may be unable to determine what that pointer will point at.

I think the intrinsic version of memcpy handles alignment as well.
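A rough sketch of what handling alignment at run time can look like is given below. This is only an illustration under stated assumptions, not a description of what the VC++ intrinsic or runtime memcpy actually does; `is_aligned` and `copy_align_dest` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if p is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical name. Copies single bytes until dest is 4-byte aligned,
// then 4 bytes at a time, then the remaining tail bytes.
void* copy_align_dest(void* dest, const void* src, std::size_t count)
{
    unsigned char* d = static_cast<unsigned char*>(dest);
    const unsigned char* s = static_cast<const unsigned char*>(src);

    // Head: the alignment of dest is only known here, at run time.
    while (count > 0 && !is_aligned(d, 4)) {
        *d++ = *s++;
        --count;
    }
    // Body: doubleword moves. The fixed-size std::memcpy keeps the
    // possibly unaligned read from src well-defined in C++.
    while (count >= 4) {
        std::uint32_t word;
        std::memcpy(&word, s, 4);
        std::memcpy(d, &word, 4);
        d += 4;
        s += 4;
        count -= 4;
    }
    // Tail: fewer than 4 bytes left.
    while (count > 0) {
        *d++ = *s++;
        --count;
    }
    return dest;
}
```

Only the destination is aligned here; the source may still be misaligned relative to it, which is one reason real library routines are considerably longer than this.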

 
Gerald Knizia (cgk)

February 27, 2005, 01:08 PM

Fabian 'ryg' Giesen wrote: You mean, compilers have rather big pattern matching tables for instruction selection by now ;)


Well, it's not like real intelligence would work much differently from that. It's the output that counts. And as it seems, the compiler understands the _concept_ of "moving memory around". This has impressed me.

Fabian 'ryg' Giesen wrote: VC++ regularly loads values in registers and then doesn't use them

Could it be that it does that to precache memory locations in a compatible way? AFAIK, the PREFETCH instructions have been altered in their meaning several times, and this pattern sounds like a reasonable way to get cache lines into place if the registers used are spare.

 
Gerald Knizia (cgk)

February 27, 2005, 01:15 PM

El Pinto Grande wrote: Horrible.

Much less horrible than if something like the real memcpy function (of VC) were used as an intrinsic. The code I posted could never be compiled in a way that would result in really large code if inlined. I was wrong, however: as Fabian already pointed out, the intrinsic does generate sensible code.

El Pinto Grande wrote: A compiler using a builtin memcpy could (read will) make more assumptions about the surrounding code/arguments or function usage.

"(read will)" -> I already pointed out that this is not the case. At least VC seemed to understand the concept of "moving memory around" and generated identical code. My problem with a builtin memcpy intrinsic was that I assumed it tries to align movs and partially unrolls loops (like the real memcpy function does), which would be something you _really do not want_ in this place.

Again, I'm sorry I caused this confusion.

 
El Pinto Grande

February 27, 2005, 01:41 PM

The function you've posted mandates char-wide access and is aliasing-prone, among other things.

Anyway, I'm not going to argue about msvc codegen, as it is *censored*.

PS: about your prefetch hypothesis, msvc(2k3) has no such notion.

 
Fabian 'ryg' Giesen

February 27, 2005, 04:48 PM

No, I mean stuff like

  // do something with eax
  mov  [var], eax
  mov  ecx, [var]
  // do something more with ecx


The store might or might not be necessary in that case, but unless "var" is marked as volatile, the second load is completely unnecessary and could just as well have been coded as mov ecx, eax. This kind of stuff does happen relatively often in VC++ generated code (even with maximum optimization). And concerning El Pinto's comment, no, I don't think I'm over-simplifying here; this kind of optimization can really be done purely on assembler level with minimal knowledge of what the code represents, and in a linear pass over the generated code. I'm just talking about redundant load elimination/copy forwarding here.
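In C++ terms, the pattern could come from source like the sketch below (all names are hypothetical). For the plain global, the value just stored is still sitting in a register, so the reload can be replaced by a register copy; for the volatile one, the load has to actually happen.

```cpp
// Hypothetical globals standing in for "var" in the listing above.
int plain_var = 0;
volatile int volatile_var = 0;

int store_then_use_plain(int x)
{
    plain_var = x;         // store: may be needed, plain_var is visible elsewhere
    int y = plain_var;     // reload: a compiler may forward x here instead
    return y + 1;
}

int store_then_use_volatile(int x)
{
    volatile_var = x;      // store must be performed
    int y = volatile_var;  // load must be performed, no forwarding allowed
    return y + 1;
}
```

Both functions return the same value; the difference only shows up in the generated code, which is exactly where the redundant mov ecx, [var] would appear.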

 
El Pinto Grande

February 28, 2005, 12:55 AM

Fabian 'ryg' Giesen wrote: And concerning El Pinto's comment, no, I don't think I'm over-simplifying here; this kind of optimization can really be done purely on assembler level with minimal knowledge of what the code represents, and in a linear pass over the generated code. I'm just talking about redundant load elimination/copy forwarding here.


Then you are over-simplifying.
It's called dead store/code removal; it's a bit more intricate than you think and has a few gotchas (e.g. the classic: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8537).
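The classic gotcha of this kind, and to the best of my knowledge the one the linked report describes, is a memset that clears a sensitive buffer right before it goes out of scope: since nothing reads the buffer afterwards, the optimizer may treat the clearing as a dead store and drop it. A minimal sketch of one common workaround follows; `secure_zero` is a hypothetical name, and writing through a volatile pointer is a widely used idiom rather than an ironclad guarantee.

```cpp
#include <cstddef>

// Hypothetical helper: zero a buffer through a volatile pointer so that
// each store is an access the optimizer is not supposed to discard,
// even if the buffer is never read again afterwards.
void secure_zero(void* p, std::size_t n)
{
    volatile unsigned char* vp = static_cast<volatile unsigned char*>(p);
    while (n--)
        *vp++ = 0;
}
```

A purely assembler-level pass has no way of knowing that such a store was written for its side effect, which is the kind of intent that gets lost once only the generated code is visible.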

 
This thread contains 22 messages.
 
 