How to define a utility function that can be called both by a CUDA kernel and a regular C++ function
I'm working on a project that involves a lot of mathematics. For a single target problem (for example, gradient calculation), we have two versions of the implementation: one CPU version and one CUDA version.
The CPU version is written in regular C++ and the kernel version is written in CUDA. If I want to define a small function, for example vec_weight, which returns the weight of a vector, I have to write one CPU version compiled by g++ and one CUDA version that has "__device__" in front of it and is compiled by nvcc.
I'm not trying to define a "__device__ __host__" function here. What I want is a kind of library that can be called from both a regular C++ function and a CUDA kernel. I tried to use the "__CUDACC__" macro, but it didn't work.
Because we have a lot of small utility functions that are needed by both the CPU version and the GPU version, I think it is reasonable to combine them into one.
Writing the CPU version in a .cu file instead of a .cpp file may solve our problem, but it is not what we want.
So what should I do?
Here is the code segment:
head.h:
    #ifndef head_h
    #define head_h
    #ifdef __cplusplus
    extern "C"{
    #endif
    __device__ __host__ void myprint();
    #ifdef __cplusplus
    }
    #endif
    #endif
head.cu:
    #include "head.h"
    #include <stdio.h>
    void myprint(){
        // here
    }
main.cpp:
    #include "head.h"
    int main(){
        myprint();
    }
I compiled head.cu with:
    nvcc -c head.cu
and linked them with:
    g++ main.cpp head.o -o main
(The reason I didn't use nvcc here is that we are using PGI's pgcpp in our project and need it to talk to PGI's OMP library. I'm sure there is something wrong here, but I don't know how to fix it.)
The error messages:
    In file included from main.cpp:18:
    head.h:6: error: ‘__device__’ does not name a type
    main.cpp: In function ‘int main()’:
    main.cpp:20: error: ‘myprint’ was not declared in this scope
So I'm pretty sure g++ couldn't recognize the "__device__" prefix here. Our project demands that we use pgcpp to compile the .cpp files, because that is the only way we can have OMP directives that work in both Fortran and C (our project mixes C/C++, Fortran and CUDA). But here even g++ does not work, so I think we have to fix this first.
Libraries contain code compiled for the target processor (CPU or GPU), so anything that must run on the GPU has to be compiled through nvcc. Hence, you may as well put it in .cu files.
If you can release the sources, you can put the code in headers and include them from .cpp or .cu files.
Update
This is what I did in my code (the hdf function can be called from both host and device):
File devhost.h:
    #ifndef functions_h_
    #define functions_h_
    /* Host-only entry point; no CUDA qualifiers, so gcc can parse it. */
    int myhost_function(int);
    #endif
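File cu.cu:
    #include "devhost.h"

    // hdf can be called from both host and device code.
    __host__ __device__ int hdf(int a) { return a + 4; }

    // Plain host wrapper, declared in devhost.h and called from cpp.cpp.
    int myhost_function(int a) { return hdf(a); }

    // Kernel that calls the same function on the device.
    __global__ void kern(int *data) {
        data[threadIdx.x] = hdf(data[threadIdx.x]);
    }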
File cpp.cpp:
    #include <stdio.h>
    #include <stdlib.h>
    #include "devhost.h"

    int main() {
        printf ("%d\n", myhost_function(5));
        return 0;
    }
This is how I compile and link it:
    nvcc -c cu.cu
    gcc -c cpp.cpp
    gcc cpp.o cu.o -lcudart -L/usr/local/cuda-5.5/lib64
Note that you need to link with cudart because the .cu file contains a device call.
Update 2
A slightly less elegant approach that still seems to compile is to have the following in your header file:
    #ifdef __CUDACC__
    __host__ __device__
    #endif
    static int anotherfunction(int a) { return a * 50; }
In this case you will have a copy of the code in every translation unit, which will increase compilation time and might increase the executable size.
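For many small functions, the same guard can be wrapped once in a macro so that each function does not need its own #ifdef. The following is only a minimal sketch of that idea: the file name vec_utils.h, the macro name HOST_DEVICE, and the definition of vec_weight as a sum of squared components are assumptions made for illustration, since the question does not say how the weight is computed.

```cpp
// vec_utils.h (hypothetical file name): one header shared by .cpp and .cu files.
#ifndef VEC_UTILS_H
#define VEC_UTILS_H

#include <cstddef>

// nvcc defines __CUDACC__ while compiling CUDA sources. Host-only compilers
// such as g++ do not, so for them the qualifiers expand to nothing.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Assumed meaning of "weight": the sum of squared components.
// Replace the body with the project's real definition.
HOST_DEVICE inline double vec_weight(const double* v, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += v[i] * v[i];
    }
    return sum;
}

#endif  // VEC_UTILS_H
```

A .cpp file built with a host compiler sees an ordinary inline function, while a .cu file built with nvcc sees the same function marked for both host and device, so a kernel can call it directly. The cost mentioned above still applies: every translation unit that includes the header compiles its own copy.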