How to define a utility function that can be called both by a CUDA kernel and a regular C++ function


I'm working on a project that involves a lot of mathematics. For a single target problem (for example, gradient calculation), we have two implementations: one CPU version and one CUDA version.

Now the CPU version is written in regular C++ and the kernel version is written in CUDA. If I want to define a small function, for example vec_weight, which returns the weight of a vector, I have to write one CPU version compiled by g++ and one CUDA version, prefixed with "__device__", compiled by nvcc.

I'm not just trying to define a "__device__ __host__" function here. I want some kind of library that can be called by a regular C++ function and by a CUDA kernel. I tried using the "__CUDACC__" macro, but it didn't work.

Because we have a lot of small utility functions needed by both the CPU version and the GPU version, I think it is reasonable to combine them into one.

Writing the CPU version in .cu instead of .cpp might solve our problem, but it is not what we want.

So what should I do?

Here is a code segment:

head.h:

    #ifndef head_h
    #define head_h
    #ifdef __cplusplus
    extern "C" {
    #endif
    __device__ __host__ void myprint();
    #ifdef __cplusplus
    }
    #endif
    #endif

head.cu:

    #include "head.h"
    #include <stdio.h>
    void myprint(){
        // here
    }

main.cpp:

    #include "head.h"
    int main(){
        myprint();
    }

I compiled head.cu with:

    nvcc -c head.cu

And linked them with:

    g++ main.cpp head.o -o main

(The reason I didn't use nvcc here is that we are using PGI's pgcpp in our project and we need it to talk to PGI's OMP library. I'm sure something is wrong here, but I don't know how to fix it.)

The error messages:

    In file included from main.cpp:18:
    head.h:6: error: ‘__device__’ does not name a type
    main.cpp: In function ‘int main()’:
    main.cpp:20: error: ‘myprint’ was not declared in this scope

So I'm pretty sure g++ cannot recognize the "__device__" prefix here. Our project demands that we use pgcpp to compile the .cpp files, because that is the only way we can have OMP directives work in both Fortran and C (our project mixes C/C++, Fortran and CUDA). But here even g++ does not work, so I think I have to fix that first.

Libraries contain code compiled for a target processor (CPU or GPU), hence you need to compile that code with nvcc. Hence, you may want to put it in .cu files.

If you can release the sources, you can put the code in headers and include them from both .cpp and .cu files.
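For example, a minimal sketch of such a shared header (the file name vec_util.h and the vec_weight body are made up here for illustration; they are not from the original post). The guard makes the CUDA qualifiers disappear when an ordinary C++ compiler reads the header:

```cpp
// vec_util.h -- hypothetical header-only utility, includable from .cpp and .cu files.
#ifndef VEC_UTIL_H
#define VEC_UTIL_H

// Expand to the CUDA qualifiers only when nvcc is compiling this header.
#ifdef __CUDACC__
#define VEC_UTIL_HD __host__ __device__
#else
#define VEC_UTIL_HD
#endif

// Squared Euclidean weight of a 3-component vector.
VEC_UTIL_HD inline float vec_weight(const float v[3]) {
    return v[0] * v[0] + v[1] * v[1] + v[2] * v[2];
}

#endif // VEC_UTIL_H
```

Included from a .cu file, nvcc compiles vec_weight for both host and device; included from a .cpp file, g++ (or pgcpp) just sees a plain inline function.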

update

This is what I did in my code (the hdf function can be called from both host and device):

File devhost.h:

    #ifndef functions_h_
    #define functions_h_

    int myhost_function(int);

    #endif

File cu.cu:

    __host__ __device__ int hdf(int a) {
        return a + 4;
    }

    int myhost_function(int a) {
        return hdf(a);
    }

    __global__ void kern(int *data) {
        data[threadIdx.x] = hdf(data[threadIdx.x]);
    }

File cpp.cpp:

    #include <stdio.h>
    #include <stdlib.h>

    #include "devhost.h"

    int main() {
        printf("%d\n", myhost_function(5));
        return 0;
    }

This is how I compile and link it:

    nvcc -c cu.cu
    gcc -c cpp.cpp
    gcc cpp.o cu.o -lcudart -L/usr/local/cuda-5.5/lib64

Note that you need to link against cudart, since the .cu file contains a device call.

update 2

A slightly less elegant approach that still seems to compile is to have the following in a header file:

    #ifdef __CUDACC__
    __host__ __device__
    #endif
    static int anotherfunction(int a) {
        return a * 50;
    }

In this case you will have a copy of the code in every translation unit, which will increase compilation time and might increase the executable size.
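One way to keep the convenience while avoiding the per-use code growth (a sketch of my own, not part of the original answer; the HOST_DEVICE macro name is made up): define the qualifier pair once, and mark the helpers inline instead of static, so the linker merges the duplicate definitions across translation units:

```cpp
// Hypothetical shared header, e.g. hostdev.h.
// When nvcc compiles this, HOST_DEVICE expands to the CUDA qualifiers;
// under a plain C++ compiler it expands to nothing.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// 'inline' instead of 'static': each translation unit still compiles a copy,
// but the linker folds them into one, so the executable does not grow per use.
HOST_DEVICE inline int anotherFunction(int a) {
    return a * 50;
}
```

Each translation unit still has to parse and compile the header, so compile time is unchanged, but the static-copy bloat in the executable goes away.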

