r - How to programmatically back out, deduce, decompile, or reverse-engineer the algorithm used to construct a variable in a data set
I'm looking for an algorithm, program, or function that can deduce how a variable was created, so long as I supply the other variables. I think computer programmers would call this "decompiling" and architects would call it "reverse-engineering", but I guess I don't know what statisticians would call it, or whether there are any accepted methods for it.
Let's say I've got a categorical column in a data.frame called newvar, and I don't know how it was constructed. I do know the variables that were used to create it, or at least I can provide an exhaustive set of variables that could have been used to create it, even if not all of them were used.
    # start with an example data set
    x <- mtcars

    # # # # # # # # # # # # # # # # # # # # # # # # #
    # pretend this block of code is a black box
    x <-
        transform(
            x ,
            newvar =
                ifelse( mpg > 24 , 1 ,
                ifelse( cyl == 6 , 9 ,
                ifelse( hp > 120 , 4 ,
                ifelse( mpg > 22 , 7 , 2 ) ) ) )
        )
    # end of unknown block of code
    # # # # # # # # # # # # # # # # # # # # # # # # #

    # knowing that `mtcars` has only 11 columns to choose from
    names(x)

    # how were these 11 columns used to construct `newvar`?
    table( x$newvar )

    # here's a start..
    y <- data.frame( ftable( x[ , c( 'mpg' , 'cyl' , 'hp' , 'newvar' ) ] ) )
    # ..combinations with records
    y[y[,5]!=0,]
    # but that's not enough to back out the construction
So I think one could figure out the construction of newvar with linear regression or decision trees, but that would still require a bit of thinking and piecing coefficients together to figure out what happened inside the black box.
Is there any algorithm available that guesses at the black box, so to speak? Thanks!!
In general, no. And even when applying a lot of knowledge about what is going on, still (probably) no. Let me show this by example, using your example. Adding the knowledge that the "black box" output is a set of discrete values, and that it is derived from thresholds on other values, a classification tree should be able to recover the criteria. So:
    library("party")
    tmp <- ctree(factor(newvar) ~ ., data = x,
                 controls = ctree_control(mincriterion = 0, minsplit = 2, minbucket = 1))
I've set the control values to unreasonable values to force the algorithm to drive down to each bucket containing only a single value. And the resulting tree is not what we started with.
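To look at the splits as text rather than as a plot, here is a minimal sketch of the same idea. It is an assumption on my part to swap in `rpart` (a recommended package shipped with most R installations) for `party::ctree`; the control values `cp = 0`, `minsplit = 2` and `minbucket = 1` are chosen only to mimic the deliberately permissive settings above, and `fit` and `used_vars` are illustrative names.

```r
library(rpart)  # assumption: rpart swapped in for party::ctree

# Rebuild the example data, including the "black box" column
x <- transform(
  mtcars,
  newvar = ifelse(mpg > 24, 1,
           ifelse(cyl == 6, 9,
           ifelse(hp > 120, 4,
           ifelse(mpg > 22, 7, 2))))
)

# Deliberately permissive controls so the tree keeps splitting
fit <- rpart(
  factor(newvar) ~ ., data = x, method = "class",
  control = rpart.control(cp = 0, minsplit = 2, minbucket = 1)
)

# Text listing of the fitted splits
print(fit)

# Which columns the tree actually split on
used_vars <- setdiff(unique(as.character(fit$frame$var)), "<leaf>")
used_vars

# Cross-tabulate fitted classes against the true labels
table(predicted = predict(fit, type = "class"), actual = x$newvar)
```

Comparing the printed splits and `used_vars` with the four nested `ifelse` conditions is a quick way to see whether the tree found the original thresholds or just some other partition that fits the data.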
So even with a simple example, and after adding more knowledge about the transformation, it cannot be done, so there is not much hope of being able to do it in the general case.
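One way to see why the data alone cannot pin down the construction: two different rules can agree on every row. The sketch below is only an illustration; `rule_b` is a hypothetical alternative I made up, not something from the question, and it never looks at `hp`.

```r
x <- mtcars

# The original "black box" rule from the question
rule_a <- with(x,
  ifelse(mpg > 24, 1,
  ifelse(cyl == 6, 9,
  ifelse(hp > 120, 4,
  ifelse(mpg > 22, 7, 2)))))

# A hypothetical alternative rule that ignores hp entirely
rule_b <- with(x,
  ifelse(cyl == 8, 4,
  ifelse(cyl == 6, 9,
  ifelse(mpg > 24, 1,
  ifelse(mpg > 22, 7, 2)))))

# Do the two rules label every car identically?
all(rule_a == rule_b)  # TRUE on mtcars
```

In `mtcars`, every 8-cylinder car has `hp > 120` and `mpg <= 24`, and no 4-cylinder car has `hp > 120`, so the `hp` test is indistinguishable from a `cyl == 8` test on these 32 rows. Any algorithm that only sees the data has no basis for preferring one rule over the other.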