python - Find unique columns and column membership
I went through these threads:
- Find unique rows in numpy.array
- Removing duplicates in each row of a numpy array
- Pandas: unique dataframe
They discuss several methods for computing the unique rows and columns of a matrix.
However, the solutions look a bit convoluted, at least to the untrained eye. Here, for example, is the top solution from the first thread, which (correct me if I am wrong) I believe is the safest and fastest:
    np.unique(a.view(np.dtype((np.void, a.dtype.itemsize*a.shape[1])))).view(a.dtype).reshape(-1, a.shape[1])
Either way, the above solution only returns the matrix of unique rows. What I am looking for is something along the lines of the original functionality of np.unique:
    u, indices = np.unique(a, return_inverse=True)
which returns not only the list of unique entries, but also the membership of each item in each unique entry found. How can I do this for columns?
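For reference, here is a minimal sketch of what return_inverse gives for a plain 1-D array. The array x is made up for illustration and is not from the threads above.

```python
import numpy as np

x = np.array([3, 1, 3, 2, 1])  # made-up data, for illustration only

# u holds the sorted unique values; indices[i] is the position of x[i] in u.
u, indices = np.unique(x, return_inverse=True)

# The membership array is enough to rebuild the original input.
assert np.array_equal(u[indices], x)
```

The goal of the question is the same pair of outputs, but with whole columns playing the role of the individual items.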
Here is an example of what I am looking for. For the array
    array([[0, 2, 0, 2, 2, 0, 2, 1, 1, 2],
           [0, 1, 0, 1, 1, 1, 2, 2, 2, 2]])
we would have:
    u = array([0, 1, 2, 3, 4])
    indices = array([0, 1, 0, 1, 1, 2, 3, 4, 4, 3])
where the different values in u represent the set of unique columns in the original array:
    0 -> [0, 0]
    1 -> [2, 1]
    2 -> [0, 1]
    3 -> [2, 2]
    4 -> [1, 2]
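The labels in this example follow the order in which each column first appears, whereas np.unique returns its results in sorted order. A minimal sketch of that first-appearance labelling in plain Python, with one label per column (the names labels and key are introduced here for illustration only):

```python
import numpy as np

a = np.array([[0, 2, 0, 2, 2, 0, 2, 1, 1, 2],
              [0, 1, 0, 1, 1, 1, 2, 2, 2, 2]])

labels = {}   # maps each distinct column (as a tuple) to its label
indices = []  # label of every column of a, in order

for col in a.T:  # iterating over a.T yields the columns of a
    key = tuple(col.tolist())
    # A column seen for the first time gets the next unused label.
    indices.append(labels.setdefault(key, len(labels)))

indices = np.array(indices)
u = np.arange(len(labels))
```

This loop is meant only to pin down the expected output; it is not tuned for large arrays.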

Essentially, you want np.unique to return the indexes of the unique columns, and the indices of where they're used? That is easy enough to do by transposing the matrix and then using the code from the other question, with the addition of return_inverse=True.
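    import numpy as np

    at = a.T
    # View each row of at (i.e. each column of a) as a single opaque void item.
    # ravel() flattens the (n, 1) view to 1-D so the returned indices are 1-D too.
    b = np.ascontiguousarray(at).view(np.dtype((np.void, at.dtype.itemsize * at.shape[1]))).ravel()
    _, u, indices = np.unique(b, return_index=True, return_inverse=True)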
With your a, this gives:
    In [35]: u
    Out[35]: array([0, 5, 7, 1, 6])

    In [36]: indices
    Out[36]: array([0, 3, 0, 3, 3, 1, 4, 2, 2, 4])
It's not entirely clear to me what you want u to be, however. If you want it to be the unique columns themselves, then use the following instead:
    at = a.T
    # Same void view as before; idx holds the first occurrence of each unique column.
    b = np.ascontiguousarray(at).view(np.dtype((np.void, at.dtype.itemsize * at.shape[1]))).ravel()
    _, idx, indices = np.unique(b, return_index=True, return_inverse=True)
    u = a[:, idx]
This would give
    In [41]: u
    Out[41]:
    array([[0, 0, 1, 2, 2],
           [0, 1, 2, 1, 2]])

    In [42]: indices
    Out[42]: array([0, 3, 0, 3, 3, 1, 4, 2, 2, 4])
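On NumPy 1.13 or later, np.unique also accepts an axis argument, so the same result can likely be obtained without the void view. This is a minimal sketch of that alternative, not the method used in the answer above, and it assumes the example array from the question. The reshape(-1) is a precaution for NumPy versions that return the inverse with an extra dimension.

```python
import numpy as np

a = np.array([[0, 2, 0, 2, 2, 0, 2, 1, 1, 2],
              [0, 1, 0, 1, 1, 1, 2, 2, 2, 2]])

# axis=1 treats each column as one item:
#   u       -> the unique columns, in sorted order
#   idx     -> index of the first occurrence of each unique column in a
#   indices -> for every column of a, its position in u
u, idx, indices = np.unique(a, axis=1, return_index=True, return_inverse=True)
indices = indices.reshape(-1)

# The membership array rebuilds the original matrix from the unique columns.
assert np.array_equal(u[:, indices], a)
```

For this input, u, idx and indices should match the values shown above for the void-view approach.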