Counting unique elements in sections of .csv columns (Python)


I have a .csv file of geological formations and the occurrences of fossil species at each formation. Each fossil has its own row in the .csv file, with the formation name included in the row.

The code I wrote below printed out the number of formation occurrences fine.

    import csv
    from collections import Counter

    # Python 2: the csv module expects files opened in binary mode
    out = open("bivalviagrdwis.csv", "rb")
    data = csv.reader(out)
    data.next()  # skip header
    data = [row for row in data]
    out.close()

    formations = []

    for row in data:
        if row[13] == '':
            continue
        else:
            formations.append(row[13])

    print Counter(formations)

However, there may be duplicate fossil names that ruin the count; I want the number of unique fossils at each formation. What can I add to count the unique elements in a section of a single column of the .csv file, rather than all elements?

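You need to keep track of which fossils you have already seen at each formation. A collections.defaultdict() object makes coding this easiest; in the code below it keeps, for each fossil, a set of the formations already recorded, which you can test against: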

    import csv
    from collections import Counter, defaultdict

    fossil = 0   # fossil name is the first column (?)
    form   = 13  # formation is the 14th column

    with open("bivalviagrdwis.csv", "rb") as inputfile:
        data = csv.reader(inputfile)
        next(data)  # skip header

        seen = defaultdict(set)

        counts = Counter(
            row[form]
            for row in data
            if row[form] and row[form] not in seen[row[fossil]] and not seen[row[fossil]].add(row[form])
        )

    print counts

The above code 'streams' the csv rows in one generator expression straight to the Counter() object; no intermediate list of rows is kept.

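Each row is tested to:

  • see if the formation column is not empty
  • see if the formation has not yet been recorded for the given fossil
  • record the formation for the given fossil (set.add() returns None, so the final "not ..." test is always true and only serves to do the recording)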
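
The three tests above can also be written out as an explicit loop, which may be easier to follow than the one-line condition. This is only a minimal sketch, written in Python 3 syntax rather than the Python 2 used in the answer; the helper name count_unique_fossils and the small in-memory sample rows are made up for illustration, with the fossil name in column 0 and the formation in column 1.

```python
import csv
import io
from collections import Counter, defaultdict


def count_unique_fossils(rows, fossil=0, form=13):
    """Count distinct fossil names per formation (hypothetical helper)."""
    seen = defaultdict(set)   # fossil name -> formations already recorded
    counts = Counter()
    for row in rows:
        formation = row[form]
        if not formation:                     # 1. skip an empty formation column
            continue
        if formation in seen[row[fossil]]:    # 2. this fossil was already counted here
            continue
        seen[row[fossil]].add(formation)      # 3. record the formation for the fossil
        counts[formation] += 1
    return counts


# Made-up sample rows, not taken from bivalviagrdwis.csv.
sample = io.StringIO(
    "fossil,formation\n"
    "Inoceramus,Niobrara\n"
    "Inoceramus,Niobrara\n"
    "Ostrea,Niobrara\n"
    "Ostrea,\n"
    "Inoceramus,Pierre\n"
)
reader = csv.reader(sample)
next(reader)  # skip header
result = count_unique_fossils(reader, fossil=0, form=1)
print(result)
```

The duplicate Inoceramus row and the row with an empty formation are both skipped, so each (fossil, formation) pair contributes to the count only once.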

I've assumed the fossil name is in column 0; you didn't specify how to extract the fossil name in your question.
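
If the header row names the columns, the two indices could be looked up from it instead of being hard-coded. The following is a rough sketch in Python 3 syntax; the header labels "species" and "formation" and the helper column_indices are assumptions for illustration, since the question does not show the real header of bivalviagrdwis.csv.

```python
import csv
import io


def column_indices(header, fossil_label, formation_label):
    """Return the positions of two header labels (hypothetical helper)."""
    # list.index() raises ValueError if a label is missing from the header
    return header.index(fossil_label), header.index(formation_label)


# Made-up header and row; the real column labels are not known.
sample = io.StringIO("id,species,formation\n1,Ostrea,Niobrara\n")
reader = csv.reader(sample)
header = next(reader)  # read the header instead of discarding it
fossil, form = column_indices(header, "species", "formation")
print(fossil, form)
```

The resulting fossil and form values can then replace the constants 0 and 13 used above.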

