Counting unique elements in sections of .csv columns (Python)
I have a .csv file of geological formations and the occurrences of fossil species at each formation. Each fossil has its own row in the .csv file, with the formation name included in the row.
The code I wrote below prints the number of occurrences of each formation fine:
    import csv
    from collections import Counter

    out = open("bivalviagrdwis.csv", "rb")
    data = csv.reader(out)
    data.next()  # skip the header row
    data = [row for row in data]
    out.close()

    formations = []
    for row in data:
        if row[13] == '':
            continue
        else:
            formations.append(row[13])

    print Counter(formations)
However, there may be duplicate fossil names that ruin the count; I want the number of unique fossils at each formation. Is there something I can add to count only the unique elements in a section of a single column of the .csv file, rather than all elements?
You need to keep track of which fossils you have seen, per formation. A collections.defaultdict() object makes the coding easiest; it keeps a set per key (in the code below, the formations already recorded for each fossil) that you can test against:
    # Python 2 code; on Python 3 use open(..., newline="") and print(counts)
    import csv
    from collections import Counter, defaultdict

    fossil = 0   # fossil name is in the first column (?)
    form = 13    # formation is in the 14th column

    with open("bivalviagrdwis.csv", "rb") as inputfile:
        data = csv.reader(inputfile)
        next(data)  # skip the header
        seen = defaultdict(set)
        counts = Counter(
            row[form] for row in data
            if row[form]
            and row[form] not in seen[row[fossil]]
            and not seen[row[fossil]].add(row[form])
        )

    print counts
The above code 'streams' the CSV rows through one generator expression straight into the Counter() object; no intermediate list of rows is kept.
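The single-expression form relies on three small behaviours of the standard library, which the following sketch isolates. The keys and values used here ("Fossil A", "Alpha") are made up purely for illustration.

```python
from collections import Counter, defaultdict

seen = defaultdict(set)

# Accessing a missing key creates an empty set on the fly,
# so the membership test never raises KeyError.
first_time = "Alpha" not in seen["Fossil A"]    # True

# set.add() returns None, so `not seen[...].add(...)` is always True;
# the call is there only for its side effect of recording the value.
add_result = seen["Fossil A"].add("Alpha")      # None

# After recording, the same membership test fails.
second_time = "Alpha" not in seen["Fossil A"]   # False

# Counter accepts any iterable, including a generator expression.
letter_counts = Counter(ch for ch in "abca" if ch != "c")
```

Because `and` short-circuits, the `add()` call only runs for rows that passed the two earlier tests.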
Each row is tested to:
- see that the formation column is not empty
- see that the formation has not yet been recorded for the given fossil
- record the formation for the given fossil
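The three steps above can also be written as an explicit loop, which may be easier to read and debug. This is a minimal sketch in Python 3, not the code from the answer; the sample data, the column positions (0 and 1) and the function name count_unique_fossils are assumptions made so the example runs on its own.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample data: fossil name in column 0, formation in column 1.
SAMPLE = """fossil,formation
Fossil A,Alpha
Fossil A,Alpha
Fossil B,Alpha
Fossil A,Beta
Fossil C,
"""

FOSSIL = 0
FORM = 1


def count_unique_fossils(lines):
    """Count each (fossil, formation) pair once, tallied per formation."""
    reader = csv.reader(lines)
    next(reader)  # skip the header
    seen = defaultdict(set)  # fossil name -> formations already recorded
    counts = Counter()
    for row in reader:
        formation = row[FORM]
        if not formation:
            continue  # step 1: formation column is empty
        if formation in seen[row[FOSSIL]]:
            continue  # step 2: already recorded for this fossil
        seen[row[FOSSIL]].add(formation)  # step 3: record it
        counts[formation] += 1
    return counts


counts = count_unique_fossils(io.StringIO(SAMPLE))
print(counts)  # Counter({'Alpha': 2, 'Beta': 1})
```

To run this against a real file in Python 3, pass a file opened with open(path, newline="") instead of the io.StringIO object.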
I've assumed the fossil name is in column 0; you didn't specify how to extract the fossil name in your question.
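If the file has a header row, one way to avoid guessing column positions is to look the columns up by name with csv.DictReader. The sketch below is a Python 3 variant that keeps a set of fossil names per formation and reports the size of each set; unlike the streaming version it does hold those sets in memory. The header names "species" and "formation" and the sample rows are hypothetical, since the question does not show the real headers.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data with assumed header names.
SAMPLE = """species,formation
Fossil A,Alpha
Fossil A,Alpha
Fossil B,Alpha
Fossil A,Beta
"""


def unique_fossils_by_name(lines, fossil_col="species", formation_col="formation"):
    """Return {formation: number of distinct fossil names}."""
    fossils = defaultdict(set)  # formation -> set of fossil names
    for row in csv.DictReader(lines):
        if row[formation_col]:  # skip rows without a formation
            fossils[row[formation_col]].add(row[fossil_col])
    return {formation: len(names) for formation, names in fossils.items()}


by_name = unique_fossils_by_name(io.StringIO(SAMPLE))
print(by_name)  # {'Alpha': 2, 'Beta': 1}
```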