Python - Looking for a strategy for parsing a file
I'm an experienced C programmer, but a complete Python newbie. I'm learning Python mostly for fun, and as a first exercise I want to parse a text file, extracting the meaningful bits from the fluff, and ending up with a tab-delimited string of those bits in a different order.
I've had a blast plowing through tutorials, documentation and Stack Overflow Q&As, merrily splitting strings, reading lines from files and so on. I think I'm at the point where I need a few road signs from experienced folks to avoid blind alleys.
Here's one chunk of the text I want to parse (you may recognize it as a McMaster order). The actual file will contain one or more chunks like this.
1 92351a603 lag screw wood, 18-8 stainless steel, 5/16" diameter, 5" long, packs of 5 part number: 7218-gyroid 22 packs today 5.85 per pack 128.70
Note that the information is split over several lines in the file. I'd like to end up with a tab-delimited string that looks like this:
22\tpacks\tlag screw wood, 18-8 stainless steel, 5/16" diameter, 5" long, packs of 5\t\t92351a603\t5.85\t\t128.70\t7218-gyroid\n
So I need to extract some parts of the string while ignoring others, rearrange them a bit, and re-pack them into a string.
Here's the (very early) code I have at the moment. It reads the file a line at a time and splits each line on the delimiters, so I end up with several lists of strings, including a bunch of empty ones where there were double tabs:
import re
import sys

def split(delimiters, string, maxsplit=0):
    """Split the given string on the given delimiters (a sequence of strings).

    This function is lifted from a Stack Overflow post by Kos.
    """
    # Escape each delimiter so it is matched literally, then OR them together.
    regexpattern = '|'.join(map(re.escape, delimiters))
    return re.split(regexpattern, string, maxsplit=maxsplit)

delimiters = "\t", "\n", "\r", "your part number: "

# The with statement closes the file on exit, so no explicit close() is needed.
with open(sys.argv[1], 'r') as f:
    for line in f:
        print(split(delimiters, line))
Question 1 is basic: how can I remove the empty strings from the lists, then mash all the strings into one list? In C I'd loop through all the lists, ignoring the empties and sticking the other strings in a new list. But I have a feeling Python has a more elegant way to do this sort of thing.
Question 2 is more open ended: what's a robust strategy here? Should I read more than one line at a time in the first place? Make a dictionary, allowing easier re-ordering of the items later?
Sorry for the novel. Thanks for any pointers. And please, stylistic comments are more than welcome, style matters.
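You can remove the empty strings with:
new_list = list(filter(None, old_list))
Replace the first parameter with a lambda expression that is true for the elements you want to keep. Passing None is equivalent to lambda x: x. In Python 3, filter returns an iterator, hence the list() call.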
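To make the difference between None and a custom predicate concrete, here is a small sketch. The contents of `old_list` are made up for illustration; they only resemble what splitting one line of the order might produce.

```python
# Hypothetical result of splitting one line; note the empty and whitespace-only entries.
old_list = ["22", "", "packs", "  ", "today", ""]

# None as the predicate keeps every truthy element, so "" is dropped but "  " stays.
non_empty = list(filter(None, old_list))

# A lambda lets you choose the test; here whitespace-only strings are dropped as well.
non_blank = list(filter(lambda s: s.strip(), old_list))

# The equivalent list comprehension, which many Python programmers find easier to read.
non_blank_lc = [s for s in old_list if s.strip()]

print(non_empty)
print(non_blank)
```

Whether filter or a list comprehension reads better is largely a matter of taste; both walk the list once.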
You can mash the strings into one string using:
a_string = "".join(list_of_strings)
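If you have several lists (of whatever) and you want to join them into one list, then:
from functools import reduce  # reduce is no longer a builtin in Python 3
new_list = reduce(lambda x, y: x+y, old_list)
That simply concatenates the lists. For the string join above, you can use any non-empty string as the separator instead of "".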
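As a rough illustration of merging and joining under Python 3, the sketch below flattens a list of lists and joins the result with tabs. The data in `split_lines` is invented, and `itertools.chain.from_iterable` is offered only as a common alternative to reduce, not as something the answer above prescribes.

```python
from functools import reduce  # reduce lives in functools in Python 3
from itertools import chain

# Hypothetical output of splitting two lines, including empty strings.
split_lines = [
    ["1", "92351a603", "lag screw", ""],
    ["", "", "7218-gyroid", ""],
]

# Merge the lists with reduce, as in the answer.
merged_reduce = reduce(lambda x, y: x + y, split_lines)

# The same merge with chain.from_iterable, which avoids building intermediate lists.
merged_chain = list(chain.from_iterable(split_lines))

# Drop the empty strings, then join with a non-empty separator (a tab here).
fields = [s for s in merged_chain if s]
row = "\t".join(fields)
print(row)
```

For a handful of short lists either merge is fine; the chain version tends to scale better because each `x + y` in the reduce version copies the accumulated list.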
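If you're new to Python, functions like filter and reduce (edit: in Python 3, reduce has moved to the functools module and filter returns an iterator rather than a list) may seem a bit alien, but they save a lot of time coding, so it's worth getting to know them.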
I think you're on the right track to solving your problem. I'd do this:
- Break the input into lines
- Break the resulting list into smaller lists, one list per order
- Parse the orders into "something meaningful"
- Sort, and output the result
Personally, I'd make a class to handle the last two parts (they kind of belong together logically), but you could get by without it.
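The steps above could be sketched roughly as follows, using a dictionary per order as the question suggests. This leans on several assumptions that are not confirmed by the question: that every order yields exactly ten non-empty fields in the same sequence, that the field names in `FIELD_NAMES` are sensible labels, and that the tabs and line breaks in `sample` resemble the real file. Treat all of those as placeholders to adjust.

```python
import re

# Delimiters taken from the question's code.
DELIMITERS = ("\t", "\n", "\r", "your part number: ")
SPLIT_RE = re.compile("|".join(map(re.escape, DELIMITERS)))

# Assumption: every order yields exactly these non-empty fields, in this order.
FIELD_NAMES = ("line", "part", "description", "my_part", "quantity",
               "unit", "ships", "unit_price", "per", "total")

# Column order of the desired output; None marks an intentionally empty column.
OUTPUT_COLUMNS = ("quantity", "unit", "description", None, "part",
                  "unit_price", None, "total", "my_part")


def tokenize(text):
    """Split the whole text on the delimiters and drop empty fields."""
    return [field.strip() for field in SPLIT_RE.split(text) if field.strip()]


def parse_orders(text):
    """Group the flat field list into one dict per order."""
    fields = tokenize(text)
    size = len(FIELD_NAMES)
    return [dict(zip(FIELD_NAMES, fields[i:i + size]))
            for i in range(0, len(fields), size)]


def format_order(order):
    """Build one tab-delimited output line from an order dict."""
    cells = [order[name] if name else "" for name in OUTPUT_COLUMNS]
    return "\t".join(cells) + "\n"


# Hypothetical input: the tabs and line breaks are guessed, not copied from a real file.
sample = (
    '1\t92351a603\tlag screw wood, 18-8 stainless steel, 5/16" diameter, '
    '5" long, packs of 5\n'
    '\tyour part number: 7218-gyroid\n'
    '\t22\n'
    '\tpacks\ttoday\n'
    '\t5.85\n'
    '\tper pack\t128.70\n'
)

orders = parse_orders(sample)
output = "".join(format_order(order) for order in orders)
print(output, end="")
```

Reading the whole file at once and chunking by field count keeps the code short, but it breaks if an order is missing a field (format_order would raise a KeyError). If the real file is less regular, detecting the start of each order explicitly would likely be more robust.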