python - Looking for a strategy for parsing a file


I'm an experienced C programmer, but a complete Python newbie. I'm learning Python for fun, and as a first exercise I want to parse a text file, extracting the meaningful bits from the fluff, and ending up with a tab-delimited string of those bits in a different order.

I've had a blast plowing through tutorials, documentation, and Stack Overflow Q&As, merrily splitting strings, reading lines from files, and so on. I think I'm at the point where I need a few road signs from experienced folks to avoid blind alleys.

Here's one chunk of the text I want to parse (you may recognize it as a McMaster order). The actual file will contain one or more chunks like this.

1   92351a603   lag screw wood, 18-8 stainless steel, 5/16" diameter, 5" long, packs of 5 part number: 7218-gyroid 22 packs   today 5.85 per pack     128.70 

Note that the information is split across several lines in the file. I'd like to end up with a tab-delimited string that looks like this:

22\tpacks\tlag screw wood, 18-8 stainless steel, 5/16" diameter, 5" long, packs of 5\t\t92351a603\t5.85\t\t128.70\t7218-gyroid\n 

So I need to extract some parts of the string while ignoring others, rearrange them a bit, and re-pack them into a string.

Here's the (very early) code I have at the moment. It reads the file a line at a time and splits each line on the delimiters, so I end up with several lists of strings, including a bunch of empty ones where there were double tabs:

import re
import sys

def split(delimiters, string, maxsplit=0):
    """Split the given string by the given delimiters (an array of strings).
    This function is lifted from Stack Overflow, from a post by Kos."""
    # Escape each delimiter and join them into one alternation pattern.
    regex_pattern = '|'.join(map(re.escape, delimiters))
    return re.split(regex_pattern, string, maxsplit=maxsplit)

delimiters = "\t", "\n", "\r", "your part number: "
# The with block closes the file on exit, so no explicit close() is needed.
with open(sys.argv[1], 'r') as f:
    for line in f:
        print(split(delimiters, line))

Question 1 is basic: how can I remove the empty strings from the lists, then mash all the strings into one list? In C I'd loop through all the lists, ignoring the empties and sticking the other strings in a new list. But I have a feeling Python has a more elegant way to do this sort of thing.

Question 2 is more open ended: what's a robust strategy here? Should I read more than one line at a time in the first place? Make a dictionary, allowing easier re-ordering of the items later?

Sorry for the novel. Thanks for any pointers. And please, stylistic comments are more than welcome; style matters.

You can remove all the empty strings with:

new_list = list(filter(None, old_list))

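Replace the first parameter with a lambda expression that is true for the elements you want to keep. Passing None is equivalent to lambda x: x. In Python 3, filter returns an iterator, hence the list() call.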
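As a minimal sketch of both forms, assuming Python 3 and a made-up list of fields of the kind the question's split produces (the names fields, non_empty, and numeric are illustrative only):

```python
# Hypothetical fields from one split line; the empty strings come from double tabs.
fields = ["1", "", "92351a603", "", "lag screw", ""]

# filter(None, ...) keeps only truthy items, which drops the empty strings.
non_empty = list(filter(None, fields))

# A lambda (or any other predicate) selects on a custom condition instead.
numeric = list(filter(lambda s: s.isdigit(), fields))

# The equivalent list comprehension is often considered more readable.
non_empty_lc = [s for s in fields if s]

print(non_empty)
print(numeric)
```

The list comprehension does the same job as filter(None, ...) and is the form many Python programmers reach for first.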

You can mash the strings into one string using:

a_string = "".join(list_of_strings) 

If you have several lists (of whatever) and want to join them together into one list, then:

from functools import reduce  # reduce is not a builtin in Python 3
new_list = reduce(lambda x, y: x + y, old_list)

The "".join call simply concatenates the strings; you can use any non-empty string as the separator instead.
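A short sketch of flattening and joining, assuming Python 3; the lists value is made up for illustration, and itertools.chain is offered here as an alternative to reduce, not as something the answer above prescribes:

```python
from functools import reduce
from itertools import chain

# Hypothetical per-line lists, already stripped of empty strings.
lists = [["1", "92351a603"], ["22", "packs"], ["5.85", "128.70"]]

# reduce concatenates the lists pairwise, left to right.
flat_reduce = reduce(lambda x, y: x + y, lists)

# chain.from_iterable walks the inner lists without building intermediates.
flat_chain = list(chain.from_iterable(lists))

# Any string can act as the separator; a tab gives a tab-delimited line.
tab_line = "\t".join(flat_chain) + "\n"
print(repr(tab_line))
```

Both approaches give the same flat list; chain.from_iterable avoids creating a new list at every step, which matters only for large inputs.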

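If you're new to Python, functions like filter and reduce may seem a bit alien (edit: in Python 3, reduce has moved to the functools module and filter returns an iterator rather than a list), but they save a lot of time when coding, so it's worth getting to know them.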

I think you're on the right track to solving your problem. I'd do this:

  • break everything into lines
  • break the resulting list into smaller lists, one list per order
  • parse the orders into "something meaningful"
  • sort, output the result

Personally, I'd make a class to handle the last two parts (they kind of belong together logically), but you could get by without it.
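The steps above could be sketched roughly as follows. This is only an illustration under loud assumptions: the line breaks in SAMPLE are guessed (the question does not show them), every order is assumed to split into exactly ten non-empty tokens, and the Order class, its field names, and the helper functions are all hypothetical.

```python
import re
from dataclasses import dataclass

# ASSUMPTION: the real file's line breaks are not shown; this layout is a guess.
SAMPLE = (
    '1\t92351a603\tlag screw wood, 18-8 stainless steel, '
    '5/16" diameter, 5" long, packs of 5\n'
    'your part number: 7218-gyroid\n'
    '22\n'
    'packs\ttoday\n'
    '5.85\n'
    'per pack\t\t128.70\n'
)

DELIMITERS = ("\t", "\n", "\r", "your part number: ")
FIELDS_PER_ORDER = 10  # ASSUMPTION: each order yields exactly 10 tokens


@dataclass
class Order:
    """Hypothetical container for the tokens of one order."""
    line_no: str
    part_no: str
    description: str
    user_part_no: str
    quantity: str
    unit: str
    ships: str
    unit_price: str
    price_unit: str
    total: str

    def to_tsv(self):
        """Return the fields in the column order shown in the question."""
        columns = [self.quantity, self.unit, self.description, "",
                   self.part_no, self.unit_price, "", self.total,
                   self.user_part_no]
        return "\t".join(columns) + "\n"


def tokenize(text):
    """Split the whole text on the delimiters and drop empty tokens."""
    pattern = "|".join(map(re.escape, DELIMITERS))
    return [tok.strip() for tok in re.split(pattern, text) if tok.strip()]


def parse_orders(text):
    """Group the tokens into fixed-size chunks, one Order per chunk."""
    tokens = tokenize(text)
    if len(tokens) % FIELDS_PER_ORDER:
        raise ValueError("unexpected number of fields")
    return [Order(*tokens[i:i + FIELDS_PER_ORDER])
            for i in range(0, len(tokens), FIELDS_PER_ORDER)]


# Sort by the order's line number, then emit one tab-delimited line each.
for order in sorted(parse_orders(SAMPLE * 2), key=lambda o: int(o.line_no)):
    print(repr(order.to_tsv()))
```

Reading the whole file at once sidesteps the question of how many lines one order spans. The fixed token count is the fragile part: if real orders vary in shape, grouping on a recognizable marker (such as the part-number line) would likely be more robust.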

