JavaScript + RegEx Complications - Searching Strings Not Containing SubString
I am trying to use a regex to search through a long string, and I am having trouble coming up with an expression. I am trying to search through some HTML for a set of tags beginning with a tag containing one value and ending with a different tag containing another value. The code I am using to attempt this is as follows:
    matcher = new RegExp(".*(<[^>]+" + starttext + "((?!" + endtext + ").)*" + endtext + ")", 'g');
    data.replace(matcher, "$1");
The strangeness around the middle ( ((\\?\\!endtext).)* ) is borrowed from another thread that seems to describe my problem. The issue I am facing is that the expression matches the beginning tag, but it does not find the ending tag and instead includes the remainder of the data. Also, the lookaround in the middle slowed the expression down a lot. Any suggestions on how I can get this working?
Edit: I understand that parsing HTML with a regex isn't the best option (it makes me feel dirty), but I'm in a time crunch and any other alternative I can think of would take too long. It's hard to say exactly what the markup I will be parsing looks like, as I am creating it on the fly. The best I can do is say that I am looking at a large table of data collected for a range of items over a range of dates. Both of these ranges can vary, and I am trying to select a range of dates from a single row. The approximate value of starttext and endtext is \\@\\@asset_id\\@\\@_<yyyy_mm_dd>. The idea is to find the code that corresponds to that range of cells. (This edit may quite possibly have made things more confusing, but I'm not sure how much more information I can give without explaining the entire application.)
Edit: Well, this was a stupid question. Apparently, I just forgot to add .* after the last paren. I can't believe I spent so long on this! Thanks to all of you who tried to help!
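A minimal sketch of what that fix changes, using hypothetical sample values for data, starttext, and endtext (the real markup is not shown in the post, so these are assumptions for illustration only):

```javascript
// Hypothetical sample values; the real markup is generated on the fly.
const data = 'aaa<td id="S1">x</td><td id="E1">y</td>bbb';
const starttext = "S1";
const endtext = "E1";

// Original pattern: nothing matches the text after the end tag,
// so that text is left in place by replace().
const withoutTail = new RegExp(
  ".*(<[^>]+" + starttext + "((?!" + endtext + ").)*" + endtext + ")", "g");

// Pattern with the trailing .* added: the remainder is matched too,
// so replacing the whole match with $1 leaves only group 1.
const withTail = new RegExp(
  ".*(<[^>]+" + starttext + "((?!" + endtext + ").)*" + endtext + ").*", "g");

const before = data.replace(withoutTail, "$1");
const after = data.replace(withTail, "$1");

console.log(before); // still ends with ">y</td>bbb
console.log(after);  // only the captured range remains
```

With this sample, the first call strips only the leading text, which matches the "includes the remainder of the data" symptom, while the second returns just the captured span.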
First of all, why is there a .* (dot asterisk) at the beginning? If you have text like the following:
    this is my text
and you want "my text" pulled out, you use my\stext. You don't have to add the .*.
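As a small illustration of that point, here is a sketch with a sample string like the one above (the variable names are made up for the example). The pattern on its own finds the substring, with no leading .* needed:

```javascript
// Sample string; the regex engine scans forward on its own,
// so the pattern does not need a leading .* to reach "my text".
const text = "this is my text";
const found = text.match(/my\stext/);

console.log(found[0]);    // the matched substring
console.log(found.index); // position where the match starts
```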
That being said, since you'll now be matching only what you need, you don't need the main capture group around "everything". This: .*(xxx) is a huge no-no, and can be replaced with this: xxx. In other words, your regex could be replaced with:
    <[^>]+xxx((?!zzz).)*zzz
From there, I examine what it's doing.
- You are looking for the HTML opening delimiter <. You consume it.
- You consume at least one character that is not the closing HTML delimiter, but can consume many. This is important, because if the tag is <table border=2>, you have, at minimum, consumed <t so far, if not more.
- You are now looking for starttext. If starttext is table, you'll never find it, because you have already consumed the t. So replace the + with *.
- The regex still succeeds as long as what follows is not the closing text, but because the asterisk is greedy, it consumes as much as it can before the closing text is tried. I suggest making it lazy by adding ?.
- When the repetition stops, the engine looks for the closing text and matches it successfully.
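The + versus * point in the list above can be checked with a quick sketch, using the <table border=2> tag from the list and table standing in for the start text:

```javascript
// Sample tag from the list above; "table" stands in for the start text.
const tag = "<table border=2>";

// With +, [^>] must consume at least one character after "<",
// so the "t" is already gone when the engine looks for "table".
const plus = /<[^>]+table/.test(tag);

// With *, [^>] may consume nothing, so "table" can follow "<" directly.
const star = /<[^>]*table/.test(tag);

console.log(plus, star);
```

Here the + version fails to match and the * version succeeds.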
The result of that logic:
    <[^>]*xxx((?!zzz).)*?zzz
If you're going to use the dot anyway, which is okay for new regex writers but not suggested for seasoned ones, I'd go with this:
    <[^>]*xxx.*?zzz
So for JavaScript, your code would say:
    matcher = new RegExp("<[^>]*" + starttext + ".*?" + endtext, 'gi');
I put the ignore-case flag "i" in there for good measure; you may or may not want that.
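A rough usage sketch of that final pattern, assuming made-up sample data in which the end text appears twice (the values of data, starttext, and endtext are hypothetical). A greedy variant is included only to show why the lazy .*? matters:

```javascript
// Hypothetical sample values, not the asker's real data.
const data = '<tr><td id="S1">x</td><td id="E1">y</td><td id="E1">z</td></tr>';
const starttext = "S1";
const endtext = "E1";

// Lazy .*? stops at the first occurrence of the end text.
const lazy = new RegExp("<[^>]*" + starttext + ".*?" + endtext, "gi");
// Greedy .* (for comparison only) runs on to the last occurrence.
const greedy = new RegExp("<[^>]*" + starttext + ".*" + endtext, "gi");

// With the g flag, match() returns an array of all matched substrings.
const lazyMatch = data.match(lazy);
const greedyMatch = data.match(greedy);

console.log(lazyMatch[0]);
console.log(greedyMatch[0]);
```

Note that in JavaScript the dot does not match line breaks unless the s flag is set, so on multi-line markup something like [\s\S]*? may be needed in place of .*?.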