JavaScript + RegEx Complications: Searching Strings Not Containing a Substring
I am trying to use a regex to search through a long string, and I am having trouble coming up with an expression. I am trying to search through HTML for a set of tags beginning with a tag containing one value and ending with a different tag containing another value. The code I am using to attempt this is as follows:
matcher = new RegExp(".*(<[^>]+" + starttext + "((?!" + endtext + ").)*" + endtext + ")", 'g');
data.replace(matcher, "$1");
The strangeness around the middle, ((?!endtext).)*, is borrowed from another thread that seems to describe my problem. The issue I am facing is that the expression matches the beginning tag, but it does not find the ending tag and instead includes the remainder of the data. Also, the lookaround in the middle slowed the expression down a lot. Any suggestions as to how I can get this working?
Edit: I understand that parsing HTML with regex isn't the best option (it makes me feel dirty), but I'm in a time crunch and any other alternative I can think of would take too long. It's hard to say exactly what the markup I'm parsing looks like, as I am creating it on the fly. The best I can do is say that I am looking at a large table of data collected for a range of items over a range of dates. Both of these ranges can vary, and I am trying to select a range of dates from a single row. The approximate value of starttext and endtext is:
@@asset_id@@_<yyyy_mm_dd>
The idea is to find the code that corresponds to this range of cells. (This edit has quite possibly made things more confusing, but I'm not sure how much more information I can give without explaining the entire application.)
Edit: Well, this was a stupid question. Apparently, I just forgot to add .* after the last paren. I can't believe I spent so long on this! Thanks to all of you who tried to help!
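For anyone landing here with the same symptom, here is a minimal sketch of what was going on. The sample data and both ids are made up for illustration and are not from the real application, and the data is assumed to sit on a single line (the dot does not match newlines by default). replace() only substitutes the part of the string that the regex matched, so whatever follows the end text is left in place unless a trailing .* consumes it.

```javascript
// Hypothetical sample data; the ids are invented for this example.
const data = 'head <td id="a_2020_01_01">1</td><td id="a_2020_01_02">2</td> tail';
const starttext = 'a_2020_01_01';
const endtext = 'a_2020_01_02';

// The capture group from the question, built once and reused below.
const core = '(<[^>]+' + starttext + '((?!' + endtext + ').)*' + endtext + ')';

// Without a trailing .*, the text after the match survives the replace.
const withoutTail = data.replace(new RegExp('.*' + core, 'g'), '$1');

// With a trailing .*, the rest of the line is part of the match and is dropped.
const withTail = data.replace(new RegExp('.*' + core + '.*', 'g'), '$1');

console.log(withoutTail);
console.log(withTail);
```

The first result still carries everything after the end text, which is the "includes the remainder of the data" behaviour described above; the second keeps only the captured group.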
First of all, why is there a .* (dot asterisk) at the beginning? If you have text like the following:
this is my text
and you want "my text" pulled out, you use my\stext. You don't have to add the .* at all.
That being said, since you'll only be matching what you need, you don't need the main capture group around "everything". This:
.*(xxx)
is a huge no-no, and can be replaced with just this:
xxx
In other words, your regex could be replaced with:
<[^>]+xxx((?!zzz).)*zzz
From there, I would examine what it's actually doing.
- You are looking for an HTML opening delimiter, <. You consume it.
- You consume at least one character that is not a closing HTML delimiter, but you can consume many. This is important, because if your tag is <table border=2>, then you have, at minimum, so far consumed <t, if not more.
- You are now looking for starttext. If that starttext is table, you'll never find it, because you have already consumed the t. So replace that + with a *.
- The regex is still a success if what follows is not the closing text, but it starts from the end of the document, because the asterisk is being greedy. I suggest making it lazy by adding a ?.
- When the backtracking fails, it will look at the closing text and gather it successfully.
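The + versus * point can be checked with a small sketch. The tag is the one from the list above; the variable names are only for illustration.

```javascript
// The example tag from the explanation above.
const tag = '<table border=2>';

// With +, at least one character must be consumed after "<",
// so "table" can no longer be found right after the delimiter.
const withPlus = /<[^>]+table/;

// With *, zero characters may be consumed, so "table" can follow "<" directly.
const withStar = /<[^>]*table/;

console.log(withPlus.test(tag)); // false
console.log(withStar.test(tag)); // true
```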
The result of this logic:
<[^>]*xxx((?!zzz).)*?zzz
If you're going to use a dot anyway, which is okay for new regex writers but not suggested for seasoned ones, I'd go with this:
<[^>]*xxx.*?zzz
So for JavaScript, your code would say:
matcher = new RegExp("<[^>]*" + starttext + ".*?" + endtext, 'gi');
I put the ignoreCase "i" in there for good measure, but you may or may not want that.
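As a rough illustration of why the lazy .*? matters in that final expression, here is a sketch comparing it with a greedy .* on made-up data. The sample string and the values "start" and "end" are assumptions for the example, and everything is assumed to be on one line.

```javascript
// Hypothetical single-line markup in which the end text appears twice.
const data = '<td id="start">a</td><td id="end">b</td><td id="end">c</td>';
const starttext = 'start';
const endtext = 'end';

// Greedy: .* runs to the end of the line and backtracks to the LAST end text.
const greedy = new RegExp('<[^>]*' + starttext + '.*' + endtext, 'gi');

// Lazy: .*? expands one character at a time and stops at the FIRST end text.
const lazy = new RegExp('<[^>]*' + starttext + '.*?' + endtext, 'gi');

const greedyMatch = data.match(greedy)[0];
const lazyMatch = data.match(lazy)[0];

console.log(greedyMatch);
console.log(lazyMatch);
```

If starttext or endtext can contain characters that are special in a regex, they would need escaping before being concatenated into the pattern; that is left out of this sketch.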