JavaScript + RegEx Complications: Searching Strings Not Containing a Substring


I am trying to use a regex to search through a long string, and I am having trouble coming up with an expression. I am trying to search through some HTML for a set of tags, beginning with a tag containing a certain value and ending with a different tag containing another value. The code I am using to attempt this is as follows:

matcher = new RegExp(".*(<[^>]+" + starttext + "((?!" + endtext + ").)*" + endtext + ")", 'g');
data.replace(matcher, "$1");

The strangeness around the middle, ((?!endtext).)*, is borrowed from another thread, found here, that seems to describe my problem. The issue I am facing is that the expression matches the beginning tag, but it does not find the ending tag and instead includes the remainder of the data. Also, the lookaround in the middle has slowed the expression down a lot. Any suggestions as to how I can get this working?

Edit: I understand that parsing HTML with regex isn't the best option (it makes me feel dirty), but I'm in a time crunch and any other alternative I can think of would take too long. It's hard to say what the markup I'm parsing will look like, as I am creating it on the fly. The best I can do is say that I am looking at a large table of data collected for a range of items over a range of dates. Both of these ranges can vary, and I am trying to select a range of dates from a single row. The approximate value of starttext and endtext is @@asset_id@@_<yyyy_mm_dd>. The idea is to find the code that corresponds to this range of cells. (This edit could quite possibly have made this more confusing, but I'm not sure how much more information I can give without explaining the entire application.)

Edit: Well, this was a stupid question. Apparently, I forgot to add .* after the last paren. I can't believe I spent so long on this! Thanks to all of you who tried to help!
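For readers wondering why the missing .* mattered: String.prototype.replace only substitutes the part of the string that the pattern actually matched, so anything after the closing text is left in place. The sketch below illustrates this under assumptions; the sample markup and the marker values A_01 and A_02 are made up for illustration and are not the asker's real data.

```javascript
// Hypothetical sample row; the real markup is generated on the fly.
const data = '<tr><td id="A_01">1</td><td id="A_02">2</td><td id="A_03">3</td></tr>';
const starttext = "A_01"; // assumed marker values
const endtext = "A_02";

const core = "(<[^>]+" + starttext + "((?!" + endtext + ").)*" + endtext + ")";

// Pattern from the question: nothing after the last paren, so the text
// following endtext is never part of the match and survives the replace.
const withoutTail = new RegExp(".*" + core, "g");

// Same pattern with .* appended: the rest of the line is matched too,
// so replacing the whole match with $1 discards it.
const withTail = new RegExp(".*" + core + ".*", "g");

const resultWithoutTail = data.replace(withoutTail, "$1");
const resultWithTail = data.replace(withTail, "$1");

console.log(resultWithoutTail); // range followed by the remainder of the data
console.log(resultWithTail);    // only the range
```

Note that replace returns a new string rather than changing data, so the result has to be assigned somewhere to be used.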

First of all, why is there a .* (dot asterisk) at the beginning? If you have text like the following:

this is my text

and you want "my text" pulled out, you write my\stext. You don't have to have the .*.
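As a quick illustration of that point, a regex search scans forward on its own, so no leading .* is needed to skip over the text in front of the match. The sample sentence here is only an assumption for the demo.

```javascript
// Sample sentence chosen for illustration.
const text = "this is my text";

// No leading .* is needed: match() finds the first position where the pattern fits.
const found = text.match(/my\stext/);

console.log(found[0]);    // the matched substring
console.log(found.index); // where the match starts in the input
```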

That being said, since all you'll be matching is what you need, you don't need the main capture group around "everything". This: .*(xxx) is a huge no-no, and can almost always be replaced with this: xxx. In other words, your regex could be replaced with:

<[^>]+xxx((?!zzz).)*zzz 

From there, I would examine what it's really doing.

  1. You are looking for an HTML opening delimiter <. You consume it.
  2. You consume at least one character that is not a closing HTML delimiter, but you can consume many. This is important, because if your tag is <table border=2>, then you have, at minimum, consumed <t so far, if not more.
  3. You are now looking for the starttext. If that starttext is table, you'll never find it, because you have already consumed the t. So replace that + with a *.
  4. Because the asterisk is greedy, the repeated group grabs as much as it can before the regex ever looks for the closing text. I suggest making it lazy by adding a ?.
  5. The lazy version tries the closing text first. Each time that attempt fails, it takes one more character and tries again, until it finds the closing text and gathers it successfully.
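The effect of step 3 can be checked with a small sketch. The markup and the marker values (table as starttext, </td> as endtext) are assumptions chosen to reproduce the <table border=2> case from the list.

```javascript
// Hypothetical markup where the start text is the very first thing after "<".
const html = "<table border=2><tr><td>cell</td></tr></table>";
const starttext = "table"; // assumed marker values
const endtext = "</td>";

// [^>]+ must consume at least one character, which eats the "t" of "table".
const plus = new RegExp("<[^>]+" + starttext + "((?!" + endtext + ").)*" + endtext);

// [^>]* may consume nothing, so "table" can follow "<" directly.
const star = new RegExp("<[^>]*" + starttext + "((?!" + endtext + ").)*?" + endtext);

const plusMatch = html.match(plus);
const starMatch = html.match(star);

console.log(plusMatch);    // null: the + version cannot match here
console.log(starMatch[0]); // opening tag through the first closing text
```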

The result of this logic:

<[^>]*xxx((?!zzz).)*?zzz 

If you're going to use the dot anyway, which is okay for new regex writers but not suggested for the seasoned, I'd go with this:

<[^>]*xxx.*?zzz 
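To see what the ? after the asterisk changes, here is a small comparison of the lazy and greedy forms. The sample row and the one-letter markers x and y are placeholders standing in for xxx and zzz.

```javascript
// Hypothetical row with one start marker (x) and two end markers (y).
const row = "<td class=x>1</td><td class=y>2</td><td class=y>3</td>";

const lazy = /<[^>]*x.*?y/;  // stops at the first y after the start marker
const greedy = /<[^>]*x.*y/; // runs on to the last y on the line

const lazyMatch = row.match(lazy)[0];
const greedyMatch = row.match(greedy)[0];

console.log(lazyMatch);
console.log(greedyMatch);
```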

So for JavaScript, your code would say:

matcher = new RegExp("<[^>]*" + starttext + ".*?" + endtext, 'gi');

I put the ignoreCase "i" in there for good measure, but you may or may not want that.
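Putting it together, a minimal end-to-end sketch might look like the following. The helper names escapeRegExp and findRanges, the sample row, and the marker values are all hypothetical. Escaping the markers is an extra precaution that is not part of the answer above; it only matters if the markers ever contain regex metacharacters.

```javascript
// Hypothetical helper: escapes regex metacharacters so markers match literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns every range that starts at a tag containing
// starttext and ends at the first following occurrence of endtext.
function findRanges(data, starttext, endtext) {
  const matcher = new RegExp(
    "<[^>]*" + escapeRegExp(starttext) + ".*?" + escapeRegExp(endtext),
    "gi"
  );
  return data.match(matcher) || []; // match() returns null when nothing is found
}

// Made-up row loosely following the @@asset_id@@_<yyyy_mm_dd> shape.
const data =
  '<td id="@@7@@_2010_01_01">a</td>' +
  '<td id="@@7@@_2010_01_02">b</td>' +
  '<td id="@@7@@_2010_01_03">c</td>';

const ranges = findRanges(data, "@@7@@_2010_01_01", "@@7@@_2010_01_02");
console.log(ranges);
```

One caveat: in JavaScript the dot does not match line terminators unless the s flag is set, so a range that spans several lines of markup would need something like [\s\S]*? in place of .*?.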

