Filter paragraphs from an XML feed using PHP and HTMLPurifier, SimpleXMLElement or DOM


I'm trying to remove the social media buttons, leaving only the paragraphs, from the description field of this XML (it's too big to post here).

Edit: since some of you couldn't access the XML, here follows part of one of the description tags:

    <description>  <!-- twitter https://twitter.com/about/resources/buttons#tweet --> <script> document.write('<a href="https://www.twitter.com/tst_oficial" class="twitter-follow-button" data-show-count="false" data-lang="pt">seguir</a>'); !function(d,s,id){var js,fjs=d.getelementsbytagname(s)[0];if(!d.getelementbyid(id)){js=d.createelement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentnode.insertbefore(js,fjs);}}(document,"script","twitter-wjs");</script>  <!-- curtir site facebook (enviar) --> <iframe class="fb_ltr" src="http://www.facebook.com/plugins/like.php?href=https://www.facebook.com/tstjus&layout=button_count&show_faces=false&action=like&colorscheme=light&width=25&height=25&locale=pt_br" scrolling="no" frameborder="0" style="border:0px; margin-left:30px; overflow:hidden; width:120px; height:25px;vertical-align:bottom;" allowtransparency="true"></iframe>  <!-- google plus +1--> <script type="text/javascript" src="https://apis.google.com/js/plusone.js"></script>   <g:plusone size="medium" href="https://plus.google.com/103151838647081346830" style="border-left:-200px"></g:plusone>  </div> </br></br>   <div class="modelo_noticia">   <div>    <div style="float: left; width:47%; text-align:center; margin: 0 9px 0 0;"><a href="/image/journal/article?img_id=5733388&t=1377023456174" target="_blank" style="text-decoration:none; color:black;"><img src="/image/journal/article?img_id=5733388&t=1377023456174" style="margin: 0 5px; width:98%;"/><span style="font-style:italic;"></span> </a></div>    <p> &nbsp;</p>    <p style="text-align: justify;"> <span style="font-size:12px;">"a clt continua atual enq...a.</span></p>    <p style="text-align: justify;"> <span style="font-size:12px;">...or.</span></p>    <p style="text-align: justify;"> <span style="font-size:12px;">o min...do".</span></p>    <p style="text-align: justify;"> <span style="font-size:12px;">ca...as".</span></p>    <p style="text-align: justify;"> <span style="font-size:12px;">ao 
enc...izou.</span></p>     <p style="text-align: justify;"> <span style="font-size:12px;">também parti...o.</span></p>    <p style="text-align: justify;"> <span style="font-size:12px;">ao a...ócio".</span></p>     <p style="text-align: justify;"> <span style="font-size:12px;"><strong>debate: reforma na clt</strong></span></p>    <p style="text-align: justify;"> <span style="font-size:12px;">o min...s.</span></p>    <p style="text-align: justify;"> <span style="font-size:12px;">ao...disse.</span></p>    <p style="text-align: justify;"> <span style="font-size:12px;">o m...o o país". &nbsp;&nbsp;</span></p>  <p style="text-align: justify;"> <span style="font-size:12px;">(fernanda loureiro)</span></p>   </div>   <div style="clear:both;"></div>  </div>  <div style="vertical-align:bottom !important">   <!-- facebook curtir --> <!-- <script src="http://connect.facebook.net/pt_br/all.js#xfbml=1"></script>   <fb:like layout="button_count" show_faces="true" width="80"></fb:like>-->   <iframe class="fb_ltr" src="http://www.facebook.com/plugins/like.php?href=http://www.tst.jus.br/noticias/-/asset_publisher/89dk/content/{rss=true}&layout=button_count&show_faces=false&action=like&colorscheme=light&width=25&height=25&locale=pt_br" scrolling="no" frameborder="0" style="border:none;border:0;margin-left:0; overflow:hidden; width:95px; height:25px;horizontal-align:left;vertical-align:bottom;" allowtransparency="true"></iframe>   <!-- twittar --> <span style="margin-left:20px;"> <script type="text/javascript"> var endereco; endereco = window.location.href; document.write('<a href="http://twitter.com/share?url=' + endereco + '" class="twitter-share-button" data-text="presidente tst diz que trabalho precisa ser valorizado sem perda de competitividade" data-count="horizontal" data-via="tst_oficial">tweet</a>') </script><script type="text/javascript" src="http://platform.twitter.com/widgets.js"></script> </span>   <!-- ok facebook recomendar --> <!--<iframe id="f2ee48257c" 
name="f1f8d54994" frameborder="0" scrolling="no" style="border: none; overflow: hidden; height: 20px; width: 200px;" title="like content on facebook." class="fb_ltr" src="http://www.facebook.com/plugins/like.php?api_key=228619377180035&amp;locale=pt_br&sdk=joey&channel_url=http://www.facebook.com/tstjus?fref=ts&version=18%23cb%3df360a99c9c&origin=http://www.tst.jus.br/noticias&href=http://www.tst.jus.br/noticias%26relation%3dparent.parent&node_type=link&width=150&font=arial&layout=button_count&colorscheme=light&show_faces=false&send=true&extended_social_context=false&action=recommend" allowtransparency="true"></iframe>-->   <iframe border="0" frameborder="0" scrolling="no" class="fb_ltr" id="f2ee48257c" name="f1f8d54994" style="border:none;margin-left:0; overflow:hidden; width:200px; height:25px;horizontal-align:left;vertical-align:bottom;" allowtransparency="true" title="enviar notícia no facebook" class="fb_ltr" src="http://www.facebook.com/plugins/like.php?api_key=228619377180035&locale=pt_br&sdk=joey&channel_url=http://www.tst.jus.br/noticias%3fversion%3d18%23cb%3df360a99c9c%26origin%3dhttp://www.tst.jus.br/noticias%26relation%3dparent.parent&amp;href=http://www.tst.jus.br/noticias&node_type=link&amp;width=150&amp;font=arial&amp;layout=button_count&amp;colorscheme=light&show_faces=false&send=true&amp;extended_social_context=false&action=recommend"></iframe>    <!-- youtube --> <a href="http://www.youtube.com/tst" target="_blank"> <img src="http://www.tst.jus.br/image/image_gallery?uuid=49d1dfeb-fba6-48be-9984-c2ba7dac709e&groupid=10157&t=1359131490760" border="0" title="inscrição no canal youtube tst" alt="inscrição no canal youtube tst"></a>  </div> </br> </description> 

I've tried using a regex, but it only gets the first paragraph ('#<p[^>]*>(.*)</p>#isu'). With SimpleXMLElement and DOM I keep getting errors (I don't know them well, but they seem to be the best way to do it), and HTMLPurifier filters everything and returns nothing relevant.
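For reference, here is a minimal sketch of why the pattern quoted above misbehaves: with the `s` flag, the greedy `(.*)` runs to the last `</p>` in the string, so everything lands in one match. The sample fragment and variable names are made up for illustration; a lazy `(.*?)` with `preg_match_all` returns each paragraph separately, though regex remains fragile on markup this messy.

```php
<?php
// Hypothetical sample, shortened from the shape of the <description> content.
$html = '<div><p style="text-align: justify;"><span>First.</span></p>'
      . '<p><span>Second.</span></p></div>';

// Greedy: (.*) extends to the LAST </p>, so both paragraphs end up in one capture.
preg_match('#<p[^>]*>(.*)</p>#isu', $html, $greedy);

// Lazy: (.*?) stops at the nearest </p>; preg_match_all collects every paragraph.
preg_match_all('#<p[^>]*>(.*?)</p>#isu', $html, $lazy);

// strip_tags() removes the inner <span> markup, leaving plain text.
$paragraphs = array_map('strip_tags', $lazy[1]);
```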

Here is how I did it in the end (following Puggan Se's suggestion):

    $i = 0;
    $feed = '<xml string>'; // the whole XML string goes here

    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false;       // drop whitespace-only text nodes
    $dom->loadXML($feed, LIBXML_PARSEHUGE); // LIBXML_PARSEHUGE for long XML documents
    $dom->formatOutput = true;              // only affects saveXML() output

    $xml = new DOMXPath($dom);
    // register the Dublin Core namespace under the prefix used in the queries below
    $xml->registerNamespace('dc', 'http://purl.org/dc/elements/1.1/');

    // evaluate the XPath queries
    $source       = $xml->evaluate("//channel/title");
    $titles       = $xml->evaluate("//item/title");
    $links        = $xml->evaluate("//item/link");
    $dates        = $xml->evaluate("//item/dc:date");
    $descriptions = $xml->evaluate("//item/description");

    // echo the channel's title
    if ($source->length > 0) {
        $source = $source->item(0)->nodeValue;
        echo $source . '<br /><br />';
    }

    // echo the items
    foreach ($titles as $title) {
        echo "{$titles->item($i)->nodeValue}<br /><br />";
        echo "{$links->item($i)->nodeValue}<br /><br />";
        echo "{$dates->item($i)->nodeValue}<br /><br />";

        // filter the <p><span> text out of <description>
        $description = "{$descriptions->item($i)->nodeValue} ";
        $description = mb_convert_encoding($description, 'HTML-ENTITIES', 'UTF-8');

        $domtmp = new DOMDocument();
        $domtmp->loadHTML($description);
        $xmltmp = new DOMXPath($domtmp);
        $desc = $xmltmp->evaluate("//p/span");

        foreach ($desc as $node) {
            echo "<p>{$node->nodeValue}</p>";
        }
        $i++;
    }
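One possible tidy-up of the description step, offered as a sketch rather than a definitive fix: move the filtering into a small function, silence libxml's warnings about the malformed markup, and skip paragraphs that contain only whitespace or `&nbsp;`. The name `extractParagraphs()` and the sample fragment are hypothetical, and the sketch queries `//p` instead of `//p/span`, which is an assumption about what should be kept. The `<?xml encoding="UTF-8">` prefix is a widely used workaround meant to make `loadHTML()` read the input as UTF-8; its behaviour may vary with the libxml2 version.

```php
<?php
/**
 * Return the non-empty paragraph texts found in an HTML fragment.
 * extractParagraphs() is a hypothetical helper, not part of the original code.
 */
function extractParagraphs(string $html): array
{
    // Collect libxml warnings (stray </div>, <g:plusone>, ...) instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // Encoding hint prefix: a common workaround so the fragment is read as UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $paragraphs = [];
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//p') as $node) {
        // Trim ordinary whitespace and non-breaking spaces (&nbsp;).
        $text = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $node->textContent);
        if ($text !== '') {
            $paragraphs[] = $text;
        }
    }
    return $paragraphs;
}

// Hypothetical fragment shaped like the <description> content above.
$sample = '<script>document.write("x");</script></div></br>'
        . '<div><p> &nbsp;</p><p style="text-align: justify;"><span>First.</span></p>'
        . '<p><span><strong>Debate</strong></span></p></div>';

$result = extractParagraphs($sample);
```

Inside the item loop this would replace the temporary DOMDocument block with something like `foreach (extractParagraphs($description) as $text) { echo "<p>{$text}</p>"; }`.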

Do you know how to improve it?

Thank you very much for the help!

