node.js - Scrap data from site with browser-based Template Engine


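I'm trying to scrape data from a page that renders its templates in the browser with a lot of JS. When playing with jsdom I can't get the data, maybe because the page doesn't have enough time to load or render. How can I scrape data in this case: should I use a timer, or download the page with request?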

jsdom.env({
  url: link,
  scripts: ["http://code.jquery.com/jquery.js"],
  done: function (errors, window) {
    var $ = window.$;
    var date = $('.date').text();
    console.log(date);
  }
});

A colleague of mine has a PhantomJS-based project that does exactly that: https://github.com/vmeurisse/phantomcrawl.

He has a simple example that looks a lot like your snippet:

'use strict';

var phantomcrawl = require('./src/phantomcrawl');

var urls = [];
urls.push('http://www.bing.com');

var ptc = new phantomcrawl({
    urls: urls,
    nbthreads: 4,
    crawlerperthread: 4,
    maxdepth: 1
});

urls is the list of URLs to crawl.

nbthreads is the number of PhantomJS instances to launch.

crawlerperthread is the number of pages crawled in parallel per PhantomJS instance.

maxdepth is the number of times the crawler follows the links present on a crawled page.
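
To make maxdepth a bit more concrete, here is a minimal sketch of a depth-limited crawl over a made-up, in-memory set of pages. This is not phantomcrawl's actual code: the links map and the crawl function are hypothetical names used only for illustration, and I'm assuming maxdepth counts link hops from the start URLs, so 0 means "only the start pages" and 1 means "the start pages plus the pages they link to".

```javascript
'use strict';

// Hypothetical in-memory "site": each URL maps to the links found on that page.
var links = {
    'http://example.com/': ['http://example.com/a', 'http://example.com/b'],
    'http://example.com/a': ['http://example.com/c'],
    'http://example.com/b': [],
    'http://example.com/c': ['http://example.com/d'],
    'http://example.com/d': []
};

// Breadth-first crawl that follows links at most maxDepth hops from the start URLs.
// (Illustrative only; assumed semantics, not taken from phantomcrawl.)
function crawl(startUrls, maxDepth) {
    var visited = [];
    var seen = {};
    var queue = [];

    startUrls.forEach(function (url) {
        if (!seen[url]) {
            seen[url] = true;
            queue.push({ url: url, depth: 0 });
        }
    });

    while (queue.length > 0) {
        var item = queue.shift();
        visited.push(item.url);

        if (item.depth >= maxDepth) {
            continue; // depth budget used up: do not follow this page's links
        }

        (links[item.url] || []).forEach(function (next) {
            if (!seen[next]) {
                seen[next] = true;
                queue.push({ url: next, depth: item.depth + 1 });
            }
        });
    }

    return visited;
}

console.log(crawl(['http://example.com/'], 1));
```

With maxDepth set to 1, as in the example configuration, the sketch visits the start page and the two pages it links to, but never reaches the pages one hop further out.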

