node.js - Scrape data from a site with a browser-based template engine
I'm trying to scrape data from a page that renders its templates in the browser with a lot of JS. When playing with jsdom I can't get the data; maybe the page doesn't have enough time to load or render. How should I scrape data in this case: use a timer, or download the page with request?
jsdom.env({
  url: link,
  scripts: ["http://code.jquery.com/jquery.js"],
  done: function (errors, window) {
    var $ = window.$;
    var date = $('.date').text();
    console.log(date);
  }
});
A colleague of mine has a PhantomJS-based project doing just that: https://github.com/vmeurisse/phantomcrawl.
He has a simple example that looks a lot like your snippet:
'use strict';
var phantomcrawl = require('./src/phantomcrawl');
var urls = [];
urls.push('http://www.bing.com');
var ptc = new phantomcrawl({
  urls: urls,
  nbthreads: 4,
  crawlerperthread: 4,
  maxdepth: 1
});
urls
List of URLs to crawl.
nbthreads
Number of PhantomJS instances launched.
crawlerperthread
Number of pages crawled in parallel per PhantomJS instance.
maxdepth
Number of times links present in a crawled page are followed.
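If a full crawler is more than you need, the timer idea from the question can be sketched as a small polling helper. This is only an illustration: waitFor is a hypothetical name, not part of jsdom or phantomcrawl, and the timeout and interval values are arbitrary. It repeatedly runs a check function until it returns something truthy or the time runs out.

```javascript
'use strict';

// Hypothetical helper (not part of jsdom or phantomcrawl).
// Calls `check` every `intervalMs` milliseconds until it returns a truthy
// value, then resolves with that value. Rejects once `timeoutMs` has elapsed.
function waitFor(check, timeoutMs, intervalMs) {
  return new Promise(function (resolve, reject) {
    var started = Date.now();

    (function poll() {
      var result;
      try {
        result = check();
      } catch (err) {
        return reject(err);
      }
      if (result) {
        return resolve(result);
      }
      if (Date.now() - started >= timeoutMs) {
        return reject(new Error('Timed out after ' + timeoutMs + ' ms'));
      }
      setTimeout(poll, intervalMs);
    })();
  });
}
```

Inside the done callback from the question, this could be used as waitFor(function () { return $('.date').text(); }, 5000, 100).then(console.log). Note that polling only helps if jsdom is actually configured to run the page's own scripts; if it is not, the element will never be filled in and the call will simply time out.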