node.js - Scrape data from a site with a browser-based Template Engine

April 15, 2015

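I'm trying to scrape data from a page whose templates are rendered in the browser with a lot of JavaScript. When I try it with jsdom I can't get the data; maybe the page doesn't have enough time to load or render. How should I scrape data in this case: use a timer, or download the page with request?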

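var jsdom = require('jsdom'); // jsdom.env() is the pre-v10 jsdom API
var link = process.argv[2];   // URL of the page to scrape

jsdom.env({
  url: link,
  scripts: ["http://code.jquery.com/jquery.js"],
  done: function (errors, window) {
    if (errors) { return console.error(errors); }
    var $ = window.$;
    var date = $('.date').text();
    console.log(date);
  }
});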
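If the data is inserted by the page's own scripts after load, the "timer" idea from the question is less fragile when you poll for the element instead of sleeping for a fixed time. This is a minimal sketch under that assumption; waitFor, interval and timeout are hypothetical names, not part of jsdom.

```javascript
'use strict';

// Hypothetical helper: call `check` every `interval` ms until it returns a
// truthy value (resolve with that value) or `timeout` ms have passed (reject).
function waitFor(check, { interval = 100, timeout = 10000 } = {}) {
  return new Promise((resolve, reject) => {
    const start = Date.now();
    (function poll() {
      let result;
      try {
        result = check();
      } catch (err) {
        reject(err); // a throwing check aborts the wait
        return;
      }
      if (result) {
        resolve(result);
      } else if (Date.now() - start >= timeout) {
        reject(new Error('waitFor: timed out after ' + timeout + ' ms'));
      } else {
        setTimeout(poll, interval); // not there yet, try again later
      }
    })();
  });
}
```

Inside the done callback you could, for example, call waitFor(function () { return window.document.querySelector('.date'); }) and read textContent from the element it resolves with. This can only work if the page's own scripts are actually being executed in the jsdom window; if they are not, the element never appears and the promise rejects on timeout.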

A colleague of mine has a PhantomJS-based project that does exactly that: https://github.com/vmeurisse/phantomcrawl.

He has a simple example that looks a lot like this snippet:

'use strict';

var phantomcrawl = require('./src/phantomcrawl');

var urls = [];
urls.push('http://www.bing.com');

var ptc = new phantomcrawl({
    urls: urls,
    nbthreads: 4,
    crawlerperthread: 4,
    maxdepth: 1
});

urls is the list of URLs to crawl.

nbthreads is the number of PhantomJS instances launched.

crawlerperthread is the number of pages crawled in parallel by each PhantomJS instance.

maxdepth is the number of times the crawler follows the links found on a crawled page.
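To make the meaning of maxdepth and the per-instance parallelism concrete, here is a hypothetical breadth-first crawl in plain Node.js. It is not phantomcrawl's implementation: crawl, maxDepth, concurrency and getLinks are illustrative names, and getLinks is an assumed async function that, in a real crawler, would load and render the page (for example in PhantomJS) and return the links on it.

```javascript
'use strict';

// Hypothetical sketch: visit `startUrls`, then follow links at most `maxDepth`
// times, keeping at most `concurrency` getLinks calls in flight at once.
async function crawl(startUrls, { maxDepth, concurrency, getLinks }) {
  const seen = new Set(startUrls); // never queue the same URL twice
  const visited = [];
  let frontier = [...startUrls];

  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    // Process the current level in batches of `concurrency` pages.
    for (let i = 0; i < frontier.length; i += concurrency) {
      const batch = frontier.slice(i, i + concurrency);
      const results = await Promise.all(batch.map((url) => getLinks(url)));
      batch.forEach((url, j) => {
        visited.push(url);
        // Only queue links while there is depth budget left to follow them.
        if (depth < maxDepth) {
          for (const link of results[j]) {
            if (!seen.has(link)) {
              seen.add(link);
              next.push(link);
            }
          }
        }
      });
    }
    frontier = next;
  }
  return visited;
}
```

With maxDepth: 1, the start pages are visited and the links on them are followed once, but links found on those second-level pages are not. Going by the parameter descriptions, nbthreads: 4 with crawlerperthread: 4 would allow up to 4 × 4 = 16 pages in parallel, which corresponds roughly to concurrency: 16 in this sketch.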
