
WebHarvy is a visual web scraper that can automatically scrape text, images, URLs, and emails from websites and save the scraped content in various formats. There is no need to write any scripts or code: you navigate web pages in WebHarvy's built-in browser and select the data to be scraped with mouse clicks. WebHarvy automatically identifies patterns of data occurring in web pages, so if you need to scrape a list of items (name, address, email, price, etc.) from a web page, no additional configuration is needed. If data repeats, WebHarvy will scrape it automatically.
v3.3 [Jun 17, 2014]
- Fixed issues related to URL encoding in Category Scraping
- Added option to disable automatic pattern (data field repetition) detection on the start page
- Added option to follow links (URLs) obtained by applying a Regular Expression to the page HTML – handles both absolute and relative URLs
- Added option to capture images whose URL is obtained by applying a Regular Expression to the page HTML – handles both absolute and relative URLs – works even when the image URL does not contain an image file extension
- Separate options to download an image and to capture its URL
- Fixed an issue that caused downloaded image files to lack the correct file extension
- Added Multiline mode in RegEx processing
- Faster mining ‘restart’ from where it previously stopped (was aborted) – remembers the last mined URL and its PostData
- Context menu options (copy/cut/paste) added for the ‘Additional URLs in Configuration’ window
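
The Regular Expression items in the list above (URLs obtained from HTML, absolute and relative URL handling, Multiline mode) can be illustrated outside WebHarvy with a short Python sketch. This is not WebHarvy's implementation, and WebHarvy itself needs no code. The page URL, the HTML snippet, the patterns, and the `extract_urls` helper are all hypothetical, and Python's `re.MULTILINE` is assumed to behave like the Multiline mode mentioned in the changelog (`^` and `$` match at every line boundary).

```python
import re
from urllib.parse import urljoin

# Hypothetical page URL and HTML, used only for illustration.
PAGE_URL = "http://example.com/catalog/list.html"
HTML = """<a href="/item/1">One</a>
<a href="item/2">Two</a>
<a href="http://example.com/item/3">Three</a>
<img data-src="/media/photo?id=42">
"""


def extract_urls(html, page_url, pattern, flags=0):
    """Apply `pattern` to `html` and return the first capture group of
    every match, resolved to an absolute URL against `page_url`."""
    regex = re.compile(pattern, flags)
    # urljoin leaves absolute URLs unchanged and resolves relative ones.
    return [urljoin(page_url, m.group(1)) for m in regex.finditer(html)]


# Links to follow: a mix of root-relative, page-relative, and absolute URLs.
links = extract_urls(HTML, PAGE_URL, r'href="([^"]+)"')

# Image URL taken from a non-standard attribute, with no file extension.
images = extract_urls(HTML, PAGE_URL, r'data-src="([^"]+)"')

# With re.MULTILINE, "^" matches at the start of every line, not only
# at the start of the whole HTML string.
line_start_links = extract_urls(
    HTML, PAGE_URL, r'^<a href="([^"]+)"', re.MULTILINE
)

print(links)
print(images)
print(line_start_links)
```

Without the `re.MULTILINE` flag, the `^`-anchored pattern matches only the first link, because `^` then matches solely at the start of the input.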