Web scraping (also called Web harvesting or Web data extraction) is a software technique for extracting information from websites.
I got a freelance job to extract all the hotel information for a UK city from the Yellow Pages. I wrote a simple PHP script that uses curl to fetch the pages, parses them with regular expressions, extracts the required data, and populates a database. Sorry, I was not aware of the site's policies and probably ignored them.
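For readers who want to see the shape of that fetch, parse, and store workflow, here is a minimal sketch in Python rather than PHP. It is not the original script. The markup pattern, the `hotels` table, and the names `fetch`, `extract_hotels`, and `save_hotels` are all assumptions made up for illustration, and a real site's HTML will differ.

```python
import re
import sqlite3
import urllib.request

# Hypothetical markup: each listing is assumed to look like
# <div class="listing"><h2>NAME</h2><span class="phone">PHONE</span></div>
LISTING_RE = re.compile(
    r'<div class="listing">\s*<h2>(?P<name>.*?)</h2>\s*'
    r'<span class="phone">(?P<phone>.*?)</span>',
    re.DOTALL,
)


def fetch(url, timeout=10):
    """Download a page and return its text (the role curl played in the PHP script)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_hotels(html):
    """Return one dict per listing matched by the (assumed) pattern above."""
    return [
        {"name": m.group("name").strip(), "phone": m.group("phone").strip()}
        for m in LISTING_RE.finditer(html)
    ]


def save_hotels(conn, hotels):
    """Insert the extracted rows into a hypothetical 'hotels' table."""
    conn.execute("CREATE TABLE IF NOT EXISTS hotels (name TEXT, phone TEXT)")
    conn.executemany(
        "INSERT INTO hotels (name, phone) VALUES (:name, :phone)", hotels
    )
    conn.commit()


# Made-up sample data so the sketch can be tried without touching a real site.
SAMPLE_HTML = """
<div class="listing"><h2>Hotel Alpha</h2>
  <span class="phone">020 0000 0001</span></div>
<div class="listing"><h2> Hotel Beta </h2><span class="phone">020 0000 0002</span></div>
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    save_hotels(conn, extract_hotels(SAMPLE_HTML))
    print(conn.execute("SELECT name, phone FROM hotels").fetchall())
```

Regular expressions like this break as soon as the page layout changes, which is one reason to check the site's terms and robots.txt first and to prefer a real HTML parser when one is available.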
Now there is a PHP library called Simplehtmldom that makes it easier to write web scrapers. More information can be found here.
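To give a feel for what parser-based scraping looks like compared with regular expressions, here is a small sketch. It does not use Simplehtmldom and does not show its API. It swaps in Python's standard `html.parser` module instead, and the `LinkCollector` class and `collect_links` helper are names invented for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside a link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def collect_links(html):
    """Parse an HTML string and return the links found in it."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<p><a href="/a">One</a> and <a href="/b"><b>Two</b></a></p>'
    print(collect_links(page))
```

Because the parser tracks tags rather than raw text, nested markup inside a link (such as the `<b>` above) does not need a special case the way it would with a regular expression.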
Monday, May 04, 2009
2 comments:
MetaSeeker is a free Web scraper factory. A new scraper for a target site is created in minutes without coding a single line.
XPath, XSLT and XML are used to express Web data extraction rules and to store extraction results.
It can be downloaded for free from http://www.gooseeker.com/en/node/download/front
Interesting points on web scrapers. For simple stuff I use Python to get or simplify data. Data extraction can be a time-consuming process, but for other projects that include documents, files, or the web I tried "website scraper", which worked great. They build quick custom screen scrapers, web scrapers, and data parsing programs.