<?Ansar's Blog: Web scraping tutorial

Monday, May 04, 2009

Web scraping tutorial

Web scraping (also called Web harvesting or Web data extraction) is a software technique for extracting information from websites.

I took on a freelance job to extract all the hotel listings for a UK city from the Yellow Pages. I wrote a simple PHP script that uses curl to fetch the pages, parses them with regular expressions to extract the required data, and populates a database. Sorry: I was not aware of the site's policies at the time and probably ignored them.
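The original PHP script is not shown, so here is a minimal sketch of the same fetch, parse, and store steps, written in Python rather than PHP. The markup pattern, the `hotels` table, and every function name are assumptions made up for illustration; a real site will need its own regular expression. The `allowed_by_robots` helper is an addition of mine, covering the policy check I skipped back then.

```python
import re
import sqlite3
import urllib.request
from urllib.robotparser import RobotFileParser

# Hypothetical markup: each listing is assumed to look like
# <div class="listing"><h2>Name</h2><span class="phone">...</span></div>
LISTING_RE = re.compile(
    r'<div class="listing">\s*<h2>(?P<name>.*?)</h2>\s*'
    r'<span class="phone">(?P<phone>.*?)</span>',
    re.DOTALL,
)


def allowed_by_robots(robots_txt, user_agent, url):
    """Check an already-downloaded robots.txt body before fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def fetch(url):
    """Download a page; this plays the role curl had in the PHP script."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_listings(html):
    """Pull name/phone pairs out of the page with a regular expression."""
    return [
        {"name": m.group("name").strip(), "phone": m.group("phone").strip()}
        for m in LISTING_RE.finditer(html)
    ]


def save_listings(conn, listings):
    """Insert the extracted rows into a (hypothetical) hotels table."""
    conn.execute("CREATE TABLE IF NOT EXISTS hotels (name TEXT, phone TEXT)")
    conn.executemany(
        "INSERT INTO hotels (name, phone) VALUES (:name, :phone)", listings
    )
    conn.commit()
```

In use, you would call `allowed_by_robots` first, then pass the result of `fetch(url)` to `extract_listings` and hand the rows to `save_listings` with a `sqlite3.connect(...)` connection. Regular expressions break easily when the page layout changes, which is the main weakness of this approach.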

There is now a PHP library called Simplehtmldom that makes it easier to write web scrapers. More information can be found here.
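To show why a DOM-style parser is easier to live with than regular expressions, here is a rough sketch of the idea. It does not use Simplehtmldom or reproduce its API; it uses Python's standard `html.parser` module as a stand-in, and the `LinkCollector` class and `collect_links` function are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently being read, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def collect_links(html):
    """Parse an HTML string and return the links found in it."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Because the parser tracks tags rather than raw text, nested markup inside a link (a `<b>` around the hotel name, say) does not need a special case the way it would in a regular expression.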

2 comments:

Fuller said...: MetaSeeker is a free Web scraper factory. A new scraper for a target site is created in minutes without coding a single line.

XPath, XSLT and XML are used to express Web data extraction rules and to store extraction results.

It can be downloaded for free from http://www.gooseeker.com/en/node/download/front

8:06 AM

Anonymous said...: Interesting points on web scrapers. For simple stuff I use Python to get or simplify data. Data extraction can be a time-consuming process, but for other projects that involve documents, files, or the web I tried "website scraper", which worked great; they build quick custom screen scrapers, web scrapers, and data-parsing programs.

7:55 PM
