Today while surfing DZone I found a library that I think is a much better fit for the needs of most script writers: Yahoo Query Language (YQL).
"The Yahoo! Query Language is an expressive SQL-like language that lets you query, filter, and join data across Web services. With YQL, apps run faster with fewer lines of code and a smaller network footprint." -from the official site
In addition to using a SQL-esque language to query pages and services, you can also put XPath selectors in the query statements, which makes it even easier to grab the data you want. And as always, you can use Firebug to retrieve a page element's XPath.
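To make the XPath idea a bit more concrete, here is a minimal sketch in Python that only builds the query and the request URL, without sending anything. The helper names (`build_yql_statement`, `build_yql_url`) are hypothetical, and the endpoint is the public YQL URL as I understand it from Yahoo's documentation, so treat both as assumptions and check the official docs before relying on them.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: the public YQL endpoint as documented by Yahoo; it may differ
# or be unavailable by the time you read this.
YQL_PUBLIC_ENDPOINT = "https://query.yahooapis.com/v1/public/yql"


def build_yql_statement(page_url, xpath):
    """Hypothetical helper: select nodes from a page's HTML via XPath."""
    # The XPath is wrapped in single quotes so it can contain double quotes,
    # e.g. //div[@id="content"]; single quotes inside it are rejected.
    if "'" in xpath:
        raise ValueError("XPath must not contain single quotes in this sketch")
    if '"' in page_url:
        raise ValueError("URL must not contain double quotes in this sketch")
    return f"select * from html where url=\"{page_url}\" and xpath='{xpath}'"


def build_yql_url(statement, fmt="json"):
    """Hypothetical helper: URL-encode a YQL statement into a request URL."""
    return YQL_PUBLIC_ENDPOINT + "?" + urlencode({"q": statement, "format": fmt})


# Example: ask for every <h1> on a page (the XPath could come from Firebug).
statement = build_yql_statement("http://example.com/", "//h1")
request_url = build_yql_url(statement)
print(statement)
print(request_url)

# Decoding the query string gives back the original statement.
print(parse_qs(urlsplit(request_url).query)["q"][0] == statement)
```

Keeping the statement builder separate from the URL builder means the same statement can be pasted into the YQL web console to try it out before scripting anything.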
You can get started working with YQL in several ways:
- Use the YQL web console
- Import yql into the Python REPL and have a go at it
Here are some links to get you started in developing web scraping utilities using YQL:
I'll most likely post some code samples later today to show off what I've been up to this morning with YQL.
Scraping websites is usually pretty boring and annoying, but for some reason it always comes back. The Yahoo Query Language is a SQL-like language that helps us query, filter, and join data across web services. Thanks a lot...
Thanks for sharing, this might be useful in the future. Do you do web testing with tools like Selenium?
I've only done web testing using Perl with WWW::Mechanize and LWP::UserAgent. Selenium looks like it's a good tool to add to my toolbox. Thanks!