Q. Is there any FOSS tool for detecting the layout of a web page?
A web page usually has a left pane, a right pane and so on, as a Wikipedia article page does. I have to extract only the article, not the content from the left pane etc. I have downloaded the Tamil Wikipedia HTML dump and some blog pages, and I have to extract the content from these.
A. Firstly, what you are describing (extracting only some elements from a web page) is commonly referred to as web scraping. There are various libraries and tools available for scraping web pages; which one fits depends on how you wish to do it (i.e. single pages or multiple pages, choice of programming language, etc.). Do a Google search, and if you need more help narrowing down the choices, ask again with more specifics about what type of tools you would prefer.
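To give a rough idea of what such scraping looks like, here is a minimal Python sketch that uses only the standard library. It keeps the text inside one container element and ignores everything else (side panes, footer). It assumes the article body sits in an element with id "mw-content-text", which is what MediaWiki pages generally use; an HTML dump or a blog page may use a different id, so check your files first. The names ContainerTextExtractor and extract_article_text are made up for this example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ContainerTextExtractor(HTMLParser):
    """Collect the text found inside the element with a given id (hypothetical helper)."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.depth = 0        # 0 means we are outside the container
        self.skip_depth = 0   # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth == 0:
            # Start collecting once the container element is found.
            if dict(attrs).get("id") == self.container_id:
                self.depth = 1
            return
        self.depth += 1
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1
        self.depth -= 1  # reaches 0 when the container itself closes

    def handle_data(self, data):
        if self.depth > 0 and self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_article_text(html, container_id="mw-content-text"):
    """Return the text inside the container, one text fragment per line."""
    parser = ContainerTextExtractor(container_id)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

You would read a downloaded page into a string and call extract_article_text(page_html), passing a different container_id for blog pages. Note that the depth counting assumes every non-void tag is properly closed; for sloppy real-world HTML, a dedicated parsing library such as Beautiful Soup or lxml is likely to be more robust.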
Secondly, if you intend to get a large amount of content from Wikipedia, it is recommended that you /do not/ use an automated tool:
Recently, I came across this link to an article on futuristic design concepts. I was stunned by the innovative ideas on display. Most of the concepts were the result of thinking differently and practically; from the creative cigarette case to the Audi Shark – Flying Sportscar, the concepts were awesome.
You can read that article at WebDesignerDepot. Since the site contains a lot of graphics, it will take some time to load, but it's worth the wait 🙂