Q. Is there a FOSS tool for detecting web page layout?
A web page typically has a left pane, a right pane, and so on, as on a Wikipedia article page. I need to extract only the article, not the content of the left pane or the other panes. I have downloaded the Tamil Wikipedia HTML dump and some blog pages, and I need to extract the content from these.
A. First, what you are describing (extracting only certain elements from a web page) is commonly referred to as web scraping. Various libraries and tools are available for scraping web pages; which one suits you depends on how you want to do it (i.e., single pages or multiple pages, choice of programming language, etc.). Search the web for "web scraping" tools, and if you need more help narrowing down the choices, ask again with more specifics about the type of tools you would prefer.
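To make the idea concrete, here is a minimal sketch of scraping one saved page using only Python's standard library. It keeps the text inside a single container element and ignores everything else (side panes, footer). The container id `mw-content-text` is an assumption about how the pages are marked up, so inspect your own dump and blog pages and substitute whatever id wraps the article there. The names `ContainerTextExtractor` and `extract_article_text` are made up for this illustration. The sketch also assumes every non-void element has an explicit closing tag; pages that omit closing tags such as `</p>` or `</li>` would need a more tolerant parser.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth count.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ContainerTextExtractor(HTMLParser):
    """Collect the text found inside the element with a given id (hypothetical helper)."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.depth = 0        # 0 means we are outside the container
        self.skip_depth = 0   # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            # Already inside the container: track nesting.
            self.depth += 1
            if tag in ("script", "style"):
                self.skip_depth += 1
        elif dict(attrs).get("id") == self.container_id:
            # Entering the container itself.
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1
        self.depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only while inside the container and outside scripts/styles.
        if self.depth and not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_article_text(html, container_id="mw-content-text"):
    """Return the text inside the container, one chunk per line.

    The default id is an assumption; change it to match your pages.
    """
    parser = ContainerTextExtractor(container_id)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    sample = (
        '<div id="sidebar">Navigation</div>'
        '<div id="mw-content-text"><h1>Title</h1><p>Article body</p></div>'
    )
    print(extract_article_text(sample))
```

For a whole dump you would read each file and pass its contents to `extract_article_text`. Blog pages will most likely use a different container, so the id would have to be chosen per site.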