Web scraping

Q. Is there a FOSS tool for detecting web page layout?

A web page typically has a left pane, a right pane and so on, as in a Wikipedia article page. I need to extract only the article, not the content of the side panes. I have downloaded the Tamil Wikipedia HTML dump and some blog pages, and I need to extract the content from them.

A. Firstly, what you are describing (extracting only some elements from a web page) is commonly referred to as web scraping. Various libraries and tools are available for it, and the right choice depends on how you wish to do it (i.e. single pages or multiple pages, choice of programming language, etc.). Search the web, and if you need more help narrowing down the choices, ask again with more specifics about the type of tools you would prefer.
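
As a rough illustration of the idea, here is a minimal sketch that uses only Python's standard library to keep the text inside one container element and discard everything else (side panes, footers, scripts). The class and function names are made up for this example, and the container id "bodyContent" is an assumption: inspect your own dump or blog pages to find the element that actually wraps the article.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collect text found inside a <div> whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # <div> nesting depth inside the target (0 = outside)
        self.skip = 0       # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1
            elif tag in ("script", "style"):
                self.skip += 1
        elif tag == "div" and dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "div":
            self.depth -= 1
        elif tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is inside the target container.
        if self.depth and not self.skip and data.strip():
            self.chunks.append(data.strip())


def extract_content(html, target_id="bodyContent"):
    """Return the text of the target container (the id is an assumption)."""
    parser = ContentExtractor(target_id)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Calling extract_content(page_html) on a saved page returns the article text as one string. For messy real-world HTML a dedicated parsing library is usually more forgiving than this sketch.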

Secondly, if you intend to get a large amount of content from Wikipedia, it is recommended that you /do not/ use an automated tool:
Instead, use one of the alternative methods mentioned in the page above.
Thirdly, if you just want to remove unnecessary elements from a page and save only the content while browsing, I would suggest using one of these Firefox tools:
This answer was provided by Mr. Steve.
You can mail him at steve@lonetwin.net.

Google to log out of China: The countdown starts

Google is expected to wind up its operations in China by April 10, though the company has not officially announced this. Google's move to quit China came to the attention of the tech community when it announced its plan in January this year. The decision was the result of restrictions imposed by the Chinese government requiring Google to filter its search results. Google currently holds 35 percent of the Chinese search market, while its local competitor Baidu dominates with a 60 percent share. Baidu was in the news recently for its fraudulent high-cost-per-click advertisements and for blocking smaller websites from its search results when they did not opt in to Baidu’s advertising programs.

The Chinese government seems to have no regrets about its moves in the Google issue. Its Minister for Industry and Information Technology has been quoted as saying that Google should either abide by the law or face the consequences, giving no sign of a possible compromise in their dispute over censorship and hacking.

Even if Google shuts down its Google Labs, it would continue to run a development center in Beijing for a full-fledged mobile phone business. The workforce at the Google camp in China is expected to be around 700.


Code Bubbles: rethinking the concept of the IDE

A software team at Brown University has developed an IDE for Java called Code Bubbles that is the talk of the town in the developer community. Code Bubbles distinguishes itself from “traditional” IDEs by basing itself on the concept of “fragmentation instead of flat files”.

The following video explains more. Have a look; I also recommend reading <this> article.



North Korea’s home-made OS

Following China’s Red Flag Linux, North Korea has come out with its own “home-made OS”. This OS, named “Red Star”, was brought to light by a Russian student at Kim Il-sung University. North Korea has been developing this Linux-based system since 2006. Though based on Linux, its interface resembles MS Windows. It is available only in Korean and costs $5. Earlier, China had created its own Linux distro after mistrust arose over Microsoft’s mismanagement of Venus. Microsoft Venus was a project that aimed to enable millions of Chinese people who had TVs but not PCs to use their televisions to get online. The project was dropped due to some issues.

The latest news on China’s Red Flag Linux is that Internet cafes in China have been forced by the Chinese government to switch over to Red Flag Linux, even if they had a legal version of Windows. Obviously, questions about spying and surveillance have arisen, with no comment from the Chinese government.


Cameron is back!

I was one of the millions who were disappointed that James Cameron went home empty-handed after the Oscars, but now there is a reason to console myself. Yes! Titanic, the 1997 worldwide blockbuster and record setter, is to be remade in 3D. The new version will hit cinemas in 2012 to commemorate the 100th anniversary of the Titanic disaster. Cameron is planning to add extra footage and something creative to this new version.
Eagerly awaiting it :)

James Cameron has decided to release Titanic, the film that captivated the world in 1997, using 3D technology. With Cameron not having won the Best Director award at the recent Oscars, this decision has excited his fans. However, it has been announced that the film will reach cinemas only in 2012. The decision was made because 2012 marks 100 years since the Titanic disaster.