Scripsit रवींदर ठाकुर (ravinder thakur):
Quote:
i am trying to find some generic way of getting the title and
description of webpages [...] i will be doing this in python.
Try googling with words like
python html parse
The first hit I got is
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/286269
which might suit your needs.
It's probably easier to write two good HTML parsers than to decide which
of them is better. But for extracting the <title> element and the <meta>
element with name="description", any good or half-good parser should do.
Just make sure you recognize the tag and attribute names and the value
"description" in a case-insensitive manner (HTML allows <TITLE>, <Meta>,
NAME="Description" and so on) and do not change the case of anything in
the title and description you extract (unless you really want to).
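
For illustration, here is a minimal sketch of that approach using the
html.parser module from the Python standard library (not the recipe linked
above). The names TitleDescriptionParser and extract_title_and_description
are made up for this example. It assumes you already have the page as a
decoded string, and it simply takes the first <title> and the first
matching <meta> it sees.

```python
from html.parser import HTMLParser


class TitleDescriptionParser(HTMLParser):
    """Collect the first <title> text and the first meta description."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.description = None
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names already lowercased,
        # so <TITLE> and <Meta NAME=...> are matched here as well.
        if tag == "title" and self.title is None:
            self._in_title = True
        elif tag == "meta" and self.description is None:
            attr_map = dict(attrs)
            # Attribute values keep their case, so compare the name
            # case-insensitively but store the content untouched.
            name = attr_map.get("name") or ""
            if name.lower() == "description":
                self.description = attr_map.get("content")

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self.title = "".join(self._title_parts)
            self._in_title = False


def extract_title_and_description(html_text):
    """Return (title, description); either may be None if not found."""
    parser = TitleDescriptionParser()
    parser.feed(html_text)
    parser.close()
    return parser.title, parser.description


if __name__ == "__main__":
    sample = (
        '<HTML><HEAD><TITLE>My Page</TITLE>'
        '<META NAME="Description" CONTENT="A Short Summary">'
        '</HEAD><BODY><p>hi</p></BODY></HTML>'
    )
    print(extract_title_and_description(sample))
```

Fetching the page and working out its character encoding are left out on
purpose; real pages may also lack either element, in which case the
corresponding value stays None.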
--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/