Prints a list of all links in a specified webpage
This does not always work
Fri. Aug. 18th, 2006 11:56 AM
lgrover
This does not always work. If a link's <a> tag also contains JavaScript event handlers or a nofollow directive, this will not work correctly. You need to use a more general pattern match.
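A minimal sketch of a more general pattern, assuming the goal is only to pull the href value out of each <a> tag. It tolerates other attributes (such as onclick handlers or rel="nofollow") before or after href, and accepts double, single, or no quotes. HREF_RE and extract_hrefs are hypothetical names. A regex is still not an HTML parser, so an "<a href" inside a comment or script will still be picked up.

```python
import re

# An opening <a ...> tag, optionally other attributes, then href=VALUE.
# Requiring whitespace before "href" avoids matching attributes like data-href.
HREF_RE = re.compile(
    r"""<a\s(?:[^>]*?\s)?href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""",
    re.IGNORECASE,
)

def extract_hrefs(html):
    """Return href values of <a> tags in document order (hypothetical helper)."""
    # findall yields (double-quoted, single-quoted, unquoted); one is non-empty.
    return [d or s or u for d, s, u in HREF_RE.findall(html)]

sample = ('<a onclick="track()" href="/docs" rel="nofollow">Docs</a> '
          "<A HREF='/about'>About</A> <a class=x href=/plain>x</a>")
print(extract_hrefs(sample))
```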
.read(200000)
Sun. Oct. 29th, 2006 1:15 AM
dmitry
I'm not sure .read(200000) is ideal.
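Assuming the concern is that .read(200000) stops after a fixed 200000 bytes, so any larger page is silently cut off and its later links are lost, here is a minimal sketch that reads to the end of the stream instead. read_page is a hypothetical helper; it takes any binary file-like object, such as the response from Python 3's urllib.request.urlopen or an io.BytesIO in a test. The UTF-8 default encoding is also an assumption.

```python
import io

def read_page(stream, limit=None, encoding="utf-8"):
    """Read a page from a binary file-like object (hypothetical helper).

    With limit=None the stream is read to EOF; a fixed limit such as 200000
    returns at most that many bytes, truncating larger pages.
    """
    data = stream.read() if limit is None else stream.read(limit)
    # Replace undecodable bytes instead of raising, since pages do not
    # always declare (or honour) their encoding.
    return data.decode(encoding, errors="replace")

# A 300000-byte page whose only link sits at the very end.
page = b"<a href='/end'>end</a>".rjust(300000, b" ")
print(len(read_page(io.BytesIO(page), limit=200000)))  # 200000: link cut off
print(len(read_page(io.BytesIO(page))))                # 300000: whole page
```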
my changes
Sun. Oct. 29th, 2006 4:24 AM
dmitry
It works faster:
import re, urllib
htmlSource = urllib.urlopen("http://www.python.org/index.html").readlines()
....
linksList = re.findall('<a href=(.*?)>.*?</a>', repr(htmlSource))
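For readers on Python 3, where urllib.urlopen has moved to urllib.request.urlopen, a minimal sketch of the same idea. It reads the page once and decodes it to one string instead of calling readlines() and then repr() on the list. repr() was presumably there so the whole page became a single line and .*? could match across the original line breaks; re.DOTALL gives that effect directly. The pattern is kept unchanged, so each captured value still includes any quotes and trailing attributes. LINK_RE, find_links, fetch, and the UTF-8 decoding are assumptions.

```python
import re
import urllib.request

# Same pattern as above; re.DOTALL lets "." match newlines, so a link whose
# text spans several lines is still found without joining the page via repr().
LINK_RE = re.compile(r'<a href=(.*?)>.*?</a>', re.DOTALL)

def find_links(html):
    """Return the raw text after 'href=' in each <a href=...>...</a> (hypothetical)."""
    return LINK_RE.findall(html)

def fetch(url):
    """Read the whole response once and decode it (UTF-8 is an assumption)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

Usage: find_links(fetch("http://www.python.org/index.html")) returns the list that linksList held in the original.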
Reinventing the wheel!
Thu. Nov. 2nd, 2006 4:18 PM
anthony
You'd be better off using a more general parser, such as ElementTree.
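ElementTree expects well-formed XML, and much real-world HTML (unclosed tags, bare attributes, entities such as &nbsp;) is not, so this minimal sketch swaps in the standard library's html.parser.HTMLParser instead. It collects the href of every <a> tag regardless of attribute order, quoting, case, or extra attributes like onclick or rel="nofollow". LinkCollector and extract_links are hypothetical names.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; attrs is a list of
        # (name, value) pairs, with value None for a bare attribute.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links

print(extract_links('<p><A HREF="/docs" onclick="go()">Docs</a>'
                    '<a rel=nofollow href=/plain>x<br><a name="top">'))
```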