regex - Use wget to crawl specific URLs
I am trying to crawl the links on a website and then use a download manager to download the files.
I've tried:
wget --wait=20 --limit-rate=20k -r -p -U mozilla "www.mywebsite.com"
I can't figure out how to use wget or regular expressions to save only the desired links!
wget offers a wide variety of options for fine-tuning which files are downloaded in a recursive crawl.
Here are a few options that may interest you:
--accept-regex urlregex
Download any URL matching urlregex. urlregex is a regular expression that is matched against the complete URL.
--reject-regex urlregex
Ignore any URL matching urlregex. urlregex is a regular expression that is matched against the complete URL.
-L
Tells wget to follow relative links only.
Examples of relative links:
<a href="foo.gif"> <a href="foo/bar.gif"> <a href="../foo/bar.gif">
Examples of non-relative links:
<a href="/foo.gif"> <a href="/foo/bar.gif"> <a href="http://www.server.com/foo/bar.gif">
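Before starting a slow, rate-limited crawl, it can help to check which URLs a pair of patterns would let through. The following is only a rough sketch: the sample URLs and both patterns are made up for illustration, and grep -E is used as a stand-in for wget's default POSIX regex matching, so edge cases may differ. The real wget call is left commented out so the script makes no network requests.

```shell
#!/bin/sh
# Hypothetical sample URLs; substitute links from your own site.
urls='http://www.mywebsite.com/files/report.pdf
http://www.mywebsite.com/files/archive.zip
http://www.mywebsite.com/tmp/draft.pdf
http://www.mywebsite.com/index.html'

# Illustrative patterns (assumptions, not taken from the question):
# accept anything ending in .pdf or .zip, reject anything under /tmp/.
accept_re='\.(pdf|zip)$'
reject_re='/tmp/'

# Approximate wget's filtering: keep URLs that match the accept pattern,
# then drop those that match the reject pattern.
matched=$(printf '%s\n' "$urls" | grep -E "$accept_re" | grep -Ev "$reject_re")
printf '%s\n' "$matched"

# The corresponding crawl, reusing the options from the question:
# wget --wait=20 --limit-rate=20k -r -p -U mozilla \
#      --accept-regex "$accept_re" --reject-regex "$reject_re" \
#      "www.mywebsite.com"
```

As far as I can tell, wget tests these patterns against every link it considers following, so any intermediate pages you want it to crawl through probably need to match the accept pattern too.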