Using wget

Get What?

If you have content to pull off a site or an FTP server, and the content runs to more than a few pages, save yourself an afternoon and use wget. It's fast, reliable, and a big time-saver.

Sitting and clicking to pull down page after page is a real pain; watching paint dry is way more fun. That's where wget solves the problem. I pulled down around 90 pages recently, and wget did it in a little over 3 minutes. Can you beat that with the mouse? I don't think so!

If you have never used wget, here's a quick overview.

First check you have wget installed. In an XTerm, do:

which wget

You should get something like:

/usr/bin/wget

It's installed.

If it's not, you know the routine:

aptitude install wget
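If you script your setup, the check and the install hint fold into one snippet. A minimal sketch, assuming a Debian-family system with aptitude; swap in your distro's package manager as needed:

```shell
# Report where wget lives, or say how to get it.
# (aptitude is Debian-family; substitute your distro's package manager.)
if command -v wget >/dev/null 2>&1; then
    echo "wget found at $(command -v wget)"
else
    echo "wget missing -- install with: aptitude install wget"
fi
```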

wget has several switches (sometimes also called options); the ones I mostly use are:

wget -r -l2 -k -t5 -p

Translates as:

-r     Turn on recursive retrieval.
-l     Maximum recursion depth. Default is 5.
-k     After the download, convert links for local
       viewing. Affects visible hyperlinks and links
       to external content, i.e. images, style
       sheets, etc.
-t     Number of retries. Zero means infinite retries.
       Default is 20. Fatal errors such as "connection
       refused" or "not found" (404) are not retried.
-p     Download all files needed to properly display
       the page, including images, sounds, and
       style sheets.
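You can try those switches end-to-end without touching a real site. Here's a minimal sketch, assuming python3 is available and port 8099 is free; the /tmp paths are arbitrary:

```shell
# Serve a single page locally, then mirror it with the switches above.
mkdir -p /tmp/wget_demo/site
echo '<html><body><p>hello</p></body></html>' > /tmp/wget_demo/site/index.html
python3 -m http.server 8099 --directory /tmp/wget_demo/site >/dev/null 2>&1 &
SRV=$!
sleep 1
cd /tmp/wget_demo
wget -q -r -l2 -k -t5 -p http://localhost:8099/
kill "$SRV"
```

Afterwards you'll find the mirrored page under /tmp/wget_demo/localhost:8099/, ready for offline viewing.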

Here's another:

wget -r -l1 -t1 -nd -N -U mozilla -np -A.mp3 -erobots=off -i mp3_sites.txt


-r     Turn on recursion.
-l1    Go down one level below the entry level.
-t1    Number of retries is 1.
-nd    Don't create directories for downloaded files.
-N     Turn on time-stamping.
-U mozilla
       Identify wget to the server as Mozilla. Some
       sites require a browser identity; if you hit
       that problem, use the -U switch.
-np    No parent content. Don't include the parent
       level in the download.
-A.mp3
       Get only .mp3 files. Similar to "globbing".
-erobots=off
       Execute the command robots=off, which tells
       wget to ignore the robots.txt file.
-i mp3_sites.txt
       Read URLs from the file mp3_sites.txt. If '-'
       is used instead, URLs are read from standard
       input.
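Put together, the workflow is: build the URL list, then hand it to wget. A runnable sketch — the URLs are placeholders for real pages with .mp3 links, and -P (download directory), -T5 (timeout), and the trailing || true are additions so the sketch fails gracefully when the placeholders 404:

```shell
# Build the URL list (placeholders -- substitute real pages).
cat > /tmp/mp3_sites.txt <<'EOF'
http://example.com/music/
http://example.org/samples/
EOF
# Grab every .mp3 one level down from each listed page.
wget -r -l1 -t1 -nd -N -U mozilla -np -A.mp3 -erobots=off \
     -T5 -P /tmp/mp3_demo -i /tmp/mp3_sites.txt || true
```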

wget is such a powerful utility. The mouse and browser are good, but the command line wins again. Check out the man page for examples and other goodies.