Gigascale Systems Research Center
FAQ
Editing group pages, and other forms of rocket science

How do I find bad links in my group web page?
Christopher Brooks, 24 Sep 2001
Last updated: 3 Sep 2003

The search engine runs every night and generates a list of bad links at http://www.gigascale.org/gsrc/private/9.html, which can be viewed only by GSRC members.

You can also use the wget command, but you will need to set it up to use the cookie file from Mozilla.

  1. Install wget
  2. Log in to the website using Mozilla and then exit Mozilla
  3. Find your cookie file. Mine was at c:/Documents and Settings/cxh/Application Data/Mozilla/Profiles/default/lwhpscha.slt/cookies.txt
  4. Copy the cookies.txt file to a location with a shorter path, for example the directory where you will run wget.
  5. Run wget:
     
    wget -r --load-cookies cookies.txt -np http://www.gigascale.org/yourGroup
    
    This will produce a directory called www.gigascale.org that contains the contents of yourGroup.
  6. Look for "Not Found" in the wget output.
  7. If a file was not found, search the downloaded files for references to it. For example, if foo.htm was not found, run:
    find . -name "*.htm*" -exec grep -H foo.htm {} +
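
The checks in the steps above can be sketched as a few shell functions. This is only a sketch, not part of the original procedure: the function names and the wget.log file name are hypothetical. It assumes that the cookie file uses the tab-separated Netscape cookies.txt format, and that the wget log follows GNU wget's usual layout, where each request starts with a line beginning with "--" and ending in the URL, and a missing page is reported on a later line containing "ERROR 404".

```shell
#!/bin/sh
# Sketch only. Function names are hypothetical; the log and cookie-file
# formats are assumptions described in the text above.

# Succeed if the cookie file $2 has an entry whose domain field contains $1.
# Fields are tab-separated; lines starting with "#" are comments.
has_cookie_for() {
    awk -F '\t' -v dom="$1" '
        /^#/ { next }
        NF >= 7 && index($1, dom) > 0 { found = 1 }
        END { exit (found ? 0 : 1) }' "$2"
}

# Print the URL of every request in the wget log $1 that ended in a 404.
list_not_found() {
    awk '/^--/       { url = $NF }
         /ERROR 404/ { print url }' "$1" | sort -u
}

# Print "file:line" for each page under directory $2 that mentions name $1.
find_referrers() {
    # -F treats the name as a literal string, so "." is not a wildcard.
    find "$2" -name '*.htm*' -exec grep -F -H -- "$1" {} +
}

# Possible use, with wget's -o option writing the log to a file:
#   has_cookie_for gigascale.org cookies.txt || echo "no cookie; log in again"
#   wget -r -np --load-cookies cookies.txt -o wget.log http://www.gigascale.org/yourGroup
#   list_not_found wget.log | while read -r url; do
#       find_referrers "$(basename "$url")" www.gigascale.org
#   done
```

Checking the cookie file first avoids a long recursive download that only fetches the login page. Writing the log to a file with -o makes it possible to search for missing pages after the run instead of watching the output scroll by.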