Redirecting Old Websites
16 May 2009
I have been gone from Dartmouth for over a year, and my website there has remained active but somewhat broken due to neglect. It turns out that some of the blog posts on that site are on the first page of Google search results for related keywords. When my Dartmouth site finally goes away, it would be a shame if users hit 404 errors when trying to access those pages. I did some quick searching for the best way to redirect a page to a new location, and a 301 (permanent) redirect seems to be a good option. I wrote a Python script to redirect all the HTML pages on my site to their new URLs, and it might be useful for other people facing the same problem.
The script walks the full site searching for files with specific extensions, in this case .html. It then performs a simple find and replace to map old file paths to new URLs, writing one redirect rule per file.
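To make the find-and-replace step concrete, here is a minimal sketch of how a single file path could be turned into a redirect rule. The helper name redirect_line is hypothetical and not part of the script, and the setting values are placeholders. Unlike the script, this sketch replaces only the first occurrence of the old directory name, which is an assumption about what is wanted when the name also appears deeper in a path.

```python
# Placeholder settings; substitute your own old directory, old base path and new site.
OLDDIR = 'public_html'
OLDBASE = '/~name'
NEWSITE = 'http://mynewdomain.com'

def redirect_line(path, olddir=OLDDIR, oldbase=OLDBASE, newsite=NEWSITE):
    """Build one 'redirect 301' rule for a file found under olddir (hypothetical helper)."""
    page = path.replace(olddir, oldbase, 1)  # URL path on the old site
    url = path.replace(olddir, newsite, 1)   # full URL on the new site
    return "redirect 301 %s %s" % (page, url)

print(redirect_line('public_html/blog/post.html'))
# redirect 301 /~name/blog/post.html http://mynewdomain.com/blog/post.html
```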
You will need to change some of the variables to reflect your old and new sites. These variables can be found under Script settings below.
# Filename: make301.py
# Last Modified: 5/16/2009
# Written for Python 2 (print statement, os.path.walk)

import sys
import os.path
import re

# Script settings
extensions = ['.html']
olddir = 'public_html'
oldbase = '/~name'
newsite = 'http://mynewdomain.com'

#
# Process command line arguments
#
def processArgs(argv):
    argc = len(argv)
    if argc < 3:
        print "Usage: make301.py <directory> <output file>"
        sys.exit()

    args = map(lambda s: s.strip(), argv[1:])

    # Make sure directory exists
    if not os.path.exists(args[0]):
        print 'Directory "%s" does not exist.' % args[0]
        sys.exit()

    return tuple(args)

#
# Add filename to list if the extension is in the list
# of extensions specified above.
#
def getAllFiles(files, root, names):
    for name in names:
        (base, ext) = os.path.splitext(name)
        if ext.lower() in extensions:
            files.append(os.path.join(root,name))

#
# Walk the site and write one redirect rule per matching file
#
def main(argv):
    (source_dir, htfile) = processArgs(argv)

    files = []
    os.path.walk(source_dir, getAllFiles, files)

    # Write redirect rules to file
    fd = open(htfile,'w')
    for f in files:
        page = f.replace(olddir,oldbase)
        url = f.replace(olddir,newsite)
        line = "redirect 301 %s %s\n" % (page,url)
        fd.write(line)
    fd.close()

if __name__ == "__main__":
    main(sys.argv)
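The script above targets Python 2: it relies on the print statement and on os.path.walk, which was removed in Python 3. As a rough sketch of how the file-collection step might look on Python 3, os.walk can take its place. The function name collect_files is hypothetical, and sorting the result is an extra choice made here so the output order is predictable.

```python
import os

def collect_files(source_dir, extensions=('.html',)):
    """Return sorted paths under source_dir whose extension is in extensions."""
    files = []
    # os.walk yields (directory, subdirectories, filenames) for each directory
    for root, _dirs, names in os.walk(source_dir):
        for name in names:
            ext = os.path.splitext(name)[1]
            if ext.lower() in extensions:
                files.append(os.path.join(root, name))
    return sorted(files)
```

Calling collect_files('public_html') would return the same kind of list that getAllFiles builds, which can then be written out as redirect rules in the same way.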
To run the script, go to the directory above your web directory (in my case the web directory is public_html) and type the following:
% python make301.py public_html .htaccess