David Gessel
Search Engine Enhancement
Getting timely search engine coverage of a site means people can find things soon after you change or post them.
Most search engines find linked pages by following external links or manual URL submissions every few days or so, but they won’t find unlinked pages or recover gracefully from broken links, and the ranking and efficiency of the crawl is likely suboptimal compared to a site that publishes a sitemap for easy indexing.
There are three basic steps to having a page optimally indexed:
- Generating a Sitemap
- Creating an appropriate robots.txt file
- Informing search engines of the site’s existence
Sitemaps
It seems like the world has settled on sitemaps for making search engines’ lives easier. There is no indication that a sitemap actually improves rank or crawl rate, but it seems likely that it does, or that it will soon. The format was created by Google and is supported by Google, Yahoo, Ask, and IBM, at least. The reference is at sitemaps.org.
Google has created a Python script to generate a sitemap through a number of methods: walking the HTML path, walking the directory structure, parsing Apache-standard access logs, parsing external files, or direct entry. It seems to me that walking the server-side directory structure is the easiest, most accurate method. The script itself is on SourceForge. The directions are good, but if you’re only using directory structure, the config.xml file can be edited down to something like:
<?xml version="1.0" encoding="UTF-8"?>
<site base_url="http://www.your-site.com/"
      store_into="/www/data-dist/your_site_directory/sitemap.xml.gz"
      verbose="1">
  <url href="http://www.your-site.com/" />
  <directory path="/www/data-dist/your_site_directory"
             url="http://www.your-site.com/"
             default_file="index.html" />
</site>
Note that this will index every file on the site, which can make for a large sitemap. If you use your site for media files or file transfer, you might not want to index every part of it, in which case you can use filters to block the indexing of parts of the site or certain file types. If you only want to index web files you might insert the following:
<filter action="pass" type="wildcard" pattern="*.htm" />
<filter action="pass" type="wildcard" pattern="*.html" />
<filter action="pass" type="wildcard" pattern="*.php" />
<filter action="drop" type="wildcard" pattern="*" />
Running the script with
python sitemap_gen.py --config=config.xml
will generate the sitemap.xml.gz file and put it in the right place. If the uncompressed file size is over 10MB, you’ll need to pare down the files listed. This can happen if the filters are more inclusive than the ones above, particularly if you have large photo or media directories and index all the media and thumbnail files.
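If you want to check the generated file against that limit before submitting, a quick sketch (the 10MB figure comes from the discussion above; the helper names are mine):

```python
import gzip
import os

SIZE_LIMIT = 10 * 1024 * 1024  # 10 MB uncompressed limit discussed above


def uncompressed_size(path):
    """Return the uncompressed size of a .gz file by streaming through it."""
    total = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(1 << 20)  # read 1 MB at a time
            if not chunk:
                break
            total += len(chunk)
    return total


def check_sitemap(path):
    """Return (ok, size): ok is True if the sitemap fits under the limit."""
    size = uncompressed_size(path)
    return size <= SIZE_LIMIT, size
```

If the check fails, tighten the filters in config.xml until the uncompressed file fits.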
The sitemap will tend to get out of date. If you want to update it regularly, there are a few options: one is to use a WordPress sitemap generator (if WordPress is what you’re running and indexing), which does the right thing and builds the sitemap from data available to WordPress but not to the file system (a good thing); another is to add a cron job to regenerate the sitemap regularly. For example,
3 3 * * * root python /path_to/sitemap_gen.py --config=/path_to/config.xml
will update the sitemap daily.
robots.txt
The robots.txt file can be used to exclude certain search engines (for example MSNBot, if you don’t like Microsoft for some reason and are willing to sacrifice traffic to make a point), and it also points search engines to your sitemap file. There’s kind of a cool tool that generates a robots.txt file for you, but a simple one might look like:
User-agent: MSNBot                              # Agent I don't like for some reason
Disallow: /                                     # path it isn't allowed to traverse

User-agent: *                                   # For everything else
Disallow:                                       # Nothing is disallowed
Disallow: /cgi-bin/                             # Directory nobody can index

Sitemap: http://www.my_site.com/sitemap.xml.gz  # Where my sitemap is
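You can sanity-check rules like these before deploying with Python’s standard-library robots.txt parser; a quick sketch using a condensed version of the rules above:

```python
from urllib.robotparser import RobotFileParser

# Condensed version of the robots.txt rules shown above.
rules = """\
User-agent: MSNBot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("MSNBot", "http://www.my_site.com/index.html"))    # False
print(rp.can_fetch("SomeBot", "http://www.my_site.com/index.html"))   # True
print(rp.can_fetch("SomeBot", "http://www.my_site.com/cgi-bin/x"))    # False
```

This only checks the Disallow logic; the Sitemap line is ignored by the stdlib parser.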
Telling the world
Search engines are supposed to do the work; that’s their job. They should find your robots.txt file eventually, read the sitemap, and then parse your site without any further assistance. But to expedite the process and possibly enhance search results, there are submission tools at Yahoo, Ask, and particularly Google that generally allow you to add meta information.
Ask
Ask.com allows you to submit your sitemap via URL (and that seems to be all they do):
http://submissions.ask.com/ping?sitemap=http://www.your_site.com/sitemap.xml.gz
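Since the sitemap address is passed as a query parameter, it should strictly be URL-encoded; a small sketch (the function name is mine):

```python
from urllib.parse import quote


def ask_ping_url(sitemap_url):
    """Build the Ask.com ping URL with the sitemap address URL-encoded."""
    return "http://submissions.ask.com/ping?sitemap=" + quote(sitemap_url, safe="")


print(ask_ping_url("http://www.your_site.com/sitemap.xml.gz"))
# http://submissions.ask.com/ping?sitemap=http%3A%2F%2Fwww.your_site.com%2Fsitemap.xml.gz
```

You can then fetch the resulting URL with a browser or curl to submit.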
Yahoo
Yahoo has some site submission tools and supports site authentication, which means putting a random string in a file they can find to prove you have write-access to the server. Their tools are at
https://siteexplorer.search.yahoo.com/mysites
with submissions at
https://siteexplorer.search.yahoo.com/submit.php
you can submit sites and feeds. I usually use file authentication, which means creating a file named with a random string they give you (y_key_random_string.html) containing another random string as its only contents. They authenticate within 24 hours.
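Creating the authentication file is a one-liner; the filename and contents below are placeholders for the strings Yahoo actually gives you:

```shell
# Yahoo supplies the actual filename and key string; these are placeholders.
echo "authentication_key_contents" > y_key_random_string.html
```

Put the file in your web root so Yahoo can fetch it.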
It isn’t clear whether submitting a feed also adds the site, but it looks like it does. If you don’t have a feed, you may not need to authenticate the site for submission.
Google
Google has a lot of webmaster tools at
https://www.google.com/webmasters/tools/siteoverview?hl=en
The verification process is similar, but you don’t have to put data inside the verification file, so
touch googlerandomstring.html
is all you need to get the verification file up. You submit the URL to the sitemap directly.
Google also offers blog tools at
http://blogsearch.google.com/ping
where you can manually add your blog’s feed to Google’s blog search tool.
Diary Of The Dead
George Romero’s Diary of the Dead premiered as a sold-out midnight show at TIFF. It doesn’t disappoint, keeping up the gore and fun of the series. The framing context of this particular undead episode is a bunch of film students making a mummy film in the woods when the dead come walking home. They decide to flee to their family homes as they struggle to come to grips with the reality of the situation, somehow (against demographic) apparently not having seen any of the previous “… of the Dead” movies, and so not immediately grasping the seriousness of the situation and making those wonderful horror movie mistakes.
As they go on, one of the gang becomes obsessed with capturing the disaster on film and the story is told from the perspective of his UGC (user generated content). In the end it isn’t clear whether the meta-comment is that UGC is valuable (“mainstream media is lying”) or detrimental (“with so many voices, nobody knows what to believe”).
Either way, it’s a fun movie with plenty of blood and gore.
2007 Toronto International Film Festival
My Enemy’s Enemy
“My Enemy’s Enemy” is a reference to the support given Klaus Barbie by the CIA (at least) following WWII by which the “Butcher of Lyon” evaded capture and avoided prosecution for his war crimes for almost 40 years.
The movie is a powerful testament to Barbie’s life-long commitment to torture and abuse, skills he learned as a Nazi, brutal methods he taught the CIA and various South American governments, successful programs of sadistic torture and abuse some of which (“the submarine” whereby victims are held underwater until they believe they are drowning, now known as “waterboarding”) are still known to be in use by the CIA today.
The movie also makes a case for Barbie’s defense argument, that it is hypocritical to prosecute him only when his utility to, particularly, the US has run out and not before, a long-held policy that continues with Hussein and Noriega and certainly many others.
Chacun son cinéma
Chacun son cinéma ou Ce petit coup au coeur quand la lumière s’éteint et que le film commence is a series of 33 shorts about movies, each about 2 minutes long, each by a different and significant director. Some are a bit tedious or somewhat incomprehensible, but a few really stand out: a joke about moaning in pain (rather than orgasm) during a screening of Emmanuelle, and another about a man who makes the ultimate response to a noisy, self-centered movie patron with an ice pick to the head, a fantasy to which we could all relate after dozens of movies.
Izgnanie
The Banishment is the story of mistrust and misunderstanding with disastrous consequences. The movie starts with a dramatic drive by a man (Aleksandr Baluyev as Mark) who appears to be bleeding to death from a bullet in his arm. His brother (Konstantin Lavronenko as Alex) dramatically removes the bullet with highly confident and seemingly practiced field surgery. Why, we have no idea and never learn.
Alex’s wife, Vera (Maria Bonnevie), also has a secret which comes out when Alex and family move out to his father’s house in the country (again, for reasons that are not explained). Ultimately the secret brings disaster and then regrets. It is a sad story of regret and insanity. But very compelling.
Mongol
Mongol tells the early life of Genghis Khan, from choosing his charming bride at 10 to unifying the Mongol peoples in a final, climactic battle.
It is an extremely well produced movie by Sergei Bodrov, entirely up to Hollywood blockbuster standards, though the language is Mongolian (of course, since all the world will someday speak Mongol, the most beautiful language in the world).
Though the film is brutal and violent, it is so in the normal mode of Hollywood films: stylized and sanitized, with sprays of computer-animated-looking blood marking the (frequent) satisfying deaths of opponents and the tragic deaths of allies alike.
It is the Gladiator of the Steppe, a strong, compelling film that should appeal broadly, despite subtitles.
My Winnipeg
Guy Maddin’s My Winnipeg was narrated live (by Guy) in the theater at our screening. It is, at times, howlingly funny even for a non-Winnipegger, though the Toronto audience seemed very amused by some Canada- and Winnipeg-specific jokes that I didn’t really get. The live performance added a lot to the experience of the movie, I thought, the two working together very well to foreshadow and set up some of the jokes.
For a movie entirely about something as personal as Guy’s childhood and a place as specific (and as unlikely to be on most people’s must-visit lists) as Winnipeg, it touched a lot of universal truths while being quirky and colloquial.
Ulzhan
Ulzhan is a movie about a man named Charles (Philippe Torreton) struggling with a difficult and unnamed event in his past that has led him on a reckless journey into Kazakhstan’s interior. Sort of Leaving Las Vegas on the Steppe. In the opening scene, a border guard at the Kazakhstan crossing, after looking dubiously at the bottles littering the passenger side of his car, asks, “are you a vagabond?” “No, French.”
Two interesting characters join him as generally unwelcome compatriots on his journey: the gorgeous Ayanat Ksenbai in the title role, a woman who perhaps leads Charles to salvation, and David Bennent as Saukuni, a travelling vendor who sells rare words.
It is a moving and charming film that just might be about redemption, but is absolutely about finding life where you are. As the film progresses, it becomes more and more interesting, the characters more and more compelling. I very much enjoyed it.