Spellchecker code available
I’ve been asked privately whether the code of the spellchecker service running on the W3C site was available; it wasn’t, but now it is, along with the code of the tidy on-line and HTTP HEAD services.
The spellchecker uses a fairly simple Python wrapper around aspell that:
- lets you pick the language of the document being spell-checked – ideally, this would be autodetected from the HTTP headers (Content-Language) and from the HTML document itself (with the lang/xml:lang attributes)
- presents the errors found, and optionally the possible corrections
- links to a different form (whose code hasn’t been released yet) to add words to the local dictionary
- works for HTTP-protected resources (Basic Authentication only)
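The language autodetection mentioned in the first point is not part of the released code; a minimal sketch of how it could work, using only the Python 3 standard library, is given here. The names guess_language and _RootLangParser are hypothetical, and the order of precedence (document attributes first, then the Content-Language header, then a default) is an assumption rather than anything the service does.

```python
from html.parser import HTMLParser


class _RootLangParser(HTMLParser):
    """Record the xml:lang / lang attribute of the first <html> tag seen."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "html" and self.lang is None:
            attributes = dict(attrs)
            # Assumption: xml:lang wins over lang when both are present.
            self.lang = attributes.get("xml:lang") or attributes.get("lang")


def guess_language(content_language, html_text, default="en"):
    """Guess a language tag from the document, then from the HTTP header."""
    parser = _RootLangParser()
    parser.feed(html_text)
    if parser.lang and parser.lang.strip():
        return parser.lang.strip()
    if content_language:
        # Content-Language may list several tags; keep only the first one.
        first = content_language.split(",")[0].strip()
        if first:
            return first
    return default
```

The returned tag would still have to be mapped to a dictionary that aspell actually has installed, which this sketch does not attempt.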
For the last point, it relies on a Python module I use in pretty much all my Python CGIs, http_auth.py, which basically intercepts the 401 responses received when doing an HTTP GET, sends them back to the originator, and re-uses the originator’s credentials to redo the request.
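To make the mechanism more concrete, here is a rough sketch of that kind of relay, written against the Python 3 standard library. It is not the code of http_auth.py: the function name fetch_with_client_credentials and its opener parameter are hypothetical, and it only illustrates the idea of forwarding the client's Authorization header and handing a 401 challenge back to the CGI.

```python
import urllib.error
import urllib.request


def fetch_with_client_credentials(url, authorization=None,
                                  opener=urllib.request.urlopen):
    """Fetch url, forwarding the client's Authorization header if any.

    Returns (status, headers, body). When the target answers 401, the
    WWW-Authenticate challenge is returned so the CGI can relay it to
    the originator, who will then retry with credentials.
    """
    request = urllib.request.Request(url)
    if authorization:
        # Re-use the originator's credentials as they were received.
        request.add_header("Authorization", authorization)
    try:
        response = opener(request)
    except urllib.error.HTTPError as error:
        if error.code == 401:
            challenge = error.headers.get("WWW-Authenticate", "")
            return 401, {"WWW-Authenticate": challenge}, b""
        raise  # other HTTP errors are left to the caller
    return response.status, dict(response.headers), response.read()
```

In a CGI, the authorization argument would typically come from os.environ.get("HTTP_AUTHORIZATION"), and a 401 result would be printed back as a "Status: 401" response carrying the same WWW-Authenticate header.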
If this feature is to be used:
- http_auth.py must be in the path where python is going to search for modules, that is, either in the same directory as the CGI script itself, in the global directory where python searches (à la /usr/lib/python2.3/site-packages/), or in any directory added manually to the path using sys.path.insert(0, '/path/to/my/directory/') in the CGI script
- the CGI script must be passed the HTTP header containing the authentication credentials, which is not possible by default on any sane Web server configuration; in Apache, to make this possible, the following directive is needed:
RewriteRule ^name_of_the_script(.*) name_of_the_script$1 [E=HTTP_AUTHORIZATION:%{HTTP:AUTHORIZATION},PT,L]
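With that directive in place, the credentials reach the CGI as the HTTP_AUTHORIZATION environment variable. As a quick way to check that the variable really arrives, the following Python 3 sketch decodes a Basic Authentication value; the helper name basic_credentials is hypothetical, and the UTF-8 decoding of the user name and password is an assumption.

```python
import base64
import os


def basic_credentials(environ=os.environ):
    """Return (user, password) from HTTP_AUTHORIZATION, or None."""
    header = environ.get("HTTP_AUTHORIZATION", "")
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic" or not token.strip():
        return None  # variable missing, or not Basic Authentication
    try:
        decoded = base64.b64decode(token.strip(), validate=True).decode("utf-8")
    except ValueError:
        # Covers both malformed base64 and undecodable bytes.
        return None
    user, separator, password = decoded.partition(":")
    if not separator:
        return None
    return user, password
```

If this returns None for a request that did send credentials, the rewrite rule is most likely not being applied to the script.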
If the script is going to be used without the need for HTTP Authentication proxying, it is probably simpler to remove the underlying code, namely by replacing:
import http_auth
url_opener = http_auth.ProxyAuthURLopener()
by
import urllib
url_opener = urllib.FancyURLopener()
Patches to the code are welcome.