-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Make sure your locale is set appropriately. A UTF-8 ñ is a different
byte sequence than it is under ISO-8859-1.

I've personally found that vim does really well editing files like this,
as it does many of the conversions automatically. I would imagine that
emacs does likewise.

greg wm wrote:
> hi folks,
>
> feels rather like i've ventured into uncharted territory, but somebody
> out there somewhere must know the way..
>
> i used wget to copy the entire http://nonviolentpeaceforce.org site to
> http://nvpf.org/np. the former is asp pages, the latter captured as html.
>
> for example, http://nonviolentpeaceforce.org/spanish/welcome.asp was
> captured to http://nvpf.org/np/spanish/welcome.asp.html
>
> as you can see, the capture is mostly fine, including spanish characters
> in the text (eg año), however the spanish characters in the menus didn't
> do quite so well (eg Misi?n)
>
> in the file año appears as año which is apparently "good", but
> Misi?n appears as Misión, which is apparently "bad".
>
> first question: why is that bad?
>
> if i tell galeon, instead of automatic encoding, use western iso-8859-1,
> or any of many others, presto, the page appears nicely. but i don't
> have to do that to see the original, nor do i have to do that for
> anybody else's pages, and of course i can't expect our audience to go
> and fiddle with that in their browsers.
>
> but really now, why isn't an ó an ó? right after the title the file
> says <meta http-equiv="Content-Type" content="text/html;
> charset=iso-8859-1">. why isn't that good enough? do i need to change
> some directive or setting in apache?
>
> second question: it looks like wget was inconsistent! why?
>
> likely hint: the menus are rendered out of some .asp database or
> whatever, differently than the rest of the text of the page.
>
> but, so what? why didn't wget capture something identical to what my
> browser shows?
> the command i ran was
> wget -ENKkrl19 -nH -w2 -owget.log http://nonviolentpeaceforce.org
>
> so anyway i sez hey no problem, i'll just find and replace. well ha.
> couldn't get either egrep nor sed to find an ñ that was right under
> their noses.
>
> third question: what's the trick to find and replace these buggers? vim
> can find them, in interactive mode, so.. should i be trying to figger
> out how to use vim as a grep replacement.. uhh.. ..?
>
> fourth question: where should i be asking these questions, or, where do
> i look for the mysterical solution, and will i recognize it when i see it?
>
> tia,
> greg
>
> Greg Whitley Mott
> IT Coordinator
> NonviolentPeaceforce.org
>
> _______________________________________________
> TCLUG Mailing List - Minneapolis/St. Paul, Minnesota
> tclug-list at mn-linux.org
> http://mailman.mn-linux.org/mailman/listinfo/tclug-list
>

- --
Daniel Taylor
random at argle.org
Forget diamonds, Copyright is forever.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDB0Iy8/QSptFdBtURAuwOAJ9yo1UnPGizkWL58dXwBBe0A9ulkACfVSCl
6JiAyfX1eKiFT6YouXp9Xdc=
=c0fp
-----END PGP SIGNATURE-----
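
To make the point about UTF-8 versus ISO-8859-1 concrete, here is a minimal Python 3 sketch. It is not a diagnosis of the site discussed in this thread: the function names are made up for illustration, and it only assumes that text written in one of the two encodings is being read as the other. Reading ISO-8859-1 bytes as UTF-8 yields a replacement character, which at least resembles the "Misi?n" symptom; reading UTF-8 bytes as ISO-8859-1 yields two junk characters per accented letter.

```python
# Sketch only: the same letter is stored as different bytes under
# UTF-8 and ISO-8859-1, and decoding with the wrong one garbles it.
# All function names here are hypothetical helpers, not a library API.

def encoded_bytes(ch):
    """Return the bytes for ch under UTF-8 and under ISO-8859-1."""
    return ch.encode("utf-8"), ch.encode("iso-8859-1")


def latin1_read_as_utf8(text):
    """Simulate a viewer decoding ISO-8859-1 bytes as UTF-8.

    Invalid byte sequences become U+FFFD, which many programs
    display as a question mark or a box.
    """
    return text.encode("iso-8859-1").decode("utf-8", errors="replace")


def utf8_read_as_latin1(text):
    """Simulate a viewer decoding UTF-8 bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair_utf8_read_as_latin1(text):
    """Undo utf8_read_as_latin1; return text unchanged if it does not apply."""
    try:
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


if __name__ == "__main__":
    print(encoded_bytes("ñ"))
    print(latin1_read_as_utf8("Misión"))
    print(utf8_read_as_latin1("Misión"))
    print(repair_utf8_read_as_latin1(utf8_read_as_latin1("Misión")))
```

This would also explain why a byte-oriented search for ñ can miss: if the terminal's locale is UTF-8 but the file is ISO-8859-1 (or the reverse), the pattern typed and the bytes in the file do not match, even though the letter looks the same on screen.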