From: rich@bofh.concordia.ca (Rich Lafferty)
Newsgroups: concordia.test,concordia.dept.iits.help
Subject: Re: Good, no more X-UNKNOWN...
Date: 14 Jan 2001 22:17:04 GMT
Message-ID: <slrn9649b0.iag.rich@bofh.concordia.ca>
References: <Pine.OSF.4.31.0101141230470.23869-100000@alcor.concordia.ca> <Pine.OSF.4.31.0101141231300.23869-100000@alcor.concordia.ca>

[Crossposted and followups set.]

In concordia.test,
Alexander Fong <ap_fong@alcor.ffosihtekat.concordia.catakethisoff> wrote:
> On Sun, 14 Jan 2001, Alexander Fong wrote:
> 
> > Test...
> > For some reason, it was posting as character set X-UNKNOWN.  Set to
> > US-ASCII.  Let's see if that works...
> 
> Good, it worked. =3D\

Your newsreader is still trying to post in Quoted-Printable. (I
mentioned this a while ago.) I suspect it's being triggered by
whatever you did here:

> Alexander Fong          (** -> @)  ap_fong ** alcor =FA=FA concordia =FA=FA=

Where one might expect to find a ".", you've got two 0xFA bytes, which
are certainly not ASCII! In ISO-8859-1, 0xFA is LATIN SMALL LETTER U
WITH ACUTE, which is probably not what you meant, but in DOS-LATIN-US
(CP 437, the IBM box-drawing character set), it's MIDDLE DOT (a bullet
point).
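
If you'd like to check that for yourself, here is a minimal Python
sketch using only the standard library. It undoes the Quoted-Printable
encoding of "=FA=FA" and names the resulting characters under each of
the two character sets. The helper name describe() is just something I
made up for illustration.

```python
import quopri
import unicodedata

# In Quoted-Printable, "=FA" stands for the single byte 0xFA.
raw = quopri.decodestring(b"=FA=FA")


def describe(data: bytes, charset: str) -> list[str]:
    """Return the Unicode name of each character in data, decoded as charset."""
    return [unicodedata.name(ch) for ch in data.decode(charset)]


latin1_names = describe(raw, "iso-8859-1")
cp437_names = describe(raw, "cp437")

# US-ASCII only covers byte values below 0x80.
is_ascii = all(b < 0x80 for b in raw)

print(raw, latin1_names, cp437_names, is_ascii)
```

Any byte at or above 0x80 fails the is_ascii check, and that is the
sort of thing that can push a newsreader into Quoted-Printable.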

I suspect replacing those MIDDLE DOTs with "." will solve the
problem. If you want to use 8-bit characters in news, I'd recommend
using only Latin-1 (unless, of course, you need characters outside of
Latin-1, although I don't think box-drawing characters are a "need");
even then, it's not yet safe to assume that news is 8-bit clean, so
US-ASCII is probably the best bet.

If you want to use a munged address, I'd recommend the following:

  1. Put something munged in the From: header, and have
     it end with ".invalid", which is guaranteed to never exist.
  2. Put your real address, unmunged, in the .signature *and* in
     the reply-to.

You might be surprised to hear that putting a legitimate address in
the .signature and reply-to gathers no spam. The reason for this is
found in the nature of NNTP and the manner in which spammers harvest
addresses from Usenet -- it's possible to quickly retrieve all of the
From: headers from a newsgroup from the overview database, while it's
considerably more work to retrieve the full headers for an article,
and much slower to retrieve the body text. Tests done by the fine
folks in the net-abuse groups have demonstrated that spammers *do*
harvest addresses only from overviews.
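
To make that concrete, here is a rough Python sketch of what a
harvester gets from one overview record. It assumes the usual
tab-separated overview layout (article number, Subject, From, Date,
Message-ID, References, byte count, line count, then optional extras
such as Xref). The sample record, its address, and the function names
are all made up for illustration.

```python
from email.utils import parseaddr

# Standard overview fields, in order; extras such as Xref may follow.
FIELDS = ("number", "subject", "from", "date", "message_id",
          "references", "bytes", "lines")


def parse_overview(line: str) -> dict[str, str]:
    """Split one tab-separated overview record into named fields."""
    parts = line.rstrip("\r\n").split("\t")
    return dict(zip(FIELDS, parts))


def harvestable_address(line: str) -> str:
    """Return the address from the From field of an overview record."""
    _name, addr = parseaddr(parse_overview(line)["from"])
    return addr


# Hypothetical record; the References field is empty for a new thread.
sample = "\t".join([
    "1", "testing", '"Some Poster" <poster@example.invalid>',
    "Wed, 03 Jan 2001 09:32:18 GMT", "<hypothetical-id@example.invalid>",
    "", "809", "3", "Xref: news.example.invalid misc.test:1",
])

print(harvestable_address(sample))
```

Note that there is no Reply-To field and no body text anywhere in the
record, so an address that appears only there never shows up.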

Doing it that way also makes it convenient for people to reply to you.

Interestingly, the Usenet Address Munging FAQ claims

     NOTE: DO NOT put a directly usable address in your sig, because
     many harvesters collect everything with an @ sign in it.

which is empirically not the case. As an example of why address
harvesters do *not* do this: If I were to request only the From:
headers of one million posts, I'd get one million lines such as the
following, with apologies to whoever this might be that I'm using as
an example:

389115  testing "Pekka Ylonen" <pekka.ylonen@pp1.inet.fi>      
     Wed, 03 Jan 2001 09:32:18 GMT   <CkC46.68$Wf4.7239@read2.inet.fi> 
     809     3      Xref: newsflash.concordia.ca misc.test:389115

(that's a single line broken for convenience). The typical text-only
Usenet article is around 2kB and the average Usenet article including
binaries is around 300kB. So, if you want to go through text-only
groups looking to harvest addresses, overview gets you 192MB of data,
of which about 1/8 will be email addresses, while retrieving full
articles gives you 1.9GB of data, about 1/100 of which will be email
addresses. If you go through an entire feed, overview gives you the
same numbers (recall, we said one million posts), but full articles
give you 300GB of data, about 1/20000 of which will be email
addresses. Obviously, address-harvesting is most efficient from the
overview, which contains your From: but not your Reply-To or
.signature.
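
The back-of-envelope arithmetic can be redone in a few lines of
Python. The sizes here are assumptions read off the figures above: 192
bytes per overview line and roughly 24 bytes per address (1/8 of that
line), with kB taken as 1000 bytes. The results are only meant to land
in the same ballpark as the rounded numbers in the text.

```python
POSTS = 1_000_000
ADDRESS_BYTES = 24  # assumed: about 1/8 of a 192-byte overview line

# Assumed bytes fetched per post for each harvesting strategy.
SIZES = {
    "overview": 192,
    "text articles": 2_000,
    "full feed": 300_000,
}


def harvest_cost(item_bytes: int) -> tuple[int, float]:
    """Return (total bytes fetched, share of those bytes that is address)."""
    return POSTS * item_bytes, ADDRESS_BYTES / item_bytes


for name, size in SIZES.items():
    total, share = harvest_cost(size)
    print(f"{name}: {total / 1e9:.3f} GB fetched, 1/{round(1 / share)} is address")
```

Whatever exact sizes you plug in, the overview stays several orders
of magnitude cheaper per address than pulling whole articles.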

Give it a shot -- you'll avoid inconveniencing people who want to
reply, solve your character set problem, and continue to avoid spam,
while at the same time being a good Usenet citizen. Note also that
since you're adding a Reply-To, you can put something as arbitrary as 

  Alexander Fong <alex@this.address.is.invalid>

in your From:, and since your real Concordia address appears at
least once and probably twice in the message, you'll be staying within
Concordia's rules, which require the sender of messages to be
identified.
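
For what it's worth, here is a small Python sketch of the header
layout I'm describing, built with the standard email package. The
address real.address@example.org is a placeholder standing in for
your real one, and build_article() is a name I made up; how your
newsreader actually sets these headers will differ.

```python
from email.message import EmailMessage


def build_article(real_address: str, body: str) -> EmailMessage:
    """Build an article with a munged From: and a usable Reply-To:."""
    msg = EmailMessage()
    # Munged address ending in ".invalid": this is all the overview shows.
    msg["From"] = "Alexander Fong <alex@this.address.is.invalid>"
    # Real address, unmunged, for people who want to reply.
    msg["Reply-To"] = real_address
    msg["Newsgroups"] = "concordia.test"
    msg["Subject"] = "Test"
    # The real address again in the signature; a US-ASCII body raises an
    # error here if any non-ASCII character sneaks in.
    signature = "-- \n" + real_address + "\n"
    msg.set_content(body + "\n" + signature, charset="us-ascii")
    return msg


# Hypothetical placeholder address, not anyone's real one.
article = build_article("real.address@example.org", "Test...")
print(article.as_string())
```

The real address appears twice (Reply-To and .signature) and the
body goes out as plain 7-bit US-ASCII rather than Quoted-Printable.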

Hope this helps, 

  -Rich
     
-- 
Rich Lafferty ----------------------------------------
 Nocturnal Aviation Division, IITS Computing Services
 Concordia University, Montreal, QC
rich@bofh.concordia.ca -------------------------------


