Cleanfeed - Configuration Parameters

Overview

Every characteristic of Cleanfeed's behaviour can be tuned using parameters within the local_config function of cleanfeed.local. Some parameters are defined only within this file, others can be applied to override the defaults. In all instances, consider that these are changes to the actual Cleanfeed code. The syntax should always be verified prior to reloading the INN filters. This can't be stressed too much as little internal checking is done. Get it wrong and you break Cleanfeed.
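
As a rough illustration of the above, a cleanfeed.local might look something like the following sketch. It assumes that settings are assigned to a %config_local hash inside local_config, as in the sample file distributed with Cleanfeed; check your own copy before relying on that. The emp_dump_file path and the chosen values are purely illustrative.

```perl
# cleanfeed.local -- a sketch only, not a recommended configuration.
# Assumption: Cleanfeed reads its overrides from the %config_local hash.
our %config_local;

sub local_config {
    %config_local = (
        # Local parameters (defined only in this file)
        statfile      => '/usr/local/news/cleanfeed/log/cleanfeed.stats',
        emp_dump_file => '/usr/local/news/cleanfeed/etc/empdump',  # hypothetical path

        # Replace parameters (override the built-in defaults)
        maxgroups          => 10,   # tighter than the default of 14
        block_user_cancels => 1,    # reject all user-issued cancels
    );
}

1;
```

Running perl -c against the file is one way to catch syntax errors before the filter is reloaded.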

Parameter Details

The following table provides a description of each parameter and whether it is Local or Replace. Local implies that it is defined only within the cleanfeed.local file. Replace, that it overrides the internal Cleanfeed default.
If you do nothing else, check out the Local options; there are only five of them.
Parameter Type Description Default
verbose Replace When True, messages about rejected articles will be written to the INN news.notice log file. 1
aggressive Replace If set to 0, bad_body will be ignored and some elements of the scoring filter will not be processed. Unless you're very paranoid about dropping articles, this is best left at default. 1
maxgroups Replace Maximum groups to allow in a crosspost. The default is very high and can safely be reduced if you see much crosspost spam. 14
block_binaries Replace Should Cleanfeed reject binaries posted to non-binary groups? Unless you have very special requirements, please leave this enabled. Misplaced binaries can have a serious performance impact on both the local server and its downstream peers. 1
block_all_binaries Replace If enabled, Cleanfeed will reject all binary postings, regardless of the distribution. This is useful for text-only services that wish to avoid the bandwidth overhead of propagating binaries in the event of a misconfigured upstream peer. 0
block_late_cancels Replace Should Cleanfeed block cancels of already rejected articles? This only applies to specially formatted Message-IDs (<cancel.[message-ID]>). 0
block_user_spamcancels Replace Prevent users from using bots to cancel what they consider to be Spam. This parameter addresses a specific bot and is unlikely to catch much on Usenet in current times. 1
block_user_cancels Replace When set to True, this will cause all user issued cancels to be rejected. This is one you may wish to change; in modern times, cancels are not honoured by a lot of News providers. 0
block_extra_reposts Replace Prevent bot wars where a cancelbot cancels an article and a repost bot tries to put it back again. 1
do_md5 Replace MD5 checking prevents multiple identical posts being sent to Usenet in high volumes. This is a typical characteristic of Spam so it's strongly advised to leave this enabled. The name of this check is slightly misleading: All the EMP type filters in Cleanfeed use MD5 but this is the only one that applies to the posted payload. 1
do_phl Replace An EMP check seeded by the NNTP-Posting-Host and number of lines within the message payload. Some spam evades the MD5 filter above by adding a signature (a random series of characters) to the posting. This filter attempts to address that issue although it can only function if the posting host applies an NNTP-Posting-Host header. 1
do_phn Replace An EMP check seeded by the NNTP-Posting-Host and the Newsgroups headers. This attempts to address flooding of a specific Newsgroup from a single source. It can cause false positives in some groups where individuals post high volumes of brief messages in a short period of time. These groups can be excluded from the filter using the phn_exclude parameter. 1
do_phr Replace A special EMP filter that will do nothing in default configuration. It relies on the operator specifying groups that are considered at a high risk of being flooded. These groups are specified in the flood_groups parameter. An additional parameter of phr_aggressive makes this filter seed on the Path header if no NNTP-Posting-Host is supplied. 1
do_fsl Replace An EMP seeded using the From, Subject and Lines headers. Similar in function to the PHL filter, it attempts to trap spam that evades the MD5 filter but also doesn't have an NNTP-Posting-Host header. 1
do_scoring_filter Replace When enabled Cleanfeed scores each article according to various spam characteristics. If the score exceeds a threshold, the message is rejected. 1
do_emp_dump Replace Enables dumping of hashes from the various EMP filters to disk. This means the filters are retained between INN restarts. This parameter requires emp_dump_file to be defined. 1
emp_dump_file Local The location to dump the EMP hash entries to if do_emp_dump is enabled.
MD5RateCutoff Replace How many identical MD5 hashes to allow before rejecting further duplicates. 5
MD5RateCeiling Replace When the count against a specific MD5 hash reaches this number, stop incrementing any further. 85
MD5RateBaseInterval Replace After this many seconds, begin to decrement the count against a specific hash. 7200
PHLRateCutoff Replace See MD5RateCutoff 20
PHLRateCeiling Replace See MD5RateCeiling 80
PHLRateBaseInterval Replace See MD5RateBaseInterval 3600
PHNRateCutoff Replace See MD5RateCutoff 150
PHNRateCeiling Replace See MD5RateCeiling 200
PHNRateBaseInterval Replace See MD5RateBaseInterval 3600
PHRRateCutoff Replace See MD5RateCutoff 10
PHRRateCeiling Replace See MD5RateCeiling 80
PHRRateBaseInterval Replace See MD5RateBaseInterval 3600
FSLRateCutoff Replace See MD5RateCutoff 20
FSLRateCeiling Replace See MD5RateCeiling 40
FSLRateBaseInterval Replace See MD5RateBaseInterval 1800
fuzzy_md5 Replace If the message payload is less than fuzzy_max_length lines then modify it before hashing. The modification strips out white space, converts everything to lowercase and does some other simplification to help match payloads that would be identical except for minor formatting differences. 1
fuzzy_max_length Replace Only do fuzzy_md5 if the payload is less than this number of lines. 700
md5_max_length Replace Don't attempt to produce MD5 hashes for messages that exceed this number of lines. This is largely down to system performance as producing cryptographic hashes can take time and CPU. 2000
trim_interval Replace How frequently (in seconds) to review the EMP hash tables to see if any entries can be decremented. 900
stats_interval Replace How frequently (in seconds) should Cleanfeed produce its status file. In order for stats to be produced, either statfile must be defined or inn_syslog_status must be set to True. 3600
MIDmaxlife Replace How long (in hours) to remember rejected Message-IDs. 4
md5_skips_followups Replace If True, the MD5 filter will not be applied to articles containing a References header. As it's trivial for an abuser to add such a header, enabling this is not recommended for normal Usenet operations. 0
phn_aggressive Replace The PHN Filter tries to create hashes based on the NNTP-Posting-Host of the sender. If this header doesn't exist, then setting phn_aggressive to True will cause it to fall back on the Path header instead. The implication of this is that messages to a specific newsgroup will be rejected if too many originate from the same service provider instead of merely the same poster. 1
phr_aggressive Replace See phn_aggressive. As the PHR Filter only applies to defined newsgroups, it's advisable to leave this enabled unless you have a PHR group where multiple people use the same provider, one of whom is flooding the group. 1
do_mid_filter Replace If True, use the Message-ID checking filter. This works in combination with the refuse_messageids Regular Expression. 1
do_supersedes_filter Replace Supersedes headers enable users to indicate that a new posting supersedes the information in a previous one. Whilst useful, they are also often used for abuse purposes. This filter checks that a message isn't excessively superseded. The actual limit is evaluated on the Newsgroups being posted to. Also check the supersedes_exempt Regular Expression. 1
drop_useless_controls Replace Reject control messages of types sendsys, senduuname and version. If drop_ihave_sendme is True, ihave and sendme control messages will also be rejected. 1
drop_ihave_sendme Replace If drop_useless_controls is True, then setting this to True will cause ihave and sendme control messages to also be dropped. 1
bad_rate_reload Replace After this many articles have passed through Cleanfeed, all the bad_* files will be re-read from disk. The default value causes a reload approximately every hour on a full text-only news server. For a server processing fewer articles, the value will need to be reduced accordingly. 10000
low_xpost_maxgroups Replace Override the default maxgroups setting for distributions that include a group considered unsuitable for broad cross-posting. These groups are defined in the low_xpost_groups Regular Expression. 6
meow_ext_maxgroups Replace Meow groups date back to 1996 when the most famous Usenet flame-wars erupted. More details about them can be read here. The actual Meow groups are defined in the meow groups Regular Expression. This parameter controls how many other groups are allowed in a distribution that includes a meow_group. 2
binaries_in_mod_groups Replace Setting this to True will allow binaries to be posted to moderated newsgroups. This is probably a bad idea and the moderators of the groups in question are unlikely to appreciate it. 0
block_mime_html Replace Block MIME encapsulated HTML. HTML is considered evil on Usenet. MIME encapsulated HTML doubly so. 1
block_html Replace Block text/html encoded postings. In some groups such as the Microsoft hierarchy, HTML postings are considered acceptable. The list of groups where it's accepted is defined in the html_allowed Regular Expression. 1
off_topic1_maxgroups Replace Define how many off-topic groups are permitted when a distribution contains a number of groups defined in topic1_groups or topic2_groups. The number of on-topic groups required to trigger the filters is defined in on_topic1_mingroups. 2
on_topic1_mingroups Replace This relates to the above off_topic1_maxgroups. It defines how many groups must match topic1_groups or topic2_groups in order to trigger the topic filters. 2
on_topic1_maxgroups Replace This relates to the above off_topic1_maxgroups. It defines the maximum permissible number of groups that match topic1_groups. 5
off_topic2_maxgroups Replace See off_topic1_maxgroups. 2
on_topic2_mingroups Replace See on_topic1_mingroups. 2
on_topic2_maxgroups Replace See on_topic1_maxgroups. 5
active_file Local The location of the INN active file. If defined, Cleanfeed will use the active file to check if groups are Moderated instead of using the in-built &INN::newsgroup function. As Cleanfeed only reads the active file on startup/reload, it's best to go with the internal function by leaving this undefined.
debug_batch_directory Local The directory to which Cleanfeed will write any log files generated through the saveart function. If this points to a non-existent directory, your News Server's logfiles will contain lots of "No such file or directory" messages. By default, (without a cleanfeed.local file) this parameter is unset which results in no logfiles being generated.
statfile Local Location where Cleanfeed should write its text status file. For a default install, /usr/local/news/cleanfeed/log/cleanfeed.stats is probably a good choice.
html_statfile Local If defined, Cleanfeed will write a crude HTML status file to this location. I find the text version fine; your mileage may vary.
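
To make the interaction between the RateCutoff, RateCeiling and RateBaseInterval parameters more concrete, here is a simplified sketch of an EMP-style counter. It is not Cleanfeed's actual code: the emp_check and emp_trim names are hypothetical, and the decay rule (one decrement per trim once an entry has been idle for the base interval) is an assumption made for illustration. The values mirror the MD5Rate* defaults from the table.

```perl
use strict;
use warnings;

# Illustrative settings, named after the MD5Rate* parameters.
my %rate = (cutoff => 5, ceiling => 85, base_interval => 7200);
my %seen;    # hash key => { count => N, last => epoch seconds }

# Record one sighting of $key at time $now.
# Returns true if the article should be rejected.
sub emp_check {
    my ($key, $now) = @_;
    my $e = $seen{$key} //= { count => 0, last => $now };
    $e->{count}++ if $e->{count} < $rate{ceiling};    # stop counting at the ceiling
    $e->{last} = $now;
    return $e->{count} > $rate{cutoff};               # the first `cutoff` copies pass
}

# Intended to run every trim_interval seconds: decay entries that have
# been idle for at least base_interval, dropping them once they reach zero.
sub emp_trim {
    my ($now) = @_;
    for my $key (keys %seen) {
        my $e = $seen{$key};
        next if $now - $e->{last} < $rate{base_interval};
        delete $seen{$key} if --$e->{count} <= 0;
    }
}

# Seven identical postings in quick succession.
my @verdicts = map { emp_check('abc123', 1000 + $_) ? 'reject' : 'accept' } 1 .. 7;
print "@verdicts\n";    # prints: accept accept accept accept accept reject reject
```

In this sketch the ceiling bounds how long a flooded hash takes to decay back below the cutoff once the flood stops, which is why it sits well above the cutoff.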

Summary

That's all the Cleanfeed configuration options. There are a lot of them and for newcomers it must look a bit daunting. Please don't be put off; you really only need to change the defaults in instances where they prove unsuitable for your site. In most instances they err on the side of caution, working on the principle that spam and abuse getting through is better than good articles getting rejected.

In the next section I'll cover the other aspect of Cleanfeed configuration, the Regular Expressions themselves.