Cleanfeed Files Explained
| Filename | File Type | Description | |
| bad_adult_paths | Text | Messages containing a Path string that matches an entry in this file will be rejected if the distribution contains a group intended for adult material. Adult groups are defined by the adult_groups and not_adult_groups parameters, which are covered in the configuration parameters section. | |
| bad_body | Regex | A list of Perl regular expressions that are compared with the message body. Matching content will result in the message being rejected. This file is only intended for simple regex matching, such as spammed URLs, email addresses or telephone numbers. More complex filters should be applied in cleanfeed.local, although a filter reload is needed to activate changes made to that file. | |
| bad_url | Regex | This works exactly like bad_body but will only match URLs. This is achieved by prefixing all the filters in this file with "http://" or "www.". This file is intended to offer convenience over using bad_body whilst also removing the risk of filtering your entire feed by mistake. | |
| bad_url_central | Regex | Identical functionality to bad_url. The "_central" suffix implies that the file can be downloaded from a central register of spammed URLs. | |
| bad_cancel_paths | Text | Cleanfeed will reject cancel messages where the Path contains one of these plain-text strings. | |
| bad_from | Regex | Messages will be rejected if the From header matches a regular expression defined in this file. Use with caution: these expressions aren't sanity checked. If you tell Cleanfeed to filter every message, it will! | |
| bad_hosts | Text | Messages with an NNTP-Posting-Host that matches one of the entries in this file will be rejected. | |
| bad_hosts_central | Text | Exactly the same as bad_hosts, except this file is maintained and published by a central resource. If you don't want to use it, simply delete it and don't download the updates. | |
| bad_paths | Text | As with bad_adult_paths, except bad_paths applies to messages posted to any group, not just adult ones. | |
| cleanfeed | Perl | The main filtering engine | |
| cleanfeed.local | Perl | A configuration file for customisation of Cleanfeed's behaviour. Although just an example, this file provides a good starting point and its use for beginners is strongly recommended. | 
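The bad_url description above says that each filter is prefixed with "http://" or "www." so that it can only match URLs. The following is a minimal sketch of that idea, not Cleanfeed's own code: the entries, the url_filter helper and the case-insensitive match are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical entries, as they might appear one per line in bad_url.
my @bad_url_entries = ('cheap-pills\.example', 'casino\d+\.example');

# Hypothetical helper: join the entries into one alternation and require
# a URL prefix in front of it. Case-insensitive matching is an assumption.
sub url_filter {
    my @entries = @_;
    my $alternation = join '|', @entries;
    return qr{(?:http://|www\.)(?:$alternation)}i;
}

my $re = url_filter(@bad_url_entries);

# The same text is only caught when it appears as part of a URL.
for my $body (
    'Visit http://cheap-pills.example today',
    'I read about cheap-pills.example in the news',
) {
    printf "%-50s %s\n", $body, ($body =~ $re ? 'reject' : 'accept');
}
```

Because the prefix is mandatory, an overly broad entry can only ever match text that follows "http://" or "www.", which is the safety margin the bad_url description refers to.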
| Function | Description | 
| local_flag_localfeed | In many news configurations, articles are fed to the server
from sources that are considered local.  The role of this function is to identify and tag
these local articles. The operator should define one or more conditional rules that
cause the function to return True if the source of the article is local.  For example: 
sub local_flag_localfeed {
    return 1 if $hdr{'X-Trace'} =~ /\.mydomain\.com/;
    return 1 if $hdr{Path} =~ /^not\-for\-mail$/;
    return 1 if $hdr{'Message-ID'} =~ /\@mydomain\.com>$/;
    return 0;
};
 | local_flag_spamsource | Define a set of rules that return True if a posting
originates from a news-service that is considered a frequent source of spam.
One such example might be: 
sub local_flag_spamsource {
    return 1 if $hdr{'Injection-Info'} =~ /googlegroups\.com/;
    return 1 if $hdr{Path} =~ /newsguy\.com!news\d$/;
    return 0;
};
When True is returned, the Cleanfeed global variable $spamsource will be set
True. Besides user-defined functions, this variable is only used within
Cleanfeed's scoring functions.  Consequently if scoring is disabled, it will do
nothing. | 
| local_filter_first | Instructions within this function are performed against normal (non-control) articles. They happen very early on in the filtering process prior to any binary or EMP (Excessive Multi-Post) checks. | 
| local_filter_bot | Processed after the binary checks but before the EMP checks, filters within this function are intended to enhance the in-built bot signature scanning. | 
| local_filter_after_emp | As its name suggests, instructions within this function are processed after Cleanfeed's EMP checks. This implies that articles rejected by this function will still seed the EMP hashes. | 
| local_filter_scoring | This function is intended to return article scores that are
appended to the score generated by the internal scoring filters.  If the
resulting score exceeds a defined threshold, the article is rejected. Scores
applied within this function must be correctly formatted: a run of * or ! characters followed by a short label, where the number of characters sets the size of the adjustment. To negatively score an article, return **NiceArt, which reduces the total score by 2. To positively score an article, return !!!NastyArt, which increases the total score by 3. The following example deducts 1 from locally posted articles and adds 2 to articles posted to groups with "spam" in their name. 
sub local_filter_scoring {
    my $score = '';
    $score .= '*LocalPost' if $localfeed;
    $score .= '!!SpamGrp' if $hdr{Newsgroups} =~ /spam/;
    return $score;
};
 | 
| local_filter_last | Called last, after articles have been processed by all the
Cleanfeed internal filters.  This is probably the best function to put general
filters in.  This example will log and reject articles with PLUGH in the Subject: 
sub local_filter_last {
    if ($hdr{Subject} =~ /PLUGH/) {
        logart('plugh', 'Subject PLUGH');
        return reject('Subject contains PLUGH', 'PLUGH');
    };
};
 | 
| local_filter_cancel | This function isn't applied to normal messages, only Control
Cancel ones.  It runs after the internal Cleanfeed Cancel checks and offers a
means to define local policies for the handling of Cancel messages.  Some
operators choose not to honour Cancels at all; this example takes a narrower approach and rejects only locally posted ones: 
sub local_filter_cancel {
    # Reject all locally posted cancel messages.
    if ($localfeed) {
        logart('local.cancel', 'Local Cancel');
        return reject('Cancels forbidden');
    };
};
 | 
| local_filter_control | This function is called on messages containing a newgroup or
rmgroup Control header, after the internal checks have been processed.  If
the Cleanfeed internal checks reject a message, processing completes
without this function being called.  In these instances, local_filter_reject should be used for further
actions, such as logging. All rejected Control messages have a short reason of
Bad control message. Example local_filter_control: 
sub local_filter_control {
    # Log locally posted Control messages.
    if ($localfeed) {
        logart('ctrl_rmgroup', 'rmgroup', 271)
            if $hdr{Control} =~ /^\s*rmgroup/i;
        logart('ctrl_newgroup', 'newgroup', 271)
            if $hdr{Control} =~ /^\s*newgroup/i;
    };
};
 | 
| local_filter_reject | This function doesn't quite fit the mould of the others.
Instructions within it are only processed on articles that have already
been rejected.  This is a good place to define log files for rejected articles.
There are some examples in the sample cleanfeed.local. In this example, we log rejected articles where the Message-ID contains Xyzzy. 
sub local_filter_reject {
    logart('xyzzy', 'Message-ID Xyzzy') if $hdr{'Message-ID'} =~ /Xyzzy/;
};
 | 
| local_config | Unlike all the other functions, this one doesn't contain any filters. It's the place for adjusting Cleanfeed's internal defaults, such as the bad_rate_reload parameter described above. Details of all the adjustable parameters can be found in the configuration parameters section. |
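The score format described for local_filter_scoring can be sanity checked before a filter goes live. The sketch below is a hypothetical helper, not part of Cleanfeed and not its actual parser. It assumes only what the examples above state: each * deducts one point and each ! adds one.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Cleanfeed function): tally the net adjustment
# encoded in a score string such as '*LocalPost!!SpamGrp'.
sub score_adjustment {
    my ($score) = @_;
    my $plus  = () = $score =~ /!/g;    # each '!' adds one point
    my $minus = () = $score =~ /\*/g;   # each '*' deducts one point
    return $plus - $minus;
}

print score_adjustment('**NiceArt'), "\n";            # -2
print score_adjustment('!!!NastyArt'), "\n";          # 3
print score_adjustment('*LocalPost!!SpamGrp'), "\n";  # 1
```

The last call mirrors the local_filter_scoring example for a locally posted article in a group with "spam" in its name. Note that this simple count would miscount labels that themselves contain * or !, so labels are best kept alphanumeric.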