SpamAssassin: increase spam score for emoticons in email subject line
We are receiving more and more spam messages and marketing mails with emoticons in the subject line. I want to increase the spam score of such messages using a custom SpamAssassin rule.
It seems the emoticons are not embedded images but ordinary UTF-8 characters, since they show up even when loading of images from email messages is disabled in the email client. There seems to be no way to remove or block them in Outlook or Thunderbird.
My questions are:
Can you confirm that these emoticons are UTF-8 characters, or suggest how I could test that?
I would like to create a custom SpamAssassin rule to increase the spam score of messages that contain these emoticons in the subject line. How would I do this? If they are UTF-8 characters, is there a character range of emoticons I could look for?
Example image of messages:
Top Answer/Comment:
I just had a wave of spam with subject lines all starting with emoji, so I made a dedicated rule for those:
header LOCAL_SUBJ_START_EMOJI Subject =~ /^(\xf0\x9f|\xe2[\x98-\x9b])/
score LOCAL_SUBJ_START_EMOJI 1.0
Notes:
This only matches emoji at the start of the subject line. If you remove the ^, the rule matches those byte sequences anywhere in the subject, so your false positive rate might increase.
This filters the following Unicode ranges:
- UTF-8 starting with F0 9F: 1F000-1FFFF, containing various emoji and other pictograms,
- UTF-8 starting with E2 98/99/9A/9B: 2600-26FF, aka "Miscellaneous Symbols".
Note that this does not cover all emoji, nor does it cover only emoji (e.g. 1F000-1FFFF also includes arrows and chess/playing card symbols). Adapt as needed.
Legitimate mails from kids or marketing departments might also use emoji in the subject line, so don't set the score too high.
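To sanity-check the byte ranges before deploying the rule, here is a minimal Python sketch. It is not part of SpamAssassin; it only reuses the same byte pattern on UTF-8 encoded strings. The helper names starts_with_emoji and prefix_range are made up for this illustration, and the sketch assumes the subject reaches the rule as UTF-8 bytes.

```python
import re

# Same byte pattern as the LOCAL_SUBJ_START_EMOJI rule above.
EMOJI_START = re.compile(rb"^(\xf0\x9f|\xe2[\x98-\x9b])")


def starts_with_emoji(subject: str) -> bool:
    """True if the UTF-8 encoding of the subject starts with a matched prefix."""
    return EMOJI_START.search(subject.encode("utf-8")) is not None


def prefix_range(prefix: bytes) -> tuple:
    """Lowest and highest code point whose UTF-8 encoding starts with prefix."""
    lead = prefix[0]
    # The lead byte determines the total length of the UTF-8 sequence.
    length = 4 if lead >= 0xF0 else 3 if lead >= 0xE0 else 2
    pad = length - len(prefix)
    # Continuation bytes range from 0x80 to 0xBF.
    low = (prefix + b"\x80" * pad).decode("utf-8")
    high = (prefix + b"\xbf" * pad).decode("utf-8")
    return ord(low), ord(high)


if __name__ == "__main__":
    for prefix in (b"\xf0\x9f", b"\xe2\x98", b"\xe2\x9b"):
        low, high = prefix_range(prefix)
        print(f"{prefix.hex()}: U+{low:04X} to U+{high:04X}")
```

Running it prints the code point range behind each prefix, which makes it easier to narrow or widen the rule's character classes for your own mail.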
Can you confirm that these emoticons are UTF-8 characters, or suggest how I could test that?
Currently, the e-mail standards do not support embedding images in message headers (and I sure hope it stays that way), so, yes, Unicode characters are currently the only way to get little pictures into subject lines. To verify this, have a look at the raw headers of your e-mail; the Subject header will usually appear as a MIME encoded-word such as =?UTF-8?B?...?= rather than as readable text. How to view the raw headers depends on the e-mail client you use.
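If the raw Subject header is MIME-encoded and hard to read by eye, a short Python sketch like the following can decode it and list the non-ASCII code points it contains. It uses only the standard library's email.header module; the function names decode_subject and non_ascii_codepoints are hypothetical helpers for this example, and it assumes you paste in the header value exactly as it appears in the raw message.

```python
from email.header import decode_header, make_header


def decode_subject(raw_value: str) -> str:
    """Decode a raw Subject header value (RFC 2047 encoded-words) to text."""
    return str(make_header(decode_header(raw_value)))


def non_ascii_codepoints(text: str) -> list:
    """List every non-ASCII character in text as a U+XXXX label."""
    return [f"U+{ord(ch):04X}" for ch in text if ord(ch) > 0x7F]


if __name__ == "__main__":
    # Example value copied from a raw header; replace it with your own.
    raw = "=?UTF-8?B?8J+UpSBIb3QgZGVhbA==?="
    subject = decode_subject(raw)
    print(subject)
    print(non_ascii_codepoints(subject))
```

Any code point listed in the 1F000-1FFFF or 2600-26FF ranges confirms that the picture in the subject line is a plain Unicode character and not an image.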