
Detector: Module for detecting banned messages (LLC CentraSib)

Personal Knowledge Base Designer (PKBD) was used to develop «Detector», a web-based decision-making module for the «SMS Organizer» platform.

During the project, PKBD was extended to support the import (transformation) of decision tables and the generation of PHP source code. The main result is a knowledge base for the module, which detects messages that violate Federal Law #38 "On Advertising" and the subscribers who send them. Data on messages sent through the «SMS Organizer» platform served as the source of information.

The developed module was tested on a selected database of 1,366,490 messages. After modification, the module's knowledge base comprised 498 rules, which detected 653 banned messages in the test set, for an accuracy of 0.83. The average execution time was 0.00026 s. Checking the full set of 1,366,490 messages with the Detector subsystem identified 1,145 SPAM messages and 25 clients (senders) that violated the provisions of Federal Law #38 but had not been detected previously. A detailed description of the application is given in the paper:

Yurin A.Yu., Dorodnykh N.O. Creating Web Decision-Making Modules on the Basis of Decision Tables Transformations // Communications in Computer and Information Science. Modelling and Development of Intelligent Systems (MDIS 2020), 2021, Vol. 1341, P. 167-184. DOI: 10.1007/978-3-030-68527-0_11

The process of developing a knowledge base prototype for the module is shown in Fig. 1.

Fig.1 The development scheme for a PHP module using PKBD

Next, let’s consider the steps in more detail.

Step 1. At this step, the domain was analyzed and the key abstractions were identified (Fig. 2). At the conceptual level, the main elements of the problem are the "SMS message" (Message), which contains "keywords" (Keyword), the so-called "markers" of SPAM messages, and the sender of messages (Sender). Later, during formalization, the model was simplified to better match the structure of logical rules.
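The three abstractions named above can be sketched as plain PHP classes. This is only an illustration of the conceptual model: the property names (`text`, `id`, `keywords`) and the `tagWith` helper are assumptions made for the example and are not taken from PKBD or from the platform.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch of the conceptual model: Keyword, Sender, Message.
// All property and method names are illustrative assumptions.
final class Keyword
{
    public string $text;

    public function __construct(string $text)
    {
        $this->text = $text;
    }
}

final class Sender
{
    public string $id;

    public function __construct(string $id)
    {
        $this->id = $id;
    }
}

final class Message
{
    public Sender $sender;
    public string $text;
    /** @var Keyword[] markers found in the message text */
    public array $keywords = [];

    public function __construct(Sender $sender, string $text)
    {
        $this->sender = $sender;
        $this->text = $text;
    }

    /** Attach every marker from $markers that occurs in the message text. */
    public function tagWith(array $markers): void
    {
        foreach ($markers as $marker) {
            if (strpos($this->text, $marker->text) !== false) {
                $this->keywords[] = $marker;
            }
        }
    }
}

$message = new Message(new Sender('client-1'), 'text containing marker A');
$message->tagWith([new Keyword('marker A'), new Keyword('marker B')]);
echo count($message->keywords), "\n"; // prints 1
```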

The database of 1,366,490 messages was analyzed, and 829 messages from clients previously caught sending SPAM messages were selected.

Fig.2 The domain model

The selected messages formed a training database (test set), in which five main groups were identified (Fig. 3): propaganda of prohibited substances; unfair advertising; fraud; threats and insults; and messages with no signs of a violation. For each group, a possible recommendation of the decision support module is defined, namely whether to block the message (change its status) and whether to block the sender.

Fig.3 The main groups of banned messages

Step 2. The selected concepts were used to construct platform-independent models, including a decision table containing information about 487 unique sets (Fig. 4). The table was built by automated analysis of the message database. Its structure is formed by listing the properties of all concepts, with each property becoming a column of the table.

Fig.4 A fragment of a decision table
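As a rough illustration of how a table header can be derived from concept properties, the sketch below flattens a list of concepts into columns and pairs one row of cell values with them. The concept names, the `Concept.Property` column naming, and the sample cell values are assumptions made for the example; the actual table layout is the one shown in the figure.

```php
<?php
declare(strict_types=1);

// Hypothetical concepts and their properties; each property becomes a column.
$concepts = [
    'InputData' => ['Keyword'],                   // condition side
    'Output'    => ['MesStatus', 'ClientStatus'], // action side
];

/** Flatten concept properties into the list of table columns. */
function tableColumns(array $concepts): array
{
    $columns = [];
    foreach ($concepts as $concept => $properties) {
        foreach ($properties as $property) {
            $columns[] = $concept . '.' . $property;
        }
    }
    return $columns;
}

/** Pair a raw row of cell values with the column names. */
function makeRow(array $columns, array $cells): array
{
    if (count($columns) !== count($cells)) {
        throw new InvalidArgumentException('Row width does not match the table header');
    }
    return array_combine($columns, $cells);
}

$columns = tableColumns($concepts);
$row = makeRow($columns, ['marker A', 'Error', '1']); // made-up cell values
print_r($row);
```

Keeping the header separate from the rows makes it easy to reject malformed rows before they are turned into rules.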

Step 3. Platform-specific models (specific facts and rules) were built in PKBD by importing the developed decision table (Fig. 5) and then refining the result in RVML (Fig. 6), including setting the rule priorities and the «by default» values.

Fig.5 A PKBD GUI form fragment: importing a decision table

Fig.6 A PKBD GUI form fragment: RVML representation of a rule

Step 4. The PKBD generator synthesized 6,871 lines of PHP code describing 487 rules. Each logical rule is represented as a conditional statement (an IF statement) whose condition checks whether certain keywords occur in the sent message. A fragment of the generated code is shown below:

<?php
//************** exported from PKBD ****************
// version: 4.2018.0201.6
// knowledge base:
// info:
//****************** classes ***********************
class InputData_1{
    var $Keyword;
    function Init(){
        $this->Keyword = "";
    }
}
class OutPut_1{
    var $MesStatus;
    var $ClientStatus;
    var $FiredRule;
    function Init(){
        $this->MesStatus = "";
        $this->ClientStatus = "";
        $this->FiredRule = "";
    }
}
//******** Initialization (facts) ******************
$InputData_1_ = new InputData_1;
$InputData_1_->Init();
$OutPut_1_ = new OutPut_1;
$OutPut_1_->Init();
//**************** rules ***************************
//rule_1-2
if ( ((strpos($InputData_1_->Keyword, "1п+1п=2п") !== false)) ){
    $OutPut_1_->MesStatus = "Error";
    $OutPut_1_->ClientStatus = "1";
    $OutPut_1_->FiredRule = "rule_1-2";
}
…
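To make the table-to-code step more concrete, here is a minimal sketch of how a generator might render decision-table rows as IF statements in the style of the fragment above. It is not the PKBD generator: the `renderRule` function, the row layout, and the sample rows are hypothetical, and only the variable and property names are borrowed from the exported code.

```php
<?php
declare(strict_types=1);

// Hypothetical decision table: one row per rule. The rows are made up.
$table = [
    ['rule' => 'rule_1', 'Keyword' => 'marker A', 'MesStatus' => 'Error', 'ClientStatus' => '1'],
    ['rule' => 'rule_2', 'Keyword' => 'marker B', 'MesStatus' => 'Error', 'ClientStatus' => '0'],
];

/** Render one table row as the text of a PHP if-statement (illustrative only). */
function renderRule(array $row): string
{
    $lines = [];
    $lines[] = '//' . $row['rule'];
    // var_export() quotes and escapes the value so it is a valid PHP string literal.
    $lines[] = 'if (strpos($InputData_1_->Keyword, ' . var_export($row['Keyword'], true) . ') !== false) {';
    foreach (['MesStatus', 'ClientStatus'] as $column) {
        $lines[] = '    $OutPut_1_->' . $column . ' = ' . var_export($row[$column], true) . ';';
    }
    $lines[] = '    $OutPut_1_->FiredRule = ' . var_export($row['rule'], true) . ';';
    $lines[] = '}';
    return implode("\n", $lines);
}

$code = implode("\n", array_map('renderRule', $table));
echo $code, "\n";
```

Emitting string literals through `var_export` rather than by concatenating raw cell values avoids producing broken code when a keyword contains quotes or backslashes.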

Rule priorities were taken into account by sorting the atomic conditional statements within the computing block (function): as soon as a rule was activated, inference stopped, the block was exited, and the result of this last activated rule was returned.
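The ordering idea can be sketched as a function whose IF statements are arranged from highest to lowest priority and which returns as soon as one of them fires. The function name `detect`, the keywords, the statuses, and the rule names are placeholders chosen for the example, and the empty default values are an assumption modelled on the `Init()` methods in the exported fragment.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative rule block: statements are ordered by priority, highest first.
 * Keywords, statuses, and rule names are placeholders.
 */
function detect(string $message): array
{
    // Highest-priority rule: checked first.
    if (strpos($message, 'marker A') !== false) {
        return ['MesStatus' => 'Error', 'ClientStatus' => '1', 'FiredRule' => 'rule_high'];
    }
    // Lower-priority rule: reached only if nothing above fired.
    if (strpos($message, 'marker B') !== false) {
        return ['MesStatus' => 'Error', 'ClientStatus' => '0', 'FiredRule' => 'rule_low'];
    }
    // Default values when no rule fires.
    return ['MesStatus' => '', 'ClientStatus' => '', 'FiredRule' => ''];
}

$result = detect('text with marker B and marker A');
echo $result['FiredRule'], "\n"; // prints "rule_high"
```

Because the function exits on the first match, the position of a statement in the block fully determines which rule wins when several conditions hold for the same message.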

Later, the generated code was integrated into the «SMS Organizer» platform.

Step 5. The developed «Detector» module was tested on the previously selected message database. After modification, the module's knowledge base comprised 498 rules, which detected 653 banned messages in the test set, for an accuracy of 0.83. The average execution time was 0.00026 s. Checking the full set of 1,366,490 messages with the Detector subsystem identified 1,145 SPAM messages and 25 clients (senders) that had not been detected previously.
