Aggregator service
Text management, adding new texts, preprocessing text, sending asynchronous requests to other services, viewing check results.
Models
Text
-
fields:
- created: text creation date;
- language: string of length 2 with a language code;
- cleared_headline: article headline cleared by checker.CorrChecker;
- cleared_text: article text cleared by checker.CorrChecker;
- message_id: unique text identifier, integer in hex form, length 32;
- headline: raw article headline;
- raw_text: raw article text.
-
methods:
- clean: text clearing, correctness checking and language detection.
TextsAdmin
- fields:
- token: unique superuser identifier for API authentication.
Submodules
arclient
HTTP client for sending asynchronous requests. Sends a GET or POST requests to the specified list of services. Implements requests timeout, adding custom headers and logging.
checker
Text correctness checker. Workflow:
- detect language by langdetect, port of Google's language-detection library;
- delete all non-language symbols except spaces;
- check text correctness by entropic criteria;
- stem words by Snowball stemmers;
- delete stopwords.
H1 and H2 values for entropic criteria:
Language | H1 | H2 |
---|---|---|
English | 4.18 | 3.95 |
Russian | 4.5 | 4.2 |
Ukrainian | 4.6 | 4.2 |