summaryrefslogtreecommitdiff
path: root/searx/locales.py
AgeCommit message (Collapse)Author
2025-09-03[mod] drop: from __future__ import annotationsMarkus Heiser
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-09-03[mod] addition of various type hints / tbcMarkus Heiser
- pyright configuration [1]_ - stub files: types-lxml [2]_ - addition of various type hints - enable use of new type system features on older Python versions [3]_ - ``.tool-versions`` - set python to lowest version we support (3.10.18) [4]_: Older versions typically lack some typing features found in newer Python versions. Therefore, for local type checking (before commit), it is necessary to use the older Python interpreter. .. [1] https://docs.basedpyright.com/v1.20.0/configuration/config-files/ .. [2] https://pypi.org/project/types-lxml/ .. [3] https://typing-extensions.readthedocs.io/en/latest/# .. [4] https://mise.jdx.dev/configuration.html#tool-versions Signed-off-by: Markus Heiser <markus.heiser@darmarit.de> Format: reST
2025-01-28[refactor] typification of SearXNG (initial) / result items (part 1)Markus Heiser
Typification of SearXNG ======================= This patch introduces the typing of the results. The why and how is described in the documentation, please generate the documentation .. $ make docs.clean docs.live and read the following articles in the "Developer documentation": - result types --> http://0.0.0.0:8000/dev/result_types/index.html The result types are available from the `searx.result_types` module. The following have been implemented so far: - base result type: `searx.result_type.Result` --> http://0.0.0.0:8000/dev/result_types/base_result.html - answer results --> http://0.0.0.0:8000/dev/result_types/answer.html including the type for translations (inspired by #3925). For all other types (which still need to be set up in subsequent PRs), template documentation has been created for the transition period. Doc of the fields used in Templates =================================== The template documentation is the basis for the typing and is the first complete documentation of the results (needed for engine development). It is the "working paper" (the plan) with which further typifications can be implemented in subsequent PRs. - https://github.com/searxng/searxng/issues/357 Answer Templates ================ With the new (sub) types for `Answer`, the templates for the answers have also been revised, `Translation` are now displayed with collapsible entries (inspired by #3925). !en-de dog Plugins & Answerer ================== The implementation for `Plugin` and `Answer` has been revised, see documentation: - Plugin: http://0.0.0.0:8000/dev/plugins/index.html - Answerer: http://0.0.0.0:8000/dev/answerers/index.html With `AnswerStorage` and `AnswerStorage` to manage those items (in follow up PRs, `ArticleStorage`, `InfoStorage` and .. will be implemented) Autocomplete ============ The autocompletion had a bug where the results from `Answer` had not been shown in the past. To test activate autocompletion and try search terms for which we have answerers - statistics: type `min 1 2 3` .. in the completion list you should find an entry like `[de] min(1, 2, 3) = 1` - random: type `random uuid` .. in the completion list, the first item is a random UUID Extended Types ============== SearXNG extends e.g. the request and response types of flask and httpx, a module has been set up for type extensions: - Extended Types --> http://0.0.0.0:8000/dev/extended_types.html Unit-Tests ========== The unit tests have been completely revised. In the previous implementation, the runtime (the global variables such as `searx.settings`) was not initialized before each test, so the runtime environment with which a test ran was always determined by the tests that ran before it. This was also the reason why we sometimes had to observe non-deterministic errors in the tests in the past: - https://github.com/searxng/searxng/issues/2988 is one example for the Runtime issues, with non-deterministic behavior .. - https://github.com/searxng/searxng/pull/3650 - https://github.com/searxng/searxng/pull/3654 - https://github.com/searxng/searxng/pull/3642#issuecomment-2226884469 - https://github.com/searxng/searxng/pull/3746#issuecomment-2300965005 Why msgspec.Struct ================== We have already discussed typing based on e.g. `TypeDict` or `dataclass` in the past: - https://github.com/searxng/searxng/pull/1562/files - https://gist.github.com/dalf/972eb05e7a9bee161487132a7de244d2 - https://github.com/searxng/searxng/pull/1412/files - https://github.com/searxng/searxng/pull/1356 In my opinion, TypeDict is unsuitable because the objects are still dictionaries and not instances of classes / the `dataclass` are classes but ... The `msgspec.Struct` combine the advantages of typing, runtime behaviour and also offer the option of (fast) serializing (incl. type check) the objects. Currently not possible but conceivable with `msgspec`: Outsourcing the engines into separate processes, what possibilities this opens up in the future is left to the imagination! Internally, we have already defined that it is desirable to decouple the development of the engines from the development of the SearXNG core / The serialization of the `Result` objects is a prerequisite for this. HINT: The threads listed above were the template for this PR, even though the implementation here is based on msgspec. They should also be an inspiration for the following PRs of typification, as the models and implementations can provide a good direction. Why just one commit? ==================== I tried to create several (thematically separated) commits, but gave up at some point ... there are too many things to tackle at once / The comprehensibility of the commits would not be improved by a thematic separation. On the contrary, we would have to make multiple changes at the same places and the goal of a change would be vaguely recognizable in the fog of the commits. Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-11-24[chore] *: fix typos detected by typos-cliBnyro
2024-09-15[fix] fetch_traits: brave, google, annas_archive & radio_browserMarkus
This patch fixes a bug reported by CI "Fetch traits" [1] (brave) and improves other fetch traits functions (google, annas_archive & radio_browser). brave: File "/home/runner/work/searxng/searxng/searx/engines/brave.py", line 434, in fetch_traits sxng_tag = region_tag(babel.Locale.parse(ui_lang, sep='-')) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/runner/work/searxng/searxng/searx/locales.py", line 155, in region_tag Error: raise ValueError('%s missed a territory') google: change ERROR message about unknow UI language to INFO message radio_browser: country_list contains duplicates that differ only in upper/lower case annas_archive: for better diff; sort the persistence of the traits [1] https://github.com/searxng/searxng/actions/runs/10606312371/job/29433352518#step:6:41 Signed-off-by: Markus <markus@venom.fritz.box>
2024-03-11[mod] pylint all files with one profile / drop PYLINT_SEARXNG_DISABLE_OPTIONMarkus Heiser
In the past, some files were tested with the standard profile, others with a profile in which most of the messages were switched off ... some files were not checked at all. - ``PYLINT_SEARXNG_DISABLE_OPTION`` has been abolished - the distinction ``# lint: pylint`` is no longer necessary - the pylint tasks have been reduced from three to two 1. ./searx/engines -> lint engines with additional builtins 2. ./searx ./searxng_extra ./tests -> lint all other python files Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-02-20[mod] reduce memory footprint by not calling babel.Locale.parse at runtimeAlexandre Flament
babel.Locale.parse loads more than 60MB in RAM. The only purpose is to get: LOCALE_NAMES - searx.data.LOCALES["LOCALE_NAMES"] RTL_LOCALES - searx.data.LOCALES["RTL_LOCALES"] This commit calls babel.Locale.parse when the translations are update from weblate and stored in:: searx/data/locales.json This file can be build by:: ./manage data.locales By store these variables in searx.data when the translations are updated we save round about 65MB (usually 4 worker = 260MB of RAM saved. Suggested-by: https://github.com/searxng/searxng/discussions/2633#discussioncomment-8490494 Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
2023-09-18[fix] spellingjazzzooo
2023-05-19use logger.warningpankaj
logger.warn() is depricated. logger.warning is already being used in some files.
2023-04-17[fix] doc of locales.get_engine_locale() / zh-classical is missleadingMarkus Heiser
Wikipedia's zh-classical is not zh_Hant (see doc-string of engines.wikipedia). Fixed the example in the doc-string of locales.get_engine_locale() to 'zh_TW'. Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24[mod] replace utils.match_language by locales.match_localeMarkus Heiser
This patch replaces the *full of magic* ``utils.match_language`` function by a ``locales.match_locale``. The ``locales.match_locale`` function is based on the ``locales.build_engine_locales`` introduced in 9ae409a0 [1]. In the past SearXNG did only support a search by a language but not in a region. This has been changed a long time ago and regions have been added to SearXNG core but not to the engines. The ``utils.match_language`` was the function to handle the different aspects of language/regions in SearXNG core and the supported *languages* in the engine. The ``utils.match_language`` did it with some magic and works good for most use cases but fails in some edge case. To replace the concurrence of languages and regions in the SearXNG core the ``locales.build_engine_locales`` was introduced in 9ae409a0 [1]. With the last patches all engines has been migrated to a ``fetch_traits`` and a language/region concept that is based on ``locales.build_engine_locales``. To summarize: there is no longer a need for the ``locales.match_language``. [1] https://github.com/searxng/searxng/pull/1652 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24[mod] replace engines_languages.json by engines_traits.jsonMarkus Heiser
Implementations of the *traits* of the engines. Engine's traits are fetched from the origin engine and stored in a JSON file in the *data folder*. Most often traits are languages and region codes and their mapping from SearXNG's representation to the representation in the origin search engine. To load traits from the persistence:: searx.enginelib.traits.EngineTraitsMap.from_data() For new traits new properties can be added to the class:: searx.enginelib.traits.EngineTraits .. hint:: Implementation is downward compatible to the deprecated *supported_languages method* from the vintage implementation. The vintage code is tagged as *deprecated* an can be removed when all engines has been ported to the *traits method*. Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-01-20Bump flask-babel from 2.0.0 to 3.0.0dependabot[bot]
Bumps [flask-babel](https://github.com/python-babel/flask-babel) from 2.0.0 to 3.0.0. - [Release notes](https://github.com/python-babel/flask-babel/releases) - [Changelog](https://github.com/python-babel/flask-babel/blob/master/CHANGELOG) - [Commits](https://github.com/python-babel/flask-babel/compare/v2.0.0...v3.0.0) --- updated-dependencies: - dependency-name: flask-babel dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
2022-11-05searx.locale: add Dhivehi languageAlexandre FLAMENT
2022-11-05searx.locales: improve support for languages not supported by babelAlexandre FLAMENT
* refactor get_translations() to rely on ADDITIONAL_TRANSLATIONS and LOCALE_BEST_MATCH * update RTL_LOCALES for languages in ADDITIONAL_TRANSLATIONS
2022-09-18[fix] and improve docs generated from source code.Markus Heiser
Fix:: searx/locales.py:docstring of searx.locales.get_engine_locale:17: \ WARNING: Definition list ends without a blank line; unexpected unindent. Improvement: don't show default values in the generated documentation whe it is more a mess than a usefull information (`:meta hide-value:`). Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-14[fix] get_engine_locale: better approximation of 'en' is 'en-US'Markus Heiser
Compared to `en-EN` the better approximation of 'en' is 'en-US'. Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-14[fix] typo in get_engine_localeMarkus Heiser
Due to a typo in get_engine_locale, a language selection like `!qw :de siemens` did not work. Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-14[fix] harden get_engine_locale: handle UnknownLocaleError exceptionsMarkus Heiser
When a user selects an unknown or invalid locale by using the search syntax: !qw siemens :de-TW Before this patch a UnknownLocaleError exception will be rasied: ``` Traceback (most recent call last): File "SearXNG/searx/search/processors/online.py", line 154, in search search_results = self._search_basic(query, params) File "SearXNG/searx/search/processors/online.py", line 128, in _search_basic self.engine.request(query, params) File "SearXNG/searx/engines/qwant.py", line 98, in request q_locale = get_engine_locale(params['language'], supported_languages, default='en_US') File "SearXNG/searx/locales.py", line 216, in get_engine_locale locale = babel.Locale.parse(searxng_locale, sep='-') File "SearXNG/local/py3/lib/python3.8/site-packages/babel/core.py", line 330, in parse raise UnknownLocaleError(input_id) ``` This patch implements a simple exception handling, since e.g. `de-TW` does not exists `de` will be used to get engines locale. On invalid terms like `xy-XY` the default will be returned. Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-14[mod] add locale.get_engine_locale to get predictable resultsMarkus Heiser
The match_language function sometimes returns incorrect results which is why a new function get_engine_locale is required. A bugfix of the match_language is not easily possible, because there is almost no documentation for it and already the call parameters are undefined. E.g. the function processes values like the ones from yahoo:: "yahoo": [ "ar", ... "zh_chs", "zh_cht" ] The get_engine_locale has been documented in detail, there is a clear description of the assumptions as well as the requirements and approximation rules (read doc-string for more details):: Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to corresponding *engine locales*: <engine>: { # SearXNG string : engine-string 'ca-ES' : 'ca_ES', 'fr-BE' : 'fr_BE', 'fr-CA' : 'fr_CA', 'fr-CH' : 'fr_CH', 'fr' : 'fr_FR', ... 'pl-PL' : 'pl_PL', 'pt-PT' : 'pt_PT' } .. hint:: The *SearXNG locale* string has to be known by babel! In the following you will find a comparison: >>> import babel.languages >>> from searx.utils import match_language >>> from searx.locales import get_engine_locale Assume we have an engine that supports the follwoing locales: >>> lang_list = { ... "zh-CN": "zh_CN", ... "zh-HK": "zh_HK", ... "nl-BE": "nl_BE", ... "fr-CA": "fr_CA", ... } Assumption: A. When a user selects a language the results should be optimized according to the selected language. B. When user selects a language and a territory the results should be optimized with first priority on territory and second on language. ---- Example: (Assumption A.) A user selects region 'zh-TW' which should end in zh_HK hint: CN is 'Hans' and HK ('Hant') fits better to TW ('Hant') >>> get_engine_locale('zh-TW', lang_list) 'zh_HK' >>> lang_list[match_language('zh-TW', lang_list)] 'zh_CN' ---- Example: (Assumption A.) A user selects only the language 'zh' which should end in CN >>> get_engine_locale('zh', lang_list) 'zh_CN' >>> lang_list[match_language('zh', lang_list)] 'zh_CN' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE hint: priority should be on the territory the user selected. If the user prefers 'fr' he will select 'fr' without a region tag. >>> get_engine_locale('fr-BE', lang_list, default='unknown') 'nl_BE' >>> match_language('fr-BE', lang_list, fallback='unknown') 'fr-CA' ---- Example: (Assumption A.) A user selects only the language 'fr' which should end in fr_CA >>> get_engine_locale('fr', lang_list) 'fr_CA' >>> lang_list[match_language('fr', lang_list)] 'fr_CA' ---- The difference in priority on the territory is best shown with a engine that supports the following locales: >>> lang_list = { ... "fr-FR": "fr_FR", ... "fr-CA": "fr_CA", ... "en-GB": "en_GB", ... "nl-BE": "nl_BE", ... } ---- Example: (Assumption A.) A user selects only a language >>> get_engine_locale('en', lang_list) 'en_GB' >>> match_language('en', lang_list) 'en-GB' hint: the engine supports fr_FR and fr_CA since no territory is given, fr_FR takes priority .. >>> get_engine_locale('fr', lang_list) 'fr_FR' >>> lang_list[match_language('fr', lang_list)] 'fr_FR' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE >>> get_engine_locale('fr-BE', lang_list) 'nl_BE' >>> lang_list[match_language('fr-BE', lang_list)] 'fr_FR' ---- If the user selects a language and there are two locales like the following: >>> lang_list = { ... "fr-BE": "fr_BE", ... "fr-CH": "fr_CH", ... } >>> >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_BE' Looks like both functions return the same value, but match_language depends on the order of the dictionary (which is not predictable): >>> lang_list = { ... "fr-CH": "fr_CH", ... "fr-BE": "fr_BE", ... } >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_CH' >>> The get_engine_locale selects the locale by looking at the "population percent" and this percentage has an higher amount in BE (68.%) compared to CH (21%) Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-07-08locales.py: add support for PapiamentoAlexandre Flament
2022-06-14[doc] fix some leftovers from ad964562cMarkus Heiser
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-06-12[fix] move locale code from webapp.py to locales.py and fix #1303Markus Heiser
To improve modularization this patch: - moves *locale* related implementation from the webapp.py application to the locale.py module. - The initialization of the locales is now done in the application (webapp) and is no longer done while importing searx.locales. In the searx.locales module a new dictionary named `LOCALE_BEST_MATCH` has been added. In this dictionary we can map languages without a translation to languages we have a translation for. To fix #1303 zh-HK has been mapped to zh-Hant-TW (we do not need additional translations of traditional Chinese) Closes: https://github.com/searxng/searxng/issues/1303 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-05-06Add support for the Silesian languageAlexandre FLAMENT
2021-12-27[format.python] initial formatting of the python codeMarkus Heiser
This patch was generated by black [1]:: make format.python [1] https://github.com/psf/black Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-12[mod] locale: use hyphen everywhere except for BabelAlexandre Flament
2021-09-17[pylint] fix global-variable-not-assigned issuesMarkus Heiser
If there is no write access, there is no need for global. Remove global statement if there is no assignment. global-variable-not-assigned: Using global for names but no assignment is done Used when a variable is defined through the "global" statement but no assignment to this variable is done. In Pylint 2.11 the global-variable-not-assigned checker now catches global variables that are never reassigned in a local scope and catches (reassigned) functions [1][2] [1] https://pylint.pycqa.org/en/latest/whatsnew/2.11.html [2] https://github.com/PyCQA/pylint/issues/1375 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-08-04[mod] searx/locales.py: language names based on Unicode CLDRAlexandre Flament
rename "oc" to "Occitan": * https://github.com/unicode-org/cldr/blob/35.1/seed/main/oc.xml#L115 * https://oc.wikipedia.org/wiki/Occitan see https://github.com/searxng/searxng/pull/247#issuecomment-892382001
2021-08-04[mod] pylint & document searx.locales (settings.yml: remove locales)Markus Heiser
- Add ``# lint: pylint`` header to pylint this python file. - Fix issues reported by pylint. - Add source code documentation of modul searx.locales Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-08-03[mod] settings.yml: remove localesAlexandre Flament
There are detected from the searx/translations directory