path: root/searx/utils.py
Age Commit message Author
2026-01-11 [fix] google: switch to using GSA for iPhone useragent (HEAD, master) mg95
2025-11-25 [mod] replace js_variable_to_python by js_obj_str_to_python (#2792) (#5477) Markus Heiser
This patch is based on PR #2792 (old PR from 2023)
- js_obj_str_to_python handles more cases
- bring tests from chompjs ..
- comment out tests that do not pass
The tests from chompjs give some overview of what is not implemented.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-11-20 [fix] minor type hint issues (#5459) Markus Heiser
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-11-20 [fix] utils.js_variable_to_python - partial revert of 156d1eb8c (#5458) Markus Heiser
The encoding of a JS string is corrupted if all single quotes (followed by a comma) are replaced with double quotes. The bug was introduced in PR #4573.
Here is a simple example in which the list gets corrupted::
    >>> s = r"""[ 'foo\'', 'bar']"""
    >>> print(s)
    [ 'foo\'', 'bar']
    >>> print(s.replace("',", "\","))
    [ 'foo\'", 'bar']
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-09-20 [mod] typification of SearXNG: add new result type Paper Markus Heiser
This patch adds a new result type: Paper
- Python class: searx/result_types/paper.py
- Jinja template: searx/templates/simple/result_templates/paper.html
- CSS (less) client/simple/src/less/result_types/paper.less
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-09-18 [mod] addition of various type hints / engine processors Markus Heiser
Continuation of #5147 .. typification of the engine processors.
BTW:
- removed obsolete engine property https_support
- fixed & improved currency_convert
- engine instances can now implement an engine.setup method
[#5147] https://github.com/searxng/searxng/pull/5147
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-09-03 [mod] drop: from __future__ import annotations Markus Heiser
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-09-03 [mod] addition of various type hints / tbc Markus Heiser
- pyright configuration [1]_
- stub files: types-lxml [2]_
- addition of various type hints
- enable use of new type system features on older Python versions [3]_
- ``.tool-versions`` - set python to lowest version we support (3.10.18) [4]_:
  Older versions typically lack some typing features found in newer Python versions. Therefore, for local type checking (before commit), it is necessary to use the older Python interpreter.
.. [1] https://docs.basedpyright.com/v1.20.0/configuration/config-files/
.. [2] https://pypi.org/project/types-lxml/
.. [3] https://typing-extensions.readthedocs.io/en/latest/#
.. [4] https://mise.jdx.dev/configuration.html#tool-versions
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Format: reST
2025-08-18 [fix] revision of utils.HTMLTextExtractor (#5125) Markus Heiser
Related: - https://github.com/searxng/searxng/pull/5073#issuecomment-3196282632
2025-07-26 [fix] cleanup: rename `searx` leftovers to `SearXNG` (#5049) Markus Heiser
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-06-27 [fix] utils: truncated result (#4949) Ivan Gabaldon
Make sure to parse everything before returning.
Related:
```
FAIL: test_html_to_text (tests.unit.test_utils.TestUtils.test_html_to_text)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/searxng/searxng/tests/unit/test_utils.py", line 53, in test_html_to_text
    self.assertEqual(utils.html_to_text(r"regexp: (?<![a-zA-Z]"), "regexp: (?<![a-zA-Z]")
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 'regexp: (?' != 'regexp: (?<![a-zA-Z]'
- regexp: (?
+ regexp: (?<![a-zA-Z]
```
2025-05-21 [feat] engines: add Naver engine (#4573) Zhijie He
Refactor Naver engine (Web, News, Images, Videos, Autocomplete)
- ref: https://search.naver.com/
- lang: `ko`
- Wikidata: https://www.wikidata.org/wiki/Q485639
Co-authored-by: Bnyro <bnyro@tutanota.com>
2025-04-01 [fix] hardening against arguments of type None, where str or dict is expected Markus Heiser
On a long-running server, the tracebacks below can be found (albeit rarely), which indicate problems with NoneType where a string or another data type is expected.
result.img_src::
    File "/usr/local/searxng/searxng-src/searx/templates/simple/result_templates/images.html", line 13, in top-level template code
      <img src="" data-src="{{ image_proxify(result.img_src) }}" alt="{{ result.title|striptags }}">{{- "" -}}
      ^
    File "/usr/local/searxng/searxng-src/searx/webapp.py", line 284, in image_proxify
      if url.startswith('//'):
         ^^^^^^^^^^^^^^
    AttributeError: 'NoneType' object has no attribute 'startswith'
result.content::
    File "/usr/local/searxng/searxng-src/searx/result_types/_base.py", line 105, in _normalize_text_fields
      result.content = WHITESPACE_REGEX.sub(" ", result.content).strip()
                       ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
    TypeError: expected string or bytes-like object, got 'NoneType'
html_to_text, when html_str is a NoneType::
    File "/usr/local/searxng/searxng-src/searx/engines/wikipedia.py", line 190, in response
      title = utils.html_to_text(api_result.get('titles', {}).get('display') or api_result.get('title'))
    File "/usr/local/searxng/searxng-src/searx/utils.py", line 158, in html_to_text
      html_str = html_str.replace('\n', ' ').replace('\r', ' ')
                 ^^^^^^^^^^^^^^^^
    AttributeError: 'NoneType' object has no attribute 'replace'
presearch engine, when json_resp is a NoneType::
    File "/usr/local/searxng/searxng-src/searx/engines/presearch.py", line 221, in response
      results = parse_search_query(json_resp.get('results'))
    File "/usr/local/searxng/searxng-src/searx/engines/presearch.py", line 161, in parse_search_query
      for item in json_results.get('specialSections', {}).get('topStoriesCompact', {}).get('data', []):
                  ^^^^^^^^^^^^^^^^
    AttributeError: 'NoneType' object has no attribute 'get'
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
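The kind of guard that avoids the tracebacks above might look like the following. This is a minimal sketch, not the code from the commit: `normalize_text` and `is_protocol_relative` are hypothetical helper names, and `WHITESPACE_REGEX` here is a stand-in for the pattern of the same name seen in the traceback.

```python
import re
from typing import Optional

# Stand-in for the WHITESPACE_REGEX seen in the traceback (assumed pattern).
WHITESPACE_REGEX = re.compile(r"\s+")


def normalize_text(value: Optional[str]) -> str:
    """Collapse whitespace; treat None like an empty string (hypothetical helper)."""
    if value is None:
        return ""
    return WHITESPACE_REGEX.sub(" ", value).strip()


def is_protocol_relative(url: Optional[str]) -> bool:
    """Return True for URLs such as //example.org/x; None or "" yields False."""
    return bool(url) and url.startswith("//")
```

Checking for `None` at the boundary keeps the callers unchanged; the alternative would be to validate engine results before they reach the templates.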
2025-03-25 [refactor] duration strings: move parsing logic to utils.py Bnyro
2025-03-08 [feat] add bilibili support to get_embeded_stream_url Austin-Olacsi
2025-02-26 [fix] various issues in the documentation Markus Heiser
Closes: https://github.com/searxng/searxng/issues/4370
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-02-20 [feat] startpage: support for news and images Bnyro
2024-11-24 [chore] *: fix typos detected by typos-cli Bnyro
2024-10-03 add get_embeded_stream_url to searx.utils Austin-Olacsi
2024-07-27 [feat] videos template: support for view count Bnyro
2024-07-27 [fix] remove unused code / `_STORAGE_UNIT_VALUE` Markus Heiser
The `_STORAGE_UNIT_VALUE` dictionary is a leftover from:
- https://github.com/searxng/searxng/pull/3570
In this PR we removed the old implementations but forgot to delete this `_STORAGE_UNIT_VALUE`.
Closes: https://github.com/searxng/searxng/pull/3672
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-06-15 [perf] torrents.html, files.html: don't parse and re-format filesize Bnyro
2024-05-29 [enh] add re-usable func to filter text Allen
2024-04-08 [fix] remove usage of no longer existing names from lxml Markus Heiser
In lxml 5.1.1 the private name `_ElementStringResult` in module `lxml.etree` no longer exists. This code was written nearly a decade ago; it is no longer clear what the intention of `_ElementStringResult` and `_ElementUnicodeResult` had been. It can be assumed that these classes will no longer occur.
Closes: https://github.com/searxng/searxng/issues/3368
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-03-11 [mod] pylint all files with one profile / drop PYLINT_SEARXNG_DISABLE_OPTION Markus Heiser
In the past, some files were tested with the standard profile, others with a profile in which most of the messages were switched off ... some files were not checked at all.
- ``PYLINT_SEARXNG_DISABLE_OPTION`` has been abolished
- the distinction ``# lint: pylint`` is no longer necessary
- the pylint tasks have been reduced from three to two
  1. ./searx/engines -> lint engines with additional builtins
  2. ./searx ./searxng_extra ./tests -> lint all other python files
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-03-07 [fix] nyaa engine - paging support & filesize (GiB) Markus Heiser
BTW: pylint engine
Closes: https://github.com/searxng/searxng/issues/3290
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-02-25 [refactor] images: add resolution, image format and filesize fields Bnyro
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
2023-10-22 [fix] HTMLParser: undocumented not implemented method Markus Heiser
In Python versions <py3.10 there is an issue with an undocumented method HTMLParser.error() [1][2] that was deprecated in Python 3.4 and removed in Python 3.5. To be compatible with higher versions (>=py3.10), an error method is implemented which throws an AssertionError exception like the higher Python versions do [3].
[1] https://github.com/python/cpython/issues/76025
[2] https://bugs.python.org/issue31844
[3] https://github.com/python/cpython/pull/8562
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
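A compatibility shim of the kind described above could be sketched as follows. The class name `TextExtractor` and its `get_text` method are hypothetical and only serve the illustration; the actual parser class in `searx/utils.py` may be structured differently.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes; hypothetical stand-in for the real extractor."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def error(self, message):
        # On interpreters older than 3.10 the base class may still call
        # error(); raise AssertionError as newer Python versions do.
        raise AssertionError(message)

    def get_text(self):
        return "".join(self.parts)


parser = TextExtractor()
parser.feed("<b>hi</b> there")
parser.close()  # flush any text still buffered by the parser
print(parser.get_text())
```

On Python 3.10 and later the overridden method is simply never called by the parser, so keeping it is harmless.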
2023-09-18 [fix] spelling jazzzooo
2023-09-15 [fix] brave.news jazzzooo
2023-09-09 Replace chompjs with pure Python code Alexandre Flament
The new implementation is good enough for the current usage (brave)
2023-09-08 [mod] utils.py: add markdown_to_text helper function Bnyro
2023-03-24 [mod] replace utils.match_language by locales.match_locale Markus Heiser
This patch replaces the *full of magic* ``utils.match_language`` function by a ``locales.match_locale``. The ``locales.match_locale`` function is based on the ``locales.build_engine_locales`` introduced in 9ae409a0 [1].
In the past SearXNG only supported searching by language, not by region. This changed a long time ago and regions have been added to SearXNG core but not to the engines.
The ``utils.match_language`` was the function to handle the different aspects of language/regions in SearXNG core and the supported *languages* in the engine. The ``utils.match_language`` did it with some magic and worked well for most use cases but failed in some edge cases.
To replace the concurrence of languages and regions in the SearXNG core the ``locales.build_engine_locales`` was introduced in 9ae409a0 [1]. With the last patches all engines have been migrated to a ``fetch_traits`` and a language/region concept that is based on ``locales.build_engine_locales``.
To summarize: there is no longer a need for the ``utils.match_language``.
[1] https://github.com/searxng/searxng/pull/1652
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24 [mod] replace searx.languages by searx.sxng_locales Markus Heiser
With the language and region tags from the EngineTraitsMap the handling of SearXNG's tags of languages and regions has been normalized and is no longer a *mystery*. The "languages" became "locales" that are supported by babel and by this, the update_engine_traits.py can be simplified a lot. Other code places can be simplified as well, but these simplifications should (respectively can) only be done when none of the engines work with the deprecated EngineTraits.supported_languages interface anymore.
This commit replaces searx.languages by searx.sxng_locales and fixes the naming of some names from "language" to "locale" (e.g. language_codes --> sxng_locales).
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-02-19 [doc] improved docs of implementations for automatic speech recognition Markus Heiser
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-02-17 Add "Auto-detected" as a language. Alexandre Flament
When the user chooses "Auto-detected", the choice remains on the following queries. The detected language is displayed.
For example "Auto-detected (en)":
* the next query language is going to be auto detected
* for the current query, the detected language is English.
This replaces the autodetect_search_language plugin.
2022-12-26 Lazy load fasttext-predict Alexandre Flament
2022-12-16 Replace langdetect with fasttext ArtikusHG
2022-09-27 [fix] typos / reported by @kianmeng in searx PR-3366 Markus Heiser
[PR-3366] https://github.com/searx/searx/pull/3366
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-07-30 [fix] pyright reported errors Alexandre Flament
The errors make pyright usage useless since a new error won't be seen [1].
[1] https://github.com/searxng/searxng/pull/1569
```
searx/compat.py:11:27 - error: Expression of type "Type[cached_property[_T@cached_property]]" cannot be assigned to declared type "Type[cached_property]" "Type[cached_property[_T@cached_property]]" is incompatible with "Type[cached_property]" Type "Type[cached_property[_T@cached_property]]" cannot be assigned to type "Type[cached_property]" (reportGeneralTypeIssues)
searx/utils.py:69:36 - error: Expression of type "None" cannot be assigned to parameter of type "str" Type "None" cannot be assigned to type "str" (reportGeneralTypeIssues)
searx/utils.py:573:85 - error: Expression of type "None" cannot be assigned to parameter of type "int" Type "None" cannot be assigned to type "int" (reportGeneralTypeIssues)
searx/webapp.py:1306:22 - error: Argument of type "str" cannot be assigned to parameter "__a" of type "BytesPath" in function "join" Type "str" cannot be assigned to type "BytesPath" "str" is incompatible with "bytes" "str" is incompatible with protocol "PathLike[bytes]" "__fspath__" is not present (reportGeneralTypeIssues)
searx/webapp.py:1306:68 - error: Argument of type "Literal['themes']" cannot be assigned to parameter "paths" of type "BytesPath" in function "join" Type "Literal['themes']" cannot be assigned to type "BytesPath" "Literal['themes']" is incompatible with "bytes" "Literal['themes']" is incompatible with protocol "PathLike[bytes]" "__fspath__" is not present (reportGeneralTypeIssues)
searx/webapp.py:1306:78 - error: Argument of type "str | Any | None" cannot be assigned to parameter "paths" of type "BytesPath" in function "join" Type "str | Any | None" cannot be assigned to type "BytesPath" Type "str" cannot be assigned to type "BytesPath" "str" is incompatible with "bytes" "str" is incompatible with protocol "PathLike[bytes]" "__fspath__" is not present (reportGeneralTypeIssues)
searx/webapp.py:1306:85 - error: Argument of type "Literal['img']" cannot be assigned to parameter "paths" of type "BytesPath" in function "join" Type "Literal['img']" cannot be assigned to type "BytesPath" "Literal['img']" is incompatible with "bytes" "Literal['img']" is incompatible with protocol "PathLike[bytes]" "__fspath__" is not present (reportGeneralTypeIssues)
searx/engines/mongodb.py:8:6 - warning: Import "pymongo" could not be resolved (reportMissingImports)
searx/engines/mysql_server.py:9:8 - warning: Import "mysql.connector" could not be resolved (reportMissingImports)
searx/engines/postgresql.py:9:8 - warning: Import "psycopg2" could not be resolved from source (reportMissingModuleSource)
searx/engines/xpath.py:187:28 - warning: "categories" is not defined (reportUndefinedVariable)
searx/search/__init__.py:184:82 - warning: "flask" is not defined (reportUndefinedVariable)
searx/search/checker/background.py:19:26 - error: Type of "schedule" is partially unknown Type of "schedule" is "(delay: Any, func: Any, *args: Any) -> Literal[True]" (reportUnknownVariableType)
searx/shared/__init__.py:8:12 - warning: Import "uwsgi" could not be resolved (reportMissingImports)
searx/shared/shared_uwsgi.py:5:8 - warning: Import "uwsgi" could not be resolved (reportMissingImports)
```
2022-06-03 [fix] prepare for pylint 2.14.0 Markus Heiser
Remove issues reported by Pylint 2.14.0:
- no-self-use: has been moved to optional extension [1]
- The refactoring checker now also raises 'consider-using-generator' messages for max(), min() and sum(). [2]
.pylintrc:
- <option name>-hint was removed long ago, Pylint 2.14.0 raises an error on invalid options
- bad-continuation and bad-whitespace have been removed [3]
[1] https://pylint.pycqa.org/en/latest/whatsnew/2/2.14/summary.html#removed-checkers
[2] https://pylint.pycqa.org/en/latest/whatsnew/2/2.14/full.html#what-s-new-in-pylint-2-14-0
[3] https://pylint.pycqa.org/en/latest/whatsnew/2/2.6/summary.html#summary-release-highlights
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-04-22 [test.pyright] suppress unneeded error & warning messages Markus Heiser
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-04-16 searx.utils.html_to_text: replace <br/> by a space Alexandre Flament
2022-01-30 [mod] searx.utils: more typing Alexandre Flament
2022-01-29 [mod] add documentation about searx.utils Alexandre Flament
This module is a toolbox for the engines. It should be documented. In addition, searx/utils.py is checked by pylint.
2021-12-27 [format.python] initial formatting of the python code Markus Heiser
This patch was generated by black [1]::
    make format.python
[1] https://github.com/psf/black
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-12 [fix] fix match_language issue to make zh-TW match to zh-Hant-TW Marc Abonce Seguin
pybabel separates locales with underscores but we use hyphens everywhere babel doesn't directly touch
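The separator mismatch mentioned above can be illustrated with a small sketch. The helper names `to_babel_tag` and `to_sxng_tag` are hypothetical, and the sketch covers only the hyphen/underscore conversion; mapping zh-TW to zh-Hant-TW additionally needs script inference, which is not shown here.

```python
def to_babel_tag(sxng_tag: str) -> str:
    """Convert a hyphen-separated tag (zh-Hant-TW) to the underscore form (zh_Hant_TW)."""
    return sxng_tag.replace("-", "_")


def to_sxng_tag(babel_tag: str) -> str:
    """Convert an underscore-separated tag back to the hyphen form."""
    return babel_tag.replace("_", "-")


print(to_babel_tag("zh-Hant-TW"))
print(to_sxng_tag("zh_Hant_TW"))
```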
2021-10-06 [fix] don't mix loaded modules with imported modules (sys.modules) Markus Heiser
The utils.load_module() function is used to load a python file (aka module) and return the module's namespace. SearXNG uses this function to load *engines and answerers* from arbitrary locations with arbitrary modifications. These are not real python modules and it is not intended to mix these *engines and answerers* with the python modules registered in sys.modules.
Closes: https://github.com/searxng/searxng/issues/312
Suggested-by: @dalf in https://github.com/searxng/searxng/issues/312
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
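One way to load a file as a module without registering it in `sys.modules` is sketched below using `importlib.util`. The function name `load_module_isolated` and its signature are assumptions made for this illustration; the actual `utils.load_module()` may differ.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType


def load_module_isolated(filename: str, module_dir: str) -> ModuleType:
    """Execute a Python file and return its namespace (hypothetical helper).

    The module object is created and executed, but never inserted into
    sys.modules, so it cannot shadow or be confused with a real import.
    """
    path = Path(module_dir) / filename
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

Because nothing is cached in `sys.modules`, each call executes the file again and returns a fresh namespace, which suits files that are loaded from arbitrary locations and then modified.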
2021-08-24 [mod] searx.utils.dict_subset: rewrite with comprehension Alexandre Flament
2021-07-30 version based on the git repository Alexandre Flament
This commit removes the need to update the brand for GIT_URL and GIT_BRANCH: they are read from the git repository. It is possible to call python -m searx.version freeze to freeze the current version. Useful when the code is installed outside git (distro package, docker, etc...)