path: root/searx/engines
Age | Commit message | Author
2025-01-28 | [refactor] typification of SearXNG / EngineResults | Markus Heiser
In [1] and [2] we discussed the need for a Result.results property and how we can avoid unclear code. This patch implements a class for the result lists of engines::

    searx.result_types.EngineResults

A simple example for the usage in engine development::

    from searx.result_types import EngineResults
    ...
    def response(resp) -> EngineResults:
        res = EngineResults()
        ...
        res.add(
            res.types.Answer(answer="lorem ipsum ..", url="https://example.org")
        )
        ...
        return res

[1] https://github.com/searxng/searxng/pull/4183#pullrequestreview-257400034
[2] https://github.com/searxng/searxng/pull/4183#issuecomment-2614301580

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-01-28 | [refactor] typification of SearXNG (initial) / result items (part 1) | Markus Heiser
Typification of SearXNG
=======================

This patch introduces the typing of the results. The why and how is described in the documentation, please generate the documentation ..

    $ make docs.clean docs.live

and read the following articles in the "Developer documentation":

- result types --> http://0.0.0.0:8000/dev/result_types/index.html

The result types are available from the `searx.result_types` module. The following have been implemented so far:

- base result type: `searx.result_types.Result` --> http://0.0.0.0:8000/dev/result_types/base_result.html
- answer results --> http://0.0.0.0:8000/dev/result_types/answer.html

including the type for translations (inspired by #3925). For all other types (which still need to be set up in subsequent PRs), template documentation has been created for the transition period.

Doc of the fields used in Templates
===================================

The template documentation is the basis for the typing and is the first complete documentation of the results (needed for engine development). It is the "working paper" (the plan) with which further typifications can be implemented in subsequent PRs.

- https://github.com/searxng/searxng/issues/357

Answer Templates
================

With the new (sub) types for `Answer`, the templates for the answers have also been revised; `Translation` items are now displayed with collapsible entries (inspired by #3925).

    !en-de dog

Plugins & Answerer
==================

The implementation for `Plugin` and `Answer` has been revised, see documentation:

- Plugin: http://0.0.0.0:8000/dev/plugins/index.html
- Answerer: http://0.0.0.0:8000/dev/answerers/index.html

With `PluginStorage` and `AnswerStorage` to manage those items (in follow up PRs, `ArticleStorage`, `InfoStorage` and .. will be implemented)

Autocomplete
============

The autocompletion had a bug where the results from `Answer` were not shown.

To test, activate autocompletion and try search terms for which we have answerers:

- statistics: type `min 1 2 3` .. in the completion list you should find an entry like `[de] min(1, 2, 3) = 1`
- random: type `random uuid` .. in the completion list, the first item is a random UUID

Extended Types
==============

SearXNG extends e.g. the request and response types of flask and httpx; a module has been set up for type extensions:

- Extended Types --> http://0.0.0.0:8000/dev/extended_types.html

Unit-Tests
==========

The unit tests have been completely revised. In the previous implementation, the runtime (the global variables such as `searx.settings`) was not initialized before each test, so the runtime environment with which a test ran was always determined by the tests that ran before it. This was also the reason why we sometimes observed non-deterministic errors in the tests in the past:

- https://github.com/searxng/searxng/issues/2988 is one example for the Runtime issues, with non-deterministic behavior ..
- https://github.com/searxng/searxng/pull/3650
- https://github.com/searxng/searxng/pull/3654
- https://github.com/searxng/searxng/pull/3642#issuecomment-2226884469
- https://github.com/searxng/searxng/pull/3746#issuecomment-2300965005

Why msgspec.Struct
==================

We have already discussed typing based on e.g. `TypedDict` or `dataclass` in the past:

- https://github.com/searxng/searxng/pull/1562/files
- https://gist.github.com/dalf/972eb05e7a9bee161487132a7de244d2
- https://github.com/searxng/searxng/pull/1412/files
- https://github.com/searxng/searxng/pull/1356

In my opinion, `TypedDict` is unsuitable because the objects are still dictionaries and not instances of classes / the `dataclass` are classes but ...

The `msgspec.Struct` combines the advantages of typing and runtime behaviour, and also offers the option of (fast) serializing (incl. type check) the objects.

Currently not possible but conceivable with `msgspec`: outsourcing the engines into separate processes. What possibilities this opens up in the future is left to the imagination!

Internally, we have already defined that it is desirable to decouple the development of the engines from the development of the SearXNG core / The serialization of the `Result` objects is a prerequisite for this.

HINT: The threads listed above were the template for this PR, even though the implementation here is based on msgspec. They should also be an inspiration for the following PRs of typification, as the models and implementations can provide a good direction.

Why just one commit?
====================

I tried to create several (thematically separated) commits, but gave up at some point ... there are too many things to tackle at once / The comprehensibility of the commits would not be improved by a thematic separation. On the contrary, we would have to make multiple changes at the same places and the goal of a change would be only vaguely recognizable in the fog of the commits.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-01-28 | [refactor] translation engines: common interface | Bnyro
2025-01-20 | [feat] engines: add ipernity (images) | Bnyro
2025-01-20 | [fix] engine brave: remove date from the content string | Markus Heiser
Related: https://github.com/searxng/searxng/issues/4211#issuecomment-2601941440
Closes: https://github.com/searxng/searxng/issues/4006
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-01-20 | [feat] public domain image archive: automatically obtain algolia api key | Bnyro
2025-01-20 | [feat] engines: public domain image archive | Denperidge
2025-01-20 | [feat] wikidata: add mastodon, peertube and Lemmy accounts to infobox | Popolon
Co-authored-by: Popolon <popolon@popolon.org>
Co-authored-by: Bnyro <bnyro@tutanota.com>
2025-01-16 | [feat]: engines: add astrophysical data system | DanielMowitz
2025-01-14 | [json_engine] Fix R0912 (too-many-branches) | Lucki
2025-01-14 | [json_engine] mirror xpath functionality | Lucki
2025-01-14 | [json_engine] document existing options | Lucki
2025-01-06 | [fix] dockerhub: switch to new api path | Bnyro
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
2025-01-06 | Fix usage of `api_key` engine setting | Lucki
The value of `params['api_key']` isn't read anywhere. Writing directly into the header object solves this quite easily though.

> [Users can authenticate by including their API key either in a request URL by appending `?apikey=<API KEY>`, or by including the `X-API-Key: <API KEY>` header with the request.](https://wallhaven.cc/help/api)
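The fix described above can be sketched roughly as follows. This is a minimal illustration, assuming an engine `request` hook that receives a `params` dict containing a `headers` mapping; the module-level `base_url` and `api_key` values and the query-string layout are assumptions made for this sketch, not the engine's actual code. Only the `X-API-Key` header name is taken from the quoted Wallhaven documentation.

```python
from urllib.parse import urlencode

# Hypothetical module-level engine settings (assumed for this sketch).
base_url = "https://wallhaven.cc/api/v1/search"
api_key = "my-secret-key"


def request(query: str, params: dict) -> dict:
    """Build the outgoing request; the API key is sent as a header."""
    params["url"] = f"{base_url}?{urlencode({'q': query})}"
    if api_key:
        # Write into the header object directly instead of setting a
        # ``params['api_key']`` entry that nothing reads afterwards.
        params["headers"]["X-API-Key"] = api_key
    return params
```

Sending the key as a header rather than as `?apikey=...` also keeps it out of the request URL.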
2024-12-29 | [fix] update_engine_traits.py: annas archive, bing-* and zlibrary engines | Markus Heiser
GitHub action Update data - update_engine_traits [1] had issues in annas archive, bing-* and zlibrary engines:

    ./manage pyenv.cmd python ./searxng_extra/update/update_engine_traits.py

[1] https://github.com/searxng/searxng/actions/runs/12530827768/job/34953392587

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-12-22 | [fix] engine google_video: google changed the layout of the HTML response | Markus Heiser
Closes: https://github.com/searxng/searxng/issues/4127
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-11-29 | [mod] hardening xpath engine: ignore empty results | Markus Heiser
A SearXNG maintainer on Matrix reported a traceback::

    File "searxng-src/searx/engines/xpath.py", line 272, in response
      dom = html.fromstring(resp.text)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "searx-pyenv/lib/python3.11/site-packages/lxml/html/__init__.py", line 850, in fromstring
      doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "searx-pyenv/lib/python3.11/site-packages/lxml/html/__init__.py", line 738, in document_fromstring
      raise etree.ParserError(
    lxml.etree.ParserError: Document is empty

I don't have an example to reproduce the issue, but the issue and this patch are clearly recognizable even without an example.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-11-28 | [feat] json/xpath engine: config option for method and body | Bnyro
2024-11-28 | [fix] wikicommons engine: remove HTML tags from result items | Markus Heiser
BTW: humanize filesize (Bytes) to KB, MB, GB ..
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
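Humanizing a byte count, as mentioned in the commit above, could look roughly like the sketch below. The function name `humanize_bytes`, the base of 1024 and the two-digit precision are assumptions chosen for illustration; the commit message does not say how the engine implements it.

```python
def humanize_bytes(size: float, precision: int = 2) -> str:
    """Convert a size in bytes into a short human-readable string."""
    units = ["Bytes", "KB", "MB", "GB", "TB"]
    idx = 0
    # Step up one unit per factor of 1024 (assumed base), but stop at
    # the largest unit in the list.
    while size >= 1024 and idx < len(units) - 1:
        size /= 1024
        idx += 1
    return f"{size:.{precision}f} {units[idx]}"
```

For example, `humanize_bytes(1536)` gives `1.50 KB`.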
2024-11-27 | [fix] google engine: remove <script> tags from result items | Markus Heiser
In some results, Google returns a <script> tag that must be removed before extracting the content.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-11-27 | [fix] findthatmeme engine URLs have changed | Austin-Olacsi
2024-11-26 | [chore] drop sjp engine: website has changed a long time ago | Markus Heiser
The website (PL only) has changed and there is now also a kind of CAPTCHA. There is currently no way to restore the function of this engine.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-11-26 | [chore] remove invalid base_url from settings.yml engines | Markus Heiser
The engines do not have / do not need a property `base_url`, let's remove it from the settings.yml
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-11-26 | [fix] engine: Library of Congress - image & thumb links | Markus Heiser
The properties `item.service_medium` and `item.thumb_gallery` are not given for every result item. It is more reliable to use the first (thumb) and last (image) URL in the list of URLs in `image_url`.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
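The selection rule described above (first URL as thumbnail, last URL as full image) can be sketched as below. The helper name `pick_image_urls` and the shape of `item` as a plain dict are assumptions for this sketch; only the `image_url` field name comes from the commit message.

```python
from typing import Optional, Tuple


def pick_image_urls(item: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (thumbnail_url, image_url) from the item's ``image_url`` list."""
    urls = item.get("image_url") or []
    if not urls:
        # No URLs at all: the caller can skip the image for this item.
        return None, None
    # First entry is used as the thumb, last entry as the image.
    return urls[0], urls[-1]
```

With a single-entry list, both values are the same URL.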
2024-11-25 | [fix] duckduckgo extra: crashes and returns no results | Bnyro
2024-11-24 | [chore] *: fix typos detected by typos-cli | Bnyro
2024-11-24 | [feat] engine: add adobe stock video and audio engines | Markus Heiser
The engine has been revised; there is now the option ``adobe_content_types`` with which it is possible to configure engines for video and audio from the adobe stock. BTW this patch adds documentation to the engine. To test all three engines in one, use a search term like::

    !asi !asv !asa sound

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-11-24 | [feat] engine: add adobe stock photos | Bnyro
2024-11-23 | [clean] Internet Archive Scholar search API no longer exists | Markus Heiser
Engine was added in #2733 but the API no longer exists.
Related:
- https://github.com/searxng/searxng/issues/4038
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-11-23 | [fix] engine Library of Congress: fix API URL loc.gov -> www.loc.gov | Markus Heiser
Avoid HTTP 404 and redirects. Requests to the JSON/YAML API use the base url [1]

    https://www.loc.gov/{endpoint}/?fo=json

[1] https://www.loc.gov/apis/json-and-yaml/requests/

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
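Building a request URL against the `www.loc.gov` host, following the pattern quoted above, might look like this sketch. The helper name `build_api_url` and the `q` / `sp` query arguments are assumptions for illustration; only the base pattern `https://www.loc.gov/{endpoint}/?fo=json` comes from the commit message.

```python
from urllib.parse import urlencode

# Use the www host directly to avoid the redirect from loc.gov.
base_url = "https://www.loc.gov"


def build_api_url(endpoint: str, query: str, page: int = 1) -> str:
    """Return a JSON API URL for the given endpoint and search term."""
    # ``q`` and ``sp`` are assumed parameter names for query and page.
    args = {"q": query, "sp": page, "fo": "json"}
    return f"{base_url}/{endpoint.strip('/')}/?{urlencode(args)}"
```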
2024-11-17 | [fix] engine: duckduckgo - don't quote query string | Markus Heiser
The query string sent to DDG must not be quoted. The query string was URL-quoted in #4011, but the URL-quoted query string results in unexpected *URL decoded* and other garbage results as reported in #4019 and #4020. To test, compare the results of a query like::

    !ddg Häuser und Straßen :de
    !ddg Häuser und Straßen :all
    !ddg 房屋和街道 :all
    !ddg 房屋和街道 :zh

Closed:
- [#4019] https://github.com/searxng/searxng/issues/4019
- [#4020] https://github.com/searxng/searxng/issues/4020
Related:
- [#4011] https://github.com/searxng/searxng/pull/4011
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
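One plausible way to see why a pre-quoted query goes wrong is the sketch below. It assumes the HTTP client form-encodes the ``data`` dict on its own, so quoting the query beforehand encodes it twice; the helper names `form_body` and `server_view` are hypothetical and only model the round trip, they are not part of the engine.

```python
from urllib.parse import parse_qs, quote_plus, urlencode


def form_body(query: str, pre_quote: bool = False) -> str:
    """Encode the form field ``q`` the way an HTTP client would."""
    q = quote_plus(query) if pre_quote else query
    # urlencode() percent-encodes the value once more, including the
    # '%' and '+' characters produced by an earlier quote_plus().
    return urlencode({"q": q})


def server_view(body: str) -> str:
    """Decode the form body once, as the receiving server would."""
    return parse_qs(body)["q"][0]
```

Under this assumption, the unquoted query survives the round trip unchanged, while the pre-quoted one arrives as a literal percent-encoded string.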
2024-11-14 | [fix] engine: duckduckgo - only uses first word of the search terms | Nicolas Dato
During the revision in PR #3955 the query string was accidentally converted into a list of words. Further, the query must be quoted before being POSTed in the ``data`` field, see ``urllib.parse.quote_plus`` [1]

[1] https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote_plus

Closed: #4009
Co-Authored-by: @return42
2024-11-01 | [fix] annas archive: crash when no thumbnail, differing results, paging | Bnyro
2024-10-31 | [fix] google: display every result when keyword is contained in content field | uply23333
2024-10-29 | [refactor] engine: duckduckgo - https://html.duckduckgo.com/html | Markus Heiser
The entire source code of the duckduckgo engine has been reengineered and cleaned up.

1. DDG uses the URL https://html.duckduckgo.com/html for no-JS requests, whose response is also easier to parse than that of the previous https://lite.duckduckgo.com/lite/ URL
2. the bot detection of DDG has so far caused problems and often led to a CAPTCHA; this can be circumvented by setting the `Sec-Fetch-Mode` header to `navigate`

Closes: https://github.com/searxng/searxng/issues/3927
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-10-19 | [fix] engine: duckduckgo - CAPTCHA detection | Markus Heiser
The previous implementation could not distinguish a CAPTCHA response from an ordinary result list: a CAPTCHA was treated as a result list containing no items. DDG does not block IPs. Instead, a CAPTCHA wall is placed in front of a dubious request.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-10-15 | [upd] pypi: Bump pylint from 3.2.7 to 3.3.1 | dependabot[bot]
Bumps [pylint](https://github.com/pylint-dev/pylint) from 3.2.7 to 3.3.1.
- [Release notes](https://github.com/pylint-dev/pylint/releases)
- [Commits](https://github.com/pylint-dev/pylint/compare/v3.2.7...v3.3.1)

---
updated-dependencies:
- dependency-name: pylint
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-10-15 | [feat] engine: support for openlibrary | Bnyro
2024-10-15 | [enh] engine: mojeek - add language support | 0xhtml
Improve region and language detection / all locale

Testing has shown the following behaviour for the different default and empty values of Mojeek's parameters:

| param    | idx | value  | behaviour                 |
| -------- | --- | ------ | ------------------------- |
| region   | 0   | ''     | detect region based on IP |
| region   | 1   | 'none' | all regions               |
| language | 0   | ''     | all languages             |
2024-10-14 | [mod] engine gitea: compatible with modern gitea or forgejo | Snoweuph
Without this patch the Gitea search engine is only partially compatible with modern gitea or forgejo:

- Fixing some JSON fields
- Using the repository avatar when available

To verify my results you can look at the modern API doc and results; it is available on all Gitea and Forgejo instances by default. Here is a search API result of mine:

- https://git.euph.dev/api/v1/repos/search?q=ccna
2024-10-03 | [doc] slightly improve documentation of SQL engines | Markus Heiser
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-10-03 | [feat] implement mariadb engine | Grant Lanham
2024-10-03 | add get_embeded_stream_url to searx.utils | Austin-Olacsi
2024-09-29 | [enh] engine: stract - add language/region support | 0xhtml
2024-09-26 | [fix] use get accessor to pull desc from bing_images | Grant Lanham
2024-09-23 | add Cloudflare AI Gateway engine | Zhijie He
add Cloudflare AI Gateway engine
add settings for Cloudflare AI Gateway engine
set utf8 encode for data, fix non english char cause 500 error
format json data
fixed indentation and config format error
fix line-length limitation in CI
reformatted code for CI
reformatted code for CI
limit system prompts to less 120 chars
cleanup unused variable & format code
2024-09-15 | [fix] Removes ``/>`` ending tags for void HTML elements | Grant Lanham
Removes ``/>`` ending tags for void elements [1] and replaces them with ``>``. Part of the larger effort to clean up invalid HTML throughout the codebase [2].

[1] https://html.spec.whatwg.org/multipage/syntax.html#void-elements
[2] https://github.com/searxng/searxng/issues/3793

2024-09-15 | [fix] engine: qwant - detect captchaUrl and raise SearxEngineCaptchaException | Markus
So far a CAPTCHA was not recognized in the response of the qwant engine and a SearxEngineAPIException was raised by mistake. With this patch a CAPTCHA redirect is recognized and the correct SearxEngineCaptchaException is raised.
Closes: https://github.com/searxng/searxng/issues/3806
Signed-off-by: Markus <markus@venom.fritz.box>
2024-09-15 | [fix] fetch_traits: brave, google, annas_archive & radio_browser | Markus
This patch fixes a bug reported by CI "Fetch traits" [1] (brave) and improves other fetch traits functions (google, annas_archive & radio_browser).

brave:

    File "/home/runner/work/searxng/searxng/searx/engines/brave.py", line 434, in fetch_traits
      sxng_tag = region_tag(babel.Locale.parse(ui_lang, sep='-'))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/runner/work/searxng/searxng/searx/locales.py", line 155, in region_tag
      Error: raise ValueError('%s missed a territory')

google: change ERROR message about unknown UI language to INFO message

radio_browser: country_list contains duplicates that differ only in upper/lower case

annas_archive: for better diff; sort the persistence of the traits

[1] https://github.com/searxng/searxng/actions/runs/10606312371/job/29433352518#step:6:41

Signed-off-by: Markus <markus@venom.fritz.box>
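The radio_browser note above concerns entries that differ only in letter case. A minimal sketch of removing such duplicates is shown below; the helper name `unique_casefold` and the choice to keep the first spelling encountered are assumptions for illustration, since the commit message does not describe how the engine resolves the duplicates.

```python
from typing import Iterable, List


def unique_casefold(items: Iterable[str]) -> List[str]:
    """Drop entries that differ from an earlier one only in letter case."""
    seen = set()
    result = []
    for item in items:
        key = item.casefold()
        if key not in seen:
            # Keep the first spelling seen; later case variants are skipped.
            seen.add(key)
            result.append(item)
    return result
```

Sorting the resulting list before persisting it would also give the stable, diff-friendly output that the annas_archive note asks for.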
2024-09-15 | [feat] gitlab: implement dedicated module | Bnyro
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>