path: root/searx/engines/duckduckgo.py
Age | Commit message | Author
2025-11-25[mod] replace js_variable_to_python by js_obj_str_to_python (#2792) (#5477)Markus Heiser
This patch is based on PR #2792 (old PR from 2023) - js_obj_str_to_python handles more cases - bring in tests from chompjs .. - comment out tests that do not pass The tests from chompjs give some overview of what is not implemented. Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-09-03[mod] addition of various type hints / tbcMarkus Heiser
- pyright configuration [1]_ - stub files: types-lxml [2]_ - addition of various type hints - enable use of new type system features on older Python versions [3]_ - ``.tool-versions`` - set python to lowest version we support (3.10.18) [4]_: Older versions typically lack some typing features found in newer Python versions. Therefore, for local type checking (before commit), it is necessary to use the older Python interpreter. .. [1] https://docs.basedpyright.com/v1.20.0/configuration/config-files/ .. [2] https://pypi.org/project/types-lxml/ .. [3] https://typing-extensions.readthedocs.io/en/latest/# .. [4] https://mise.jdx.dev/configuration.html#tool-versions Signed-off-by: Markus Heiser <markus.heiser@darmarit.de> Format: reST
2025-07-28[fix] duckduckgo engine: logger.error / missing argument (#5057)Markus Heiser
The error message in case the vqd value could not be determined was incorrect and triggered an exception:: File "/usr/local/searxng/searxng-src/searx/engines/duckduckgo.py", line 132, in get_vqd logger.error("vqd value from duckduckgo.com ", resp.status_code) Message: 'vqd value from duckduckgo.com ' Arguments: (202,)
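The failure described above is standard `logging` behavior: every extra positional argument needs a matching placeholder in the format string. A minimal sketch of a corrected call follows; the logger name, the helper `report_vqd_failure`, and the message wording are illustrative assumptions, not the text used in the actual fix.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("ddg_example")


def report_vqd_failure(status_code: int) -> None:
    """Log a failed vqd lookup; the %s placeholder consumes status_code."""
    # Without the placeholder, logging cannot merge the argument into the
    # message and reports a formatting error instead of the intended line.
    logger.error("vqd value from duckduckgo.com could not be determined (HTTP %s)", status_code)
```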
2025-05-23[fix] ddg engine: IndexError exception is raised on empty contend (#4843)Markus Heiser
Sometimes (e.g. when ddg does not have a result item) there is no content and the engine will fail with an IndexError: * Error: IndexError * Percentage: 10 * Parameters: `()` * File name: `searx/engines/duckduckgo.py:375` * Function: `response` * Code: `item["content"] = extract_text(eval_xpath(div_result, './/a[contains(@class, "result__snippet")]')[0])` Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
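The IndexError comes from indexing `[0]` into an empty XPath result. One way to guard against it is sketched below; `first_or_default` is a hypothetical helper and a plain list stands in for the XPath result, so this is not necessarily how the engine was patched.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def first_or_default(nodes: Sequence[T], default: T) -> T:
    """Return the first element of an XPath-style result list, or default if it is empty."""
    return nodes[0] if nodes else default
```

With such a helper, a result item without a snippet would get an empty `content` instead of aborting the whole response.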
2025-05-23[refactor] duckduckgo engine: improve request logic and code structure (#4837)useralias
Changes: - Add trailing slash to base URL to prevent potential redirects - Remove advanced search syntax filtering (no longer guarantees a CAPTCHA) - Correct pagination offset calculation: Page 2 now starts at offset 10, subsequent pages use 10 + (n-2)*15 formula instead of the previous broken 20 + (n-2)*50 calculation that caused CAPTCHAs - Restructure request parameter building to better match a real request - "kt" cookie is no longer an empty string if the language/region is "all" - Group related parameter assignments together - Add header logging to debugging output Related: - https://github.com/searxng/searxng/issues/4824
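The corrected pagination rule can be sketched as a small function. The name `ddg_offset` is hypothetical, and the offset of 0 for page 1 is an assumption: the commit message only specifies page 2 and later.

```python
def ddg_offset(pageno: int) -> int:
    """Result offset for a 1-based page number, following the rule in the commit message."""
    if pageno < 1:
        raise ValueError("pageno must be >= 1")
    if pageno == 1:
        # Assumption: the first page needs no offset.
        return 0
    # Page 2 starts at offset 10, each further page adds 15 results.
    return 10 + (pageno - 2) * 15
```

For comparison, the previous formula `20 + (n-2)*50` would yield 20, 70, 120 for pages 2 to 4, where this rule yields 10, 25, 40.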
2025-05-20[fix] duckduckgo engines: issue when get_vqd() is used by ddg-images and ddg-videos (#4809)Markus Heiser
The global variable CACHE is not initialized when DDG images or DDG videos import the get_vqd() function (please remember: the engine modules are imported using the importlib method and not via the `import` keyword). Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-05-03[mod] engines: migration of the individual cache solutions to EngineCacheMarkus Heiser
The EngineCache class replaces all the previous individual cache solutions in the context of the engines. - demo_offline.py - duckduckgo.py - radio_browser.py - soundcloud.py - startpage.py - wolframalpha_api.py - wolframalpha_noapi.py Search term to test most of the modified engines:: !ddg !rb !sc !sp !wa test !ddg !rb !sc !sp !wa foo For introspection of the DB, jump into the developer environment and run the command to show the cache state:: $ ./manage pyenv.cmd bash --norc --noprofile (py3) python -m searx.enginelib cache state cache tables and key/values =========================== [demo_offline ] 2025-04-22 11:32:50 count --> (int) 4 [startpage ] 2025-04-22 12:32:30 SC_CODE --> (str) fSOBnhEMlDfE20 [duckduckgo ] 2025-04-22 12:32:31 4dff493e.... --> (str) 4-128634958369380006627592672385352473325 [duckduckgo ] 2025-04-22 12:40:06 3e2583e2.... --> (str) 4-263126175288871260472289814259666848451 [radio_browser ] 2025-04-23 11:33:08 servers --> (list) ['https://de2.api.radio-browser.info', ...] [soundcloud ] 2025-04-29 11:40:06 guest_client_id --> (str) EjkRJG0BLNEZquRiPZYdNtJdyGtTuHdp [wolframalpha ] 2025-04-22 12:40:06 code --> (str) 5aa79f86205ad26188e0e26e28fb7ae7 number of tables: 6 number of key/value pairs: 7 In the "cache tables and key/values" section, the first column is the table name (engine name), the second is the calculated expiry date, and the third and fourth show the key and its value. About duckduckgo: The *vqd code* of ddg depends on the query term and therefore the key is a hash value of the query term (so that the raw query term is not stored). 
In the "properties of ENGINES_CACHE" section all properties of the SQLiteAppl / ExpireCache and their last modification date are shown:: properties of ENGINES_CACHE =========================== [last modified: 2025-04-22 11:32:27] DB_SCHEMA : 1 [last modified: 2025-04-22 11:32:27] LAST_MAINTENANCE : [last modified: 2025-04-22 11:32:27] crypt_hash : ca612e3566fdfd7cf7efe2b1c9349f461158d07cb78a3750e5c5be686aa8ebdc [last modified: 2025-04-22 11:32:30] CACHE-TABLE--demo_offline: demo_offline [last modified: 2025-04-22 11:32:30] CACHE-TABLE--startpage: startpage [last modified: 2025-04-22 11:32:31] CACHE-TABLE--duckduckgo: duckduckgo [last modified: 2025-04-22 11:33:08] CACHE-TABLE--radio_browser: radio_browser [last modified: 2025-04-22 11:40:06] CACHE-TABLE--soundcloud: soundcloud [last modified: 2025-04-22 11:40:06] CACHE-TABLE--wolframalpha: wolframalpha These properties provide information about the state of the ExpireCache and control the behavior. For example, the maintenance intervals are controlled by the last modification date of the LAST_MAINTENANCE property and the hash value of the password can be used to detect whether the password has been changed (in this case the DB entries can no longer be decrypted and the entire cache must be discarded). Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
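The idea of keying the vqd cache by a hash of the query term, so that the raw query never lands in the DB, can be sketched as follows. The function name `vqd_cache_key` and the choice of unsalted SHA-256 are assumptions made for illustration; the EngineCache may derive its keys differently.

```python
import hashlib


def vqd_cache_key(query: str) -> str:
    """Derive a cache key from the query term without storing the term itself."""
    # Assumption: plain SHA-256 over the UTF-8 encoded query.
    return hashlib.sha256(query.encode("utf-8")).hexdigest()
```

The same query always maps to the same key, so a cached vqd value can be found again, while the key alone does not reveal the search term to someone reading the cache tables.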
2025-03-21[fix] duckduckgo: answer sometimes contains faulty (duplicated) urlBnyro
2025-03-18[fix] duckduckgo: show proper source url of answersBnyro
2025-01-28[refactor] typification of SearXNG / EngineResultsMarkus Heiser
In [1] and [2] we discussed the need for a Result.results property and how we can avoid unclear code. This patch implements a class for the result lists of engines:: searx.result_types.EngineResults A simple example for the usage in engine development:: from searx.result_types import EngineResults ... def response(resp) -> EngineResults: res = EngineResults() ... res.add( res.types.Answer(answer="lorem ipsum ..", url="https://example.org") ) ... return res [1] https://github.com/searxng/searxng/pull/4183#pullrequestreview-257400034 [2] https://github.com/searxng/searxng/pull/4183#issuecomment-2614301580 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-01-28[refactor] typification of SearXNG (initial) / result items (part 1)Markus Heiser
Typification of SearXNG ======================= This patch introduces the typing of the results. The why and how is described in the documentation, please generate the documentation .. $ make docs.clean docs.live and read the following articles in the "Developer documentation": - result types --> http://0.0.0.0:8000/dev/result_types/index.html The result types are available from the `searx.result_types` module. The following have been implemented so far: - base result type: `searx.result_type.Result` --> http://0.0.0.0:8000/dev/result_types/base_result.html - answer results --> http://0.0.0.0:8000/dev/result_types/answer.html including the type for translations (inspired by #3925). For all other types (which still need to be set up in subsequent PRs), template documentation has been created for the transition period. Doc of the fields used in Templates =================================== The template documentation is the basis for the typing and is the first complete documentation of the results (needed for engine development). It is the "working paper" (the plan) with which further typifications can be implemented in subsequent PRs. - https://github.com/searxng/searxng/issues/357 Answer Templates ================ With the new (sub) types for `Answer`, the templates for the answers have also been revised, `Translation` are now displayed with collapsible entries (inspired by #3925). !en-de dog Plugins & Answerer ================== The implementation for `Plugin` and `Answer` has been revised, see documentation: - Plugin: http://0.0.0.0:8000/dev/plugins/index.html - Answerer: http://0.0.0.0:8000/dev/answerers/index.html With `AnswerStorage` and `AnswerStorage` to manage those items (in follow up PRs, `ArticleStorage`, `InfoStorage` and .. will be implemented) Autocomplete ============ The autocompletion had a bug where the results from `Answer` had not been shown in the past. 
To test activate autocompletion and try search terms for which we have answerers - statistics: type `min 1 2 3` .. in the completion list you should find an entry like `[de] min(1, 2, 3) = 1` - random: type `random uuid` .. in the completion list, the first item is a random UUID Extended Types ============== SearXNG extends e.g. the request and response types of flask and httpx, a module has been set up for type extensions: - Extended Types --> http://0.0.0.0:8000/dev/extended_types.html Unit-Tests ========== The unit tests have been completely revised. In the previous implementation, the runtime (the global variables such as `searx.settings`) was not initialized before each test, so the runtime environment with which a test ran was always determined by the tests that ran before it. This was also the reason why we sometimes had to observe non-deterministic errors in the tests in the past: - https://github.com/searxng/searxng/issues/2988 is one example for the Runtime issues, with non-deterministic behavior .. - https://github.com/searxng/searxng/pull/3650 - https://github.com/searxng/searxng/pull/3654 - https://github.com/searxng/searxng/pull/3642#issuecomment-2226884469 - https://github.com/searxng/searxng/pull/3746#issuecomment-2300965005 Why msgspec.Struct ================== We have already discussed typing based on e.g. `TypeDict` or `dataclass` in the past: - https://github.com/searxng/searxng/pull/1562/files - https://gist.github.com/dalf/972eb05e7a9bee161487132a7de244d2 - https://github.com/searxng/searxng/pull/1412/files - https://github.com/searxng/searxng/pull/1356 In my opinion, TypeDict is unsuitable because the objects are still dictionaries and not instances of classes / the `dataclass` are classes but ... The `msgspec.Struct` combine the advantages of typing, runtime behaviour and also offer the option of (fast) serializing (incl. type check) the objects. 
Currently not possible but conceivable with `msgspec`: Outsourcing the engines into separate processes, what possibilities this opens up in the future is left to the imagination! Internally, we have already defined that it is desirable to decouple the development of the engines from the development of the SearXNG core / The serialization of the `Result` objects is a prerequisite for this. HINT: The threads listed above were the template for this PR, even though the implementation here is based on msgspec. They should also be an inspiration for the following PRs of typification, as the models and implementations can provide a good direction. Why just one commit? ==================== I tried to create several (thematically separated) commits, but gave up at some point ... there are too many things to tackle at once / The comprehensibility of the commits would not be improved by a thematic separation. On the contrary, we would have to make multiple changes at the same places and the goal of a change would be vaguely recognizable in the fog of the commits. Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-11-25[fix] duckduckgo extra: crashes and returns no resultsBnyro
2024-11-24[chore] *: fix typos detected by typos-cliBnyro
2024-11-17[fix] engine: duckduckgo - don't quote query stringMarkus Heiser
The query string sent to DDG must not be quoted. The query string was URL-quoted in #4011, but the URL-quoted query string results in unexpected *URL decoded* and other garbage results, as reported in #4019 and #4020. To test, compare the results of queries like:: !ddg Häuser und Straßen :de !ddg Häuser und Straßen :all !ddg 房屋和街道 :all !ddg 房屋和街道 :zh Closed: - [#4019] https://github.com/searxng/searxng/issues/4019 - [#4020] https://github.com/searxng/searxng/issues/4020 Related: - [#4011] https://github.com/searxng/searxng/pull/4011 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
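One plausible reading of the problem is double encoding: if the HTTP client already form-encodes the POST data (an assumption here, not something the commit states), quoting the query beforehand encodes it twice. The sketch below uses only the standard library to show the effect; `decoded_q` is a hypothetical helper that mimics what a server would read back.

```python
from urllib.parse import parse_qs, quote_plus, urlencode

query = "Häuser und Straßen"

# Encoded once, as a form-encoding HTTP client would do on its own.
encoded_once = urlencode({"q": query})

# Encoded twice: quoted by the engine first, then again by the client.
encoded_twice = urlencode({"q": quote_plus(query)})


def decoded_q(body: str) -> str:
    """Decode a form body the way a server would and return the 'q' field."""
    return parse_qs(body)["q"][0]
```

After decoding, `encoded_once` yields the original query, while `encoded_twice` yields the percent-encoded text, which a search backend would then treat as the literal search term.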
2024-11-14[fix] engine: duckduckgo - only uses first word of the search termsNicolas Dato
During the revision in PR #3955 the query string was accidentally converted into a list of words. Furthermore, the query must be quoted before it is POSTed in the ``data`` field, see ``urllib.parse.quote_plus`` [1] [1] https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote_plus Closed: #4009 Co-Authored-by: @return42
2024-10-29[refactor] engine: duckduckgo - https://html.duckduckgo.com/htmlMarkus Heiser
The entire source code of the duckduckgo engine has been re-engineered and cleaned up. 1. DDG uses the URL https://html.duckduckgo.com/html for no-JS requests, whose response is also easier to parse than that of the previous https://lite.duckduckgo.com/lite/ URL 2. the bot detection of DDG has so far caused problems and often led to a CAPTCHA, this can be circumvented using `['Sec-Fetch-Mode'] = "navigate"` Closes: https://github.com/searxng/searxng/issues/3927 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-10-19[fix] engine: duckduckgo - CAPTCHA detectionMarkus Heiser
The previous implementation could not distinguish a CAPTCHA response from an ordinary result list: a CAPTCHA was treated as an empty result list. DDG does not block IPs. Instead, a CAPTCHA wall is placed in front of a dubious request. Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-08-30[fix] Do not show DDG user-agent from zero clickAlexander Sulfrian
We do not want to show the user-agent information from the duckduckgo zero click info. This is the user-agent used by searxng and not the user-agent used by the user. This was already done for the IP address in: 0fb3f0e4aeecf62612cb6568910cf0f97c98cab9
2024-06-15[refactor] duckduckgo: use extr helper function in get_vqdBnyro
2024-05-29[enh] add re-usable func to filter textAllen
2024-05-29[fix] do not show DDG IP from zero clickJeff Alyanak
The zero click result from DuckDuckGo for IP should not be displayed. It will return the IP of the searxng server, not the user's IP, and looks a bit strange when the `self_info` plugin is enabled as two different IPs get returned.
2024-05-24[enh] add instant answers from ddgallendema_searxng_pi
2024-04-08[fix] ddg engine: if no vqd value can be determined, don't save NoneMarkus Heiser
Closes: https://github.com/searxng/searxng/issues/3370 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-03-11[mod] pylint all engines without PYLINT_SEARXNG_DISABLE_OPTIONMarkus Heiser
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-03-10[fix] duckduckgo.fetch_traist - URL of region definitions has changedMarkus Heiser
- https://duckduckgo.com/dist/util/u.7669f071a13a7daa57cb.js updated from u661.js to u.7669f071a13a7daa57cb / should be updated automatically? The last change was on March 23rd in dba8977b098 [1] - [1] https://github.com/searxng/searxng/pull/2269 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-03-05[fix] ddg engines (get_vqd) - the vqd value is no longer in the formMarkus Heiser
Closes: https://github.com/searxng/searxng/issues/3276 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-10-12[fix] ddg-lite & ddg-extra: don't send empty vqd valueMarkus Heiser
DDG's bot detection is sensitive to the vqd value. For some search terms (such as extremely long search terms that are often sent by bots), no vqd value can be determined. If SearXNG cannot determine a vqd value, then no request should go out to DDG (WEB): a request with a wrong vqd value leads to DDG temporarily putting SearXNG's IP on a block list. Requests from IPs on this block list run into timeouts. Not sure, but it seems the block list is a sliding window: to get my IP off the block list I had to cool down my IP for 1h (send no requests from that IP to DDG). Since such issues can't be reproduced in a local instance I tested this patch for 24h on my public SearXNG instance: There are still errors (rare), but the reliability is still 100%. Related: - https://github.com/searxng/searxng/pull/2922 - https://github.com/searxng/searxng/pull/2923 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-10-10[fix] ddg-lite vqd value: some search terms do not have a vqd valueMarkus Heiser
Some search terms do not have results and therefore no vqd value BTW: remove a leftover from 9197efa Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-10-10[fix] duckduckgo lite engine: set HTTP header 'Referer'Markus Heiser
We have had problems with this before: the bot protection of ddg-lite seems to include this referer in its rating [1][2]. From reverse engineering: - The Referer ``https://google.com/`` was set in commit 257dc7d6c4 --> DDG lite does not like this referer anymore! - The 'Referer' header is only set on the second and follow-up pages but not on the first page - The vqd value is not needed on the first page, the ddg-lite client sets this value only on follow-up pages / this can help to reduce the vqd requests from SearXNG. Related to 'Referer' header & ddg requests: [1] https://github.com/searxng/searxng/pull/2161 [2] https://github.com/searxng/searxng/pull/2081 Closes: https://github.com/searxng/searxng/issues/2796 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
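The page-dependent behavior found by reverse engineering can be sketched like this. The function `followup_headers` is hypothetical, and the Referer value is passed in by the caller because the commit message does not state which value the engine sends.

```python
def followup_headers(pageno: int, referer: str) -> dict[str, str]:
    """Build extra HTTP headers: the Referer is only sent from page 2 onwards."""
    headers: dict[str, str] = {}
    if pageno > 1:
        # Mimics the ddg-lite client, which sets the Referer on follow-up pages only.
        headers["Referer"] = referer
    return headers
```

The same page check could gate the vqd lookup, since the vqd value is likewise only needed on follow-up pages.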
2023-10-09[feat] duckduckgo: support for videos and newsBnyro
2023-09-22Revert "[fix] engine - duckduckgo vqd edge-case"Markus Heiser
This reverts commit 102502a4f09e78682cd4f030605be394bc33282c.
2023-09-20[fix] engine - duckduckgo vqd edge-casejazzzooo
2023-09-18[fix] spellingjazzzooo
2023-09-05[fix] engine - duckduckgo_images / determination of vqd value incorrectMarkus Heiser
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-25[fix] engine & network issues / documentation and type annotationsMarkus Heiser
This patch fixes some quirks and issues related to the engines and the network. Each engine has its own network and this network was broken for the following engines[1]: - archlinux - bing - dailymotion - duckduckgo - google - peertube - startpage - wikipedia Since the files have been touched anyway, the type annotations of the engine modules have also been completed so that error messages from the type checker are no longer reported. Related and (partially) fixed issues: - [1] https://github.com/searxng/searxng/issues/762#issuecomment-1605323861 - [2] https://github.com/searxng/searxng/issues/2513 - [3] https://github.com/searxng/searxng/issues/2515 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-05-12[fix] engine ddg: minor change in the API of ddgMarkus Heiser
Closes: https://github.com/searxng/searxng/issues/2419 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-04-03[fix] engine ddg: quote !bangs in a request send to ddgMarkus Heiser
Closes: https://github.com/searxng/searxng/issues/392 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24[mod] DuckDuckGo: reversed engineered & upgrade to data_type: traits_v1Markus Heiser
Partial reverse engineering of the DuckDuckGo (DDG) engines including an improved language and region handling based on the engine.traits_v1 data. - DDG Lite - DDG Instant Answer API - DDG Images - DDG Weather docs/src/searx.engine.duckduckgo.rst: Online documentation of the DDG engines (make docs.live) searx/data/engine_traits.json Add data type "traits_v1" generated by the fetch_traits() functions from: - "duckduckgo" (WEB), - "duckduckgo images" and - "duckduckgo weather" and remove data from obsolete data type "supported_languages". searx/autocomplete.py: Reverse engineered Autocomplete from DDG. Supports DDG's languages. searx/engines/duckduckgo.py: - fetch_traits(): Fetch languages & regions from DDG. - get_ddg_lang(): Get DDG's language identifier from SearXNG's locale. DDG defines its languages by region codes. DDG-Lite does not offer a language selection to the user, only a region can be selected by the user. - Cache ``vqd`` value: The vqd value depends on the query string and is needed for the follow-up pages or the images loaded by an XMLHttpRequest (DDG images). The ``vqd`` value of a search term is stored for 10min in the redis DB. - DDG Lite engine: reverse engineered request method with improved language and region support and better ``vqd`` handling. searx/engines/duckduckgo_definitions.py: DDG Instant Answer API The *instant answers* API does not support languages, or at least we could not find out how language support should work. It seems that most of the features are based on English terms. searx/engines/duckduckgo_images.py: DDG Images Reverse engineered request method. Improved language and region handling based on cookies and the engine.traits_v1 data. Response: add image format to the result list searx/engines/duckduckgo_weather.py: DDG Weather Improved language and region handling based on cookies and the engine.traits_v1 data. Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24[mod] DuckDuckGo: fetch engine traits (data_type: supported_languages)Markus Heiser
Implements a fetch_traits function for the DuckDuckGo engines. .. note:: Does not include migration of the request method from 'supported_languages' to 'traits' (EngineTraits) object! Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-02-03[fix-2146] set different HTTP Referer header to DuckDuckGo requestsMarkus Heiser
For whatever reason, ddg-lite doesn't like the Referer https://lite.duckduckgo.com/ In an interactive session in the web browser the Referer has exactly this value, but ddg-lite doesn't like this value when the request is built by SearXNG. The new value is: https://google.com/ which fakes a user coming from a Google link. Related: https://github.com/searxng/searxng/pull/2081 Closes: https://github.com/searxng/searxng/issues/2146 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-01-06Add HTTP Referer header to DuckDuckGo requestsRudis Muiznieks
closes #2080
2022-12-22Fix: add trailing slash to duckduckgo urlRudis Muiznieks
Close #1854
2022-08-01[mod] add 'Accept-Language' HTTP header to online processoresMarkus Heiser
Most engines that support languages (and regions) use the Accept-Language from the WEB browser to build a response that fits to the language (and region). - add new engine option: send_accept_language_header Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-05[enh] add more categoriesMartin Fischer
2021-12-27[format.python] initial formatting of the python codeMarkus Heiser
This patch was generated by black [1]:: make format.python [1] https://github.com/psf/black Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-01[mod] engine duckduckgo - update supported_languages_urlMarkus Heiser
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-10-01[mod] engine duckduckgo - use DuckDuckGo-LiteMarkus Heiser
Implement a scraper for DuckDuckGo-Lite [1]. The existing DuckDuckGo [2] engine does not support paging. DuckDuckGo-Lite is much faster, less verbose and does have a paging option (reverse engineered from the input form of [1]). [1] https://lite.duckduckgo.com/lite [2] https://duckduckgo.com/ Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-04-12[httpx] replace searx.poolrequests by searx.networkAlexandre Flament
settings.yml: * outgoing.networks: * can contain network definitions * properties: enable_http, verify, http2, max_connections, max_keepalive_connections, keepalive_expiry, local_addresses, support_ipv4, support_ipv6, proxies, max_redirects, retries * retries: 0 by default, number of times searx retries to send the HTTP request (using a different IP & proxy each time) * local_addresses can be "192.168.0.1/24" (it supports IPv6) * support_ipv4 & support_ipv6: both True by default see https://github.com/searx/searx/pull/1034 * each engine can define a "network" section: * either a full network description * or a reference to an existing network * all HTTP requests of an engine use the same HTTP configuration (this was not the case before, see proxy configuration in master)
2021-03-25[enh] add year filter to duckduckgoAdam Tauber
2021-02-12[fix] duckduckgo engine: "!ddg !g" do not redirect to googleAlexandre Flament
* searx understands "!ddg !g time" as: send "!g time" to DDG * !g is a DDG bang for Google: DDG returns an HTTP redirect to Google This commit adds the allows_redirect param so that HTTP redirects are not followed. The DDG engine returns an empty result as before, without the HTTP redirect.