path: root/searx/engines/startpage.py
Age / Commit message / Author
2025-10-26[fix] startpage engine: properly display CAPTCHA if redirect page is seen (#5380)Aadniz
Fixes an issue where the startpage engine would display a parsing error (`json.decoder.JSONDecodeError`) when Startpage returned the CAPTCHA redirect page. The fix checks whether the response has a `Location` header and, if its value starts with `https://www.startpage.com/sp/captcha`, raises a CAPTCHA exception before trying to parse the data.
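The check described in this commit could look roughly like the sketch below. It is illustrative only: `CaptchaDetected` and `check_for_captcha` are hypothetical names standing in for the engine's own exception and response handling, and the headers are modelled as a plain (case-sensitive) dict.

```python
# URL prefix taken from the commit message above.
CAPTCHA_URL_PREFIX = "https://www.startpage.com/sp/captcha"


class CaptchaDetected(Exception):
    """Hypothetical stand-in for the engine's CAPTCHA exception."""


def check_for_captcha(headers):
    """Raise before any parsing if the response redirects to the CAPTCHA page."""
    location = headers.get("Location", "")
    if location.startswith(CAPTCHA_URL_PREFIX):
        raise CaptchaDetected(f"redirected to CAPTCHA page: {location}")
```

Calling a check like this before decoding the body means a CAPTCHA redirect surfaces as a CAPTCHA error instead of a JSON parsing error.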
2025-10-09[fix] startpage engine - SafeSearch works in reverse (#5290)Markus Heiser
The name of the option is *disable_family_filter*, so the meaning of the ascending safe-search filter level has to be reversed. Closes: https://github.com/searxng/searxng/issues/5287 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
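A minimal sketch of the inversion, assuming safe-search levels that ascend from 0 (off) to 2 (strict) and assuming the option is sent as the strings `"1"` and `"0"`; neither the value encoding nor the helper name comes from the commit itself.

```python
def disable_family_filter(safesearch_level: int) -> str:
    """Map an ascending safe-search level to the inverted Startpage option.

    Assumption: 0 = off, 1 = moderate, 2 = strict. The filter is only
    *disabled* when safe search is off.
    """
    return "1" if safesearch_level == 0 else "0"
```

Because the option disables the filter, a higher safe-search level must map to the "not disabled" value, which is the reverse of a naive pass-through.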
2025-09-03[mod] addition of various type hints / tbcMarkus Heiser
- pyright configuration [1]_
- stub files: types-lxml [2]_
- addition of various type hints
- enable use of new type system features on older Python versions [3]_
- ``.tool-versions`` - set python to lowest version we support (3.10.18) [4]_: Older versions typically lack some typing features found in newer Python versions. Therefore, for local type checking (before commit), it is necessary to use the older Python interpreter.

.. [1] https://docs.basedpyright.com/v1.20.0/configuration/config-files/
.. [2] https://pypi.org/project/types-lxml/
.. [3] https://typing-extensions.readthedocs.io/en/latest/#
.. [4] https://mise.jdx.dev/configuration.html#tool-versions

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Format: reST
2025-06-03[fix] startpage engine: resolve instant CAPTCHA issues (#4890)useralias
Changes:

- Improve log messages for better debugging of future CAPTCHA issues
- Fixed erroneous get_sc_url variable where sc was always blank (when no cached value)
- Move Origin and Referer headers to request() function
- Add missing form parameters (abp, abd, abe) required by Startpage to avoid being flagged as automated requests
- Include segment parameter for paginated requests
- Clean up unnecessary commented-out headers
- Fix minor typos e.g. "time-stamp" → "timestamp", "scrap" → "scrapes"

Related:

- https://github.com/searxng/searxng/issues/4673
2025-05-03[mod] engines: migration of the individual cache solutions to EngineCacheMarkus Heiser
The EngineCache class replaces all previously individual solutions for caches in the context of the engines.

- demo_offline.py
- duckduckgo.py
- radio_browser.py
- soundcloud.py
- startpage.py
- wolframalpha_api.py
- wolframalpha_noapi.py

Search term to test most of the modified engines::

    !ddg !rb !sc !sp !wa test
    !ddg !rb !sc !sp !wa foo

For introspection of the DB, jump into the developer environment and run the command to show the cache state::

    $ ./manage pyenv.cmd bash --norc --noprofile
    (py3) python -m searx.enginelib cache state

    cache tables and key/values
    ===========================
    [demo_offline ] 2025-04-22 11:32:50 count --> (int) 4
    [startpage ] 2025-04-22 12:32:30 SC_CODE --> (str) fSOBnhEMlDfE20
    [duckduckgo ] 2025-04-22 12:32:31 4dff493e.... --> (str) 4-128634958369380006627592672385352473325
    [duckduckgo ] 2025-04-22 12:40:06 3e2583e2.... --> (str) 4-263126175288871260472289814259666848451
    [radio_browser ] 2025-04-23 11:33:08 servers --> (list) ['https://de2.api.radio-browser.info', ...]
    [soundcloud ] 2025-04-29 11:40:06 guest_client_id --> (str) EjkRJG0BLNEZquRiPZYdNtJdyGtTuHdp
    [wolframalpha ] 2025-04-22 12:40:06 code --> (str) 5aa79f86205ad26188e0e26e28fb7ae7

    number of tables: 6
    number of key/value pairs: 7

In the "cache tables and key/values" section, the table name (engine name) is in the first position, the calculated expiry date in the second, and the key and value in the third and fourth.

About duckduckgo: The *vqd code* of ddg depends on the query term and therefore the key is a hash value of the query term (so that the raw query term is not stored).

In the "properties of ENGINES_CACHE" section all properties of the SQLiteAppl / ExpireCache and their last modification date are shown::

    properties of ENGINES_CACHE
    ===========================
    [last modified: 2025-04-22 11:32:27] DB_SCHEMA : 1
    [last modified: 2025-04-22 11:32:27] LAST_MAINTENANCE :
    [last modified: 2025-04-22 11:32:27] crypt_hash : ca612e3566fdfd7cf7efe2b1c9349f461158d07cb78a3750e5c5be686aa8ebdc
    [last modified: 2025-04-22 11:32:30] CACHE-TABLE--demo_offline: demo_offline
    [last modified: 2025-04-22 11:32:30] CACHE-TABLE--startpage: startpage
    [last modified: 2025-04-22 11:32:31] CACHE-TABLE--duckduckgo: duckduckgo
    [last modified: 2025-04-22 11:33:08] CACHE-TABLE--radio_browser: radio_browser
    [last modified: 2025-04-22 11:40:06] CACHE-TABLE--soundcloud: soundcloud
    [last modified: 2025-04-22 11:40:06] CACHE-TABLE--wolframalpha: wolframalpha

These properties provide information about the state of the ExpireCache and control its behavior. For example, the maintenance intervals are controlled by the last modification date of the LAST_MAINTENANCE property, and the hash value of the password can be used to detect whether the password has been changed (in this case the DB entries can no longer be decrypted and the entire cache must be discarded).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
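The two ideas in this message, values that expire and keys derived from a hash of the query term, can be illustrated with a toy in-memory cache. This is only a sketch under stated assumptions: `ToyExpireCache` and `hashed_key` are hypothetical names, the choice of SHA-256 is an assumption, and the real EngineCache API (SQLite-backed, encrypted) is not shown in this log.

```python
import hashlib
import time


def hashed_key(query: str) -> str:
    """Derive a cache key from the query so the raw term is never stored."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


class ToyExpireCache:
    """Hypothetical in-memory stand-in for an expiring key/value cache."""

    def __init__(self):
        self._data = {}  # key -> (value, absolute expiry timestamp)

    def set(self, key, value, expire, now=None):
        """Store value for `expire` seconds, counted from `now`."""
        now = time.time() if now is None else now
        self._data[key] = (value, now + expire)

    def get(self, key, default=None, now=None):
        """Return the value, or `default` if it is missing or expired."""
        now = time.time() if now is None else now
        value, expires_at = self._data.get(key, (default, None))
        if expires_at is None or now >= expires_at:
            self._data.pop(key, None)
            return default
        return value
```

A caller would store a per-query token under `hashed_key(query)` so that only the digest, not the search term, ends up in the cache.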
2025-02-20[feat] startpage: support for news and imagesBnyro
2024-11-24[chore] *: fix typos detected by typos-cliBnyro
2024-05-29[fix] engine startpage: fetch_traits() / if lang name unknown by babelMarkus Heiser
Workflow "Update data - update_engine_traits.py" failed last night [1]. This issue has already been reported by @allendema [2]. [1] https://github.com/searxng/searxng/actions/runs/9278028691/job/25528337485#step:6:168 [2] https://github.com/searxng/searxng/pull/3504/files#r1613559565 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-05-04[fix] startpage engine: XPath expressions adapted for new HTML layoutMarkus Heiser
Startpage has changed its HTML layout: classes like ``w-gl__result__main`` no longer exist and the structure of the result items has changed slightly. Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-03-11[mod] pylint all engines without PYLINT_SEARXNG_DISABLE_OPTIONMarkus Heiser
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-12-03[mod] add option max_page to bing, brave, qwant, startpage & mojeekMarkus Heiser
[1] https://github.com/searxng/searxng/issues/2982#issuecomment-1808975780 Reported-by: @Damaj301damaj-lol [1] Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-06-25[fix] engine & network issues / documentation and type annotationsMarkus Heiser
This patch fixes some quirks and issues related to the engines and the network. Each engine has its own network and this network was broken for the following engines [1]:

- archlinux
- bing
- dailymotion
- duckduckgo
- google
- peertube
- startpage
- wikipedia

Since the files have been touched anyway, the type annotations of the engine modules have also been completed so that error messages from the type checker are no longer reported.

Related and (partially) fixed issues:

- [1] https://github.com/searxng/searxng/issues/762#issuecomment-1605323861
- [2] https://github.com/searxng/searxng/issues/2513
- [3] https://github.com/searxng/searxng/issues/2515

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24[mod] Startpage: reverse engineered & upgrade to data_type: traits_v1Markus Heiser
One reason for the frequent CAPTCHAs on Startpage requests is that SearXNG sends incomplete requests to startpage.com: this patch is a completely new implementation of the ``request()`` function, reverse engineered from Startpage's search form. The new implementation:

- uses traits of data_type: traits_v1 and drops the deprecated data_type: supported_languages
- adds time-range support
- adds safe-search support
- fixes searxng/searxng/issues 1884
- fixes searxng/searxng/issues 1081 --> improvements to avoid CAPTCHA

In preparation for more categories (News, Images, Videos ..) from Startpage, the variable ``startpage_categ`` was set up. The default value is ``web`` and other categories from Startpage are not yet implemented.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-03-24[mod] Startpage: fetch engine traits (data_type: supported_languages)Markus Heiser
Implements a fetch_traits function for the Startpage engine. .. note:: Does not include migration of the request method from 'supported_languages' to 'traits' (EngineTraits) object! Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-01-28search.suspended_time settings: bug fixesAlexandre Flament
* fix typo in settings.yml: replace suspend_times by suspended_times
* always use the delay defined in settings.yml:
  * HTTP status 402 and 403: read the value from settings.yml instead of using the hardcoded value of 1 day.
  * startpage engine: CAPTCHA suspends the engine for one day instead of one week
2022-10-14[fix] startpage engineAlexandre FLAMENT
2022-09-27[fix] typos / reported by @kianmeng in searx PR-3366Markus Heiser
[PR-3366] https://github.com/searx/searx/pull/3366 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-03-19fix startpage: update XPath in _fetch_supported_languagesAlexandre Flament
2022-01-15[fix] startpage: workaround to use the startpage networkAlexandre Flament
workaround for the issue #762
2022-01-10[mod] startpage engine: add comment about Startpage's FFox add-onMarkus Heiser
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-10[fix] startpage engine: fetch CAPTCHA & issues related to PR-695Markus Heiser
In case of CAPTCHA raise a SearxEngineCaptchaException and suspend for 7 days. When get_sc_code() fails raise a SearxEngineResponseException and suspend for 7 days. [1] https://github.com/searxng/searxng/pull/695 Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-10[fix] Get an actual `sc` argument from startpage's home page.Markus Heiser
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-10[pylint] Startpage engineMarkus Heiser
Fix remarks from pylint Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-01-10[fix] startpage engine - avoid captchaMarkus Heiser
Startpage has introduced new anti-scraping measures that make SearXNG instances run into captchas:

1. some arguments have been removed and a new `sc` argument has been added.
2. search path changed from `do/search` to `sp/search`
3. POST request is no longer needed

Closes: https://github.com/searxng/searxng/issues/692

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
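As a rough illustration of the three points, a GET request URL for the new search path might be assembled as below. The `sc` argument and the `sp/search` path come from the commit message; the `query` parameter name and the helper `build_search_url` are assumptions made for this sketch.

```python
from urllib.parse import urlencode

# New search path named in the commit message (previously do/search).
SEARCH_URL = "https://www.startpage.com/sp/search"


def build_search_url(query: str, sc_code: str) -> str:
    """Build a plain GET URL; no POST body is needed any more."""
    # "query" is an assumed parameter name; "sc" is the new argument.
    args = {"query": query, "sc": sc_code}
    return f"{SEARCH_URL}?{urlencode(args)}"
```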
2022-01-05[enh] add more categoriesMartin Fischer
2021-12-27[format.python] initial formatting of the python codeMarkus Heiser
This patch was generated by black [1]::

    make format.python

[1] https://github.com/psf/black

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-02-01[mod] dynamically set language_support variableAlexandre Flament
The language_support variable is set to True by default, and set to False in only 5 engines. Except for the documentation and the /config URL, this variable is not used. This commit removes the variable definition from the engines and sets the value according to the length of supported_languages: False when the length is 0, True otherwise. Close #2485
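The rule stated in this message reduces to a single expression. The helper name `derive_language_support` below is hypothetical and only illustrates the rule; it is not the loader code itself.

```python
def derive_language_support(supported_languages) -> bool:
    """False when the engine lists no supported languages, True otherwise."""
    return len(supported_languages) > 0
```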
2021-01-14[enh] engines: add about variableAlexandre Flament
move meta information from the comment to the about variable so that the preferences and the documentation can show this information
2020-12-16Fix the StartPage result title is showing the urllucky13820
Fixes issue 2395, where the StartPage result title shows the URL. https://github.com/searx/searx/issues/2395
2020-12-13[Fix] Startpagejoshu9h
2020-11-14[mod] remove unused importAlexandre Flament
use

    from searx.engines.duckduckgo import _fetch_supported_languages, supported_languages_url # NOQA

so it is possible to easily remove all unused imports using autoflake:

    autoflake --in-place --recursive --remove-all-unused-imports searx tests
2020-10-02[mod] move extract_text, extract_url to searx.utilsAlexandre Flament
2020-09-22fetch supported languages for startpage engineMarc Abonce Seguin
2020-03-09[Fix] Startpage ValueError on Spanish date formatSpühler Stefan
dateutil's parser.parse() does not know the Spanish date format, which leads to a ValueError. Fixes #1870

    Traceback (most recent call last):
      File "/usr/local/searx/searx/search.py", line 160, in search_one_http_request_safe
        search_results = search_one_http_request(engine, query, request_params)
      File "/usr/local/searx/searx/search.py", line 97, in search_one_http_request
        return engine.response(response)
      File "/usr/local/searx/searx/engines/startpage.py", line 102, in response
        published_date = parser.parse(date_string, dayfirst=True)
      File "/usr/local/searx/searx-ve/lib/python3.6/site-packages/dateutil/parser/_parser.py", line 1358, in parse
        return DEFAULTPARSER.parse(timestr, **kwargs)
      File "/usr/local/searx/searx-ve/lib/python3.6/site-packages/dateutil/parser/_parser.py", line 649, in parse
        raise ValueError("Unknown string format:", timestr)
    ValueError: ('Unknown string format:', '24 Ene 2013')
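One plausible way to guard against this failure is to treat an unparseable date as "no date" rather than letting the ValueError abort the whole response. The log does not show what the actual fix did, so the sketch below is an assumption: `parse_published_date` is a hypothetical helper, and the default parser uses the standard library's `datetime.strptime` as a stand-in for dateutil so the example has no third-party dependency.

```python
from datetime import datetime


def parse_published_date(date_string, parse=None):
    """Return a datetime, or None when the date format is not understood.

    `parse` is any callable that raises ValueError on unknown formats;
    in the engine this role is played by dateutil's parser.parse.
    """
    if parse is None:
        # Stand-in parser (assumption): day, abbreviated month, year.
        parse = lambda s: datetime.strptime(s, "%d %b %Y")
    try:
        return parse(date_string)
    except ValueError:
        # Unknown format, e.g. a localized month abbreviation.
        return None
```

With a guard of this kind, a result carrying a date such as the one in the traceback would simply be returned without a published date.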
2019-11-15[mod] speed optimizationDalf
* compile XPath only once
* avoid redundant call to urlparse
* get_locale(webapp.py): avoid useless call to request.accept_languages.best_match
2019-10-14[fix] pep8Adam Tauber
2019-10-14[fix] update startpage engine - closes #1601Adam Tauber
2019-01-07Revert "remove 'all' option from search languages"Noémi Ványi
This reverts commit 4d1770398a6af8902e75c0bd885781584d39e796.
2019-01-04Merge branch 'master' into masterNoémi Ványi
2018-12-14restore startpage search resultsMichael Pfitzner
2018-12-11update startpage.pydimqua
2017-12-06remove 'all' option from search languagesmarc
2017-05-15[enh] py3 compatibilityAdam Tauber
2016-12-13[mod] fetch supported languages for several enginesmarc
utils/fetch_languages.py gets languages supported by each engine and generates engines_languages.json with each engine's supported language.
2016-12-13Add language support for more engines.marc
2016-12-13[enh] add supported_languages on engines and auto-generate languages.pymarc
2016-12-09[mod] do not escape html content in enginesAdam Tauber
2016-07-11Fix anomalous backslash in stringstepshal
2016-01-18[fix] pep8 compatibilityAdam Tauber
2015-10-24[enh] fix content fetching, parse published date from descriptionThomas Pointhuber