| Age | Commit message (Collapse) | Author |
|
Closes: https://github.com/searxng/searxng/issues/2477
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
For correct determination of the IP to the request the function
botdetection.get_real_ip() is implemented. This fonction is used in the
ip_limit and link_token method of the botdetection and it is used in the
self_info plugin.
A documentation about the X-Forwarded-For header has been added.
[1] https://github.com/searxng/searxng/pull/2357#issuecomment-1566211059
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
- counting requests in LONG_WINDOW and BURST_WINDOW is not needed when the
request is validated by the link_token method [1]
- renew a ping-key on validation [2], this is needed for infinite scrolling,
where no new token (CSS) is loaded. / this does not fix the BURST_MAX issue in
the vanilla limiter
- normalize the counter names of the ip_limit method to 'ip_limit.*'
- just integrate the ip_limit method straight forward in the limiter plugin /
non intermediate code --> ip_limit now returns None or a werkzeug.Response
object that can be passed by the plugin to the flask application / non
intermediate code that returns a tuple
[1] https://github.com/searxng/searxng/pull/2357#issuecomment-1566113277
[2] https://github.com/searxng/searxng/pull/2357#discussion_r1208542206
[3] https://github.com/searxng/searxng/pull/2357#issuecomment-1566125979
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
In order to be able to meet the outstanding requirements, the implementation is
modularized and supplemented with documentation.
This patch does not contain functional change, except it fixes issue #2455
----
Aktivate limiter in the settings.yml and simulate a bot request by::
curl -H 'Accept-Language: de-DE,en-US;q=0.7,en;q=0.3' \
-H 'Accept: text/html'
-H 'User-Agent: xyz' \
-H 'Accept-Encoding: gzip' \
'http://127.0.0.1:8888/search?q=foo'
In the LOG:
DEBUG searx.botdetection.link_token : missing ping for this request: .....
Since ``BURST_MAX_SUSPICIOUS = 2`` you can repeat the query above two time
before you get a "Too Many Requests" response.
Closes: https://github.com/searxng/searxng/issues/2455
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
By adding a random component in the limiter URL a bot can no longer send a ping
by request a static URL.
Related: https://github.com/searxng/searxng/pull/2357#issuecomment-1518525094
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Block requests from PetalBlock. Normally robots.txt is enough to stop
PetalBlock from making requests [1]. However, if SearXNG is offered below a
path (example.org/search), then the robots.txt is not available in the root
paths of the domain / subdomain.
[1] https://webmaster.petalsearch.com/site/petalbot
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Since [bb3a01f8] has been merged to the Farside project, Farside instances do no
longer need to send requests to SearXNG instances [1].
There are some old unmaintained Farside instances on the web that continue to
query SearXNG instances --> we can safely block their requests.
[1] https://github.com/benbusby/farside/issues/95
[bb3a01f8] https://github.com/benbusby/farside/commit/bb3a01f8
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Related: https://github.com/searxng/searxng/issues/2310#issuecomment-1494417531
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
- requests without HTTP header 'Connection' or missing 'User-Agent' will be
blocked by the limiter
- re_bot is related to 'User-Agent' and has been renamed to block_user_agent
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
In debug mode more detailed logging is needed to evaluate if an access should
have been blocked by the limiter.
BTW: remove duplicate code checking bot signature ``re_bot.match(user_agent)``
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
When the user choose "Auto-detected", the choice remains on the following queries.
The detected language is displayed.
For example "Auto-detected (en)":
* the next query language is going to be auto detected
* for the current query, the detected language is English.
This replace the autodetect_search_language plugin.
|
|
Related: https://github.com/searxng/searxng/pull/2189
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
|
|
|
|
|
|
[fix] follow up of PR-1856
|
|
- Add documentation to the plugin
- Harmonize FastText language model with SearXNG's language model
Reosurces::
import fasttext # --> +10 MB
fasttext.load_model(str(data_dir / 'lid.176.ftz')) # --> +4MB
Suggested-by: @dalf
- To speed up and simplify the deployment use fasttext-wheel instead of fasttext
- Building numpy on the Alpine Linux of docker-images takes ages --> install
py3-numpy from Alpines package manager (apk)
- Alpine Linux on docker-images (musl libc) do not support fasttext-wheel (gnu
libc) --> patch Dockerfile and build from fastetxt:
sed -i s/fasttext-wheel/fasttext/ requirements.txt
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
|
|
|
|
Remove the abstraction in searx.shared.SharedDict.
Implement a basic and dedicated scheduler for the checker using a Redis script.
|
|
[PR-3366] https://github.com/searx/searx/pull/3366
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Currentty, when oa_doi_rewrite find a DOI in the result URL, it replace the URL.
In this commit, the plugin adds the key "doi" to the result,
so the paper.html can show it.
|
|
Only raise "suspicious Accept-Encoding" when both "gzip" and "deflate" are missing from Accept-Encoding.
Prevent Browsers which only implement one compression solution from being blocked by the limiter plugin.
Example Browser which is currently blocked: Lynx Browser (https://lynx.invisible-island.net)
|
|
|
|
|
|
Add a redis library to generalize DB functions we need in SearXNG.
|
|
Remove issue reported by Pylint 2.14.0:
- no-self-use: has been moved to optional extension [1]
- The refactoring checker now also raises 'consider-using-generator' messages
for max(), min() and sum(). [2]
.pylintrc:
- <option name>-hint has been removed since long, Pylint 2.14.0 raises an
error on invalid options
- bad-continuation and bad-whitespace have been removed [3]
[1] https://pylint.pycqa.org/en/latest/whatsnew/2/2.14/summary.html#removed-checkers
[2] https://pylint.pycqa.org/en/latest/whatsnew/2/2.14/full.html#what-s-new-in-pylint-2-14-0
[2] https://pylint.pycqa.org/en/latest/whatsnew/2/2.6/summary.html#summary-release-highlights
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
|
|
|
|
Requested-by: https://github.com/searxng/searxng/discussions/993#discussioncomment-2396914
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
* oscar theme: code from searx/plugins/infinite_scroll.py
* simple theme: new implementation
Co-authored-by: Markus Heiser <markus.heiser@darmarIT.de>
|
|
[limiter] update
|
|
Rename result field data_src to iframe_src
Suggested-by: @dalf https://github.com/searxng/searxng/pull/882#issuecomment-1037997402
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
This is a rewrite of the hostname_replace.py that:
- don't stop to replace URL in fields ('data_src', 'audio_src') if there isn't a
'parsed_url',
- adds a comment about keep or remove a result from the result list
- adds a loop over ['data_src', 'audio_src'] instead of doubling code lines
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
|
|
Embedded HTML breaks SearXNG architecture. To modularize, HTML is generated in
the templates (oscar & simple) and result parameter 'embedded' is replaced by
'data_src' (and 'audio_src'), an URL for embedded content (<iframe>).
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
To test you need to redirect embeded videos (e.g.) from youtube to a invidios
instance. Search for videos using engine `!youtube lebowski`. The result URLs
and the embeded videos should link to the invidios instance.
Here is an example of such a `hostname_replace` configuration::
hostname_replace:
# youtube --> Invidious
'(.*\.)?youtube-nocookie\.com': 'invidio.xamh.de'
'(.*\.)?youtube\.com$': 'invidio.xamh.de'
'(.*\.)?invidious\.snopyta\.org$': 'invidio.xamh.de'
'(.*\.)?vid\.puffyan\.us': 'invidio.xamh.de'
'(.*\.)?invidious\.kavin\.rocks$': 'invidio.xamh.de'
'(.*\.)?inv\.riverside\.rocks$': 'invidio.xamh.de'
Closes: https://github.com/searxng/searxng/issues/873
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
also adjust the number of req/time
|
|
can replace filtron:
* rate limite the number of request per IP and per (IP, User-Agent)
* block some bots
use Redis
data stored in Redis never contains the IP addresses, only HMAC using the secret_key
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
|
|
Previously the Setting classes used a horrible _post_init
hack that prevented proper type checking.
|
|
This patch was generated by black [1]::
make format.python
[1] https://github.com/psf/black
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Disable the python code formatting from python-black, where the readability of
code suffers by formatting.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
add a new function "init" call when the app starts.
The function can:
* return False to disable the plugin.
* modify the Flask app.
|
|
* backport of https://github.com/searx/searx/pull/2724
* allow to remove result if the replacement is the boolean value false
|