diff options
| author | Markus Heiser <markus.heiser@darmarit.de> | 2019-12-12 11:19:26 +0100 |
|---|---|---|
| committer | Markus Heiser <markus.heiser@darmarit.de> | 2019-12-12 11:19:26 +0100 |
| commit | 49e8dd1e0dbce9b06a3a4ad2de6fc0e1d95eb49b (patch) | |
| tree | 9b947e35abc9325c0c75fff77ec78b7d83a543e3 /docs/admin/filtron.rst | |
| parent | 3f93fe04d8c43191cc3bff51c9add1c35728c789 (diff) | |
| parent | 61e9b0b75f9fabb4e50eddcd1aed6592fd2569bd (diff) | |
doc: move patches from /doc folder of branch gh-pages to master
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Diffstat (limited to 'docs/admin/filtron.rst')
| -rw-r--r-- | docs/admin/filtron.rst | 114 |
1 files changed, 114 insertions, 0 deletions
diff --git a/docs/admin/filtron.rst b/docs/admin/filtron.rst new file mode 100644 index 000000000..c422cb0a8 --- /dev/null +++ b/docs/admin/filtron.rst @@ -0,0 +1,114 @@ +How to protect an instance +========================== + +Searx depens on external search services. To avoid the abuse of these services it is advised to limit the number of requests processed by searx. + +An application firewall, ``filtron`` solves exactly this problem. Information on how to install it can be found at the `project page of filtron <https://github.com/asciimoo/filtron>`__. + +Sample configuration of filtron +------------------------------- + +An example configuration can be find below. This configuration limits the access of + + * scripts or applications (roboagent limit) + + * webcrawlers (botlimit) + + * IPs which send too many requests (IP limit) + + * too many json, csv, etc. requests (rss/json limit) + + * the same UserAgent of if too many requests (useragent limit) + + +.. code:: json + + [ + { + "name": "search request", + "filters": ["Param:q", "Path=^(/|/search)$"], + "interval": <time-interval-in-sec>, + "limit": <max-request-number-in-interval>, + "subrules": [ + { + "name": "roboagent limit", + "interval": <time-interval-in-sec>, + "limit": <max-request-number-in-interval>, + "filters": ["Header:User-Agent=(curl|cURL|Wget|python-requests|Scrapy|FeedFetcher|Go-http-client)"], + "actions": [ + {"name": "block", + "params": {"message": "Rate limit exceeded"}} + ] + }, + { + "name": "botlimit", + "limit": 0, + "stop": true, + "filters": ["Header:User-Agent=(Googlebot|bingbot|Baiduspider|yacybot|YandexMobileBot|YandexBot|Yahoo! Slurp|MJ12bot|AhrefsBot|archive.org_bot|msnbot|MJ12bot|SeznamBot|linkdexbot|Netvibes|SMTBot|zgrab|James BOT)"], + "actions": [ + {"name": "block", + "params": {"message": "Rate limit exceeded"}} + ] + }, + { + "name": "IP limit", + "interval": <time-interval-in-sec>, + "limit": <max-request-number-in-interval>, + "stop": true, + "aggregations": ["Header:X-Forwarded-For"], + "actions": [ + {"name": "block", + "params": {"message": "Rate limit exceeded"}} + ] + }, + { + "name": "rss/json limit", + "interval": <time-interval-in-sec>, + "limit": <max-request-number-in-interval>, + "stop": true, + "filters": ["Param:format=(csv|json|rss)"], + "actions": [ + {"name": "block", + "params": {"message": "Rate limit exceeded"}} + ] + }, + { + "name": "useragent limit", + "interval": <time-interval-in-sec>, + "limit": <max-request-number-in-interval>, + "aggregations": ["Header:User-Agent"], + "actions": [ + {"name": "block", + "params": {"message": "Rate limit exceeded"}} + ] + } + ] + } + ] + + + +Route request through filtron +----------------------------- + +Filtron can be started using the following command: + +.. code:: bash + + $ filtron -rules rules.json + +It listens on 127.0.0.1:4004 and forwards filtered requests to 127.0.0.1:8888 by default. + +Use it along with ``nginx`` with the following example configuration. + +.. code:: bash + + location / { + proxy_set_header Host $http_host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Scheme $scheme; + proxy_pass http://127.0.0.1:4004/; + } + +Requests are coming from port 4004 going through filtron and then forwarded to port 8888 where a searx is being run. |