author: Markus Heiser <markus.heiser@darmarIT.de>, 2019-12-24 13:33:07 +0100
committer: GitHub <noreply@github.com>, 2019-12-24 13:33:07 +0100
commit: fb668e2075484084a1f7a9b205ecbe7957ea5e8e
tree: c6f2e83d9d222d69d79348faac342c07c32dbbf3 /docs/admin/filtron.rst
parent: f407dd8ef4e3f6c82bef31f678139d6db2a4d810
parent: 6d232e9b695c2553b7594efe00c4f63aa96fc62d
Merge branch 'master' into libgen
Diffstat (limited to 'docs/admin/filtron.rst')
-rw-r--r-- docs/admin/filtron.rst 148
1 files changed, 148 insertions, 0 deletions
diff --git a/docs/admin/filtron.rst b/docs/admin/filtron.rst
new file mode 100644
index 000000000..07dcb9bc5
--- /dev/null
+++ b/docs/admin/filtron.rst
@@ -0,0 +1,148 @@
+==========================
+How to protect an instance
+==========================
+
+Searx depends on external search services. To avoid abuse of these services,
+it is advised to limit the number of requests processed by searx.
+
+An application firewall, ``filtron``, solves exactly this problem. Information
+on how to install it can be found at the `project page of filtron
+<https://github.com/asciimoo/filtron>`__.
+
+
+Sample configuration of filtron
+===============================
+
+An example configuration can be found below. This configuration limits access
+for:
+
+- scripts or applications (roboagent limit)
+- web crawlers (botlimit)
+- IPs which send too many requests (IP limit)
+- clients requesting the csv, json or rss formats too often (rss/json limit)
+- any single User-Agent that sends too many requests (useragent limit)
+
+.. code:: json
+
+    [{
+        "name":"search request",
+        "filters":[
+            "Param:q",
+            "Path=^(/|/search)$"
+        ],
+        "interval":"<time-interval-in-sec (int)>",
+        "limit":"<max-request-number-in-interval (int)>",
+        "subrules":[
+            {
+                "name":"roboagent limit",
+                "interval":"<time-interval-in-sec (int)>",
+                "limit":"<max-request-number-in-interval (int)>",
+                "filters":[
+                    "Header:User-Agent=(curl|cURL|Wget|python-requests|Scrapy|FeedFetcher|Go-http-client)"
+                ],
+                "actions":[
+                    {
+                        "name":"block",
+                        "params":{
+                            "message":"Rate limit exceeded"
+                        }
+                    }
+                ]
+            },
+            {
+                "name":"botlimit",
+                "limit":0,
+                "stop":true,
+                "filters":[
+                    "Header:User-Agent=(Googlebot|bingbot|Baiduspider|yacybot|YandexMobileBot|YandexBot|Yahoo! Slurp|MJ12bot|AhrefsBot|archive.org_bot|msnbot|SeznamBot|linkdexbot|Netvibes|SMTBot|zgrab|James BOT)"
+                ],
+                "actions":[
+                    {
+                        "name":"block",
+                        "params":{
+                            "message":"Rate limit exceeded"
+                        }
+                    }
+                ]
+            },
+            {
+                "name":"IP limit",
+                "interval":"<time-interval-in-sec (int)>",
+                "limit":"<max-request-number-in-interval (int)>",
+                "stop":true,
+                "aggregations":[
+                    "Header:X-Forwarded-For"
+                ],
+                "actions":[
+                    {
+                        "name":"block",
+                        "params":{
+                            "message":"Rate limit exceeded"
+                        }
+                    }
+                ]
+            },
+            {
+                "name":"rss/json limit",
+                "interval":"<time-interval-in-sec (int)>",
+                "limit":"<max-request-number-in-interval (int)>",
+                "stop":true,
+                "filters":[
+                    "Param:format=(csv|json|rss)"
+                ],
+                "actions":[
+                    {
+                        "name":"block",
+                        "params":{
+                            "message":"Rate limit exceeded"
+                        }
+                    }
+                ]
+            },
+            {
+                "name":"useragent limit",
+                "interval":"<time-interval-in-sec (int)>",
+                "limit":"<max-request-number-in-interval (int)>",
+                "aggregations":[
+                    "Header:User-Agent"
+                ],
+                "actions":[
+                    {
+                        "name":"block",
+                        "params":{
+                            "message":"Rate limit exceeded"
+                        }
+                    }
+                ]
+            }
+        ]
+    }]
+
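+The ``<... (int)>`` placeholders in the configuration above stand for plain
+integers and have to be filled in before the rules file is used. As a minimal
+sketch, assuming purely illustrative values (30 seconds and 5 requests are
+assumptions for this example, not recommendations), the ``IP limit`` subrule
+could look like this:
+
+.. code:: json
+
+    {
+        "name":"IP limit",
+        "interval":30,
+        "limit":5,
+        "stop":true,
+        "aggregations":[
+            "Header:X-Forwarded-For"
+        ],
+        "actions":[
+            {
+                "name":"block",
+                "params":{
+                    "message":"Rate limit exceeded"
+                }
+            }
+        ]
+    }
+
+With these assumed values, requests are counted per ``X-Forwarded-For`` value,
+and the ``block`` action is expected to apply once one address exceeds 5
+matching requests within 30 seconds. Suitable numbers depend on the traffic of
+the instance.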
+
+
+Route requests through filtron
+==============================
+
+Filtron can be started using the following command:
+
+.. code:: sh
+
+ $ filtron -rules rules.json
+
+It listens on ``127.0.0.1:4004`` and forwards filtered requests to
+``127.0.0.1:8888`` by default.
+
+Use it together with ``nginx``, for example with the following configuration:
+
+.. code:: nginx
+
+    location / {
+        proxy_set_header Host $http_host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_set_header X-Scheme $scheme;
+        proxy_pass http://127.0.0.1:4004/;
+    }
+
+Requests arrive on port 4004, pass through filtron, and are then forwarded to
+port 8888, where searx is running.