First release

pierky · Dec 16, 2014 · commit 766c6b2

Showing 11 changed files with 1,349 additions and 0 deletions.
diff --git a/CONFIGURATION.md b/CONFIGURATION.md
@@ -0,0 +1,212 @@
+# Configuration of pmacct-to-elasticsearch
+
+## How it works
+
+pmacct-to-elasticsearch reads pmacct JSON output and sends it to ElasticSearch.
+
+It works with two kinds of pmacct plugins: "memory" and "print".
+With "memory" plugins, data must be passed to pmacct-to-elasticsearch on
+stdin; with "print" plugins, pmacct daemons write their output to a file,
+from which pmacct-to-elasticsearch is configured to read.
+
+For "memory" plugins, a crontab job is needed to run the pmacct client and
+redirect its output to pmacct-to-elasticsearch; for "print" plugins, the pmacct
+daemon can execute pmacct-to-elasticsearch directly. More details follow in
+the rest of this document.
+
+![Configuration files](https://raw.github.com/pierky/pmacct-to-elasticsearch/master/img/config_files.png)
+
+Print plugins are preferable because, when a pmacct daemon is gracefully
+restarted or shut down, data are still written to the output file and the
+trigger is executed as usual.
+
+## 1-to-1 mapping with pmacct plugins
+
+For each pmacct plugin whose output you want pmacct-to-elasticsearch to
+process, a configuration file must be present in the *CONF_DIR* directory to
+tell the program how to handle that output.
+
+The configuration file's name must be in the format *PluginName*.conf, where
+*PluginName* is the name of the pmacct plugin the file refers to.
+
+Example:
+
+     /etc/pmacct/nfacctd.conf:
+
+        ! nfacctd configuration example
+        plugins: memory[my_mem], print[my_print]
+
+     /etc/p2es/my_mem.conf
+     /etc/p2es/my_print.conf
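
The naming rule above can be illustrated with a small Python sketch that parses a pmacct `plugins:` line and lists the configuration files pmacct-to-elasticsearch would expect. `expected_conf_files` is a hypothetical helper, not part of the program; `/etc/p2es` is assumed to be *CONF_DIR*, as in the example, and plugins declared without a `[name]` are simply ignored here.

```python
import posixpath
import re

# Assumption: /etc/p2es is CONF_DIR, as in the example above.
CONF_DIR = "/etc/p2es"

# Matches "type[name]" entries such as "memory[my_mem]" and captures the name.
PLUGIN_RE = re.compile(r"\w+\[([^\]]+)\]")


def expected_conf_files(plugins_line, conf_dir=CONF_DIR):
    """Hypothetical helper: map each named plugin on a pmacct 'plugins:'
    line to the <PluginName>.conf file expected in conf_dir."""
    _, _, value = plugins_line.partition(":")
    names = PLUGIN_RE.findall(value)
    return [posixpath.join(conf_dir, name + ".conf") for name in names]


print(expected_conf_files("plugins: memory[my_mem], print[my_print]"))
# ['/etc/p2es/my_mem.conf', '/etc/p2es/my_print.conf']
```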
+
+These files tell pmacct-to-elasticsearch:
+
+1. where to read pmacct's output from;
+
+2. how to send the output to ElasticSearch;
+
+3. (optionally) which transformations to apply.
+
+When running pmacct-to-elasticsearch, pass the *PluginName* as the first
+argument, so that the program knows which configuration file to load:
+
+        pmacct-to-elasticsearch my_print
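
A minimal Python sketch of how the first argument could select and load the configuration file. `plugin_conf_path`, `load_plugin_config` and the `/etc/p2es` default are assumptions for illustration; the real program's argument handling and error messages may differ.

```python
import json
import os


def plugin_conf_path(argv, conf_dir="/etc/p2es"):
    """Hypothetical sketch: argv[1] is the PluginName; return the path
    of <conf_dir>/<PluginName>.conf."""
    if len(argv) < 2 or not argv[1]:
        raise SystemExit("usage: pmacct-to-elasticsearch PluginName")
    return os.path.join(conf_dir, argv[1] + ".conf")


def load_plugin_config(argv, conf_dir="/etc/p2es"):
    """Load the selected configuration file, which is in JSON format."""
    path = plugin_conf_path(argv, conf_dir)
    with open(path) as f:
        return json.load(f)
```

In a script, `load_plugin_config(sys.argv)` would then load `/etc/p2es/my_print.conf` for the command line shown above.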
+
+## Configuration file syntax
+
+These files are in JSON format and contain the following keys:
+
+- **LogFile** [required]: path to the log file used by pmacct-to-elasticsearch
+   to write any error encountered while processing the output.
+
+   It can contain the following macros, which are replaced at run time:
+   *$PluginName*, *$IndexName*, *$Type*
+
+   The log file is rotated automatically when it reaches 1 MB, keeping up to
+   3 rotated files.
+
+   **Default**: "/var/log/pmacct-to-elasticsearch-$PluginName.log"
+
+- **ES_URL** [required]: URL of the ElasticSearch HTTP API.
+
+   **Default**: "http://localhost:9200"
+
+- **ES_IndexName** [required]: name of the ElasticSearch index used to store
+   pmacct-to-elasticsearch output.
+
+   It may contain Python strftime codes (http://strftime.org/) in order
+   to have periodic indices.
+
+   Example:
+     "netflow-%Y-%m-%d" to have daily indices (netflow-YYYY-MM-DD)
+
+   **Default**: none (a value must be provided)
+
+- **ES_Type** [required]: ElasticSearch document type (_type field) used to store
+   pmacct-to-elasticsearch output; roughly comparable to a table in a relational DB.
+
+   From the official reference guide
+   http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_basic_concepts.html#_type:
+
+   > Within an index, you can define one or more types. A type is a logical
+   > category/partition of your index whose semantics is completely up to
+   > you. In general, a type is defined for documents that have a set of
+   > common fields. For example, let's assume you run a blogging platform
+   > and store all your data in a single index. In this index, you may
+   > define a type for user data, another type for blog data, and yet
+   > another type for comments data.
+
+   **Default**: none (a value must be provided)
+
+- **ES_IndexTemplateFileName** [required]: name of the file containing the
+   template to be used when creating a new index. The file must be in the
+   *CONF_DIR* directory.
+
+   **Default**: new-index-template.json (included in pmacct-to-elasticsearch)
+
+   The default template provided with pmacct-to-elasticsearch has the
+   _source field enabled; to save storage, disable it by editing the
+   new-index-template.json file:
+
+           "_source" : { "enabled" : false }
+
+- **ES_FlushSize** [required]: how often (in lines) to flush data to the ElasticSearch BULK API.
+
+   Set it to 0 to only send data once the whole input has been processed.
+
+   **Default**: 5000 lines
+
+- **InputFile** [optional]: mainly used with pmacct print plugins; the file
+   from which pmacct-to-elasticsearch reads input data (it should match the
+   print plugin's output file).
+   If omitted, pmacct-to-elasticsearch reads data from stdin.
+
+- **Transformations** [optional]: the transformation matrix used to add new
+   fields to the output document sent to ElasticSearch for indexing.
+
+   More details in the [TRANSFORMATIONS.md](TRANSFORMATIONS.md) file.
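
The macro and strftime behaviour described for **LogFile** and **ES_IndexName**, together with the 1 MB / 3 files log rotation, could be reproduced in Python roughly as follows. This is a sketch under assumptions: it uses `string.Template` for the *$PluginName*, *$IndexName* and *$Type* macros and the standard `logging.handlers.RotatingFileHandler`; it does not claim to mirror how pmacct-to-elasticsearch implements them, and the helper names are hypothetical. Whether *$IndexName* receives the name before or after strftime expansion is not stated; the sketch passes the expanded one.

```python
import logging.handlers
from datetime import datetime
from string import Template


def render_index_name(pattern, when):
    """Expand strftime codes, e.g. 'netflow-%Y-%m-%d' gives one index per day."""
    return when.strftime(pattern)


def render_log_file(pattern, plugin_name, index_name, doc_type):
    """Replace the $PluginName, $IndexName and $Type macros (hypothetical helper)."""
    return Template(pattern).safe_substitute(
        PluginName=plugin_name, IndexName=index_name, Type=doc_type)


def make_log_handler(path):
    """Rotate when the file reaches 1 MB, keeping 3 old copies
    (assumed reading of the rotation rule above)."""
    # delay=True: the file is only opened when the first record is written.
    return logging.handlers.RotatingFileHandler(
        path, maxBytes=1024 * 1024, backupCount=3, delay=True)


index = render_index_name("netflow-%Y-%m-%d", datetime(2014, 12, 16))
log_path = render_log_file("/var/log/pmacct-to-elasticsearch-$PluginName.log",
                           "my_print", index, "ingress_traffic")
print(index)     # netflow-2014-12-16
print(log_path)  # /var/log/pmacct-to-elasticsearch-my_print.log
```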
+
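The **ES_FlushSize** semantics (flush every N lines, or once at the end when set to 0) can be sketched in Python as a batching loop that builds ElasticSearch BULK API request bodies. The helper names are hypothetical and the HTTP POST to the `_bulk` endpoint under **ES_URL** is deliberately left out. Each document is preceded by an `index` action line, as the bulk format requires; the `_type` field matches the ElasticSearch versions this document targets (later versions removed mapping types).

```python
import json


def bulk_body(docs, index_name, doc_type):
    """Build a BULK API body: one action line plus one source line per
    document, each terminated by a newline (the API requires the final one)."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


def batched_bodies(records, flush_size, index_name, doc_type):
    """Yield one bulk body every flush_size records; flush_size == 0 means
    a single body once all input has been read."""
    batch = []
    for record in records:
        batch.append(record)
        if flush_size and len(batch) >= flush_size:
            yield bulk_body(batch, index_name, doc_type)
            batch = []
    if batch:
        yield bulk_body(batch, index_name, doc_type)
```

With the default of 5000, an input of 12000 lines would produce three requests (5000, 5000 and 2000 documents).
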
+This is an example of a basic configuration file:
+
+     {
+          "ES_IndexName": "netflow-%Y-%m-%d",
+          "ES_Type": "ingress_traffic",
+          "InputFile": "/var/lib/pmacct/ingress_traffic.json"
+     }
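
One way to combine a minimal configuration file like the one above with the documented defaults is sketched below in Python. It overlays the user's keys on the defaults listed for each key, fails if a key without a default (**ES_IndexName**, **ES_Type**) is missing, and opens **InputFile** or falls back to stdin. The function names are hypothetical and the real program may validate differently.

```python
import sys

# Defaults as documented for each key above.
DEFAULTS = {
    "LogFile": "/var/log/pmacct-to-elasticsearch-$PluginName.log",
    "ES_URL": "http://localhost:9200",
    "ES_IndexTemplateFileName": "new-index-template.json",
    "ES_FlushSize": 5000,
}
# Required keys that have no default value.
NO_DEFAULT = ("ES_IndexName", "ES_Type")


def effective_config(user_config):
    """Overlay the user's settings on the defaults; reject missing or empty
    values for keys that have no default."""
    config = dict(DEFAULTS)
    config.update(user_config)
    missing = [key for key in NO_DEFAULT if not config.get(key)]
    if missing:
        raise ValueError("missing required key(s): " + ", ".join(missing))
    return config


def open_input(config):
    """Read from InputFile when set (print plugins), else from stdin (memory plugins)."""
    path = config.get("InputFile")
    return open(path) if path else sys.stdin
```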
+
+## Plugins configuration
+
+### Memory plugins
+
+For "memory" plugins, a crontab job is needed in order to periodically read
+(and clear) the in-memory table that pmacct uses to store data.
+
+Example of a command scheduled in crontab:
+
+        pmacct -l -p /var/spool/pmacct/my_mem.pipe -s -O json -e | pmacct-to-elasticsearch my_mem
+
+In the example above, the pmacct client reads the in-memory table
+referenced by the **/var/spool/pmacct/my_mem.pipe** file and writes its JSON
+output to stdout, which is piped to the stdin of pmacct-to-elasticsearch;
+the latter is executed with the **my_mem** argument so that it loads the
+right configuration from **/etc/p2es/my_mem.conf**.
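
Assuming the pmacct client emits one JSON object per line, the stdin side of this pipeline can be sketched in Python as below. `read_records` is a hypothetical helper, and the sample record fields are only illustrative; in real use it would be called as `read_records(sys.stdin)`.

```python
import io
import json
import sys


def read_records(stream):
    """Yield one dict per non-empty line of newline-delimited JSON;
    lines that fail to parse are reported on stderr and skipped."""
    for lineno, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except ValueError as err:
            print("line %d: invalid JSON (%s)" % (lineno, err), file=sys.stderr)


# Illustrative input, standing in for the pmacct client's JSON output.
sample = io.StringIO(
    '{"ip_src": "192.0.2.1", "ip_dst": "198.51.100.7", "packets": 3, "bytes": 180}\n'
    "\n"
    '{"ip_src": "192.0.2.2", "ip_dst": "198.51.100.7", "packets": 1, "bytes": 60}\n'
)
print(sum(r["bytes"] for r in read_records(sample)))  # 240
```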
+
+### Print plugins
+
+For "print" plugins, the crontab job is not required: pmacct's
+**print_trigger_exec** configuration key can be used instead. It allows pmacct
+to run pmacct-to-elasticsearch directly once the output has been fully written
+to the output file. Since pmacct does not allow passing arguments to programs
+executed via print_trigger_exec, a trick is needed to let
+pmacct-to-elasticsearch know which configuration to use: a trigger file must
+be created for each "print" plugin, and it has to execute the program with the
+proper argument.
+
+Example:
+
+     /etc/pmacct/nfacctd.conf:
+
+        ! nfacctd configuration example
+        plugins: print[my_print]
+        print_output_file[my_print]: /var/lib/pmacct/my_print.json
+        print_output[my_print]: json
+        print_trigger_exec[my_print]: /etc/p2es/triggers/my_print
+
+     /etc/p2es/triggers/my_print:
+
+        #!/bin/sh
+        /usr/local/bin/pmacct-to-elasticsearch my_print &
+
+     # chmod u+x /etc/p2es/triggers/my_print
+
+     /etc/p2es/my_print.conf:
+
+        {
+                ...
+                "InputFile": "/var/lib/pmacct/my_print.json"
+                ...
+        }
+
+In the example, the nfacctd daemon has a plugin named **my_print** that writes
+its JSON output to **/var/lib/pmacct/my_print.json** and, when done, executes
+the **/etc/p2es/triggers/my_print** program. The trigger program, in turn, runs
+pmacct-to-elasticsearch with the **my_print** argument in the background.
+The **my_print.conf** file contains the "InputFile" configuration key, which
+points to the aforementioned JSON output file (**/var/lib/pmacct/my_print.json**)
+from which the program reads its data.
+
+The trigger program may also be a symbolic link to the provided **default_trigger**
+script, which runs pmacct-to-elasticsearch with its own file name (that is, the
+name of the symbolic link) as the first argument:
+
+     # cd /etc/p2es/triggers/
+     # ln -s default_trigger my_print
+     
+     /etc/p2es/triggers/default_trigger:
+          
+          #!/bin/sh
+          PLUGIN_NAME=`basename $0`
+          /usr/local/bin/pmacct-to-elasticsearch $PLUGIN_NAME &
+
+If you write your own trigger instead, remember to use the full path of
+pmacct-to-elasticsearch to avoid problems with a stripped-down *PATH*
+environment variable.
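
If a Python trigger is preferred over the shell script, the same idea (derive *PluginName* from the trigger's own file name, call pmacct-to-elasticsearch by its full path, and do not wait for it) could look like the sketch below. This is a hypothetical alternative, not a script shipped with pmacct-to-elasticsearch; `/usr/local/bin/pmacct-to-elasticsearch` is the install path used in the examples above.

```python
#!/usr/bin/env python3
"""Hypothetical Python counterpart of default_trigger."""
import os
import subprocess

# Full path, so the (possibly stripped-down) PATH seen by pmacct does not matter.
P2ES = "/usr/local/bin/pmacct-to-elasticsearch"


def build_command(argv0):
    """The trigger's own file name (e.g. the symlink 'my_print') is the PluginName."""
    return [P2ES, os.path.basename(argv0)]


def launch(argv0):
    """Start pmacct-to-elasticsearch without waiting for it, similar to the
    trailing '&' in default_trigger; start_new_session puts it in its own session."""
    return subprocess.Popen(build_command(argv0), start_new_session=True)

# An actual trigger file would end with:
#     import sys; launch(sys.argv[0])
```
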
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,22 @@
+The MIT License (MIT)
+
+Copyright (c) 2014 Pier Carlo Chiodi
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
diff --git a/README.md b/README.md
@@ -0,0 +1,34 @@
+pmacct-to-elasticsearch
+=======================
+
+**pmacct-to-elasticsearch** is a Python script designed to read JSON output from **pmacct** daemons, process it and store it in **ElasticSearch**. It works with both *memory* and *print* plugins and, optionally, it can perform **manipulations on data** (such as adding fields based on other values).
+
+![Data flow](https://raw.github.com/pierky/pmacct-to-elasticsearch/master/img/data_flow.png)
+
+1. **pmacct daemons** collect IP accounting data and process them with their plugins;
+2. data are stored in **in-memory tables** (*memory* plugins) or **JSON files** (*print* plugins);
+3. **crontab jobs** (*memory* plugins) or **trigger scripts** (*print* plugins) are invoked to execute pmacct-to-elasticsearch;
+4. JSON records are finally processed by **pmacct-to-elasticsearch**, which reads them from stdin (*memory* plugins) or directly from the JSON files (*print* plugins).
+
+Optionally, some **data transformations** can be configured, to allow pmacct-to-elasticsearch to **add or remove fields** to/from the output documents that are sent to ElasticSearch for indexing. These additional fields may be useful to improve the legibility of graphs and reports, or to add a further level of aggregation or filtering.
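
As a rough illustration of the kind of transformation meant here, the Python sketch below adds a field derived from another one and drops a field. It does not use the TRANSFORMATIONS.md configuration syntax; the field names (`peer_ip_src`, `peer_name`, `tag`) and the lookup table are hypothetical.

```python
def transform(doc, peer_names):
    """Return a copy of doc with a 'peer_name' field added from a lookup
    on 'peer_ip_src', and the 'tag' field removed if present."""
    out = dict(doc)
    peer = out.get("peer_ip_src")
    if peer in peer_names:
        out["peer_name"] = peer_names[peer]
    out.pop("tag", None)
    return out


print(transform({"peer_ip_src": "192.0.2.1", "bytes": 60, "tag": 0},
                {"192.0.2.1": "router-1"}))
# {'peer_ip_src': '192.0.2.1', 'bytes': 60, 'peer_name': 'router-1'}
```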
+
+## Installation
+
+Clone the repository and run the ./install script:
+
+      # cd /usr/local/src/
+      # git clone https://github.com/pierky/pmacct-to-elasticsearch.git
+      # cd pmacct-to-elasticsearch/
+      # ./install
+
+## Configuration
+
+Please refer to the [CONFIGURATION.md](CONFIGURATION.md) file. The [TRANSFORMATIONS.md](TRANSFORMATIONS.md) file contains details about data transformations configuration.
+
+A simple tutorial on pmacct integration with ElasticSearch/Kibana using pmacct-to-elasticsearch can be found at http://blog.pierky.com/integration-of-pmacct-with-elasticsearch-and-kibana.
+
+## Author
+
+Pier Carlo Chiodi - http://pierky.com/aboutme
+
+Blog: http://blog.pierky.com Twitter: [@pierky](http://twitter.com/pierky)