Commit

First release
pierky committed Dec 16, 2014
0 parents commit 766c6b2
Showing 11 changed files with 1,349 additions and 0 deletions.
212 changes: 212 additions & 0 deletions CONFIGURATION.md
@@ -0,0 +1,212 @@
# Configuration of pmacct-to-elasticsearch

## How it works

pmacct-to-elasticsearch reads pmacct JSON output and sends it to ElasticSearch.

It works properly with two kinds of pmacct plugins: "memory" and "print".
The former, "memory", needs data to be passed to pmacct-to-elasticsearch's
stdin, while the latter, "print", needs a file to be written by pmacct
daemons, where pmacct-to-elasticsearch is instructed to read data from.

For "memory" plugins, a crontab job is needed to run the pmacct client and to
redirect its output to pmacct-to-elasticsearch; for "print" plugins the pmacct
daemon can directly execute pmacct-to-elasticsearch. More details follow in
the rest of this document.

![Configuration files](https://raw.github.com/pierky/pmacct-to-elasticsearch/master/img/config_files.png)

Print plugins are preferable because, in case of a graceful restart or
shutdown of the pmacct daemon, data are still written to the output file and
the trigger is executed as usual.

## 1-to-1 mapping with pmacct plugins

For each pmacct plugin whose output you want pmacct-to-elasticsearch to
process, a configuration file must be present in the *CONF_DIR* directory to
tell the program how to process that output.

The configuration file's name must be in the format *PluginName*.conf, where
*PluginName* is the name of the pmacct plugin to which the file refers.

Example:

/etc/pmacct/nfacctd.conf:

! nfacctd configuration example
plugins: memory[my_mem], print[my_print]

/etc/p2es/my_mem.conf
/etc/p2es/my_print.conf

Basically these files tell pmacct-to-elasticsearch:

1. where to read pmacct's output from;

2. how to send output to ElasticSearch;

3. (optionally) which transformations must be applied.

To run pmacct-to-elasticsearch the first argument must be the *PluginName*,
in order to allow it to figure out what to do:

pmacct-to-elasticsearch my_print
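
The naming rule above can be sketched in a few lines of Python. This is only an illustration, not the script's actual code: the `config_path` helper is hypothetical, and `/etc/p2es` is assumed as *CONF_DIR* because it is the directory used in the examples of this document.

```python
import os

# Assumed CONF_DIR, taken from the examples in this document.
CONF_DIR = "/etc/p2es"


def config_path(plugin_name, conf_dir=CONF_DIR):
    """Return the configuration file expected for a pmacct plugin name."""
    return os.path.join(conf_dir, plugin_name + ".conf")


# The first command-line argument is the plugin name, e.g. "my_print".
print(config_path("my_print"))
```

With this rule, running the program with the `my_mem` argument would select `my_mem.conf` in the same directory.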

## Configuration file syntax

These files are in JSON format and contain the following keys:

- **LogFile** [required]: path to the log file used by pmacct-to-elasticsearch
to write any error encountered while processing the output.

It can contain some macros, which are replaced during execution:
*$PluginName*, *$IndexName*, *$Type*

The log file is automatically rotated when it reaches 1 MB, up to 3 times.

**Default**: "/var/log/pmacct-to-elasticsearch-$PluginName.log"

- **ES_URL** [required]: URL of ElasticSearch HTTP API.

**Default**: "http://localhost:9200"

- **ES_IndexName** [required]: name of the ElasticSearch index used to store
pmacct-to-elasticsearch output.

It may contain Python strftime codes (http://strftime.org/) in order
to have periodic indices.

Example:
"netflow-%Y-%m-%d" to have daily indices (netflow-YYYY-MM-DD)

**Default**: no default provided

- **ES_Type** [required]: ElasticSearch document type (_type field) used to store
pmacct-to-elasticsearch output. It is similar to a table in a relational database.

From the official reference guide
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_basic_concepts.html#_type:

> Within an index, you can define one or more types. A type is a logical
> category/partition of your index whose semantics is completely up to
> you. In general, a type is defined for documents that have a set of
> common fields. For example, let's assume you run a blogging platform
> and store all your data in a single index. In this index, you may
> define a type for user data, another type for blog data, and yet
> another type for comments data.

**Default**: no default provided

- **ES_IndexTemplateFileName** [required]: name of the file containing the
template to be used when creating a new index. The file must be in the
*CONF_DIR* directory.

**Default**: new-index-template.json (included in pmacct-to-elasticsearch)

The default template provided with pmacct-to-elasticsearch has the
_source field enabled; if you want to save some storage disable it
by editing the new-index-template.json file:

"_source" : { "enabled" : false }

- **ES_FlushSize** [required]: number of input lines after which buffered data
are flushed to the ElasticSearch Bulk API.

Set it to 0 to only send data once the whole input has been processed.

**Default**: 5000 lines

- **InputFile** [optional]: used mainly when configuring pmacct print plugins.
File used by pmacct-to-elasticsearch to read input data from (it
should coincide with pmacct's print plugin output file).
If omitted pmacct-to-elasticsearch will read data from stdin.

- **Transformations** [optional]: the transformation matrix used to add new
fields to the output document sent to ElasticSearch for indexing.

More details in the [TRANSFORMATIONS.md](TRANSFORMATIONS.md) file.
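
To make the *ES_IndexName* and *LogFile* placeholders more concrete, here is a minimal Python sketch of how such values could be expanded. It is not taken from the script: both helper functions are hypothetical, and passing the already expanded index name to the *$IndexName* macro is an assumption.

```python
from datetime import datetime
from string import Template


def expand_index_name(pattern, when):
    """Expand the strftime codes found in an ES_IndexName pattern."""
    return when.strftime(pattern)


def expand_log_file(pattern, plugin_name, index_name, es_type):
    """Replace the $PluginName, $IndexName and $Type macros of LogFile."""
    return Template(pattern).safe_substitute(
        PluginName=plugin_name, IndexName=index_name, Type=es_type
    )


# A fixed date is used so that the result does not depend on the clock.
index = expand_index_name("netflow-%Y-%m-%d", datetime(2014, 12, 16))
log_file = expand_log_file(
    "/var/log/pmacct-to-elasticsearch-$PluginName.log",
    "my_print", index, "ingress_traffic",
)
print(index)     # netflow-2014-12-16
print(log_file)  # /var/log/pmacct-to-elasticsearch-my_print.log
```

A daily pattern like the one above yields a new index name every day, which is what makes periodic indices possible.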

This is an example of a basic configuration file:

{
"ES_IndexName": "netflow-%Y-%m-%d",
"ES_Type": "ingress_traffic",
"InputFile": "/var/lib/pmacct/ingress_traffic.json"
}
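
As a rough illustration of how the keys and defaults listed above combine, the following Python sketch parses a configuration document and fills in the documented defaults. The `load_config` function and the `DEFAULTS` and `NO_DEFAULT` names are hypothetical; the real script may handle missing keys differently.

```python
import json

# Defaults as documented in this file.
DEFAULTS = {
    "LogFile": "/var/log/pmacct-to-elasticsearch-$PluginName.log",
    "ES_URL": "http://localhost:9200",
    "ES_IndexTemplateFileName": "new-index-template.json",
    "ES_FlushSize": 5000,
}
# Required keys for which no default is provided.
NO_DEFAULT = ("ES_IndexName", "ES_Type")


def load_config(text):
    """Parse a JSON configuration and fill in the documented defaults."""
    config = dict(DEFAULTS)
    config.update(json.loads(text))
    missing = [key for key in NO_DEFAULT if key not in config]
    if missing:
        raise ValueError("missing required keys: " + ", ".join(missing))
    return config


example = """
{
    "ES_IndexName": "netflow-%Y-%m-%d",
    "ES_Type": "ingress_traffic",
    "InputFile": "/var/lib/pmacct/ingress_traffic.json"
}
"""
config = load_config(example)
print(config["ES_URL"], config["ES_FlushSize"])
```

Note that Python's `json` module rejects a trailing comma after the last key, so the file must be strictly valid JSON.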

## Plugins configuration

### Memory plugins

For "memory" plugins, a crontab job is needed in order to periodically read
(and clear) the in-memory-table that pmacct uses to store data:

Example of a command scheduled in crontab:

pmacct -l -p /var/spool/pmacct/my_mem.pipe -s -O json -e | pmacct-to-elasticsearch my_mem

In the example above, the pmacct client reads the in-memory-table
referenced by the **/var/spool/pmacct/my_mem.pipe** file and writes the JSON
output to stdout, which is piped to the stdin of pmacct-to-elasticsearch.
The latter is executed with the **my_mem** argument so that it loads the
right configuration from **/etc/p2es/my_mem.conf**.
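
On the receiving side of the pipe, the processing could look roughly like the Python sketch below. It rests on several assumptions that are not confirmed by this document: that the pmacct client emits one JSON object per line, that records are grouped according to *ES_FlushSize*, and that each record becomes an `index` action of the ElasticSearch Bulk API. All function names and the sample field names are invented for illustration.

```python
import io
import json


def read_records(stream):
    """Yield one dict per non-empty line of JSON input."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def batches(records, flush_size):
    """Group records into lists of flush_size items; 0 means one batch."""
    batch = []
    for record in records:
        batch.append(record)
        if flush_size and len(batch) >= flush_size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_body(batch, index_name, doc_type):
    """Build a newline-delimited Bulk API body, one index action per record."""
    action = json.dumps({"index": {"_index": index_name, "_type": doc_type}})
    lines = []
    for record in batch:
        lines.append(action)
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


# Stand-in for sys.stdin, with illustrative records.
sample = io.StringIO(
    '{"ip_src": "192.0.2.1", "bytes": 100}\n'
    '{"ip_src": "192.0.2.2", "bytes": 200}\n'
    '{"ip_src": "192.0.2.3", "bytes": 300}\n'
)
for batch in batches(read_records(sample), 2):
    print(bulk_body(batch, "netflow-2014-12-16", "ingress_traffic"), end="")
```

With a flush size of 2, the three sample records produce two bulk bodies; with a flush size of 0, everything would be sent in a single body once the input ends.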

### Print plugins

For "print" plugins, the crontab job is not required; a feature of pmacct
can be used instead: the **print_trigger_exec** config key.
The print_trigger_exec key allows pmacct to directly run
pmacct-to-elasticsearch once the output has been fully written to the output
file. Since pmacct does not allow passing arguments to programs executed
through the print_trigger_exec key, a trick is needed to let
pmacct-to-elasticsearch know which configuration to use: a trigger
file must be created for each "print" plugin, and it has to execute the
program with the proper argument.

Example:

/etc/pmacct/nfacctd.conf:

! nfacctd configuration example
plugins: print[my_print]
print_output_file[my_print]: /var/lib/pmacct/my_print.json
print_output[my_print]: json
print_trigger_exec[my_print]: /etc/p2es/triggers/my_print

/etc/p2es/triggers/my_print:

#!/bin/sh
/usr/local/bin/pmacct-to-elasticsearch my_print &

# chmod u+x /etc/p2es/triggers/my_print

/etc/p2es/my_print.conf:

{
...
"InputFile": "/var/lib/pmacct/my_print.json"
...
}

In the example, the nfacctd daemon has a plugin named **my_print** that writes
its JSON output to **/var/lib/pmacct/my_print.json** and, when done, executes
the **/etc/p2es/triggers/my_print** program. The trigger program, in turn, runs
pmacct-to-elasticsearch with the **my_print** argument and detaches it.
The **my_print.conf** file contains the "InputFile" configuration key that points
to the aforementioned JSON output file (**/var/lib/pmacct/my_print.json**), where
the program will read data from.
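
The choice between the two input sources can be summarised with a short Python sketch. The `open_input` helper is hypothetical and only mirrors the behaviour described for the *InputFile* key: read from that file when the key is set, otherwise fall back to stdin.

```python
import io
import os
import sys
import tempfile


def open_input(config, stdin=sys.stdin):
    """Return the stream to read records from: InputFile if set, else stdin."""
    input_file = config.get("InputFile")
    if input_file:
        return open(input_file, "r")
    return stdin


# "print" plugin case: a temporary file stands in for pmacct's output file.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    tmp.write('{"bytes": 100}\n')
with open_input({"InputFile": tmp.name}) as stream:
    print(stream.read(), end="")
os.remove(tmp.name)

# "memory" plugin case: no InputFile, so the stdin stand-in is used.
fake_stdin = io.StringIO('{"bytes": 200}\n')
print(open_input({}, stdin=fake_stdin).read(), end="")
```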

The trigger program may also be a symbolic link to the **default_trigger** script
provided, which runs pmacct-to-elasticsearch with its own file name as first
argument:

# cd /etc/p2es/triggers/
# ln -s default_trigger my_print

/etc/p2es/triggers/default_trigger:

#!/bin/sh
PLUGIN_NAME=$(basename "$0")
/usr/local/bin/pmacct-to-elasticsearch "$PLUGIN_NAME" &

If you write your own trigger instead, remember to use the full path of
pmacct-to-elasticsearch in order to avoid problems with a stripped version of
the *PATH* environment variable.
22 changes: 22 additions & 0 deletions LICENSE
@@ -0,0 +1,22 @@
The MIT License (MIT)

Copyright (c) 2014 Pier Carlo Chiodi

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

34 changes: 34 additions & 0 deletions README.md
@@ -0,0 +1,34 @@
pmacct-to-elasticsearch
=======================

**pmacct-to-elasticsearch** is a Python script designed to read JSON output from **pmacct** daemons, to process it and to store it into **ElasticSearch**. It works with both *memory* and *print* plugins and, optionally, it can perform **manipulations on data** (such as adding fields on the basis of other values).

![Data flow](https://raw.github.com/pierky/pmacct-to-elasticsearch/master/img/data_flow.png)

1. **pmacct daemons** collect IP accounting data and process them with their plugins;
2. data are stored into **in-memory-tables** (*memory* plugins) or **JSON files** (*print* plugins);
3. **crontab jobs** (*memory* plugins) or **trigger scripts** (*print* plugins) are invoked to execute pmacct-to-elasticsearch;
4. JSON records are finally processed by **pmacct-to-elasticsearch**, which reads them from stdin (*memory* plugins) or directly from the JSON file (*print* plugins).

Optionally, some **data transformations** can be configured, to allow pmacct-to-elasticsearch to **add or remove fields** to/from the output documents that are sent to ElasticSearch for indexing. These additional fields may be useful to enhance the legibility of graphs and reports, or to add a further level of aggregation or filtering.

## Installation

Clone the repository and run the ./install script:

# cd /usr/local/src/
# git clone https://github.com/pierky/pmacct-to-elasticsearch.git
# cd pmacct-to-elasticsearch/
# ./install

## Configuration

Please refer to the [CONFIGURATION.md](CONFIGURATION.md) file. The [TRANSFORMATIONS.md](TRANSFORMATIONS.md) file contains details about data transformations configuration.

A simple tutorial on pmacct integration with ElasticSearch/Kibana using pmacct-to-elasticsearch can be found at http://blog.pierky.com/integration-of-pmacct-with-elasticsearch-and-kibana.

## Author

Pier Carlo Chiodi - http://pierky.com/aboutme

Blog: http://blog.pierky.com Twitter: [@pierky](http://twitter.com/pierky)