diff --git a/docs/sources/get-started/labels/_index.md b/docs/sources/get-started/labels/_index.md index 862c04762c5d1..bc946b418cff4 100644 --- a/docs/sources/get-started/labels/_index.md +++ b/docs/sources/get-started/labels/_index.md @@ -9,53 +9,252 @@ aliases: --- # Understand labels -Labels are key-value pairs and can be defined as anything! We like to refer to them as metadata to describe a log stream. If you are familiar with Prometheus, there are a few labels you are used to seeing like `job` and `instance`, and I will use those in the coming examples. +Labels are a crucial part of Loki. They allow Loki to organize and group together log messages into log streams. Each log stream must have at least one label to be stored and queried in Loki. -The scrape configs we provide with Grafana Loki define these labels, too. If you are using Prometheus, having consistent labels between Loki and Prometheus is one of Loki's superpowers, making it incredibly [easy to correlate your application metrics with your log data](/blog/2019/05/06/how-loki-correlates-metrics-and-logs--and-saves-you-money/). +In this topic we'll learn about labels and why your choice of labels is important when shipping logs to Loki. -## How Loki uses labels +{{< admonition type="note" >}} +Labels are intended to store [low-cardinality](https://grafana.com/docs/loki//get-started/labels/cardinality/) values that describe the source of your logs. If you frequently search high-cardinality data in your logs, you should use [structured metadata](https://grafana.com/docs/loki//get-started/labels/structured-metadata/). +{{< /admonition >}} + +## Understand labels + +In Loki, the content of each log line is not indexed. Instead, log entries are grouped into streams which are indexed with labels. + +A label is a key-value pair, for example all of the following are labels: + +- deployment_environment = development +- cloud_region = us-west-1 +- namespace = grafana-server + +A set of log messages which shares all the labels above would be called a log stream. When Loki performs searches, it first looks for all messages in your chosen stream, and then iterates through the logs in the stream to perform your query. + +Labeling will affect your queries, which in turn will affect your dashboards. +It’s worth spending the time to think about your labeling strategy before you begin ingesting logs to Loki. + +## Default labels for all users + +Loki does not parse or process your log messages on ingestion. However, depending on which client you use to collect logs, you may have some labels automatically applied to your logs. + +`service_name` -Labels in Loki perform a very important task: They define a stream. More specifically, the combination of every label key and value defines the stream. If just one label value changes, this creates a new stream. +Loki automatically tries to populate a default `service_name` label while ingesting logs. The service name label is used to find and explore logs in the following Grafana and Grafana Cloud features: -If you are familiar with Prometheus, the term used there is series; however, Prometheus has an additional dimension: metric name. Loki simplifies this in that there are no metric names, just labels, and we decided to use streams instead of series. +- Explore Logs +- Grafana Cloud Application Observability {{< admonition type="note" >}} -Structured metadata do not define a stream, but are metadata attached to a log line. 
-See [structured metadata]({{< relref "./structured-metadata" >}}) for more information. +If you are already applying a `service_name`, Loki will use that value. {{< /admonition >}} -## Format +Loki will attempt to create the `service_name` label by looking for the following labels in this order: + +- service_name +- service +- app +- application +- name +- app_kubernetes_io_name +- container +- container_name +- component +- workload +- job + +If no label is found matching the list, a value of `unknown_service` is applied. + +You can change this list by providing a list of labels to `discover_service_name` in the [limits_config](/docs/loki//configure/#limits_config) block. If you are using Grafana Cloud, contact support to configure this setting. + +## Default labels for OpenTelemetry + +If you are using either Grafana Alloy or the OpenTelemetry Collector as your Loki client, then Loki automatically assigns some of the OTel resource attributes as labels. Resource attributes map well to index labels in Loki, since both usually identify the source of the logs. + +By default, the following resource attributes will be stored as labels, with periods (`.`) replaced with underscores (`_`), while the remaining attributes are stored as [structured metadata](https://grafana.com/docs/loki//get-started/labels/structured-metadata/) with each log entry: + +- cloud.availability_zone +- cloud.region +- container.name +- deployment.environment +- k8s.cluster.name +- k8s.container.name +- k8s.cronjob.name +- k8s.daemonset.name +- k8s.deployment.name +- k8s.job.name +- k8s.namespace.name +- k8s.pod.name +- k8s.replicaset.name +- k8s.statefulset.name +- service.instance.id +- service.name +- service.namespace + +{{% admonition type="note" %}} +Because Loki has a default limit of 15 index labels, we recommend storing only select resource attributes as labels. Although the default config selects more than 15 Resource Attributes, some are mutually exclusive. +{{% /admonition %}} + +{{< admonition type="tip" >}} +For Grafana Cloud Logs, see the [current OpenTelemetry guidance](https://grafana.com/docs/grafana-cloud/send-data/otlp/otlp-format-considerations/#logs). +{{< /admonition >}} + +The default list of resource attributes to store as labels can be configured using `default_resource_attributes_as_index_labels` under the [distributor's otlp_config](https://grafana.com/docs/loki//configure/#distributor). You can set global limits using [limits_config.otlp_config](/docs/loki//configure/#limits_config). If you are using Grafana Cloud, contact support to configure this setting. + +## Labeling is iterative + +You want to start with a small set of labels. While accepting the default labels assigned by Grafana Alloy or the OpenTelemetry Collector or the Kubernetes Monitoring Helm chart may meet your needs, over time you may find that you need to modify your labeling strategy. + +Once you understand how your first set of labels works and you understand how to apply and query with those labels, you may find that they don’t meet your query patterns. You may need to modify or change your labels and test your queries again. + +Settling on the right labels for your business needs may require multiple rounds of testing. This should be expected as you continue to tune your Loki environment to meet your business requirements. 
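+
+For example, one common adjustment while iterating on your labels is narrowing the list of labels Loki inspects when deriving `service_name`. Here is a minimal sketch of that setting, assuming a self-managed Loki where you can edit `limits_config` (the label list shown is illustrative, not a recommendation):
+
+```yaml
+limits_config:
+  # Only derive service_name from these labels, checked in this order.
+  # This is an illustrative subset of the default discovery list.
+  discover_service_name:
+    - service_name
+    - app
+    - container
+```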
+
+## Create low cardinality labels
+
+[Cardinality](https://grafana.com/docs/loki//get-started/labels/cardinality/) refers to the combination of unique labels and values, which determines the number of log streams you create. High cardinality causes Loki to build a huge index and to flush thousands of tiny chunks to the object store. Loki performs very poorly when your labels have high cardinality. If not accounted for, high cardinality will significantly reduce the performance and cost-effectiveness of Loki.
+
+High cardinality can result from using labels with an unbounded or large set of possible values, such as timestamp or ip_address, **or** from applying too many labels, even if each label has a small and finite set of values.
+
+High cardinality can lead to significant performance degradation. Prefer a small number of labels with bounded values.
+
+## Creating custom labels
+
+{{< admonition type="tip" >}}
+Many log collectors, such as Grafana Alloy or the Kubernetes Monitoring Helm chart, automatically assign appropriate labels for you, so you don't need to create your own labeling strategy. For most use cases, you can just accept the default labels.
+{{< /admonition >}}
+
+Usually, labels describe the source of the log, for example:
+
+- the namespace or additional logical grouping of applications
+- the cluster and/or region where the logs were produced
+- the filename of the source log file on disk
+- the hostname where the log was produced, if the environment has individually named machines or virtual machines. If you have an environment with ephemeral machines or virtual machines, the hostname should be stored in [structured metadata](https://grafana.com/docs/loki//get-started/labels/structured-metadata/).
+
+If your logs had the example labels above, then you might query them in LogQL like this:
+
+`{namespace="mynamespace", cluster="cluster123", filename="/var/log/myapp.log"}`
+
+Unlike index-based log aggregators, Loki doesn't require you to create a label for every field that you might wish to search in your log content. Labels are only needed to organize and identify your log streams. Loki performs searches by iterating over a log stream in a highly parallelized fashion to look for a given string.
+
+For more information on how Loki performs searches, see the [Query section](https://grafana.com/docs/loki//query/).
+
+This means that you don't need to add labels for things inside the log message, such as:
+
+- log level
+- log message
+- exception name
+
+That being said, in some cases you may wish to add some extra labels, which can help to narrow down your log streams even further. When adding custom labels, follow these principles:
+
+- DO use fewer labels; aim for 10 to 15 labels at a maximum. Fewer labels means a smaller index, which leads to better performance.
+- DO be as specific with your labels as you can be; the less searching Loki has to do, the faster your results are returned.
+- DO create labels with long-lived values, not unbounded values. A good label has a stable set of values over time, even if there are a lot of them. If just one label value changes, this creates a new stream.
+- DO create labels based on terms that your users will actually be querying on.
+- DON'T create labels for very specific searches (for example, user ID or customer ID) or seldom-used searches (searches performed maybe once a year). See the query sketch after this list for how to handle such values without a label.
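+
+As a sketch of that last point, assuming the example labels from earlier in this section, a specific value such as a user ID is better handled with a line filter at query time than with a label (the `user_id` value here is purely illustrative):
+
+```nohighlight
+{namespace="mynamespace", cluster="cluster123"} |= "user_id=12345"
+```
+
+This keeps the stream count bounded by the low-cardinality labels, while the high-cardinality value is found by scanning the matching streams in parallel.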
+ +### Label format Loki places the same restrictions on label naming as [Prometheus](https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels): - It may contain ASCII letters and digits, as well as underscores and colons. It must match the regex `[a-zA-Z_:][a-zA-Z0-9_:]*`. -- The colons are reserved for user defined recording rules. They should not be used by exporters or direct instrumentation. - Unsupported characters in the label should be converted to an underscore. For example, the label `app.kubernetes.io/name` should be written as `app_kubernetes_io_name`. +- However, do not begin and end your label names with double underscores, as this naming convention is used for internal labels, for example, \__stream_shard__, that are hidden by default in the label browser, query builder, and autocomplete to avoid creating confusion for users. + +In Loki, you do not need to add labels based on the content of the log message. + +### Labels and ingestion order + +Loki supports ingesting out-of-order log entries. Out-of-order writes are enabled globally by default, but can be disabled/enabled on a cluster or per-tenant basis. If you plan to ingest out-of-order log entries, your label selection is important. We recommend trying to find a way to use labels to separate the streams so they can be ingested separately. + +Entries in a given log stream (identified by a given set of label names & values) must be ingested in order, within the default two hour time window. If you try to send entries that are too old for a given log stream, Loki will respond with the error too far behind. + +For systems with different ingestion delays and shipping, use labels to create separate streams. Instead of: + +`{environment="production"}` + +You may separate the log stream into: + +`{environment="production", app="slow_app"}` +`{environment="production", app="fast_app"}` +Now the "fast_app" and "slow_app" will ship logs to different streams, allowing each to maintain their order of ingestion. -## Loki labels demo +## Loki labels examples -This series of examples will illustrate basic use cases and concepts for labeling in Loki. +The way that labels are added to logs is configured in the client that you use to send logs to Loki. The specific configuration will be different for each client. -Let's take an example Promtail/Alloy config file: +### Alloy example + +Grafana Labs recommends using Grafana Alloy to send logs to Loki. Here is an example configuration: + +```alloy + +local.file_match "tmplogs" { + path_targets = [{"__path__" = "/tmp/alloy-logs/*.log"}] +} + +loki.source.file "local_files" { + targets = local.file_match.tmplogs.targets + forward_to = [loki.process.add_new_label.receiver] +} + +loki.process "add_new_label" { + // Extract the value of "level" from the log line and add it to the extracted map as "extracted_level" + // You could also use "level" = "", which would extract the value of "level" and add it to the extracted map as "level" + // but to make it explicit for this example, we will use a different name. + // + // The extracted map will be covered in more detail in the next section. 
+ stage.logfmt { + mapping = { + "extracted_level" = "level", + } + } + + // Add the value of "extracted_level" from the extracted map as a "level" label + stage.labels { + values = { + "level" = "extracted_level", + } + } + + forward_to = [loki.relabel.add_static_label.receiver] +} + +loki.relabel "add_static_label" { + forward_to = [loki.write.local_loki.receiver] + + rule { + target_label = "os" + replacement = constants.os + } +} + +loki.write "local_loki" { + endpoint { + url = "http://localhost:3100/loki/api/v1/push" + } +} +``` + +### Promtail example + +Here is an example of a Promtail configuration to send logs to Loki: ```yaml scrape_configs: - - job_name: system - pipeline_stages: - static_configs: - - targets: - - localhost - labels: - job: syslog - __path__: /var/log/syslog +- job_name: system + pipeline_stages: + static_configs: + - targets: + - localhost + labels: + job: syslog + __path__: /var/log/syslog ``` This config will tail one file and assign one label: `job=syslog`. This will create one stream in Loki. You could query it like this: -``` +```bash {job="syslog"} ``` @@ -63,29 +262,29 @@ Now let’s expand the example a little: ```yaml scrape_configs: - - job_name: system - pipeline_stages: - static_configs: - - targets: - - localhost - labels: - job: syslog - __path__: /var/log/syslog - - job_name: apache - pipeline_stages: - static_configs: - - targets: - - localhost - labels: - job: apache - __path__: /var/log/apache.log +- job_name: system + pipeline_stages: + static_configs: + - targets: + - localhost + labels: + job: syslog + __path__: /var/log/syslog +- job_name: apache + pipeline_stages: + static_configs: + - targets: + - localhost + labels: + job: apache + __path__: /var/log/apache.log ``` Now we are tailing two files. Each file gets just one label with one value, so Loki will now be storing two streams. We can query these streams in a few ways: -``` +```nohighlight {job="apache"} <- show me logs where the job label is apache {job="syslog"} <- show me logs where the job label is syslog {job=~"apache|syslog"} <- show me logs where the job is apache **OR** syslog @@ -95,29 +294,29 @@ In that last example, we used a regex label matcher to view log streams that use ```yaml scrape_configs: - - job_name: system - pipeline_stages: - static_configs: - - targets: - - localhost - labels: - job: syslog - env: dev - __path__: /var/log/syslog - - job_name: apache - pipeline_stages: - static_configs: - - targets: - - localhost - labels: - job: apache - env: dev - __path__: /var/log/apache.log +- job_name: system + pipeline_stages: + static_configs: + - targets: + - localhost + labels: + job: syslog + env: dev + __path__: /var/log/syslog +- job_name: apache + pipeline_stages: + static_configs: + - targets: + - localhost + labels: + job: apache + env: dev + __path__: /var/log/apache.log ``` Now instead of a regex, we could do this: -``` +```nohighlight {env="dev"} <- will return all logs with env=dev, in this case this includes both log streams ``` @@ -127,7 +326,7 @@ Labels are the index to Loki log data. They are used to find the compressed log For Loki to be efficient and cost-effective, we have to use labels responsibly. The next section will explore this in more detail. -## Cardinality +### Cardinality examples The two previous examples use statically defined labels with a single value; however, there are ways to dynamically define labels. 
Let's take a look using the Apache log and a massive regex you could use to parse such a log line:
@@ -137,19 +336,19 @@ The two previous examples use statically defined labels with a single value; how
```yaml
- job_name: system
-   pipeline_stages:
-      - regex:
-        expression: "^(?P<ip>\\S+) (?P<identd>\\S+) (?P<user>\\S+) \\[(?P<timestamp>[\\w:/]+\\s[+\\-]\\d{4})\\] \"(?P<action>\\S+)\\s?(?P<path>\\S+)?\\s?(?P<protocol>\\S+)?\" (?P<status_code>\\d{3}|-) (?P<size>\\d+|-)\\s?\"?(?P<referer>[^\"]*)\"?\\s?\"?(?P<useragent>[^\"]*)?\"?$"
-      - labels:
-          action:
-          status_code:
-   static_configs:
-      - targets:
-         - localhost
-        labels:
-          job: apache
-          env: dev
-          __path__: /var/log/apache.log
+  pipeline_stages:
+  - regex:
+      expression: "^(?P<ip>\\S+) (?P<identd>\\S+) (?P<user>\\S+) \\[(?P<timestamp>[\\w:/]+\\s[+\\-]\\d{4})\\] \"(?P<action>\\S+)\\s?(?P<path>\\S+)?\\s?(?P<protocol>\\S+)?\" (?P<status_code>\\d{3}|-) (?P<size>\\d+|-)\\s?\"?(?P<referer>[^\"]*)\"?\\s?\"?(?P<useragent>[^\"]*)?\"?$"
+  - labels:
+      action:
+      status_code:
+  static_configs:
+  - targets:
+      - localhost
+    labels:
+      job: apache
+      env: dev
+      __path__: /var/log/apache.log
```
This regex matches every component of the log line and extracts the value of each component into a capture group. Inside the pipeline code, this data is placed in a temporary data structure that allows use for several purposes during the processing of that log line (at which point that temp data is discarded). Much more detail about this can be found in the [Promtail pipelines]({{< relref "../../send-data/promtail/pipelines" >}}) documentation.
@@ -171,7 +370,7 @@ And now let's walk through a few example lines:
In Loki the following streams would be created:
-```
+```nohighlight
{job="apache",env="dev",action="GET",status_code="200"} 11.11.11.11 - frank [25/Jan/2000:14:00:01 -0500] "GET /1986.js HTTP/1.1" 200 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"
{job="apache",env="dev",action="POST",status_code="200"} 11.11.11.12 - frank [25/Jan/2000:14:00:02 -0500] "POST /1986.js HTTP/1.1" 200 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"
{job="apache",env="dev",action="GET",status_code="400"} 11.11.11.13 - frank [25/Jan/2000:14:00:03 -0500] "GET /1986.js HTTP/1.1" 400 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"
@@ -185,39 +384,3 @@ Any additional log lines that match those combinations of labels/values would be
Imagine now if you set a label for `ip`. Not only does every request from a user become a unique stream. Every request with a different action or status_code from the same user will get its own stream.
Doing some quick math, if there are maybe four common actions (GET, PUT, POST, DELETE) and maybe four common status codes (although there could be more than four!), this would be 16 streams and 16 separate chunks. Now multiply this by every user if we use a label for `ip`. You can quickly have thousands or tens of thousands of streams.
-
-This is high cardinality, and it can lead to significant performance degradation.
-
-When we talk about _cardinality_ we are referring to the combination of labels and values and the number of streams they create. High cardinality is using labels with a large range of possible values, such as `ip`, **or** combining many labels, even if they have a small and finite set of values, such as using `status_code` and `action`.
-
-High cardinality causes Loki to build a huge index and to flush thousands of tiny chunks to the object store. Loki currently performs very poorly in this configuration.
If not accounted for, high cardinality will significantly reduce the operability and cost-effectiveness of Loki. - -## Optimal Loki performance with parallelization - -Now you may be asking: If using too many labels—or using labels with too many values—is bad, then how am I supposed to query my logs? If none of the data is indexed, won't queries be really slow? - -As we see people using Loki who are accustomed to other index-heavy solutions, it seems like they feel obligated to define a lot of labels in order to query their logs effectively. After all, many other logging solutions are all about the index, and this is the common way of thinking. - -When using Loki, you may need to forget what you know and look to see how the problem can be solved differently with parallelization. Loki's superpower is breaking up queries into small pieces and dispatching them in parallel so that you can query huge amounts of log data in small amounts of time. - -This kind of brute force approach might not sound ideal, but let me explain why it is. - -Large indexes are complicated and expensive. Often a full-text index of your log data is the same size or bigger than the log data itself. To query your log data, you need this index loaded, and for performance, it should probably be in memory. This is difficult to scale, and as you ingest more logs, your index gets larger quickly. - -Now let's talk about Loki, where the index is typically an order of magnitude smaller than your ingested log volume. So if you are doing a good job of keeping your streams and stream churn to a minimum, the index grows very slowly compared to the ingested logs. - -Loki will effectively keep your static costs as low as possible (index size and memory requirements as well as static log storage) and make the query performance something you can control at runtime with horizontal scaling. - -To see how this works, let's look back at our example of querying your access log data for a specific IP address. We don't want to use a label to store the IP address. Instead we use a [filter expression]({{< relref "../../query/log_queries#line-filter-expression" >}}) to query for it: - -``` -{job="apache"} |= "11.11.11.11" -``` - -Behind the scenes, Loki will break up that query into smaller pieces (shards), and open up each chunk for the streams matched by the labels and start looking for this IP address. - -The size of those shards and the amount of parallelization are configurable and based on the resources you provision. If you want to, you can configure the shard interval down to 5m, deploy 20 queriers, and process gigabytes of logs in seconds. Or you can go crazy and provision 200 queriers and process terabytes of logs! - -This trade-off of smaller index and parallel brute force querying vs. a larger/faster full-text index is what allows Loki to save on costs versus other systems. The cost and complexity of operating a large index is high and is typically fixed -- you pay for it 24 hours a day if you are querying it or not. - -The benefits of this design mean you can make the decision about how much query power you want to have, and you can change that on demand. Query performance becomes a function of how much money you want to spend on it. Meanwhile, the data is heavily compressed and stored in low-cost object stores like S3 and GCS. This drives the fixed operating costs to a minimum while still allowing for incredibly fast query capability. 
diff --git a/docs/sources/get-started/labels/cardinality.md b/docs/sources/get-started/labels/cardinality.md
new file mode 100644
index 0000000000000..7987931c2d405
--- /dev/null
+++ b/docs/sources/get-started/labels/cardinality.md
@@ -0,0 +1,52 @@
+---
+title: Cardinality
+description: Describes what cardinality is and how it affects Loki performance.
+weight:
+---
+
+# Cardinality
+
+The cardinality of a data attribute is the number of distinct values that the attribute can have. For example, a boolean column in a database, which can only have a value of either `true` or `false`, has a cardinality of 2.
+
+High cardinality refers to a column or row in a database that can have many possible values. For an online shopping system, fields like `userId`, `shoppingCartId`, and `orderId` are often high-cardinality columns that can have hundreds of thousands of distinct values.
+
+Other examples of high-cardinality attributes include the following:
+
+- Timestamp
+- IP addresses
+- Kubernetes pod names
+- User ID
+- Customer ID
+- Trace ID
+
+When we talk about _cardinality_ in Loki, we are referring to the combination of labels and values and the number of log streams they create. Loki was not designed or built to support high cardinality label values. In fact, it was built for exactly the opposite: very long-lived streams and very low cardinality in the labels. In Loki, the fewer labels you use, the better. This is why Loki has a default limit of 15 index labels.
+
+High cardinality can result from using labels with a large range of possible values, **or** from combining many labels, even if they have a small and finite set of values, such as combining `status_code` and `action`. A typical set of status codes (200, 404, 500) and actions (GET, POST, PUT, PATCH, DELETE) would create 15 unique streams. But adding just one more label like `endpoint` (/cart, /products, /customers) would triple this to 45 unique streams.
+
+To see an example of series labels and cardinality, refer to the [LogCLI tutorial](https://grafana.com/docs/loki//query/logcli/logcli-tutorial/#checking-series-cardinality). As you can see, the cardinality for individual labels can be quite high, even before you begin combining labels for a particular log stream, which increases the cardinality even further.
+
+To view the cardinality of your current labels, you can use [logcli](https://grafana.com/docs/loki//query/logcli/getting-started/):
+
+`logcli series '{}' --since=1h --analyze-labels`
+
+## Impact of high cardinality in Loki
+
+High cardinality causes Loki to create many streams, especially when labels have many unique values, and when those values are short-lived (for example, active for seconds or minutes). This causes Loki to build a huge index, and to flush thousands of tiny chunks to the object store.
+
+As noted above, Loki was built for very long-lived streams with very low label cardinality, so high cardinality can lead to significant performance degradation.
+
+## Avoiding high cardinality
+
+To avoid high cardinality in Loki, you should:
+
+- Avoid assigning labels with unbounded values, for example timestamp, trace ID, order ID.
+- Prefer static labels that describe the origin or context of the log message, for example, application, namespace, environment.
+- Don't assign "dynamic" labels, which are values taken from the log message itself, unless the value is low cardinality or long-lived.
+- Use structured metadata to store frequently searched, high-cardinality metadata fields, such as customer IDs or transaction IDs, without impacting Loki's index.
+
+{{< admonition type="note" >}}
+[Structured metadata](https://grafana.com/docs/loki//get-started/labels/structured-metadata/) is a feature in Loki and Grafana Cloud Logs that lets you store metadata that is too high cardinality to be used as index labels, without needing to embed that information in the log line itself.
+It is a great home for metadata which is not easily embeddable in a log line, but is too high cardinality to be used effectively as a label. [Query acceleration with Blooms](https://grafana.com/docs/loki//operations/bloom-filters/) also utilizes structured metadata.
+{{< /admonition >}}
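+
+For example, a customer ID stored as structured metadata can still be filtered on at query time without ever becoming a label. Here is a sketch of such a query, assuming a `customer_id` structured metadata field and a `service_name` label (both names are illustrative):
+
+```nohighlight
+{service_name="checkout-service"} | customer_id="12345"
+```
+
+Because `customer_id` lives in structured metadata rather than in the index, this query does not create any additional streams, no matter how many distinct customer IDs you ingest.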