#531 Add the info about the new feature to README.
yruslan committed Jan 8, 2025
1 parent 1f50835 commit b031c17
Showing 1 changed file with 47 additions and 1 deletion.
48 changes: 47 additions & 1 deletion README.md
@@ -237,13 +237,59 @@ Let's take a look on components of a data pipeline in more detail.

## Pipeline components

A pipeline consists of _sources_, _the metastore_ and _sinks_.
A pipeline consists of _common options_, _sources_, _the metastore_, _sinks_, and _operations_. Together, these
definitions form the workflow config. For large pipelines, these definitions can be split across multiple files. Check out
the `examples/` folder for example workflow definitions. Let's take a look at each section of a workflow separately.
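
As a minimal sketch of splitting definitions across files, assuming the workflow is parsed by a standard HOCON loader that honors `include` directives: the file names below (`common.conf`, `sources.conf`, `sinks.conf`) are hypothetical and only illustrate the idea.

```hocon
# workflow.conf (hypothetical top-level file)
# Each included file holds one group of definitions.
include "common.conf"
include "sources.conf"
include "sinks.conf"

# Values set after the includes override values from the included files.
pramen.pipeline.name = "CDC PoC"
```

With this layout, shared definitions live in one place and each pipeline only overrides what differs.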

Currently there are three types of jobs:
- _Ingestion_ jobs to get data from external sources into the metastore.
- _Transformation_ jobs to transform data inside the metastore.
- _Sink_ jobs to send data from the metastore to external systems.

### Common options
A Pramen pipeline should have several options defined. Here is the minimum configuration. For the list of all options
and their default values, check out [reference.conf](pramen/core/src/main/resources/reference.conf).

```hocon
pramen {
  environment.name = "AWS Glue (DEV)"
  pipeline.name = "CDC PoC"
  bookkeeping.enabled = true
  bookkeeping.jdbc {
    driver = "org.postgresql.Driver"
    url = "jdbc:postgresql://myhost:5432/pramen_database"
    user = "postgresql_user"
    password = "password"
  }
  temporary.directory = "s3://bucket/prefix/tmp/"
}
```

#### Email notifications
One section of the config defines options for email notifications. You can define the following options:
```hocon
mail {
  # SMTP configuration
  # Any options from https://javaee.github.io/javamail/docs/api/com/sun/mail/smtp/package-summary.html
  smtp.host = "smtp.example.com"
  smtp.port = "25"
  smtp.auth = "false"
  smtp.starttls.enable = "false"
  smtp.EnableSSL.enable = "false"
  debug = "false"
  # A custom email sender (optional)
  send.from = "Pramen <[email protected]>"
  # Email recipients
  send.to = "[email protected], [email protected]"
  # A list of allowed domains (optional)
  allowed.domains = [ "example.com", "test.com" ]
}
```

### Dates
Before diving into pipeline definitions, it is important to understand how dates are handled. Pramen is a batch data
pipeline manager for input data updates coming from applications which are usually referred to as _source systems_. Pramen is designed