Incorporate end user documentation (#50)
* incorporate end-user documentation sources

carueda authored Dec 17, 2024
1 parent 25cfb71 commit ffe712f
Showing 15 changed files with 689 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,4 +1,5 @@
.env
pbp-doc/site/
output*/
cloud_tmp*/
NRS11/
5 changes: 5 additions & 0 deletions .mbaridoc.json
@@ -0,0 +1,5 @@
{
"destination": "pbp",
"docdir": "pbp-doc",
"public": true
}
23 changes: 23 additions & 0 deletions pbp-doc/README.md
@@ -0,0 +1,23 @@
# README

This directory contains the sources for documenting the use of
[`mbari-org/pbp`](https://pypi.org/project/mbari-pbp/).

Merging changes in this directory into the main branch of the remote repo
will automatically trigger an update of the generated site at
<https://docs.mbari.org/pbp/>.

## Local doc development

The following commands assume `pbp-doc` is the current directory.

One-off setup:
```bash
just setup
```

Then:
```bash
just serve
```
and open the indicated URL in your browser.
33 changes: 33 additions & 0 deletions pbp-doc/docs/extra/extra.css
@@ -0,0 +1,33 @@
@import url(https://fonts.googleapis.com/css?family=Merriweather:400,300);
@import url(https://docs.mbari.org/css/iosevka-custom/iosevka-custom.css);

body {
font-family: 'Merriweather', serif;
font-weight: 300;
}

code, tt {
font-family: 'Iosevka', 'Roboto Mono', monospace;
font-weight: 400;
font-variant-ligatures: none;
}

[data-md-color-scheme=slate] {
/* more legible links: */
--md-typeset-a-color: #a6c1f1;
}

.md-content {
/* so the content expands a bit, mainly for `program --help` outputs */
min-width: unset;
}

/* restrict block width to that of container */
.md-typeset pre code {
max-width: 100%;
display: inline-block;
white-space: pre-wrap;
overflow-x: scroll;
word-wrap: break-word;
padding: 1rem;
}
12 changes: 12 additions & 0 deletions pbp-doc/docs/extra/refresh_on_toggle_dark_light.js
@@ -0,0 +1,12 @@
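// Reload the page whenever either dark/light palette toggle changes,
// so elements styled for the previous color scheme are re-rendered.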
var paletteSwitcher1 = document.getElementById("__palette_1");
var paletteSwitcher2 = document.getElementById("__palette_2");

paletteSwitcher1.addEventListener("change", function () {
console.debug('change paletteSwitcher1=', paletteSwitcher1)
location.reload();
});

paletteSwitcher2.addEventListener("change", function () {
console.debug('change paletteSwitcher2=', paletteSwitcher2)
location.reload();
});
Binary file added pbp-doc/docs/img/NRS11_20200101.jpg
93 changes: 93 additions & 0 deletions pbp-doc/docs/index.md
@@ -0,0 +1,93 @@
---
description: Process ocean audio data archives to daily analysis products of hybrid millidecade spectra using PyPAM.
---

!!! note "WIP"
Thanks for your interest in PBP. This documentation is still a work in progress :construction:.
Please get in touch if you have any questions or suggestions.

# MBARI PBP

The [`mbari-pbp`](https://pypi.org/project/mbari-pbp/) package allows you to
process ocean audio data archives to daily analysis products of hybrid millidecade spectra using
[PyPAM](https://github.com/lifewatch/pypam/).

You can use PBP by directly running the included CLI programs,
or as a dependency in your own Python code.

**Features**:

- [x] Audio metadata extraction for managed timekeeping
    - [x] Start and duration of recognized WAV and FLAC sound files, either locally or in the cloud (JSON)
    - [x] Coverage plot of sound recordings
- [x] Audio file processing
    - [x] Frequency and PSD array output
    - [x] Concatenation of processed 1-minute segments for daily product
    - [x] Calibration with a given sensitivity file (NetCDF), or flat sensitivity value
- [x] Data products
    - [x] NetCDF with metadata
    - [x] Summary plot
- [x] Cloud processing
    - [x] Inputs can be downloaded from and uploaded to S3
    - [x] Inputs can be downloaded from public GCS bucket
    - [ ] Outputs can be uploaded to GCS

## Installation

The only requirement in your environment is Python 3.9, 3.10, or 3.11.[^1]
Make sure your Python installation includes the `pip` and `venv` modules,
or install them separately as needed.

You can run `python3 --version` to check the version of Python installed.
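
For example, the following is a quick sanity check that a suitable interpreter and the required modules are available (a minimal sketch; replace `python3` with `python3.9`, `python3.10`, or `python3.11` as appropriate for your system):
```shell
python3 --version          # should report 3.9.x, 3.10.x, or 3.11.x
python3 -m pip --version   # confirms the pip module is available
python3 -m venv --help     # confirms the venv module is available
```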

[^1]: As currently [required by PyPAM](https://github.com/lifewatch/pypam/blob/29e82f0c5c6ce43b457d76963cb9d82392740654/pyproject.toml#L16).

As a general practice, it is recommended to use a virtual environment for the installation.
```shell
python3.9 -m venv virtenv
source virtenv/bin/activate
```

Install the package:
```shell
pip install mbari-pbp
```

!!! note ""
If you are upgrading from a previous version, you can use the following command:
```shell
pip install --upgrade mbari-pbp
```

## Advanced Installation

If you want to install the package from source, even if you have already installed it with the `pip install mbari-pbp` command,
you can do so with the following command. This will get the latest version :construction: from the main branch.

```shell
pip install --no-cache-dir --force-reinstall git+https://github.com/mbari-org/pbp.git
```
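
After either installation method, you can optionally confirm what got installed. This is a minimal check, assuming the environment where you installed the package is active:
```shell
pip show mbari-pbp        # reports the installed version and location
pbp-hmb-gen --version     # the CLI programs also report the package version
```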

## Programs

The package includes the following CLI programs:

| Program | Description |
|---------------------------------|------------------------------------------------|
| [`pbp-meta-gen`](pbp-meta-gen/) | Generate JSON files with audio metadata. |
| [`pbp-hmb-gen`](pbp-hmb-gen/) | Main HMB generation program. |
| [`pbp-cloud`](pbp-cloud/)       | Program for cloud-based processing.                 |
| [`pbp-hmb-plot`](pbp-hmb-plot/) | Utility program to plot the resulting HMB product.  |


## References

- PyPAM - Python tool for Passive Acoustic Monitoring –
<https://doi.org/10.5281/zenodo.6044593>
- Computation of single-sided mean-square sound pressure spectral density with 1 Hz resolution follows
ISO 18405 3.1.3.13 (International Standard ISO 18405:2017(E), Underwater Acoustics – Terminology. Geneva: ISO)
  <https://www.iso.org/standard/62406.html>
- Hybrid millidecade spectra: A practical format for exchange of long-term ambient sound data –
<https://asa.scitation.org/doi/10.1121/10.0003324>
- Erratum: Hybrid millidecade spectra –
<https://asa.scitation.org/doi/10.1121/10.0005818>
6 changes: 6 additions & 0 deletions pbp-doc/docs/notebooks/index.md
@@ -0,0 +1,6 @@
!!! note
This is a placeholder for documenting the use of PBP in notebooks.

# Notebooks

- [PBP-NRS11-batch.ipynb](https://colab.research.google.com/drive/1RaFVZzdRt88gY1SR_J34XMdRLgBjEdI-)
77 changes: 77 additions & 0 deletions pbp-doc/docs/pbp-cloud/index.md
@@ -0,0 +1,77 @@
!!! note
This is a placeholder for the documentation of the `pbp-cloud` command-line program.

# Processing in the cloud


## The `pbp-cloud` program
TODO: proper description of the `pbp-cloud` program.

For now, the following is directly adapted from the source code:

----

TODO: Adjustments for GCS, as the program is still only focused on S3.


By cloud-based processing we mean the ability
to get input files (JSON and WAV) from S3 and write output files to S3.

All program parameters are to be passed via environment variables (a usage sketch follows the lists below):

- `DATE`: (Required)
The date to process. Format: "YYYYMMDD".
- `S3_JSON_BUCKET_PREFIX`: (Optional)
Bucket prefix to be used to locate the YYYYMMDD.json file.
By default, `s3://pacific-sound-metadata/256khz`.
- `S3_OUTPUT_BUCKET`: (Optional)
The bucket to write the generated output to.
Typically, this is to be provided, but it is optional to facilitate testing.
- `OUTPUT_PREFIX`: (Optional)
Output filename prefix. By default, `milli_psd_`.
The resulting file will be named as `<OUTPUT_PREFIX><DATE>.nc`.
- `GLOBAL_ATTRS_URI`: (Optional)
URI of JSON file with global attributes to be added to the NetCDF file.
- `VARIABLE_ATTRS_URI`: (Optional)
URI of JSON file with attributes to associate with the variables in the NetCDF file.
- `VOLTAGE_MULTIPLIER`: (Optional)
Applied on the loaded signal.
- `SENSITIVITY_NETCDF_URI`: (Optional)
URI of sensitivity NetCDF file that should be used to calibrate the result.
- `SENSITIVITY_FLAT_VALUE`: (Optional)
Flat sensitivity value to be used for calibration
if `SENSITIVITY_NETCDF_URI` is not given.
- `SUBSET_TO`: (Required) Format: `lower,upper`.
Subset the resulting PSD to `[lower, upper)`, in terms of central frequency.

TODO: retrieve sensitivity information using PyHydrophone when none
of the `SENSITIVITY_*` environment variables above are given.

Mainly for testing purposes, the following environment variables are also inspected:

- `CLOUD_TMP_DIR`: (Optional)
Local workspace for downloads and for generated files to be uploaded.
By default, `cloud_tmp`.

- `MAX_SEGMENTS`: (Optional)
0, the default, means no restriction, that is, all segments for each day
will be processed.

- `ASSUME_DOWNLOADED_FILES`: (Optional)
If "yes", then if any destination file for a download exists,
it is assumed downloaded already.
The default is that downloads are always performed.

- `RETAIN_DOWNLOADED_FILES`: (Optional)
If "yes", do not remove any downloaded files after use.
The default is that any downloaded file is removed after use.
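
Putting the above together, the following is a minimal sketch of a `pbp-cloud` invocation. The required variables come directly from the lists above; the bucket name and the numeric values are placeholders for illustration, not program defaults (except where noted):
```shell
# Required
export DATE=20220902                       # date to process, format YYYYMMDD
export SUBSET_TO=10,24000                  # subset PSD to [lower, upper) central frequency

# Optional (placeholder values)
export S3_OUTPUT_BUCKET=my-output-bucket   # hypothetical bucket; omit while testing
export OUTPUT_PREFIX=milli_psd_            # program default, shown for clarity
export CLOUD_TMP_DIR=cloud_tmp             # program default local workspace
export MAX_SEGMENTS=5                      # limit segments while testing; 0 = no limit

pbp-cloud
```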


## Running on AWS

TODO: Describe how to run the program on AWS.

## Running on GCP

TODO: Describe how to run the program on GCP.
82 changes: 82 additions & 0 deletions pbp-doc/docs/pbp-hmb-gen/index.md
@@ -0,0 +1,82 @@
!!! danger "WIP"

# HMB Generation

`pbp-hmb-gen` is the main program for generating the HMB product.
It processes ocean audio data archives to daily analysis products of hybrid millidecade spectra using PyPAM.

The program accepts several options.
A typical use mainly involves the following:

| Option | To indicate |
| ----------------- |--------------- |
| `--json-base-dir` | base directory for JSON files |
| `--date` | date to be processed |
| `--global-attrs` | URI of a YAML file with global attributes to be added to the NetCDF file |
| `--variable-attrs`| URI of a YAML file with attributes to associate with the variables in the NetCDF file |
| `--output-dir` | output directory |
| `--output-prefix` | output filename prefix |
| `--subset-to` | subset of the resulting PSD in terms of central frequency |

Also, the following options may be needed, depending on the recorder (a combined example is sketched after this table):

| Option | To indicate |
| ------------------------ |--------------- |
| `--voltage-multiplier` | applied on the loaded signal |
| `--sensitivity-uri` | URI of sensitivity NetCDF for calibration of result |
| `--sensitivity-flat-value`| flat sensitivity value to be used for calibration |
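
For instance, a fuller invocation combining the options above might look like the following sketch; the attribute files, sensitivity file, and numeric values are placeholders for illustration, not values from an actual deployment:
```shell
pbp-hmb-gen \
  --json-base-dir=tests/json \
  --audio-base-dir=tests/wav \
  --date=20220902 \
  --global-attrs=metadata/globalAttributes.yaml \
  --variable-attrs=metadata/variableAttributes.yaml \
  --voltage-multiplier=3 \
  --sensitivity-uri=metadata/sensitivity.nc \
  --subset-to 10 24000 \
  --output-dir=output \
  --output-prefix=milli_psd_
```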


## Usage

```shell
$ pbp-hmb-gen --help
```
```text
usage: pbp-hmb-gen [-h] [--version] --json-base-dir dir [--audio-base-dir dir] [--global-attrs uri] [--set-global-attr key value] [--variable-attrs uri]
[--audio-path-map-prefix from~to] [--audio-path-prefix dir] --date YYYYMMDD [--voltage-multiplier value] [--sensitivity-uri file]
[--sensitivity-flat-value value] --output-dir dir [--output-prefix prefix] [--s3] [--s3-unsigned] [--gs] [--download-dir dir] [--assume-downloaded-files]
[--retain-downloaded-files] [--max-segments num] [--subset-to lower upper]
Process ocean audio data archives to daily analysis products of hybrid millidecade spectra using PyPAM.
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--json-base-dir dir JSON base directory
--audio-base-dir dir Audio base directory. By default, none
--global-attrs uri URI of JSON file with global attributes to be added to the NetCDF file.
--set-global-attr key value
Replace {{key}} with the given value for every occurrence of {{key}} in the global attrs file.
--variable-attrs uri URI of JSON file with attributes to associate to the variables in the NetCDF file.
--audio-path-map-prefix from~to
Prefix mapping to get actual audio uri to be used. Example: 's3://pacific-sound-256khz-2022~file:///PAM_Archive/2022'.
--audio-path-prefix dir
Ad hoc path prefix for sound file location, for example, /Volumes. By default, no prefix applied.
--date YYYYMMDD The date to be processed.
--voltage-multiplier value
Applied on the loaded signal.
--sensitivity-uri file
URI of sensitivity NetCDF for calibration of result. Has precedence over --sensitivity-flat-value.
--sensitivity-flat-value value
Flat sensitivity value to be used for calibration.
--output-dir dir Output directory
--output-prefix prefix
Output filename prefix
--s3 s3 access is involved, possibly with required credentials.
--s3-unsigned s3 access is involved, not requiring credentials.
--download-dir dir Directory for any downloads (e.g., when s3 or gs is involved).
--assume-downloaded-files
If any destination file for a download exists, assume it was downloaded already.
--retain-downloaded-files
Do not remove any downloaded files after use.
--max-segments num Test convenience: limit number of segments to process. By default, 0 (no limit).
--subset-to lower upper
Subset the resulting PSD to [lower, upper), in terms of central frequency.
Examples:
pbp-hmb-gen --json-base-dir=tests/json \
--audio-base-dir=tests/wav \
--date=20220902 \
--output-dir=output
```
