v3_tuto_check
This tutorial explains how to set up a checksum policy that regularly checks the contents of files and can detect data corruption.
The principle of this policy is to compute the checksum of each file and store it in the robinhood database. The next time a file is checked, if it doesn't seem to have changed (same mtime, same size), it should have the same checksum. If it doesn't, the file may be corrupted, and its checksum status is marked as 'failed' in the robinhood database.
You can easily get a summary of the checksum status of filesystem entries, and list the entries for each status.
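To make the verification principle concrete, here is a minimal Python sketch of the logic described above. It is not robinhood's implementation: the 'ChecksumRecord' class, the in-memory dictionary standing in for the robinhood database, the 'check_file' and 'status_summary' helpers, and the choice of SHA-1 are all hypothetical assumptions for illustration.

```python
import hashlib
import os
from collections import Counter
from dataclasses import dataclass


@dataclass
class ChecksumRecord:
    """Hypothetical per-entry record standing in for the robinhood DB fields."""
    mtime_ns: int
    size: int
    checksum: str
    status: str  # "ok" or "failed"


def file_checksum(path, chunk_size=1 << 20):
    """Hash the file contents in chunks (SHA-1 is an arbitrary choice here)."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_file(path, db):
    """Compute the checksum of `path` and compare it with the stored one.

    `db` is a plain dict (hypothetical stand-in for the robinhood database).
    """
    st = os.stat(path)
    current = file_checksum(path)
    prev = db.get(path)
    unchanged = (prev is not None
                 and prev.mtime_ns == st.st_mtime_ns
                 and prev.size == st.st_size)
    if unchanged and current != prev.checksum:
        # Same mtime and size but different contents: possible corruption.
        # Keep the reference checksum so the entry stays flagged.
        db[path] = ChecksumRecord(st.st_mtime_ns, st.st_size, prev.checksum, "failed")
    else:
        # First check, unchanged and matching, or legitimately modified:
        # record the current checksum as the new reference.
        db[path] = ChecksumRecord(st.st_mtime_ns, st.st_size, current, "ok")
    return db[path].status


def status_summary(db):
    """Count entries per checksum status, like a per-status report."""
    return Counter(rec.status for rec in db.values())
```

Keeping the reference checksum after a mismatch is a design choice of this sketch: it keeps the entry reported as 'failed' on later checks until the file is legitimately modified.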
- Include 'check.inc' in your config file to define the 'checksum' policy:
%include "includes/check.inc"
- Specify your policy targets as fileclasses. Fileclass definitions must be based on relatively static criteria (e.g. owner, path, group). They should NOT be based on time criteria (age of last access, last modification, etc.): time-based criteria belong in the policy rules.
fileclass important_files {
    definition {
        type == file and name == "*.data" and tree == "/path/to/important_data"
    }
}
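As an illustration of how such a static definition selects entries, the Python sketch below mimics the 'important_files' criteria with 'fnmatch' and a path-prefix check. The 'is_important_file' and 'in_tree' helpers are hypothetical, and robinhood's own matching of 'name' and 'tree' patterns may differ in details.

```python
import fnmatch
import os
import stat


def in_tree(path, root):
    """True if `path` is `root` itself or lies somewhere below it."""
    path, root = os.path.normpath(path), os.path.normpath(root)
    return path == root or path.startswith(root + os.sep)


def is_important_file(path, mode):
    """Hypothetical equivalent of the important_files definition."""
    return (stat.S_ISREG(mode)                                           # type == file
            and fnmatch.fnmatchcase(os.path.basename(path), "*.data")    # name == "*.data"
            and in_tree(path, "/path/to/important_data"))                # tree == "..."
```

None of these tests depends on the current time, which is what makes the criteria suitable for a fileclass.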
- Define the following fileclass, which will help you define your policy rules:
fileclass never_checked {
    # never checked => last_check == 0.
    # 'output' stands for previous command stdout
    definition { checksum.last_check == 0 or checksum.output == "" }
    # don't display this fileclass in --classinfo reports.
    report = no;
}
- Then specify the checksum rules. In the following example, the initial checksum is computed once a file has been unmodified for 1 day, and entries are then rechecked weekly:
checksum_rules {
    # simple filters to optimize policy run
    ignore { last_check < 1d }
    ignore { last_mod < 1d }
    rule initial_check {
        target_fileclass = never_checked;
        condition { last_mod > 1d }
    }
    rule default {
        condition { last_mod > 1d and last_check > 7d }
    }
}
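The Python sketch below illustrates how the 'never_checked' fileclass and the rules above could combine for a single entry. It assumes that 'last_mod' and 'last_check' in rule conditions denote the time elapsed since the last modification and the last check, that the ignore filters are evaluated first, and that an entry matching 'never_checked' is governed only by 'initial_check'. The 'Entry' class and 'matching_rule' function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

DAY = 24 * 3600  # seconds


@dataclass
class Entry:
    """Hypothetical view of the attributes the rules look at."""
    last_mod: float    # epoch time of last modification
    last_check: float  # epoch time of last checksum, 0 if never checked
    output: str = ""   # stdout of the previous checksum command


def is_never_checked(e: Entry) -> bool:
    # fileclass never_checked: checksum.last_check == 0 or checksum.output == ""
    return e.last_check == 0 or e.output == ""


def matching_rule(e: Entry, now: float) -> Optional[str]:
    """Name of the rule that selects the entry for checking, or None."""
    mod_age = now - e.last_mod
    check_age = now - e.last_check
    # ignore { last_check < 1d } and ignore { last_mod < 1d }
    if check_age < DAY or mod_age < DAY:
        return None
    if is_never_checked(e):
        # rule initial_check: condition { last_mod > 1d }
        return "initial_check" if mod_age > DAY else None
    # rule default: condition { last_mod > 1d and last_check > 7d }
    return "default" if mod_age > DAY and check_age > 7 * DAY else None
```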
If you plan to automatically trigger the cleanup policy (regularly, or when the filesystem is full) you need to define policy triggers.
You can trigger the cleanup policy in the following cases:
- When the filesystem is over a high threshold (in terms of space used or total number of entries). This is done by a global_usage trigger. The cleanup policy stops when the specified low threshold is reached.
cleanup_trigger {
    trigger_on = global_usage;
    high_threshold_pct = 80%;
    low_threshold_pct = 75%;
    check_interval = 15min;
}
- In the case of Lustre, when the usage of a given OST is over a threshold. In this case, the cleanup policy only applies to the entries on this OST. This is achieved by defining an ost_usage trigger. The cleanup policy stops when the specified low threshold is reached for the OST.
cleanup_trigger {
    trigger_on = ost_usage;
    high_threshold_pct = 80%;
    low_threshold_pct = 75%;
    check_interval = 15min;
}
- When a user or group exceeds a given usage threshold (in volume or number of entries). This is done by user_usage and group_usage triggers. In this case, the policy only applies to the entries of that user or group. The cleanup policy stops when the specified low threshold is reached for the user or group. This trigger can optionally be restricted to a given set of groups (see example below).
cleanup_trigger {
    trigger_on = group_usage(foo*,project01*);
    high_threshold_vol = 105TB;
    low_threshold_vol = 100TB;
    check_interval = 1d;
}
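The high/low thresholds of these triggers act as hysteresis: a run starts above the high threshold and stops at the low one. The Python sketch below shows that arithmetic under simplifying assumptions: usage figures are plain numbers in consistent units, and 'volume_to_release', 'osts_to_purge' and 'groups_over_threshold' are hypothetical helpers, not robinhood functions. Group patterns are matched with shell-style wildcards, as 'foo*' and 'project01*' suggest.

```python
import fnmatch


def volume_to_release(used, total, high_pct, low_pct):
    """Amount to purge to get back to low_pct, or 0 if usage is not above high_pct."""
    if used * 100 <= high_pct * total:  # integer math avoids float rounding
        return 0
    return used - total * low_pct // 100


def osts_to_purge(ost_usage, high_pct, low_pct):
    """ost_usage maps OST index -> (used, total); returns only OSTs over the threshold."""
    result = {}
    for ost, (used, total) in ost_usage.items():
        amount = volume_to_release(used, total, high_pct, low_pct)
        if amount > 0:
            result[ost] = amount
    return result


def groups_over_threshold(usage_by_group, patterns, high_vol, low_vol):
    """Groups matching one of `patterns` whose usage exceeds high_vol, with the amount to purge."""
    result = {}
    for group, used in usage_by_group.items():
        if any(fnmatch.fnmatchcase(group, p) for p in patterns) and used > high_vol:
            result[group] = used - low_vol
    return result
```

In this sketch, with the 80%/75% settings, a filesystem at 85% usage yields a purge amount that brings it back to 75%, while one at exactly 80% yields nothing.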
- Run a daemon that regularly checks the triggers and applies the policies when necessary:
- Check cleanup triggers and apply the policy if needed. The program exits after checking all policy triggers.
- Match policy rules for all entries. The program exits when the policy run is complete.
- Match policy rules for a subset of entries:
  - Example 1: apply the policy to user 'foo':
  - Example 2: apply the policy to fileclass 'small':
- Limit the volume or number of deleted entries:
  - Example 1: delete 100TB of data
  - Example 2: delete 1000 files in OST 23
  - Example 3: delete entries until the filesystem usage drops to 80%
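The limits in the last examples (a volume, a number of entries, or a target usage) all act as stopping conditions on a policy run. Below is a minimal Python sketch of that idea, assuming candidate entries arrive already sorted in the order they should be processed; 'select_for_deletion' is a hypothetical helper and does not reproduce robinhood's scheduling.

```python
def select_for_deletion(candidates, used, total,
                        max_vol=None, max_count=None, target_usage_pct=None):
    """Pick entries from `candidates` [(path, size), ...] until a limit is hit.

    Returns (selected_paths, freed_volume). Limits left to None are not applied.
    """
    selected, freed = [], 0
    for path, size in candidates:
        if max_count is not None and len(selected) >= max_count:
            break  # e.g. "delete 1000 files"
        if max_vol is not None and freed >= max_vol:
            break  # e.g. "delete 100TB of data"
        if target_usage_pct is not None and (used - freed) * 100 <= target_usage_pct * total:
            break  # e.g. "until the filesystem usage drops to 80%"
        selected.append(path)
        freed += size
    return selected, freed
```

Because limits are checked before each entry, the volume limit can be overshot by the size of the last selected entry.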
To make the robinhood daemon run the cleanup policy when you start the robinhood service:
- Edit /etc/sysconfig/robinhood (or /etc/sysconfig/robinhood.<fsname> on RHEL7)
- Append --run=cleanup to RBH_OPT (if the --run option is already present, add it to the list of run arguments, e.g. --run=policy1,cleanup).
- Start (or restart) the robinhood service:
  - On RHEL6: service robinhood restart
  - On RHEL7: systemctl restart robinhood[@<fsname>]
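Merging 'cleanup' into an existing --run option is easy to get wrong by hand. The Python sketch below shows the intended edit on the RBH_OPT value only; 'add_run_policy' is a hypothetical helper, it only handles the '--run=a,b' form, and it would mishandle policy arguments that themselves contain commas.

```python
import shlex


def add_run_policy(rbh_opt, policy):
    """Return RBH_OPT with `policy` added to --run=..., adding the option if absent."""
    args = shlex.split(rbh_opt)
    for i, arg in enumerate(args):
        if arg.startswith("--run="):
            policies = arg[len("--run="):].split(",")
            if policy not in policies:  # avoid listing the same policy twice
                policies.append(policy)
            args[i] = "--run=" + ",".join(policies)
            break
    else:
        # No --run option yet: append one.
        args.append("--run=" + policy)
    return shlex.join(args)  # requires Python 3.8+
```

For example, a value of --run=policy1 becomes --run=policy1,cleanup, and a value that already lists cleanup is left unchanged.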