Active Deletion of Small Objects
Original pull request: https://github.com/basho/riak_cs/pull/1174
When object sizes tend to be small, objects can be deleted on the fly.
The tradeoff is between garbage collection efficiency (including how quickly disk space is reclaimed) and the latency of delete requests. Instead of moving manifests to the GC bucket, Riak CS skips garbage collection and deletes the blocks directly, which reclaims disk space more aggressively. This is especially useful when objects are small, for example less than one megabyte, and the garbage collector cannot keep up with the pace of object deletions.
To use this feature, enable it in riak-cs.conf like this:
active_delete_threshold = 1mb
A restart is required in this case. With this setting, objects whose content size is smaller than 1048576 bytes are deleted immediately when the DELETE Object or DELETE Multiple Objects API is requested.
Alternatively, if you do not want to stop the CS node, attach to the node with riak-cs attach and enter the following commands:
> Threshold = 1024*1024.
> application:set_env(riak_cs, active_delete_threshold, Threshold).
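To confirm that the new value is in place, the same attached shell can read it back. This is a minimal sketch using the standard `application:get_env/2` call; the printed results assume the threshold was set to `1024*1024` as above. The two comparisons only illustrate the "smaller than" rule described earlier and are not Riak CS calls.

```erlang
> application:get_env(riak_cs, active_delete_threshold).
{ok,1048576}
%% An object one byte under the threshold would be deleted on the fly.
> 1048575 < 1048576.
true
%% An object exactly at the threshold would go through normal GC.
> 1048576 < 1048576.
false
```

A value set with `application:set_env/3` lives only in the running node. It is not written to riak-cs.conf, so it is lost on the next restart unless the file is updated as well.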
Concurrent reads may be disrupted by block deletion, especially if the threshold is large enough to cover objects that span multiple blocks, resulting in stalled reads or unexpected connection closes. This is because the leeway period is not considered and blocks are deleted immediately.
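To judge whether a given threshold reaches multi-block objects, it can help to count how many blocks an object of a given size occupies. The sketch below is illustrative only: the module and function names are hypothetical, and the 1048576-byte block size used in the usage example is an assumption about the configured block size, so check the value on your own cluster.

```erlang
-module(block_count_sketch).
-export([blocks/2]).

%% Hypothetical helper, not part of Riak CS.
%% Returns the number of blocks needed to hold ContentLength bytes
%% when each block holds at most BlockSize bytes (rounding up).
-spec blocks(non_neg_integer(), pos_integer()) -> non_neg_integer().
blocks(ContentLength, BlockSize)
  when is_integer(ContentLength), ContentLength >= 0,
       is_integer(BlockSize), BlockSize > 0 ->
    (ContentLength + BlockSize - 1) div BlockSize.
```

Under that assumed block size, `block_count_sketch:blocks(500000, 1048576)` returns 1, while `block_count_sketch:blocks(5 * 1048576 - 1, 1048576)` returns 5. A threshold at or below the block size only ever selects single-block objects; a larger one also selects objects whose blocks are deleted one after another while a reader may still be fetching them.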
If active block deletion is enabled in a replication-enabled cluster:
- Make sure block tombstones are being replicated in realtime, that is, the Riak advanced.config does not contain the line {replicate_cs_block_tombstone, false}.
- If block tombstones are dropped at the RTQ, blocks may leak on the sink side. This is because the corresponding object manifest is erased and that change is replicated to the sink cluster.
- To handle old manifests that reside on fallback nodes and are returned by handoff, manifests are kept in history so that they are not resurrected.
Riak CS has an (un)official toolkit for finding inconsistent blocks and manifests. Refer to its documentation for usage and further information.
From riak_cs_gc.erl:
%% We do synchronous delete after it is marked
%% pending_delete, to reduce the possibility where
%% concurrent requests find active manifest (UUID) and
%% go find deleted blocks resulting notfound stuff.
%% However, there are still corner cases where
%% concurrent requests interleaves between marking
%% pending_delete here and deleting blocks, like:
%%
%% 1. Request A refers to a manifest finding active UUID x
%% 2. Request B deletes an object marking active UUID x as pending_delete
%% 3. Request B deletes blocks of UUID x according to this synchronous delete -> ok
%% 4. Request A refers to blocks pointed by UUID x -> notfound
%%
%% Manifests with blocks deleted here, have
%% `scheduled_delete' state here. They won't be
%% collected by garbage collector, as they are not
%% stored in GC bucket. Instead they will be collected
%% in `riak_cs_manifest_utils:prune/1' invoked via GET
%% object, after leeway period has passed.
maybe_delete_small_objects(PDManifests0, RcPid, Threshold);
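The split that the excerpt above implies, between manifests small enough for synchronous deletion and those left to the garbage collector, can be sketched as follows. This is not the actual body of `maybe_delete_small_objects/3`: the module name, the function name, and the `{UUID, ContentLength}` tuples are hypothetical stand-ins for the real manifest records, and the strict `<` comparison follows the "smaller than" wording used earlier.

```erlang
-module(small_object_sketch).
-export([split_by_threshold/2]).

%% Manifests are modelled as {UUID, ContentLength} tuples for illustration;
%% real Riak CS manifests are records with many more fields.
-type manifest() :: {binary(), non_neg_integer()}.

-spec split_by_threshold([manifest()], non_neg_integer()) ->
          {[manifest()], [manifest()]}.
split_by_threshold(Manifests, Threshold) ->
    %% First list: small objects whose blocks would be deleted synchronously.
    %% Second list: everything else, which would go to the GC bucket as usual.
    lists:partition(fun({_UUID, ContentLength}) ->
                            ContentLength < Threshold
                    end, Manifests).
```

For example, `small_object_sketch:split_by_threshold([{<<"a">>, 512}, {<<"b">>, 2097152}], 1048576)` returns `{[{<<"a">>,512}], [{<<"b">>,2097152}]}`. With a threshold of 0 no object qualifies, so every manifest ends up in the second list.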