Julien Capul edited this page Sep 3, 2019 · 16 revisions

Welcome to the pdwfs wiki!

Ideas

  • implement the "-at" family of libc I/O calls (unlinkat, newfstatat, etc.) so that an entire directory can be removed (i.e. make "rm -r directory" work),
  • refactor the inode layer and directory handling in pdwfs to remove the metadata bottleneck caused by registering all inodes of a directory under a single key on a single Redis instance; for instance, each Redis instance could maintain the directory metadata for the stripes it holds; alternatively, directory tracking could be disabled by an option when a workflow does not need it (i.e. performs no directory operations such as "rm -r directory"),
  • implement different data distribution strategies among the Redis instances for better locality awareness: per stripe, per file, per producer,
  • the current data distribution strategy hashes the stripe key, which leads to some imbalance in the distribution; a better balance could be achieved by taking the stripe number modulo the number of instances,
  • writers for asynchronous I/O (software burst buffer): processes running next to the Redis instances that write new file data stored in Redis to the filesystem (written in Go? C++? using MPI-IO to write efficiently?),
  • a tool to quickly insert files into Redis instances, e.g. files already present in the intercepted directory could be pre-loaded (inserted) into Redis so that they are available in pdwfs (using the redis-cli --pipe feature, see the Redis documentation on mass insertion),
  • a SLURM plugin to set up and tear down the Redis server infrastructure (possibly with writers to provide a software burst buffer),
  • gather I/O operation statistics (like Selfie) in the Redis instances and dump them to a file or database on teardown,
  • try alternative Redis-protocol-compatible databases (e.g. ARDB + RocksDB, the latter being optimized for fast storage such as SSDs, or Optane?),
  • a custom reimplementation of Redis tailored to pdwfs (e.g. https://github.com/JCapul/kvdroid), with features such as a binary protocol instead of an ASCII protocol, low-level communication protocols for high-speed network fabrics (e.g. OFI libfabric) instead of TCP, or Go-native NATS for networking,
  • a Redis-protocol proxy server for a hierarchical/tiered data transfer infrastructure (to cope with hero runs with 100,000+ clients),
  • a secure data transfer infrastructure with spiped (recommended in the Redis documentation),
  • read-cache mode: Redis instances used as a read cache for files on the filesystem, with two options: either the application reads files from the filesystem and sends them to Redis for caching, or a dedicated reader process receives read notifications from a channel and pushes file data into the Redis instances,
  • gather stats from the Redis instances (max RSS, clients, keys, etc.) into the central Redis instance and dump them to a file on teardown (or expose them in a small web app),
  • refactor the C layer as a "stackable" plugin system to make it easier to hook up custom implementations of I/O calls, "libc" and "redisfs" being two such implementations; another possible implementation is "/dev/null", which drops everything (like a dry run),
  • integration/coupling with the Darshan I/O characterization tool,
  • implement an optional producer-consumer pattern with a blocking open call on the consumer side (waiting for the file to be produced and completed by the producers) and automatic file deletion once all declared consumers are done with the file; this pattern was implemented in the first version of PaDaWAn, in Python,
  • add the possibility to intercept libc I/O calls using the linker's wrapping feature (ld --wrap=) instead of LD_PRELOAD, see this article
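For the idea of supporting "rm -r directory", the sketch below shows the fd-relative call sequence that a recursive removal typically relies on. It is written with Python's `dir_fd` parameters so that each step maps onto one libc "-at" call (`openat`, `fstatat`, `unlinkat`), which is what pdwfs would have to intercept. It is an illustration under that assumption, not how GNU rm is implemented, and the function name `remove_tree_at` is hypothetical. Symlinks are unlinked and never followed.

```python
import os
import stat


def remove_tree_at(parent_fd: int, name: str) -> None:
    """Recursively remove `name`, resolved relative to the directory open as parent_fd."""
    # fstatat(parent_fd, name, AT_SYMLINK_NOFOLLOW)
    st = os.stat(name, dir_fd=parent_fd, follow_symlinks=False)
    if not stat.S_ISDIR(st.st_mode):
        # Regular file or symlink: unlinkat(parent_fd, name, 0)
        os.unlink(name, dir_fd=parent_fd)
        return
    # openat(parent_fd, name, O_RDONLY | O_DIRECTORY)
    fd = os.open(name, os.O_RDONLY | os.O_DIRECTORY, dir_fd=parent_fd)
    try:
        # List entries through the directory fd ("." and ".." are not returned)
        with os.scandir(fd) as it:
            entries = [entry.name for entry in it]
        for entry in entries:
            remove_tree_at(fd, entry)
    finally:
        os.close(fd)
    # The directory is now empty: unlinkat(parent_fd, name, AT_REMOVEDIR)
    os.rmdir(name, dir_fd=parent_fd)
```

Usage: open the parent directory with `os.open(parent, os.O_RDONLY | os.O_DIRECTORY)` and call `remove_tree_at(parent_fd, "directory")`. Working relative to directory fds instead of full paths is what makes these calls robust against concurrent renames, and it is the reason plain `unlink`/`rmdir`/`stat` interception is not enough for "rm -r".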
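To illustrate the balancing argument about stripe placement, the sketch below compares hashing the stripe key with taking the stripe number modulo the number of instances. The key format `<path>:<stripe>` and the use of CRC32 are assumptions standing in for whatever pdwfs actually hashes. The third variant, which offsets the round-robin by a per-file hash so that stripe 0 of every file does not land on the same instance, is a hypothetical refinement.

```python
import zlib
from typing import Callable, List


def instance_by_hash(path: str, stripe: int, n_instances: int) -> int:
    """Hash-based placement: CRC32 of a hypothetical '<path>:<stripe>' key."""
    key = f"{path}:{stripe}".encode()
    return zlib.crc32(key) % n_instances


def instance_by_modulo(stripe: int, n_instances: int) -> int:
    """Round-robin placement: stripe number modulo the instance count."""
    return stripe % n_instances


def instance_by_offset_modulo(path: str, stripe: int, n_instances: int) -> int:
    """Round-robin starting at a per-file instance (hypothetical refinement)."""
    return (zlib.crc32(path.encode()) + stripe) % n_instances


def stripe_counts(place: Callable[[int], int], n_stripes: int, n_instances: int) -> List[int]:
    """Number of stripes of one file that land on each instance."""
    counts = [0] * n_instances
    for stripe in range(n_stripes):
        counts[place(stripe)] += 1
    return counts
```

With 100 stripes over 8 instances, `stripe_counts(lambda s: instance_by_modulo(s, 8), 100, 8)` gives `[13, 13, 13, 13, 12, 12, 12, 12]`: round-robin never differs by more than one stripe between instances, whereas the hashed placement is only balanced on average and depends on the actual keys.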
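For the pre-loading tool, the mass-insertion approach described in the Redis documentation is to generate raw Redis protocol (RESP) commands and pipe them into `redis-cli --pipe`. The sketch below encodes each command as a RESP array of bulk strings and splits a file into fixed-size stripes, one `SET` per stripe. The key layout `<path>:<stripe>` and the function names are hypothetical: a real tool would have to reproduce the key scheme and inode metadata that pdwfs actually uses, and route each stripe to the instance that owns it.

```python
def resp_command(*args: bytes) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        # Bulk string: $<byte length>\r\n<bytes>\r\n (binary-safe)
        out.append(b"$%d\r\n%b\r\n" % (len(arg), arg))
    return b"".join(out)


def file_to_resp(path: str, data: bytes, stripe_size: int) -> bytes:
    """Emit one SET per stripe under a hypothetical '<path>:<stripe>' key."""
    commands = []
    for index, offset in enumerate(range(0, len(data), stripe_size)):
        key = f"{path}:{index}".encode()
        commands.append(resp_command(b"SET", key, data[offset:offset + stripe_size]))
    return b"".join(commands)
```

For example, a hypothetical `preload.py` that writes `file_to_resp(path, data, 1 << 20)` to `sys.stdout.buffer` could be run as `python preload.py | redis-cli --pipe`, which sends all stripes to a single instance; with several instances, one stream per instance would be generated.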
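The "stackable" plugin idea can be sketched as a chain of layers in which each layer handles the paths it owns and delegates everything else to the layer below. This Python version only illustrates the dispatch structure (the actual interception layer is in C). The class names, the mount-point prefixes, and the dictionaries standing in for libc and Redis are all hypothetical, and only `write` is shown.

```python
from typing import Dict, Optional


class Layer:
    """Base layer: forwards every call to the layer below it."""

    def __init__(self, below: Optional["Layer"] = None):
        self.below = below

    def write(self, path: str, data: bytes) -> int:
        if self.below is None:
            raise OSError(f"no layer handles {path}")
        return self.below.write(path, data)


class MountLayer(Layer):
    """Layer that only claims paths under a given mount point."""

    def __init__(self, mount: str, below: Optional[Layer] = None):
        super().__init__(below)
        self.mount = mount.rstrip("/") + "/"

    def owns(self, path: str) -> bool:
        return path.startswith(self.mount)


class LibcLayer(Layer):
    """Bottom of the stack; a dict stands in for the real filesystem."""

    def __init__(self):
        super().__init__(None)
        self.files: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> int:
        self.files[path] = self.files.get(path, b"") + data
        return len(data)


class RedisFsLayer(MountLayer):
    """Claims its mount point; a dict stands in for the Redis instances."""

    def __init__(self, mount: str, below: Optional[Layer] = None):
        super().__init__(mount, below)
        self.store: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> int:
        if not self.owns(path):
            return super().write(path, data)
        self.store[path] = self.store.get(path, b"") + data
        return len(data)


class DevNullLayer(MountLayer):
    """Dry-run layer: reports success for its mount point but keeps nothing."""

    def write(self, path: str, data: bytes) -> int:
        if not self.owns(path):
            return super().write(path, data)
        return len(data)
```

A stack such as `RedisFsLayer("/scratch/pdwfs", DevNullLayer("/scratch/dryrun", LibcLayer()))` sends writes under the first prefix to the Redis stand-in, silently drops writes under the second, and passes everything else through to the libc stand-in. New implementations then only need to define their own ownership rule and calls.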
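The producer-consumer pattern can be sketched in-process with a condition variable: `open` blocks until the producer marks the file complete, and `close` deletes the file once the last declared consumer is done with it. In pdwfs the state would have to live in Redis and be shared across processes; here threads and a dictionary stand in for that, and every name is hypothetical.

```python
import threading
from typing import Dict, Optional


class ProducerConsumerRegistry:
    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._data: Dict[str, bytes] = {}     # present only once a file is complete
        self._remaining: Dict[str, int] = {}  # consumers that have not closed yet

    def declare(self, path: str, n_consumers: int) -> None:
        """Announce how many consumers will read `path`."""
        with self._cond:
            self._remaining[path] = n_consumers

    def complete(self, path: str, data: bytes) -> None:
        """Producer side: publish the finished file and wake blocked consumers."""
        with self._cond:
            self._data[path] = data
            self._cond.notify_all()

    def open(self, path: str, timeout: Optional[float] = None) -> bytes:
        """Consumer side: block until `path` is complete, then return its contents."""
        with self._cond:
            if not self._cond.wait_for(lambda: path in self._data, timeout):
                raise TimeoutError(path)
            return self._data[path]

    def close(self, path: str) -> None:
        """Consumer side: delete the file once the last declared consumer is done."""
        with self._cond:
            self._remaining[path] -= 1
            if self._remaining[path] == 0:
                del self._remaining[path]
                self._data.pop(path, None)

    def exists(self, path: str) -> bool:
        with self._cond:
            return path in self._data
```

Declaring the number of consumers up front is what makes automatic deletion safe: the file is kept until exactly that many `close` calls have been seen, regardless of the order in which consumers start.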