Conversion to polling only - Issues

The client uses file events and snapshot differencing to avoid unnecessary hash calculations, attempts to access locked files, and redundant server calls.
Converting to a single poll that compares server and local hashes raises the following issues:

The Poll algorithm can't detect moves and renames.

The Poll algorithm can't detect moves and renames, so every move is processed as a deletion of the source file followed by a re-creation of the target file.

  • Mitigation: Use the server's Object ID instead of the file name to identify matching files. This means that after a file is uploaded, the server OID must be retrieved and stored in the local filestate database.
However, the Object ID doesn't help in the following cases:
  • Moves caused by other clients will delete the source file, thereby discarding the Object ID.
  • On FAT it isn't possible to associate the Object ID with a file at the file system level, which makes it useless for detecting local moves.

Renames may cause inconsistent changes visible to the users

The behavior of the algorithm in case of renames depends on the order the poll encounters the source and target file names.

If the target file is encountered first, the file data is synced, causing duplicate files. The source file is deleted only when the poll reaches it. How long the duplicates remain visible depends on the time it takes to process all other files between the target and the source file.

If the source file is encountered first, the file is deleted and disappears from the local drive or server. When the target file is encountered, it is first hashed and then uploaded using a hashmap. Both the source and target files will disappear for as long as it takes to process all files between the source and target PLUS the time required to hash the target.

  • Mitigation: Perform hashing of all local files before syncing per file. This will reduce the delay between source and target file processing.
  • Mitigation: Use Object IDs to detect renames and execute MOVE operations on the server and client.
  • Mitigation: Separate uploads, downloads and deletes. Perform deletes last. This way we only get duplicates on the server and other clients, never missing files.
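
The "deletes last" mitigation can be sketched as a stable sort over the operations a poll produces. This is only an illustration, not the client's code: the `Op` type, the operation names and the file names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical operation kinds. Deletes get the highest rank so they run last.
ORDER = {"upload": 0, "download": 1, "delete": 2}


@dataclass(frozen=True)
class Op:
    kind: str  # "upload", "download" or "delete"
    path: str


def order_operations(ops):
    """Return ops with uploads and downloads first and all deletes last.

    sorted() is stable, so operations of the same kind keep the order
    in which the poll discovered them.
    """
    return sorted(ops, key=lambda op: ORDER[op.kind])


# A rename seen by the poll as "delete old name" then "upload new name".
plan = order_operations([
    Op("delete", "report.doc"),
    Op("upload", "report-final.doc"),
])
```

With this ordering the new name is uploaded before the old name is removed, so a rename shows up as a temporary duplicate rather than a temporarily missing file.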

Renames of large files or folders may cause inconsistent changes to ripple to all clients that monitor the same file/folder.

Since a rename takes essentially as long as a hash of the renamed file plus, potentially, an upload, a considerable interval passes between the moment the renamed file is deleted from the server and the moment the target file appears.
This deletion can ripple to all clients that monitor the renamed file, causing deletions in all of them.

Conversely, the creation of duplicates can also ripple to all clients.

  • Mitigation: Perform hashing of all local files before syncing per file.
  • Mitigation: Use Object IDs to detect renames and execute MOVE operations on the server and client.
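
A rough sketch of Object ID based rename detection follows. It assumes that each poll can build an `{object_id: path}` mapping and that the mapping from the previous poll was stored locally. The function name and the sample IDs are hypothetical, and the limitations of Object IDs listed earlier still apply.

```python
def detect_moves(previous, current):
    """Compare two {object_id: path} snapshots.

    Returns (moves, deletes, creates):
      moves   - (old_path, new_path) pairs for IDs whose path changed
      deletes - paths whose ID no longer exists
      creates - paths whose ID was not seen before
    """
    moves = [
        (previous[oid], current[oid])
        for oid in previous
        if oid in current and previous[oid] != current[oid]
    ]
    deletes = [previous[oid] for oid in previous if oid not in current]
    creates = [current[oid] for oid in current if oid not in previous]
    return moves, deletes, creates


previous = {"oid-1": "docs/a.txt", "oid-2": "docs/b.txt"}
current = {"oid-1": "docs/renamed.txt", "oid-3": "docs/c.txt"}
moves, deletes, creates = detect_moves(previous, current)
```

Each entry in `moves` could then be executed as a single MOVE instead of a delete followed by a hash and an upload, which removes the window in which other clients see the file disappear.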

Failure or disconnection of the initiating client during rippling will leave all other clients in an inconsistent state.

Only when the initiating client finishes its changes will there be enough information on the server to allow the other clients to reach an eventual stable state. If the changes are interrupted by a crash, connectivity loss or system shutdown, the inconsistencies will remain.

Local file hashes will have to be calculated continually.

  • Mitigation: Calculate and store the MD5 hash during each poll, and compare it with the previously stored value. Recalculate the more expensive Merkle hash only when the MD5 hash has changed.
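
The MD5 check could look roughly like the sketch below. The names `file_md5` and `hash_if_changed` are hypothetical, `stored_md5` stands in for whatever the local state database keeps, and `merkle_hash` is any callable that computes the expensive hash for a path.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks so large files are not loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_if_changed(path, stored_md5, merkle_hash):
    """Return (md5, merkle), where merkle is None if the content is unchanged.

    stored_md5  - dict of path -> MD5 from the previous poll, updated in place
    merkle_hash - callable computing the expensive hash for a path
    """
    md5 = file_md5(path)
    if stored_md5.get(path) == md5:
        return md5, None  # unchanged since the last poll, skip the Merkle hash
    stored_md5[path] = md5
    return md5, merkle_hash(path)
```

Note that the file still has to be read in full on every poll to compute the MD5, so this only saves the cost difference between the two hashes, not the disk IO.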

Local files may still be in use during polling

Local files may still be in use during polling, especially if a large file or folder operation is in progress, resulting in expensive exceptions. There is no way to check whether a file is locked without trying to open it first.

  • Mitigation: Use the FileIdleBatch class to wait until there are no other file operations. May result in a long delay if multiple small files are being modified.
  • Mitigation: Perform hash calculations in parallel, to allow other, unlocked files to proceed. Will result in high CPU and IO load.
  • Mitigation: Use a queue of files to hash, putting any locked files back into the queue.
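
The queue-based mitigation might be sketched as follows. The function name and the `max_attempts` limit are assumptions made for illustration. The sketch also assumes that the hashing function raises `OSError` (of which `PermissionError` is a subclass) when a file is locked.

```python
from collections import deque


def hash_with_retry(paths, hash_file, max_attempts=3):
    """Hash every path, moving files that cannot be opened to the back of the queue.

    hash_file(path) is expected to raise OSError while the file is locked.
    Returns (hashes, still_locked).
    """
    queue = deque((path, 1) for path in paths)
    hashes, still_locked = {}, []
    while queue:
        path, attempt = queue.popleft()
        try:
            hashes[path] = hash_file(path)
        except OSError:
            if attempt < max_attempts:
                # Retry after all files currently ahead in the queue.
                queue.append((path, attempt + 1))
            else:
                still_locked.append(path)
    return hashes, still_locked
```

The sketch retries immediately once the rest of the queue has been processed. A real client would probably wait between passes, or carry the files in `still_locked` over to the next poll.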

Polls that occur while the Pithos folder is in use will propagate unexpected changes to the server and clients

Polls that occur while a user is creating a new file or folder will see and upload the default "New File" or "New Folder" name. The file/folder will NOT be renamed until the next poll.

  • Mitigation: Use the FileIdleBatch class to wait until there are no other file operations
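
The idea of waiting for an idle period can be illustrated with the sketch below. It is not the FileIdleBatch class: `IdleGate` is a hypothetical stand-in, and it assumes the client still receives some notification of file activity in the Pithos folder.

```python
import time


class IdleGate:
    """Tracks the last file event and reports when the folder has been quiet."""

    def __init__(self, quiet_seconds, clock=time.monotonic):
        self.quiet_seconds = quiet_seconds
        self.clock = clock  # injectable so the gate can be tested without sleeping
        self.last_event = None

    def notify(self):
        """Call on every file system event in the monitored folder."""
        self.last_event = self.clock()

    def is_idle(self):
        """True when no event was seen during the last quiet_seconds."""
        if self.last_event is None:
            return True
        return self.clock() - self.last_event >= self.quiet_seconds
```

A poll would start only when `is_idle()` returns True. The quiet period is a trade-off: a longer value gives the user more time to finish typing a file name, but delays every poll that follows any file activity.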

Polls will have to be infrequent

Polls will have to be infrequent by necessity, resulting in very slow propagation of changes from the local disk to the server and back.
This exacerbates the problem of rippling inconsistencies.