Failure to complete a backup

Setup:
- Win2K8 w/ Acrosync 1.0 backing up folder with two (large) files to NAS4Free 9.2.0.1 (latest) box in daemon mode.
- Acrosync is set to backup the files weekly, no SSH
- There are other (five total) backups setup in Acrosync, each at different interval. All of these run to completion without problems

Problem:
On multiple occasions, the backup of the folder with the two large file gets stuck and never completes.  As a result all the other jobs (some run hourly) are blocked from running and fill up the queue.

Symptoms:
1) On Windows:
- Acrosync runs at about 80% of a single thread of one core and acquires hours of CPU time.
- There is no I/O activity
- The job can't be stopped, and neither can Acrosync be stopped.  The only way to get out of this state is to force crash Acrosync through the Task Manager
- When Acrosync is restarted and the failed job is run again it will get stuck again, again on the second file and in roughly the same place while creating a new temporary file on the NAS box... not good.
- The only way out of this is to delete all files on the NAS and have Acrosync copy them anew. Note: removing just the partial files on the NAS box makes no difference.
- The first backup runs beautifully (Acrosync is fast!)... the failure occurs on the following runs, i.e. when trying to update the files.  BTW, the files are VM disk images... they never change in size. 

2) On NAS side:
- No CPU nor I/O activity
- Partial rsync file(s... one for each run) left behind.  All of those are from the second file to be backed up, all are about 90 GB (of about 290 GB) in size.

So, there are multiple issues here:
1 - The failure to backup the files.
2 - The partial files left on NAS
3 - The blocking of other jobs.  BTW, if a job is to run every hour, and another job runs long and is blocking the hourly job from running, Acrosync should not keep piling up multiple copies of the hourly job in the queue.


Thanks

Comments

  • Found a potential bug during code review which would make upload extremely slow because of redundant MD5 calculations under certain circumstances.  Not sure if this is exactly the issue you were experiencing, but it may be worth to try this new build:



    Let me know if this is the fix.

    Thanks,
    Gang
  • The previous fix has no effect due to a mistake.  I uploaded a new build to overwrite the previous build.  Please reinstall it using the same links (you should see a build number of 400 instead of 399 in the log).


  • edited November 2014
    Here are the results;
    On the Windows side:

     Directory of V:\VMs\Tried

    2014/08/22  19:04    <DIR>          .
    2014/08/22  19:04    <DIR>          ..
    2014/11/09  11:30   149,069,758,464 Tried_1.vdi
    2014/11/09  11:30   277,922,971,648 Tried_2.vdi
                   2 File(s) 426,992,730,112 bytes

    - On the NAS side:

    VMs/Tried:
    total 551133006
    -rw-------  1 files  wheel   37578604544 Nov  8 23:45 .Tried_1.vdi.6XNjpq
    -rw-------  1 files  wheel   60907323392 Nov  9 14:00 .Tried_1.vdi.G2861J
    -rw-------  1 files  wheel   38053478400 Nov  8 22:36 .Tried_1.vdi.Uz2F7W
    -rw-------  1 files  wheel   38168952832 Nov  8 21:27 .Tried_1.vdi.j9JlhB
    -rw-------  1 files  wheel   38268567552 Nov  8 20:15 .Tried_1.vdi.p8uLnZ
    -rw-r--r--  1 files  wheel  149069758464 Nov  7 18:37 Tried_1.vdi
    -rw-r--r--  1 files  wheel  277922971648 Nov  7 18:37 Tried_2.vdi

    rsync --version
    rsync  version 3.1.0  protocol version 31
    Copyright (C) 1996-2013 by Andrew Tridgell, Wayne Davison, and others.
    Web site: http://rsync.samba.org/
    Capabilities:
        64-bit files, 32-bit inums, 64-bit timestamps, 64-bit long ints,
        socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
        append, ACLs, xattrs, iconv, symtimes, no prealloc


    - Acrosync log content:
    2014/11/08 18:56:30    Acrosync for windows 1.1 build 400 initialized
    2014/11/08 19:05:00    Performing a full backup of V:/VMs/
    2014/11/08 19:05:00    Uploading V:/VMs/ to VMs/
    2014/11/08 19:05:01    rsync command: rsync --server --modify-window=2 --out-format=%n --links --recursive -tude. . VMs/
    2014/11/08 20:05:00    Performing a full backup of V:/VMs/
    2014/11/08
    20:13:52    The rsync channel has been blocked for more than 120
    seconds (writeAll:-1:1:1:1:0:64000:4:64000:0:9029701:0:1)
    2014/11/08 20:13:52    Uploading V:/VMs/ to VMs/
    2014/11/08 20:13:52    rsync command: rsync --server --modify-window=2 --out-format=%n --links --recursive -tude. . VMs/
    2014/11/08 21:05:00    Performing a full backup of V:/VMs/
    2014/11/08
    21:25:11    The rsync channel has been blocked for more than 120
    seconds (writeAll:-1:1:1:1:0:64000:4:64000:0:8964167:0:1)
    2014/11/08 21:25:11    Uploading V:/VMs/ to VMs/
    2014/11/08 21:25:11    rsync command: rsync --server --modify-window=2 --out-format=%n --links --recursive -tude. . VMs/
    2014/11/08 22:05:00    Performing a full backup of V:/VMs/
    2014/11/08
    22:35:10    The rsync channel has been blocked for more than 120
    seconds (writeAll:-1:1:1:1:0:64000:4:64000:0:8833109:0:1)
    2014/11/08 22:35:10    Uploading V:/VMs/ to VMs/
    2014/11/08 22:35:10    rsync command: rsync --server --modify-window=2 --out-format=%n --links --recursive -tude. . VMs/
    2014/11/08 23:05:00    Performing a full backup of V:/VMs/
    2014/11/08
    23:44:58    The rsync channel has been blocked for more than 120
    seconds (writeAll:-1:1:1:1:0:64000:4:64000:0:8898638:0:1)
    2014/11/09 00:05:00    Performing a full backup of V:/VMs/
    2014/11/09 01:05:00    Performing a full backup of V:/VMs/
    2014/11/09 02:05:00    Performing a full backup of V:/VMs/
    2014/11/09 03:05:00    Performing a full backup of V:/VMs/
    2014/11/09 04:05:00    Performing a full backup of V:/VMs/
    2014/11/09 05:05:00    Performing a full backup of V:/VMs/
    2014/11/09 06:05:00    Performing a full backup of V:/VMs/
    2014/11/09 07:05:00    Performing a full backup of V:/VMs/
    2014/11/09 08:05:00    Performing a full backup of V:/VMs/
    2014/11/09 09:05:00    Performing a full backup of V:/VMs/
    2014/11/09 10:05:00    Performing a full backup of V:/VMs/
    2014/11/09 11:05:00    Performing a full backup of V:/VMs/
    2014/11/09 12:05:00    Performing a full backup of V:/VMs/
    2014/11/09 12:05:01    Uploading V:/VMs/ to VMs/
    2014/11/09 12:05:01    rsync command: rsync --server --modify-window=2 --out-format=%n --links --recursive -tude. . VMs/
    2014/11/09 13:05:00    Performing a full backup of V:/VMs/
    2014/11/09
    13:58:43    The rsync channel has been blocked for more than 120
    seconds (writeAll:-1:1:1:1:0:64000:956:64000:0:15123893:0:1)
    2014/11/09 14:05:00    Performing a full backup of V:/VMs/
    2014/11/09 15:05:00    Performing a full backup of V:/VMs/
    2014/11/09 16:05:00    Performing a full backup of V:/VMs/
    2014/11/09 17:05:00    Performing a full backup of V:/VMs/
    2014/11/09 18:05:00    Performing a full backup of V:/VMs/
    2014/11/09 19:05:00    Performing a full backup of V:/VMs/
    2014/11/09 20:05:00    Performing a full backup of V:/VMs/

    Misc comments:
    -
    Acrosync pops up the "rsync channel has been blocked..." message window
    and the sync job keeps on going.  When the message is acknowledged, the
    sync job quits as well.
    - This version of the program gets tripped up already on the first file, so neither file gets properly updated

    Hope this helps you to determine what's going on
  • Uploaded build 411 (available at the same links) with the following 2 changes:

    1) Change the channel blocking time from 120 seconds to 10 minutes.  This anti-deadlock mechanism was added in early development stage to guard against potential bugs in ssh transfer layer.  Now that the implementation has been much more mature, it becomes more or less useless and will only give false alarms.  In this case the server may be just too busy copying contents from the original file to the temporary.

    2)  Add an 'in-place' option which should be enabled for uploading huge files like VM disk images.  I believe this change alone should allow the backup to run to completion and get rid of partial files at the same time.

    Please let me know if this new build works.
  • edited November 2014
    The 'in-place' option seems to have done the trick.  Here's the pertinent part of the log"
    2014/11/10 17:00:00    Performing a full backup of V:/VMs/
    2014/11/10 17:00:00    Uploading V:/VMs/ to VMs/
    2014/11/10 17:00:00    rsync command: rsync --server --modify-window=2 --out-format=%n --links --recursive --inplace -tude. . VMs/
    2014/11/10 18:00:00    Performing a full backup of V:/VMs/
    2014/11/10 18:57:31    Uploaded Tried/Tried_2.vdi (277922971648/166817332)
    2014/11/10 18:57:55    Attempting to upload 'Tried/Tried_2.vdi' again
    2014/11/10 19:00:00    Performing a full backup of V:/VMs/
    2014/11/10 20:00:00    Performing a full backup of V:/VMs/
    2014/11/10 20:47:49    Uploaded Tried/Tried_2.vdi (277922971648/15690608)
    2014/11/10 20:48:08    Uploaded 2 files; total 426992730112 bytes, skipped 149069758464 bytes, sent 277922971648/182507940 bytes
    2014/11/10 20:48:08    Uploaded V:/VMs/: uploaded 2 files
    2014/11/10 20:48:08    Uploading V:/VMs/ to VMs/
    2014/11/10 20:48:08    rsync command: rsync --server --modify-window=2 --out-format=%n --links --recursive --inplace -tude. . VMs/
    2014/11/10 20:48:09    Uploaded 0 files; total 426992730112 bytes, skipped 426992730112 bytes, sent 0/0 bytes
    ...

    I'm going to stress-test in the next few days, but it looks like this is ready to be deployed.

    Thanks
  • > 2014/11/10 18:57:31    Uploaded Tried/Tried_2.vdi (277922971648/166817332)
    > 2014/11/10 18:57:55    Attempting to upload 'Tried/Tried_2.vdi' again

    This is the first time I see repeated file transfer.  For some reason the MD5 checksums weren't matched so rsync on the server asked for a re-transfer.  Fortunately the second time it succeeded.

    > 2014/11/10 20:47:49    Uploaded Tried/Tried_2.vdi (277922971648/15690608)

    The first number is the file size.   The second number is the actually bytes having been transferred.  So there were only 15M new bytes but the sync took 50 minutes.  This makes me wonder that maybe rsync isn't the best solution for backing up VM images, because it can't dissect the disk format to determine which parts have been modified. 
Sign In or Register to comment.