Hello everyone --
I am about to convert my CD collection to FLAC format, but before I fill my new 1TB hard drive with all these fresh digital tracks, I want to ensure that I first have a workable backup solution.
I already have FTP access to the server hosting my personal web site. Currently, this is through Powweb, and my (cheap) monthly subscription allows me 1.5 TB of RAID storage, as well as 1.5 TB of uploads per month. I thus intend to send all my digital music files across the network to this storage area. Some people might find this over the top, but the idea of backing up my files to a local (external) drive is totally unappealing to me: in case of fire or burglary, I could easily lose both the original files and the backup files. Hence the requirement to back up my files at a remote site, over the Internet.
There are synchronization programs that can back up local files to an FTP site -- for example, the excellent (and free) SyncBack utility (www.2brightsparks.com). The problem with these tools is that they back up the *whole* file again as soon as the file is slightly modified: a typo fix in the filename, a tag "Pop-Folk" changed to "Pop-Funk", etc. Long term, this looks like a really bad backup strategy, as I want to remain free to change the tags in the files as *often* as I want, and I want to be able to move the files around in my Music directory structure, without having to re-transfer hundreds of gigabytes of files across the Internet.
I was thus thinking about developing a utility that would better address these backup requirements for audio files. I'll explain what I have in mind, and I welcome your comments / suggestions. My idea is based on the fact that the bulk of a digital music file is a payload that will never change: once the encoded audio data is written to the file, it stays the same. Only the information typically found in the file header and/or footer will change. The header and/or the footer may vary in length over time, but the "body" of the file will remain fixed forever.
I was thus thinking about writing a utility that would be intelligent enough to break files into one, two or three chunks: header, body and footer. When it detects that a file has changed (or that it is a new file), the utility would identify these chunks in the file, compute their respective lengths, and compute an MD5 hash (a digital fingerprint) of each chunk. The utility would copy a chunk to the FTP site only if a chunk with the same hash had never been copied before.
At the end of the backup session, the utility would generate a backup summary file that would describe in detail the music directory structure, with pointers to the various chunks composing the files. This backup summary file would be dated and transferred to the FTP site as well. Such backup summary files could be kept for 30 days, for example, before being purged. This would allow the recovery of a music directory as of any specific day during that period. It could prevent, for example, losing a large portion of an audio collection after accidentally deleting a directory.
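To make the idea concrete, here is a minimal Python sketch of the chunking and deduplication step, not a finished tool. It assumes a plain FLAC file that begins with the "fLaC" marker followed by metadata blocks, and it handles only the two-chunk case (header + body); a leading ID3v2 tag or a trailing footer would need extra handling. The function names (flac_audio_offset, split_chunks, chunk_digests, chunks_to_upload, manifest_entry) are hypothetical, chosen for illustration only.

```python
import hashlib


def flac_audio_offset(data: bytes) -> int:
    """Return the byte offset where the audio frames start in a FLAC stream.

    Assumption: the data starts with the 4-byte "fLaC" marker. Each metadata
    block has a 4-byte header: 1 bit "last block" flag, 7 bits block type,
    24 bits big-endian payload length.
    """
    if data[:4] != b"fLaC":
        raise ValueError("not a plain FLAC stream (missing fLaC marker)")
    pos = 4
    while True:
        if pos + 4 > len(data):
            raise ValueError("truncated metadata block header")
        is_last = bool(data[pos] & 0x80)
        length = int.from_bytes(data[pos + 1:pos + 4], "big")
        pos += 4 + length
        if pos > len(data):
            raise ValueError("truncated metadata block")
        if is_last:
            return pos


def split_chunks(data: bytes) -> dict:
    """Split a file into 'header' (marker + metadata) and 'body' (audio)."""
    offset = flac_audio_offset(data)
    return {"header": data[:offset], "body": data[offset:]}


def chunk_digests(data: bytes) -> dict:
    """Map each chunk name to the MD5 hex digest of its bytes."""
    return {
        name: hashlib.md5(chunk).hexdigest()
        for name, chunk in split_chunks(data).items()
    }


def chunks_to_upload(data: bytes, already_stored: set) -> dict:
    """Return {digest: chunk bytes} for chunks not yet on the remote site.

    'already_stored' is a hypothetical set of digests known to be uploaded.
    """
    pending = {}
    for chunk in split_chunks(data).values():
        digest = hashlib.md5(chunk).hexdigest()
        if digest not in already_stored:
            pending[digest] = chunk
    return pending


def manifest_entry(path: str, data: bytes) -> dict:
    """Describe one file for the backup summary: its path and the ordered
    list of chunk digests needed to rebuild it."""
    return {"path": path, "chunks": list(chunk_digests(data).values())}
```

With this layout, retagging a file changes only the small header chunk, and moving or renaming a file changes only its path in the summary file, so in both cases the large audio body does not need to be sent again. Restoring a file would amount to concatenating its chunks in the order listed.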
This seems to be a good programming project for me, and if I ever embark on it, I would like it to be usable by other people who might also find it useful. Ideally, this kind of tool would be open source.
Comments / suggestions are welcome...!
JL