AccurateRip - Future Direction

  • Spoon
    Administrator
    • Apr 2002
    • 44510

    AccurateRip - Future Direction

    It has been brought to my attention that the CRC used in AccurateRip is not doing its job properly. In layman's terms, the right channel rolls out of the CRC calculation every 1.5 seconds (that is, the right channel of the 1st sample is used 100%, by the 65535th sample it is not used, and at the 65536th sample it is used 100% again; this repeats over and over). It is estimated that effectively 3% of the data is not getting into the CRC (even at 97% coverage, I stand behind AccurateRip being better than most (? all) C2 implementations). Going back over the early AccurateRip code, it seems the design of the CRC is fine, just not the implementation (the L and R channels were supposed to go in separately, but were optimized to both go in together without bringing down the upper 32 bits).
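To illustrate the kind of flaw described above, here is a minimal sketch. It assumes the checksum is a running sum of each 32-bit stereo word (right channel in the upper 16 bits) multiplied by its 1-based sample position; that layout, and the function names, are assumptions for illustration and are not taken from the AccurateRip source.

```python
MASK32 = 0xFFFFFFFF

def frame_word(left, right):
    """Pack one 16-bit stereo sample pair into a 32-bit word (right channel on top)."""
    return ((right & 0xFFFF) << 16) | (left & 0xFFFF)

def checksum_truncated(words):
    """Position-weighted sum that keeps only the low 32 bits of each product."""
    total = 0
    for pos, word in enumerate(words, start=1):
        total = (total + word * pos) & MASK32
    return total

def checksum_folded(words):
    """Same sum, but the upper 32 bits of each product are brought down and added in."""
    total = 0
    for pos, word in enumerate(words, start=1):
        product = word * pos
        total = (total + (product & MASK32) + (product >> 32)) & MASK32
    return total

# 65536 synthetic samples; corrupt only the right channel of sample number 65536.
good = [frame_word(i & 0x7FFF, (i * 7) & 0x7FFF) for i in range(65536)]
bad = list(good)
bad[65535] ^= 0x80000000  # flip the top bit of the right channel

print(checksum_truncated(good) == checksum_truncated(bad))  # error goes unnoticed
print(checksum_folded(good) == checksum_folded(bad))        # error changes the sum
```

At position 65536 the multiplier shifts the whole right channel above bit 31, so the truncated sum cannot see it, while the folded sum still does.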

    Steve will post his findings in detail on his discovery.

    It is a relatively easy fix (detailed below); however, this presents an opportunity which was not around when AccurateRip was first implemented (the understanding of different CD pressings and how they were implemented was almost non-existent).

    ----------------------------
    1. Fix: Fix the algorithm so all the data is used, both new and old CRC are calculated, new checked first, old second (with less Accuracy). New submissions would effectively appear as different pressings in the database.
    ----------------------------
    2. Fix: Change the CRC algorithm to something like CRC32. The reason it was not used in the first place was that tracks 2 to x-1 would match the CRC presented in EAC, but the first and last tracks never would, causing confusion. The CRC could be XOR'd to avoid this confusion.
    ----------------------------
    3. Fix & Additional Development: Use CRC32 and the old CRC (there is lots of data in the database). The new CRC32 would go into a parallel 2nd database, increasing the strength of the CRC to almost 64 bits (not taking into account the flaw). On the back end there are few changes to be made, as both databases are the same design.
    ----------------------------
    4. Fix & Additional Development: Use a different hash (MD5, SHA-1); these would increase the storage of the database by 5x (160 bits for SHA-1).
    ----------------------------
    5. Brainstorm a method of having a hash which would be resistant to pressings, yet still be feasible for a CD ripper to rip track rather than whole CD based (and not have the need to read outside of the track).
    ----------------------------
    6. ???

    Bear in mind the existing database before construction takes up some 14 GB.
    Spoon
    www.dbpoweramp.com
  • pls1
    dBpoweramp Enthusiast

    • Jan 2008
    • 91

    #2
    Re: AccurateRip - Future Direction

    Excellent. With classical CD repackaging/re-pressings bordering on random, this would really help me: even when my CD is in AccurateRip, the vast majority of the time at least one track does not match but is labeled Secure.

    Since I'm paranoid, I have been testing re-rips on two different machines/Plextor models and comparing the CRCs in those instances.

    Phil

    Comment

    • EliC
      dBpoweramp Guru

      • May 2004
      • 1175

      #3
      Re: AccurateRip - Future Direction

      Would it be time to look at hashing chunks of the tracks, so that it can be identified which chunk has errors? I realize this would increase the db size, but it would also allow more power for secure rippers to know what parts of the track are accurate and which parts are inaccurate.
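A rough sketch of what per-chunk hashing could look like, using CRC32 from Python's `zlib` as a stand-in checksum. The chunk size (750 CD frames, about 10 seconds) and the function names are hypothetical choices for illustration only.

```python
import zlib

FRAME_BYTES = 2352  # one CD audio frame: 588 stereo samples * 4 bytes

def chunk_crcs(pcm, frames_per_chunk=750):
    """Return a CRC32 for each consecutive chunk of raw PCM bytes."""
    step = frames_per_chunk * FRAME_BYTES
    return [zlib.crc32(pcm[i:i + step]) for i in range(0, len(pcm), step)]

def mismatched_chunks(reference, candidate):
    """Indices of chunks whose CRCs disagree."""
    return [i for i, (a, b) in enumerate(zip(reference, candidate)) if a != b]

# Small demo: a 40-frame "track" split into 10-frame chunks, one byte corrupted.
track = (bytes(range(256)) * 368)[:40 * FRAME_BYTES]
damaged = bytearray(track)
damaged[2 * 10 * FRAME_BYTES + 100] ^= 0x01  # lands in chunk index 2

reference = chunk_crcs(track, frames_per_chunk=10)
candidate = chunk_crcs(bytes(damaged), frames_per_chunk=10)
print(mismatched_chunks(reference, candidate))  # [2]
```

The database cost grows with the number of chunks per track, which is the trade-off raised in the post above.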

      Comment

      • bhoar
        dBpoweramp Guru

        • Sep 2006
        • 1173

        #4
        Re: AccurateRip - Future Direction

        Some of the ideas sound good, others sound good but seem to carry a lot of negative ramifications. I don't envy you.

        Some other thoughts:

        1. Perhaps some of the suggested changes (or variants of them) could allow you to implement routines that trim or combine submission content more often, e.g. roll the submissions into single entries with counters if they fit certain criteria.

        2. I like Eli's recommendation of generating and storing what I'd like to call "macro C2 pointers" (heh) to help the ripper decide which part of the track the error is more likely to be in.

        Of course, the question is: what problem does the solution solve. In addition, different implementations would address different problems: breaking a song into four "blocks" for smaller checksums would allow you to know which quadrant is broken (but shouldn't c2 errors, suspicious positions, etc. tell you that already?) or generating each block based on data from every fourth sample or even every fourth bit, interleaved instead (perhaps allowing for checking for unusual sample transitions or bit flipping to correct single bit errors...if either was ever a problem).

        But with the above mentioned current space requirements, the projected growth of AR could limit the feasibility.

        3. Re-examine the "real zero" for offsets based on the research that showed it to be different than originally assumed. I suggested re-zeroing AR last year but you gave two successful arguments why that shouldn't happen: basically, nothing to gain by moving the zero mark that number of frames...and you'd have to throw out all results generated from the current offset "zero". That last part may now happen due to the original post's stated reasons. The first part may still stand, of course, but it might be worth looking at again, just to be sure.

        4. "AccurateRip II: Electric Boogaloo!"

        -brendan

        PS -

        5. Brainstorm a method of having a hash which would be resistant to pressings, yet still be feasible for a CD ripper to rip track rather than whole CD based (and not have the need to read outside of the track).
        I think the goal and the restriction are mutually exclusive. If you can't "look" outside of a single track's boundaries, and that track's boundaries move together the same amount forward or backward over the same larger data set from pressing to pressing, then there's no way to generate a single hash or checksum for both pressings. If you relaxed that requirement by allowing the ripper to "start early" and "end late" when ripping tracks, perhaps 2x the largest "delta" seen between pressings, then that would allow you to do this. The first pressing submitted would serve as the master, matching but TOC-delta'd discs would have the same entry with an offset delta.

        (ignoring problems with disc begin/end overreads, of course, as well as limitations in the MMC command set for working directly with tracks - sounds like more fun)
        Last edited by bhoar; February 21, 2008, 08:39 PM.

        Comment

        • EliC
          dBpoweramp Guru

          • May 2004
          • 1175

          #5
          Re: AccurateRip - Future Direction

          Should we also be looking at discs as a whole? Not that I would suggest requiring people to rip all songs, but it would be nice to know more which may give better insight to the "different pressing" issue.

          I like the idea of confirming where the REAL ZERO is, and taking the opportunity to do it right.

          What about adding information from secure rippers as to if the track was ripped securely? Secure rips could be kept in the db even if there is only one rip of that pressing (though the pressing issue may go away), until it was shown to be inaccurate.

          I thought you had mentioned before that you would be able to compare different pressings by knowing the pressing offset?

          Comment

          • pls1
            dBpoweramp Enthusiast

            • Jan 2008
            • 91

            #6
            Re: AccurateRip - Future Direction

            Perhaps I'm confused about this, but I'm having two infrequent problems. The first is on CDs where all but one or two tracks have a match in AccurateRip and the one or two remaining tracks do NOT match AccurateRip but are listed as secure. These first rips have all been on Plextor 760 or 750 drives.

            Now usually, from knowing the classical music business, it seems to be mostly re-packaged CDs. But then I have had two different CDs of madrigals where two of the middle 15 tracks show as not AccurateRip and secure, and I know there has been only one pressing worldwide. I have now seen about 10 of these.

            More problematic is where the track shows as matching AccurateRip but the log shows errors and a re-rip can generate a different sum. Again, maybe about 10 tracks. I've been re-ripping on multiple different drives (now Plextor 230A drives) to get a clean, consistent rip.

            While not statistically significant, due to my sample size of a few thousand tracks, I estimate these anomalies have occurred each roughly at about 1/4%.

            Perhaps I just don't really understand this and don't get me wrong, dbpoweramp is a great product. 99.75% automated quality confidence is nothing to complain about. However, I'm carefully monitoring full error logs and need to have a personal work-flow to clear these anomalies as I rip my collection.

            Phil

            Comment

            • Spoon
              Administrator
              • Apr 2002
              • 44510

              #7
              Re: AccurateRip - Future Direction

              (have a cup of coffee before reading this...)

              6. I think I have the solution! As it stands, the database holds for each track (forget pressings for the moment) a track CRC (which has the flaw) and an offset-finding CRC (which does not have the flaw).

              I will be talking about 2 databases, side by side, the existing database is DB1 and new is DB2

              [DB1] Work should be done in EAC and dBpoweramp ASAP to correct the flaw. Each program should calculate 2 CRCs, the old one and the new one. Only the new one should be submitted once the fix is implemented. The old CRCs would in time be replaced by the new CRCs in the same database.

              [DB2] In addition, two CRC32s should be generated:

              [CRC1][..............CRC2............][CRC1]

              So CRC1 is the first say 5 frames and last 5 frames of the track, CRC2 is all the track. These 2 CRCs could be submitted to a 2nd database, where the CRC1 will go into the current offset finding slot, no changes on the backend! (apart from creating the 2nd database)

              Why do this? It would allow a match if your CD is a different pressing and not really in the database. No rolling CRCs are needed, as the CRC from the existing database that is used to find drive offsets can find the offset of the pressing, and as long as it is within +/- 5 frames, the pressing can be verified. It also has a benefit for track 1 (which currently is only calculated from 5 frames in): any drive with a + offset would have the correct CRC1, so all of track 1 could be verified in its entirety (not possible for the last track, as the majority of drives cannot overread).
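The edge/middle split can be sketched as follows. This reads the diagram as CRC1 covering the first and last 5 frames and CRC2 covering the region between them, which is an assumption (the text also says CRC2 is "all the track"). The pressing shift is taken as already known; finding it via the offset-finding CRC is not shown, and all names are hypothetical.

```python
import random
import zlib

FRAME_BYTES = 2352  # one CD audio frame: 588 stereo samples * 4 bytes
EDGE_FRAMES = 5

def split_crcs(pcm, edge_frames=EDGE_FRAMES):
    """Return (crc1, crc2): CRC32 of head+tail edges, and CRC32 of the middle."""
    edge = edge_frames * FRAME_BYTES
    if len(pcm) <= 2 * edge:
        raise ValueError("track shorter than the two edge regions")
    head, middle, tail = pcm[:edge], pcm[edge:-edge], pcm[-edge:]
    crc1 = zlib.crc32(tail, zlib.crc32(head))  # running CRC over head then tail
    return crc1, zlib.crc32(middle)

def middle_crc_shifted(pcm, shift_frames, edge_frames=EDGE_FRAMES):
    """CRC32 of the part of this track that lines up with the middle region of
    a pressing whose track starts shift_frames earlier on the disc."""
    if abs(shift_frames) >= edge_frames:
        raise ValueError("shift must be smaller than the edge region")
    start = (edge_frames - shift_frames) * FRAME_BYTES
    end = len(pcm) - (edge_frames + shift_frames) * FRAME_BYTES
    return zlib.crc32(pcm[start:end])

# Demo: the same audio, with track boundaries 2 frames later on pressing B.
rng = random.Random(0)
disc = bytes(rng.getrandbits(8) for _ in range(60 * FRAME_BYTES))
track_a = disc[10 * FRAME_BYTES:50 * FRAME_BYTES]
track_b = disc[12 * FRAME_BYTES:52 * FRAME_BYTES]

crc1_a, crc2_a = split_crcs(track_a)
print(middle_crc_shifted(track_b, 2) == crc2_a)  # True: middle verified without reading outside the track
```

Because the middle region sits 5 frames inside each boundary, a pressing shifted by fewer than 5 frames still contains all of it.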

              When I started AccurateRip, the idea of pressings messing with the audio data was not known (to me). If you had 40 different pressings of the same CD (possible with worldwide releases over 10 years), that lowers the 1 in 4 billion chance of a CRC clash for a working 32-bit CRC routine to about 1 in 100 million. Adding the 2nd CRC would effectively boost the CRC to 64 bits. Then AccurateRip could return:
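The arithmetic behind those figures, as a quick check. It uses the simple approximation that each stored pressing is an independent chance of an accidental 32-bit match; the variable names are illustrative.

```python
CRC_BITS = 32
PRESSINGS = 40

one_in_single = 2 ** CRC_BITS                      # a single stored CRC
one_in_with_pressings = one_in_single // PRESSINGS  # 40 stored CRCs to collide with
one_in_two_crcs = 2 ** (2 * CRC_BITS)              # two independent 32-bit CRCs

print(one_in_single)          # 4294967296 (about 4 billion)
print(one_in_with_pressings)  # 107374182 (about 100 million)
print(one_in_two_crcs)        # 18446744073709551616
```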

              Match using old CRC method,
              Partial pressing match (10 frames of the file missing)
              Match using CRC fix method (32 bit), in addition CRC32 match (on CRC1 and CRC2, so whole track)

              All that would need to be done is a method of showing which of the above to the end user.
              Changing to MD5 would mean the whole backend being rewritten, and there is about 30x more code on the backend (to keep the database clean from rogue data, such as drives configured with wrong offsets).
              Spoon
              www.dbpoweramp.com

              Comment

              • Steve_Gabriel

                • Feb 2008
                • 1

                #8
                Re: AccurateRip - Future Direction

                Spoon asked me to post details of the flaw in the Accurate Rip CRC to Hydrogen Audio, so there's a parallel thread going on there about this topic.



                I first contacted spoon privately about this bug because I thought it was quite a serious error. I want to thank him for inviting me to bring this up publicly.

                Accurate Rip as it now stands is blind to 3% of the possible single-bit errors in a file. Luckily a CD read error tends to scatter bad bits all over the frame and is very, very likely to be detected even by the current faulty algorithm. However, if there are errors that appear only in certain MSBs of the right channel, they will not be detected.

                This is not nearly as bad as it sounds. If we assume the bit errors are randomly distributed, a big assumption, but not completely crazy since the file has been through 2 layers of error correction (C1 and C2) which tends to randomize errored output, then each additional bit error reduces the undetected probability by 97%.

                The crude formula for the probability that an error in your file is not detected is

                2 ^ -(5 * number_of_bits_wrong)

                This formula is accurate down to about 2 ^ -32, so if there are at least 6 bits wrong in the file, you've reached the full detection power of a 32 bit checkword.
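The crude formula can be tabulated with a short sketch. The second helper shows one way to arrive at the 3% figure, assuming the flaw is a position-weighted sum whose products are truncated to 32 bits (so a multiplier with t trailing zero bits hides the top t bits of a word); that model and the function names are assumptions, not taken from the AccurateRip source.

```python
def undetected_probability(bits_wrong, bits_per_error=5, checksum_bits=32):
    """Crude estimate 2^-(5 * bits_wrong), floored at the checksum width."""
    exponent = min(bits_per_error * bits_wrong, checksum_bits)
    return 2.0 ** -exponent

def trailing_zeros(n):
    """Number of trailing zero bits in a positive integer."""
    return (n & -n).bit_length() - 1

def average_lost_fraction(positions=65536, word_bits=32):
    """Average share of each 32-bit word hidden by truncation, over one cycle."""
    lost = sum(trailing_zeros(p) for p in range(1, positions + 1))
    return lost / (positions * word_bits)

for n in (1, 2, 6, 7):
    print(n, undetected_probability(n))

print(round(average_lost_fraction(), 4))  # 0.0312, i.e. roughly 3% or 2^-5
```

Under this formula six wrong bits give 2^-30, and the 2^-32 floor is reached at seven.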

                Comment

                • EliC
                  dBpoweramp Guru

                  • May 2004
                  • 1175

                  #9
                  Re: AccurateRip - Future Direction

                  Any new system should also be built from the ground up to be able to verify lossless rips after the fact, especially as more entries are added to AR2.

                  Comment

                  • funkyblue
                    dBpoweramp Enthusiast

                    • Oct 2007
                    • 62

                    #10
                    Re: AccurateRip - Future Direction

                    If there is going to be a new DB, what about changing the offset as well? Were there not some thoughts that the offset settings we use are off by 30?

                    Comment

                    • Spoon
                      Administrator
                      • Apr 2002
                      • 44510

                      #11
                      Re: AccurateRip - Future Direction

                      It would mean EAC and dBpoweramp would have to change, and would create such a confusion about offsets, not worth it.
                      Spoon
                      www.dbpoweramp.com

                      Comment

                      • funkyblue
                        dBpoweramp Enthusiast

                        • Oct 2007
                        • 62

                        #12
                        Re: AccurateRip - Future Direction

                        I forgot about the offset confusion. But it could still be done, since there will be a new database anyway.

                        Comment

                        • Porcus
                          dBpoweramp Guru

                          • Feb 2007
                          • 792

                          #13
                          Re: AccurateRip - Future Direction

                          Some thoughts, which may be technically nonsense (I am not sure if I have understood correctly how AccurateRip works)

                          1) on backwards compatibility:
                          Originally posted by Spoon
                          [DB1] Work should be done in EAC and dBpoweramp ASAP to correct the flaw. Each program should calculate 2 CRCs, the old one and the new one. Only the new one should be submitted once the fix is implemented. The old CRCs would in time be replaced by the new CRCs in the same database.
                          Is it a good idea to only submit the new one? Wouldn't it be better to submit both, and score the new-CRC up or down according to the accuracy of the old-CRC for the same rip? This would enable you to give a better estimate of accuracy for tracks with one or few new-CRC submissions.

                          I think one should pay attention to how much time it has taken to populate the AR database. The physical CD format is in decline (which on the other hand might increase the need for secure ripping, as those of us who care about sound quality might buy second-hand collections ...). If the number of AR submissions grows exponentially at a high rate, then maybe an AR2 database will be useful in a short time. Just think of it.


                          2) offset issues
                          2a) a "check this file" feature?
                          I know it would be hard to prevent multiple submissions though, but if you are to consider an update of AccurateRip, it might be worth to have this in mind. Ideally one should be able to do so even for rips with incorrect offset: take a folder with n wavs, process k CRCs corresponding to offset (takes time, but on user's computer ...), find one which matches, and use this offset to check the other files in the folder. At least it would help to confirm a lonely AR entry, and if one has to "adjust for offset" in the file, then one knows that it is not the same rip as the one in the database.

                          2b) store offset used?
                          More generally, a "file ripped with offset x" datapoint in the AR base would certainly require some bits, but if two files ripped with different offsets match, then one is safer: they are not multiple entries of the same rip, and AFAIK not from the same drive or model.

                          2c) different pressings?
                          And then: is this a way of dealing with different pressings? Are different pressings usually bit-identical up to different offsets? (Hm, I suspect they would also differ by pressing-specific bit errors, hence a need for secure ripping?)
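The "check this file" idea in 2a could be sketched like this: join the folder's tracks into one PCM image, try a range of sample offsets on one track until a stored checksum matches, then reuse that offset for the rest. CRC32 from `zlib` stands in for whatever checksum the database would hold, and the function name, track layout and offset range are hypothetical.

```python
import random
import zlib

SAMPLE_BYTES = 4  # one 16-bit stereo sample

def find_offset(image, start, end, expected_crc, max_offset=64):
    """Try sample offsets in [-max_offset, max_offset], smallest magnitude first;
    return the first offset whose CRC matches, or None."""
    for offset in sorted(range(-max_offset, max_offset + 1), key=abs):
        lo = start + offset * SAMPLE_BYTES
        hi = end + offset * SAMPLE_BYTES
        if lo < 0 or hi > len(image):
            continue
        if zlib.crc32(image[lo:hi]) == expected_crc:
            return offset
    return None

# Demo data: a "disc", its track boundaries, and the checksums a database would hold.
rng = random.Random(1)
disc = bytes(rng.getrandbits(8) for _ in range(40000))
tracks = [(4000, 16000), (16000, 28000), (28000, 36000)]
database = [zlib.crc32(disc[s:e]) for s, e in tracks]

# A rip made with an uncorrected drive offset: everything sits 12 samples early.
rip = disc[12 * SAMPLE_BYTES:] + bytes(12 * SAMPLE_BYTES)

offset = find_offset(rip, *tracks[1], database[1])
verified = all(
    zlib.crc32(rip[s + offset * SAMPLE_BYTES:e + offset * SAMPLE_BYTES]) == crc
    for (s, e), crc in zip(tracks, database)
)
print(offset, verified)
```

The search cost is one checksum pass per candidate offset, which stays on the user's computer as the post suggests.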



                          A suggestion: Would it be an idea to store users' AR entries and lookups locally? (Voluntarily, for privacy reasons.) Could prevent multi-submissions.


                          Originally posted by Spoon
                          Bear in mind the existing database before construction takes up some 14 GB.
                          Is that much? Not in terms of hard disc cost ...

                          Comment

                          • Fiber

                            • Jan 2008
                            • 3

                            #14
                            Re: AccurateRip - Future Direction

                            Originally posted by Porcus
                            Some thoughts, which may be technically nonsense (I am not sure if I have understood correctly how AccurateRip works)

                            1) on backwards compatibility:

                            Is it a good idea to only submit the new one? Wouldn't it be better to submit both, and score the new-CRC up or down according to the accuracy of the old-CRC for the same rip? This would enable you to give a better estimate of accuracy for tracks with one or few new-CRC submissions.

                            I think one should pay attention to how much time it has taken to populate the AR database. The physical CD format is in decline (which on the other hand might increase the need for secure ripping, as those of us who care about sound quality might buy second-hand collections ...). If the number of AR submissions grows exponentially at a high rate, then maybe an AR2 database will be useful in a short time. Just think of it.


                            2) offset issues
                            2a) a "check this file" feature?
                            I know it would be hard to prevent multiple submissions though, but if you are to consider an update of AccurateRip, it might be worth to have this in mind. Ideally one should be able to do so even for rips with incorrect offset: take a folder with n wavs, process k CRCs corresponding to offset (takes time, but on user's computer ...), find one which matches, and use this offset to check the other files in the folder. At least it would help to confirm a lonely AR entry, and if one has to "adjust for offset" in the file, then one knows that it is not the same rip as the one in the database.

                            2b) store offset used?
                            More generally, a "file ripped with offset x" datapoint in the AR base would certainly require some bits, but if two files ripped with different offsets match, then one is safer: they are not multiple entries of the same rip, and AFAIK not from the same drive or model.

                            2c) different pressings?
                            And then: is this a way of dealing with different pressings? Are different pressings usually bit-identical up to different offsets? (Hm, I suspect they would also differ by pressing-specific bit errors, hence a need for secure ripping?)



                            A suggestion: Would it be an idea to store users' AR entries and lookups locally? (Voluntarily, for privacy reasons.) Could prevent multi-submissions.




                            Is that much? Not in terms of hard disc cost ...
                            It's not the harddisk costs, it's the load of the server.

                            Comment

                            • Porcus
                              dBpoweramp Guru

                              • Feb 2007
                              • 792

                              #15
                              Re: AccurateRip - Future Direction

                              Originally posted by Fiber
                              It's not the harddisk costs, it's the load of the server.
                              That's why I asked

                              Comment
