Python scripts to improve my CD Ripper/mp3tag workflow

  • philiplu
    • Jan 2010
    • 9

    Python scripts to improve my CD Ripper/mp3tag workflow

    After a failed attempt at ripping my CD collection a few years ago, I recently tried again. This time, I've written about 1,500 lines of Python 3 code to make the job easier. These scripts analyze the tags across an album's worth of FLAC tracks, look for errors that need correcting, and quickly rename the files when tags get changed. If you're interested, you can find them at https://github.com/plucid/dBpa-scripts, along with a long README describing the scripts. Comments or suggestions welcome.

    Using the Run External DSP, the error-checking script gets run automatically after each CD is ripped. That tells me what cleanup I need to do in mp3tag. Here's an example of the output after ripping a random CD, after introducing some intentional problems with the tags in CD Ripper first (like turning on the compilation flag) and omitting some of the tracks so there's something to see:
    Code:
    Renaming 'cuesheet.cue' to 'Cowboy Junkies - The Trinity Session.cue'
    
    Checking 'D:\CDRip\Cowboy Junkies\The Trinity Session (1988)'
      Missing Tracks: 7, 8, 11
      Track 6 has duplicate value in tag 'composer': Margo Timmins, Michael Timmins, Michael Timmins
      Incompatible values for tags 'artist' and 'artist sort':
        Track 4: Tag 'artist sort' not found
      Incompatible values for tags 'composer' and 'composersort':
        Track 6: 'Margo Timmins; Michael Timmins; Michael Timmins' versus 'Timmins, Margo; Timmins, Michael'
        Track 12: 'Alan Block; Don Hecht' versus 'Block, Alan; Hecht, Donald'
      For this compilation, AlbumArtist should be 'Various Artists', not 'Cowboy Junkies'
    
    Processed 1 album, 1 disc, 9 tracks - 1 album with issues
    
    Press Enter when ready...
    Here's the current list of tests performed:
    • Test if the DiscNumber or TrackNumber tags are missing or malformed, or if the same Disc/Track settings are found in multiple files.
    • Test that the DiscTotal tag is identical across all files, and that all disc numbers from 1 to DiscTotal, and no others, are found in the FLAC files.
    • For multi-disc albums, test that tags which should be identical across discs are so.
    • Test non-FLAC files. Make sure the cover file folder.jpg exists, and that the cuesheet and extraction log files are present and have the expected names.
    • Test if obsolete forms of certain tags are present. CD Ripper used to use the tags Organization, TotalDiscs, and TotalTracks instead of the current Label, DiscTotal, and TrackTotal.
    • Test that the CD Ripper profile setting is Classical if and only if the Genre tag is also Classical.
    • Test that AccurateRip was successful in all tracks.
    • Test that tags which should be present in all tracks are so.
    • Test that no unknown tags were found.
    • Test that the only multivalued tags found were tags that permit multivalues (e.g. Artist or Soloists).
    • Test that the TrackTotal tag is identical across all files for a single disc, and that all track numbers from 1 to TrackTotal, and no others, are found in the disc's FLAC files.
    • Test that tags which should be identical across tracks of a single disc are so.
    • Test that tags which should have unique values in each track are so.
    • Test that multivalued tags don't repeat one of those values (e.g. an Artist tag of John Doe; John Doe).
    • Test that the regular and sorted version of paired tags (e.g. Artist and Artist Sort) have the same values, ignoring ordering.
    • Test if certain tags start with a leading 'The ', e.g. The Beatles should instead be Beatles, The.
    • For non-classical CDs where the Artist tag varies across tracks, the AlbumArtist tag should either be found in the Artist tag, or the AlbumArtist should be Various Artists, Soundtrack, or TV Theme.
    • If a disc is marked as a compilation (the Compilation tag exists), make sure the compilation status makes sense. For classical CDs, the 'Composer' tag should not be identical across tracks. If the Genre is Soundtrack, the AlbumArtist should be as well. Otherwise, the AlbumArtist should be either Various Artists or TV Theme.
    • For classical discs, test that the Orchestra tag is present if it looks like it should, because either an artist's name includes something like 'Orchestra' or 'Symphony', or there's a Conductor tag.
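
    Two of the tests above (missing track numbers, and repeated values inside a multivalued tag) could be sketched roughly as follows. This is a minimal illustration, not the code from the repository: the function names are hypothetical, and it assumes the track numbers and tag values have already been read out of the FLAC files into plain Python lists.

```python
def missing_tracks(track_numbers, track_total):
    """Return the track numbers in 1..track_total that were not found, sorted."""
    return sorted(set(range(1, track_total + 1)) - set(track_numbers))


def duplicate_values(values):
    """Return values that occur more than once in a multivalued tag (first-seen order)."""
    seen, dups = set(), []
    for value in values:
        if value in seen and value not in dups:
            dups.append(value)
        seen.add(value)
    return dups


# Hypothetical data, loosely modelled on the example output above.
found = [1, 2, 3, 4, 5, 6, 9, 10, 12]
composers = ["Margo Timmins", "Michael Timmins", "Michael Timmins"]

print("Missing Tracks:", ", ".join(str(n) for n in missing_tracks(found, 12)))
print("Duplicate values:", duplicate_values(composers))
```

    Working on sets keeps the missing-track test independent of file order, so it gives the same answer however the files happen to be listed on disk.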


    There's also a script that duplicates the functionality of the Arrange Audio utility codec for Batch Converter, but it's lighter-weight since it's run from the command line, and it can move/copy all the files in an album folder, not just the track files and folder.jpg. I use it for renaming files and the album folder from within mp3tag, and for moving the cleaned-up files from my raw rip folders up to my NAS.
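
    For anyone curious what "copy all the files in an album folder" might look like in Python, here is a minimal sketch using only the standard library. It is not the RearrangeAudioFiles script itself; `copy_album` and the file names in the demo are made up for illustration, and the demo runs in a temporary directory so nothing real is touched.

```python
import shutil
import tempfile
from pathlib import Path


def copy_album(src_dir, dest_root):
    """Copy every file in src_dir (audio or not) into dest_root/<album folder name>."""
    src = Path(src_dir)
    dest = Path(dest_root) / src.name
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(src.iterdir()):
        if item.is_file():
            shutil.copy2(item, dest / item.name)  # copy2 also preserves timestamps
            copied.append(item.name)
    return copied


# Demo on a throwaway directory tree with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    album = Path(tmp) / "raw" / "The Trinity Session (1988)"
    album.mkdir(parents=True)
    for name in ("01 - Track.flac", "folder.jpg", "album.cue", "album.log"):
        (album / name).write_text("placeholder")

    copied = copy_album(album, Path(tmp) / "nas")
    dest_names = sorted(p.name for p in (Path(tmp) / "nas" / album.name).iterdir())
    print(copied)
```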

    These scripts assume my current naming and tagging scheme, so they'll need tweaking for use elsewhere. But if you're comfortable in Python, that should be fairly easy. This was my first real Python project, and it went together remarkably quickly.
  • BrodyBoy
    dBpoweramp Guru
    • Sep 2011
    • 754

    #2
    Re: Python scripts to improve my CD Ripper/mp3tag workflow

    Impressive work! I'm a little unclear, however, how it makes the job easier. It appears that you're replicating a lot of the automation that can be accomplished in dBp and/or mp3tag, just doing it via an external script. Unfortunately, none of these programs or scripts can really address the most basic metadata "error"....that the data obtained from the online databases is (too) often incomplete and inaccurate. That still requires human intervention and remains the most time-consuming aspect of ripping CDs.

    • mville
      dBpoweramp Guru
      • Dec 2008
      • 4015

      #3
      Re: Python scripts to improve my CD Ripper/mp3tag workflow

      Originally posted by philiplu
      After a failed attempt at ripping my CD collection a few years ago, I recently tried again. This time, I've written about 1,500 lines of Python 3 code to make the job easier.
      Why re-invent the wheel? If you are using MP3Tag to clean up your tags, why not use a combination of MP3Tag actions and convert menu commands to do the same job? I dare say this would have saved a lot of time and been much easier.

      ... BrodyBoy, you are just that bit quicker off the mark than I
      Last edited by mville; 06-06-2015, 01:19 AM. Reason: message to BrodyBoy

      • philiplu
        • Jan 2010
        • 9

        #4
        Re: Python scripts to improve my CD Ripper/mp3tag workflow

        Originally posted by BrodyBoy
        Impressive work! I'm a little unclear, however, how it makes the job easier. It appears that you're replicating a lot of the automation that can be accomplished in dBp and/or mp3tag, just doing it via an external script. Unfortunately, none of these programs or scripts can really address the most basic metadata "error"....that the data obtained from the online databases is (too) often incomplete and inaccurate. That still requires human intervention and remains the most time-consuming aspect of ripping CDs.
        The meat of what I wrote is CheckFlacTags, which does a bunch of checks for tag consistency. That's what saves me time. I like to make sure that tags are consistent across tracks, and between related tags within tracks. For instance, I can use the metadata screen in CD Ripper and find that one of the DBs has the composer tags set up already, which is great. But then I want to make sure that the ComposerSort tags are compatible with those. The script checks that, and if it's OK, that's one less thing for me to waste time checking manually. I wrote this script because I already have to remember a bunch of steps with each CD ripped; if there are steps which can be automated to remove some of the scutwork, all the better. Saves more time for the steps that do require a human (like verifying the titles/artists/composers in liner notes against what's in the DBs).
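
        To make the Composer/ComposerSort comparison concrete, here is a rough sketch of one way such a check could work. It is not lifted from CheckFlacTags: the function names are hypothetical, and it assumes multiple values are joined with ';' and that sort names use the 'Last, First' form seen in the example output earlier in the thread.

```python
from collections import Counter


def unsort_name(sort_name):
    """Turn 'Timmins, Margo' into 'Margo Timmins'; names without a comma are left alone."""
    last, sep, first = sort_name.partition(",")
    return f"{first.strip()} {last.strip()}" if sep else sort_name.strip()


def sort_tag_matches(plain, sort):
    """True if both tags list the same names the same number of times, ignoring order."""
    plain_names = Counter(name.strip() for name in plain.split(";"))
    sort_names = Counter(unsort_name(name) for name in sort.split(";"))
    return plain_names == sort_names


print(sort_tag_matches("Margo Timmins; Michael Timmins",
                       "Timmins, Michael; Timmins, Margo"))
print(sort_tag_matches("Alan Block; Don Hecht",
                       "Block, Alan; Hecht, Donald"))
```

        Comparing with a Counter rather than a set means a repeated name on one side only (like the doubled 'Michael Timmins' in the example output) is also reported as a mismatch.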

        I'm not using these scripts to clean up my tags; I'm using them to advise me which tags need cleaning, once I've already used the DB lookups and the like that dBpa and mp3tag make available. I looked into the scripting within mp3tag, but it's pretty rudimentary - no loops, apparently no way to programmatically compare across multiple track files. The nice thing for me about these scripts is that it's easy to add some new test - generally takes only a few minutes. And frankly, I was looking for a fun project to learn Python with, and this fit the bill.
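
        As an illustration of why adding a test can be a few minutes' work in Python, here is one possible structure: each check is a small function registered in a list, and a runner collects whatever messages the checks produce. This layout is an assumption for the sake of the example, not a description of how the actual scripts are organized; the `check` decorator, the check names, and the sample tag data are all hypothetical.

```python
CHECKS = []


def check(func):
    """Register func as a check; adding a new test is just one more decorated function."""
    CHECKS.append(func)
    return func


@check
def sort_tag_present(tracks):
    for number, tags in sorted(tracks.items()):
        if "artist sort" not in tags:
            yield f"Track {number}: Tag 'artist sort' not found"


@check
def no_leading_the(tracks):
    for number, tags in sorted(tracks.items()):
        if tags.get("artist sort", "").startswith("The "):
            yield f"Track {number}: 'artist sort' starts with 'The '"


def run_checks(tracks):
    """Run every registered check over {track number: {tag name: value}}."""
    return [message for func in CHECKS for message in func(tracks)]


# Hypothetical tag data for two tracks.
tracks = {
    4: {"artist": "Cowboy Junkies"},
    5: {"artist": "The Beatles", "artist sort": "The Beatles"},
}
for message in run_checks(tracks):
    print(message)
```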

        Once I had the tag info read in, I decided to write the other big script RearrangeAudioFiles, because I wanted more flexibility than I got from dBpa. The main problem was that the Arrange Audio utility codec, with the Folder Preserve DSP, wouldn't copy everything in the album folder, just certain files. And it wouldn't rename the audio extraction log and cuesheet the way I wanted. So it was easier for me to write that script as well.
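
        The renaming shown in the example output ('cuesheet.cue' to 'Cowboy Junkies - The Trinity Session.cue') could be computed along these lines. This is only a sketch: `safe_name` and `renamed_sidecar` are hypothetical helpers, the 'Artist - Album' pattern is taken from that one example line, and applying the same pattern to the extraction log is an assumption.

```python
import re
from pathlib import Path


def safe_name(text):
    """Replace characters that Windows does not allow in file names with '_'."""
    return re.sub(r'[<>:"/\\|?*]', "_", text)


def renamed_sidecar(path, album_artist, album):
    """Return the new path for a cuesheet or log file, keeping its folder and extension."""
    p = Path(path)
    return p.with_name(f"{safe_name(album_artist)} - {safe_name(album)}{p.suffix}")


target = renamed_sidecar("cuesheet.cue", "Cowboy Junkies", "The Trinity Session")
print(f"Renaming 'cuesheet.cue' to '{target.name}'")
```

        The sketch only computes the new name; the actual rename would then be a call such as `Path("cuesheet.cue").rename(target)`.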

        Now that I've spent the time writing this (really, it was only a couple weeks of free-time work), I figured I might as well throw it out somewhere. Who knows, maybe someone will find it useful.

        • BrodyBoy
          dBpoweramp Guru
          • Sep 2011
          • 754

          #5
          Re: Python scripts to improve my CD Ripper/mp3tag workflow

          Originally posted by philiplu
          The meat of what I wrote is CheckFlacTags, which does a bunch of checks for tag consistency. That's what saves me time. I like to make sure that tags are consistent across tracks, and between related tags within tracks. For instance, I can use the metadata screen in CD Ripper and find that one of the DBs has the composer tags set up already, which is great. But then I want to make sure that the ComposerSort tags are compatible with those. The script checks that, and if it's OK, that's one less thing for me to waste time checking manually. I wrote this script because I already have to remember a bunch of steps with each CD ripped; if there are steps which can be automated to remove some of the scutwork, all the better. Saves more time for the steps that do require a human (like verifying the titles/artists/composers in liner notes against what's in the DBs).
          I hear ya....I'm pretty obsessive about imposing thoroughness and consistency in my tags, too. Once all the actual data is available, I want things like the -SORT tags set, both forms of the ALBUMARTIST tag, and a bunch of other details that are specific to album type or genre. I've come to think of this part as "tag grooming!"

          I'm not using these scripts to clean up my tags; I'm using them to advise me which tags need cleaning, once I've already used the DB lookups and the like that dBpa and mp3tag make available. I looked into the scripting within mp3tag, but it's pretty rudimentary - no loops, apparently no way to programmatically compare across multiple track files. The nice thing for me about these scripts is that it's easy to add some new test - generally takes only a few minutes. And frankly, I was looking for a fun project to learn Python with, and this fit the bill.
          I'm kind of surprised you found that to be the case with mp3tag.....having used it for years, I'm still discovering new functions and I'm continually impressed with how powerful it is in terms of precisely the kinds of tasks we're talking about. I've developed elaborate action groups (macros) for each of my album types (Compilations, Operas, Soundtracks, etc.) that perform all these grooming tasks with a single click. I don't really find it necessary to search or check for tag issues, as you described above, since these macros just "impose" the consistency I want. No need to check anything about COMPOSERSORT, for example, since I know my macro will take the COMPOSER data, format it the way I want, and write the proper COMPOSERSORT tag. Doesn't matter what was there before. But again, the time-consuming part is the thing no program can automate: ensuring accurate COMPOSER data to begin with.

          Now that I've spent the time writing this (really, it was only a couple weeks of free-time work), I figured I might as well throw it out somewhere. Who knows, maybe someone will find it useful.
          Absolutely! Like I said, impressive work, especially as your first Python project! I've always wanted to learn Python and play around with it a bit, but just haven't found a project to motivate me. Looks like you had fun with it....I'll have to be sure to give it a try one of these days.
