Bi-directional multi-lingual file name weirdness

  • RadicalDad

    • Jan 2011
    • 30

    Bi-directional multi-lingual file name weirdness

    I just ripped a Hebrew language educational CD and the file name results were unexpected and problematic to fix. I'm wondering what Spoon may know about how ripped filenames are generated. The CD was not found in any of the metadata databases (nor was it in AccurateRip) so I typed in the track titles - in Hebrew, of course. Unfortunately, the CD has 8 tracks, but the cover only lists 7 tracks. Not knowing what the title of the 8th track should be (or even which track is the "extra" for which I don't have the title), I entered "Unknown" (in English) as the title of the last track.

    For some reason, the Ripper was not my friend this evening. It decided that all but one of the tracks had an "unknown artist" despite the fact that I had typed in the artist name (in Hebrew). And the last track ripped as "Track 8" rather than the name I gave it - Unknown. The Ripper didn't act up like that on the second CD of the set, though I did have two unknown tracks (7 tracks, but only 5 titles on the printed liner material.)

    I have no idea what happened on the first rip, but since my CD drive had trouble reading a few of the tracks, I decided it would be faster to just rename the files and fix the metadata manually. That's when I ran into trouble.

    Normally, my file names look like this: Track* Artist - Title.flac. The last time I ripped a Hebrew music CD was years ago - version R.14 of the software and probably Windows 7. (Now I'm on R.16 and Win 10.) These years-old rips do not exhibit the problems I'm about to illustrate.

    Problem 1: The track file name is out of order for tracks that are all in Hebrew. Instead of 01 ArtistName - TrackName, I've got 01 TrackName - ArtistName.

    Problem 2: It is nearly impossible to fix the mixed Hebrew-English track filenames because there are LTR (left-to-right) and RTL (right-to-left) control characters embedded in the file name text. Those control characters play havoc with the cursor movement keys as well as the delete and backspace keys. One cannot delete the control characters using those keys, the next letter to be deleted is unpredictable when crossing the Hebrew-English boundary, and often that boundary cannot be crossed at all. Mixed Hebrew-English track names from the years-old rips don't have embedded control characters.
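For anyone wanting to confirm that such characters are present, here is a minimal Python sketch that lists and strips the Unicode bidirectional formatting characters in a name. The sample file name and the helper names `find_bidi_controls` and `strip_bidi_controls` are made up for illustration; nothing here is taken from the ripper itself.

```python
import unicodedata

# Unicode bidirectional formatting characters. They are invisible in most
# UIs but still occupy a position in the string.
BIDI_CONTROLS = frozenset(
    "\u200e\u200f"                  # LRM, RLM (marks)
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"      # LRI, RLI, FSI, PDI (isolates)
    "\u061c"                        # ALM (Arabic letter mark)
)


def find_bidi_controls(name):
    """Return (index, code point, Unicode name) for each bidi control in name."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(name)
        if ch in BIDI_CONTROLS
    ]


def strip_bidi_controls(name):
    """Return name with every bidi control character removed."""
    return "".join(ch for ch in name if ch not in BIDI_CONTROLS)


# Hypothetical mixed Hebrew-English file name with an RLM before the Hebrew
# word and an LRM after it.
sample = "01 \u200f\u05e9\u05dc\u05d5\u05dd\u200e - Unknown.flac"

for index, code_point, label in find_bidi_controls(sample):
    print(index, code_point, label)

print(strip_bidi_controls(sample) == sample)
```

Stripping the characters changes only how the name is stored, not the Hebrew or English letters themselves; the display order is then left entirely to the default bidirectional algorithm.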

    As an aside, bi-directional text actually worked better in Windows XP than it does in Windows 10. Things have been getting steadily worse at Microsoft with regard to Arabic, Hebrew, Pashto, and other RTL languages since they fired their best i18n guru several years ago and never hired a replacement, but I digress. (And don't pile on, Apple fanboys - RTL support in OS X is markedly worse. Don't get me started.)

    Anyway, I'm wondering how you feed the file name data to the Windows OS and if there is any way you can change it so that this problem doesn't occur. The control characters are not required by the Unicode standard (even the latest one) and it seems that the inclusion of the control characters is responsible for most, if not all, the problems I have cited. I could cite more information about how I know the control characters are there, but I'm afraid your eyes are already glazing over. If this isn't something you can fix, perhaps you can tell me the OS calls you make that write the file names, and whether they have changed from R.14 to R.16. At least I'd have something to take to MS in hopes they actually will listen to the problem, realize that this worked properly in the past, and maybe fix it. (Ha!) And what in tarnation happened with that first rip?

    Thanks.
  • Spoon
    Administrator
    • Apr 2002
    • 44509

    #2
    Re: Bi-directional multi-lingual file name weirdness

    The filename calls have not changed; CreateFileW is what is used. If you did not type the control characters, then it is likely the OS trying to be smart and adding them itself. Perhaps others have had this on Windows 10; try a general search as a starting point.
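One rough way to test whether file creation itself alters a name is a round trip: create a file with a known name, list the folder, and compare code points. The sketch below does that in Python; it assumes Python's `open()` reaches the same wide-character Windows file API, which is an assumption and not something confirmed in this thread. The file name and the helpers `roundtrip_name` and `code_points` are made up for illustration.

```python
import os
import tempfile


def roundtrip_name(name):
    """Create an empty file called name in a temporary folder and return
    the name the OS reports when the folder is listed."""
    with tempfile.TemporaryDirectory() as folder:
        with open(os.path.join(folder, name), "w", encoding="utf-8"):
            pass
        listed = os.listdir(folder)
    return listed[0]


def code_points(text):
    """Render text as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Hypothetical mixed Hebrew-English name, once without and once with a
# leading RIGHT-TO-LEFT MARK (U+200F).
plain = "01 \u05e9\u05dc\u05d5\u05dd - Unknown.flac"

for label, name in (("plain", plain), ("with RLM", "\u200f" + plain)):
    stored = roundtrip_name(name)
    print(label, "unchanged" if stored == name else "CHANGED")
    print("  sent:  ", code_points(name))
    print("  stored:", code_points(stored))
```

If both names come back unchanged, the control characters were most likely already in the text before the file was created, for example added by the text input control where the title was typed.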
    Spoon
    www.dbpoweramp.com
