James E Keenan on 27 Oct 2020 18:14:39 -0700
Re: [Philadelphia-pm] Unicode BOM in input files
On 10/27/20 3:45 PM, Eric Roode wrote:
Hello fellow mongers! Today I opened and read a file. Advanced stuff, right? :-)

    open my $fh, '<', 'file.dat';
    $line = <$fh>;
    if ($line =~ /^Your data:/) ....

The problem is that the input file has a Unicode BOM (byte-order mark), so the first three bytes of the string are in fact 0xEF, 0xBB, and 0xBF. So the match fails, even though if you look at the file in an editor, it looks like it begins with "Your data". It took me a fair amount of time to figure this out.

Yes, this is annoying. I have encountered the problem before, in the form of a bug report for my CPAN distro Text-CSV-Hashify:
https://rt.cpan.org/Ticket/Display.html?id=130048

If you read that ticket, you will appreciate some of the complexities in this issue. Unfortunately, I haven't had time to develop a solution -- magical, automagical or otherwise.
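
For the simple case described in the quoted message, one common workaround is to decode the file as UTF-8 and strip a leading U+FEFF by hand. The following is only a minimal sketch under assumptions: the helper name first_line_without_bom is hypothetical, the file is assumed to be UTF-8, and it does not address the other encodings or the CSV-specific complications raised in the ticket.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the first line of a
# UTF-8 file with any leading byte-order mark removed.
sub first_line_without_bom {
    my ($path) = @_;

    # Decode as UTF-8 so the three BOM bytes (0xEF 0xBB 0xBF) arrive
    # as the single character U+FEFF.
    open my $fh, '<:encoding(UTF-8)', $path
        or die "Cannot open $path: $!";
    my $line = <$fh>;
    close $fh;

    return unless defined $line;    # empty file

    # The encoding layer does not drop the BOM itself, so remove it here.
    $line =~ s/\A\x{FEFF}//;
    return $line;
}

# Usage: pass a file name on the command line, e.g. file.dat
if (@ARGV) {
    my $first = first_line_without_bom($ARGV[0]);
    print "matched\n" if defined $first && $first =~ /^Your data:/;
}
```

With the BOM removed, the anchored match against /^Your data:/ behaves the way the file looks in an editor. A file without a BOM passes through unchanged, since the substitution only fires when U+FEFF is the very first character.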
Thank you very much.
Jim Keenan