Re: [Philadelphia-pm] Unicode BOM in input files

James E Keenan on 27 Oct 2020 18:14:39 -0700

From: James E Keenan <jkeenan@pobox.com>
To: philadelphia-pm@pm.org
Subject: Re: [Philadelphia-pm] Unicode BOM in input files
Date: Tue, 27 Oct 2020 16:21:40 -0400
List-archive: <http://mail.pm.org/pipermail/philadelphia-pm/>
Sender: "Philadelphia-pm" <philadelphia-pm-bounces+historian=netisland.net@pm.org>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 10/27/20 3:45 PM, Eric Roode wrote:

Hello fellow mongers!

     Today I opened and read a file.  Advanced stuff, right?  :-)

    open my $fh, '<', 'file.dat';
    $line = <$fh>;
    if ($line =~ /^Your data:/) ....
The problem is that the input file has a Unicode BOM (byte-order mark), so the first three bytes of the string are in fact 0xEF, 0xBB, and 0xBF. So the match fails, even though if you look at the file in an editor, it looks like it begins with "Your data". It took me a fair amount of time to figure this out.

Yes, this is annoying. I have encountered the problem before, in the form of a bug report for my CPAN distro Text-CSV-Hashify:

https://rt.cpan.org/Ticket/Display.html?id=130048

If you read that ticket, you will appreciate some of the complexities in this issue. Unfortunately, I haven't had time to develop a solution -- magical, automagical or otherwise.
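
For the narrow case described in the quoted message (a file known to be UTF-8 with the three-byte BOM 0xEF 0xBB 0xBF), one possible workaround is to decode the file as UTF-8 and strip a leading U+FEFF from the first line. The following is only a minimal sketch under that assumption; the helper name first_line_without_bom is hypothetical and is not taken from Text-CSV-Hashify or from the RT ticket, and it does not address UTF-16/UTF-32 BOMs or the other complexities discussed there.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the first line of a
# UTF-8 file, with a leading BOM removed if one is present.
sub first_line_without_bom {
    my ($path) = @_;

    # Decode as UTF-8 so the bytes EF BB BF arrive as the single
    # character U+FEFF rather than three separate bytes.
    open my $fh, '<:encoding(UTF-8)', $path
        or die "Cannot open $path: $!";
    my $line = <$fh>;
    close $fh;

    return unless defined $line;    # empty file

    # Strip the BOM only at the very start of the string.
    $line =~ s/\A\x{FEFF}//;
    return $line;
}
```

With the BOM removed, a match such as $line =~ /^Your data:/ should behave as the file's appearance in an editor suggests. Files without a BOM pass through unchanged, since the substitution simply does not match.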


Thank you very much.
Jim Keenan
_______________________________________________
Philadelphia-pm mailing list
Philadelphia-pm@pm.org
https://mail.pm.org/mailman/listinfo/philadelphia-pm

Follow-Ups:
- Re: [Philadelphia-pm] Unicode BOM in input files
  - From: John Karr <brainbuz@brainbuz.org>

References:
- [Philadelphia-pm] Unicode BOM in input files
  - From: Eric Roode <sdn.phlpm@mailnull.com>
