Rich Freeman via plug on 31 Aug 2020 16:46:13 -0700



Re: [PLUG] Transcoding MTS vids to something else for frame extraction


On Mon, Aug 31, 2020 at 4:50 PM JP Vossen via plug
<plug@lists.phillylinux.org> wrote:
>
> By the way, the videos are short clips of dolphins just off the coast of
> Kitty Hawk, NC from a recent vacation.  We're not fast enough with a
> camera to capture them so we took vids with the express idea of
> extracting frames for pics.  The camera is a relatively recent Sony
> point-n-shoot.
>

So, you're dealing with a couple of things here.

The only thing you can correct at this point is the processing -
extracting the best still that you can from the data the camera
captured.  I don't have specific tool advice here, though doing
multiple lossy conversions probably isn't a good idea.  You really
want something that will construct the best image it can in memory,
and convert it to jpeg exactly once.
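
For what it's worth, a minimal sketch of that single-conversion idea,
assuming you have ffmpeg installed and Python handy (the filenames and
timestamp are placeholders, not anything from your actual setup):

    # Decode one frame from the .MTS and write it out exactly once,
    # to lossless PNG, so the only lossy step left is the camera's
    # original encode.  Filenames/timestamp are made up.
    import subprocess

    def extract_frame(src="clip.mts", when="00:00:12.500",
                      dst="dolphin.png"):
        subprocess.run(
            ["ffmpeg",
             "-ss", when,        # seek to the moment you want
             "-i", src,
             "-frames:v", "1",   # grab a single frame
             dst],               # .png output = no second lossy encode
            check=True,
        )

    extract_frame()

Going straight to PNG (or TIFF) instead of jpeg sidesteps a second
round of DCT compression entirely; you can always make a jpeg from the
PNG at the very end if you need one.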

I suspect most of your problems were with the original capture, and
there is no way you can fix those now.  I do a fair bit of photography
these days but I'm not really an expert at the video side - I know
enough to be dangerous.  Your loss of image quality when extracting a
still is going to come from two sources: how each frame of the video
is captured, and how the video is encoded.  I'll elaborate on each.

When it comes to capturing a frame of video, you need to understand
that video is generally captured in a way that is intended to render a
good image when it is played back continuously.

The first issue you'll have is rolling shutter - a video camera
basically captures lines from top to bottom in semi-parallel, but not
synchronously across the entire frame.  It will start exposing line
one, then two, then three, and so on, and then after the exposure time
it will read out line one, then two, then three.  So at any given time
many lines are being exposed, with this band of exposure moving down
the frame.  If the scan rate is slow enough it could end up wrapping
right around, so that it is capturing line one of the next frame
before it has finished the last line of the previous frame.  This
causes artifacts when you have panning or rapidly moving objects
in-frame, like video of an aircraft propeller which looks bent.
Better cameras can scan the sensor faster, so that more of it can be
run in parallel to finish the entire frame in time, and you get less
rolling shutter.
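
To put rough numbers on the skew (these are assumed values for
illustration, not specs for any real camera):

    # Back-of-envelope rolling-shutter skew.  Assume a ~25 ms
    # top-to-bottom sensor readout and a dolphin crossing a 1920 px
    # frame in 2 seconds.  Both numbers are made up.
    readout_s = 0.025
    frame_width_px = 1920
    crossing_time_s = 2.0
    speed_px_per_s = frame_width_px / crossing_time_s   # 960 px/s
    skew_px = speed_px_per_s * readout_s
    print(f"top-to-bottom skew: {skew_px:.0f} px")      # ~24 px

A couple dozen pixels of lean won't ruin a dolphin shot, but the same
math on a propeller tip is how you get those bent blades.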

The next capture issue is interlacing.  I have no idea if your video
was actually interlaced but if it was then that will cause obvious
artifacts when you want to capture only one "frame" - which is two
separate fields captured at slightly different times.  If you're
panning, they're going to give you that comb-like pattern, with every
other line shifted.  Interlacing isn't so common these days, but maybe
cheaper cameras still do it.
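
If the clips do turn out to be interlaced (ffprobe will tell you), one
common FOSS fix is to run ffmpeg's yadif deinterlacer before pulling
the still.  A hedged sketch, with the filenames again made up:

    # Deinterlace first, then extract, so the comb artifacts get
    # interpolated away instead of baked into the still.
    import subprocess

    subprocess.run(
        ["ffmpeg", "-i", "clip.mts",
         "-vf", "yadif=1",    # deinterlace; one frame per field
         "-frames:v", "1",
         "deinterlaced.png"],
        check=True,
    )

yadif=1 outputs one frame per field, doubling the frame rate, which
also doubles your chances of catching the dolphin mid-leap.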

Then you get issues with how the pixels themselves are sampled.  Many
cameras record video at a lower resolution than stills.  A 1080p frame
is only about 2 megapixels - not an impressive spec by today's
standards.  So, you're not going to get more resolution than that from
a video frame.  Then there can be issues around how color is sampled
relative to luminance (chroma subsampling), and in general the cheaper
the camera the worse this is.  Then there is bit depth - cheaper
cameras are probably only going to sample 8 bits, while better ones
will do more, which matters if you're trying to recover highlights and
shadows.
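
The arithmetic on that, including what 4:2:0 chroma subsampling (the
consumer-video norm - I'm assuming that's what your Sony records)
costs you in color resolution:

    # 1080p luma resolution vs. its 4:2:0 chroma resolution.
    w, h = 1920, 1080
    print(f"luma:   {w * h / 1e6:.1f} MP")              # ~2.1 MP
    # 4:2:0 keeps one color sample per 2x2 block of pixels:
    print(f"chroma: {(w//2) * (h//2) / 1e6:.2f} MP")    # ~0.52 MP

So the color information in a frame is roughly a half-megapixel image
stretched over the full 2 MP of luminance.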

Now we'll talk about encoding.  The first issue there that you touched
on is inter-frame compression.  First, every frame is compressed with
a discrete cosine transform, like a jpeg, which of course reduces
quality (and the lower the bitrate the worse this gets).  Then the
stream is divided
into I/P/B frames.  I frames or keyframes are basically just jpegs -
if you extract one you get a clean jpeg (within the limits above
around how the frame was captured).  P frames are basically diffs vs
one frame, and B frames are bidirectional diffs vs two frames, one
before and one after.  The
software will do the best it can to encode this but in general the
further you are from a keyframe the worse a still will look.  A
higher-end camera will record only I frames or raw frames, at a very
high bitrate.  rtjpeg is basically all I frames.  Storing that
obviously requires a very large memory card, and a fast one,
especially for 4k video or high frame rates (the newest memory cards
are basically
NVMe drives, and priced like them).
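
One practical consequence: pull your stills at the I frames where you
can.  ffmpeg's decoder can be told to skip everything else - a sketch,
with the output pattern made up:

    # Extract only the keyframes (I frames), which decode cleanly
    # without accumulated prediction error.
    import subprocess

    subprocess.run(
        ["ffmpeg",
         "-skip_frame", "nokey",   # decoder drops non-keyframes
         "-i", "clip.mts",
         "-vsync", "vfr",          # keep the irregular timestamps
         "key%04d.png"],
        check=True,
    )

On a typical consumer encode you'll only get an I frame every handful
of frames to every second or two, so this won't land exactly on the
moment you want, but whatever it gives you will be the cleanest frames
in the file.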

Your fairly cheap point/shoot camera is probably checking all the
boxes above in terms of things that make the camera less expensive to
build, and which greatly reduce the quality of the video.

Really though I'm not sure what alternatives you have.  If the camera
doesn't have a mechanical shutter (clicking sound) then it will have
rolling shutter even for stills, and of course the latency/etc is such
that timing a shot is probably not going to be great.  Your best bet
is if the camera has a burst mode of some kind - some point/shoot
cameras will basically fill the buffer with a series of jpgs captured
at a higher than normal rate when you hit the button.  This will
probably get you much better quality than trying to pull frames from
video.  Alternatively you could get a better camera - either a still
camera that has high fps and a mechanical shutter, or a camera that
records higher-quality video, but you're going to spend closer to $1k
or more either way (something that does really good 4k video, like
the A7S III, is over $3k).

Unfortunately in photography you tend to get what you pay for.

You might still get a somewhat better image out of your existing
camera - I'm just not sure what the best FOSS solution for this is.  I
don't deal with much video.
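
One heuristic that might help, though I'm not claiming it's the best
FOSS answer: decode all the frames around the moment you want and keep
the one with the highest variance-of-Laplacian sharpness score.  A
sketch using OpenCV (my assumption - any decoder that hands you raw
frames would do):

    # Scan a clip, score each frame's sharpness by the variance of
    # its Laplacian (higher = sharper), and save the winner.
    # The filename is a placeholder.
    import cv2

    cap = cv2.VideoCapture("clip.mts")
    best_score, best_frame = -1.0, None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        score = cv2.Laplacian(gray, cv2.CV_64F).var()
        if score > best_score:
            best_score, best_frame = score, frame
    cap.release()
    if best_frame is not None:
        cv2.imwrite("sharpest.png", best_frame)

It won't fix rolling shutter or compression damage, but it does a
decent job of picking the least-blurred frame out of a run of motion.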

-- 
Rich

___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug