Objective comparison of different Hacks
  • Is there a technique that would show definite differences between the hacks?

    So far everyone is just saying "to my eye, X hack looks great/better/worse than the Y hack"

    But how do we measure the actual differences?

  • I started something like this, but now I don't have time to make comparisons and no one followed up... http://www.personal-view.com/talks/discussion/2812/only-settings-comparison-no-small-talk#Item_8

  • A group of us in L.A. have been testing this. Shooting part one (outdoor, much less controlled) was completed a couple weeks ago and we've been getting together the footage from that (it was a short shoot and everyone had to leave right after so it's been a challenge to acquire everyone's footage). We are also looking at doing a more tightly controlled indoor test for part 2.

  • The best way I've found to see the details in video files is to display them at 400% magnification and scrub through the frames in a video editor. (Make sure the video frames are rendered at full resolution without any type of smoothing or filtering.) At this level of magnification, you can clearly see rectangular macroblocking artifacts and coarse quantization from inadequate bitrates. I use Adobe After Effects (which can display individual RGB channels) and Premiere Pro for this purpose, though I'm sure other NLEs can do this as well.
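
    If you don't have an NLE handy, something along these lines works too. This is only a rough sketch in Python, assuming ffmpeg and Pillow are installed; the clip name, timestamp, and output file names are placeholders:

    ```python
    # Rough sketch: grab one frame with ffmpeg, enlarge it 4x with nearest-neighbor
    # resampling (i.e. no smoothing or filtering), and save a single channel.
    # "clip.MTS", the timestamp, and the output names are placeholders.
    import subprocess
    from PIL import Image

    subprocess.run(
        ["ffmpeg", "-y", "-ss", "00:00:05", "-i", "clip.MTS", "-frames:v", "1", "frame.png"],
        check=True,
    )

    frame = Image.open("frame.png").convert("RGB")
    zoomed = frame.resize((frame.width * 4, frame.height * 4), resample=Image.NEAREST)
    zoomed.save("frame_400pct.png")
    zoomed.getchannel("G").save("frame_400pct_green.png")  # inspect one channel at a time
    ```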

  • I can't wait to see the results from the LA shoot!

    A test should be controlled and repeatable. On the other hand, if you're there shooting the same thing side by side with others using the same camera, real-life shooting situations probably wouldn't be more controlled than that, so it will be interesting to see how different the video from the LA shoot is. If it's hard to tell the A/B results apart, that would suggest the various settings have matured, even if there is a little more performance that can still be teased out.

    I don't know how the LA tests were shot, but it would be cool if there was some high motion, some very shallow DOF stuff, some low light; basically, throw the extreme cases at it and see whether there's a great all-around setting or whether each one is highly tuned to one area.

  • It seems some guys don't know the meaning of the word "objective"...

    There are objective ways of analyzing encoder performance: looking at quantization parameters, comparing the unencoded input to the output, calculating the signal-to-noise ratio, or estimating the signal-to-noise ratio.
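
    For instance, the input/output comparison boils down to something like PSNR. Here is a minimal sketch in Python (assuming NumPy and Pillow, and that a matching pair of frames has already been exported as PNGs; the file names are placeholders):

    ```python
    # Minimal PSNR sketch: compare an unencoded reference frame with the same
    # frame after encoding. File names are placeholders; frames must match in size.
    import numpy as np
    from PIL import Image

    def psnr(reference_path, encoded_path):
        ref = np.asarray(Image.open(reference_path).convert("RGB"), dtype=np.float64)
        enc = np.asarray(Image.open(encoded_path).convert("RGB"), dtype=np.float64)
        mse = np.mean((ref - enc) ** 2)
        if mse == 0:
            return float("inf")  # identical frames
        return 10.0 * np.log10(255.0 ** 2 / mse)

    print(psnr("reference_frame.png", "encoded_frame.png"), "dB")
    ```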

  • @balazer: has it been done yet with any GH2 hacks? If yes, links to the results, please?

    If not, how can we do these measurements?

  • Bitrate is still king. If this were not so, a lot of high-end codecs such as ProRes wouldn't need, for example, 147 Mbps for a normal transcode setting. On the GH2, good employment of Q makes good use of the required bitrate alongside decent setup parameters. Lower-bitrate recordings can look great, relying on predictive sampling of the reference frames, but these subtle differences which look great to the human eye on first acquisition may suffer in the edit and the grading. Many of the top testers go through many hours (or days) to achieve new settings. However, what looks good to one looks not so good to others. Horses for courses.

  • @driftwood So what do you think are the top 2 hacks at this time, judging by the following criteria:

    • Highest quality image, including motion, in 1080p24, 1080p30, 720p60 (no perceptible macroblocks, good latitude, etc.)
    • Low video noise at normal ISOs, including in shadows
    • Lowest possible noise at high ISOs
    • Spans on 64GB 95MB/s SanDisk cards
    • Stable, reliable functionality - no freeze-ups, errors, etc. - on Class 10 cards
  • I have settings which push nighttime recordings, daytime setups, closeups and wide shots with detail. Each has to be tempered for the GH2, as no single setting covers the majority well enough. All-rounders? Canis, Sedna and Mysteron are very good, BK's GOLGOP stuff is good with its decent prediction, and there are others... but no single setting will satisfy everybody.

  • @driftwood - what do you think of Sanity 4.1, or v5 ?

  • @mo7ies I was just reading some of your comments in the Sanity thread and I want to point out some things that may be helpful as you seek more information (and things that I am keeping in mind as we plan stage 2 of the L.A. comparisons).

    1) There is no entirely objective ranking system for image quality characteristics. All of them will be subjective to some extent because each viewer has differing aesthetics and priorities.

    2) As regards "low noise", specifically, keep in mind that we are generally talking about different approaches to dealing with the same amount of noise, not actually ways of reducing the noise the sensor provides. This makes the question exceedingly subjective. Are we looking for settings that do so much smoothing and noise reduction in camera that we cannot see the noise that is there, or are we looking for the ones that give us the most detail and the most control of the noise in post? Do we want the noise to stay as smooth as possible and update infrequently, or do we want it to be very fine and update as frequently as possible? The settings that are often praised for their low-light performance are most often ones that give the finest rendering of noise and of details that are sometimes reduced by encoding quirks in other settings. Here is an example that illustrates that.

    Please read the article referenced on the Vimeo page before asking any questions about the comparison.

    Password Driftwood1

    3) Because of "1)", no "objective" table can be compiled, only subjective ones. However, the images or videos with which to compare can be made available for people to make their own comparisons (objective data, though not always properly controlled) and subjective commentary or rankings could optionally be provided.

    4) We need more comparisons and it would be helpful to have them consolidated. I am doing everything I can (within the time restrictions of my other work) to provide additional ones when I can, and to coordinate others in the area in doing so as well. I've posted informal guidelines for testing in the past and will be happy to codify them later on if (and only if) there is sufficient interest.

  • @thepalalias I downloaded a Sedna AQ1 MTS file from your website and checked to confirm it was recorded with an Intra patch. Not to complain, but that means your Sedna AQ1 vs. Stock FW 1.1 video above is only testing the "1080p24 Scaling I" Sedna quantizer table, and leaving the Scaling P and Scaling B tables untested. Objectively speaking, your current test results will only cover Sedna patches with GOP-1 Intra settings.

  • @LPowell I am not sure I follow. Are you referring to the range of settings tested or to the construction of the test?

  • @thepalalias An Intra-only patch will only make use of the "Scaling I" quantizer table. If you combine the Sedna quantizer tables with a patch that uses a GOP of two or more, it will use Sedna's "Scaling P" and/or "Scaling B" quantizer tables as well as the "Scaling I" quantizer table. Since your tests use an Intra-only patch, your results cannot be used to evaluate the Sedna "Scaling P" and "Scaling B" quantizer tables.
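
    A quick way to check which frame types a clip actually contains is to count them with ffprobe. This is just a sketch (ffprobe is assumed to be on the path; "clip.MTS" is a placeholder):

    ```python
    # Sketch: count I/P/B frames in a clip to check whether it is really Intra-only.
    # Requires ffprobe; "clip.MTS" is a placeholder. Note that this decodes the
    # whole clip, so it can take a while on long recordings.
    import subprocess
    from collections import Counter

    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "frame=pict_type", "-of", "csv=p=0", "clip.MTS"],
        capture_output=True, text=True, check=True,
    ).stdout

    print(Counter(line.strip() for line in out.splitlines() if line.strip()))
    # An Intra-only recording shows only 'I'; a GOP of two or more shows 'P' and/or 'B' too.
    ```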

  • @LPowell Ah, I see. Yes, that's already on track to be addressed with the inclusion of Sedna GOLGOP in an upcoming test. But just to be completely clear, I wasn't testing the applicability of the different matrices - my tests have (in the past) only been designed to evaluate specific pre-compiled settings.

    So you're right on the money - I wasn't trying to test more than that.

    Do you have notes on testing and comparing pre-compiled settings that use P and B frames? That test was really designed around the intra settings, with the other settings included to compare against them, but some future tests may address other aspects. I plan to use fans to get (completely repeatable) motion in some shots, for instance.

    Also, how do you feel about shutter speeds in motion tests like that? Should I keep them at 180 degrees to give a more realistic representation of normal usage? Or should I use narrower shutter angles to highlight sharpness differences while rendering motion?

  • @thepalalias In my view, P and B frame quality is a crucial issue for all non-Intra patches. AVCHD encoders are typically tuned to encode P and B frames at deliberately coarser quantization factors than I frames in order to achieve high compression efficiency. The assumption is that motion vectors will provide close enough approximations that residual image data can be encoded at coarser quantization without noticeable degradation in quality. Where this assumption fails is in P and B frames that are forced to use coarse quantizers on Intra-encoded high-motion macroblocks. This is one of the main reasons unhacked AVCHD encoders are unable to match the still image quality of Intra-only encoders.
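
    One rough way to see that disparity in an actual clip is to compare the average encoded size of keyframes (I frames) against everything else (P/B frames). This is only a sketch, assuming ffprobe is available; the clip name is a placeholder, and packet size is a proxy for quantization, not a direct read of the quantizers:

    ```python
    # Sketch: average encoded size of keyframes (I) vs. non-keyframes (P/B), as a
    # rough proxy for how much more coarsely P and B frames are compressed.
    # Requires ffprobe; "clip.MTS" is a placeholder.
    import json
    import subprocess
    from statistics import mean

    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "packet=size,flags", "-of", "json", "clip.MTS"],
        capture_output=True, text=True, check=True,
    ).stdout

    packets = json.loads(out)["packets"]
    key = [int(p["size"]) for p in packets if "K" in p["flags"]]
    other = [int(p["size"]) for p in packets if "K" not in p["flags"]]

    print("I frames:  ", round(mean(key)), "bytes on average")
    print("P/B frames:", round(mean(other)) if other else "(none)", "bytes on average")
    ```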

    With shutter speed, I almost always test with a 1/60 second shutter for several reasons:

    • 1/60 is the slowest shutter speed that works properly in all video modes.
    • Speeds faster than 1/60 are not commonly used for most video subject matter.
    • A fixed shutter speed produces a consistent amount of motion blur in all modes.
    • 1/60 eliminates 60 Hz power line strobing (in my region; PAL users may prefer 1/50).

    Since I'm testing a digital video camera operating at a wide range of frame rates, a "180-degree shutter" is not a meaningful test parameter for my purposes. Consistently reproducible focus, exposure, and motion blur are what I consider most important.
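
    Just to spell out the relationship between the two conventions, a small sketch (nothing GH2-specific; the frame rates listed are only examples):

    ```python
    # Small sketch of the two shutter conventions: a 180-degree angle scales with
    # frame rate, while a fixed 1/60 s gives the same motion blur in every mode.
    def shutter_seconds(fps, angle_degrees=180.0):
        """Exposure time for a given shutter angle at a given frame rate."""
        return angle_degrees / 360.0 / fps

    for fps in (24, 30, 60):
        t = shutter_seconds(fps)
        print(f"{fps} fps at 180 degrees -> 1/{round(1 / t)} s (fixed test shutter stays at 1/60 s)")
    ```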

  • Everybody is trying different ways to test things but there's no uniform test. Shooting resolution charts or color charts is good, but you need movement. So how do you test movement and color and resolution in such a way that ANYBODY can replicate it?

    I've felt for a while now that there should be a "Walmart" or "Amazon" test. I mean that there should be some items we could all agree on that anyone could easily buy at Walmart or order on Amazon and use as a standardized test for between $20 and $30. A small personal fan and a motorized toy for movement. Some colorful items (a watercolor set, markers, etc.) for color, and something fuzzy or textured for detail. Plus a box and some small lights, so the items could be inside the box with their own light source.

    That way everybody could order the same items and arrange them similarly and we'd get some uniformity. Your thoughts?

  • @LPowell Heheh...I said $30 and that's $40....again...I am very cheap. >_< Even checking ebay the cheapest I see like that is around $25.

    But to clarify...we need a self-contained box with multiple objects inside it, with their own light source. Something like this: http://www.amazon.com/Fulcrum-30010-301-Battery-Operated-Stick-On-Light/dp/B000R7PM36/ref=sr_1_1?s=home-garden&ie=UTF8&qid=1335751260&sr=1-1

    A box, filled with lights and colorful objects with texture. Preferably under $30...at most $40. Stuff anybody could get.

    EDIT: You could shoot the metronome, but not everybody would have the same wall behind it or same lighting. Put everything in the box, zoom in until it fills the screen or put the camera in the box as well and just record what's in it.

  • @CRFilms I'm skeptical of shooting direct light sources as you suggest. I think shooting real-life objects illuminated with a standardized light source produces more realistic test results.

  • @LPowell They don't have to be directly aimed at the camera. They could be placed behind some of the objects or behind/above the camera if the camera is also in the box.

    I do think one light should be visible so you could test the fringing or haloing (whatever it's called) of a light source at various ISOs or bitrates, like the old Zacuto Shootout lightbulb test or Bloom's ultra-low-light/high-ISO match flame test.

  • UPDATE: Here is an array of PNG files I made comparing the green channel for a large number of the current settings, including all 3 released versions of Canis Majoris http://personal-view.com/talks/discussion/comment/59978#Comment_59978

    @CRFilms Or like the MTS files referenced above. You can probably tell the model of the light from the video clips:)

    Anyway, as a reference point, every additional variable adds a minimum of 30 seconds of testing per setting. Say you are only shooting one region (PAL or NTSC), one clip each for every AVCHD quality mode (FSH, HBR, SH, 24H, 80% 24H), with and without ETC: that is 6 minutes, and you have left out the other region as well as MJPEG, FH, H, and VMM 160%, 200% and 300% (with the latter three being mainly for spanning testing). If you want to test both regions, add 3 minutes to flash and another 90 seconds (HBR, FSH, H). So that is 10.5 minutes for one lens and picture profile with one setup for one setting. Add 3 minutes extra to any subsequent setting for flashing time.

    So if you want to test one type of shot for 4 settings, that is at least 50 minutes just to shoot (plus the time to set up each card, unless you have enough for each setting). Then you have the download and organization time.

    You can see why I think it would ideally be better to have at least 2 or 3 people working together rather than testing by yourself. That way you could keep the time down much closer to that 50 minutes (one to shoot, one to take notes and organize, and one to edit the footage as it is off-loaded). If you do it by yourself, you are up past 2 hours for those 4 settings (for most people), and you have not had the chance to do one low-motion, high-detail shot and one high-motion shot, or one low ISO and one high ISO test. If you do both, double the time involved again. And do not even get me started on spanning tests.
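
    To put that arithmetic in one place, here is a tiny sketch that reuses the same figures (roughly 10.5 minutes of shooting per setting and about 3 minutes of flashing for each setting after the first); the numbers are just the assumptions from this post, nothing more:

    ```python
    # Back-of-the-envelope shoot-time estimate using the figures above:
    # ~10.5 minutes of shooting per setting (both regions, one lens/profile/setup)
    # and ~3 minutes of flashing for every setting after the first.
    MINUTES_PER_SETTING = 10.5
    FLASH_MINUTES = 3.0

    def shoot_minutes(num_settings, shot_types=1):
        shooting = num_settings * MINUTES_PER_SETTING * shot_types
        flashing = (num_settings - 1) * FLASH_MINUTES
        return shooting + flashing

    print(shoot_minutes(4))                # ~51 minutes for 4 settings, one type of shot
    print(shoot_minutes(4, shot_types=2))  # roughly double with a second type of shot
    ```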

    In other words, if we can divide the tests up a bit instead of emphasizing re-testing things, it could really help speed things up. This is also the answer to why I do not test more settings, typically. :)