On the 48 entry tables - that's the conclusion I came to - otherwise the sequence of quantization values doesn't make sense. That might imply that quantization (and therefore other coding functions as well) happen in a 4:4:4 color sample environment and that color sub-sampling happens later. Actually, that makes sense because if you want a codec that can handle other color sample schemes you would only want one coding engine common to all color sub-sample schemes and do the sub-sampling at the end; it results in much simpler code. Either that or there are simply 4x as many blocks for the Y component as the others. I doubt that, though, because that would make motion vector calculations more complicated. There might be something interesting to explore here. To some extent I would assume that Panasonic has a common codec that is used across several products; otherwise code maintenance would be a nightmare.
I have no idea what the 66 entry table might be. I'll look at the reference codec to try to locate any 66 entry tables.
I had a look at the H.264 reference code (available at http://iphome.hhi.de/suehring/tml/) and if you look at the routines having to do with quantization (they are named Qxxxxxx in the header and c code sections) you can see structural similarities with the GH2 tables. The numbers inside the structures are different however. The reference code is not optimized (it's very, very slow), so there is no pre-calculation of table values, as I suspect Panasonic is doing. Another place to look is the x264 codec (available at http://www.videolan.org/developers/x264.html) which is optimized. I haven't gone through that yet, but I am somewhat hopeful that there will be some hints in there for us.
As to the numbers you mentioned related to GOP length: actually, they seem more related to framerate, corresponding to the GOP closest to 1 second. I vaguely recall the same parameters in the GH1 codec. I don't remember what the value for 1080/50 was, whether it was 40, 50, or 52 (because the GOP was 13 for the GH1). I had surmised that it had something to do with when frames are flushed to flash memory, but I'm not sure.
By the way, a confusing thing about this is why there are 8160 blocks and not 8100. If you take 1920x1080 you get 2073600 pixels. That divided by 256 (16x16) equals 8100, not 8160. The catch is that 1080 is not a multiple of 16 so you have to add an extra half-row of macroblocks. The actual calculation is (1920 x 1088) / 256 = 8160.
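The rounding described above can be sketched as follows (the helper name is mine, not anything from the firmware):

```c
/* Hypothetical helper: number of 16x16 macroblocks needed to cover a
 * frame. Dimensions are rounded up to the next multiple of 16, which is
 * why 1920x1080 yields 8160 blocks rather than 8100. */
static int macroblock_count(int width, int height)
{
    int mb_w = (width  + 15) / 16;   /* 1920 -> 120 columns */
    int mb_h = (height + 15) / 16;   /* 1080 -> 68 rows, i.e. 1088 lines */
    return mb_w * mb_h;              /* 120 * 68 = 8160 */
}
```

The same formula also reproduces the other constants in this thread: 4080 for a 1920x540 interlaced field and 3600 for 1280x720.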
@woody123 Yes, I got that it is 16x16 blocks using math skills :-) For 720p they use a 3600 constant.
Another interesting thing is that in the same block there is a setting whose value is proportional to GOP length. For 1080p24 we have 24 (GOP=12), 60 for 1080i60 (GOP=15), and 48 for 1080i50 (GOP=12).
"Picture segmentation Each 4:2:2 PsF 1920 x 1080 picture is first reconstituted into a progressive 1920 x 1080 frame, then each frame is divided into 8160 16x16 shuffle blocks for luminance and two co-sited 8160 8x16 blocks for chrominance. In the case of 4:4:4 PsF, there are three 8160 16x16 blocks for each of RGB. In the case of interlace signals, each field is treated as an independent 1920 x 540 field, and is divided into 4080 16x16 blocks for luminance and two 4080 8x16 blocks for chrominance. An example for 4:2:2 PsF is shown in figure 2."
Vitaliy, as you noted, the asm listing calculates an index into the 52-element quantization table, in part by using a sequence of hard-coded reference levels (137, 152, 168, 192, 216, 240) as index cut off points. If there is only a single active instance of this routine in the encoder, the hard-coded reference levels could be patched with different values to globally bias the Qstep selection toward higher quality quantization factors.
Alternately, the coarse end of the Qstep index range could easily be capped, boosting the quality of low-detail macroblocks without increasing the bitrate of medium- and high-detail blocks. The to_check routine limits the coarsest quantization index to 51; if this hard-coded value were decreased, it would force the encoder to use a higher quality quantization factor in low-detail macroblocks.
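A hedged sketch of the index selection described in the two posts above; the function and variable names are hypothetical, and only the six reference levels and the cap of 51 come from the asm listing. Each reference level the macroblock's metric crosses bumps the index one step coarser:

```c
/* Sketch only: the real routine is in firmware asm; names are invented.
 * Patching ref_levels upward, or lowering MAX_QP_INDEX, would bias the
 * encoder toward finer quantization as suggested above. */
#define MAX_QP_INDEX 51   /* last entry of the 52-element Qstep table */

static const int ref_levels[6] = { 137, 152, 168, 192, 216, 240 };

static int qstep_index(int metric, int base_index)
{
    int idx = base_index;
    for (int i = 0; i < 6; i++)
        if (metric >= ref_levels[i])
            idx++;                    /* one step coarser per level crossed */
    if (idx > MAX_QP_INDEX)
        idx = MAX_QP_INDEX;           /* the patch point for capping */
    return idx;
}
```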
Chris. Do you recognize the 4080 and 8160 constants (used for interlaced and progressive 1080 footage)? All I found is that some encoders report these numbers alongside AC, DC and MV.
Maybe they do the block encoding first, and then the color subsampling. That would make sense if the codec was intended to support higher color subsampling rates. Boy, that would be nice - fat chance, though.
I'll have to study this a bit against the H.264 standard and see if I can correlate it to some of the reference codecs. It's been a while since I've looked at all this.
Here is one:
word 0x906, 0xE08, 0xE08, 0x100A, 0x120A, 0x100A, 0x120C, 0x1810, 0x1810, 0x120C, 0x2014, 0x5218, 0x2014, 0x6C1C, 0x6C1C, 0x6C20
word 0x805, 0xF0A, 0xF0A, 0x1E14, 0x140F, 0x1E14, 0x6050, 0x6050, 0x6050, 0x6050, 0x806C, 0x806C, 0x806C, 0x9C8C, 0x9C8C, 0x9CB0
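Pure speculation on my part, but if each 16-bit word packs two 8-bit table entries (which would fit the data-packing guesses in this thread, and would make 16 words hold 32 elements), the elements could be split out like this. The byte order is a guess:

```c
/* Speculative unpacking of one 16-bit table word into two 8-bit
 * elements. Whether the high byte really comes first is unknown. */
static void unpack_word(unsigned word, unsigned *hi, unsigned *lo)
{
    *hi = (word >> 8) & 0xFF;   /* e.g. 0x906 -> 0x09 */
    *lo =  word       & 0xFF;   /* e.g. 0x906 -> 0x06 */
}
```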
Really? I would expect three parts; one for the Y component which might be, say, 16x16, and the U and V parts would be 8x8 - or, half of the Y table's dimensions (whatever they are). Color subsampling, another trick that contributes to compression, typically occurs before quantization. Come to think of it, with 48 entries I would expect the Y parts to be 32 elements, the U and V parts to be 8 elements each. Those are somewhat strange sizes as they do not correspond to squares, but there might be some data packing going on. How big is each element?
In practice we have 52-element tables (three for each mode).
And also 4 tables for each mode consisting of 48 elements each (similar ones are used in the GH1 encoder). Each such table looks like 3 parts of 16 elements (each part could in fact be a 4x4 matrix).
lpowell is right, you don't want to mess with coefficient tables, etc... Actually, I'm not sure anything is to be gained by playing with quantization tables either. Compression typically happens on a macroblock level. A typical 4x4 macroblock (we'll keep it small just for this example) would look something like this:
A1 A2 A3 A4
B1 B2 B3 B4
C1 C2 C3 C4
D1 D2 D3 D4
A1 is the DC Coefficient. The rest are all AC coefficients. Basically, the DC coefficient sets the base value and the AC coefficients are offsets using A1 (the DC coefficient) as the base. As you go to the right you'll see horizontal values representing higher frequency horizontal coefficients (i.e. more detail in the horizontal plane). As you go down you see higher frequency components for the vertical plane. So, the top left is the lowest detail on both planes, and the bottom right is the highest.
Quantization basically works by chopping off values going toward the bottom-right. You'll still see the entire macroblock, but values toward the bottom-right will be zeros after quantization. When the macroblock is transmitted, Huffman encoding (or the equivalent) is used in a zig-zag pattern, processing coefficients in an order where A1 comes first, followed by A2, then B1, etc., with D4 coming last. This causes all the high-frequency coefficients that were zeroed by quantization to form a run at the end of the macroblock's bitstream, which the Huffman encoding turns into just a few symbols.
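The zig-zag readout can be sketched on the 4x4 example above. The scan order below is the standard H.264 4x4 zig-zag; the function itself is just my illustration:

```c
/* Standard H.264 4x4 zig-zag scan order, as row-major source indices:
 * A1 first, then A2, B1, ..., D4 last. */
static const int zigzag4x4[16] = {
     0,  1,  4,  8,
     5,  2,  3,  6,
     9, 12, 13, 10,
     7, 11, 14, 15
};

/* Scan a quantized block (row-major, A1..D4) in zig-zag order and count
 * the run of trailing zeros that the entropy coder would collapse. */
static int zigzag_scan(const int block[16], int out[16])
{
    int trailing = 0;
    for (int i = 0; i < 16; i++)
        out[i] = block[zigzag4x4[i]];
    for (int i = 15; i >= 0 && out[i] == 0; i--)
        trailing++;
    return trailing;
}
```

For a typical quantized low-detail block, most of the 16 coefficients end up inside that trailing zero run, which is exactly what makes the scheme compress well.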
During the encoding process H.264 codecs will typically choose a quantization level according to available bandwidth, so theoretically it should not be necessary to mess with quantization tables. The codec should simply truncate, or not truncate macroblock entries according to available bandwidth.