As we are moving deeper, I am starting this topic to talk about some AVCHD encoder internals.
It is about the QStep table (a 52-element table where each [i+6] item is double the [i] item). I am looking at this table in the GH2 video encoder code and want to completely understand its role in quantization.
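To make the doubling property concrete, here is a minimal sketch in C of how a 52-entry table of that shape can be generated from its first six values. The base values 10, 11, 13, 14, 16, 18 are the familiar H.264 scaling constants and are used here only for illustration; the actual GH2 values may differ.

#include <stdio.h>

int main(void)
{
    /* Illustrative base values only (the standard H.264 set); the
       real GH2 table may start from different integers. */
    int base[6] = { 10, 11, 13, 14, 16, 18 };
    int table[52];
    int i;

    for (i = 0; i < 52; i++)
        table[i] = base[i % 6] << (i / 6);   /* each +6 step doubles the value */

    for (i = 0; i < 52; i++)
        printf("index %2d: %d\n", i, table[i]);
    return 0;
}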
Do not post here if you do not understand what this is about. Any stupid questions will be deleted instantly. Otherwise, any info from knowledgeable people is welcome.
Pertinent info from the pdf: "H.264 supports a total of 52 values of quantiser step size (Qstep), which are indexed by the QP. The value of the step size doubles for every increment of 6 in QP. For instance, as seen in Table 1.1, when the quantiser step increases from 1 to 2, the index QP changes from 4 to 10, and ∆QP (DQ) equals 6.
The QP used for each picture is called "base QP" or "picture QP". If the QP changes for macroblocks in a picture, the base QP is the average QP for the picture. Using "fixed QP" means that all macroblocks in a sequence are encoded with a fixed constant QP. In contrast, a "rate control" method may be used to control the bit rate by adjusting the base QP between pictures. When the bit rate exceeds the target bit rate, the base QP is increased by the rate control. In this case, several base QPs are used when encoding one sequence.
Encoding with a higher QP results in less bit cost but poorer quality and vice versa."
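As a toy illustration of the rate-control behaviour described in that quote (raise the base QP when over the bit budget, lower it when under), something like the following C sketch captures the idea; it is not the actual GH2 rate-control algorithm, and the function name is made up.

#include <stdio.h>

/* Toy rate control: bump the base QP when the produced bits exceed the
   target, relax it when there is headroom. Purely illustrative. */
int adjust_base_qp(int base_qp, long produced_bits, long target_bits)
{
    if (produced_bits > target_bits && base_qp < 51)
        base_qp++;               /* over budget: quantize more coarsely */
    else if (produced_bits < target_bits && base_qp > 0)
        base_qp--;               /* under budget: spend bits on quality */
    return base_qp;
}

int main(void)
{
    int qp = 26;
    qp = adjust_base_qp(qp, 24000000L, 21000000L);   /* ran over the target */
    printf("new base QP: %d\n", qp);                 /* prints 27 */
    return 0;
}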
No, the GH2 table consists of integer values (and it differs from the similar table in GH1). It is easy to see that this is the QStep table, as the first 6 values define the whole table (see my first post in this topic).
I have followed this Italian encoding engineer's blog for some years; he writes codecs for H.264 applications. There is some very useful info on his site: http://sonnati.wordpress.com/best-of-blog
Thanks. I am trying to understand how it all works. What I understand now is that the quantizer must play a vital role in bitrate, as H.264 uses QStep in quantization and later in inverse quantization (overall it looks like transform -> quantization -> inverse quantization -> inverse transform). But I must read more to get all this.
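For intuition, the quantize / inverse-quantize pair in that chain boils down to a divide-and-round followed by a multiply with the same Qstep; the rounding error is where the quality loss comes from. A floating-point sketch (real H.264 folds Qstep into integer multipliers and shifts, so this is only conceptual):

#include <stdio.h>

int main(void)
{
    double coeff = 153.0;   /* some transform coefficient */
    double qstep = 16.0;    /* quantizer step size for the chosen QP */

    int level = (int)(coeff / qstep + 0.5);   /* forward quantization */
    double rec = level * qstep;               /* inverse quantization */

    printf("coeff=%.1f  level=%d  reconstructed=%.1f\n", coeff, level, rec);
    return 0;
}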
The first two articles by Iain Richardson pin down the mathematical details of how the AVCHD Quantization Parameter is calculated. As you noted, quantization scale factors are defined in a fixed range of 52 equally-spaced multiplicative steps. While the quantization scale factor for any step can be calculated directly, it's likely more efficient to simply store them in a look-up table.
The 52 quantization scale factors are a fixed part of the AVCHD standard and are used in both encoding and decoding. For that reason, patching the encoder's copy of this table with different values would produce non-standard video files that could not be correctly decoded.
However, the AVCHD Quantization Parameter that is used to index the table is selected individually by the encoder for each macroblock, and is recorded in the bitstream for use by the decoder. This parameter could potentially be manipulated to systematically reduce the overall quantization granularity, which would produce higher image quality and require higher bitrates.
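To illustrate the "calculate directly versus look up" point: the standard Qstep for any QP follows from six base step sizes doubled every 6 steps (Qstep is 1 at QP=4 and 2 at QP=10, matching the quote above). A small C sketch, using the textbook base values rather than anything taken from the GH2 firmware:

#include <stdio.h>

/* Textbook H.264 base step sizes for QP 0..5; every +6 in QP doubles Qstep. */
static const double qstep_base[6] = { 0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125 };

double qstep(int qp)
{
    return qstep_base[qp % 6] * (double)(1 << (qp / 6));
}

int main(void)
{
    int qp;
    for (qp = 4; qp < 52; qp += 6)
        printf("QP=%2d  Qstep=%.4f\n", qp, qstep(qp));   /* 1.0, 2.0, 4.0, ... */
    return 0;
}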
Could you say the same thing in simpler terms? I understand how this table is calculated, but after this I still do not have a clear picture. The start of the table is 10, 11, 13, 14, 16 (they define the whole table anyway); I see these in the H.264 description, where they are used for V matrix construction. I also have a bunch of other 52-element tables used for various modes (they consist of very similar numbers, sometimes it is the same constant). My understanding is that they are used as scalers.
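For reference, here is where those 10, 11, 13, 14, 16 (and 18) values sit in the standard H.264 scheme: they are the first column of the six-row table used to build the 4x4 inverse-quantization scaling matrix V, selected by QP mod 6. This sketch follows the textbook description; whether the GH2 firmware assembles V exactly this way is an assumption.

#include <stdio.h>

/* Textbook H.264 rescaling constants, one row per (QP mod 6). The first
   column is the 10, 11, 13, 14, 16, 18 sequence discussed above. */
static const int v[6][3] = {
    { 10, 16, 13 }, { 11, 18, 14 }, { 13, 20, 16 },
    { 14, 23, 18 }, { 16, 25, 20 }, { 18, 29, 23 },
};

int main(void)
{
    int qp = 28;            /* arbitrary example QP */
    int m = qp % 6;
    int i, j;

    /* Build the 4x4 V matrix: v[m][0] on even/even positions, v[m][1] on
       odd/odd positions, v[m][2] everywhere else. Dequantization then
       scales each coefficient by V[i][j] shifted left by (qp / 6). */
    for (i = 0; i < 4; i++) {
        for (j = 0; j < 4; j++) {
            int val;
            if (i % 2 == 0 && j % 2 == 0)      val = v[m][0];
            else if (i % 2 == 1 && j % 2 == 1) val = v[m][1];
            else                               val = v[m][2];
            printf("%3d ", val);
        }
        printf("\n");
    }
    return 0;
}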
lpowell is right, you don't want to mess with the coefficient tables, etc. Actually, I'm not sure anything is to be gained by playing with quantization tables either. Compression typically happens at the macroblock level. A typical 4x4 block of coefficients (we'll keep it small just for this example; real macroblocks are 16x16) would look something like this:
A1 A2 A3 A4
B1 B2 B3 B4
C1 C2 C3 C4
D1 D2 D3 D4
A1 is the DC coefficient; the rest are all AC coefficients. Basically, the DC coefficient sets the base value and the AC coefficients are offsets from it. As you go to the right you see higher-frequency horizontal coefficients (i.e. more detail in the horizontal direction). As you go down you see higher-frequency components in the vertical direction. So the top left is the lowest detail in both directions, and the bottom right is the highest.
Quantization basically works by chopping off values toward the bottom right. You'll still see the entire block, but values toward the bottom right will be zeros after quantization. When the block is transmitted, Huffman encoding (or the equivalent) is applied in a zig-zag pattern, processing coefficients in an order where A1 comes first, followed by A2, then B1, etc., with D4 coming last. This causes all the high-frequency coefficients that were set to zero by quantization to end up in a row at the end of the bitstream for the block, which the Huffman encoding turns into just a few characters.
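A quick sketch of that readout order, using the A1..D4 labels from above with made-up quantized values: the zig-zag scan pulls the surviving low-frequency coefficients to the front and leaves the zeroed high frequencies as one run at the tail.

#include <stdio.h>

/* Standard 4x4 zig-zag scan order, given as raster indices (row*4 + col). */
static const int zigzag[16] = {
    0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15
};

int main(void)
{
    /* A made-up block after quantization: bottom-right coefficients are 0. */
    int block[16] = {
        42, 7, 3, 0,    /* A1 A2 A3 A4 */
         6, 2, 0, 0,    /* B1 B2 B3 B4 */
         1, 0, 0, 0,    /* C1 C2 C3 C4 */
         0, 0, 0, 0     /* D1 D2 D3 D4 */
    };
    int i;

    printf("zig-zag order:");
    for (i = 0; i < 16; i++)
        printf(" %d", block[zigzag[i]]);
    printf("\n");   /* prints 42 7 6 1 2 3 followed by ten zeros */
    return 0;
}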
During the encoding process, H.264 codecs will typically choose a quantization level according to available bandwidth, so theoretically it should not be necessary to mess with quantization tables. The codec should simply truncate, or not truncate, macroblock entries according to available bandwidth.
In practice we have 52-element tables (three for each mode).
And also 4 tables for each mode consisting of 48 elements (similar ones are used in the GH1 encoder). Each such table looks like 3 parts of 16 elements each (each part could in fact be a 4x4 matrix).
Really? I would expect three parts: one for the Y component, which might be, say, 16x16, while the U and V parts would be 8x8 - or half of the Y table's dimensions (whatever they are). Color subsampling, another trick that contributes to compression, typically occurs before quantization. Come to think of it, with 48 entries I would expect the Y part to be 32 elements and the U and V parts to be 8 elements each. Those are somewhat strange sizes as they do not correspond to squares, but there might be some data packing going on. How big is each element?
Here is one:
word 0x906, 0xE08, 0xE08, 0x100A, 0x120A, 0x100A, 0x120C, 0x1810, 0x1810, 0x120C, 0x2014, 0x5218, 0x2014, 0x6C1C, 0x6C1C, 0x6C20
word 0x805, 0xF0A, 0xF0A, 0x1E14, 0x140F, 0x1E14, 0x6050, 0x6050, 0x6050, 0x6050, 0x806C, 0x806C, 0x806C, 0x9C8C, 0x9C8C, 0x9CB0
word 0x805, 0xF0A, 0xF0A, 0x1E14, 0x140F, 0x1E14, 0x6050, 0x6050, 0x6050, 0x6050, 0x806C, 0x806C, 0x806C, 0x9C8C, 0x9C8C, 0x9CB0
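Those 16-bit words look as though they could each pack two 8-bit values (e.g. 0x1810 splitting into 0x18 and 0x10), which would answer the element-size question, but that is pure speculation. A quick sketch to split the first row for inspection:

#include <stdio.h>

int main(void)
{
    /* First "word" row from the dump above. */
    static const unsigned short row[16] = {
        0x0906, 0x0E08, 0x0E08, 0x100A, 0x120A, 0x100A, 0x120C, 0x1810,
        0x1810, 0x120C, 0x2014, 0x5218, 0x2014, 0x6C1C, 0x6C1C, 0x6C20
    };
    int i;

    /* Split each 16-bit word into high and low bytes (only a guess at the
       layout), purely for easier eyeballing. */
    for (i = 0; i < 16; i++)
        printf("%2d: 0x%04X -> hi=0x%02X lo=0x%02X\n",
               i, row[i], row[i] >> 8, row[i] & 0xFF);
    return 0;
}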
I'll have to study this a bit against the H.264 standard and see if I can correlate it to some of the reference codecs. It's been a while since I've looked at all this.
Maybe they do the block encoding first, and then the color subsampling. That would make sense if the codec were intended to support higher chroma sampling modes (less subsampling). Boy, that would be nice - fat chance, though.
Chris, do you recognize the 4080 and 8160 constants (used for interlaced and progressive 1080 footage)? All I found is that some encoders report these numbers along with AC, DC and MV.
Vitaliy, as you noted, the asm listing calculates an index into the 52-element quantization table, in part by using a sequence of hard-coded reference levels (137, 152, 168, 192, 216, 240) as index cutoff points. If there is only a single active instance of this routine in the encoder, the hard-coded reference levels could be patched with different values to globally bias the Qstep selection toward higher quality quantization factors.
Alternatively, the coarse end of the Qstep index range could easily be capped, boosting the quality of low-detail macroblocks without increasing the bitrate of medium and high-detail blocks. The to_check routine limits the coarsest quantization index to 51; if this hard-coded value were decreased, it would force the encoder to use a higher quality quantization factor in low-detail macroblocks.
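In rough C terms, the selection logic lpowell describes might look something like the sketch below. Only the reference levels (137, 152, 168, 192, 216, 240) and the 51 cap come from the posts above; the function name, the activity measure, and the overall structure are guesses, not a disassembly-accurate reconstruction.

#include <stdio.h>

static const int ref_level[6] = { 137, 152, 168, 192, 216, 240 };

int pick_qstep_index(int base_index, int activity)
{
    int idx = base_index;
    int i;

    for (i = 0; i < 6; i++)           /* each reference level the activity   */
        if (activity > ref_level[i])  /* measure exceeds pushes the index    */
            idx++;                    /* toward coarser quantization         */

    if (idx > 51) idx = 51;           /* the to_check cap; lowering this     */
    if (idx < 0)  idx = 0;            /* would force finer quantization      */
    return idx;
}

int main(void)
{
    printf("flat block:     %d\n", pick_qstep_index(40, 100));  /* 40 */
    printf("detailed block: %d\n", pick_qstep_index(40, 250));  /* 46 */
    return 0;
}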
"Picture segmentation Each 4:2:2 PsF 1920 x 1080 picture is first reconstituted into a progressive 1920 x 1080 frame, then each frame is divided into 8160 16x16 shuffle blocks for luminance and two co-sited 8160 8x16 blocks for chrominance. In the case of 4:4:4 PsF, there are three 8160 16x16 blocks for each of RGB. In the case of interlace signals, each field is treated as an independent 1920 x 540 field, and is divided into 4080 16x16 blocks for luminance and two 4080 8x16 blocks for chrominance. An example for 4:2:2 PsF is shown in figure 2."
@woody123 Yes, I got that it is 16x16 blocks using math skills :-) For 720p they are using a constant of 3600.
Another interesting thing is that in the same block we have the setting of a value that is proportional to GOP length. For 1080p24 we have 24 (GOP=12), 60 for 1080i60 (GOP=15), and 48 for 1080i50 (GOP=12).
By the way, a confusing thing about this is why there are 8160 blocks and not 8100. If you take 1920x1080 you get 2073600 pixels. Divided by 256 (16x16), that equals 8100, not 8160. The catch is that 1080 is not a multiple of 16, so the height is padded up to 1088, which adds an extra row of macroblocks that is only half covered by real picture. The actual calculation is (1920 x 1088) / 256 = 8160.
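The same arithmetic, with the height rounded up to a whole number of macroblocks, reproduces all three constants mentioned in the thread:

#include <stdio.h>

/* Macroblock counts discussed above:
     1920x1080 progressive -> 1920x1088/256 = 8160
     1920x540  field       -> 1920x544/256  = 4080
     1280x720  progressive -> 1280x720/256  = 3600 */
static int mb_count(int width, int height)
{
    int w = (width  + 15) / 16;   /* round up to whole macroblocks */
    int h = (height + 15) / 16;
    return w * h;
}

int main(void)
{
    printf("1080 frame: %d\n", mb_count(1920, 1080));  /* 8160 */
    printf("1080 field: %d\n", mb_count(1920, 540));   /* 4080 */
    printf("720  frame: %d\n", mb_count(1280, 720));   /* 3600 */
    return 0;
}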