DR and sick people
  • Each time I hear about 8-bit images and 6-7 stops, I just can't bear it...

    Let's take it to an extreme to show the stupidity.

    Let's make it 1 bit.

    • 0 - will be defined as "low building".
    • 1 - will mean "high building".

    Dynamic range will be the ratio between the heights of the high and low buildings.

    One stop just means that this ratio is equal to 2, two stops that it is 4, and so on.

    Did you notice that we never referenced any specific heights here?

    In reality we can define "low building" as 1 meter and "high building" as 8000 meters (we are free to do so, and could just as well use another definition).

    Hence we have roughly 13 stops (log2 8000 ≈ 13), using just 1 bit.

    Well, you might say, but we are losing all the buildings in between. After encoding they will become either "high" or "low".

    Maybe this is where the limit is? Well, no, not here.

    For 8 bit we have 256 separate values. We can make our definition as follows:

    • 0 - lowest building
    • 1 - one that is 2 times higher
    • 10 - one that is 4 times higher
    • 11 - one that is 8 times higher
    • etc

    So it'll be 256 stops in total. :-)

    We'll talk about where the limit really lies soon; the sketch below works through the arithmetic.
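    To keep the arithmetic concrete, here is a minimal Python sketch of the definitions above (the function name is purely illustrative):

    ```python
    import math

    def stops(high, low):
        """Dynamic range in stops: the number of doublings between low and high."""
        return math.log2(high / low)

    # 1-bit example: 0 = "low building" (1 m), 1 = "high building" (8000 m).
    print(stops(8000, 1))  # ~12.97, i.e. roughly 13 stops carried by a single bit

    # 8-bit example: code n stands for a building 2**n times higher than the lowest one.
    heights = [2 ** n for n in range(256)]
    print(stops(heights[-1], heights[0]))  # 255.0 -- one doubling per code step across 256 codes
    ```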

  • Are you sure?

  • @caveport

    Are you sure?

    I am sure that 99.9% of people talking about DR in video files with a specific bit depth do not understand what they are talking about. Sadly.

  • I think the wording of your post may be a bit unclear. But I do agree. F-stops and bit depth are not really related. Most people have very limited understanding of the terminology.

  • I think the wording of your post may be a bit unclear.

    It is an extremely clear example showing some utter stupidity.

    But I do agree. F-stops and bit depth are not really related.

    F-stops were never mentioned, and they are a totally different thing.

    DR and shade resolution are entirely separate concepts. DR is the difference between the highest input value capable of being represented and the lowest, below which noise is the dominant value. The human eye, for instance, has an approximate dynamic range of 100 dB, give or take, but cannot experience this full range at the same time, due to optical glare and masking, among other phenomena. In strict dB terms, an f-stop or EV is equivalent to ~6 dB, making the human eye theoretically capable of perceiving roughly 100/6 ≈ 17 EV, though practical limitations bring this down to the standard figure of roughly 15 EV. In other words, the optic nerves saturate (cannot represent a higher value) at a linear stimulus value about 100 dB over the lowest value at which noise becomes the dominant signal on the nerve, but we cannot distinguish that many gradations of tone when presented with a signal that extreme: either we squint into the highlights and lose our ability to see into the shadows, or we are partly blinded by the bright sun in order to catch a glimpse of the interior of the cave or building.

    In digital terms, this is roughly similar: a given sensor may be capable of representing any arbitrary number of EV of actual stimulus between the point at which internal cell noise becomes dominant and the point at which the cells saturate. In Vitaliy's terms, these are the "low buildings" and "high buildings" respectively. Obviously we can't tell "low buildings" apart from "no buildings", because somewhere between the two is the noise floor.

    But this says absolutely nothing about the in betweens, as Vitaliy has been pointing out. And here is where bit depth plays a part, but a very nuanced part in some ways. You see, bits are always powers of two (e.g. 1 bit = 6 dB in these cases). But when the ends of the scale are fixed reference points, as in the real world where the sensor is physically limited by an actual saturation point and a noise floor, bits cease to be meaningful with respect to the EV or whatever, and only act by dividing the distance between the upper value ("high building") and the lower bound ('low building') with either a greater number of divisions (more bits) or fewer, but farther apart, divisions (fewer bits).

    So, to Vitaliy's point: DR and bit depth have nothing to do with each other. Bit depth has to do with the ability to represent shades with greater or lesser precision from one another. Dynamic range has to do with how far the endpoints (bright and dark) are from each other in terms of actual light stimulus. DR cannot be inferred from bit depth and bit depth cannot affect or control dynamic range given the same sensor chemistry, assuming the ADCs are properly calibrated to the sensor.

    For a more practical example: if I have a chip that, at a given exposure, saturates at the light intensity of a 100W light bulb and has a noise floor at the light level put out by a 4W nightlight, adding bits won't mean that I can see into darker shadows if I'm exposing for the same 100W light bulb. It will mean the shadows that are brighter than the 4W level might have a bit more detail, but the darker shadows will still be clipped to noise (note, I did not say clipped to black). To increase the actual dynamic range I need to change the chip, so that its noise floor is lower and darker input can still be represented with meaningful data rather than noise. More bits can mean, in this case, that I might take greater advantage of this detail by more clearly separating the colour and shade of, say, the cockroach on the dark floor from the shadow in which it is running, but they won't help me "see" any further into that same shadow if I keep the exposure correct for the face lit by the 100W bulb. I still may not see the black scorpion hiding by the couch leg unless I choose to clip the light bulb to white and catch more light to brighten the shadows by changing the exposure. (The sketch after this post puts rough numbers on the dB-to-EV conversion mentioned above.)

    Does this help explain the difference?
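    A minimal sketch of the dB-to-EV arithmetic used in the post above, assuming the common 20·log10 (amplitude) convention under which one stop is about 6.02 dB; the function name is purely illustrative:

    ```python
    import math

    DB_PER_STOP = 20 * math.log10(2)  # ~6.02 dB per doubling under the amplitude convention

    def db_to_stops(db):
        """Convert a dynamic range quoted in dB into stops/EV."""
        return db / DB_PER_STOP

    print(db_to_stops(100))  # ~16.6, the "roughly 17 EV" figure quoted for the eye
    ```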

  • @StudioDCCreative

    Thanks for the good post, but I have a strong suggestion to move it to a separate topic. Maybe leave only the directly related part.

    You see, bits are always powers of two (e.g. 1 bit = 6 dB in these cases).

    This is wrong, for example.

    This topic is ONLY about the digital part and dynamic range. Not about sensors, not about eyes, not about monitors.

    I'll get to the whole picture step by step.

  • I have never thought bit depth was related at all to DR. However, saying that, storage of extreme DR in certain formats can have issues with banding.

    In your example @vitaliy_kiselev we could have a 1-bit file with 100 stops of dynamic range. This would cause extreme banding. More bits, less banding. (In today's tech, bits should be limited to the sensor's native noise level, as above that we are just sampling noise better.)

    An example of this is the GH4/GH5 recording in VLog-L. Similar issues exist when recording SLog as well.

    Obviously the solution is simply "don't use log", however then the look of the image is burned into the image, which gives less creative freedom afterwards. Also, sometimes on set you don't have time to get the image 100% perfect in camera, so it is best to do that work afterwards. Hence why RAW is so popular.

    Similarly, in audio recording 16 bit is great for distribution, but 24 bit gives much more versatility for editing, EQ'ing, repair etc.

    Then again, one only has to look at all the issues with 8-bit distributed video on YouTube etc. (banding) to get an idea why a 10-bit display and 10-bit file could help here.

    Noise dithering can also be used to fix such banding issues (which can be extreme not only in skies but also in other scenes). However, video land doesn't like the idea of "adding" noise to images.

    Obviously this discussion is only talking about the final image, not capture or the sensor. So it is a good idea to use CGI as an example, or even motion graphics. If I were to generate a lovely gradient from black to white, we would need at least an 8-bit image to have decent, non-obvious banding. Personally I have found 10-16 bit is best for no banding artifacts "at all" in this example. Now if one were to dither the 8-bit file, gradient banding would not be as prominent (see the sketch after this post).

    There must also be prior research into human perception of banding and light intensity: the level at which changes of intensity become invisible to the human visual system.

    So one would take that value, together with the black point and white point of the display device, and that should tell us the best step size, which could easily be translated into a bit depth.

    Similar research tells us that 60p feels more lifelike than 30p, for example. At the end of the day, displays are only intended to be viewed by humans.
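    As a rough illustration of the dithering point above, here is a small numpy sketch; the frame size and noise shape are arbitrary choices, not anything from the thread:

    ```python
    import numpy as np

    # A smooth horizontal gradient, 0.0 .. 1.0, kept in float (the "high bit depth" master).
    gradient = np.tile(np.linspace(0.0, 1.0, 1920), (1080, 1))

    # Straight quantization to 8 bit: only 256 output levels, so the ramp turns into visible bands.
    banded = np.round(gradient * 255).astype(np.uint8)

    # TPDF dither: add roughly +/-1 code value of noise *before* rounding, then quantize.
    noise = (np.random.random(gradient.shape) + np.random.random(gradient.shape) - 1.0) / 255
    dithered = np.clip(np.round((gradient + noise) * 255), 0, 255).astype(np.uint8)

    def longest_flat_run(row):
        """Length of the longest run of identical neighbouring pixel values in one row."""
        boundaries = np.flatnonzero(np.diff(row.astype(np.int16)))  # indices where the value changes
        edges = np.concatenate(([-1], boundaries, [len(row) - 1]))  # include the row ends as run edges
        return int(np.max(np.diff(edges)))

    # The plain 8-bit ramp has long flat runs (the visible bands); dither trades them for fine noise.
    print(longest_flat_run(banded[0]), longest_flat_run(dithered[0]))
    ```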

  • In your example @vitaliy_kiselev we could have a 1-bit file with 100 stops of dynamic range. This would cause extreme banding. More bits, less banding. (In today's tech, bits should be limited to the sensor's native noise level, as above that we are just sampling noise better.)

    Please carefully read what I wrote. Do not bring monitors or anything else into this.

    Then again, one only has to look at all the issues with 8-bit distributed video on YouTube etc. (banding) to get an idea why a 10-bit display and 10-bit file could help here.

    Youtube uses very high compression done using the cheapest algorithms on GPUs.

    Noise dithering can also be used to fix such banding issues (which can be extreme not only in skies but also in other scenes). However, video land doesn't like the idea of "adding" noise to images.

    No one actually cares what they like (actually they very much like to add "proper" film grain). It is just simple math.

    Obviously this discussion is only talking about the final image, not capture or the sensor. So it is a good idea to use CGI as an example, or even motion graphics. If I were to generate a lovely gradient from black to white, we would need at least an 8-bit image to have decent, non-obvious banding.

    This exact discussion is about digital images only. And do not add perception things here for now.

    I talked only about DR using strict definitions.

    There must also be prior research into human perception of banding and light intensity: the level at which changes of intensity become invisible to the human visual system.

    Humans use a big part of the brain to "process" images. And it is a very complex system with feedback loops, a system dealing with only a tiny bit of the source information.

  • @Vitaliy_kiselev the problem is that all perception of images is "human perception". In nature there are extended ranges of colour, such as the IR range (birds etc.), which do not exist in the video range. Video colour range is specifically designed for the human image perception system only.

    The same issue exists with sound: human hearing only extends to 20 kHz (most only hear 16 kHz if lucky, and young), yet ultrasound exists in nature.

    So all discussions regarding video encoding have to be based on human perception. This is actually how MP3 was developed, by looking at the human hearing system.

    Even our scientific capture systems have to translate different ranges into the visible human range (such as an IR camera).

  • So all discussions regarding video encoding have to be based on human perception.

    No.

    Again, it is an isolated, math-only topic for now.

    I SPECIFICALLY removed everything else, and will add it back part by part.

  • Ah, ok. This will be fun then!

    Looks like it will not only be maths, but also a bit of computer science. :-)

  • @Vitaliy I think you mean steps not stops. Probably a Google translation error ;-)

  • I think you mean steps not stops.

    LOL

  • Dr. Dynamic Range, PhD in geriatric sensors, with a speciality in low gain in elder-age sensors with low dynamic range, especially CCD and early CMOS tech.

    (No stacked sensors or later CMOS tech.)

    Dr. Dynamic Range. Attention: all Mondays from 10am to 4pm.

    Call secretary assistant Miss Vitaliy to make your check-in.

    We don't accept HDR sensors. 10-bit or higher with the latest H.265 won't be attended to. (Only H.264 8-bit processing.)

    1800-drdynamic.

  • @endotoxic

    This is not the bad humor topic :-)

  • You are awake. Lol

  • Digitization of light intensity is slowly working its way through the same issues that digital audio technology faced decades ago. The 8-bit SoundBlaster standard ruled PC audio for years, much as 8-bit H.264 video does now. Audio digitization was eventually perfected with 24-bit capture and 16-bit playback, but 24-bit recording would require excessive storage sizes for video. The fundamental problem is that both audio and video work on logarithmic (dB) rather than linear scales, and are most efficiently encoded in floating-point rather than fixed-point formats. Camera sensors are digitally sampled analog devices that currently produce 12- or 14-bit linear output, but this RAW RGB data is sensor-specific and unsuitable for direct viewing. Ideally, it should be gamma-corrected into an industry-standard flat log scale, with each color channel encoded in a 16-bit floating-point format with a 4-bit exponent and 12-bit mantissa. That would provide a 96 dB dynamic range with a consistent 12-bit color resolution at all levels of light intensity, from shadows to highlights.

    For practical purposes, color resolution could be truncated down to 8 bits, with the 4-bit DR data packed into a separate 8-bit stream. Since DR varies at a far slower rate than color variations, simple delta compression would reduce the DR stream to a small fraction of the size of the 8-bit color data, which itself could be compressed losslessly by about 50%. So it would require about 7 bits per native R, G or B sensor pixel to encode the entire sensor output with 96 dB DR and 8-bit color resolution. (A toy sketch of the exponent/mantissa packing follows below.)
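    A toy sketch of the unsigned 4-bit-exponent / 12-bit-mantissa packing described above; the function names and value ranges are assumptions for illustration, not any standard format:

    ```python
    import math

    def encode_log_float(x, exp_bits=4, man_bits=12):
        """Pack a positive linear value into a toy unsigned float: 4-bit exponent, 12-bit mantissa."""
        assert x > 0
        e = min(max(int(math.floor(math.log2(x))), 0), 2 ** exp_bits - 1)  # which stop the value falls in
        frac = x / (2 ** e) - 1.0                                          # position inside that stop, 0.0 .. <1.0
        m = min(int(frac * 2 ** man_bits), 2 ** man_bits - 1)
        return e, m

    def decode_log_float(e, m, man_bits=12):
        return (2 ** e) * (1.0 + m / 2 ** man_bits)

    # 16 exponent values cover 16 stops, roughly 96 dB at ~6 dB per stop,
    # while the 12-bit mantissa keeps ~4096 gradations inside every stop.
    e, m = encode_log_float(1234.5)
    print(e, m, decode_log_float(e, m))  # 10 842 1234.5
    ```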

  • @LPowell

    Digitization of light intensity is slowly working its way through the same issues that digital audio technology faced decades ago. The 8-bit SoundBlaster standard ruled PC audio for years, much as 8-bit H.264 video does now. Audio digitization was eventually perfected with 24-bit capture and 16-bit playback, but 24-bit recording would require excessive storage sizes for video.

    Yes! Totally agree. These have been my thoughts for a long time. Coming from audio land, video seems very much like a huge step back.

  • @alcomposer Audio digitising is way simpler than video. The processes are not comparable at all, really.

  • @caveport the concept is not that they are the same process, or that they are comparable. The idea is that both audio and video are digitized in a similar way (sampling analog values as 1s and 0s).

    Audio digitising is way simpler than video

    I disagree with this statement. They are very similar. Both video and audio use AD/DA chips; video simply has more data per channel. Mind you, to record an orchestra or a large band, one can easily use 32 channels, each using 96-192 kHz sampling rates at 24-bit depth (a rough data-rate comparison follows after this post).

    Fortunately for video, a computer can always drop a frame during playback if it gets overworked (mismatched frame rates or high complexity). Audio cannot do that, as one would hear a horrid crackling sound.

    So in that regard it is not simpler at all. Look into software synth design and computer audio buffers; it is quite complex.

    Also remember that high-quality audio interfaces can be as expensive as a RED or Blackmagic URSA camera, with very complex designs including word clocks, low-jitter PLLs etc...

    Check out these mastering-grade AD/DA interfaces: http://en.antelopeaudio.com/pro-audio-devices/
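    For a rough sense of scale in this comparison, here is a small sketch of uncompressed data rates. The 32-channel / 192 kHz / 24-bit figures come from the post above; the 1080p60 10-bit 4:2:2 video case is an assumed example:

    ```python
    def audio_rate_bps(channels, sample_rate_hz, bit_depth):
        """Uncompressed PCM audio data rate in bits per second."""
        return channels * sample_rate_hz * bit_depth

    def video_rate_bps(width, height, fps, bit_depth, samples_per_pixel=2):
        """Uncompressed video data rate in bits per second; 2 samples per pixel corresponds to 4:2:2."""
        return width * height * samples_per_pixel * bit_depth * fps

    # A large multitrack session vs. a single modest video stream (both uncompressed).
    print(audio_rate_bps(32, 192_000, 24) / 1e6)     # ~147 Mbit/s
    print(video_rate_bps(1920, 1080, 60, 10) / 1e9)  # ~2.5 Gbit/s
    ```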

  • @StudioDCCreative

    Thanks for the good post, but I have a strong suggestion to move it to a separate topic. Maybe leave only the directly related part.

    You see, bits are always powers of two (e.g. 1 bit = 6 dB in these cases).

    This is wrong, for example.

    OK, a 'bit' (i.e. a 1/0 binary digital element) has whatever meaning you want to give it. But when bits are used in base-2 mathematics, which I assume since you are discussing single-bit (as compared to multi-bit) word values, that is what I am speaking of. At that point you can drop the bits and just deal with the numbers themselves and the number of divisions/distinctions they provide between the upper and lower bound.

    However, speaking of single-bit word values, I'm curious as to what your core argument is. It reminds me "a bit" (pun intended) of the whole 1-bit DSD encoding method right now, where the bitstream is the important part and the single bit less so, but I don't think that's where you're going with it.

    I'll get to the whole picture step by step.

    Looking forward to it!

  • @StudioDCCreative

    OK, a 'bit' (i.e. a 1/0 binary digital element) has whatever meaning you want to give it. But when bits are used in base-2 mathematics, which I assume since you are discussing single-bit (as compared to multi-bit) word values, that is what I am speaking of. At that point you can drop the bits and just deal with the numbers themselves and the number of divisions/distinctions they provide between the upper and lower bound.

    I did not fully understand this part.

    What I did was just make definitions. At the stage I am talking about here, I am free to make such definitions.

  • @alcomposer Video digitising is way more complicated because it has spatial information, which audio does not have. It is also comprised of multiple channels, each of which can carry different information, i.e. RGB, YUV etc., which is also very different from audio. Also, the different channels can have different sample rates depending on codec choices such as 4:2:2, 4:2:0 etc. (the sketch after this post shows the resulting sample counts).

    Quote:

    "Fortunately for video, a computer can always drop a frame during playback if it gets over-worked, (miss-matched frame rates- or high complexity) in playback. Audio can not do that- as one would hear a horrid crackling sound.

    So in that regard it is not simpler at all. Look into software synth design, and computer audio buffers, it is quite complex.

    Also remember that high quality audio interfaces can be as expensive as a RED or Blackmagic URSA camera. With very complex designs, including word clocks, low jitter PLL etc..."

    These comments don't really have anything to do with digitising complexity.

    I'm also surprised you are comparing 32 tracks of audio to the 3-channel RGB/YUV encoding for video. A more accurate comparison would be 32 layers of video in a timeline, i.e. 32 similar streams processed simultaneously. I have had this kind of argument with audio engineers for many years, and none of them has ever been able to deliver a LOGICAL argument for why they think video digitising is as easy as audio.

    I don't mean to be argumentative, but I think there are gaps in your knowledge of video systems engineering and technicalities. The first post here pretty much enlightens me to the general level of understanding, even from people who claim to be knowledgeable!
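    For the chroma subsampling point above, a quick sketch of how many samples one frame carries under each scheme; the 1920x1080 frame size is just an example:

    ```python
    # Luma (Y) is always full resolution; 4:2:2 halves the two chroma planes horizontally,
    # and 4:2:0 halves them both horizontally and vertically.
    def samples_per_frame(width, height, scheme):
        chroma_factor = {"4:4:4": 1.0, "4:2:2": 0.5, "4:2:0": 0.25}[scheme]
        return int(width * height * (1 + 2 * chroma_factor))

    for scheme in ("4:4:4", "4:2:2", "4:2:0"):
        print(scheme, samples_per_frame(1920, 1080, scheme))
    # 4:4:4 6220800, 4:2:2 4147200, 4:2:0 3110400
    ```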

  • The thing is, 32 tracks of audio are not "experienced" at different times. If you are suggesting that 32 audio tracks are the same as 32 video tracks, that's not 100% correct either. Why? Because no one watches 32 video angles at the same time. However, the 32 audio channels are mixed together into one sound experience "at the same time".

    So, obviously, if we were talking about computational video systems such as light field capture, then sure, 32 video angles could be merged into one super amazing video file. But 32 video channels can currently only be edited together and viewed one at a time, unlike audio.

    Let's put it this way: you can use an iPhone to film and audio-record an orchestra... or a multichannel audio desk with a RED camera. Both are capturing the same temporal space, at the best quality available in 2017. Once another camera is added you need to edit between cameras, so we are not watching both cameras at once. One could use a split-screen technique, but not for 32 HD channels.

    I think I was a bit quick, however, in saying before that audio and video are of the same complexity; rather, at the end of the day they both utilise similar AD systems. Obviously video has much, much more data to process.

  • Actually, I have realised through this discussion that there is a huge problem with my logic:

    We need a way to quantify the quality of both audio and video capture systems.

    If audio is "much simpler", then we have to quantify at what "quality" it is simpler.

    Let's say we capture an SD video image of an orchestra, and a super high-quality audio recording: are they both similar? No. Why? Because the audio recording will sound much more real than the video recording looks.

    I would suggest that 4k 60p is the equivalent of a good audio recording in making the viewer feel like they were there "in the room".

    So with that logic, and that definition of similar quality, then yep, video wins the complexity race.