Personal View site logo
NVidia Ampere GPUs from RTX 3050 to RTX 3090 Ti
  • 161 Replies sorted by
  • @sammy

    No one knows. As all also highly depend on that you do.

    Resolve have extremely inefficient architecture, especially GPU part that is aimed at complex unusual workflows, and for typical simple grading it runs at around 1/4-1/5 of possible speed if they did it all proper and using optimized code.

    Once I had talk to one developer. It is industry chit chat talks that BM has agreement with some other NLE manufacturers on where they don't go. As if they wanted and aimed at much more beginners and focused on real optimization on FCP X level - they could wipe out all but Adobe.

  • HDMI support

    Cards support HDMI 2.1 48GBit/sec and support DSC compression. Maximum supported res is 8K 60Hz with active HDR.

    CUDA CORES

    One of the key design goals for the Ampere 30-series SM was to achieve twice the throughput for FP32 operations compared to the Turing SM. To accomplish this goal, the Ampere SM includes new datapath designs for FP32 and INT32 operations. One datapath in each partition consists of 16 FP32 CUDA Cores capable of executing 16 FP32 operations per clock. Another datapath consists of both 16 FP32 CUDA Cores and 16 INT32 Cores. As a result of this new design, each Ampere SM partition is capable of executing either 32 FP32 operations per clock, or 16 FP32 and 16 INT32 operations per clock. All four SM partitions combined can execute 128 FP32 operations per clock, which is double the FP32 rate of the Turing SM, or 64 FP32 and 64 INT32 operations per clock.

    Doubling the processing speed for FP32 improves performance for a number of common graphics and compute operations and algorithms. Modern shader workloads typically have a mixture of FP32 arithmetic instructions such as FFMA, floating point additions (FADD), or floating point multiplications (FMUL), combined with simpler instructions such as integer adds for addressing and fetching data, floating point compare, or min/max for processing results, etc. Performance gains will vary at the shader and application level depending on the mix of instructions. Ray tracing denoising shaders are good examples that might benefit greatly from doubling FP32 throughput.

    Doubling math throughput required doubling the data paths supporting it, which is why the Ampere SM also doubled the shared memory and L1 cache performance for the SM. (128 bytes/clock per Ampere SM versus 64 bytes/clock in Turing). Total L1 bandwidth for GeForce RTX 3080 is 219 GB/sec versus 116 GB/sec for GeForce RTX 2080 Super.

    Like prior NVIDIA GPUs, Ampere is composed of Graphics Processing Clusters (GPCs), Texture Processing Clusters (TPCs), Streaming Multiprocessors (SMs), Raster Operators (ROPS), and memory controllers.

    The GPC is the dominant high-level hardware block with all of the key graphics processing units residing inside the GPC. Each GPC includes a dedicated Raster Engine, and now also includes two ROP partitions (each partition containing eight ROP units), which is a new feature for NVIDIA Ampere Architecture GA10x GPUs. More details on the NVIDIA Ampere architecture can be found in NVIDIA’s Ampere Architecture White Paper, which will be published in the coming days.

    image

    Lack of progress with encoders

    For RTX 30 Series, we decided to focus improvements on the video decode side of things and added* AV1 decode support. On the encode side, RTX 30 Series has the same great encoder as our RTX 20 Series GPU. We have also recently updated our NVIDIA Encoder SDK. In the coming months, livestream applications will be updating to this new version of the SDK, unlocking new performance options for streamers.

    By idea for NLE GPU processing performance must rise 70-80% compared to 2080 Ti (for RTX 2080).

    sa14360.jpg
    668 x 779 - 95K
  • image

    As told above it will be in very short supply.

    sa14374.jpg
    554 x 489 - 39K
  • image

    image

    sa14401.jpg
    800 x 477 - 90K
    sa14409.jpg
    800 x 504 - 52K
  • On CUDA performance

    image

    sa14419.jpg
    800 x 459 - 89K
  • It is one big problem with new RTX cards

    Ethereum mining rates are

    • RTX 2080 - 40 Mh/s
    • RTX 3080 - 115 Mh/s.

    In Asia around 90% of all cards go to miners now.

  • All three cards

    image

    sa14514.jpg
    800 x 489 - 75K
  • image

    sa14515.jpg
    595 x 809 - 155K
  • image

    image

    image

    image

    image

    sa14523.jpg
    727 x 572 - 47K
    sa14524.jpg
    742 x 540 - 63K
    sa14525.jpg
    667 x 562 - 54K
    sa14527.jpg
    800 x 391 - 62K
    sa14528.jpg
    800 x 533 - 92K
  • RTX 3080 Benchmarks

    • 3DMark Fire Strike Performance: 31919 (+25% к 2080 Ti, +43% к 2080 Super)
    • 3DMark Fire Strike Extreme: 20101 (+24% 2080 Ti, +45% 2080 Super)
    • 3DMark Fire Strike Ultra: 11049 (+36% 2080 Ti, +64% 2080 Super)
    • 3DMark Fire Strike Time Spy: 17428 (+28% 2080 Ti, +49% 2080 Super)
    • 3DMark Fire Strike Time Spy Extreme: 8548 (+38% 2080 Ti, +59% 2080 Super)
    • 3DMark Fire Strike Port Royal: 11455 (+45% 2080 Ti, +64% 2080 Super)

    In average RTX 3080 is 33% faster than RTX 2080 Ti and 54% faster than RTX 2080 Super.

  • 3090 ti or a titan based on 3090 should be grate to see its capabilities

  • RTX 3090 in the case

    image

    sa14546.jpg
    746 x 596 - 68K
  • EVGA 3090 cards

    image

    sa14556.jpg
    800 x 547 - 74K
  • More leaked benchmarks

    image

    image

    image

    sa14557.jpg
    677 x 415 - 42K
    sa14558.jpg
    678 x 371 - 37K
    sa14559.jpg
    675 x 334 - 33K
  • Good youtube video must be like this one

  • Quadro card based on Ampere will be having less disabled CUDA cores

    • GPU GA102 chip
    • 10752 fake CUDA cored (actually 5376)
    • 1860Mhz clock
    • 384 bit memory bus
    • 48Gb memory GDDR6
    • Quadro A8000 name

    Actual number of CUDA cores in GA102 is around 5500, but some can be disabled.

  • image

    As expected - design of the cooling is not best for 3080 FE.

    sa14589.jpg
    800 x 500 - 64K
  • Colorful GPUs

    image

    Note how it is first time that better process did not help much.

    One guy in industry I talked with told that Ampere GPUs will be very prone to BGA issues and board burnouts. Life of such cards if it is not used with water cooling can be limited to 2 years (if used heavily).

    sa14597.jpg
    565 x 261 - 43K
  • Stupid mining efficiency measurements

    image

    sa14610.jpg
    800 x 204 - 38K
  • PSUs

    image

    sa14642.jpg
    629 x 740 - 111K
  • Board design

    NVIDIA PG132 reference board for 3080:

    image

    NVIDIA PG132 reference board 3090:

    image

    Next time then press will tell you something about very costly top board design and expense on extra memory bus - you can spit them into their prostitute face, as board is literally the same, just less power phases.

    NVIDIA PG133 Founders Edition board:

    image

    sa14675.jpg
    800 x 457 - 91K
    sa14676.jpg
    800 x 419 - 99K
    sa14677.jpg
    800 x 483 - 89K