Threadripper PCIe bus problem - help!
  • After upgrading to an RTX 3090 for DaVinci Resolve and NeatVideo, I ran into a very strange problem:

    The 3090 is no faster - and sometimes a bit slower - than my previous Radeon VII. I had a similar problem when trying to speed up NeatVideo with two Radeon VIIs - seeing no gain (even when using a special NeatVideo beta version made specifically for multi-GPU use - big thanks to ABsoft).

    The support team at ABsoft/NeatVideo is really friendly and did a fantastic job of pinning down the problem: data is sent from the CPU to the GPU at about 13 GB/s, but the return path only manages about 7 GB/s on my system - which is way too slow! (A minimal sketch of this kind of transfer measurement is included below.)

    I'm using a Threadripper 2920X CPU on a Gigabyte Designare EX mainboard, and since the PCIe controller is inside the Threadripper, the problem is most likely the CPU. But is it only my CPU, is it only the 2920X, or is it every Threadripper?

    I need your HELP:

    Everybody who has a Threadripper (1000 or 2000 series) and is willing to run a small benchmark tool (it takes less than 5 minutes all in all), please send me a PM. Any graphics card (even old ones with GDDR5 VRAM) will do, as long as it's in the PCIe x16 slot.

    So, let's see if I just got unlucky with my Threadripper, or if AMD has intentionally crippled their flagship CPUs.
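
    For reference, here is a minimal sketch of the kind of host<->device copy timing these benchmarks perform. It is not the ABsoft tool or Nvidia's bandwidthTest, just an illustration using the CUDA runtime API; the buffer size and repeat count are arbitrary choices, and it assumes a CUDA toolkit is installed so it can be built with nvcc.

        // pcie_bw.cu - time host->device and device->host copies over PCIe.
        // Build (assuming a CUDA toolkit install): nvcc -O2 pcie_bw.cu -o pcie_bw
        #include <cstdio>
        #include <cuda_runtime.h>

        // Copy 'bytes' in the given direction 'reps' times and return the average GB/s.
        static float timedCopyGBps(void* dst, const void* src, size_t bytes,
                                   cudaMemcpyKind kind, int reps) {
            cudaEvent_t start, stop;
            cudaEventCreate(&start);
            cudaEventCreate(&stop);
            cudaEventRecord(start);
            for (int i = 0; i < reps; ++i)
                cudaMemcpy(dst, src, bytes, kind);      // synchronous PCIe transfer
            cudaEventRecord(stop);
            cudaEventSynchronize(stop);
            float ms = 0.0f;
            cudaEventElapsedTime(&ms, start, stop);
            cudaEventDestroy(start);
            cudaEventDestroy(stop);
            return (float)(bytes * 1e-9 * reps / (ms * 1e-3));
        }

        int main() {
            const size_t bytes = 32u << 20;             // 32 MB, same buffer size as the OpenCL test below
            const int reps = 100;
            void *host = nullptr, *dev = nullptr;
            cudaMallocHost(&host, bytes);               // pinned (page-locked) host memory
            cudaMalloc(&dev, bytes);
            printf("host->device: %.1f GB/s\n", timedCopyGBps(dev, host, bytes, cudaMemcpyHostToDevice, reps));
            printf("device->host: %.1f GB/s\n", timedCopyGBps(host, dev, bytes, cudaMemcpyDeviceToHost, reps));
            cudaFreeHost(host);
            cudaFree(dev);
            return 0;
        }

    If something this simple already shows device->host around 7 GB/s while host->device sits near 13 GB/s, the asymmetry is in the platform, not in any particular application.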

  • 15 Replies
  • @Psyco

    Did you try another PCIe slot?

    It can also be something very specific, like an issue with wiring, an issue with repeater chips, and so on.

    It can even be an issue with the benchmark - it may be designed in such a way that the GPU just can't keep up.

  • Most likely it's due to the way all 1000/2000-series Threadrippers handle NUMA.

    I had a similar problem with CUDA processing of R3Ds and had to switch to TRX40. Check your bandwidth numbers with CUDA-Z to compare with other 3090 installs. (A small NUMA-pinning sketch is included below.)
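
    A quick way to test the NUMA theory (sketch only - it assumes the BIOS memory mode is set so that Windows actually exposes more than one NUMA node; in the default distributed/UMA mode there is only node 0) is to pin the benchmark process to one node at a time and see whether the device->host number changes depending on which die the code runs on.

        // numa_pin.cpp - restrict the current process to one NUMA node before
        // running a bandwidth test. Sketch; the node index comes from argv[1].
        #include <windows.h>
        #include <cstdio>
        #include <cstdlib>

        int main(int argc, char** argv) {
            UCHAR node = (argc > 1) ? (UCHAR)atoi(argv[1]) : 0;

            ULONG highest = 0;
            GetNumaHighestNodeNumber(&highest);
            printf("highest NUMA node: %lu\n", highest);

            ULONGLONG mask = 0;
            if (!GetNumaNodeProcessorMask(node, &mask) || mask == 0) {
                printf("no processors on node %u\n", (unsigned)node);
                return 1;
            }
            // Keep this process on that node's cores only.
            if (!SetProcessAffinityMask(GetCurrentProcess(), (DWORD_PTR)mask)) {
                printf("SetProcessAffinityMask failed: %lu\n", GetLastError());
                return 1;
            }
            printf("pinned to node %u (mask 0x%llx)\n", (unsigned)node, (unsigned long long)mask);

            // ...run the copy-timing loop from the sketch above from here...
            return 0;
        }

    If the numbers stay the same no matter which node the test is pinned to, NUMA placement is probably not the whole story.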

  • Most likely it's due to the way all 1000/2000-series Threadrippers handle NUMA.

    I never saw any mention of this.

    Threadripper setups frequently have a lot of NVMe drives installed in full-size PCIe slots (4 drives at x4 each), and benchmarks showed no issues with either write or read rates.

    As I said, it can be something very specific - board-specific or even benchmark-specific.

  • @Vitaliy_Kiselev

    Of course I did try the other x16 slot - same result.

    And no, the benchmark is specifically meant to test PCIe transfer speed. I used one from Nvidia - you can find it in their developer toolkit ("bandwidthTest.exe") - and another one for OpenCL (just to double-check).

    As far as I know, there is nothing on the mainboard between the (first) PCIe slot and the CPU - just about 2-3 cm of copper traces, no ICs.

    And besides that, I see a step backwards in performance with the RTX 3090 in real-world tests (NeatVideo in DaVinci Resolve), but huge gains in games. (A quick link-check sketch is included below.)
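
    One more thing worth ruling out is a degraded link (e.g. the slot silently negotiating x8 instead of x16), although that would normally cap both directions equally. Here is a rough sketch that queries the negotiated link via NVML (ships with the Nvidia driver; link against nvml.lib from the CUDA toolkit) - nvidia-smi -q should report the same information under "GPU Link Info" without compiling anything.

        // link_check.cpp - query the current vs. maximum PCIe link generation and
        // width of GPU 0 via NVML, to rule out a degraded link. Sketch only.
        #include <cstdio>
        #include <nvml.h>

        int main() {
            if (nvmlInit() != NVML_SUCCESS) {
                printf("nvmlInit failed\n");
                return 1;
            }
            nvmlDevice_t dev;
            if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS) {
                unsigned int gen = 0, width = 0, maxGen = 0, maxWidth = 0;
                nvmlDeviceGetCurrPcieLinkGeneration(dev, &gen);    // e.g. 3 for PCIe 3.0
                nvmlDeviceGetCurrPcieLinkWidth(dev, &width);       // e.g. 16 for x16
                nvmlDeviceGetMaxPcieLinkGeneration(dev, &maxGen);
                nvmlDeviceGetMaxPcieLinkWidth(dev, &maxWidth);
                printf("current link: gen %u x%u (max: gen %u x%u)\n", gen, width, maxGen, maxWidth);
            }
            nvmlShutdown();
            return 0;
        }

    A PCIe 3.0 x16 link tops out around 13 GB/s in practice in either direction, so the 13 GB/s host->device figure suggests the link itself is fine and the asymmetry comes from somewhere else.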

  • Send me the details on how to perform the test, with download links, and I'll have a look (TR 2950X, GF 1080).

  • Here is the output from the OpenCL bandwidth test on my system:

    Platform 0 : NVIDIA Corporation
    Selected Platform Vendor : NVIDIA Corporation
    Device 0 : GeForce RTX 3090
    Device ID is 00000155A8A05A70
    Build: _WINxx release
    GPU work items: 131072
    Buffer size: 33554432
    CPU workers: 1
    Timing loops: 100
    Repeats: 1
    Kernel loops: 1
    inputBuffer: CL_MEM_READ_ONLY
    outputBuffer: CL_MEM_WRITE_ONLY
    copyBuffer: CL_MEM_READ_WRITE CL_MEM_USE_HOST_PTR

    AVERAGES (over loops 2 - 99, use -l for complete log)
    ---------------------------------|---------------
    PCIe B/W host->device (GBPS)     | 12.9
    ---------------------------------|---------------
    PCIe B/W device->host (GBPS)     | 7.02

    Passed!

  • Here's what I got:

    ---------------------------------|---------------
    PCIe B/W host->device (GBPS)     | 13.1
    ---------------------------------|---------------
    PCIe B/W device->host (GBPS)     | 5.93
  • @brudney

    Looks similar.

    But I propose trying some other test, as these things are complex and the bottleneck may not be related to the Threadripper at all.

  • @brudney @Vitaliy_Kiselev

    I did the same test with an old GTX 670 (Nvidia) and got the same result: 13 and 7 GB/s.

    Then I did a third test with my Radeon VII (AMD):

    Selected Platform Vendor : Advanced Micro Devices, Inc.

    ---------------------------------|---------------
    PCIe B/W host->device (GBPS)     | 13.6
    ---------------------------------|---------------
    PCIe B/W device->host (GBPS)     | 13.7

    Strange, isn't it? The only explanation I can think of is that AMD is deliberately screwing Nvidia graphics cards (or at least doesn't bother fixing a critical bug).

  • @Psyco

    You need to dig deeper into this.

    Try talking with people doing massive GPGPU calculations and similar work.

    I still don't see any mention of this anywhere.

  • @Psyco

    That's interesting indeed. Maybe it's worth asking AMD on Facebook/Twitter to see if they care to comment on it.

  • @brudney

    I have to admit that I don't use any social media, so anybody should feel free to post this on those platforms.

    @Vitaliy_Kiselev

    The problem is only visible when a huge amount of data is sent back and forth between the GPU and the CPU, not when there's a lot of processing on the GPU. (Some rough per-frame numbers are sketched below.)

    Maybe it's a specific mainboard type that's causing this problem? (Mine is a Gigabyte Designare EX.)

    If you know people doing heavy GPGPU stuff, please ask them.
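
    To make the transfer-bound vs. compute-bound point concrete, here is some back-of-the-envelope arithmetic. The frame size and pixel format are assumptions for illustration only - Resolve's internal format may differ - but the measured bandwidths are the ones from the tests above.

        // frame_budget.cpp - rough per-frame PCIe budget for a denoise-style filter
        // that has to upload every source frame and download every result frame.
        #include <cstdio>

        int main() {
            const double frameBytes   = 3840.0 * 2160.0 * 4 * 4;  // UHD, 4 channels, 32-bit float (~133 MB)
            const double uploadGBps   = 13.0;                     // measured host->device
            const double downloadGBps = 7.0;                      // measured device->host

            const double upMs   = frameBytes / (uploadGBps   * 1e9) * 1e3;   // ~10.2 ms
            const double downMs = frameBytes / (downloadGBps * 1e9) * 1e3;   // ~19.0 ms

            printf("upload   per frame: %.1f ms\n", upMs);
            printf("download per frame: %.1f ms\n", downMs);
            printf("transfer-only cap : %.1f fps\n", 1000.0 / (upMs + downMs));
            return 0;
        }

    With these numbers the transfers alone cap throughput at roughly 34 fps, no matter how fast the card itself is - which lines up with a faster GPU (or a second one) not helping. Games normally never read rendered frames back over PCIe, so they never hit this path.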

  • Marginally related: I got a 3060 Ti and I have seen zero gains over my previous 1060 in Resolve rendering.

  • @arum

    As far as I understand, it has the same 128-bit memory bus and is mainly focused on games.

    If not for the crypto scam, you would have a lot of nice options.

    A 192-bit memory bus in both, but the RAM is faster, the actual chip is faster, there is more RAM... It should be faster, but it was not :/