Hi friends, I've got a problem with a shell script and the "nvidia-smi" command! I've written a script that protects against CPU overheating on my Ubuntu Server 14.04.2. The script works nicely, but I need it to cover my 4 GPUs as well. I'm pretty green when it comes to bash scripts, so I've been looking for commands that would make the script easy to edit. I found and tested a lot of them, but none of them gives me the output I need! I'll show you the commands and their output below, along with the script.

What I need is a command that lists the GPUs the same way the "sensors" command from "lm-sensors" does. Then I can use "grep" to select a GPU and set the variable "newstr" (the two-digit temperature). I've been trying for a couple of days with no luck, mostly because the "nvidia-smi -lso" and "nvidia-smi -lsa" commands don't exist anymore. I think they were experimental.

Here are the commands I found and tested, with their output.

This command shows the GPU socket number (the PCI bus ID), which I could put into the variable "str". The problem is that the temperature is on the next line. I've been fiddling with grep's "-A 1" flag but haven't managed to work it into the script:

Code:
# nvidia-smi -q -d temperature | grep GPU
Attached GPUs : 4
GPU 0000:01:00.0
    GPU Current Temp : 57 C
    GPU Shutdown Temp : N/A
    GPU Slowdown Temp : N/A
GPU 0000:02:00.0
    GPU Current Temp : 47 C
    GPU Shutdown Temp : N/A
    GPU Slowdown Temp : N/A
GPU 0000:03:00.0
    GPU Current Temp : 47 C
    GPU Shutdown Temp : N/A
    GPU Slowdown Temp : N/A
GPU 0000:04:00.0
    GPU Current Temp : 48 C
    GPU Shutdown Temp : N/A
    GPU Slowdown Temp : N/A

This command puts each temperature on its own line, but there's no GPU number!?

Code:
# nvidia-smi -q -d temperature | grep "GPU Current Temp"
    GPU Current Temp : 58 C
    GPU Current Temp : 47 C
    GPU Current Temp : 47 C
    GPU Current Temp : 48 C

This command shows the temperature of the GPU you select, but there's still nothing in the output showing the GPU number/socket/ID!?
Code:
# nvidia-smi -q --gpu=0 | grep "GPU Current Temp"
    GPU Current Temp : 59 C

And this command shows the GPU number and its details on the same line! But no temperature!

Code:
# nvidia-smi -L
GPU 0: GeForce GTX 750 Ti (UUID: GPU-9785c7c7-732f-1f51-..........)
GPU 1: GeForce GTX 750 (UUID: GPU-b2b1a4a-4dca-0c7f-..........)
GPU 2: GeForce GTX 750 (UUID: GPU-5e6b8efd-7531-777c-..........)
GPU 3: GeForce GTX 750 Ti (UUID: GPU-5b2b1a2f-3635-2a1c-..........)

And here's a command that shows the temperatures of all 4 GPUs and nothing else. But I still need the GPU number/socket/ID!

Code:
# nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader
58
47
47
48

What I'm wishing for! If I could get a command that produced output like this, I would be the happiest guy around:

Code:
GPU 0: GeForce GTX 750 Ti GPU Current Temp : 58 C
GPU 1: GeForce GTX 750 GPU Current Temp : 47 C
GPU 2: GeForce GTX 750 GPU Current Temp : 47 C
GPU 3: GeForce GTX 750 Ti GPU Current Temp : 48 C

Here's the output of "sensors" from "lm-sensors". As you can see, the unit info and the temperature are on the same line:

Code:
# -----------------------------------------------------------
# coretemp-isa-0000
# Adapter: ISA adapter
# Physical id 0:  +56.0°C  (high = +80.0°C, crit = +100.0°C)
# Core 0:         +56.0°C  (high = +80.0°C, crit = +100.0°C)
# Core 1:         +54.0°C  (high = +80.0°C, crit = +100.0°C)
# Core 2:         +54.0°C  (high = +80.0°C, crit = +100.0°C)
# Core 3:         +52.0°C  (high = +80.0°C, crit = +100.0°C)
# -----------------------------------------------------------

Here's the part of the script that needs changing. As mentioned at the top, it works using the "sensors" command from the "lm-sensors" package. "lm-sensors" doesn't show the GPU temperature when CUDA is running and the driver is attached, so I need another command that lists the GPUs and shows their temperatures. If you know another way to fix my problem, please don't hesitate to show me:

Code:
[...]
echo "JOB RUN AT $(date)"
echo "======================================="
echo ''
echo 'CPU Warning Limit set to => '$1
echo 'CPU Shutdown Limit set to => '$2
echo ''
echo ''
sensors
echo ''
echo ''
for i in 0 1 2 3
do
    str=$(sensors | grep "Core $i:")
    newstr=${str:17:2}
    if [ ${newstr} -ge $1 ]
    then
        echo '====================================================================' >>/home/......../logs/watchdogcputemp.log
        echo $(date) >>/home/......../logs/watchdogcputemp.log
        echo '' >>/home/......../logs/watchdogcputemp.log
        echo ' STATUS WARNING - NOTIFYING : TEMPERATURE CORE' $i 'EXCEEDED' $1 '=>' $newstr >>/home/......../logs/watchdogcputemp.log
        echo ' ACTION : EMAIL SENT' >>/home/......../logs/watchdogcputemp.log
        echo '' >>/home/......../logs/watchdogcputemp.log
        echo '====================================================================' >>/home/......../logs/watchdogcputemp.log
        # Status Warning Email Sending Code
        # WatchdogCpuTemp Alert! Status Warning - Notifying!"
        /usr/bin/msmtp -d --read-recipients </home/......../shellscripts/messages/watchdogcputempwarning.txt
        echo 'Email Sent.....'
    fi
[...]

I hope there's a bash-script guru out there, ready to solve this issue. Have a nice weekend!

Kind Regards,
Dan Hansen, Denmark
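Two ways the pieces in the question could be combined are sketched below. This assumes standard `grep` and `paste`, plus an `nvidia-smi` that supports `--query-gpu`; the temperature-only query above shows this one does, and `nvidia-smi --help-query-gpu` lists the available fields. `pair_gpu_lines` applies the "-A 1" idea: it prints each bus-ID line with the line after it, then joins the two. `format_gpu_temps` takes the CSV from `--query-gpu=index,name,temperature.gpu` and rewrites it in the wished-for layout. Both function names are hypothetical. They read from stdin, so you can try them on pasted sample output.

```shell
#!/usr/bin/env bash
# Sketch only: pair_gpu_lines and format_gpu_temps are hypothetical helper names.

# Option 1: the "grep -A 1" idea.
# Expects the output of: nvidia-smi -q -d temperature | grep GPU
# (bus-ID lines start at column 0, temperature lines are indented).
pair_gpu_lines() {
    # -A 1 also prints the line after each "GPU <bus id>" header;
    # grep -v drops the "--" separators grep inserts between groups;
    # paste - - joins every two lines into one, separated by a tab.
    grep -A 1 '^GPU ' | grep -v '^--$' | paste - -
}

# Option 2: let nvidia-smi do the pairing via its CSV query interface.
# Expects lines such as "0, GeForce GTX 750 Ti, 58" from:
#   nvidia-smi --query-gpu=index,name,temperature.gpu --format=csv,noheader
format_gpu_temps() {
    local idx name temp
    # IFS=',' splits on commas only, so spaces inside the GPU name survive.
    while IFS=',' read -r idx name temp; do
        name=${name# }   # drop the space nvidia-smi puts after each comma
        temp=${temp# }
        printf 'GPU %s: %s GPU Current Temp : %s C\n' "$idx" "$name" "$temp"
    done
}

# Live usage on the server:
#   nvidia-smi -q -d temperature | grep GPU | pair_gpu_lines
#   nvidia-smi --query-gpu=index,name,temperature.gpu --format=csv,noheader | format_gpu_temps
```

Option 2 avoids depending on the line layout of `nvidia-smi -q`. Each output line also starts with "GPU <index>:", so `grep "GPU $i:"` in the existing loop could select a single card.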
Hi,

No one here had a suggestion, but I've solved the problem with help from a few people on Ask Ubuntu. Here's the solution for others to learn from. The problem seems to be solved for the moment, because one of the suggestions I got there did the trick. My thanks to "Terdon": http://askubuntu.com/questions/638665/shell-script-nvidia-smi-needs-right-command-flag/641828#641828

Here are the results on my Ubuntu Server 14.04.

This one looks like this on my system. It prints the preceding line next to every line ending in "C", so each temperature gets paired with its "Temperature" section header rather than with the bus ID:

Code:
# nvidia-smi -q -d temperature | awk '{if(/C$/){print last,$0};last=$0};'
Temperature GPU Current Temp : 53 C
Temperature GPU Current Temp : 45 C
Temperature GPU Current Temp : 52 C
Temperature GPU Current Temp : 51 C

And this one, which is just PERFECT, looks like this on my system:

Code:
# nvidia-smi -q -d temperature | grep GPU | perl -pe '/^GPU/ && s/\n//' | grep ^GPU
GPU 0000:01:00.0 GPU Current Temp : 53 C
GPU 0000:02:00.0 GPU Current Temp : 45 C
GPU 0000:03:00.0 GPU Current Temp : 52 C
GPU 0000:04:00.0 GPU Current Temp : 51 C

The perl step deletes the newline at the end of every line that starts with "GPU" (the bus-ID lines). That joins each bus ID to the temperature line that follows it, and the final "grep ^GPU" keeps only those joined lines. Now I've got the "GPU" text to grep for in my script, I've got the GPU socket ID, and last but not least, I've got the temperature on the same line! Exactly what I asked for. I humbly bow.
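The sketch below shows how the joined lines could feed the warning check, replacing the CPU loop's fixed offset `${str:17:2}`. Bus-ID lines are laid out differently from "Core N:" lines, so it picks out the temperature by word position instead of by character position. It assumes each input line looks like "GPU 0000:01:00.0    GPU Current Temp : 53 C" and ends with " C". `check_gpu_temps` is a hypothetical helper name, and the warning text simply mirrors the existing log line.

```shell
#!/usr/bin/env bash
# Sketch: check_gpu_temps is a hypothetical helper, not part of nvidia-smi.
# stdin: one line per GPU, e.g. "GPU 0000:01:00.0    GPU Current Temp : 53 C",
# as produced by the perl pipeline. $1: warning limit in degrees C.
check_gpu_temps() {
    local limit=$1 tag bus rest temp
    # read splits off the first two words, the literal "GPU" and the bus ID;
    # everything after them lands in "rest".
    while read -r tag bus rest; do
        [ "$tag" = "GPU" ] || continue
        temp=${rest% C}       # drop the trailing " C" unit
        temp=${temp##* }      # keep only the last word: the temperature
        # Skip lines whose temperature is not a plain integer (e.g. "N/A").
        case $temp in
            ''|*[!0-9]*) continue ;;
        esac
        if [ "$temp" -ge "$limit" ]; then
            echo "STATUS WARNING : TEMPERATURE GPU $bus EXCEEDED $limit => $temp"
        fi
    done
}

# Possible use inside the watchdog script (log path and msmtp call as in the CPU branch):
#   warnings=$(nvidia-smi -q -d temperature | grep GPU | perl -pe '/^GPU/ && s/\n//' \
#              | grep ^GPU | check_gpu_temps "$1")
#   if [ -n "$warnings" ]; then ...log "$warnings" and send the email...; fi
```

The function only prints warnings, so the logging and email code in the script can stay exactly as it is for the CPU cores.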