OpenShift node disk space disappeared
Filesystem usage displayed by df does not match total usage displayed by du, e.g. 24 GB vs. ~4.7 GB:
[root@server1 /]# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rhel_server1-root 27G 24G 2.7G 90% /
[root@server1 /]# du -shx * --exclude proc --exclude sys | sort -h
0 bin
0 dev
0 home
0 lib
0 lib64
0 media
0 mnt
0 sbin
0 secrets
0 srv
4.0K tmp
13M opt
50M etc
182M root
187M boot
429M run
1.5G var
2.3G usr
Display human-readable sizes in lsof output:
[root@server1 /]# lsof | grep /var | numfmt --field=7 --to=iec | head
List deleted files that are still held open on the device and therefore still consume space (+L1 selects open files with a link count below 1):
[root@server1 /]# lsof +L1 /dev/mapper/rhel_server1-root 2> /dev/null
Combining the above commands reveals two Docker log files causing the issue:
[root@server1 /]# lsof +L1 /dev/mapper/rhel_server1-root 2> /dev/null | \
grep "/var" | \
numfmt --field=7 --to=iec | \
sort -h -k7 | \
tail -10
...snip...
dockerd-c 1997 root 1553r REG 253,0 2.1G 0 71015 /var/lib/docker/containers/<container1-id>/<container1-id>-json.log (deleted)
dockerd-c 1997 root 554r REG 253,0 5.8G 0 67454925 /var/lib/docker/containers/<container2-id>/<container2-id>-json.log (deleted)
The container logs were deleted, but the Docker daemon still holds them open, so they still use space on the filesystem.
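Why does removing the logs not free the space? On Linux, a file's blocks are released only once its last directory entry and its last open descriptor are both gone. The following sketch reproduces the effect on a small scale. The temporary file, the 1 MiB size, and descriptor number 9 are arbitrary choices for the demonstration and are not taken from the node above.

```shell
#!/usr/bin/env bash
# Sketch: an unlinked file stays allocated while a descriptor is still open.
# The temp file, its size, and fd 9 are arbitrary demo values.
demo_file=$(mktemp)
head -c 1048576 /dev/zero > "$demo_file"   # write 1 MiB of data
exec 9< "$demo_file"                        # hold the file open read-only on fd 9
rm -f "$demo_file"                          # remove the only directory entry

# readlink and stat inherit fd 9, so /proc/self/fd/9 resolves to the same file.
# The kernel marks the link target of an unlinked file with "(deleted)".
link_target=$(readlink /proc/self/fd/9)
# The data is still there: the size seen through the descriptor is unchanged.
held_bytes=$(stat -L -c %s /proc/self/fd/9)
echo "fd 9 -> $link_target ($held_bytes bytes still allocated)"

exec 9<&-                                   # closing the last descriptor frees the space
```

The same thing happens with the json.log files: the directory entries are gone, so du cannot see them, while df still counts the blocks.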
Solution
List all file descriptors of the Docker daemon that are marked ‘deleted’ in the file listing:
[root@server1 /]# ls /proc/$(cat /var/run/docker.pid)/fd -l --time-style=+'%s' | \
grep -E 'deleted'
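To see how much space those descriptors hold, the sizes can be read through /proc as well. This is a minimal sketch, assuming GNU stat and readlink. deleted_fd_bytes is a hypothetical helper name, and the total counts a file twice if it is open on more than one descriptor.

```shell
#!/usr/bin/env bash
# Sketch: sum the sizes of all deleted-but-open files of one process.
# deleted_fd_bytes is a hypothetical helper, not part of Docker or lsof.
deleted_fd_bytes() {
    local pid=$1 total=0 fd target size
    for fd in /proc/"$pid"/fd/*; do
        # Each entry is a symlink; unlinked files end in " (deleted)".
        target=$(readlink "$fd" 2>/dev/null) || continue
        [[ $target == *' (deleted)' ]] || continue
        # -L follows the link, so the size is that of the open file itself.
        size=$(stat -L -c %s "$fd" 2>/dev/null) || continue
        total=$(( total + size ))
    done
    echo "$total"
}
```

Run as root, something like `deleted_fd_bytes "$(cat /var/run/docker.pid)" | numfmt --to=iec` should print a figure close to the gap between df and du.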
To perform the actual “cleaning”, attach gdb to the Docker daemon and call close() on each of those descriptors:
[root@server1 /]# docker_pid=$(cat /var/run/docker.pid)
[root@server1 /]# gdb -p $docker_pid <<< "$( \
ls /proc/$docker_pid/fd -l --time-style=+'%s' | \
grep -E 'deleted' | \
awk '{ printf("p close(%s)\n", $7)}')"
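The command above closes every descriptor marked ‘deleted’, not only the log files. As a more cautious variant, the gdb commands can be generated separately and reviewed before anything is attached. This is a sketch, not the method from the credited gist: gdb_close_commands is a hypothetical helper, and the -json.log filter is an assumption based on the file names in the lsof output above.

```shell
#!/usr/bin/env bash
# Sketch: print one gdb "p close(N)" line per deleted container log
# held open by the given process. Nothing is closed by this function.
# gdb_close_commands is a hypothetical helper name.
gdb_close_commands() {
    local pid=$1 fd
    for fd in /proc/"$pid"/fd/*; do
        # Keep only descriptors whose target is a deleted *-json.log file.
        [[ $(readlink "$fd" 2>/dev/null) == *'-json.log (deleted)' ]] || continue
        # The descriptor number is the last path component.
        printf 'p close(%s)\n' "${fd##*/}"
    done
}
```

Calling `gdb_close_commands "$docker_pid"` on its own prints the list for review. If it looks right, the same output can be passed to gdb in place of the ls/grep/awk pipeline: `gdb -p "$docker_pid" <<< "$(gdb_close_commands "$docker_pid")"`.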
Credits:
Gist with script to close leaked ‘deleted’ and ‘eventfd’ left by Docker
Git issues I came across that helped me along: