FreeBSD microvm Clocks

This post discusses the current state of TSC and PV clock usage in the MICROVM build of FreeBSD, relays the basics of the cpuid instruction in x86, and gives a brief overview of TSC and PV clock in x86.

PV Clock Investigation

We start with the following query in the FreeBSD source root directory:

grep -R "pvclock\.c"

As of now, aside from a few binary files that don't matter here, the only match is sys/conf/files.x86. Looking into that file, we note two important lines:

x86/x86/pvclock.c		optional	kvm_clock | xenhvm
x86/x86/tsc.c			standard
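
To make the two clauses concrete, here is a minimal sketch of how such an entry could be evaluated against a set of enabled devices. It assumes that `|` separates alternatives and that names within one alternative must all be enabled; `is_built` is a hypothetical helper, not part of config(8), and it ignores trailing keywords such as compile-with.

```python
def is_built(clause: str, enabled: set[str]) -> bool:
    """Decide whether a files.* entry with this clause would be compiled.

    Assumption: '|' separates alternatives, and the whitespace-separated
    names inside one alternative must all be enabled.
    """
    fields = clause.split(None, 1)
    if fields and fields[0] == "standard":
        return True  # always compiled
    if len(fields) < 2 or fields[0] != "optional":
        raise ValueError(f"unsupported clause: {clause!r}")
    for alternative in fields[1].split("|"):
        names = alternative.split()
        if names and all(name in enabled for name in names):
            return True
    return False


microvm_devices = {"kvm_clock"}  # only the device relevant to this post
print(is_built("optional kvm_clock | xenhvm", microvm_devices))
print(is_built("standard", microvm_devices))
```

Under these assumptions, either kvm_clock or xenhvm alone is enough to pull pvclock.c into the build, while tsc.c is always compiled.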

In sys/amd64/conf/MICROVM, we see that kvm_clock is already used in the build.

device		kvm_clock

Let’s reintroduce the xentimer device in sys/amd64/conf/MICROVM.

device		xentimer		# Xen x86 PV timer device

We can write a Python script to set a breakpoint on every function in sys/x86/x86/pvclock.c.

import gdb

GDB_PORT = 1234
KERNEL_FILE = "/usr/obj/<path-to-freebsd-source>/amd64.amd64/sys/MICROVM/kernel"

PVCLOCK_FUNCTIONS = [
    "pvclock_resume",
    "pvclock_tsc_freq",
    "pvclock_read_time_info",
    "pvclock_read_wall_clock",
    "pvclock_getsystime",
    "pvclock_get_timecount",
    "pvclock_get_wallclock",
    "pvclock_cdev_open",
    "pvclock_cdev_mmap",
    "pvclock_tc_get_timecount",
    "pvclock_tc_vdso_timehands",
    "pvclock_tc_vdso_timehands32",
    "pvclock_gettime",
    "pvclock_init",
    "pvclock_destroy",
]

gdb.execute(f"target remote localhost:{GDB_PORT}")
gdb.execute(f"add-symbol-file {KERNEL_FILE}")

for func in PVCLOCK_FUNCTIONS:
    gdb.execute(f"break {func}")
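
The function list above is written out by hand. As a rough alternative, the names could be pulled from the source file with a regular expression. This sketch assumes the file follows FreeBSD's style(9), where the return type sits on its own line and the function name starts in column 0; `function_names` is a hypothetical helper, and macros invoked at column 0 would also match and need filtering.

```python
import re

# A C identifier at the very start of a line, immediately followed by "(".
FUNC_DEF = re.compile(r"^([A-Za-z_]\w*)\(", re.MULTILINE)


def function_names(source: str) -> list[str]:
    """Return the names of functions defined in style(9)-formatted C source."""
    return FUNC_DEF.findall(source)


# Small stand-in for a real source file; indented calls are not matched.
sample = """\
static int
tsc_freq_cpuid_vm(void)
{
    u_int regs[4];

    do_cpuid(0x40000010, regs);
    return (true);
}
"""

print(function_names(sample))
```

In the GDB script, the result of `function_names(open(path).read())` could then replace the hard-coded list.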

We also need a script to enable GDB debugging on a FreeBSD microvm guest, which we can get by slightly modifying our microvm launch script from before.

qemu-system-x86_64 -M microvm                      \
    -cpu max                                       \
    -m ${memory}                                   \
    -smp ${cores}                                  \
    -kernel ${kernel}                              \
    -append "${bootargs}"                          \
    -nodefaults                                    \
    -no-user-config                                \
    -nographic                                     \
    -serial stdio                                  \
    -drive id=test,file=${disk},format=raw,if=none \
    -device virtio-blk-device,drive=test           \
    -machine acpi=off                              \
    -s -S
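
The two trailing flags are what make the guest debuggable: -s is shorthand for -gdb tcp::1234, and -S keeps the CPU halted until GDB issues a continue. As a rough sketch, the same invocation can be assembled as a Python argument list, which also avoids shell word-splitting of the boot arguments. `qemu_argv` is a hypothetical helper, and the values passed to it below are placeholders, not the ones used in this project.

```python
import shlex


def qemu_argv(memory, cores, kernel, bootargs, disk, debug=True):
    """Build the microvm command line as a list of arguments."""
    argv = [
        "qemu-system-x86_64", "-M", "microvm",
        "-cpu", "max",
        "-m", str(memory),
        "-smp", str(cores),
        "-kernel", kernel,
        "-append", bootargs,  # stays one argument even if it contains spaces
        "-nodefaults", "-no-user-config", "-nographic",
        "-serial", "stdio",
        "-drive", f"id=test,file={disk},format=raw,if=none",
        "-device", "virtio-blk-device,drive=test",
        "-machine", "acpi=off",
    ]
    if debug:
        # -s: listen for GDB on TCP port 1234; -S: do not start the CPU yet.
        argv += ["-s", "-S"]
    return argv


# Placeholder values for illustration only.
argv = qemu_argv("1G", 2, "kernel", "opt1=a opt2=b", "disk.img")
print(shlex.join(argv))
```

The list can be passed directly to `subprocess.run(argv)`, and the port matches GDB_PORT in the GDB script.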

We use the Python script by running microvm_debug.sh in one window, opening GDB in another, and executing the command

source gdb_script.py

Unfortunately, none of the breakpoints trigger when we continue after sourcing the script. Instead, for reasons not yet clear, and just as Tom was initially, we are once again stopped at the assertion

	KASSERT((cpu_feature & CPUID_TSC) != 0 && tsc_freq != 0,
	    ("TSC not initialized"));

on line 547 of sys/x86/x86/local_apic.c. Since CPUID_TSC is defined as 0x00000010, we can set a breakpoint at the assert in GDB and run

p/x cpu_feature & CPUID_TSC

The output is 0x10, so the guest CPU advertises a TSC. The half of the assertion that fails must therefore be tsc_freq != 0, meaning the TSC frequency is being set to 0, and we can look in the function tsc_freq_cpuid_vm to investigate.
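
The two halves of the KASSERT can be modeled in a few lines. This is only a sketch: the `cpu_feature` value below is a made-up example word, not the one read from the guest, and `has_tsc` and `assertion_holds` are hypothetical names.

```python
CPUID_TSC = 0x00000010  # bit 4 of the feature word, as defined in the kernel


def has_tsc(cpu_feature: int) -> bool:
    """First half of the assertion: is the TSC feature bit set?"""
    return (cpu_feature & CPUID_TSC) != 0


def assertion_holds(cpu_feature: int, tsc_freq: int) -> bool:
    """Both conditions of the KASSERT must be true for boot to continue."""
    return has_tsc(cpu_feature) and tsc_freq != 0


example_feature = 0x078BFBFF  # hypothetical feature word with bit 4 set
print(hex(example_feature & CPUID_TSC))
print(assertion_holds(example_feature, 0))
print(assertion_holds(example_feature, 1))
```

With the feature bit set, the result depends entirely on whether tsc_freq is nonzero.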

static int
tsc_freq_cpuid_vm(void)
{
	u_int regs[4];

	if (vm_guest == VM_GUEST_NO)
		return (false);
	//if (hv_high < 0x40000010)
		//return (false);

	do_cpuid(0x40000010, regs);
	tsc_freq = (uint64_t)(regs[0]) * 1000;
	tsc_early_calib_exact = 1;
	return (true);
}

Looking at do_cpuid, we see it is simply a wrapper around a single assembly instruction.

static __inline void
do_cpuid(u_int ax, u_int *p)
{
	__asm __volatile("cpuid"
	    : "=a" (p[0]), "=b" (p[1]), "=c" (p[2]), "=d" (p[3])
	    :  "0" (ax));
}

We can find information on the cpuid instruction at https://wiki.osdev.org/CPUID, https://www.felixcloutier.com/x86/cpuid, and https://tizee.github.io/x86_ref_book_web/instruction/cpuid.html. The latter two pages feature C-like pseudocode for what the instruction returns in eax, ebx, ecx, and edx for some eax inputs. However, none of them cover an input of 0x40000010. Judging from the surrounding code, regs[0] should hold some form of the TSC frequency after the instruction executes. Here, though, the register value is 0, and no amount of scaling will make it nonzero, so we fail the check expecting tsc_freq to be nonzero.
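
The scaling step is simple enough to check by hand. The sketch below mirrors the line `tsc_freq = (uint64_t)(regs[0]) * 1000`, assuming (as the factor of 1000 suggests) that the leaf reports kHz; the 2.4 GHz input is a hypothetical value, and `tsc_freq_from_leaf` is a name invented for this example.

```python
def tsc_freq_from_leaf(eax: int) -> int:
    """Scale the 32-bit eax value to Hz, as tsc_freq_cpuid_vm does."""
    return (eax & 0xFFFFFFFF) * 1000


print(tsc_freq_from_leaf(2_400_000))  # hypothetical guest reporting 2.4 GHz
print(tsc_freq_from_leaf(0))          # what we observe: zero stays zero
```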

However, this tsc_freq variable doesn’t seem to do anything to stop us from booting except for causing the KASSERT to fail. If we set a breakpoint, set tsc_freq to a nonzero value, and continue, the OS proceeds with a slow boot to the login screen. With this information, we can edit our Python script from before.

import gdb

GDB_PORT = 1234
KERNEL_FILE = "/usr/obj/<path-to-freebsd-source>/amd64.amd64/sys/MICROVM/kernel"

PVCLOCK_FUNCTIONS = [
    "pvclock_resume",
    "pvclock_tsc_freq",
    "pvclock_read_time_info",
    "pvclock_read_wall_clock",
    "pvclock_getsystime",
    "pvclock_get_timecount",
    "pvclock_get_wallclock",
    "pvclock_cdev_open",
    "pvclock_cdev_mmap",
    "pvclock_tc_get_timecount",
    "pvclock_tc_vdso_timehands",
    "pvclock_tc_vdso_timehands32",
    "pvclock_gettime",
    "pvclock_init",
    "pvclock_destroy",
]

gdb.execute(f"target remote localhost:{GDB_PORT}")
gdb.execute(f"add-symbol-file {KERNEL_FILE}")

for func in PVCLOCK_FUNCTIONS:
    gdb.execute(f"break {func}")

# Stop at the KASSERT, force a nonzero TSC frequency so it passes, and resume.
gdb.execute("break local_apic.c:547")
gdb.execute("continue")
gdb.execute("set tsc_freq = 1")
gdb.execute("continue")

None of the pvclock* breakpoints are tripped, which at least tells us that the PV clock code isn’t exercised in this build. This holds whether or not the xentimer device is included in the config file, so that change from earlier has been reverted.

We proceed to the login prompt just as before. However, just after uart0 is initialized, the message "Statistical lapic calibration failed! Clocks might be ticking at variable rates." takes far longer to print than anything else during boot, leading Tom to believe that resolving this error is what the rest of this project will be spent on.

CPUID Vendor ID

Out of curiosity, let’s see the information we get from setting the cpuid input to 0x80000000.

import gdb

GDB_PORT = 1234
KERNEL_FILE = "/usr/obj/<path-to-freebsd-source>/amd64.amd64/sys/MICROVM/kernel"

gdb.execute(f"target remote localhost:{GDB_PORT}")
gdb.execute(f"add-symbol-file {KERNEL_FILE}")

gdb.execute("break tsc_freq_cpuid_vm")
gdb.execute("continue")
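# Step up to the cpuid instruction. The count of 3 depends on how this
# kernel was compiled; check with "x/5i $pc" if the build differs.
for _ in range(3):
    gdb.execute("nexti")
# Overwrite the leaf number in eax, then execute cpuid itself.
gdb.execute("set $eax = 0x80000000")
gdb.execute("nexti")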

print("EAX: " + hex(int(gdb.parse_and_eval("$eax"))))
ebx_bytes = int(gdb.parse_and_eval("$ebx")).to_bytes(4, "little")
ecx_bytes = int(gdb.parse_and_eval("$ecx")).to_bytes(4, "little")
edx_bytes = int(gdb.parse_and_eval("$edx")).to_bytes(4, "little")
print("Vendor ID String: "
      + ebx_bytes.decode()
      + edx_bytes.decode()
      + ecx_bytes.decode())

The last two lines of output are

EAX: 0x8000000a
Vendor ID String: AuthenticAMD

Unfortunately, this isn’t new information, as the vendor ID string is already printed during initialization. In fact, soon after Origin="AuthenticAMD" is printed as part of the CPU information, the kernel prints Hypervisor: Origin = "TCGTCGTCGTCG", which the OSDev wiki page linked earlier lists as the QEMU vendor ID string.
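
The decoding step can be tried without a debugger attached. The sketch below takes raw register values and joins them in the order EBX, EDX, ECX; the constants are simply the little-endian encoding of the string "AuthenticAMD", and `vendor_string` is a hypothetical helper name.

```python
def vendor_string(ebx: int, ecx: int, edx: int) -> str:
    """Decode a CPUID vendor ID: 12 ASCII bytes stored in EBX, EDX, ECX."""
    raw = b"".join(reg.to_bytes(4, "little") for reg in (ebx, edx, ecx))
    return raw.decode("ascii")


# "Auth" -> EBX, "enti" -> EDX, "cAMD" -> ECX
print(vendor_string(ebx=0x68747541, ecx=0x444D4163, edx=0x69746E65))
```

Note the register order: concatenating EBX, ECX, EDX in alphabetical order would scramble the string.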

x86 Clocks

According to Wikipedia, the TSC is a 64-bit register on x86 processors which keeps track of the number of CPU cycles since reset.
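
Since the TSC is only a tick counter, turning it into time requires knowing the tick rate, which is why tsc_freq matters. A minimal sketch, assuming a TSC that ticks at a constant rate; `tsc_elapsed_seconds` and the 3 GHz figure are illustrative, not taken from the kernel.

```python
def tsc_elapsed_seconds(start_ticks: int, end_ticks: int, tsc_freq_hz: int) -> float:
    """Convert the difference between two TSC readings into seconds."""
    if tsc_freq_hz == 0:
        raise ValueError("tsc_freq is 0: ticks cannot be converted to time")
    # The counter is 64 bits wide, so take the difference modulo 2**64.
    delta = (end_ticks - start_ticks) % (1 << 64)
    return delta / tsc_freq_hz


# 3 billion ticks on a hypothetical 3 GHz TSC is one second.
print(tsc_elapsed_seconds(0, 3_000_000_000, 3_000_000_000))
```

A frequency of 0 makes the conversion meaningless, which is what the KASSERT guards against.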

Even though there is no author and the publish date is March 1, 2014, the blog post at https://cyberdeed.wordpress.com/2014/03/01/virtualization-mode-hvm-pv-pvhvm-pvh/ is a decent summary of virtualization modes, and I have cross-referenced it with other sources to ensure accuracy. A paravirtualized machine knows it’s running in a virtual environment, and is provided hooks to the host system’s resources through specialized interfaces. This technique reduces the need to emulate every physical device the virtual guest requires, which can yield significant performance gains because emulation is costly. PV Clock is a timing mechanism based on this idea.
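
To give a feel for what a PV clock does, here is a rough sketch of how a guest could turn a TSC reading into nanoseconds using parameters published by the host. The parameter names follow my recollection of KVM's pvclock_vcpu_time_info structure and should be treated as assumptions; the version-check retry loop a real guest needs is omitted, and the numbers are made up.

```python
def pvclock_ns(tsc: int, tsc_timestamp: int, system_time: int,
               tsc_to_system_mul: int, tsc_shift: int) -> int:
    """Scale the ticks since the host's last update and add its base time."""
    delta = (tsc - tsc_timestamp) % (1 << 64)
    # tsc_shift is signed: positive shifts left, negative shifts right.
    if tsc_shift >= 0:
        delta <<= tsc_shift
    else:
        delta >>= -tsc_shift
    # tsc_to_system_mul is a 32.32 fixed-point multiplier.
    return system_time + ((delta * tsc_to_system_mul) >> 32)


# Hypothetical 2 GHz TSC: a multiplier of 2**31 means 0.5 ns per tick.
print(pvclock_ns(tsc=5000, tsc_timestamp=3000, system_time=10,
                 tsc_to_system_mul=1 << 31, tsc_shift=0))
```

The guest never needs to calibrate the TSC itself in this scheme, because the host supplies the scaling factors.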

QEMU provides a list of paravirtualized KVM features at https://www.qemu.org/docs/master/system/i386/kvm-pv.html. On the surface, it seems we can only take advantage of these on Linux, since KVM is a Linux kernel module, but I would need to dig deeper to know if this is actually the case.

Errata

I have changed the config sys/amd64/conf/QEMUMICROVM to sys/amd64/conf/MICROVM to align with NetBSD’s microvm implementation. Previous posts referencing this file have been updated accordingly.

This post is licensed under CC BY 4.0 by the author.