RDTSC on multiple cpu's
 
Archive Notice: This thread is old and no longer active. It is here for reference purposes. This thread was created on an older version of the flipcode forums, before the site closed in 2005. Please keep that in mind as you view this thread, as many of the topics and opinions may be outdated.
 
bmh

January 12, 2005, 05:28 PM

I'm about to use RDTSC as a time measurement, to store when an object was last modified. I want to use it as a common time stamp that is shared by several different objects. But I realized that a multiple-CPU machine might give different RDTSC values on the different CPUs. Does anyone know if the values are somehow synchronized, or if one can be certain that they will remain in sync?

Thanks.

 
Chad Austin

January 12, 2005, 07:07 PM

I was curious about this too, and found this thread via Google: http://softwareforums.intel.com/ids/board/message?board.id=42&message.id=267&page=3

There's also http://www.dre.vanderbilt.edu/Doxygen/Current/html/ace/classACE__High__Res__Timer.html [search for RDTSC in that page]

How do people address this problem?

 
John Schultz

January 12, 2005, 10:45 PM

bmh wrote: I'm about to use RDTSC as a time measurement, to store when an object was last modified. I want to use it as a common time stamp that is shared by several different objects. But I realized that a multiple-CPU machine might give different RDTSC values on the different CPUs. Does anyone know if the values are somehow synchronized, or if one can be certain that they will remain in sync? Thanks.


My RDTSC-based code works OK on dual-processor hardware. It would appear they are either synchronized, or since the code is called from the same thread it's not an issue.

 
bmh

January 13, 2005, 08:39 AM

Thanks for the links.

I am specifically afraid of the multiple thread issue, so I guess RDTSC is no good then. I'm just using a plain old integer increment counter instead, which is just as good.
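
A minimal sketch of the plain counter approach, assuming C++11 atomics are available. The names `g_modificationClock`, `nextStamp` and `TrackedObject` are made up for illustration and are not from anyone's code in this thread. The counter has no wall-clock meaning; it only orders modifications, which is all a "last modified" stamp needs.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical process-wide logical clock shared by all threads and CPUs.
std::atomic<std::uint64_t> g_modificationClock{0};

// Returns a stamp larger than every stamp handed out before it.
inline std::uint64_t nextStamp() {
    // fetch_add returns the previous value, so add 1 to get the new stamp.
    return g_modificationClock.fetch_add(1) + 1;
}

struct TrackedObject {
    std::uint64_t lastModified = 0;  // 0 means "never modified"

    void touch() { lastModified = nextStamp(); }

    bool modifiedSince(std::uint64_t stamp) const {
        return lastModified > stamp;
    }
};
```

Because every stamp comes from the same atomic variable, it does not matter which CPU a thread happens to run on when it calls `touch()`.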

 
Goz

January 14, 2005, 06:38 AM

Could always use QueryPerformanceCounter if you are using Windows ... that's GUARANTEED to work :)

 
Rui Martins

January 14, 2005, 11:24 AM

Goz wrote: ... if you are using Windows ... that's GUARANTEED to work :)


So funny I almost fell off my chair 8)

 
janwas

January 21, 2005, 09:19 PM

I've thought about a solution: we could maintain per-processor counter values, and have each call to the timer report delta WRT the current CPU's (*) counter.
However, I don't have a multiprocessor system to test on, so I haven't tried this yet. RDTSC is simply not used if SMP is detected.

*: determine via APIC ID or GetCurrentProcessorNumber (Win Server 2003 only)

 
bmh

January 22, 2005, 12:22 PM

The supposed beauty of RDTSC is that it's system-wide, so there's no need for global variables. If you're going to try to synchronize the different CPUs with some sort of global timer, then you might as well just have the global timer keep its own 'time' value, which is incremented whenever somebody asks for the 'time'.

 
Axel

January 22, 2005, 04:11 PM

janwas wrote: *: determine via APIC ID or GetCurrentProcessorNumber (Win Server 2003 only)

That won't work, because the task switch can occur after reading the value ;)

 
John Schultz

January 22, 2005, 07:49 PM

From the Intel specs: the RDTSC counter is set to 0 when the processors are reset. It increments monotonically at the rate of the processor frequency (not affected by instructions or processor state). All processors are reset at the same time and all must run at the same frequency. How do the counters get out of sync? Please post a simple example showing a multiprocessor RDTSC problem (so far I have not seen an issue with RDTSC on multiprocessor systems).

If a problem does exist, you could start a special timer thread and lock to one processor (and process all of your timing requests through the thread using event signaling, etc.). See SetThreadAffinityMask(). If your code has two threads, for example, you could lock one thread to each processor, and use RDTSC values exclusively in the respective threads.

 
janwas

January 23, 2005, 10:37 AM

Axel wrote: That won't work, because the task switch can occur after reading the value

Determining current CPU, reading its TSC, and subtracting/updating the last value must be atomic, then all is well.
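
One possible reading of the per-processor idea, as a sketch only: keep a baseline per CPU so that every raw reading maps onto one shared timeline. `PerCpuTimeline` and its members are hypothetical names. The sketch takes the CPU index and raw counter value as arguments and assumes the caller sampled them together without being migrated in between; how to guarantee that is exactly the atomicity problem discussed here and is not solved by this code. It also assumes the counters tick at the same rate (a fixed offset, no drift) and that calls are serialized.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class PerCpuTimeline {
public:
    explicit PerCpuTimeline(std::size_t cpuCount)
        : base_(cpuCount, 0), known_(cpuCount, false) {}

    // Maps a raw counter value read on `cpu` onto the shared timeline.
    std::uint64_t now(std::size_t cpu, std::uint64_t raw) {
        assert(cpu < base_.size());
        if (!known_[cpu]) {
            // First reading on this CPU: choose its baseline so the shared
            // timeline continues from the last value reported.
            // Unsigned wraparound is intentional and harmless here.
            base_[cpu] = raw - last_;
            known_[cpu] = true;
        }
        last_ = raw - base_[cpu];
        return last_;
    }

private:
    std::vector<std::uint64_t> base_;  // raw value that maps to time 0, per CPU
    std::vector<bool> known_;          // whether a baseline exists for the CPU
    std::uint64_t last_ = 0;           // last value reported on the timeline
};
```

The first reading taken on a newly seen CPU reports no elapsed time, so the timeline slightly undercounts at that moment. If the counters really do drift relative to each other, a return to an earlier CPU can still produce a value lower than the previous one.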

John Schultz wrote: All processors are reset at the same time and all must run at the same frequency. How do the counters get out of sync?

If the processors are running on different clock signals, they may drift independently. The crystals used there have ballpark 200ppm error, IIRC; depends on your app how much time error you can tolerate.
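
To put the ballpark 200 ppm figure in perspective, here is the arithmetic as a tiny helper (`driftSeconds` is a made-up name, and 200 ppm is only the recalled figure above, not a measured one). One clock that is off by 200 ppm accumulates about 0.72 s of error per hour; two clocks erring in opposite directions could diverge by up to twice that.

```cpp
// Hypothetical helper: worst-case error of a single clock after
// elapsedSeconds, given its frequency tolerance in parts per million.
inline double driftSeconds(double elapsedSeconds, double ppm) {
    return elapsedSeconds * ppm * 1e-6;
}
```

For example, `driftSeconds(3600.0, 200.0)` gives roughly 0.72.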

John Schultz wrote: If a problem does exist, you could start a special timer thread and lock to one processor (and process all of your timing requests through the thread using event signaling, etc.). See SetThreadAffinityMask(). If your code has two threads, for example, you could lock one thread to each processor, and use RDTSC values exclusively in the respective threads.

Yes, but that's not worth the trouble IMO. RDTSC is doomed in the long run, as even desktop CPUs now support power throttling. I only use it when available and convenient, as a stopgap until the HPET is widely available.

 
John Schultz

January 24, 2005, 01:07 PM

janwas wrote: Determining current CPU, reading its TSC, and subtracting/updating the last value must be atomic, then all is well. If the processors are running on different clock signals, they may drift independently. The crystals used there have ballpark 200ppm error, IIRC; depends on your app how much time error you can tolerate. Yes, but that's not worth the trouble IMO. RDTSC is doomed in the long run, as even desktop CPUs now support power throttling. I only use it when available and convenient, as a stopgap until the HPET is widely available.


1. Can you post a link to a motherboard that uses different clock signals per processor? After reading the Intel specs, it does not appear likely that a motherboard would use two separate clock signals (using two PLLs, extra cost, etc.), as both (or N) processors appear locked together. That is, there does not appear to be a provision to allow the processors to run at different frequencies: they both appear to run on the same clock. I could also find no warnings regarding RDTSC and multiprocessor systems on the Intel website.

2. I use RDTSC only for benchmarking code. For game simulation time, I use QueryPerformanceCounter() with provisions to handle timer wrap. Regardless of certain vendor implementation bugs, QueryPerformanceCounter()+QueryPerformanceFrequency() is the most reliable way to get time WRT CPU throttling, etc. If an implementation is found to be buggy (or the high performance timer is not implemented), timeGetTime() can be used as a fallback (see also timeBeginPeriod()/timeEndPeriod() for increasing timer resolution with timeGetTime()).

3. Please post a short code example showing the RDTSC bug on multiprocessor systems (I'll test it) or a link (to Intel or AMD), showing that it's possible for the processors to run at different frequencies. If the processors could run at different frequencies, synchronization to the shared busses and RAM access would be more complicated (doesn't seem likely). To date, I have not found any information that shows that it is possible for the processors to get out of sync or any evidence that a problem exists with RDTSC on multiprocessor systems.

 
janwas

January 24, 2005, 09:55 PM

John Schultz wrote: After reading the Intel specs, it does not appear likely that a motherboard would use two separate clock signals (using two PLLs, extra cost, etc.), as both (or N) processors appear locked together. That is, there does not appear to be a provision to allow the processors to run at different frequencies: they both appear to run on the same clock. I could also find no warnings regarding RDTSC and multiprocessor systems on the Intel website.

It seems we are reading different specs :)
http://developer.intel.com/design/pentium/datashts/24201606.pdf

"B.8 Supporting Unequal Processors
Some MP operating systems that exist today do not support processors of
different types, speeds, or capabilities. However, as processor lifetimes
increase and new generations of processors arrive, the potential for
dissimilarity among processors increases. The MP specification addresses this
potential by providing an MP configuration table to help the operating system
configure itself. Operating system writers should factor in processor
variations, such as processor type, family, model, and features, to arrive at
a configuration that maximizes overall system performance. At a minimum, the
MP operating system should remain operational and should support the common
features of unequal processors."

John Schultz wrote: 2. I use RDTSC only for benchmarking code. For game simulation time, I use QueryPerformanceCounter() with provisions to handle timer wrap. Regardless of certain vendor implementation bugs, QueryPerformanceCounter()+QueryPerformanceFrequency() is the most reliable way to get time WRT CPU throttling, etc. If an implementation is found to be buggy (or the high performance timer is not implemented), timeGetTime() can be used as a fallback (see also timeBeginPeriod()/timeEndPeriod() for increasing timer resolution with timeGetTime()).

Agreed. But why not go with RDTSC if you determine it's safe (i.e. no SMP, no throttling)?

John Schultz wrote: To date, I have not found any information that shows that it is possible for the processors to get out of sync or any evidence that a problem exists with RDTSC on multiprocessor systems.

I can only pass on second-hand reports I have read to this effect. Here's one that came up as the very first google search result for: smp drift rdtsc:
http://www.ussg.iu.edu/hypermail/linux/kernel/9902.3/0161.html
http://www.ussg.iu.edu/hypermail/linux/kernel/9902.3/0168.html

 
John Schultz

January 25, 2005, 12:57 AM

I believe the tech doc I read was newer than the 1997 doc you have referenced. However, this sentence is key in stating that the processors must be synchronized with a common frequency (states the same concept in the tech doc I read):

At a minimum, the MP operating system should remain operational and should support the common features of unequal processors.


Thus, both the older and newer tech docs state the same thing: the processors must share a common clock (frequency).

The links you have provided state that the processors should stay in sync (386's), but that current implementations were buggy (on some motherboards). They reset the counters at startup as a work-around (after which everything stays synced).

From the links provided:

1. There is no drift, the processors are synchronized.
2. Early 386 motherboards did not properly reset the TSC at reset. A software fix was provided in the OS.

janwas wrote: Agreed. But why not go with RDTSC if you determine it's safe (i.e. no SMP, no throttling)?


Perhaps this is a good reason:

janwas wrote: Yes, but that's not worth the trouble IMO. RDTSC is doomed in the long run, as even desktop CPUs now support power throttling. I only use it when available and convenient, as a stopgap until the HPET is widely available.


It appears id used timeGetTime() and timeBeginPeriod(1) for Quake 3.

 
janwas

January 25, 2005, 07:30 AM

John Schultz wrote: However, this sentence is key in stating that the processors must be synchronized with a common frequency

Hehe, now we are bandying semantics :) I agree this paragraph is vague and ambiguous, but it references asymmetric hardware (explicitly listing 'speed' and 'type' differences) and then goes on to say the OS should cope. To me, that means the OS shouldn't be assuming the TSCs run in lockstep.

John Schultz wrote: Thus, both the older and newer tech docs state the same thing: the processors must share a common clock (frequency).

I can't agree with that, because I have read of 2 different CPUs actually running on different clocks!

John Schultz wrote: The links you have provided state that the processors should stay in sync (386's), but that current implementations were buggy (on some motherboards). They reset the counters at startup as a work-around (after which everything stays synced). 2. Early 386 motherboards did not properly reset the TSC at reset. A software fix was provided in the OS.

I believe you misread what was said. In the Linux world, "i386" means IA-32; 386 CPUs didn't even have a TSC. This is still relevant today.
Now as to the non-simultaneous-init issue, I wonder if Windows also provides a fix for it. (don't have time to dig through the manual ATM, but I assume the problem is TSC getting initialized (only|again) upon SIPI - when the 'boss' CPU says everyone else can start up)


Backing up a step here, we both have a different reading of the dox. However, it's pretty much moot in the presence of documented flaws/bugs. Why assume lockstep, instead of going the safe route?

If you get a bad reading and your delta-time ends up negative, things may blow up, depending on the app.
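
As a sketch of "going the safe route", a small filter can keep a backward-stepping counter from ever producing a negative delta. `MonotonicDelta` is a hypothetical name; the class is not thread-safe and does nothing about the underlying offset, it only stops the bad reading from propagating into the simulation.

```cpp
#include <cstdint>

// Hypothetical helper: turns raw counter readings that may occasionally step
// backwards into non-negative deltas.
class MonotonicDelta {
public:
    explicit MonotonicDelta(std::uint64_t first) : last_(first) {}

    // Returns ticks elapsed since the highest reading seen so far.
    // A reading at or below that high-water mark yields 0 and is ignored.
    std::uint64_t update(std::uint64_t now) {
        if (now <= last_) {
            return 0;
        }
        const std::uint64_t delta = now - last_;
        last_ = now;
        return delta;
    }

private:
    std::uint64_t last_;  // highest raw reading accepted so far
};
```

The trade-off is that time appears to stand still until the lagging counter catches up with the high-water mark, so a large offset shows up as a stall rather than a crash.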


John Schultz wrote: Perhaps this is a good reason: It appears id used timeGetTime() and timeBeginPeriod(1) for Quake 3.

While id has written good software, they surely are not invincible - for instance, Doom3 has a seriously flawed RNG. timeGetTime has been reported to slowly fall behind, due to lost or delayed hardware timer interrupts. (see http://www.gamedev.net/reference/articles/article2086.asp)

This time, I managed to track down what I had in mind:

"In retrospect, timeGetTime() slippage is almost certainly what is killing many networked games of StarTopia - they start out fine but after half an hour or so of play they get too far out of sync and die. The system will cope with a certain amount of lagginess, but gives up at about five seconds, assuming that your network connection is stuffed. And of course all our testing in the office was on mainly one or two very similar models of Dell, so we never saw this problem, even after many hours. Curse those hardware designers!"
-- Tom Forsyth, Muckyfoot - RE: [GD-Windows] More timer fun! (3/9/2002)

 
John Schultz

January 25, 2005, 09:58 AM

janwas wrote: I can't agree with that, because I have read of 2 different CPUs actually running on different clocks!


Can you post a link to the hardware?

What motherboard(s) do not properly initialize the TSC at reset?

What motherboard(s) run multi-processors at different frequencies?

With respect to timer synchronization: it appears StarTopia did not provide a clock synchronization method while attempting to run lock-step. That is a software design problem, as all clocks will drift relative to one another (even atomic clocks). The only way you can have zero drift is to have zero error.
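
For illustration, one common way to estimate a client's offset from a master clock is a round-trip exchange in the style of Cristian's algorithm. This is a generic sketch, not what StarTopia or any game mentioned here does; `SyncSample` and `estimateOffset` are made-up names, times are in milliseconds, and the estimate assumes the network delay is the same in both directions.

```cpp
#include <cstdint>

// One request/reply exchange with the master (all values in milliseconds).
struct SyncSample {
    std::int64_t clientSend;     // client clock when the request left
    std::int64_t serverTime;     // master clock value carried in the reply
    std::int64_t clientReceive;  // client clock when the reply arrived
};

// Estimated (master - client) clock offset. Assumes the reply spent half of
// the measured round trip in flight.
inline std::int64_t estimateOffset(const SyncSample& s) {
    const std::int64_t roundTrip = s.clientReceive - s.clientSend;
    const std::int64_t masterNow = s.serverTime + roundTrip / 2;
    return masterNow - s.clientReceive;
}
```

Repeating the exchange periodically and preferring samples with the shortest round trip keeps the estimate from being skewed by occasional slow packets, and re-running it during play corrects for drift as it accumulates.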

 
janwas

January 25, 2005, 12:00 PM

John Schultz wrote: Can you post a link to the hardware?

Nope, sorry. Don't have the time ATM to search around; exactly which ones have bugs isn't of great interest to me, either.

John Schultz wrote: With respect to timer synchronization: it appears StarTopia did not provide a clock synchronization method while attempting to run lock-step. That is a software design problem, as all clocks will drift relative to one another (even atomic clocks). The only way you can have zero drift is to have zero error.

Agreed. However, I was only trying to point out that TGT isn't safe (without further work), not that it would have been better for StarTopia to sync the multiplayer clocks. Incidentally, I wonder how many games bother doing this?

 
John Schultz

January 30, 2005, 02:53 PM

From: http://www.intel.com/design/pentiumii/specupdt/24333749.pdf

Mixed Steppings in DP Systems Intel Corporation fully supports mixed steppings of Pentium II processors. The following list and processor matrix describes the requirements to support mixed steppings: While Intel has done nothing to specifically prevent processors operating at differing frequencies from functioning within a dual processor system, there may be uncharacterized errata that exist in such configurations. Intel does not support such configurations. In mixed stepping systems, all processors must operate at identical frequencies (i.e., the highest frequency rating commonly supported by all processors).


This may explain the difficulty in finding a single MP motherboard that could exhibit RDTSC drift. A fixed TSC offset error may have been a bug in some motherboards/drivers circa 1999, but can be fixed via an OS patch. A properly implemented (or patched) MB+OS should provide synchronized RDTSC operation on MP systems.

Given that RDTSC is based on the CPU clock frequency, it should only be used for benchmarking and testing (since modern processors can change speed during operation). However, while a game is running at full speed, CPU throttling is unlikely unless required by a thermal or battery/power issue (or the user forces a change during play via hotkey, etc.).
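
A small worked example of why a speed change breaks RDTSC-based timing, assuming (as described above) that the counter follows the core clock and that the tick-to-seconds factor was measured once at startup. `ticksToSeconds` is a hypothetical helper.

```cpp
// Hypothetical helper: converts a TSC tick delta to seconds using a
// frequency that was calibrated once, e.g. at program start.
inline double ticksToSeconds(unsigned long long ticks, double assumedHz) {
    return static_cast<double>(ticks) / assumedHz;
}
```

If the frequency was calibrated at 3 GHz and the CPU then throttles to 1.5 GHz for two real seconds, the counter advances by 3,000,000,000 ticks, and `ticksToSeconds(3000000000ULL, 3.0e9)` reports 1.0 second: half the time that actually passed.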

The OS time functions are typically implemented through a hardware timer separate from the CPU, such as the 8253 timer ( http://courses.ece.uiuc.edu/ece390/books/labmanual/io-devices-timer.html ). Errors can occur with improper implementation of handling interrupts in the OS/drivers. For example, a few years ago I found the timer wrapping on VIA Apollo-based motherboards. The following fix solved the problem in my timer code:

  // Returns accumulated time in seconds. hiPerf, tics, lastTics, totalTics,
  // timeScale, fiTicsPerSec, getFLT() and initTicsPerSec() are defined
  // elsewhere in the timer code.
  FLT seconds(void) {
    if (!hiPerf) initTicsPerSec();
    LARGE_INTEGER tt;
    QueryPerformanceCounter(&tt);
    tics = tt;
  #ifdef TEST_TIMER_WRAP
    tics.QuadPart &= 0xffffffff; // Simulate a counter that wraps at 32 bits
  #endif
    LONGLONG deltaTics = tics.QuadPart - lastTics.QuadPart;
    if (deltaTics >= 0) {
      totalTics.QuadPart += deltaTics;
    } else { // Counter must have wrapped: broken Win32 timer code not
             // updating upper 48 bit system tics?
      if (lastTics.QuadPart <= 0xffff) { // Assuming 16 bit countdown timer (1.19 MHz 8253 timer)
        totalTics.QuadPart += 0xffff - lastTics.QuadPart + tics.QuadPart + 1;
      } else if (lastTics.QuadPart <= 0xffffffff) { // 32 bit wraparound
        totalTics.QuadPart += 0xffffffff - lastTics.QuadPart + tics.QuadPart + 1;
      } else { // Must be 64 bit wraparound! (generalization, should not happen).
        totalTics.QuadPart += 0xffffffffffffffff - lastTics.QuadPart + tics.QuadPart + 1;
      } // if
    } // if
    lastTics.QuadPart = tics.QuadPart;
    return getFLT(totalTics)*timeScale*fiTicsPerSec;
  } // seconds


This type of problem may explain why id chose to use timeGetTime() instead of QueryPerformanceCounter(). timeGetTime() provides plenty of resolution for game timing. Online games that have timing issues are not properly designed (the networked game system must use a master clock and implement clock synchronization).

A game that uses multithreading can use WaitForSingleObject() with an appropriate timeout value to provide sufficient timing quality while running the game simulation at a fixed rate. This method can be implemented without using any of the OS timer functions (though all timer systems are likely based on the same timer hardware). However, if the target hardware cannot keep up, the game will slow down. This may not be a problem, as some games play better with smooth slowdown of this nature as opposed to higher-speed choppiness.

 
prokopowicz

February 11, 2005, 09:54 AM

---------------
This may explain the difficulty in finding a single MP motherboard that could exhibit RDTSC drift. A fixed TSC offset error may have been a bug in some motherboards/drivers circa 1999, but can be fixed via an OS patch. A properly implemented (or patched) MB+OS should provide synchronized RDTSC operation on MP systems.

----------------

The problem I'm seeing is probably not drift but TSC offsets in MP systems from Dell and IBM that are circa 2005 - some of them pre-production demo systems and all very up to date. The OS is RedHat Linux AS 3.0. I have a simple test program that clearly shows deltas in the RDTSC values retrieved by different threads. Actually, it shows that the RDTSC values can go backwards in time depending on which CPU is read.

What I'm wondering is: has Windows solved the problem by synchronizing the CPUs in a way that Linux has not? Does anyone have a pointer to documentation on this?

thanks,
Pete

 
John Schultz

February 11, 2005, 11:50 AM

Hi Pete,

The only link I have seen was the reference from 1999 (for Linux, where it appears the problem was solved with a kernel patch). Can you dual-boot into Win32? Try another version of Linux? Have you contacted RH support?

 
This thread contains 20 messages.
 
 