Rob Arnold’s Blog

witty tagline

Archive for the 'mozilla' Category

Measuring performance

TL;DR: Dates are terrible; use timestamps. Don’t expect good Date.now() resolution from Safari/Opera/IE on Windows.

Introduction

A common testing method in benchmarks, including our own Talos tests, is to measure and compare performance by recording how long it takes to run a given set of tests. In JavaScript, this is often done by using Date.now() to record the start time and the end time. The duration is computed by subtracting the start from the end. This seemingly simple method has hidden complexities and imposes undue requirements upon the Date.now() implementation.

Assumptions

Let’s take a step back to consider the assumptions this testing method makes upon Date.now(). For clarity and simplicity, I assume that time travel is impossible and relativity doesn’t come in to play. Thus if we were to have some function called continuously that represents the current time as a number, it would be continuous and increasing with a constant slope. (Ok, this is starting to fall into the realm of philosophy and physics). These are the two properties we want in our function that records the start and end times for our benchmark.

Dates

Dates do not have these properties. In the notation used by humans for dates, the representations are ambiguous and finding the duration between two dates can be complicated in that notation. Even with the notation used internally by the computer, issues still arise.

DST

In the United States on days where daylight savings changes, 2am occurs twice or never. This breaks the continuous and increasing properties. Switching to UTC avoids this, but having to convert from UTC into your local timezone is burdensome. We can have the computer automate this, sure, but local times are still ambiguous if that’s how we look at them. It’s hard to tell duration when you have to factor DST into account. This is furthermore complicated by the fact that the laws governing DST change frequently and vary from country to country. Converting from UTC thus depends on having a system that has up to date DST information and of course a correct implementation. So if you are looking at performance results on an out of date system, you may see different durations than on an up to date system. How many Linux/UNIX/Java/PHP users have up to date zoneinfo? Microsoft pushes out DST updates via Windows Update and Apple presumably does the same via their Software Update. Older systems like Windows 98, OS 10.2 and most Linux distributions from more than a few years ago no longer receive updates. So when doing performance tests (or checkins around the DST change), dates can be problematic because you cannot assume that there are 24 hours in a day. Thanks to leap seconds, you cannot even assume 60 seconds per minute. I’m not sure that any operating system supports them though any box using Unix time does not.

This matters why?

But computers don’t use human notation for the dates. Windows, OSX, and Linux all represent dates as an offset from a epoch so why bother concerning ourselves with the human representation? User interface matters. Start/stop dates for the tinderbox are important to get right to track regressions and connect them to checkins (though with hg, figuring out the changeset is much easier). Like DST, timezones are problematic for date representation, but so long as they are clearly marked and the user is aware of the timezone, it is not too problematic. Still, many apps are not aware of timezone changes (Thunderbird included sadly) so this small burden remains on the user to be aware of which timezone apps are using.

Still not enough

Supposing that the DST problems are overcome, there remains the problem of clock drift. Nearly every system routinely synchronizes time via NTP with some master time servers. This results in small and usually unnoticeable corrections to the system time which break our desired properties. One might argue that such offsets are rare and small and thus hardly noticeable, but they are on the order of milliseconds to 100s of milliseconds which means that they can come up and make a difference. Turning off NTP sync is undesirable since clock sync between computers is important for coordinating distributed tasks and having a reliable clock to look at to find out if you’re late to a meeting or not.

Resolution

Which brings up the next concern: time resolution. Put yourself in an operating system architect’s shoes for a moment. You’re designing an API and internal structures to keep track of the time. Let’s look at some use cases using Windows as an example.
1) The taskbar clock - this updates every minute or so
2) The analog clock in Vista - this updates once a second to draw the second hand
3) A fancy analog clock - this updates at a continuous 60 frames per second to draw a smoothly animating second hand
I cannot think of any other use cases for needing knowing the whole date at a rate faster than 60 fps (~16 ms). Coincidently (or maybe not), the Windows system clock updates roughly every 15.6ms on most NT based systems. And given the use cases discussed above it is a reasonable rate.

However, once we start using Date.now() for performance tests, 15.6ms is not fast enough. Some of the SunSpider benchmarks from Apple run in just a few ms. Either these benchmarks need to be made longer or the resolution of Date.now() needs to increase. There are drawbacks to both. Increasing the test length means that the Talos boxes take longer to run and so regressions take longer to find and we need more machines to keep up with the rate of checkins. Also, sitting around for hours waiting to find out if your changesets regressed performance is no fun. Looking at the second option, it’s not immediately clear how to increase the resolution on Windows’ GetSystemTimeAsFileTime API. This was my challenge in bug 363258. OSX and Linux have a 1ms resolution via gettimeofday() so this is a Windows-only issue. Just as Windows/OSX/UNIX store dates as an offset from an epoch, so can we. There are a few methods for measuring durations in Windows: GetTickCount, timeGetTime and QueryPerformanceCounter. There are drawbacks to all of them.

GetTickCount

GetTickCount returns the number of milliseconds since the NT kernel booted. A tick is the timer interrupt from the APIC on x86 systems (and because Mozilla doesn’t run on the other platforms supported by NT I won’t consider them). At each tick, the system time is updated by a certain amount as specified by the Get/SetSystemTimeAdjustment or “the system’s internal adjustment mechanisms.” This computed time since boot is stored into a shared readonly page mapped into each process so the API call is fast since it just reads from that page, but resolution is still limited by the kernel tick rate. Furthermore, the value returned a DWORD (32bit int) so every 49.7 days, the value will overflow. There is a GetTickCount64 which does not have that problem but it is Vista only.

timeGetTime

timeGetTime() suffers from the same problems as GetTickCount by default. By calling timeBeginPeriod(1), the kernel tick rate is increased to 1ms sometime after the call returns. In my brief testing with this function, I found it took effect immediately on Vista and one or two ticks on XP. This also increases the resolution of GetTickCount. Ignoring the overflow problem, why can’t we just call timeBeginPeriod and be done with it? Well, increasing the kernel tick rate increases the system load; more work is being done just to maintain a higher resolution clock. This would mean that leaving Firefox running while your laptop is on battery could decrease the battery’s performance by up to 25% (see this Intel article or this Microsoft presentation) There is a corresponding timeEndPeriod call that can restore the old tick rate, but that would require us to know when the last high-performance call to Date.now() is made.

QueryPerformanceCounter and RDTSC

QueryPerformanceCounter returns a 64 bit time stamp so it does not suffer from the overflow problem. This timestamp increases at the rate returned by QueryPerformanceFrequency. The API states that this rate does not change while the system is running. That’s not quite true. On older systems, QPC is implemented using the rdtsc instruction for x86. This returns the number of clock cycles since the processor booted. It is very high resolution and very accurate on those systems if they are uniprocessor. Newer processors such as Intel’s Centrino and Core processors dynamically change the clock speed of the processor in response to the workload and certain HALs (Hardware Abstraction Layer) for certain versions of Windows still use rdtsc to implement QPC. Needless to say, this makes it difficult to obtain accurate timings. Suppose we could force the processor into its highest speed; this can be done by the user fairly easily, especially if we’re running a CPU intensive benchmark. Note that I stated rdtsc returns the number of cycles since the processor booted. On multiprocessor systems (including multi core), the cores don’t boot at the same time. There is sometimes a noticeable difference between the two depending on your OS. Windows attempts to keep the CPU TSCs roughly in sync but the keyword there is roughly. Even in single threaded programs there is the issue of reading different time stamps when the thread switches cores. There is no way to perfectly tell which core you are on when the TSC is read because context switches can happen between any two instructions (in user space). On newer systems, there is a dedicated timer on the motherboard which QPC reads. This results in a system call and a read from the device so it is far more expensive than the TSC read but it is stable.

To add to the challenge, neither the TSC nor hardware timer are reliable after suspend/hibernate.

Summary

So to summarize, we have GetTickCount/timeGetTime which suffer from resolution and rollover issues and we have QPC which has superb resolution but lousy consistency in some cases (though it is very fast then). How can we use these to generate a reasonably reliable high performance (multithreaded) Date.now() implementation? Read the bug or the source or perhaps I’ll write another blog post describing the implementation. If you were hoping for a clever and perfect solution, there is none right now. My point is that high resolution date calculation is tricky and sometimes unreliable to calculate the date with high precision on Windows. So let’s revisit the duration calculation.

Duration revisited

duration = end - start where start and end are calculated using an offset from some epoch. Expanding the known implementations of gettimeofday() we get
duration = (epoch + offset_end) - (epoch + offset_start)
which reduces to
duration = offset_end - offset_start
where both offsets are measured in seconds. As you probably already knew, the current date doesn’t matter, it’s all relative. Calculating the full date to high precision just wastes time and effort. What we really care about is the offsets and it doesn’t matter what epoch they’re from so long as it’s consistent for the lifetime of the process.

Looking back on our choices in Windows, QPC is the only viable option since we don’t know when Date.now() needs to be high precision and when it doesn’t. It would be much nicer to have a separate API like Timestamp.sample(). To use timeGetTime or GetTickCount, we could extend this to include Timestamp.begin() and Timestamp.end(). This would leave us with fairly fast implementations for Date.now() and the Timestamp functions so that JS users that would like to do performance tests or animations could do so without sacrificing performance. This API is fairly trivial to implement for other systems since they can reuse their gettimeofday() functions for Timestamp.sample() or use higher performance timers (librt for example) the same as Windows does.

Moving forward

Of course, there should be some concerns with allowing a web page to alter the tick rate of the user’s computer so this should be discussed and more thoroughly thought out before committing to an implementation. And it should be standardized. Doesn’t seem right to have a Mozilla-only API (though for Talos, it wouldn’t be the end of the world).

Points of interest

And for those who are curious about other browser’s implementations, John Resig
had a nice article
comparing JS time resolution on Windows.
SpiderMonkey/TraceMonkey implementation
NSPR’s PR_Now()
NSPR’s interval API
Chrome’s implementation seems to be here
V8’s implementation (look for Time::SetToCurrentTime())
Note that Chrome and V8 don’t have the multithread-safe requirement that SpiderMonkey does.

As for Safari, IE and Opera, I don’t know but it looks like they just call GetSystemTimeAsFileTime based on John Resig’s performance results.

8 comments

Science of the Web

My friend Dan Schultz is currently taking a course called “Science of The Web”. For one of his assignments (see problem 2), he needs as much help as he can get. Here’s where you come in. It’s quick and easy:

1) Go to http://boom.aladdin.cs.cmu.edu/cgi-bin/ipaddy (the server might encounter an error, just refresh and it should work)
2) Enter ‘dschultz’
3) Get as many other people as possible to do the same

4 comments

hg qimport my-bugzilla-patch

Since I’m only a few days away from the end of my internship, I can’t really start any large projects so I went looking for little projects I can get done. This one originated from my dislike of pushing patches from bugzilla. My steps so far have been to go to the bug, find the attachment, download it, import it into my mozilla-central patch queue and then qpush, qrm and push. Well that’s a small hassle. So today I decided to write a small python script to help me out (I would have made it an hg extension but I can’t build hg on my system due to compiler issues with python’s extension API). So here’s my script.

Usage is pretty simple:
qimportbz.py 418454
This will fetch bugzilla’s xml output for bug 418454, look for patches that are not obsolete, and let you pick which ones to import. It conveniently displays the patch description and any review flags. If there’s only one, it’ll pick it automatically. It then takes the patch and feeds it into hg qimport, automatically generating a patch name from the bug and attachment name.

I plan to extend this further to automatically generate a commit message and user (preserving any already in the patch), display more flags (like approval) and automatically upgrade to newer versions of the patch posted in the bug.

1 comment

Initial Glass support

Yesterday, I pushed my patch to add glass support to chrome windows for Vista (see bug for some implementation discussion). For those who don’t use Vista on a physical machine (virtual machines don’t support glass): it is a fancy blurring effect rendered using the system’s graphics card. It is part of Vista’s Aero theme but requires some hardware support beyond Vista’s minimum requirements. The changes overall are mostly trivial, but they required lots of little edits over many files.

Example Glass Window

Example Glass Window

How to use it

Start by adding the CSS property -moz-appearance: -moz-win-glass; to your XUL window. For any areas that you want to be glass, be sure to make their backgrounds transparent. This includes the window itself. You can also set the opacity on elements to have them blend with the glass as transparent windows already do. Ok, you’re done.

Well, almost done. The glass effect is possible only when the user has desktop composition enabled, which requires a reasonably modern graphics card. Also, they can toggle it on or off at runtime as with native themes. Oh, and it only seems to work with the Windows Aero theme; Aero Basic users are left behind.

Rather than add some fallback code when glass is disabled, I left the issue to the theme designers. There’s a new system metric selector, windows-compositor, which detects if the glass effect is enabled. Now you can setup your CSS rules like this:

window:-moz-system-metric(windows-compositor) {
  background: transparent;
  -moz-appearance: -moz-win-glass;
}

to add glass to your window’s client area (and presumably other UI changes) when the user enables composition.

Here are some example rules for emulating the fallback used by Media Player and Explorer:

window[active="true"]:-moz-system-metric(windows-default-theme) {
  background-color: #b9d1ea;
}
window:not([active="true"]):-moz-system-metric(windows-default-theme) {
  background-color: #d7e4f2;
}
window:not(:-moz-system-metric(windows-default-theme)) {
  background-color: -moz-Dialog;
}
window:-moz-system-metric(windows-compositor) {
  background: transparent !important;
  -moz-appearance: -moz-win-glass;
}

Note: When bug 431666 lands, you’ll want to use windows-classic instead of windows-default-theme in your selectors.

Gotchas

The Desktop Window Manager (DWM) draws a border around the window’s client area, but our method of enabling glass disables that, so if you want to achieve the same look as Media Player or Explorer, you’ll have to do some fancy border work. I hope to fix this in the future so that it is automatically done in most cases.

Text on glass is hard. It’s sometimes hard to read which is why Windows provides the DrawThemeTextEx function which adds a glow behind the text; this is done by the DWM for the window title of unmaximized windows. DrawThemeTextEx takes characters, not glyphs so we can’t really integrate it into our text rendering code. CSS text-shadow can fake the glow, but it doesn’t work on the XUL widgets you’d want to use for your UI. So for now, don’t count on using text on glass.

Content (the xul browser element specifically) doesn’t render quite properly on glass. The underlying issue seems to be present in Firefox 3 (with transparent windows though) and will probably be fixed when the internal compositor for Gecko is completed. This unfortunately prevents Firefox from adopting IE7’s Vista UI.

Under the hood changes

Previously, windows were either transparent or not, so I added an enum nsTransparencyMode for the three options: opaque, transparent, and glass. Only windows supports the glass option; the other platforms fall back to opaque. A glass nsWindow calls DwmExtendFrameIntoClientArea to tell the DWM to render the entire window as glass. This has a performance impact for large windows since the entire window has the (already expensive) glass shader applied to it, even though we are probably going to be painting most of the window opaquely. I’m looking into ways to detect which areas of the window are glass and tell the DWM to only render those areas. This also solves the aforementioned border problem. We also have to render each window with an alpha channel, so there is a rendering performance hit.

Demos

I have two little demos to show off. The first (and its style sheet) was my testcase which shows opaque, semi transparent and transparent XUL on a glass window. The second uses an animating CSS transform on a green box with text on plain glass window. Since CSS transforms haven’t landed yet, you’ll need do a build yourself with the patch applied.

6 comments