Unity iOS touch input is broken – but they say it’s to be expected!

Table Tennis Touch is trapped on Unity 3.5.7. Unity is about to move to version 5 – that’s how far behind we are, two major versions! And with Apple requiring 64-bit support in the near future we have to catch up soon.

We’re trapped due to a rather severe problem with version 4’s iOS integration: touch input can become erratic. In TTT this manifests as a very jerky, stuttering movement of the bat. The problem renders our game unplayable. Here is a short video in slo-mo that demonstrates the problem.

Perhaps Unity isn’t to blame, perhaps it’s our game that has the problem. After all, thousands of devs build games with Unity and we seem to be the only ones reporting the issue. Well here’s another video in a fresh Unity project demonstrating how Unity fails to handle a rotating plane and touch input simultaneously.

We experimented a lot trying to find the cause of the problem. Being the big nerd that I am, this inevitably resulted in a lot of log files that then became charts demonstrating the problem.

 

Input_01_S

The moment of failure is clear: the once smooth waveform becomes horribly stepped.

 

The problem determination warrants a blog post all of its own; however, this post will focus on the more positive angle of why we think the problem occurs and how to overcome it. For those interested, here’s the PDF report we created that accompanied our bug report to Unity.

At the time of writing, we’ve had the bug filed with Unity for over two months. We had one early(ish) reply and then the thread went cold. Sadly, the reply wasn’t very helpful. I posed the question:

 

In Unity 3.5.7 we had nicely interleaving frame and touch updates resulting in a smooth, consistent movement of the bat; why in Unity 4.x do we often receive no touch updates on a frame whilst on other frames we receive two or more?

 

To summarise their response:

  • Unity is not to blame, our expectations of input corresponding with frame updates on iOS is unrealistic
  • instead we should estimate the player’s finger position
  • they don’t know why it worked perfectly in 3.5.7 and why it is so different in 4.x

 

We believe one reason Table Tennis Touch has been so successful is its simple and accurate controls. Why then should we start to estimate where the finger is!? If the finger is touching the screen when we’re processing the frame then I’d like to know exactly where it is! That doesn’t seem unreasonable.

If you’ve read this far then well done, but you’re only a tiny fraction of the way through this post – I’d just give up now unless you have a real interest in how Unity introduced a regression in v4 that will likely continue into v5. If you are interested then buckle up, it’s gonna get graphy.

<pre-empt>tl;dr</pre-empt>

The remainder of this post details experiments we performed to demonstrate why we believe iOS is capable of providing a consistent touch input sample to Unity; it then goes on to detail why we believe the Unity problem occurs, theorising the cause of the problem and finally it demonstrates a simple solution.

The test project is committed to GitHub so you can see the steps we took and play along yourself. The results from our test runs can be found in the repository. Also, here’s an Excel spreadsheet containing the graphs.

The experiments build upon each other getting more complicated as we progress. It does require knowledge of Unity and certain iOS technologies like CFRunLoop and CADisplayLink. When things get a little dark and you still have much scrolling left, always bear in mind:

 

Input was perfect on Unity v3.5.7 but not on v4.x. Something must have changed. We need to discover what changed.

 

These are the assertions we’ll attempt to prove:

  1. iOS samples the touch input at ~60Hz
  2. Unity 4.x can interfere with the consistency of the input sampling
  3. Unity 4.x does not always interfere with the consistency of the input sampling
  4. An empty Unity project can block the main thread for the duration of the frame
  5. Unity’s blocking behaviour did not exist in Unity 3.5.7
  6. If the CADisplayLink handler can’t execute when signalled it will not wait for the next 60Hz boundary to run; instead, it will run as soon as possible
  7. When the UPL execution is normal, the CFRunLoop sleeps for a significant time
  8. When the UPL execution duration increases, Unity is waiting on a signal to fire

The experiments are conducted on an iPhone 5 with iOS 8.0.2 (12A405) and Unity 4.6.1p2 (the newest available Unity).

 

Note: there is nothing special about the test device. It is my personal phone. It has many other apps installed. It is connected to the WIFI, receives calls and SMSs etc.

 

And off we go …


TEST 1

Assertion:

iOS samples the touch input at ~60Hz

Rationale:

If we can prove iOS is capable of sampling touch input at ~60Hz then we know that the problem does not lie with iOS but with something higher up the stack.

The test for this is simple. To keep things fair we use a Unity-generated Xcode project. The Unity project itself contains no code except a behaviour to set the frame rate to 60fps and some GUIText to indicate orientation. The accelerometer is disabled.

  • We modify UnityView.mm, adding a stopwatch to log the time between touchesMoved.
  • We remove the CADisplayLink from the CFRunLoop *
  • Running the game, we move a finger continually ** and capture the logs

* CADisplayLink is responsible for providing the Unity frame pulse. We remove it from the run loop to prevent Unity from rendering. This is very important: we are isolating iOS’s ability to sample touch input whilst asserting that Unity has a fault so we must initially remove Unity from the experiment to find iOS’s baseline ability.

** It is important to move continually as iOS does not report static touches. Leaving the finger stationary will produce misleading data suggesting input was skipped when in reality none was present. I find a fast circular motion is best as it avoids the natural halt produced by changing direction.

Results for Test 1

Running the test for ~15 seconds produced 905 log entries. Each entry describes the time since the last entry, the period between touches.
Test_01_S
We see entries massing around the 16.67ms mark, as we’d hope.

 

Minimum period Maximum period Mean average period Standard deviation
15.51ms 17.75ms 16.67ms 0.26ms

 

I have repeated this test many times and have received the same results. From the data we can conclude that iOS samples input with an average period of ~16.67ms (the period for a 60Hz frequency) with a small standard deviation.

 

Conclusion:
iOS provides touch updates at an average frequency of 60Hz.
Observation:
given that iOS samples the input at 60Hz, if Unity is able to sustain 60fps then we can definitely expect one touch update per frame (as we had with Unity 3.5.7). I acknowledge that there will be periods where Unity is unable to sustain 60fps or that iOS does not sample at 60Hz; however, for the most part there should be a 1:1 relationship.

 


 

TEST 2

Assertion:

Unity 4.x can interfere with the consistency of the input sampling.

Rationale:

We’ve seen that iOS can sample input at 60Hz but something is preventing it. We assert that Unity is at fault so we reintroduce Unity and note the different behaviour.

This is an extension of Test 1:

  • We add CADisplayLink to the CFRunLoop
  • Running the game, we move a finger continually * and capture the logs

* we try to move the finger at a similar speed and consistency as with the previous test.

Results for Test 2

Running the test for ~15 seconds produced 867 log entries. Note: 1 outlier was removed.

test_02_s

We see a rather worrying distribution of entries

 

Minimum period Maximum period Mean average period Standard deviation
0.71ms 52.85ms 17.91ms 17.33ms

 

Note: the Unity test project is empty; there is no geometry, no lights, no scripts running. There is one camera displaying one GUIText component. Unity is not under load in any way. Xcode reports a constant 60fps.

 

Clearly, these results are significantly more chaotic than those for Test 1 when Unity was disabled.

Unity can interfere with the consistency of the input sampling.

 

Conclusion:
The wording of this assertion was chosen carefully: “Unity can interfere”. This is not to say that Unity will always interfere. In fact, I ran this test several times before it produced the data I needed to validate the assertion. This is acceptable as I am asserting that it is possible for Unity to have a negative effect on touch input; there is no requirement for it to happen on each test run.

 

This prompts two follow on questions:

  • What results does an average run of Test 2 produce?
  • Why did Unity have such a negative impact on the input sampling?

 


 

TEST 3

Assertion:

Unity 4.x does not always interfere with the consistency of the input sampling.

Rationale:

We’ve seen that Unity 4.x can affect the input sampling but there are many cases where it has little effect. We will capture one such case for comparison with Test 2.

Results for Test 3

Running the test for ~15 seconds produced 907 log entries. Note: 3 outliers were removed.

test_03_s
We see a much healthier distribution of entries around 16.66ms

 

Minimum period Maximum period Mean average period Standard deviation
13.15ms 20.56ms 16.66ms 0.88ms

 

These results are much more respectable than those for Test 2. It’s clear that running Unity has had an effect on the consistency of the input sampling; however, that is part of the cost of executing more work on the device. We do see a noticeable deviation from the desired 60Hz, but it’s pleasing to see the mean average is 16.66ms.

Unity does not always have as noticeably negative an effect on the input sampling as it did for Test 2. Furthermore, this test demonstrates that a Unity project that can hold 60fps can expect one touch update from iOS per frame (on average).

 

Conclusion
Let’s take a moment to remember Unity said they “don’t expect to get a perfect alternation” between frame and touch updates. But if both updates are averaging 60Hz then interleaving is inevitable, albeit not always perfect due to device performance constraints.

 


 

Summarising Tests 1, 2 and 3

The chart below composites the three series for tests 1 2 and 3.

comp_123_s

 

The blue markers of Test 1 (Unity disabled) form a clear line of sanity around 16.67ms. The green markers of Test 3 (Unity behaving) show the deviation we incurred by enabling Unity and thus increasing the load on the device. The red markers of Test 2 (Unity misbehaving) show the three distribution regions for this awful scenario.

So what caused Test 2 to have such a negative impact on the input sampling? Many hours (days) of research has produced the following hypothesis:

 

Unity can become blocked, waiting on a signal. This in turn blocks the main thread, blocking the CFRunLoop, blocking touch inputs from being processed …

Let’s see how much of this we can prove …

 


 

TEST 4

Assertion:

An empty Unity project can block the main thread for the duration of the frame.

Rationale:

I’ve proposed that in some situations Unity blocks the main thread preventing input from being processed. So we first need to prove that Unity can indeed block the main thread. One way for Unity to block the thread legitimately is if our frame update scripts take longer to execute than the frame period. But we’re not interested in the legitimate scenario, we’re looking for the errant case so we’ll continue working with an empty Unity project to ensure we don’t overload it.

Following on from Test 3:

  • The CADisplayLink remains enabled.
  • The logging in UnityView is disabled. *
  • We add a stopwatch to time Unity’s Player Loop **
  • We add another stopwatch to time the period between frames ***
  • We log the results.

* we disable the UnityView logging as this test is not concerned with touch input but whether Unity blocks the main thread.
** we add this in UnityAppController+Rendering.mm to the repaint method.
*** we add this to UnityAppController+Rendering.mm to the repaintDisplayLink method – the handler for the CADisplayLink.

Results for Test 4

I ran the test for ~20s. This was the first run. The device was not touched and remained static, sat level on my desk.

The frame rate is a little inconsistent for such a simple project; however we accept that the iPhone 5 is not a dedicated gaming device. It’s reassuring to see the mean average being 16.67ms. The standard deviation is a little larger than I’d like, but is still acceptable. In short, the frame rate is as expected.

For the first ~800 frames the execution time of the UnityPlayerLoop (UPL) was also as expected: it averaged 3.12ms with a standard deviation of 0.85ms. Then around frame 797 something occurred on device that delayed the firing of the CADisplayLink resulting in a period of 36.21ms. This had a significant effect on the UPL.

From frame 797 to the end of the test (440 frames) the UPL’s execution time had a mean average of 15.02ms with a standard deviation of 1.08ms. This empty Unity project now consumed +/- the entire period between frames.

test_04_s

Conclusion:
Given that the UPL is called on the main thread, we can conclude that from frame 797 onwards the empty Unity project did indeed block the main thread for +/- the duration of every frame.
Observation:
This is very significant. The main thread’s CFRunLoop is responsible for dispatching input events to interested parties, including UnityView. If the main thread is blocked then this will have an impact on the rate at which input events can be processed.

 


This prompts several questions:

Q: In what circumstances does such behaviour occur?

We have observed this behaviour (the UPL increasing from ~3ms to ~15ms) on numerous occasions:

  • as in Test 4, leaving the device untouched can result in the behaviour. Sometimes it happens quickly (even on game startup), other times it could be several thousand frames before it occurs, but in tests it has always happened. (Several times it took over 10,000 frames.)
  • placing the device under external load, e.g. flipping the device. We work with the device in landscape orientation. Flipping it to the other landscape orientation will eventually cause the behaviour. Sometimes a single flip is sufficient, other times many flips are required, but in tests it has always happened. (Our worst case was 32 flips.)
  • placing the device under internal load, e.g. running a real Unity project. We have added logging to Table Tennis Touch and we see the behaviour occur after we’ve loaded large textures – an event that we know does not complete within a single frame.

Q: On what devices does such behaviour occur?

We have observed this behaviour on every iOS test device we have access to: iPhone 4s, 5, 5s, 6, 6 Plus, iPad 3, Air.

Q: Can the behaviour be stopped?

We have been able to stop the behaviour, i.e. return the UPL execution time to the expected ~3ms average, by flipping the device. Also, we have noticed that sometimes touching the screen can fix the problem. We will return to these points later.

And two more important questions:

  • Did this behaviour happen on Unity 3.5.7?
  • Why did the UPL execution increase from an average of 3.12ms to 15.02ms in the space of one frame?

TEST 5

Assertion:

Unity’s blocking behaviour did not exist in Unity 3.5.7

Rationale:

If we can show the behaviour did not occur in Unity 3.5.7 then our assertion that the problem was introduced in Unity 4.x is true.

For this test we will run the empty project in Unity 3.5.7 and log the same results as in Unity 4.6.1p2

  • We recreated the empty project in Unity 3.5.7
  • We added the same logging as for Test 4

Note: Unity 3.5.7 is prior to Unity’s support for iOS 8’s change in viewport size reporting, therefore Unity does not display as per 4.6.1. This is expected behaviour and does not affect the test.

Results for Test 5

I flipped the device several times in order to trigger the spike we witnessed in Test 4. On frame 767 we see an increase in the period to 20.47ms and on frame 892 to 25.45ms.

 

test_05_s

As per Test 4 we see corresponding increases in the UPL’s execution time; however, for Unity 3.5.7 we see the UPL’s execution time recover back to normal within ~75 frames.

test_05b_s

It is safe to say that this behaviour differs from that of Unity 4.6.1p2 and we can therefore conclude that it was introduced during Unity v4. (Our tests show that the behaviour was present in Unity 4.0.0.)

 

Conclusion:
this behaviour did not exist in Unity 3.5.7, which was able to recover promptly from any increase in UPL execution time.

 

We believe that the increase in the UPL’s execution time blocks the main thread and has an adverse effect on touch input events. We believe that Unity 3.5.7’s ability to recover from this spike in execution time is the reason our game’s input remains smooth on that version and is unplayable on current versions of Unity.


Earlier I posed the question: Why did the UPL execution increase from an average of 3.12ms to 15.02ms in the space of one frame?

In the test results we have seen a direct correlation between the frame period increasing and the UPL’s execution time increasing. This is demonstrated graphically in the chart below, a zoom of frames 770 – 830 from Test 4.

corr_01_s

The instant the frame period spikes the UPL follows, remaining excessively high.

Since we log in the CADisplayLink handler, when the frame period increases we can be certain that this is because said handler was not called. The most likely reason for this is that the main thread was busy or unable to get a slice of the CPU.The frame period spike observed in Test 4 occurred when the device was left to run without any physical interference from me, i.e. something on device caused the CADisplayLink handler to be delayed. This happens, we’re working with a phone after all. The real question here is why did the UPL duration continue to average 15.02ms for the remainder of the test whilst the frame period recovered immediately?

In addition, post spike (> frame 797), the chart also shows a noticeable correlation between frame period and UPL duration. It’s as though the UPL is filling its full time allowance between frames.

Given that the UPL should not be performing any extra work in this time (as we’ve given it nothing to do and Xcode did not report any significant increase in processor nor memory usage) then we may infer that the UPL is waiting on a signal to fire. The following is quite the leap; however, I have a hunch that Unity is waiting on a VSYNC.

Up until this point, this post has concentrated on empirical evidence to draw its conclusions. From here on I’ll be making (fairly) educated guesses as to what’s occurring within Unity and iOS. The things I say may be wrong, after all I have just used the word “hunch”!


Can I prove that the UPL is waiting on a VSYNC? I’m certain that the UPL is waiting on something; however, given my level of knowledge and the tools/source available to me, I can’t prove conclusively that it’s a VSYNC. Let’s see how far we can get:

We know that:

  • Unity relies on CADisplayLink to provide the pulse that drives its frame rate.
  • CADisplayLink delivers a signal in-sync with the hardware refresh rate.
  • Unity registers an instance of CADisplayLink with the main thread’s run loop.
  • When the main thread is blocked, the CADisplayLink handler will not be called when CADisplayLink signals.

The above is well known information or proven in earlier tests. The next test attempts to prove something that I (and perhaps Unity) thought false:

If the CADisplayLink handler can’t execute when signalled it will not wait for the next 60Hz boundary to run; instead, it will run as soon as possible.


TEST 6

Assertion:

If the CADisplayLink handler can’t execute when signalled it will not wait for the next 60Hz boundary to run; instead, it will run as soon as possible.

Rationale:

If we can prove that the CADisplayLink handler can fall out of sync with the hardware refresh then it becomes increasingly likely that Unity is indeed having to wait on a VSYNC.

I reran Test 4 to gather data and saved it in file Test_06.txt. On frame 51 we see the period was 16.67ms. In the frames leading to this point the period was ~16ms. Let’s assume that the CADisplayLink handler is running in-sync with the hardware – after all, that’s its job.

On frame 52 the time since the last frame was 20.82 ms. If frame 51 was in-sync then frame 52 can’t be as the 51->52 period is greater than 16.67ms. We could therefore conclude that the CADisplayLink handler ran out of sync with the hardware.

But what if frame 52 was in-sync with the hardware? That would mean that frame 51 was not in-sync because the 51->52 period was not 16.67ms. In one of the two cases, the CADisplayLink handler was not called on a 60Hz hardware refresh boundary.

There is further evidence to support the assertion if we consider how the CADisplayLink handler is called.

CADisplayLink is registered with the main thread’s CFRunLoop. If code is blocking the main thread it is also blocking the run loop.

As its name suggests, CFRunLoop loops around processing CFRunLoopSources, i.e. if a source is signalled its registered handler is run. When there is nothing to do the CFRunLoop sleeps.

The CFRunLoop can process two types of sources: input sources and timer sources. The QuartzCore Framework reference documentation states that CADisplayLink is a timer; however, I believe this to mean it has timer-like behaviour, not that it is a run loop timer source. In fact, I believe CADisplayLink to be registered with the run loop as an input source.

Furthermore, tests have shown the CADisplayLink handler to be run as a Version 1 input source (also known as a Source1). We can discover this by setting a breakpoint in the handler method and examining the stack trace. We see the following call stack:

 

-[UnityAppController(Rendering) repaintDisplayLink]
...
__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE1_PERFORM_FUNCTION__
__CFRunLoopDoSource1
__CFRunLoopRun
...

 

This is important because Version 1 CFRunLoopSources are signalled via MACH ports and are capable of waking the CFRunLoop when it’s idle; however, if the main thread is blocked, the run loop is also blocked.

If the run loop is blocked any MACH port signal from the CADisplayLink must wait for the run loop to continue its cycle and notice the pending signal. So whilst the underlying CADisplayLink technology should be in-sync with the hardware, if the main thread is blocked, the CADisplayLink handler will be run when the CFRunLoop next becomes free to call it. Consequently we see the UPL being executed out of sync with the display’s refresh rate.

 

Conclusion:
If the CADisplayLink handler can’t be executed when signalled it will be run when the CFRunLoop is next free – this causes the handler to be executed out of sync with the hardware refresh.

 


Given that Unity leans so heavily on the CADisplayLink handler to run its Player Loop it is highly likely that Unity has the potential to fall out of sync with the hardware refresh.

In fairness to Unity, they could be doing any number of things within their UPL to mitigate this situation; however, the evidence gathered thus far suggests that something is going wrong and we believe it is related to the CADisplayLink handler firing out of sync with the hardware.

If Unity is out of sync then we would expect it to have to wait whilst synchronicity is restored. At this point I will add the caveat that I have no background in OpenGL; my knowledge has been gathered quite recently through experimentation and reading. We are now quite deep into the speculative area of the post.

So can we prove that Unity is waiting on something? What does normal behaviour look like?


TEST 7

Assertion:

When the UPL execution is normal, the CFRunLoop sleeps for a significant time.

Rationale:

If we first examine normal behaviour then we’ll get a better understanding of how the CFRunLoop behaves, preparing us to understand the bad behaviour.

The assertion states “a significant time” – with reference to this empty test project running at 60fps. An actual game project will almost certainly find the sleep time to be much less.

We have seen that when Unity is behaving the UPL execution duration averages 3.02ms. To simplify the maths let’s round that to a nice 4ms. Let’s also round down the desired frame period to 16ms.

Continuing the simplification, assume that during a frame period only the UPL is running or the main thread’s CFRunLoop is sleeping. This means that 25% of the frame period should be spent executing the UPL and the remaining 75% should be spent with the run loop sleeping. Now:

  • We rerun Test 4 and ensure that the UPL duration is ~3ms (a good run)
  • Pause execution in Xcode

 

Note: the act of pausing then resuming execution can trigger the bad behaviour, causing the UPL duration to increase when execution is resumed. This makes sense since pausing has effectively blocked the main thread and potentially left a CADisplayLink signal pending processing. Equally, pausing a bad run can reverse the behaviour.

 

Xcode will show the stack trace of where the game was executing when paused. From our simplification above we should find that ~75% of the time the stack trace shows:

 

0  mach_msg_trap
1  mach_msg
2  __CFRunLoopServiceMachPort
3  __CFRunLoopRun
...
7  UIApplicationMain

 

This is the CFRunLoop in its waiting state. An additional, more conclusive but complicated test can be performed:

  • Add a CFRunLoopObserver to the main thread’s CFRunLoop
  • Log the run loop’s state changes, as reported to the observer

Fair warning, this requires an understanding of the CFRunLoop. Fortunately the Core Foundation is open source and so it’s available to us here.

The log for this test gets a bit crowded but for a good run we’ll see something like this:

 

...
 0  Frame: 16         Period: 15.92       UPL: 3.16
 1  5.35         2  Before Timers
 2  0.39         4  Before Sources
 3  0.39        32  Before Waiting
 4  10.62       64  After  Waiting
 5  Frame: 17         Period: 16.48       UPL: 2.81
 6  4.80         2  Before Timers
 7  0.40         4  Before Sources
 8  0.36        32  Before Waiting
 9  11.35       64  After  Waiting
10  Frame: 18         Period: 16.94       UPL: 2.92
... 

 

On line 0 we see Unity frame 16 with a UPL of 3.16ms. After its execution the run loop looks to process its timers, then any pending sources (touch inputs etc.), then on line 3 it goes to sleep since it has no work to do. This is a great situation! The main thread is not blocked; it can respond to any sources that can wake it (CADisplayLink etc.).

On line 4 we see the run loop wakes. It’s been sleeping for 10.62ms. From our simplified example above, this corresponds with the 75% sleep time of the frame period. I believe that when we paused execution earlier and saw mach_msg_trap in the stack trace, the run loop was in this wait state.

 

Conclusion:
I’m now fairly certain that when the UPL execution is normal, the CFRunLoop sleeps allowing other input sources to be processed.

 


 

TEST 8

Assertion:

When the UPL execution duration increases, Unity is waiting on a signal to fire.

Continuing the simplification from Test 7 where the UPL execution should be 4ms: if the assertion is true the remaining 75% will not be occupied by the CFRunLoop sleeping as it was in Test 7; rather, the 75% will be in some wait state.

  • We rerun Test 7 and ensure that the UPL duration is ~15ms (a bad run)
  • Pause execution in Xcode

Working from our simplified values and ignoring that the device could be doing any number of things: 75% of the time we would expect to see the stack trace show some kind of wait state. I ran this test 20 times and on 14 occasions the following stack trace was produced:

 

 0  __semwait_signal
 1  nanosleep
 2  usleep
 3  native_window_begin_iosurface(_EAGLNativeWindowObject*)
 4  gliGetNewIOSurfaceES
...
 7  sgxEndRender
...
10  DiscardCurrentFBImpl(bool, bool, unsigned int)
...
12  GfxDeviceGLES20::EndFrame()
...
14  UnityPlayerLoop
15  -[UnityAppController(Rendering) repaint]
16  -[UnityAppController(Rendering) repaintDisplayLink]
...
23  __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE1_PERFORM_FUNCTION__
24  __CFRunLoopDoSource1
25  __CFRunLoopRun
...
29  UIApplicationMain
...

 

Line 0 shows the main thread sat waiting for a signal. Lines 1 and 2 show this was caused by sleep functions. Jumping back up to line 14 we see this originated inside the UnityPlayerLoop! So Unity does appear to be sleeping, waiting for a signal. This will block the main thread until the sleep expires or the signal is fired.

The logs for this test confirm that the main thread’s CFRunLoop is not sleeping anywhere near as long as in Test 7 when Unity was behaving:

 

... 
 0  Frame: 460        Period: 17.10       UPL: 12.66
 1  14.62        2  Before Timers
 2  0.37         4  Before Sources
 3  0.36        32  Before Waiting
 4  0.38        64  After  Waiting
 5  Frame: 461        Period: 16.34       UPL: 13.29
 6  14.49        2  Before Timers
 7  0.42         4  Before Sources
 8  0.36        32  Before Waiting
 9  0.38        64  After  Waiting
10  Frame: 462        Period: 17.34       UPL: 15.03
...

 

On line 0 we see Unity frame 460 with a UPL of 12.66ms. After its execution the run loop looks to process its timers, then any pending sources (touch inputs etc.). On line 3 it goes to sleep but note that on line 4 it wakes after just 0.38ms. It is likely that the sleep was for an even shorter length since there is an overhead to process the code and log. In fact, we may infer that the run loop didn’t really sleep at all, it progressed to line 5 where the CADisplayLink handler runs and the UPL takes 13.29ms to execute. The cycle continues.

The logs for this test began with a run of good frames followed by these bad ones – triggered by flipping the device. Consequently we can plot the time the CFRunLoop spent sleeping for both good and bad periods:

test_08_s

 

Note: the chart shows a reduced x and y axis to frame the pertinent data in a size suitable for this post.

 

Conclusion:
When the UPL duration increases the main thread’s CFRunLoop sleep time decreases indicating that Unity is blocking the main thread. Breaking into Unity’s execution we see the UPL sat on a__semwait_signal. I therefore infer that Unity is indeed blocking the main thread waiting for a signal to fire.
Observation:
Working back up the stack trace from the __semwait_signal we see calls such as sgxEndRender and EndFrame(). It certainly sounds like the UPL is ending its rendering and preparing to write to a buffer. Is the UPL now blocked waiting for the VSYNC?

 

At this point we’ve uncovered a lot of detail. I’m happy we’ve shown with some certainty that Unity is waiting on a signal, adversely affecting other parts of the process – in our case touch input. Digging deeper may yield more clarity but having time constraints, this is where I’ll end the investigation.

 


 

So why does this waiting affect the touch input?

Touch input handlers are registered with the main thread’s CFRunLoop so if the main thread is blocked so is their processing. On my iPhone 5 there has been a 100% correlation between a high UPL duration and failing input; however, and here’s the kicker, this doesn’t appear to be the case on all our test devices.

As I said earlier, the UPL’s duration will increase on every test device we have access to (iPhone 4s, 5, 5s, 6, 6 Plus, iPad 3, Air) – whether through leaving the device untouched or through prompting the behaviour by flipping. But on the iPhone 6, for example, whilst we see the increased UPL duration we do not see the input sampling affected to anywhere near the level as for the iPhone 5.

My immediate thought was that the iPhone 6 is a much more powerful device than my lowly 5. Perhaps there is sufficient time in the remaining cycles for the 6’s run loop to process the incoming touch events. But then the iPhone 4s is a lot less powerful than the 5 and whilst it certainly exhibits the input problem, it’s not to the same extent as the 5 – or indeed the 5s.

To further complicate matters this behaviour exhibits properties of a Heisenbug: often adding debug code to test for its presence fixes it! Here are two examples:

  1. The CFRunLoopObserver we added in tests 7 and 8 has logging. With the observer registered with the run loop we still recorded the UPL increase but input remained smooth! If we kept the observer registered but stopped it from logging the input was broken once more.
  2. We added a UILabel to Table Tennis Touch to log the UPL duration whilst testing the game. Updating the label each frame appeared to fix the input problem. If we kept the label on screen but stopped writing to it the input broke again.

Both of these debug aids increases processing time on the main thread. It’s possible that they create sufficient breaks between the CADisplayLink handler firing for the touch input sources to be processed. This is purely speculation and more investigation is required.

Why has no one else spotted this problem?

Perhaps they have. By my own admission, I’m not as active on the Unity forums as I possibly should be; however, I have not seen any other mention of a similar problem and Unity certainly didn’t respond as though they thought it was a known bug.

In addition, Table Tennis Touch has quite specific touch requirements, mainly to allow the player to maintain a constant, accurate contact with the screen. If the bat is not directly under the finger then it soon becomes noticeable. That’s not to say that other games don’t require touch input, of course they do but even a wickedly fast game like Super Hexagon does not require constant contact – the player taps the screen. As we’ve seen from our experiments, even with an increased UPL duration, touch inputs will be reported to the game within a few milliseconds. This is often sufficient for many games, but for Table Tennis Touch the finger can travel tens of pixels in those few milliseconds. This can lead to an noticeably poor response time and, as the videos in the opening section showed, an unplayable game.

Finally, returning to a point I made earlier, sometimes actually touching the screen can fix the UPL! So in some situations the increased UPL duration can break input whilst in others the act of touching the screen can fix the UPL! This has certainly made the bug harder to track down.

So can this be fixed?

I really hope Unity recognises this behaviour as a regression from version 3.5.7. Unfortunately, we can’t wait for them to acknowledge the problem and fix it – as I said at the start of the post, they are reluctant to even comment on the bug report.

Instead we’re developing a work around. It’s not production ready yet but it works along the following lines:

  • time the execution of the UPL
  • if the execution exceeded 16ms skip the next frame

test_09_s

 

When a spike in the frame period provokes a spike in the UPL duration we skip a frame and the UPL recovers.

This simple logic has been successful in our initial tests. The theory goes: if Unity is indeed waiting on a VSYNC there’s no sense trying to run again immediately, we’ll just end up waiting – and the problem continues. Instead, if we detect that the UPL ran on too long just skip the next frame. The next time the CADisplayLink handler is called the main thread should be sleeping and the CFRunLoop should be able to wake and process the request immediately.

It may seem undesirable to discard a frame – it is – but it’s significantly better than the horrific situation we find ourselves in if we leave Unity to its own devices.

It’s probable that better solutions exist and we may yet find one. I certainly doubt whether the above logic will make it to production but it’s reassuring to know that we at least have a way to target Unity 5 when it’s released without crippling our game’s usability.

But didn’t Unity say it’s unrealistic to expect interleaving update messages?

Yep, they sure did. I think we’ve proven their statement false; however, if any doubt remains let’s see how a real project behaves when built with Unity 4.6.1p2. I’ve added logging to Table Tennis Touch, a non-trivial project to be certain.

We’re logging when the CADisplayLink handler fires, recording the frame period and the duration of the UPL. In a separate log call we’re recording the touch period. Remember, the first log will be written when the CADisplayLink is ready, the second when the input sampler tells us the player is touching the screen. The two events are not linked and should fire at different times. Ideally we’ll see interleaving – alternating log entries (since they both report at ~60Hz):

 

Frame: 2060        Period: 16.89       UPL: 7.19
> touch period: 16.88
Frame: 2061        Period: 16.15       UPL: 6.71
> touch period: 16.13
Frame: 2062        Period: 17.10       UPL: 6.76
> touch period: 17.23
Frame: 2063        Period: 16.39       UPL: 7.07
> touch period: 16.28
Frame: 2064        Period: 16.95       UPL: 7.01
> touch period: 17.27

 

Success! That’s exactly what we want (and what we were told we shouldn’t expect).

If you study the results in file Test_10.txt you’ll see that even when the UPL is not blocking (as above) there are a few occasions where the interleaving fails:

 

Frame: 2065        Period: 16.24       UPL: 6.83
> touch period: 15.91
Frame: 2066        Period: 16.48       UPL: 6.65
Frame: 2067        Period: 15.98       UPL: 5.98
Frame: 2068        Period: 19.48       UPL: 8.80
> touch period: 51.84
Frame: 2069        Period: 15.05       UPL: 6.84
> touch period: 15.14
Frame: 2070        Period: 16.36       UPL: 6.85
> touch period: 16.49

 

This is to be expected. As I’ve said throughout, we’re working with a phone, not a dedicated games machine. The key thing is whether the device recovers. The example above shows such a recovery. This is exactly the behaviour we want to see.

But of course we know that the UPL will eventually start blocking and the input will be negatively affected. In the example below we see frame after frame of skipped input – and that’s what cripples our game, especially if the player is moving their finger quickly (as they often do).

 

Frame: 3388        Period: 16.10       UPL: 13.63
> touch period: 15.86
> touch period: 1.06
Frame: 3389        Period: 16.03       UPL: 12.85
Frame: 3390        Period: 17.97       UPL: 16.48
Frame: 3391        Period: 17.96       UPL: 16.74
> touch period: 50.41
Frame: 3392        Period: 13.70       UPL: 12.26
Frame: 3393        Period: 17.78       UPL: 16.76
Frame: 3394        Period: 16.57       UPL: 15.51
> touch period: 48.43
Frame: 3395        Period: 15.84       UPL: 13.92
Frame: 3396        Period: 16.73       UPL: 15.62
Frame: 3397        Period: 17.37       UPL: 16.35
> touch period: 49.90

 

When we plot a chart for these logs we see ~600 frames of perfect input interleaving. We can also clearly see when the UPL starts blocking and the effect it has on touch input.

test_10_s

 

Note: This chart has a new series (Touch Period 2) for those frames where two touch input events are raised.

Note: In this experiment the UPL blocking had a less significant effect on input than it has had previously. The bat did stutter during play; however, it was nowhere near as bad as we have seen it in the past. In the interests of presenting fair data I have included these results, although I would have preferred a slightly more dramatic conclusion! Regardless, the UPL began blocking after frame 3151 and the once perfect interleaving stopped.

What’s next?

We’ll update our bug report with Unity and wait for their response. I’ll revisit this post if/when any new information arises. Please do let us know if you experience similar problems as the more weight we can get behind this bug the more likely Unity are to fix it.

James Gratton


CTO

James had a long career coding trading software for big banks. He quit it all and learned how to make games. He’s the the technical brain behind Yakuto.

Comments