New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 676094 link

Starred by 4 users

Issue metadata

Status: Verified
Owner:
Last visit > 30 days ago
Closed: Feb 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

Kitty touchscreen's in "Recovery" mode after kernel patch -- don't work

Project Member Reported by kathrelk...@chromium.org, Dec 20 2016

Issue description

Starting in R57-9092.0.0, all nyan_kitty machines are reporting hw_id of 0000.  Error has happened across four different (previously working) machines.  Error has happened every M57 test run since that build.

BEFORE:
2016-12-16T10:11:57.686258+00:00 NOTICE chromeos-elan-touch-firmware-update[974]: Board detected as 'nyan_kitty'
2016-12-16T10:11:57.702739+00:00 NOTICE chromeos-elan-touch-firmware-update[986]: Touch controller reported Product ID: 0999
2016-12-16T10:11:57.714673+00:00 NOTICE chromeos-elan-touch-firmware-update[1000]: Attempting to load FW: '/lib/firmware/elants_i2c.bin'
2016-12-16T10:11:57.717882+00:00 NOTICE chromeos-elan-touch-firmware-update[1003]: (which points to '/opt/google/touch/firmware/0999_1215.bin')
2016-12-16T10:11:57.722804+00:00 NOTICE chromeos-elan-touch-firmware-update[1007]: Hardware product id : 0999
2016-12-16T10:11:57.726186+00:00 NOTICE chromeos-elan-touch-firmware-update[1010]: Updater product id  : 0999
2016-12-16T10:11:57.733893+00:00 NOTICE chromeos-elan-touch-firmware-update[1016]: Current Firmware: 1215
2016-12-16T10:11:57.738420+00:00 NOTICE chromeos-elan-touch-firmware-update[1019]: Updater Firmware: 1215
2016-12-16T10:11:57.746600+00:00 NOTICE chromeos-elan-touch-firmware-update[1025]: Firmware up to date.


AFTER:
2016-12-16T15:53:10.433582+00:00 NOTICE chromeos-elan-touch-firmware-update[967]: Board detected as 'nyan_kitty'
2016-12-16T15:53:10.450393+00:00 NOTICE chromeos-elan-touch-firmware-update[979]: Touch controller reported Product ID: 0000
2016-12-16T15:53:10.464601+00:00 NOTICE chromeos-elan-touch-firmware-update[993]: Attempting to load FW: '/lib/firmware/elants_i2c.bin'
2016-12-16T15:53:10.468770+00:00 NOTICE chromeos-elan-touch-firmware-update[997]: (which points to '/opt/google/touch/firmware/0999_1215.bin')
2016-12-16T15:53:10.473133+00:00 NOTICE chromeos-elan-touch-firmware-update[1000]: Hardware product id : 0000
2016-12-16T15:53:10.476246+00:00 NOTICE chromeos-elan-touch-firmware-update[1004]: Updater product id  : 0999
2016-12-16T15:53:10.484905+00:00 NOTICE chromeos-elan-touch-firmware-update[1015]: error: Touch firmware updater: Product ID mismatch!


Example failure:
https://wmatrix.googleplex.com/testrun/unfiltered?test_ids=398512432

 
Labels: OS-Chrome
This sounds like the touchscreen's firmware is getting corrupted.  Do you know if the touchscreen's are working?  I would expect they don't...

Since Kitty only has one FW it would be reasonable to make the updater force the update in this situation, but I don't know why they would all get corrupted like this.

I don't suppose you have a log that shows the corruption happening?  (eg the first boot on this 9092.0.0 version that broke everything?)
Looking at what landed between 9091 and 9092 I can't see anything that stands out

https://crosland.corp.google.com/log/9091.0.0..9092.0.0
I'll need a kitty to test on, do we have any?
Labels: ReleaseBlock-Dev Proj-Kitty
We've confirmed that the touchscreen doesn't work in this state.  Blocking dev releases for kitty only (as there are no other machines showing this failure).

Switching back to M56 resumes normal update behavior.  Attached is /var/log/messages for test runs right before running 9092.0.0 (90958291, PASS), right after (90995204, FAIL), and a later M56 test run on the same machine (91125008, PASS).
test_run_before_failure_messages
2.2 MB View Download
test_run_of_failure_messages
117 KB View Download
Uploading the correct file for a switch to M56 build from #5
test_run_of_switch_to_M56_messages
116 KB View Download
In a bizarre turn of events, this doesn't seem to be caused by a bad fw and/or bad fw update.

When I switch my Kitty between canary and stable, the touchscreen works perfectly on stable (55) but fails in the described way on canary (57).  The interesting thing is that there doesn't seem to be an actual firmware update occuring anywhere to fix/break it.  It seems that there is something else that changed between those two versions that is preventing the Chromebook from correctly querying the touchscreen's product ID.

I'm unsure what that would be, but I'm going to try and figure it out
Cc: dbehr@chromium.org
Found some more unexpected symptoms, and made some narrowing down progress hopefully.

Testing on 9091 --> everything works fine
Testing on 9092 -->
  1. fw_version, hw_version, bc_version, etc.. all just report 0's
  2. iap_mode reports "Recovery" (When it's working it is usually "Normal")
  3. Chrome doesn't use touchscreen input to interact with UI
  4. Miraculously, evtest does, however, show the screen reporting
     touches, and they seem to be reasonable/correct.

So, it's sure looking like there's some patch between  9091 and 9092 that is causing the touchscreen to go into "Recovery" mode instead or "Normal."

The link below should show which patches were added between them (Google internal link only, unfortunately):
https://crosland.corp.google.com/log/9091.0.0..9092.0.0

To further narrow down the culprit, I've taken a working 9091 image and deployed a ToT kernel onto it.  The touchscreen immediately stopped working just as before, so I'm pretty confident that one of the four 3.10 kernel patches between 9091 and 9092 must be the cause:

1. CHROMIUM: drm/i915: use min brightness instead of 0 as trigger
   https://chromium-review.googlesource.com/#/c/419771/
   --> This seems less likely to be the cause, but I added dbehr@ just
   in case there's something in there I'm not seeing.
2. CHROMIUM: drm/tegra: Align dc/dpaux/sor/panel enable/disable
   https://chromium-review.googlesource.com/#/c/416704/
3. CHROMIUM: drm/tegra: dpaux: Reset dpaux clock correctly
   https://chromium-review.googlesource.com/#/c/416703/
4. CHROMIUM: drm/panel: Add panel powerwash
   https://chromium-review.googlesource.com/#/c/416702/

Note: Kitty is a Tegra system, so the fact that these kernel patching are for Tegra seems to make sense!

I'll see if I can narrow it down more.
Cc: -dbehr@chromium.org marc...@chromium.org ma...@gpartner-nvidia.com kathrelk...@chromium.org yungleem@chromium.org charliemooney@chromium.org
Labels: Kernel-3.10
Owner: marc...@chromium.org
[-dbehr@ since his CL wasn't the cause, sorry for the spam]

Okay, the CL that broke the touchscreen seems to be this one:

CHROMIUM: drm/panel: Add panel powerwash
https://chromium-review.googlesource.com/#/c/416702/

When I revert that and build a new kernel the touchscreen works fully again.

I'm afraid I don't really know what that patch is or why it's causing issues, so I'm going to assign this bug over to the people who originally worked on that CL.  Can you take a look at this patch and try to figure out why it's switching the Kitty touchscreen into a "Recovery" mode and keeping it from working anymore?

Thanks!


Components: -Internals>Input>Touch>Pad Internals>Input>Touch>Screen
Summary: Kitty touchscreen's in "Recovery" mode after kernel patch -- don't work (was: Kitty touchscreen: hwid set to 0000 on all machines)
From the offline email from Mark Z (who wrote the patch that seems to be causing issues):


I created that kernel patch several days ago and the patch fixes another display issue observed on nyan-big.

Anyway, I wrote this mail just want to ask you, do you know somebody who knows about Kitty's touch controller/firmware stuffs? The touch screen stops working is because the touch firmware can't get the panel's product ID. Given I've totally no idea about touch, could you find someone to help on this issue? My patch resets the panel power when kernel boots, so I want to know why this makes touch firmware stops working.

Thanks,
Mark
Cc: jeff.chu...@emc.com.tw
Mark, the touchscreen is not failing *because* it can't get a product ID.  Failing to get the product ID is a symptom of the problem.  Whatever your patch changed is somehow forcing the touchscreen into "Recovery" mode, and apparently in that mode the firmware is incapable of reporting the product ID.

I didn't design or do any of the wiring for the touchscreen, nor do I know much about the low-level kind of stuff you were modifying with that patch so I'm not sure what the actual cause could be really.  All I know is that I was able to narrow the CL that caused it down that that patch -- when I revert it everything starts working just fine again.

It looks like maybe your patch has to do with controlling power lines, right?  Perhaps it causes some gpio to stay high/low which confuses the touchscreen or something similar?

I'm adding Jeff from Elan (the touchscreen company) to see if he can shed some light on what would put the touchscreen in "Recovery" mode for you.
Cc: eji...@nvidia.com pc...@nvidia.com j...@nvidia.com
Adding the people from the email to keep everyone in the loop
Hi Mark,

         Do you still support the power to touch controller in this mode? I will discuss with our internal team and get back to you.

Thanks,
ELAN Jeff
Thanks Charlie & Jeff. I think you guys can take a look my patch -- it just has few lines. Basically what this patch does is, disable the panel then enable it again. So here disable/enable means disable/enable regulators, manipulate some gpios... and etc.

And the reason that we need this patch is, on ChromeOS, normally the panel and the display have been enabled by coreboot, so when kernel starts running, the hardware state doesn't align with the kernel driver's. E.g, the panel regulator in driver is disabled while actually the panel power is on. So this brings some issues(refer to https://code.google.com/p/chrome-os-partner/issues/detail?id=55903 for details).

Anyway, come back to this touchscreen issue, I guess it can be fixed by few lines which correct the touch firmware state. Just like Charlie mentioned, maybe we need to change some gpios or something. Jeff, please suggest, thanks in advance.
Jeff, we need Elan to tell us what kinds of things might cause these symptoms on your controller or Mark isn't going to be able to debug this.  What could cause iap_mode to be "Recovery"?
Jeff@elan - can you please follow up?  Thanks.
I just spoke with Jeff via email to make sure Elan is aware of this issue.  He says they don't have a Kitty to test, but will look at it tomorrow (or "today" depending on which timezone you're in) and should have some info for us shortly
Hi Charlie,
    Touch driver will read hw_version and fw_version on elants_i2c_initialize() when probe and fw_update process. If we cannot get information, the iap_mode will set to be "Recovery".

    As #9 comment, touch could get correct hw_version and fw_version after reverting panel powerwash CL.(https://chromium-review.googlesource.com/#/c/416702/)
Due to the fw boot code of Kitty didn't support hw_version command.So we think fw works fine on IC and didn't re-flash fw after reverting panel powerwash CL. Hence, it maybe touch driver reading hw_version at the duration of panel control regulator.
Okay, so just to clarify/confirm:

1. The iap_mode is set to "Recovery" as a result of the elants_i2c driver being unable to query the hw_version/fw_version from the touchscreen (not the other way around)
2. The boot code on the touchscreen doesn't support hw_version or fw_version
3. Your theory is that something in that patch cases the touchscreen to get rebooted/put into boot code, which causes the issues we're seeing.

Did I understand that right?

That sounds plausible to me, but I don't know how to go about fixing it.  Does that give you any ideas of how to fix this, Mark?  Johnny, do you have any ideas for a solution?  Please add anyone you think might be helpful

Additionally, could someone who works on graphics chime in here?  Your fix is causing serious issues for touch and we could probably use some graphics/display expertise here.

We need to fix this asap because Mark's patch fully disabled the touchscreen on all Kittys, so please let me know if you have any ideas.  I don't know how the touchscreen firmware works or what Mark's patch is doing really, so I can't fix this -- I need you guys to help figure it out
If no one has a fix suggestion soon, we'll be forced to revert the offending patch in the meantime so Kitty can continue to get updates.  Then nVidia and Elan can figure this out without a time crunch
Cc: johnny.c...@elan.corp-partner.google.com
@Elan, this same patch didn't cause issues with your touchscreen on other "Nyan" devices (Blaze for instance)  Can you explain why this change would effect only one of your systems?
Sorry for the late response guys. So does elan mean my patch affects touch firmware's timing to read hw_version? If so, this looks like a timing issue? As a dirty HACK, I think I can reset the panel power in another place, for example, I can defer it until display driver tries to read panel EDID first time. Is this worth to try?
@Charlie,
  Your understand is right. My theory still need to be proved. Could anyone help to get the touch screen signals(Power, I2C) of Kitty for work and fail cases when boot on?

@ma...,
  I think it is a timing issue. For make sure to meet power on sequence, please move it as early as you can on a dirty HACK.
I created a HACK here:
https://chromium-review.googlesource.com/#/c/428654/

This HACK defers the panel reset until first time tegradrm tries to get panel EDID. Penny@nvidia helped to test this HACK and this is the result:

- First trying is based on kernel commit:
c3d787f39dd7 UPSTREAM: timers: Fix usleep_range() in the context of wake_up_process()
then apply the HACK. The result is Penny can reproduce the  issue 3  times within 50 reboot cycles. So the repro rate is 3/50.

- Second trying is based on tot kernel + the HACK. And Penny didn't reproduce the issue within 50 reboot cycles.

I noticed that recently we've merged a lot of elan_i2c patches, I don't know whether this makes the difference but to be honest, I don't think the HACK really solves the issue. This still sounds timing related, we need to figure out the root cause.

Finally @johnny, the patch:
CHROMIUM: drm/panel: Add panel powerwash https://chromium-review.googlesource.com/#/c/416702/
already resets the panel at the earliest stage, i.e, we can just defer it which is exactly my HACK patch does.

Mark
Thanks for writing and testing that hack, Mark.  I agree that it doesn't sound like a stable fix even for the short term though, given the results of your tests with Penny.  It definitely sounds like a race condition.

We cherry-picked over a bunch of elan_i2c driver patches, but I would be really shocked if they factored into this issue, since we were seeing the same symptoms before they were patched in, too.  Plus, this appears to be caused by how the touchscreen's firmware works, and not how the kernel driver is handling communication.  Maybe Johnny from Elan can confirm that for us?

It sounds to me like we need to get a Kitty into Elan's hands so they can do their testing and figure out what's actually going on.  We might be able to probe some of the lines for them once, but it's going to be really slow going compared to just putting a device in their lab.

@Yung, can we get a Kitty sent to Elan?
@charliemooney - let me check.  This is a public tracker so will communicate by email for shipping logistics.

Comment 30 by ketakid@google.com, Jan 18 2017

This is currently assigned to marcheu@. Who should the right owner be for this issue?
Owner: yungleem@chromium.org
I did the work to track down the patch that broke Kitty, which was written by markz.  I assigned the bug to Marcheu because he reviewed the offending change.  He asserts that the graphics change was "fully in spec" and that there's some quirk about this touchscreen that is causing this behavior.  As such, the work seems to fall on Elan, the touch vendor.

The bug tracker won't let me assign this bug to anyone from nvidia or elan, so I'm not sure who to put there.  Yung is trying to get a Kitty for Elan to work on and debug, so I guess I'll assign it to him in the meantime?
 Thanks Charlie and Yung. We got one device and could reproduce the issue. We are trying to figure out the root cause and fix the issue internally.
@Mark
Thanks for your update.

We found the root cause is how long between first touch INT falling and first touch i2c ACK.
Original image, the time is 62.412ms. With CL (https://chromium-review.googlesource.com/#/c/416702/) is 1.075s.
These will make our timing of driver flow wrong. We cannot get expect data at expect timing to finish initial, so touch no function.

We found a easy workaround and verified on our Kitty device pass 50/50.
Please try to add a 1 second delay at the end of CL.

Many thanks,
Johnny

static void panel_simple_powerwash(struct panel_simple *p)
{
	int err;

	if (p->backlight) {
		p->backlight->props.power = FB_BLANK_POWERDOWN;
		backlight_update_status(p->backlight);
	}

	if (p->desc->delay.disable)
		msleep(p->desc->delay.disable);

	err = regulator_enable(p->supply);
	if (err < 0)
		dev_err(p->base.dev, "failed to enable supply: %d\n", err);
	regulator_disable(p->supply);

	if (p->desc->delay.unprepare)
		msleep(p->desc->delay.unprepare);
	msleep(1000);
}

@Charlie,
There is another "recovery" case which I forgot previous. We will send reset command via i2c bus and check received data correct or not before we query hw_version and fw_version.
If we got three times of "invalid 'hello' packet", we will set "Recovery" to iap_mode. From Re#5 attached file(failure message), we can see three times. 
ERR kernel: [    2.412920] elants_i2c 1-0010: invalid 'hello' packet: ff ff ff ff
ERR kernel: [    2.512523] elants_i2c 1-0010: invalid 'hello' packet: ff ff ff ff
ERR kernel: [    2.612118] elants_i2c 1-0010: invalid 'hello' packet: ff ff ff ff
@Mark @Marcheu -- Does this seem like a good solution?
Owner: marc...@chromium.org
set owner to marcheu to answer #c34
It will slow down the boot by 1s though, I'm not sure if we want to do that. Is 1s the minimum or can we go lower?
Thanks Johnny. This sounds good to me. Also can we set this delay lower? I can create a patch when we get a conclusion.
One thing I want to highlight is that, on nyan-big, we already have a 0.5s delay here: msleep(p->desc->delay.unprepare), which is from nyan-big's panel spec: a 0.5s delay is needed between panel on/off.
So I'm wondering that when we have the conclusion about the delay for touch, we can combine these 2 delays. E.g, if the delay for touch is 1s, then I think we can do like this:

if (p->desc->delay.unprepare)
    msleep(p->desc->delay.unprepare);
msleep(1000 - p->desc->delay.unprepare);
The minimum value I tried is 1 second. 1 second was measured based on powerwash work finished. So it cannot combine with panel sepc (0.5s delay). I also tried your solution which was failed : msleep(1000 - p->desc->delay.unprepare);
I see. But I'm still confused that why adding that 1s delay works. Johnny you mentioned that "first touch INT falling and first touch i2c ACK", so does adding a 1s delay after panel reset mean the INT & I2C ASK will happen during this 1s sleeping? Could you explain more about this 1s delay?
Sorry for the late response.

After power on, FW will pull INT low when it ready to be accessed by I2C.
Add 1 second delay will extend the duration time between INT low & I2C ACK.
For driver flow, we will get I2C ACK as above duration time. Then we will start initialization to get more information. If fail to get correct information, it will be set to Recovery mode.

After discussing with our FW team. They expect the duration time less then 900ms when power on. If more then 900ms, it will do other works first and pending I2C commands until works were finished. So we suggest add more then 1 second at CL to make sure all works done. Hence driver could get correct information to finish initialization.

Any test results for the workaround?

Johnny
OK thanks Johnny. A patch is created according to your suggestion:
https://chromium-review.googlesource.com/#/c/431333/
Penny@nvidia will test this and will update the result here soon.
I tested the patch with 50 times reboot cycle, and the touch function worked fine.
markz@gpartner-nvidia.com Looks like the test succeeded and the patch is working. Will you please check this into ToT so we can validate the fix before we can merge it to M57?
yeah I am fine with that. Stephane, could you approve the patch? Do you have other concerns about this 1s delay?
I'm also thinking that maybe we can do something -- like adding this delay in Kitty's DTS so that it only takes effect on Kitty. AFAIK, big & blaze don't have this issue. We can do that in another patch I think.
we haven't pushed kitty in the last few releases. Can we get this in ToT so we can quickly validate before putting this in the next dev?
Today I tried to make this 1s delay only takes effect on Kitty. Then I observed that Kitty's panel unprepare already has a 1s delay... so I add an additional 1s based on that which is this patch:
https://chromium-review.googlesource.com/c/438105/

So johnny(or other ELan guys), could you take a look at and test the change? It's just 1 line.

Meanwhile, Penny, could you help to test this kernel on Kitty:
https://drive.google.com/open?id=0B2zFJRMzpWE-N1EwcS1UY0llUkU
With attached kernel image, I met the touch function didn't work at 33th reboot cycle.
@johnny, so this makes me confusing. Why Penny hits the issue again? Technically, the new patch changes the msleep from 1000ms to 2000ms which has covered the additional 1s delay so why it isn't working? Could you take a look?
I can't say I understand fully what is causing this issue, but it seems like adding more and more delay to the boot time can't be the best solution.  We didn't have a 1/33 chance of the touchpad not working on reboot before, will adding the extra delay in comment #50 get us a guarantee the touchscreen will work, or will it just increase the odds of it working?  We didn't use to have any chance of failure as far as I know, so this feels like a slippery slope of a regression here -- adding upwards or 2 seconds to boot, plus it doesn't seem guaranteed to fix the problem.

Is there a way to actually detect when it's safe to reset the panel instead of just giving it some time and hoping it's enough?
Labels: -ReleaseBlock-Dev ReleaseBlock-Stable
It looks like the solution is going to take some more time to bake. Moving this to RBS so we don't block the upcoming Beta.
Without a fix for this users will have no touchscreen whatsoever on Kitty, I don't think she should allow a build with this bug to release to anyone.

Would a revert break graphics too much while people figure out a good solution?
Labels: -ReleaseBlock-Stable ReleaseBlock-Dev
Adding back dev blocker until we have a firm decision on this
charlie, revert the "panel reset" patch will cause nyan-big display issues when system boots. So I don't think revert is a good way to go.
Besides, I agree with you. It seems that adding delay just makes the issue hard to reproduce, it doesn't fix it completely. While for this we still need Elan guys to take a look.
If reverting your patch will make this display not work, then how was Nyan-big's display working for so long until just now?  It seems like comepletely disabling the touchscreen on nyan-kitty must be a worse side effect than whatever was happening with nyan-big's screen for all that time.  I could be wrong, but as long as this bug is open on Kitty, we can't ship any updates to Kitty for any branch that it's on.

If you need Elan's help, please work with them to figure this out, but we need to have a solution for this one way or the other pretty soon.  What question do you need answered from Elan?  They are cc'd on this bug and should see whatever you ask them if you ask it here
First of all, I think reset the panel when kernel boots is the right thing to do. As I explained earlier, for kernel, it thinks the panel is OFF when panel driver finishes probing while the fact is coreboot enables the panel already. So we need to reset the panel to make the HW state align with kernel driver.

Secondly, your words make sense -- "how was Nyan-big's display working for so long until just now"? Honestly I can't explain this, we're using nvidia's own display driver before, recently we replaced it with upstream drm display driver. So again I don't have answer for your question, I guess this is timing related. The display issue on nyan-big is that display controller can't get panel's EDID in a specific period, while reset the panel makes it works .

Finally, an ugly workaround I can think of now is, we just reset the panel on big and blaze. I mean, we can add something in DTS to indicate big/blaze need a panel reset while Kitty doesn't.
The thing that I need Elan's help is, right now it seems that the 1s delay which Elan suggested is not the root cause/correct fix. I don't have any knowledge about touch stuffs, so I hope Elan can dig this again. If anything needs me or Penny, just feel free to comment here.
@Mark, confirm your code status. Did you add powerwash CL(without any delay) and CL-438105 (https://chromium-review.googlesource.com/c/438105/)

I'm trying your new code and check signal first and discuss with FW team. 
@johnny, yes, I'm using tot kernel + 438105. You can try it as well.
@Mark, Thanks

We got a new MP device, but we don't know how to switch it into develop mode.
Usually, we press ESC + F3 + Power Key to switch it on other projects.
Just wondering if there is any ways we could make the device go into develop mode so that we could further debug it.
I don't know either, normally I'm working on debug boards which has debug port so that I can flash coreboot directly. Charlie might answer this I think.
Ah yes, since a USB keyboard isn't a trusted piece of hardware you can't enter developer mode using one.  On our All-in-one units like Kitty, there is a "paperclip hole" located next to the Kensington lock on the device.  Basically we use the button at the bottom of that hole (that you push with a paperclip) to tell the system to switch to dev mode.

1. Turn off device
2. Use a paperclip to press and hold the button at the bottom of the hole
3. Turn on device while holding the button down
4. Device should boot into "Recovery Mode"
5. Press Ctrl+d on the recovery mode screen to tell the system to switch to dev mode
6. Tap the button at the bottom of the paperclip hole one time to confirm the switch to dev mode


I think that should do it
Thanks Charlie, I switched to dev mode by your steps.
Okay everyone -- apparently a copy of this broken build just got sent out on dev channel for Kitty, completely disabling everyone on dev mode's touchscreens.  This is exactly why I didn't want to just leave this patch in the tree, it's far too error prone.

Explain to me again why we can't revert this patch while working on a fix?  It feels like we should revert it in the meantime so that Kitty users can get updates and use their touchscreens...  It seems like the only fix for the short-term no?

Can someone explain to me why we can't revert the offending patch, then add it back in once Elan has figured out how to do it without breaking Kitty?
Labels: M-58
Labels: ReleaseBlock-Beta
I was thinking more about this, and was wondering if you could get the extra-delay fix merged into the graphics code to get Kitty's touchscreen "mostly working" with a delay in the meantime?  Then we don't have to make Blaze's boot graphics break again and people can use their Kitty's while Elan works on understanding the issue?

We'd need to remove the delay once it's really fixed, naturally, but it might be a good stop-gap in the interim.

Does that sound good?  (Sorry if this was already suggested and I missunderstood)

Mark, could you do that, do you think?
Charlie, I think for now adding delay can't workaround this completely, not to mention we don't know how long the delay should be.
Let's do something ugly temporarily -- I've submitted a patch here:
https://chromium-review.googlesource.com/#/c/440684/

This patch introduces a new DTS property named "no-powerwash" and adds this property in Kitty's DTS. This means on Kitty we won't do the panel reset while big/blaze are not affected.

Please take a look, if you think this looks good. Elan and Penny@nvidia can give it a test. We can revert it when we finally fix this issue.
@Mark, for #50 question, we don't have idea what happen on 1/33 fail.
Adding delay is a workaround solution. But it didn't meet our FW design.
Actually, if unprepare time could be decreased from 1000ms to 500ms, it will meet our FW design. Then we could make sure touch will work fine.

@Charlie, below CL modified unprepare delay time for panel. 
Could you ask the author of CL (Haixia Shi <hshi@chromium.org>) help to check the minimal value for unprepare of panel?

@johnny, so you have created a patch which reduces the delay from 1000ms to 500ms right? Could you show me the CL number? Have you verified this patch?
@Mark, my colleague validated 300 times and all worked. Due to I'm not sure the minimal value for unprepare panel, so I didn't create a patch.
Cc: -marc...@chromium.org h...@chromium.org
@Johnny Chuang, which CL are you talking about in comment #70?  I don't understand what you mean by "minimal value for unprepare of panel" either, sorry.

Just to be clear, I don't personally understand anything about the power system here -- I was just roped into this bug because it has to do with the touchscreen.  It looks like a change in some power code for graphics caused it, which is something I've never worked with before.

Feel free to contact Haixia Shi directly, you don't need to go through me to send an email to their account.  I'm adding hsi@ to this bug so you can talk directly though here
Cc: -h...@chromium.org
Haixia left google, please don't bother him with this :)
Okay, do you have a suggestion of a someone else that could help?  I never worked with him, so I can't say I even know what area he worked in
IIUC, Johnny's suggestion is like this:
https://chromium-review.googlesource.com/#/c/442224/
Since the original 1000ms panel unprepare delay is added by Haixia, johnny wants him to check whether reducing to 500ms is fine. Johnny right?
I, too, do not have the full understanding of this problem, but I believe Kitty has an extra delay on panel initialization because of its size as well as needing for an LVDS to eDP bridge.  Big or Blaze I believe are using eDP panels so that might be the main difference causing the regression on Kitty?
markz@ - Here is my suggested plan.

In my understanding, the patch only affects Kitty.  So let's push the CL has occasional failure to ToT and M57 so we can unblock dev and beta channel push.  Once it's merged, I am going to mark this bug Stable blocker to give you more time to find a more fundamental fix.
snanda@ Can you please suggest someone on the kernel team who can help drive this issue to conclusion?
Thanks Yunglee. I agree. Patch 442224 is according to Johnny's investigation and we need more people verify/confirm it. Patch 440684 completely disables the offending patch(I mean the original patch causes this issue). I'm OK with both of them.
Penny@nvidia tested the tot kernel + CL 442224. Surprisingly the touch is totally not working. Tried 5 times and all of them failed. Johnny, could you give us your changes? I mean, the change which passed 300 cycles testing.
@Mark, my changed as attached png file. It's same with CL 442224.
The code base of 300 cycles testing is clean download on 2017-02-08.
I don't know what's the difference between us. I will fetch tot again and cherry pick CL 442224 to check.
unprepare_500ms_diff.png
44.0 KB View Download
@Mark, I repo init and repo sync to get a latest code and cherry pick CL 442224. The touch function work fail with native build, but work fine with CL 442224 build. Could you help to check it again?
@Johnny, we did the test again but still the same. Touch is not working at all.
In case I'm doing something stupid, here is my git log:

0d92516f2c27 CHROMIUM: drm/panel: Reduce panel unprepare time to 500ms
d5f25d3f7f8b CHROMIUM: ALSA: info: Check for integer overflow in snd_info_entry_write()
be1344074eb9 UPSTREAM: ipv4: fix a race in ip4_datagram_release_cb()
c4be52453f80 CHROMIUM: drm/tegra: Check buffer size when doing kmap
e94245762e91 CHROMIUM: drm/tegra: Fix host1x syncpt array OOB access
48c54a06235b CHROMIUM: perf: don't leave group_entry on sibling list (use-after-free)
6831774e377a elan_i2c: Try to use pid in fw name, with fallback
c83bac17e190 drm/evdi: Filter unsupported display modes
01544234926e drm/evdi: Always send mode change event

Looks same with yours? If yes, maybe our Kitty has some differences? Different touch firmware version?
@Mark, git log looks same with you.

FW version is 1215. (cat /sys/bus/i2c/drivers/elants_i2c/1-0010/fw_version)
fwid = Google_Nyan_Kitty.5771.61.16 (crossystem | grep fwid)
RO version: kitty_v1.1.1809-4ad59c4 (ectool version)

localhost ~ # cat /sys/bus/i2c/drivers/elants_i2c/1-0010/fw_version
0000

localhost ~ # crossystem | grep fwid
fwid      = Google_Nyan_kitty.5771.61.18  # Active firmware ID
ro_fwid   = Google_Nyan_kitty.5771.61.18  # Read-only firmware ID

localhost ~ # dmesg | grep elan
[    7.097920] elants_i2c 1-0010: invalid 'hello' packet: ff ff ff ff
[    7.202981] elants_i2c 1-0010: invalid 'hello' packet: ff ff ff ff
[    7.322570] elants_i2c 1-0010: invalid 'hello' packet: ff ff ff ff

Don't know why my fw_version is 0000. BTW, I checked the dmesg and found above errors, is that expected?
Just talked with Penny and got fw_version now. Our Kitty's fw_version is 1215 as well.
Given the diverged test result of reducing unprepare delay to 500ms, I would suggest we land this change temporarily:
https://chromium-review.googlesource.com/#/c/440684/
Cc: snanda@chromium.org
Guys, can we land 440684?
@marcheu, @snanda, can one of you review https://chromium-review.googlesource.com/#/c/440684/ ?
@91: I already have on Feb 10
ahh.. I see.

Mark, so what's holding you up from landing the patch?
@yungleem, no, I just want you guys to confirm. :)
Let's land this to unblock Kitty.
Project Member

Comment 95 by bugdroid1@chromium.org, Feb 24 2017

Labels: merge-merged-chromeos-3.10
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/f7dc8485b1f056218a219267449654f76415638c

commit f7dc8485b1f056218a219267449654f76415638c
Author: Mark Zhang <markz@gpartner-nvidia.com>
Date: Fri Feb 24 03:06:17 2017

CHROMIUM: drm/panel: Introducing no-powerwash dts property

We can't reset the panel during boot on Kitty because that
makes the touchscreen stops working.

So to handle this, this patch adds a new DTS property named
"no-powerwash" to prevent reset the panel on Kitty.

BUG= chromium:676094 

Change-Id: Ifc883ed6a1db23b689d5b6b548d7f50d3a327dfb
Signed-off-by: Mark Zhang <markz@gpartner-nvidia.com>
Reviewed-on: https://chromium-review.googlesource.com/440684
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>

[modify] https://crrev.com/f7dc8485b1f056218a219267449654f76415638c/arch/arm/boot/dts/tegra124-nyan-kitty-rev8.dts
[modify] https://crrev.com/f7dc8485b1f056218a219267449654f76415638c/drivers/gpu/drm/panel/panel-simple.c
[modify] https://crrev.com/f7dc8485b1f056218a219267449654f76415638c/arch/arm/boot/dts/tegra124-nyan-kitty-rev0_3.dts

marcheu@ can you please add the request-merge-57 label so we can get this approved for merge to M57? We need to have this merged before tomorrow so this week's beta can pick up this change.


Status: Fixed (was: Assigned)
Project Member

Comment 98 by bugdroid1@chromium.org, Feb 28 2017

Labels: merge-merged-release-R57-9202.B-chromeos-3.10
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/83bac0a14b343eaf991d82e9238f05d90a7162eb

commit 83bac0a14b343eaf991d82e9238f05d90a7162eb
Author: Mark Zhang <markz@gpartner-nvidia.com>
Date: Tue Feb 28 00:12:31 2017

CHROMIUM: drm/panel: Introducing no-powerwash dts property

We can't reset the panel during boot on Kitty because that
makes the touchscreen stops working.

So to handle this, this patch adds a new DTS property named
"no-powerwash" to prevent reset the panel on Kitty.

BUG= chromium:676094 

Change-Id: Ifc883ed6a1db23b689d5b6b548d7f50d3a327dfb
Signed-off-by: Mark Zhang <markz@gpartner-nvidia.com>
Reviewed-on: https://chromium-review.googlesource.com/440684
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/447221
Commit-Queue: Stéphane Marchesin <marcheu@chromium.org>
Tested-by: Stéphane Marchesin <marcheu@chromium.org>

[modify] https://crrev.com/83bac0a14b343eaf991d82e9238f05d90a7162eb/arch/arm/boot/dts/tegra124-nyan-kitty-rev8.dts
[modify] https://crrev.com/83bac0a14b343eaf991d82e9238f05d90a7162eb/drivers/gpu/drm/panel/panel-simple.c
[modify] https://crrev.com/83bac0a14b343eaf991d82e9238f05d90a7162eb/arch/arm/boot/dts/tegra124-nyan-kitty-rev0_3.dts

Status: Verified (was: Fixed)

Sign in to add a comment