Kitty touchscreen's in "Recovery" mode after kernel patch -- don't work |
||||||||||||||||||||||
Issue descriptionStarting in R57-9092.0.0, all nyan_kitty machines are reporting hw_id of 0000. Error has happened across four different (previously working) machines. Error has happened every M57 test run since that build. BEFORE: 2016-12-16T10:11:57.686258+00:00 NOTICE chromeos-elan-touch-firmware-update[974]: Board detected as 'nyan_kitty' 2016-12-16T10:11:57.702739+00:00 NOTICE chromeos-elan-touch-firmware-update[986]: Touch controller reported Product ID: 0999 2016-12-16T10:11:57.714673+00:00 NOTICE chromeos-elan-touch-firmware-update[1000]: Attempting to load FW: '/lib/firmware/elants_i2c.bin' 2016-12-16T10:11:57.717882+00:00 NOTICE chromeos-elan-touch-firmware-update[1003]: (which points to '/opt/google/touch/firmware/0999_1215.bin') 2016-12-16T10:11:57.722804+00:00 NOTICE chromeos-elan-touch-firmware-update[1007]: Hardware product id : 0999 2016-12-16T10:11:57.726186+00:00 NOTICE chromeos-elan-touch-firmware-update[1010]: Updater product id : 0999 2016-12-16T10:11:57.733893+00:00 NOTICE chromeos-elan-touch-firmware-update[1016]: Current Firmware: 1215 2016-12-16T10:11:57.738420+00:00 NOTICE chromeos-elan-touch-firmware-update[1019]: Updater Firmware: 1215 2016-12-16T10:11:57.746600+00:00 NOTICE chromeos-elan-touch-firmware-update[1025]: Firmware up to date. AFTER: 2016-12-16T15:53:10.433582+00:00 NOTICE chromeos-elan-touch-firmware-update[967]: Board detected as 'nyan_kitty' 2016-12-16T15:53:10.450393+00:00 NOTICE chromeos-elan-touch-firmware-update[979]: Touch controller reported Product ID: 0000 2016-12-16T15:53:10.464601+00:00 NOTICE chromeos-elan-touch-firmware-update[993]: Attempting to load FW: '/lib/firmware/elants_i2c.bin' 2016-12-16T15:53:10.468770+00:00 NOTICE chromeos-elan-touch-firmware-update[997]: (which points to '/opt/google/touch/firmware/0999_1215.bin') 2016-12-16T15:53:10.473133+00:00 NOTICE chromeos-elan-touch-firmware-update[1000]: Hardware product id : 0000 2016-12-16T15:53:10.476246+00:00 NOTICE chromeos-elan-touch-firmware-update[1004]: Updater product id : 0999 2016-12-16T15:53:10.484905+00:00 NOTICE chromeos-elan-touch-firmware-update[1015]: error: Touch firmware updater: Product ID mismatch! Example failure: https://wmatrix.googleplex.com/testrun/unfiltered?test_ids=398512432
,
Dec 20 2016
This sounds like the touchscreen's firmware is getting corrupted. Do you know if the touchscreen's are working? I would expect they don't... Since Kitty only has one FW it would be reasonable to make the updater force the update in this situation, but I don't know why they would all get corrupted like this. I don't suppose you have a log that shows the corruption happening? (eg the first boot on this 9092.0.0 version that broke everything?)
,
Dec 20 2016
Looking at what landed between 9091 and 9092 I can't see anything that stands out https://crosland.corp.google.com/log/9091.0.0..9092.0.0
,
Dec 20 2016
I'll need a kitty to test on, do we have any?
,
Dec 20 2016
We've confirmed that the touchscreen doesn't work in this state. Blocking dev releases for kitty only (as there are no other machines showing this failure). Switching back to M56 resumes normal update behavior. Attached is /var/log/messages for test runs right before running 9092.0.0 (90958291, PASS), right after (90995204, FAIL), and a later M56 test run on the same machine (91125008, PASS).
,
Dec 21 2016
Uploading the correct file for a switch to M56 build from #5
,
Jan 3 2017
In a bizarre turn of events, this doesn't seem to be caused by a bad fw and/or bad fw update. When I switch my Kitty between canary and stable, the touchscreen works perfectly on stable (55) but fails in the described way on canary (57). The interesting thing is that there doesn't seem to be an actual firmware update occuring anywhere to fix/break it. It seems that there is something else that changed between those two versions that is preventing the Chromebook from correctly querying the touchscreen's product ID. I'm unsure what that would be, but I'm going to try and figure it out
,
Jan 3 2017
Found some more unexpected symptoms, and made some narrowing down progress hopefully.
Testing on 9091 --> everything works fine
Testing on 9092 -->
1. fw_version, hw_version, bc_version, etc.. all just report 0's
2. iap_mode reports "Recovery" (When it's working it is usually "Normal")
3. Chrome doesn't use touchscreen input to interact with UI
4. Miraculously, evtest does, however, show the screen reporting
touches, and they seem to be reasonable/correct.
So, it's sure looking like there's some patch between 9091 and 9092 that is causing the touchscreen to go into "Recovery" mode instead or "Normal."
The link below should show which patches were added between them (Google internal link only, unfortunately):
https://crosland.corp.google.com/log/9091.0.0..9092.0.0
To further narrow down the culprit, I've taken a working 9091 image and deployed a ToT kernel onto it. The touchscreen immediately stopped working just as before, so I'm pretty confident that one of the four 3.10 kernel patches between 9091 and 9092 must be the cause:
1. CHROMIUM: drm/i915: use min brightness instead of 0 as trigger
https://chromium-review.googlesource.com/#/c/419771/
--> This seems less likely to be the cause, but I added dbehr@ just
in case there's something in there I'm not seeing.
2. CHROMIUM: drm/tegra: Align dc/dpaux/sor/panel enable/disable
https://chromium-review.googlesource.com/#/c/416704/
3. CHROMIUM: drm/tegra: dpaux: Reset dpaux clock correctly
https://chromium-review.googlesource.com/#/c/416703/
4. CHROMIUM: drm/panel: Add panel powerwash
https://chromium-review.googlesource.com/#/c/416702/
Note: Kitty is a Tegra system, so the fact that these kernel patching are for Tegra seems to make sense!
I'll see if I can narrow it down more.
,
Jan 4 2017
[-dbehr@ since his CL wasn't the cause, sorry for the spam] Okay, the CL that broke the touchscreen seems to be this one: CHROMIUM: drm/panel: Add panel powerwash https://chromium-review.googlesource.com/#/c/416702/ When I revert that and build a new kernel the touchscreen works fully again. I'm afraid I don't really know what that patch is or why it's causing issues, so I'm going to assign this bug over to the people who originally worked on that CL. Can you take a look at this patch and try to figure out why it's switching the Kitty touchscreen into a "Recovery" mode and keeping it from working anymore? Thanks!
,
Jan 4 2017
,
Jan 5 2017
,
Jan 6 2017
From the offline email from Mark Z (who wrote the patch that seems to be causing issues): I created that kernel patch several days ago and the patch fixes another display issue observed on nyan-big. Anyway, I wrote this mail just want to ask you, do you know somebody who knows about Kitty's touch controller/firmware stuffs? The touch screen stops working is because the touch firmware can't get the panel's product ID. Given I've totally no idea about touch, could you find someone to help on this issue? My patch resets the panel power when kernel boots, so I want to know why this makes touch firmware stops working. Thanks, Mark
,
Jan 6 2017
Mark, the touchscreen is not failing *because* it can't get a product ID. Failing to get the product ID is a symptom of the problem. Whatever your patch changed is somehow forcing the touchscreen into "Recovery" mode, and apparently in that mode the firmware is incapable of reporting the product ID. I didn't design or do any of the wiring for the touchscreen, nor do I know much about the low-level kind of stuff you were modifying with that patch so I'm not sure what the actual cause could be really. All I know is that I was able to narrow the CL that caused it down that that patch -- when I revert it everything starts working just fine again. It looks like maybe your patch has to do with controlling power lines, right? Perhaps it causes some gpio to stay high/low which confuses the touchscreen or something similar? I'm adding Jeff from Elan (the touchscreen company) to see if he can shed some light on what would put the touchscreen in "Recovery" mode for you.
,
Jan 6 2017
Adding the people from the email to keep everyone in the loop
,
Jan 8 2017
Hi Mark,
Do you still support the power to touch controller in this mode? I will discuss with our internal team and get back to you.
Thanks,
ELAN Jeff
,
Jan 9 2017
Thanks Charlie & Jeff. I think you guys can take a look my patch -- it just has few lines. Basically what this patch does is, disable the panel then enable it again. So here disable/enable means disable/enable regulators, manipulate some gpios... and etc. And the reason that we need this patch is, on ChromeOS, normally the panel and the display have been enabled by coreboot, so when kernel starts running, the hardware state doesn't align with the kernel driver's. E.g, the panel regulator in driver is disabled while actually the panel power is on. So this brings some issues(refer to https://code.google.com/p/chrome-os-partner/issues/detail?id=55903 for details). Anyway, come back to this touchscreen issue, I guess it can be fixed by few lines which correct the touch firmware state. Just like Charlie mentioned, maybe we need to change some gpios or something. Jeff, please suggest, thanks in advance.
,
Jan 9 2017
Jeff, we need Elan to tell us what kinds of things might cause these symptoms on your controller or Mark isn't going to be able to debug this. What could cause iap_mode to be "Recovery"?
,
Jan 12 2017
Jeff@elan - can you please follow up? Thanks.
,
Jan 12 2017
I just spoke with Jeff via email to make sure Elan is aware of this issue. He says they don't have a Kitty to test, but will look at it tomorrow (or "today" depending on which timezone you're in) and should have some info for us shortly
,
Jan 13 2017
Hi Charlie,
Touch driver will read hw_version and fw_version on elants_i2c_initialize() when probe and fw_update process. If we cannot get information, the iap_mode will set to be "Recovery".
As #9 comment, touch could get correct hw_version and fw_version after reverting panel powerwash CL.(https://chromium-review.googlesource.com/#/c/416702/)
Due to the fw boot code of Kitty didn't support hw_version command.So we think fw works fine on IC and didn't re-flash fw after reverting panel powerwash CL. Hence, it maybe touch driver reading hw_version at the duration of panel control regulator.
,
Jan 13 2017
Okay, so just to clarify/confirm: 1. The iap_mode is set to "Recovery" as a result of the elants_i2c driver being unable to query the hw_version/fw_version from the touchscreen (not the other way around) 2. The boot code on the touchscreen doesn't support hw_version or fw_version 3. Your theory is that something in that patch cases the touchscreen to get rebooted/put into boot code, which causes the issues we're seeing. Did I understand that right? That sounds plausible to me, but I don't know how to go about fixing it. Does that give you any ideas of how to fix this, Mark? Johnny, do you have any ideas for a solution? Please add anyone you think might be helpful Additionally, could someone who works on graphics chime in here? Your fix is causing serious issues for touch and we could probably use some graphics/display expertise here. We need to fix this asap because Mark's patch fully disabled the touchscreen on all Kittys, so please let me know if you have any ideas. I don't know how the touchscreen firmware works or what Mark's patch is doing really, so I can't fix this -- I need you guys to help figure it out
,
Jan 13 2017
If no one has a fix suggestion soon, we'll be forced to revert the offending patch in the meantime so Kitty can continue to get updates. Then nVidia and Elan can figure this out without a time crunch
,
Jan 13 2017
,
Jan 13 2017
@Elan, this same patch didn't cause issues with your touchscreen on other "Nyan" devices (Blaze for instance) Can you explain why this change would effect only one of your systems?
,
Jan 14 2017
Sorry for the late response guys. So does elan mean my patch affects touch firmware's timing to read hw_version? If so, this looks like a timing issue? As a dirty HACK, I think I can reset the panel power in another place, for example, I can defer it until display driver tries to read panel EDID first time. Is this worth to try?
,
Jan 16 2017
@Charlie, Your understand is right. My theory still need to be proved. Could anyone help to get the touch screen signals(Power, I2C) of Kitty for work and fail cases when boot on? @ma..., I think it is a timing issue. For make sure to meet power on sequence, please move it as early as you can on a dirty HACK.
,
Jan 16 2017
I created a HACK here: https://chromium-review.googlesource.com/#/c/428654/ This HACK defers the panel reset until first time tegradrm tries to get panel EDID. Penny@nvidia helped to test this HACK and this is the result: - First trying is based on kernel commit: c3d787f39dd7 UPSTREAM: timers: Fix usleep_range() in the context of wake_up_process() then apply the HACK. The result is Penny can reproduce the issue 3 times within 50 reboot cycles. So the repro rate is 3/50. - Second trying is based on tot kernel + the HACK. And Penny didn't reproduce the issue within 50 reboot cycles. I noticed that recently we've merged a lot of elan_i2c patches, I don't know whether this makes the difference but to be honest, I don't think the HACK really solves the issue. This still sounds timing related, we need to figure out the root cause. Finally @johnny, the patch: CHROMIUM: drm/panel: Add panel powerwash https://chromium-review.googlesource.com/#/c/416702/ already resets the panel at the earliest stage, i.e, we can just defer it which is exactly my HACK patch does. Mark
,
Jan 17 2017
Thanks for writing and testing that hack, Mark. I agree that it doesn't sound like a stable fix even for the short term though, given the results of your tests with Penny. It definitely sounds like a race condition. We cherry-picked over a bunch of elan_i2c driver patches, but I would be really shocked if they factored into this issue, since we were seeing the same symptoms before they were patched in, too. Plus, this appears to be caused by how the touchscreen's firmware works, and not how the kernel driver is handling communication. Maybe Johnny from Elan can confirm that for us? It sounds to me like we need to get a Kitty into Elan's hands so they can do their testing and figure out what's actually going on. We might be able to probe some of the lines for them once, but it's going to be really slow going compared to just putting a device in their lab. @Yung, can we get a Kitty sent to Elan?
,
Jan 17 2017
@charliemooney - let me check. This is a public tracker so will communicate by email for shipping logistics.
,
Jan 18 2017
This is currently assigned to marcheu@. Who should the right owner be for this issue?
,
Jan 18 2017
I did the work to track down the patch that broke Kitty, which was written by markz. I assigned the bug to Marcheu because he reviewed the offending change. He asserts that the graphics change was "fully in spec" and that there's some quirk about this touchscreen that is causing this behavior. As such, the work seems to fall on Elan, the touch vendor. The bug tracker won't let me assign this bug to anyone from nvidia or elan, so I'm not sure who to put there. Yung is trying to get a Kitty for Elan to work on and debug, so I guess I'll assign it to him in the meantime?
,
Jan 19 2017
Thanks Charlie and Yung. We got one device and could reproduce the issue. We are trying to figure out the root cause and fix the issue internally.
,
Jan 20 2017
@Mark Thanks for your update. We found the root cause is how long between first touch INT falling and first touch i2c ACK. Original image, the time is 62.412ms. With CL (https://chromium-review.googlesource.com/#/c/416702/) is 1.075s. These will make our timing of driver flow wrong. We cannot get expect data at expect timing to finish initial, so touch no function. We found a easy workaround and verified on our Kitty device pass 50/50. Please try to add a 1 second delay at the end of CL. Many thanks, Johnny static void panel_simple_powerwash(struct panel_simple *p) { int err; if (p->backlight) { p->backlight->props.power = FB_BLANK_POWERDOWN; backlight_update_status(p->backlight); } if (p->desc->delay.disable) msleep(p->desc->delay.disable); err = regulator_enable(p->supply); if (err < 0) dev_err(p->base.dev, "failed to enable supply: %d\n", err); regulator_disable(p->supply); if (p->desc->delay.unprepare) msleep(p->desc->delay.unprepare); msleep(1000); } @Charlie, There is another "recovery" case which I forgot previous. We will send reset command via i2c bus and check received data correct or not before we query hw_version and fw_version. If we got three times of "invalid 'hello' packet", we will set "Recovery" to iap_mode. From Re#5 attached file(failure message), we can see three times. ERR kernel: [ 2.412920] elants_i2c 1-0010: invalid 'hello' packet: ff ff ff ff ERR kernel: [ 2.512523] elants_i2c 1-0010: invalid 'hello' packet: ff ff ff ff ERR kernel: [ 2.612118] elants_i2c 1-0010: invalid 'hello' packet: ff ff ff ff
,
Jan 20 2017
@Mark @Marcheu -- Does this seem like a good solution?
,
Jan 20 2017
set owner to marcheu to answer #c34
,
Jan 20 2017
It will slow down the boot by 1s though, I'm not sure if we want to do that. Is 1s the minimum or can we go lower?
,
Jan 21 2017
Thanks Johnny. This sounds good to me. Also can we set this delay lower? I can create a patch when we get a conclusion.
,
Jan 21 2017
One thing I want to highlight is that, on nyan-big, we already have a 0.5s delay here: msleep(p->desc->delay.unprepare), which is from nyan-big's panel spec: a 0.5s delay is needed between panel on/off.
So I'm wondering that when we have the conclusion about the delay for touch, we can combine these 2 delays. E.g, if the delay for touch is 1s, then I think we can do like this:
if (p->desc->delay.unprepare)
msleep(p->desc->delay.unprepare);
msleep(1000 - p->desc->delay.unprepare);
,
Jan 23 2017
The minimum value I tried is 1 second. 1 second was measured based on powerwash work finished. So it cannot combine with panel sepc (0.5s delay). I also tried your solution which was failed : msleep(1000 - p->desc->delay.unprepare);
,
Jan 23 2017
I see. But I'm still confused that why adding that 1s delay works. Johnny you mentioned that "first touch INT falling and first touch i2c ACK", so does adding a 1s delay after panel reset mean the INT & I2C ASK will happen during this 1s sleeping? Could you explain more about this 1s delay?
,
Jan 24 2017
Sorry for the late response. After power on, FW will pull INT low when it ready to be accessed by I2C. Add 1 second delay will extend the duration time between INT low & I2C ACK. For driver flow, we will get I2C ACK as above duration time. Then we will start initialization to get more information. If fail to get correct information, it will be set to Recovery mode. After discussing with our FW team. They expect the duration time less then 900ms when power on. If more then 900ms, it will do other works first and pending I2C commands until works were finished. So we suggest add more then 1 second at CL to make sure all works done. Hence driver could get correct information to finish initialization. Any test results for the workaround? Johnny
,
Jan 24 2017
OK thanks Johnny. A patch is created according to your suggestion: https://chromium-review.googlesource.com/#/c/431333/ Penny@nvidia will test this and will update the result here soon.
,
Jan 25 2017
I tested the patch with 50 times reboot cycle, and the touch function worked fine.
,
Jan 26 2017
markz@gpartner-nvidia.com Looks like the test succeeded and the patch is working. Will you please check this into ToT so we can validate the fix before we can merge it to M57?
,
Jan 29 2017
yeah I am fine with that. Stephane, could you approve the patch? Do you have other concerns about this 1s delay?
,
Jan 29 2017
I'm also thinking that maybe we can do something -- like adding this delay in Kitty's DTS so that it only takes effect on Kitty. AFAIK, big & blaze don't have this issue. We can do that in another patch I think.
,
Feb 2 2017
we haven't pushed kitty in the last few releases. Can we get this in ToT so we can quickly validate before putting this in the next dev?
,
Feb 6 2017
Today I tried to make this 1s delay only takes effect on Kitty. Then I observed that Kitty's panel unprepare already has a 1s delay... so I add an additional 1s based on that which is this patch: https://chromium-review.googlesource.com/c/438105/ So johnny(or other ELan guys), could you take a look at and test the change? It's just 1 line. Meanwhile, Penny, could you help to test this kernel on Kitty: https://drive.google.com/open?id=0B2zFJRMzpWE-N1EwcS1UY0llUkU
,
Feb 6 2017
With attached kernel image, I met the touch function didn't work at 33th reboot cycle.
,
Feb 6 2017
@johnny, so this makes me confusing. Why Penny hits the issue again? Technically, the new patch changes the msleep from 1000ms to 2000ms which has covered the additional 1s delay so why it isn't working? Could you take a look?
,
Feb 6 2017
I can't say I understand fully what is causing this issue, but it seems like adding more and more delay to the boot time can't be the best solution. We didn't have a 1/33 chance of the touchpad not working on reboot before, will adding the extra delay in comment #50 get us a guarantee the touchscreen will work, or will it just increase the odds of it working? We didn't use to have any chance of failure as far as I know, so this feels like a slippery slope of a regression here -- adding upwards or 2 seconds to boot, plus it doesn't seem guaranteed to fix the problem. Is there a way to actually detect when it's safe to reset the panel instead of just giving it some time and hoping it's enough?
,
Feb 6 2017
It looks like the solution is going to take some more time to bake. Moving this to RBS so we don't block the upcoming Beta.
,
Feb 6 2017
Without a fix for this users will have no touchscreen whatsoever on Kitty, I don't think she should allow a build with this bug to release to anyone. Would a revert break graphics too much while people figure out a good solution?
,
Feb 7 2017
Adding back dev blocker until we have a firm decision on this
,
Feb 7 2017
charlie, revert the "panel reset" patch will cause nyan-big display issues when system boots. So I don't think revert is a good way to go. Besides, I agree with you. It seems that adding delay just makes the issue hard to reproduce, it doesn't fix it completely. While for this we still need Elan guys to take a look.
,
Feb 7 2017
If reverting your patch will make this display not work, then how was Nyan-big's display working for so long until just now? It seems like comepletely disabling the touchscreen on nyan-kitty must be a worse side effect than whatever was happening with nyan-big's screen for all that time. I could be wrong, but as long as this bug is open on Kitty, we can't ship any updates to Kitty for any branch that it's on. If you need Elan's help, please work with them to figure this out, but we need to have a solution for this one way or the other pretty soon. What question do you need answered from Elan? They are cc'd on this bug and should see whatever you ask them if you ask it here
,
Feb 7 2017
First of all, I think reset the panel when kernel boots is the right thing to do. As I explained earlier, for kernel, it thinks the panel is OFF when panel driver finishes probing while the fact is coreboot enables the panel already. So we need to reset the panel to make the HW state align with kernel driver. Secondly, your words make sense -- "how was Nyan-big's display working for so long until just now"? Honestly I can't explain this, we're using nvidia's own display driver before, recently we replaced it with upstream drm display driver. So again I don't have answer for your question, I guess this is timing related. The display issue on nyan-big is that display controller can't get panel's EDID in a specific period, while reset the panel makes it works . Finally, an ugly workaround I can think of now is, we just reset the panel on big and blaze. I mean, we can add something in DTS to indicate big/blaze need a panel reset while Kitty doesn't.
,
Feb 7 2017
The thing that I need Elan's help is, right now it seems that the 1s delay which Elan suggested is not the root cause/correct fix. I don't have any knowledge about touch stuffs, so I hope Elan can dig this again. If anything needs me or Penny, just feel free to comment here.
,
Feb 7 2017
@Mark, confirm your code status. Did you add powerwash CL(without any delay) and CL-438105 (https://chromium-review.googlesource.com/c/438105/) I'm trying your new code and check signal first and discuss with FW team.
,
Feb 7 2017
@johnny, yes, I'm using tot kernel + 438105. You can try it as well.
,
Feb 7 2017
@Mark, Thanks We got a new MP device, but we don't know how to switch it into develop mode. Usually, we press ESC + F3 + Power Key to switch it on other projects. Just wondering if there is any ways we could make the device go into develop mode so that we could further debug it.
,
Feb 7 2017
I don't know either, normally I'm working on debug boards which has debug port so that I can flash coreboot directly. Charlie might answer this I think.
,
Feb 7 2017
Ah yes, since a USB keyboard isn't a trusted piece of hardware you can't enter developer mode using one. On our All-in-one units like Kitty, there is a "paperclip hole" located next to the Kensington lock on the device. Basically we use the button at the bottom of that hole (that you push with a paperclip) to tell the system to switch to dev mode. 1. Turn off device 2. Use a paperclip to press and hold the button at the bottom of the hole 3. Turn on device while holding the button down 4. Device should boot into "Recovery Mode" 5. Press Ctrl+d on the recovery mode screen to tell the system to switch to dev mode 6. Tap the button at the bottom of the paperclip hole one time to confirm the switch to dev mode I think that should do it
,
Feb 8 2017
Thanks Charlie, I switched to dev mode by your steps.
,
Feb 9 2017
Okay everyone -- apparently a copy of this broken build just got sent out on dev channel for Kitty, completely disabling everyone on dev mode's touchscreens. This is exactly why I didn't want to just leave this patch in the tree, it's far too error prone. Explain to me again why we can't revert this patch while working on a fix? It feels like we should revert it in the meantime so that Kitty users can get updates and use their touchscreens... It seems like the only fix for the short-term no? Can someone explain to me why we can't revert the offending patch, then add it back in once Elan has figured out how to do it without breaking Kitty?
,
Feb 9 2017
,
Feb 9 2017
,
Feb 9 2017
I was thinking more about this, and was wondering if you could get the extra-delay fix merged into the graphics code to get Kitty's touchscreen "mostly working" with a delay in the meantime? Then we don't have to make Blaze's boot graphics break again and people can use their Kitty's while Elan works on understanding the issue? We'd need to remove the delay once it's really fixed, naturally, but it might be a good stop-gap in the interim. Does that sound good? (Sorry if this was already suggested and I missunderstood) Mark, could you do that, do you think?
,
Feb 10 2017
Charlie, I think for now adding delay can't workaround this completely, not to mention we don't know how long the delay should be. Let's do something ugly temporarily -- I've submitted a patch here: https://chromium-review.googlesource.com/#/c/440684/ This patch introduces a new DTS property named "no-powerwash" and adds this property in Kitty's DTS. This means on Kitty we won't do the panel reset while big/blaze are not affected. Please take a look, if you think this looks good. Elan and Penny@nvidia can give it a test. We can revert it when we finally fix this issue.
,
Feb 10 2017
@Mark, for #50 question, we don't have idea what happen on 1/33 fail. Adding delay is a workaround solution. But it didn't meet our FW design. Actually, if unprepare time could be decreased from 1000ms to 500ms, it will meet our FW design. Then we could make sure touch will work fine. @Charlie, below CL modified unprepare delay time for panel. Could you ask the author of CL (Haixia Shi <hshi@chromium.org>) help to check the minimal value for unprepare of panel?
,
Feb 13 2017
@johnny, so you have created a patch which reduces the delay from 1000ms to 500ms right? Could you show me the CL number? Have you verified this patch?
,
Feb 13 2017
@Mark, my colleague validated 300 times and all worked. Due to I'm not sure the minimal value for unprepare panel, so I didn't create a patch.
,
Feb 13 2017
@Johnny Chuang, which CL are you talking about in comment #70? I don't understand what you mean by "minimal value for unprepare of panel" either, sorry. Just to be clear, I don't personally understand anything about the power system here -- I was just roped into this bug because it has to do with the touchscreen. It looks like a change in some power code for graphics caused it, which is something I've never worked with before. Feel free to contact Haixia Shi directly, you don't need to go through me to send an email to their account. I'm adding hsi@ to this bug so you can talk directly though here
,
Feb 13 2017
Haixia left google, please don't bother him with this :)
,
Feb 13 2017
Okay, do you have a suggestion of a someone else that could help? I never worked with him, so I can't say I even know what area he worked in
,
Feb 14 2017
IIUC, Johnny's suggestion is like this: https://chromium-review.googlesource.com/#/c/442224/ Since the original 1000ms panel unprepare delay is added by Haixia, johnny wants him to check whether reducing to 500ms is fine. Johnny right?
,
Feb 14 2017
I, too, do not have the full understanding of this problem, but I believe Kitty has an extra delay on panel initialization because of its size as well as needing for an LVDS to eDP bridge. Big or Blaze I believe are using eDP panels so that might be the main difference causing the regression on Kitty?
,
Feb 14 2017
markz@ - Here is my suggested plan. In my understanding, the patch only affects Kitty. So let's push the CL has occasional failure to ToT and M57 so we can unblock dev and beta channel push. Once it's merged, I am going to mark this bug Stable blocker to give you more time to find a more fundamental fix.
,
Feb 14 2017
snanda@ Can you please suggest someone on the kernel team who can help drive this issue to conclusion?
,
Feb 15 2017
Thanks Yunglee. I agree. Patch 442224 is according to Johnny's investigation and we need more people verify/confirm it. Patch 440684 completely disables the offending patch(I mean the original patch causes this issue). I'm OK with both of them.
,
Feb 15 2017
Penny@nvidia tested the tot kernel + CL 442224. Surprisingly the touch is totally not working. Tried 5 times and all of them failed. Johnny, could you give us your changes? I mean, the change which passed 300 cycles testing.
,
Feb 15 2017
@Mark, my changed as attached png file. It's same with CL 442224. The code base of 300 cycles testing is clean download on 2017-02-08. I don't know what's the difference between us. I will fetch tot again and cherry pick CL 442224 to check.
,
Feb 15 2017
@Mark, I repo init and repo sync to get a latest code and cherry pick CL 442224. The touch function work fail with native build, but work fine with CL 442224 build. Could you help to check it again?
,
Feb 16 2017
@Johnny, we did the test again but still the same. Touch is not working at all. In case I'm doing something stupid, here is my git log: 0d92516f2c27 CHROMIUM: drm/panel: Reduce panel unprepare time to 500ms d5f25d3f7f8b CHROMIUM: ALSA: info: Check for integer overflow in snd_info_entry_write() be1344074eb9 UPSTREAM: ipv4: fix a race in ip4_datagram_release_cb() c4be52453f80 CHROMIUM: drm/tegra: Check buffer size when doing kmap e94245762e91 CHROMIUM: drm/tegra: Fix host1x syncpt array OOB access 48c54a06235b CHROMIUM: perf: don't leave group_entry on sibling list (use-after-free) 6831774e377a elan_i2c: Try to use pid in fw name, with fallback c83bac17e190 drm/evdi: Filter unsupported display modes 01544234926e drm/evdi: Always send mode change event Looks same with yours? If yes, maybe our Kitty has some differences? Different touch firmware version?
,
Feb 16 2017
@Mark, git log looks same with you. FW version is 1215. (cat /sys/bus/i2c/drivers/elants_i2c/1-0010/fw_version) fwid = Google_Nyan_Kitty.5771.61.16 (crossystem | grep fwid) RO version: kitty_v1.1.1809-4ad59c4 (ectool version)
,
Feb 16 2017
localhost ~ # cat /sys/bus/i2c/drivers/elants_i2c/1-0010/fw_version 0000 localhost ~ # crossystem | grep fwid fwid = Google_Nyan_kitty.5771.61.18 # Active firmware ID ro_fwid = Google_Nyan_kitty.5771.61.18 # Read-only firmware ID localhost ~ # dmesg | grep elan [ 7.097920] elants_i2c 1-0010: invalid 'hello' packet: ff ff ff ff [ 7.202981] elants_i2c 1-0010: invalid 'hello' packet: ff ff ff ff [ 7.322570] elants_i2c 1-0010: invalid 'hello' packet: ff ff ff ff Don't know why my fw_version is 0000. BTW, I checked the dmesg and found above errors, is that expected?
,
Feb 16 2017
Just talked with Penny and got fw_version now. Our Kitty's fw_version is 1215 as well.
,
Feb 16 2017
Given the diverged test result of reducing unprepare delay to 500ms, I would suggest we land this change temporarily: https://chromium-review.googlesource.com/#/c/440684/
,
Feb 17 2017
,
Feb 22 2017
Guys, can we land 440684?
,
Feb 22 2017
@marcheu, @snanda, can one of you review https://chromium-review.googlesource.com/#/c/440684/ ?
,
Feb 22 2017
@91: I already have on Feb 10
,
Feb 22 2017
ahh.. I see. Mark, so what's holding you up from landing the patch?
,
Feb 23 2017
@yungleem, no, I just want you guys to confirm. :) Let's land this to unblock Kitty.
,
Feb 24 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/f7dc8485b1f056218a219267449654f76415638c commit f7dc8485b1f056218a219267449654f76415638c Author: Mark Zhang <markz@gpartner-nvidia.com> Date: Fri Feb 24 03:06:17 2017 CHROMIUM: drm/panel: Introducing no-powerwash dts property We can't reset the panel during boot on Kitty because that makes the touchscreen stops working. So to handle this, this patch adds a new DTS property named "no-powerwash" to prevent reset the panel on Kitty. BUG= chromium:676094 Change-Id: Ifc883ed6a1db23b689d5b6b548d7f50d3a327dfb Signed-off-by: Mark Zhang <markz@gpartner-nvidia.com> Reviewed-on: https://chromium-review.googlesource.com/440684 Reviewed-by: Stéphane Marchesin <marcheu@chromium.org> [modify] https://crrev.com/f7dc8485b1f056218a219267449654f76415638c/arch/arm/boot/dts/tegra124-nyan-kitty-rev8.dts [modify] https://crrev.com/f7dc8485b1f056218a219267449654f76415638c/drivers/gpu/drm/panel/panel-simple.c [modify] https://crrev.com/f7dc8485b1f056218a219267449654f76415638c/arch/arm/boot/dts/tegra124-nyan-kitty-rev0_3.dts
,
Feb 27 2017
marcheu@ can you please add the request-merge-57 label so we can get this approved for merge to M57? We need to have this merged before tomorrow so this week's beta can pick up this change.
,
Feb 28 2017
,
Feb 28 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/83bac0a14b343eaf991d82e9238f05d90a7162eb commit 83bac0a14b343eaf991d82e9238f05d90a7162eb Author: Mark Zhang <markz@gpartner-nvidia.com> Date: Tue Feb 28 00:12:31 2017 CHROMIUM: drm/panel: Introducing no-powerwash dts property We can't reset the panel during boot on Kitty because that makes the touchscreen stops working. So to handle this, this patch adds a new DTS property named "no-powerwash" to prevent reset the panel on Kitty. BUG= chromium:676094 Change-Id: Ifc883ed6a1db23b689d5b6b548d7f50d3a327dfb Signed-off-by: Mark Zhang <markz@gpartner-nvidia.com> Reviewed-on: https://chromium-review.googlesource.com/440684 Reviewed-by: Stéphane Marchesin <marcheu@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/447221 Commit-Queue: Stéphane Marchesin <marcheu@chromium.org> Tested-by: Stéphane Marchesin <marcheu@chromium.org> [modify] https://crrev.com/83bac0a14b343eaf991d82e9238f05d90a7162eb/arch/arm/boot/dts/tegra124-nyan-kitty-rev8.dts [modify] https://crrev.com/83bac0a14b343eaf991d82e9238f05d90a7162eb/drivers/gpu/drm/panel/panel-simple.c [modify] https://crrev.com/83bac0a14b343eaf991d82e9238f05d90a7162eb/arch/arm/boot/dts/tegra124-nyan-kitty-rev0_3.dts
,
Apr 4 2017
|
||||||||||||||||||||||
►
Sign in to add a comment |
||||||||||||||||||||||
Comment 1 by kathrelk...@chromium.org
, Dec 20 2016