Kernel Planet

January 11, 2016

Daniel Vetter: Neat drm/i915 stuff for 4.5

Kernel version 4.4 is released, so it's time for our regular look at what's in store for the Intel graphics driver in the next release.

Overall this cycle has seen lots and lots of bugfixes, and the reason for that is that we're rebuilding our CI infrastructure after it went up in a poof of smoke last summer. Really big thanks to the entire team for the effort invested! And that's why this overview is a bit different and we'll start with bugfix efforts before delving into the few feature additions:

Ville fixed up display fifo underruns all over the place: FDI modeset fixes for Haswell/Broadwell, correctly detecting fused-off VGA on the same, and disabling fifo underrun reporting in some places where we've learned that underruns just happen - mostly around starting up the display pipeline.

Next up is improved runtime PM wakelock debugging from Imre Deak, with efforts from other folks trying to fix up various issues. Unfortunately this turned up so many little buglets that we had to disable the reporting again, at least for now. But at least we'll now have a very clear list of things to address, and a reliable way to audit any failures, to finally be able to enable runtime PM by default hopefully soon.

There have also been lots of fixes for PSR and FBC from Rodrigo and Paulo respectively. PSR enabled by default missed the 4.5 merge window by just a hair - it's already enabled for 4.6. FBC is also pretty close; the last bit Paulo is working on is untangling the locking issues. FBC sits between GEM and KMS, and FBC code gets called by both subsystems - an easy recipe for deadlocks, which Paulo is now working to resolve.

There are also fixes included from Tvrtko Ursulin, Lukas Wunner and Chris Wilson to remedy some long-standing regressions in the fbdev framebuffer setup code. In GEM, Dave Gordon finally fixed issues with the page dirty tracking. And Chris Wilson fine-tuned the request polling logic to avoid needlessly wasting CPU cycles.

Imre, Patrik and others have done a lot of work to fix up various issues in the DMC firmware loader for Skylake and the DC5/6 support. It works well now on that platform, and we could re-enable the overall display power well support on Skylake again. But there's still plenty of issues on Broxton unfortunately.

Since bugfixes have been highly prioritized over feature work this time around there's only very little progress on atomic modesetting and specifically atomic watermark updates. But 4.5 includes a few more prep patches from Maarten Lankhorst and Matt Roper.

There have still been some real features though: Alex Goins from nvidia implemented proper sync for page-flipping dma-buf backed framebuffers, benefiting setups where nVidia renders buffers that the Intel driver displays.

Finally there's also been the usual amount of internal refactoring to prepare the code for the future and keep it maintainable. Jani Nikula rewrote the VBT parsing code. And Ander started to rework the DP detection code as the first step of a large DP support revamp. And finally there's been a bit of enabling for Kabylake too, but it's not yet complete.

And of course there's been a lot more smaller things, again mostly bugfixes.

January 11, 2016 04:12 PM

Pavel Machek: X servers and dangerous aircraft

It is very easy to lose track of the cursor on multiple monitors... Especially if the cursor is on the bottom or right edge, only a few pixels remain. Should some kind of pointer remain on the monitor even when the mouse is on the other monitor -- providing a kind of "look that way" pointer?


Oh and... when activating a USB-to-VGA adapter, the mouse disappears altogether. Ouch. Unfortunately, that means the system is unusable.
Is there a way to adjust the DPI setting, preferably per application? Does gtk have some option like that? The N900 has an 800x480 display. When using a stylus, you can put your phone close to your eyes and pretend it's a PC, but when using fingers, many controls are just way too small.
It's official: Airbus killed them.
The Airbus A320 has two sidesticks, with no force feedback and no physical link. So you are trying to recover from a stall, you push the sidestick fully while your first officer pulls the stick fully -- the result is that you remain stalled. You don't even know your first officer is fighting with you... That's what happened to PK-AXC, the report is here. (How did they get into the stall? Computers spuriously adjusted their rudder trim when they lost power. No, you should not reset flight computers like that.)
This is the second accident of this type. A similar effect happened to Air France 447. (And pretty much every Airbus incident involves "dual inputs".) Let's see how many crashes it takes before Airbus provides force feedback.

January 11, 2016 12:24 PM

January 10, 2016

Daniel Vetter: Better Markup for the Kernel GPU DocBook

This summer Intel sponsored some work to improve the kerneldoc toolchain, with the aim to use all that to extend the DRM and i915 driver documentation we have. Most of it landed, but the last bit to integrate some type of text markup processing was stalled until it could be discussed at the kernel summit, see the LWN summary. Unfortunately it died in a bikeshed fest due to an alliance of people who think docs are useless and you should just read the code, and others who didn't even know how to convert the kerneldoc into something pretty.

But we still need this, since without lists, highlighting, basic tables and inserting code snippets it's really hard to write decent documentation. Luckily Dave Airlie is ok with using it for DRM kerneldoc as long as Intel maintains the support. It's purely opt-in and the only downside of not using asciidoc is that the resulting docs won't be as pretty. All the changes to the text itself to use this markup are going into upstream as normal. The only bit that's not in upstream is the tooling, which is available in a topic branch at

        git://anongit.freedesktop.org/drm-intel topic/kerneldoc


If you want to build pretty docs just install asciidoc and base your drm documentation patches on top of drm-intel-nightly from the same repository - that tree also includes all of Dave's tree. Alternatively pull in the above topic branch into your own personal tree. Note that asciidoc is detected automatically, so you really only need it and the tooling branch to check the rendering of your changes.

For added convenience Intel also maintains an autobuilder that pushes latest drm-intel-nightly DRM documentation builds to http://dri.freedesktop.org/docs/drm/.

Aside: If all you want to build is just the GPU DocBook instead of all of them, you can do that with

        $ make DOCBOOKS="gpu.xml" htmldocs

With that, have fun reading the new & improved documentation, and if you spot anything please submit a patch to dri-devel@lists.freedesktop.org.

January 10, 2016 11:00 PM

Pavel Machek: 2016: Year of the GNU/Linux phone

Well, Linux is running on a significant fraction of cellphones these days... but usually a very old and heavily patched kernel. But maybe we can run a mainline kernel with a free userland?

Today, I made the first successful call with a Nokia N900 running a 4.4-rc8 kernel and a Debian userland. Latency was about a second, and I was told my voice was not recognizable, but we had no problems understanding each other.

Maybe the voice format is not 4 kHz, 16-bit stereo after all?

January 10, 2016 11:30 AM

January 04, 2016

Pavel Machek: N900: Found a way to do rotation in X

...which is needed for one-handed control. Unfortunately, xrandr refuses to rotate on the N900 for some reason, so I'm running Xephyr and then xrandr.

...and looking for better ssh.

Sitting on the train, ssh-ing from the laptop to the phone, with a second phone providing the hotspot (okay, I do feel a bit silly). Now... this does not work too well. First, I had to manually copy the IP address, and second, I did not really verify the ssh key. And as the IP address changes, I'd have to do it each time. A better solution would be welcome.
I want to connect to my phone, no matter what IP address it has. If possible, I'd like the keys to be checked during connection, too.

January 04, 2016 07:58 AM

January 03, 2016

Rusty Russell: Bitcoin And Stuck Transactions?

One problem of filling blocks is that transactions with too-low fees will get “stuck”; I’ve read about such things happening on Reddit.  Then one of my coworkers told me that those he looked at were simply never broadcast properly, and broadcasting them manually fixed it.  Which led both of us to wonder how often it’s really happening…

My approach is to look at the last 2 years of block data, and make a simple model (sketched in code after the list):

  1. I assume the tx is not a priority tx (some miners reserve space for these; default 50k).
  2. I judge the “minimum feerate to get into a block” as the smallest feerate for any transaction after the first 50k beyond the coinbase (this is an artifact of how bitcoin core builds blocks; priority area first).
  3. I assume the tx won’t be included in “empty” blocks with only a coinbase or a single non-coinbase tx (SPV mining); their feerate is “infinite”.
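
For concreteness, here is a rough sketch of that model in C. The tx/block structures are hypothetical stand-ins for real block parsing, and feerates are in satoshi per byte; this only illustrates the rules above and is not the code actually used for the analysis.

    #include <math.h>

    struct tx {
        unsigned size;             /* serialized size in bytes */
        unsigned long long fee;    /* fee in satoshi */
    };

    struct block {
        unsigned ntx;              /* transaction count, coinbase at index 0 */
        const struct tx *txs;
    };

    #define PRIORITY_AREA 50000    /* default space reserved for priority txs */

    /* Smallest feerate of any tx after the first 50k bytes beyond the
     * coinbase; "empty" blocks (coinbase plus at most one other tx) are
     * treated as having an infinite entry feerate. */
    static double min_feerate_to_enter(const struct block *b)
    {
        unsigned long long used = 0;
        double min = INFINITY;

        if (b->ntx <= 2)
            return min;

        for (unsigned i = 1; i < b->ntx; i++) {
            used += b->txs[i].size;
            if (used <= PRIORITY_AREA)
                continue;          /* still inside the priority area */
            double rate = (double)b->txs[i].fee / b->txs[i].size;
            if (rate < min)
                min = rate;
        }
        return min;
    }

A transaction would then be counted as delayed at a given block whenever its own fee-per-byte falls below that block's minimum.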

Now, what feerate do we assume?  The default “dumb wallet” fee is 10000 satoshi per kilobyte: bitcoin-core doesn’t do this pro-rata, so a median 300-byte transaction still pays 10000 satoshi by default (fee-per-byte 33.33).  The worst case is a transaction of exactly 1000 bytes (or, a wallet which does pro-rata fees), which would have a fee-per-byte of 10.

So let’s consider the last two years (since block 277918).  How many blocks in a row do we see with a fee-per-byte > 33.33, and how many with a feerate > 10:

Conclusion

In the last two years you would never have experienced a delay of more than 10 blocks for a median-size transaction with a 10,000 satoshi fee.

For a 1000-byte transaction paying the same fee, you would have experienced a 10 block delay 0.7% of the time, with a 20+ block delay on eight occasions: the worst being a 26-block delay at block 382918 (just under 5 hours).  But note that this fee is insufficient to be included in 40% of blocks during the last two years, too; if your wallet is generating such things without warning you, it’s time to switch wallets!

Stuck low-fee transactions are not a real user problem yet.  It’s good to see adoption of smarter wallets, though, because it’s expected that they will become one in the near future…

January 03, 2016 10:24 PM

January 02, 2016

James Bottomley: A Modest Proposal on the DCO

In this post, I discussed why corporations are having trouble regarding the DCO as sufficient for contributions to projects using licences which require patent grants.  The fear being that rogue corporations could legitimately claim that under the DCO they were authorizing their developers as agents for copyrights but not for patents.  Rather than argue about the legality of this trick, I think it will be much more productive to move the environment forwards to a place where it simply won’t work.  The key to doing this is to change the expectations of the corporate players, moving them to the point where they expect that a corporate signoff under the DCO gives agency for both patents and copyrights; once this happens for most of them (the good actors), the usual estoppel rules would make it apply to all.

The fact is that even though corporate lawyers fear that agency might not exist for patent grants via DCO signoffs in contributions, all legitimate corporate entities who make bona fide code contributions wish to effect this anyway; that’s why they go to the additional lengths of setting up Contributor Licence Agreements and signing them.  The corollary here is that really only a bad actor in the ecosystem wishes to perpetuate the myth that patents aren’t handled by the DCO.  So if all good actors want the system to work correctly anyway, how do we make it so?

The lever that will help to make this move is a simple pledge, which can be published on a corporate website, that allows corporations expecting to make legitimate contributions to patent binding licences under the DCO to do so properly without needing any additional Contributor Licence Agreements.  Essentially it would be an explicit statement that when their developers submit code to a project under the DCO using a corporate signoff, they’re acting as agents for the necessary patent and copyright grants, meaning you can always trust a DCO signoff from that corporation.  When enough corporations do this, it becomes standard practice and thus expectations on the DCO have moved to the point we originally assumed they were at, so here’s the proposal for what such a statement would look like.


 

Corporate Contribution Pledge

Preamble

It is our expectation that any DCO signoff from a corporate email address binds that corporation to grant all necessary copyright and, where required, patent rights to satisfy the terms of the licence.  Accordingly, we are publishing this pledge to illustrate how, as a matter of best practice, we implement this expectation.

For the purposes of this pledge, our corporate email domain is @bigcorp.com and its subdomains.

Limitations

  1. This pledge only applies to projects which use an OSI accepted Open Source licence and which also use a developer certificate of origin (DCO).
  2. No authority is given under this pledge to sign contribution agreements on behalf of the company or otherwise bind it except by contributing code under an OSI approved licence and DCO process.
  3. No authority is given under this pledge if a developer, who may be our employee, posts patches under an email address which is not our corporate email domain above.
  4. No trademarks of this corporation may ever be bound under this pledge.
  5. Except as stated below, no other warranty, express or implied, is made on behalf of the contribution, including, but not limited to, fitness of the code for a specific purpose or merchantability.  The entire risk of the quality and performance of this contribution rests with the recipient.

Warranties

  1. Our corporation trains its Open Source contributors carefully to understand when they may and may not post patches from our corporate email domain and to obtain all necessary internal clearances according to our processes before making such a posting.
  2. When one of our developers posts a patch to a project under an OSI approved licence with a DCO Signed-off-by: from our corporate email domain, we authorise that developer to be our agent in the minimum set of patent and copyright grants that are required to satisfy the terms of the OSI approved licence for the contribution.

January 02, 2016 08:26 PM

Pavel Machek: N900 progress... and roadblock

Ok, developing directly on target is easy... and the phone is now in much better shape. I added watchdogs, so I'm no longer afraid to let it run Debian for extended periods of time. Voice calls work, but audio quality is awful/unusable.

Developing directly on target also has problems:

remote: error: object c86cce9eda127cd891a7cf2d23e007deaddf4d34: badTimezone: invalid author/committer line - bad time zone

remote: fatal: Error in object

pavel@n900:/my/tui$ git show c86cce9eda127cd891a7cf2d23e007deaddf4d34
error: object directory /data/l/clean-cg/.git/objects does not exist; check .git/objects/info/alternates.
commit c86cce9eda127cd891a7cf2d23e007deaddf4d34
Author: Pavel <pavel@ucw.cz>
Date:   Wed Dec 31 23:59:43 1969 +35150858

Fun. I wonder what happened there, if it is going to happen again, and if I can fix it somehow...

Happy New Year!

January 02, 2016 01:31 PM

January 01, 2016

Matthew Garrett: The current state of boot security

I gave a presentation at 32C3 this week. One of the things I said was "If any of you are doing seriously confidential work on Apple laptops, stop. For the love of god, please stop." I didn't really have time to go into the details of that at the time, but right now I'm sitting on a plane with a ridiculous sinus headache and the pseudoephedrine hasn't kicked in yet so here we go.

The basic premise of my presentation was that it's very difficult to determine whether your system is in a trustworthy state before you start typing your secrets (such as your disk decryption passphrase) into it. If it's easy for an attacker to modify your system such that it's not trustworthy at the point where you type in a password, it's easy for an attacker to obtain your password. So, if you actually care about your disk encryption being resistant to anybody who can get temporary physical possession of your laptop, you care about it being difficult for someone to compromise your early boot process without you noticing.

There's two approaches to this. The first is UEFI Secure Boot. If you cryptographically verify each component of the boot process, it's not possible for a user to compromise the boot process. The second is a measured boot. If you measure each component of the boot process into the TPM, and if you use these measurements to control access to a secret that allows the laptop to prove that it's trustworthy (such as Joanna Rutkowska's Anti Evil Maid or my variant on the theme), an attacker can compromise the boot process but you'll know that they've done so before you start typing.

So, how do current operating systems stack up here?

Windows: Supports UEFI Secure Boot in a meaningful way. Supports measured boot, but provides no mechanism for the system to attest that it hasn't been compromised. Good, but not perfect.

Linux: Supports UEFI Secure Boot[1], but doesn't verify signatures on the initrd[2]. This means that attacks such as Evil Abigail are still possible. Measured boot isn't in a good state, but it's possible to incorporate with a bunch of manual work. Vulnerable out of the box, but can be configured to be better than Windows.

Apple: Ha. Snare talked about attacking the Apple boot process in 2012 - basically everything he described then is still possible. Apple recently hired the people behind Legbacore, so there's hope - but right now all shipping Apple hardware has no firmware support for UEFI Secure Boot and no TPM. This makes it impossible to provide any kind of boot attestation, and there's no real way you can verify that your system hasn't been compromised.

Now, to be fair, there's attacks that even Windows and properly configured Linux will still be vulnerable to. Firmware defects that permit modification of System Management Mode code can still be used to circumvent these protections, and the Management Engine is in a position to just do whatever it wants and fuck all of you. But that's really not an excuse to just ignore everything else. Improving the current state of boot security makes it more difficult for adversaries to compromise a system, and if we ever do get to the point of systems which aren't running any hidden proprietary code we'll still need this functionality. It's worth doing, and it's worth doing now.

[1] Well, except Ubuntu's signed bootloader will happily boot unsigned kernels which kind of defeats the entire point of the exercise
[2] Initrds are built on the local machine, so we can't just ship signed images


January 01, 2016 12:48 AM

December 30, 2015

Davidlohr Bueso: LPC 2015: Performance and Scalability MC

This year I had the privilege of leading the Performance and Scalability micro-conference for Linux Plumbers. The goals and motivation behind organizing this track were threefold. First, present relevant work-in-progress ideas that can improve performance in core kernel subsystems and need some face-to-face discussion -- as such, this requires previous debate on lkml. Similarly, learn about real bottlenecks and issues people are running into. And finally, get to know more relevant academic (experimental) work going on in both the kernel and system-level userland. As such, the sessions were grouped as follows:

(i) Fast Bounded-Concurrency Hash Tables. Samy Bahra introduced a novel non-blocking multi-reader/single-writer hash table with strong forward progress guarantees for TSO. Because the common-case fastpath does not incur barriers or atomic operations, this technique allows nearly perfect scaling. While his work is done in userspace, he sees potential for it in the kernel, such as in the networking subsystem. In such situations, the use of RCU (readers being the common case) might also be used.

(ii) Improving Transactional Memory Performance with Queued Locking. While transactional memory works nicely in conflict-free setups, it ends up requiring common serialization otherwise. An option is to retry; however, when the number of threads executing in the critical region is larger than the number of completed threads, you can get pileups. Tim Chen presented a solution based on applying a sort of 'aperture' and using principles based on MCS for fair queuing, where admission can be regulated based on metrics such as the number of threads in the critical region and the abort rate.

(iii) How to Apply Mutation Testing to RCU. Iftekhar Ahmed from OSU summarized his research in overcoming limitations of mutation testing to identify problems in RCU. As usual working with Paul McKenney, they have been able to identify a number of mutants by running rcutorture for specific periods of time. They generated ~3300 mutants from RCU, and rcutorture is doing a good job identifying them. It would be interesting to see this applied along with fuzz testing, which has already uncovered several bugs in RCU in the past.

Scaling track -- LPC'15, Seattle.

(iv) Unfair Queued Spinlocks and Transactional Locks. Waiman Long has been working on extending spinlocks and applying them to solve issues with transactional memory. He presented experiments based on rwlocks and a transactional spinlock (a new primitive) for transactional (reader) and non-transactional (writer) executions. This talk nicely complemented Tim Chen's previous presentation. He also touched on qspinlock performance in virtualized environments and the challenges currently out there. As we already have code for this, it was much easier to discuss face to face. Consensus in the room was that kernel developers are not against improving pv spinlocks, but it was made clear that we will not accept a third primitive.

(v) Do Virtual Machines Really Scale. Sanidhya Kashyap from GA Tech showed us the state of scalability in the cloud, where there is a clear trend that services hit poor scalability after certain degrees of contention/core-count. These are lock-holder preemption (LHP) issues, and vmexits/enters cause performance issues at high vcpu counts. He introduced oticket, backed by performing multiple wakeups at once when granting the lock. There was good feedback and suggestions to overcome some of the presented issues with the approach. This was an extra, short, BoF-like presentation, but there was quite a bit of interest, and the appropriate people were in the room.

Overall I would say that all three objectives were met and the quality of the sessions was high, thus meeting all expectations (if not, please email me for feedback ;-). In fact, there were some highly interesting and relevant presentations that, due to time constraints, had to be left out.

December 30, 2015 02:31 PM

December 29, 2015

Michael Kerrisk (manpages): man-pages-4.04 is released

I've released man-pages-4.04. The release tarball is available on kernel.org. The browsable online pages can be found on man7.org. The Git repository for man-pages is available on kernel.org.

This release resulted from patches, bug reports, and comments from more than 30 contributors. As well as a large number of minor fixes to nearly 90 man pages, the more significant changes in man-pages-4.04 include the following:

December 29, 2015 04:03 PM

Davidlohr Bueso: fu(zz)tex: targeted fuzzing of futexes

The complexity of futexes and their non-trivial interactions and semantics make them a good candidate for applying fuzz testing techniques. In general futex code is poorly understood and audited, both at the kernel implementation level and by the respective userland callers, normally trying to implement some sort of locking primitive. Unsurprisingly, bugs related to this call will often be subtle and nasty, sometimes with security implications. Specifically for futexes, all system call fuzzers use generic and completely randomized inputs, which has only limited usefulness. This is even the case for Dave Jones' trinity program, which has been extremely good at finding kernel bugs (and ruining my weekends more than once ;). Much of the success and popularity of this program is because not all the inputs are random, and meaningful parameters are passed for many of the exercised syscalls. This is called targeted fuzzing, and it has been proven to find more bugs than blindly random inputs: informed parameters are more likely to make the kernel actually do something related to the call, as opposed to quickly erroring out due to some trivial bogus scenario. A nice example is the perf_event_open(2) call, which was studied for targeted fuzz testing with very good results.

Extending Trinity

Reusing the already proven-to-work machinery of trinity, and extending it for futex-specific work, is the obvious step for improving coverage, in the hope of tackling some of the issues previously described. While reading the code is always the definitive answer, having a man page that is up to par with the call is quite essential; if we want programmers to make correct use of the tools we provide, that is. Fortunately, Michael Kerrisk has been doing a nice job of rewriting the current futex(2) page, which is so surprisingly crappy and incomplete, it's sad. This makes the task of correctly setting the input parameters for a given purpose a little less tedious and error-prone:

 SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
     struct timespec __user *, utime, u32 __user *, uaddr2, u32, val3)


 -- just imagine if mmap.2 were barely documented and stale.

There are two immediately obvious op flags that are not being exercised at all (with the exception of randomly bumping into them, which is quite unlikely and hard to control):

Ever-changing task priorities

The whole purpose of PI futexes is to address priority inversion issues for systems with real-time requirements. Randomly changing a process's priority will therefore stress the system call better than always using the default nice value, exercising priority-boosting code in the kernel.
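
A hypothetical sketch of what such priority randomization could look like on the fuzzer side (an illustration of the idea, not trinity's actual code): flip between a random nice value and a random real-time priority before issuing futex calls.

    #include <sched.h>
    #include <stdlib.h>
    #include <sys/resource.h>

    static void randomize_priority(void)
    {
        if (rand() & 1) {
            /* Random niceness in [-20, 19]; raising priority needs privilege. */
            setpriority(PRIO_PROCESS, 0, (rand() % 40) - 20);
        } else {
            /* Random SCHED_FIFO priority, to exercise PI boosting paths. */
            struct sched_param sp = { .sched_priority = 1 + rand() % 99 };
            sched_setscheduler(0, SCHED_FIFO, &sp);
        }
    }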

Fault/error injections

This year we added support for artificially triggering errors, faults and deadlock scenarios within the various futex paths, via the CONFIG_FAULT_INJECTION kernel framework along with the CONFIG_FAIL_FUTEX option. Trinity can make use of this feature by randomly toggling the process's make-it-fail file as well as selecting appropriate fault injection debugfs options.
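
As an illustration of the mechanics (again an assumption, not trinity's code), toggling the per-process flag boils down to writing the procfs file; the fault-injection debugfs knobs then decide how often and where failures are actually injected when the kernel is built with CONFIG_FAULT_INJECTION and CONFIG_FAIL_FUTEX:

    #include <fcntl.h>
    #include <unistd.h>

    /* Flip this process' make-it-fail flag; with task filtering enabled in
     * the fault-injection debugfs options, the kernel starts (or stops)
     * injecting failures into this task's futex calls. */
    static void set_make_it_fail(int on)
    {
        int fd = open("/proc/self/make-it-fail", O_WRONLY);

        if (fd < 0)
            return;    /* kernel built without CONFIG_FAULT_INJECTION */
        write(fd, on ? "1" : "0", 1);
        close(fd);
    }

The fuzzer can then randomly enable the flag before a batch of futex invocations and clear it afterwards.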

Feeding user-addresses

Perhaps the single most important argument that we can pass to the syscall is the user address (uaddr, or 'the futex'), which will govern everything the kernel attempts to do with it, whether in private or shared address space. As such, it is not very useful to blindly feed it random addresses; with trinity's default setup, these inputs will sometimes be picked from previously mmap-created shared memory playgrounds. However, at a futex level this does not matter unless we are doing blocking calls (WAIT).

So this has been reworked such that trinity now creates a number of locks in shared memory at startup, each holding the owner PID and the actual futex word. Upon a call, both user-address arguments (uaddr and uaddr2) get either a random lock or a random address from the mmap playground, each with a 50% chance. The locks follow very simple semantics: a successful cmpxchg allows the caller to acquire the lock without the kernel being involved (fastpath); otherwise we need to wait/block through the futex call.
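
A minimal sketch of such a lock in C, assuming a 32-bit futex word in shared memory (0 meaning free, 1 meaning held); it shows the cmpxchg fastpath plus the FUTEX_WAIT/FUTEX_WAKE slowpath described above, and deliberately leaves out everything else (PI variants, waiter counting, error handling) that the real playground code would need:

    #include <linux/futex.h>
    #include <stdatomic.h>
    #include <stdint.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <unistd.h>

    struct shared_lock {
        pid_t owner;                /* owner PID, as in the playground locks */
        _Atomic uint32_t futex;     /* 0 = free, 1 = held */
    };

    static long sys_futex(_Atomic uint32_t *uaddr, int op, uint32_t val)
    {
        return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
    }

    static void lock(struct shared_lock *l)
    {
        uint32_t expected = 0;

        /* Fastpath: an uncontended cmpxchg takes the lock, no kernel involved. */
        while (!atomic_compare_exchange_strong(&l->futex, &expected, 1)) {
            /* Slowpath: sleep in the kernel while the word still reads 1. */
            sys_futex(&l->futex, FUTEX_WAIT, 1);
            expected = 0;
        }
        l->owner = getpid();
    }

    static void unlock(struct shared_lock *l)
    {
        l->owner = 0;
        atomic_store(&l->futex, 0);
        /* Wake one waiter, if any is blocked in FUTEX_WAIT. */
        sys_futex(&l->futex, FUTEX_WAKE, 1);
    }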

Because of how trinity is structured with callbacks for pre/post syscall invocation, there are a number of racy windows between when the lock is dealt with (ie considered contended) and when the fuzzer actually calls futex(2). As such, this must be taken with a grain of salt, but it does exercise lots of real-world situations nonetheless.

Choosing operations

The idea is to randomly perform different operations on the selected futex, such that combinations of wake, wait and requeue are done (both for regular and PI futexes). While passing informed, not-so-random parameters to the system call reduces the chance of shallow fuzzing, choosing the futex operation will determine the kind of work to be done on the uaddr. As such this part can further determine the usefulness of trinity regarding futexes. However, one cannot get too strict here, as reducing the randomness will also limit the usefulness. For now the layout is a 25% chance when performing lock operations. On the other hand, for the case of an mmap-selected uaddr, the operation is left up to trinity to decide.

Evaluation and future work

Evaluating software that purposely tries to mess up other software is always twofold. For one, any new futex bug that is found indicates that modifying trinity was a good step towards better testing coverage. But unfortunately this creates a new headache for futex hackers, and a bug needs to be fixed (including any corresponding Linux distribution backporting, security and -stable work). So any useful results which exhibit the presence of bugs can be bitter/sweet -- just think Dijkstra.

One immediate way of evaluating the changes to trinity is to look at the number of successful calls. While this can be a misleading metric, it does at least indicate whether or not much of the bogus parameter passing has been mitigated and replaced with smarter, more informed calls. Tests show that these changes have in fact boosted the number of successful futex(2) returns; within a trinity run of 10,000 calls with 4 threads, we were able to go from ~470 to nearly ~4300, which is around a 10x improvement. This also means that it takes more time to run trinity, as the kernel is doing actual work now with our futexes, not simply returning immediately due to bogus parameters and trivial error checks.

In the future, it would be good to fuzz futexes with a memory-backed file (uaddr), instead of always relying on anonymous memory. While this is perhaps not so interesting from a futex standpoint (with the exception of hashing), it would be good when combined with other memory-related calls which actually do things with the file. Another useful direction would be to further investigate operation selection policies. Different models will fuzz different parts of the futex subsystem, and perhaps (very probably, actually) I have not found the best one yet.

This work was done as part of SUSE Hackweek 13, which allowed me to finally allocate some time to focus on this (although this writing is much overdue). So as always, lots of thanks to my employer.

December 29, 2015 01:36 PM

December 23, 2015

Matthew Garrett: GPL enforcement is a social good

The Software Freedom Conservancy is currently running a fundraising program in an attempt to raise enough money to continue funding GPL compliance work. If they don't gain enough supporters, the majority of their compliance work will cease. And, since SFC are one of the only groups currently actively involved in performing GPL compliance work, that basically means that there will be nobody working to ensure that users have the rights that copyright holders chose to give them.

Why does this matter? More people are using GPLed software than at any point in history. Hundreds of millions of Android devices were sold this year, all including GPLed code. An unknowably vast number of IoT devices run Linux. Cameras, Blu Ray players, TVs, light switches, coffee machines. Software running in places that we would never have previously imagined. And much of it abandoned immediately after shipping, gently rotting, exposing an increasingly large number of widely known security vulnerabilities to an increasingly hostile internet. Devices that become useless because of protocol updates. Toys that have a "Guaranteed to work until" date, and then suddenly Barbie goes dead and you're forced to have an unexpected conversation about API mortality with your 5-year old child.

We can't fix all of these things. Many of these devices have important functionality locked inside proprietary components, released under licenses that grant no permission for people to examine or improve them. But there are many that we can. Millions of devices are running modern and secure versions of Android despite being abandoned by their manufacturers, purely because the vendor released appropriate source code and a community grew up to maintain it. But this can only happen when the vendor plays by the rules.

Vendors who don't release their code remove that freedom from their users, and the weapons users have to fight against that are limited. Most users hold no copyright over the software in the device and are unable to take direct action themselves. A vendor's failure to comply dooms them to having to choose between buying a new device in 12 months or no longer receiving security updates. When yet more examples of vendor-supplied malware are discovered, it's more difficult to produce new builds without them. The utility of the devices that the user purchased is curtailed significantly.

The Software Freedom Conservancy is one of the only organisations actively fighting against this, and if they're forced to give up their enforcement work the pressure on vendors to comply with the GPL will be reduced even further. If we want users to control their devices, to be able to obtain security updates even after the vendor has given up, we need to keep that pressure up. Supporting the SFC's work has a real impact on the security of the internet and people's lives. Please consider giving them money.


December 23, 2015 07:38 PM

LPC 2016: Planning has begun for LPC 2016

The planning committee for the 2016 edition of the Linux Plumbers Conference is happy to announce that planning for the conference has begun. LPC will be held November 2-4 in Santa Fe, New Mexico in conjunction with the Kernel Summit at the Santa Fe Convention Center in the historic downtown area. More information about LPC can be found at the web site and we will be posting additional bits and pieces here as they become available. We look forward to seeing you there!

December 23, 2015 07:21 PM

December 22, 2015

Rusty Russell: Bitcoin: Mixed Signs of A Fee Market

Six months ago in a previous post I showed that 45% of transactions have an output of less than $1, and estimated that they would get squeezed out first as blocks filled.  It’s time to review that prediction, and also to see several things:

  1. Are fees rising?
  2. Are fees detached from magic (default) numbers of satoshi?
  3. Are low value transactions getting squeezed out?
  4. Are transactions starting to shrink in response to fee pressure?

Here are some scenarios: low-value transactions might be vanishing even if nothing else changes, because people’s expectations (“free global microtransactions!”) are changing.  Fees might be rising but still on magic numbers, because miners and nodes increased their relayfee due to spam attacks (most commonly, the rate was increased from 1000 satoshi per kb to 5000 satoshi per kb).  Finally, we’d eventually expect wallets which produce large transactions (eg. using uncompressed public keys) to lose popularity, and wallets to get smarter about transaction generation (particularly once Segregated Witness makes it fairly easy).

Fees For The Last 2 Years

The full 4 year graph is very noisy, so I only plotted the mean txfee/kb for each day for the last two years, in Satoshi and USD (thanks to the Coindesk BPI data for the conversion):

 

Conclusion: Too noisy to be conclusive: they seem to be rising recently, but some of that reflects the exchange rate changes.

Are Fees on Magic Boundaries?

Wallets should be estimating fees: in a real fee market they’d need to.

Dumb wallets pay a fixed fee per kb: eg. the bitcoin-core wallet pays 1,000 (now 5,000) satoshi per kb by default; even if the transaction is 300 bytes, it will pay 5,000 satoshi.  Some wallets use (slightly more sensible) scaling-by-size, so they’d pay 1,500 satoshi.  So if a transaction fee ends in “000”, or the scaled transaction fee does (+/- 2) we can categorize them as “fixed fee”.  We assume others are using a variable fee (about 0.6% will be erroneously marked as fixed):

This graph is a bit dense, so we thin it by grouping into weeks:

 

Conclusion: Wallets are starting to adapt to fee pressure, though the majority are still using a fixed fee.

Low Value Transactions For Last 4 Years

We categorize 4 distinct types of transactions: ones which have an output below 25c, ones which have an output between 25c and $1, ones which have an output between $1 and $5, and ones which have no output below $5, and graph the trends for each for the last four years:

Conclusion: 25c transactions are flat (ignoring those spam attack spikes).  < $1 and <$5 are growing, but most growth is coming from transactions >= $5.

Transaction Size For Last 4 Years

Here are the transaction sizes for the last 4 years:

Conclusion: There seems to be a slight decline in transaction sizes, but the cause is not clear, and it might be just noise.

Conclusion

There are signs of a nascent fee market, but it’s still very early. I’d expect something conclusive in the next 6 months.

The majority of fees should be variable, and they’re not: wallets remain poor, but users will migrate as blocks fill and more transactions get stuck.

A fee rate of over 10c per kb (2.5c per median transaction) hasn’t suppressed 25c transactions: perhaps it’s not high enough yet, or perhaps wallets aren’t making the relative fees clear enough (eg. my Trezor gives fees in BTC, as well as only offering fixed fee rates).

The slight dip in mean transaction sizes and the lack of growth in 25c transactions may point to early market pressure, however.

Six months ago I showed that 45% of transactions were less than a dollar.  In the last six months that has declined to 38%.  I previously estimated that we would want larger blocks within two years, and need them within three.  That still seems a reasonable estimate.

Data

I used bitcoin-iterate and a really crappy Makefile to generate CSVs with the data.  You can see the result on github or go straight to downloading the Gnumeric spreadsheet with the graphs.

Disclaimer: I Work For Blockstream

On lightning.  Not on drawing pretty graphs.  But I wanted to see the data…

 

December 22, 2015 05:53 AM

December 21, 2015

James Bottomley: The DCO, Patents and OpenStack

Historically, the Developer Certificate of Origin originally adopted by the Linux Kernel in 2005 has seen widespread use within a large variety of Open Source projects.  The DCO is designed to replace a Contributor Licence Agreement with a simple Signed-off-by attestation which is placed into the commit message of the source repository itself, thus meaning that all the necessary DCO attestations are automatically available to anyone who downloads the source code repository.  It also allows (again, through the use of a strong source control system) the identification of who changed any given line of code within the source tree and all their DCO signoffs.

The legal basis of the DCO is that it is an attestation by an individual developer that they have sufficient rights in the contribution to submit it under the project (or file) licence.

The DCO and Corporate Contributions

In certain jurisdictions, particularly the United States of America, when you work as a software developer for a Corporation, they actually own, exclusively, the copyright of any source code you produce under something called the Work for Hire doctrine.  So the question naturally arises: if the developer who makes the Signed-off-by attestation  doesn’t actually own any rights in the code, how is that attestation valid and how does the rights owning entity (the corporation) actually license the code correctly to make the contribution?

The answer to that question resides in something called the theory of agency.  Agency is the mechanism by which individuals give effect to actions of a corporation.  For example, being a nebulous entity with no actual arms or legs, a corporation cannot itself sign any documents.  Thus, when a salesman signs a contract to supply widgets on behalf of a corporation, he is acting as the agent of that corporation.  His signature on the sales contract becomes binding on the corporation as if the corporation itself had made it.  However, there’s a problem here: how does the person who paid for and is expecting the delivery of widgets know that the sales person is actually authorised to be an agent of the corporation?  The answer here is again in the theory of agency: as long as the person receiving the widgets had reasonable cause to think that the salesperson signing the contract is acting as an agent of the corporation, the contract is binding on the corporation.  Usually all that’s required is that the company gave the salesperson a business card and a title which would make someone think they were authorised to sign contracts (such as “Sales Manager”).

Thus, the same thing applies to developers submitting patches on behalf of a corporation.  They become agents of that corporation when making DCO attestations and thus, even if the contribution is a work for hire and the copyright owned by the corporation, the DCO attestation the developer makes is still binding on the corporation.

Email addresses matter

Under the theory of agency, it’s not sufficient to state “I am an agent”, there must be some sign on behalf of the corporation that they’re granting agency (in the case of the salesperson, it was a business card and checkable title).  For developers making contributions with a Signed-off-by, the best indication of agency is to do the signoff using a corporate email address.  For this reason, the Linux kernel has adopted the habit of not accepting controversial patches without a corporate signoff.

Patents and the DCO

The Linux Kernel uses GPLv2 as its licence.  GPLv2 is solely a copyright licence and has nothing really to say about patents, except that if you assert a patent against the project, you lose your right to distribute under GPLv2.  This is what is termed an implied patent licence, but it means that the DCO signoff for GPLv2 only concerns copyrights.  However, there are many open source licences (like Apache-2 or GPLv3) which require explicit patent grants as well as copyright ones, so can the DCO give all the necessary rights, including patent ones, to make the contribution on behalf of the corporation?  The common sense answer, which is if the developer is accepted as an agent for copyright, they should also be an agent for patent grants, isn’t as universally accepted as you might think.

The OpenStack problem

OpenStack has been trying for years to drop its complex contributor licence infrastructure in favour of a simple DCO attestation.  Most recently, the Technical Committee made that request of the board in 2014 and it was finally granted in a limited fashion in November 2015.  The recommendation of the OpenStack counsel was accepted and the DCO was adopted for individuals only, keeping the contributor licence agreements for corporations.  The given reason for this is that the corporate members of OpenStack want more assurance that corporations are correctly granting their patents in their contributions than they believe the DCO gives (conversely, individuals aren’t expected to have any patents, so, for them, the DCO applies just fine since it’s effectively only a copyright attestation they’re giving).

Why are Patents such an Issue?

Or why do lots of people think developers aren’t agents for patents in contributions, unlike for copyrights?  The essential argument (as shown here) is that corporations, as a matter of practice, do not allow developers (or anyone else except specific title holders) to be agents for patent transactions, and thus there should not be an expectation, even when they make a DCO attestation using a corporate email signoff, that they are.

One way to look at this is that corporations have no choice but to make developers agents for the copyright because without that, the DCO attestation is false since the developers themselves have no rights to a work for hire piece of code.  However, some corporations think they can get away with not making developers agents for patents because the contribution and the licence do not require this to happen.  The theory here is that the developer is making an agency grant for the copyright, but an individual grant of the patents (and, since developers don’t usually own patents, that’s no grant at all).  Effectively this is a get out of jail free card for corporations to cheat on the patent requirements of the licence.

Does this interpretation really hold water?  Well, I don’t think so, because it’s deceptive.  It’s deliberately trying to evade the responsibilities for patents that the licences require.  Usually under the theory of agency, deceptive practices are barred.  However, the fear that a court might be induced to accept this viewpoint is sufficient to get the OpenStack board to require that corporations sign a CLA to ensure that patents are well and truly bound.  The problem with this viewpoint is that, if it becomes common enough, it ends up being de facto what the legal situation actually is (because the question courts most often ask in cases about agency is what would the average person think, so a practice that becomes standard in the industry ipso facto becomes what the average reasonable person would think).  Effectively therefore, the very act of OpenStack acting on its fear causes the thing they fear eventually to become true.  The only way to reverse this now is to change the current state of how the industry views patents and the DCO … and that will be the subject of another post.

December 21, 2015 11:18 PM

December 18, 2015

James Morris: Bangalore Linux Kernel Meetup – Jan 2016

Allen Pais, who was one of the FOSS.IN organizers, and now works on my team at Oracle, has announced a new event, the Bangalore Linux Kernel Meetup.

This is a great idea!  There are many Linux kernel developers in Bangalore.

The first meetup will be on 16th Jan, 2016, at a location to be announced.

December 18, 2015 02:07 AM

December 15, 2015

Pete Zaitcev: HP Reconfigurable

I learned by way of Mirantis today that an entity known as "HP Enterprise" or "HPE" introduced something described thus:

It’s an architecture in which a large server acts as a “pool” of compute, storage, and networking resources, the same way a cloud might. When an application needs resources, they’re allocated from that hardware pool, and when the application goes away, they’re returned from the pool. All of this happens via the composable architecture.

That may explain the mysterious Intel computer that I saw in Tokyo. So it's not quite NUMA taken to extremes, it's also hardware domains taken to extremes.

December 15, 2015 06:36 PM

December 09, 2015

Daniel Vetter: Neat drm/i915 stuff for 4.4

Due to vacations, conferences and other things I'm way later than usual, and 4.3 was released a while ago. So it's more than overdue to take a look at what's in store in the next kernel release.
First, looking at overall infrastructure work on the display side, there's a lot of atomic conversion progress again. One feature that's now on solid foundations is fastboot, built on top of atomic infrastructure with patches from Maarten. Unfortunately we had to disable it again due to some backlight issues early in 4.4-rc. The other big piece is reworking the watermark update code (Ville&Matt), which unfortunately ran into regression roadblocks already in the development cycle and had to be reverted partially. Another piece of infrastructure building on top of atomic is validating & adjusting the display clock - some ULT chips can't drive all DP screens and the driver now detects that, and it should also downclock when less bandwidth is needed. This was implemented by Mika Kahola and Ville.

Again this round has seen a lot of improvements and bug fixes to PSR code (from Rodrigo) and for FBC (from Paulo). Unfortunately we're not yet done with those, but it looks really good that at least PSR can finally be enabled for 4.5. Still on the display side of the driver there was a pile of smaller improvements all over: Prep work for Broxton DSI support (Shashank Sharma). HDMI detection finally checks the hotplug sense, after some workaround from Sonika. And tons of cleanups all over. Fixing up DMC support (for new low-power display states) was also a topic, but we've only managed to fix it up for real in 4.5.

On the GEM side the big thing for sure is support for the extended 48-bit GPU address space on Broadwell and later chips, from Michel Thierry. And then there's the code for GuC-based command submission (Alex Dai and Dave Gordon), which is merged but not yet enabled by default. The idea behind that is to feed all command submission through an on-chip microcontroller, which can then react much faster to changing workloads and tune power states accordingly. It should also help long-term with better scheduling by supporting preemption. But none of that is implemented yet, so this is just foundations.

For existing features there are bugfixes for userptr and shrinker improvements from Chris Wilson. And Tvrtko has extended the vma view code in preparation for rotation support for NV12.

Of course there's also been the usual enabling work for new platforms, this time around mostly consisting of workaround patches for Skylake and Broxton. But Zhiyuan Lv also submitted the virtualized XenGT GPU support for Broadwell.

Finally for driver internals there's the massive work from Ville to make the register access functions type safe. This is especially a problem for writing registers, where both the register and the value that needs to be written are of type uint32_t. That resulted in subtle bugs fairly often. Ville encapsulated the register offset into a struct and converted all the thousands of register #defines and users over to that, and now compilation will fail if we ever get this wrong again.
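
The pattern is roughly the following (a simplified sketch of the idea rather than the exact i915 code; the register name and offset are purely illustrative):

    #include <stdint.h>

    /* Wrapping the MMIO offset in a one-member struct means offsets and plain
     * uint32_t values are no longer interchangeable, so swapping the two
     * arguments of a register write becomes a compile-time error. */
    typedef struct {
        uint32_t reg;
    } i915_reg_t;

    #define _MMIO(offset)    ((i915_reg_t){ .reg = (offset) })
    #define SOME_REGISTER    _MMIO(0xA024)    /* illustrative register define */

    static inline void write_reg(i915_reg_t reg, uint32_t val)
    {
        /* in the real driver: writel(val, mmio_base + reg.reg) */
        (void)reg;
        (void)val;
    }

    /* write_reg(SOME_REGISTER, 0x1);   compiles                */
    /* write_reg(0x1, SOME_REGISTER);   now refuses to compile  */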

December 09, 2015 10:52 AM

December 05, 2015

Michael Kerrisk (manpages): man-pages-4.03 is released

I've released man-pages-4.03. The release tarball is available on kernel.org. The browsable online pages can be found on man7.org. The Git repository for man-pages is available on kernel.org.

This release is relatively small, but nevertheless nearly 40 people contributed patches, bug reports, and comments. The more significant changes in man-pages-4.03 include the following:

December 05, 2015 12:27 PM

November 29, 2015

Matthew Garrett: What is hacker culture?

Eric Raymond, author of The Cathedral and the Bazaar (an important work describing the effectiveness of open collaboration and development), recently wrote a piece calling for "Social Justice Warriors" to be ejected from the hacker community. The primary thrust of his argument is that by calling for a removal of the "cult of meritocracy", these SJWs are attacking the central aspect of hacker culture - that the quality of code is all that matters.

This argument is simply wrong.

Eric's been involved in software development for a long time. In that time he's seen a number of significant changes. We've gone from computers being the playthings of the privileged few to being nearly ubiquitous. We've moved from the internet being something you found in universities to something you carry around in your pocket. You can now own a computer whose CPU executes only free software from the moment you press the power button. And, as Eric wrote almost 20 years ago, we've identified that the "Bazaar" model of open collaborative development works better than the "Cathedral" model of closed centralised development.

These are huge shifts in how computers are used, how available they are, how important they are in people's lives, and, as a consequence, how we develop software. It's not a surprise that the rise of Linux and the victory of the bazaar model coincided with internet access becoming more widely available. As the potential pool of developers grew larger, development methods had to be altered. It was no longer possible to insist that somebody spend a significant period of time winning the trust of the core developers before being permitted to give feedback on code. Communities had to change in order to accept these offers of work, and the communities were better for that change.

The increasing ubiquity of computing has had another outcome. People are much more aware of the role of computing in their lives. They are more likely to understand how proprietary software can restrict them, how not having the freedom to share software can impair people's lives, how not being able to involve themselves in software development means software doesn't meet their needs. The largest triumph of free software has not been amongst people from a traditional software development background - it's been the fact that we've grown our communities to include people from a huge number of different walks of life. Free software has helped bring computing to under-served populations all over the world. It's aided circumvention of censorship. It's inspired people who would never have considered software development as something they could be involved in to develop entire careers in the field. We will not win because we are better developers. We will win because our software meets the needs of many more people, needs the proprietary software industry either can not or will not satisfy. We will win because our software is shaped not only by people who have a university degree and a six figure salary in San Francisco, but because our contributors include people whose native language is spoken by so few people that proprietary operating system vendors won't support it, people who live in a heavily censored regime and rely on free software for free communication, people who rely on free software because they can't otherwise afford the tools they would need to participate in development.

In other words, we will win because free software is accessible to more of society than proprietary software. And for that to be true, it must be possible for our communities to be accessible to anybody who can contribute, regardless of their background.

Up until this point, I don't think I've made any controversial claims. In fact, I suspect that Eric would agree. He would argue that because hacker culture defines itself through the quality of contributions, the background of the contributor is irrelevant. On the internet, nobody knows that you're contributing from a basement in an active warzone, or from a refuge shelter after escaping an abusive relationship, or with the aid of assistive technology. If you can write the code, you can participate.

Of course, this kind of viewpoint is overly naive. Humans are wonderful at noticing indications of "otherness". Eric even wrote about his struggle to stop having a viscerally negative reaction to people of a particular race. This happened within the past few years, so before then we can assume that he was less aware of the issue. If Eric received a patch from someone whose name indicated membership of this group, would there have been part of his subconscious that reacted negatively? Would he have rationalised this into a more critical analysis of the patch, increasing the probability of rejection? We don't know, and it's unlikely that Eric does either.

Hacker culture has long been concerned with good design, and a core concept of good design is that code should fail safe - ie, if something unexpected happens or an assumption turns out to be untrue, the desirable outcome is the one that does least harm. A command that fails to receive a filename as an argument shouldn't assume that it should modify all files. A network transfer that fails a checksum shouldn't be permitted to overwrite the existing data. An authentication server that receives an unexpected error shouldn't default to granting access. And a development process that may be subject to unconscious bias should have processes in place that make it less likely that said bias will result in the rejection of useful contributions.

When people criticise meritocracy, they're not criticising the concept of treating contributions based on their merit. They're criticising the idea that humans are sufficiently self-aware that they will be able to identify and reject every subconscious prejudice that will affect their treatment of others. It's not a criticism of a desirable goal, it's a criticism of a flawed implementation. There's evidence that organisations that claim to embody meritocratic principles are more likely to reward men than women even when everything else is equal. The "cult of meritocracy" isn't the belief that meritocracy is a good thing, it's the belief that a project founded on meritocracy will automatically be free of bias.

Projects like the Contributor Covenant that Eric finds so objectionable exist to help create processes that (at least partially) compensate for our flaws. Review of our processes to determine whether we're making poor social decisions is just as important as review of our code to determine whether we're making poor technical decisions. Just as the bazaar overtook the cathedral by making it easier for developers to be involved, inclusive communities will overtake "pure meritocracies" because, in the long run, these communities will produce better output - not just in terms of the quality of the code, but also in terms of the ability of the project to meet the needs of a wider range of people.

The fight between the cathedral and the bazaar came from people who were outside the cathedral. Those fighting against the assumption that meritocracies work may be outside what Eric considers to be hacker culture, but they're already part of our communities, already making contributions to our projects, already bringing free software to more people than ever before. This time it's Eric building a cathedral and decrying the decadent hordes in their bazaar, Eric who's failed to notice the shift in the culture that surrounds him. And, like those who continued building their cathedrals in the 90s, it's Eric who's now irrelevant to hacker culture.

(Edited to add: for two quite different perspectives on why Eric's wrong, see Tim's and Coraline's posts)

comment count unavailable comments

November 29, 2015 10:41 PM

November 19, 2015

Matthew Garrett: If it's not practical to redistribute free software, it's not free software in practice

I've previously written about Canonical's obnoxious IP policy and how Mark Shuttleworth admits it's deliberately vague. After spending some time discussing specific examples with Canonical, I've been explicitly told that while Canonical will gladly give me a cost-free trademark license permitting me to redistribute unmodified Ubuntu binaries, they will not tell me what "Any redistribution of modified versions of Ubuntu must be approved, certified or provided by Canonical if you are going to associate it with the Trademarks. Otherwise you must remove and replace the Trademarks and will need to recompile the source code to create your own binaries" actually means.

Why does this matter? The free software definition requires that you be able to redistribute software to other people in either unmodified or modified form without needing to ask for permission first. This makes it clear that Ubuntu itself isn't free software - distributing the individual binary packages without permission is forbidden, even if they wouldn't contain any infringing trademarks[1]. This is obnoxious, but not inherently toxic. The source packages for Ubuntu could still be free software, making it fairly straightforward to build a free software equivalent.

Unfortunately, while true in theory, this isn't true in practice. The issue here is the apparently simple phrase "you must remove and replace the Trademarks and will need to recompile the source code". "Trademarks" is defined later as being the words "Ubuntu", "Kubuntu", "Juju", "Landscape", "Edubuntu" and "Xubuntu" in either textual or logo form. The naive interpretation of this is that you have to remove trademarks where they'd be infringing - for instance, shipping the Ubuntu bootsplash as part of a modified product would almost certainly be clear trademark infringement, so you shouldn't do that. But that's not what the policy actually says. It insists that all trademarks be removed, whether they would embody an infringement or not. If a README says "To build this software under Ubuntu, install the following packages", a literal reading of Canonical's policy would require you to remove or replace the word "Ubuntu" even though failing to do so wouldn't be a trademark infringement. If an @ubuntu.com email address is present in a changelog, you'd have to change it. You wouldn't be able to ship the juju-core package without renaming it and the application within. If this is what the policy means, it's so impractical to rebuild Ubuntu that it's not free software in any meaningful way.

This seems like a pretty ludicrous interpretation, but it's one that Canonical refuse to explicitly rule out. Compare this to Red Hat's requirements around Fedora - if you replace the fedora-logos, fedora-release and fedora-release-notes packages with your own content, you're good. A policy like this satisfies the concerns that Dustin raised over people misrepresenting their products, but still makes it easy for users to distribute modified code to other users. There's nothing whatsoever stopping Canonical from adopting a similarly unambiguous policy.

Mark has repeatedly asserted that attempts to raise this issue are mere FUD, but he won't answer you if you ask him direct questions about this policy and will insist that it's necessary to protect Ubuntu's brand. The reality is that if Debian had had an identical policy in 2004, Ubuntu wouldn't exist. The effort required to strip all Debian trademarks from the source packages would have been immense[2], and this would have had to be repeated for every release. While this policy is in place, nobody's going to be able to take Ubuntu and build something better. It's grotesquely hypocritical, especially when the Ubuntu website still talks about their belief that people should be able to distribute modifications without licensing fees.

All that's required for Canonical to deal with this problem is to follow Fedora's lead and isolate their trademarks in a small set of packages, then tell users that those packages must be replaced if distributing a modified version of Ubuntu. If they're serious about this being a branding issue, they'll do it. And if I'm right that the policy is deliberately obfuscated so Canonical can encourage people to buy licenses, they won't. It's easy for them to prove me wrong, and I'll be delighted if they do. Let's see what happens.

[1] The policy is quite clear on this. If you want to distribute something other than an unmodified Ubuntu image, you have two choices:

  1. Gain approval or certification from Canonical
  2. Remove all trademarks and recompile the source code
Note that option 2 requires you to rebuild even if there are no trademarks to remove.

[2] Especially when every source package contains a directory called "debian"…

comment count unavailable comments

November 19, 2015 10:16 PM

November 13, 2015

Gustavo F. Padovan: Collabora contributions to Linux Kernel 4.2

A total of 63 patches were contributed upstream by Collabora engineers as part of our current projects.

In the ARM multi_v7_defconfig we added support for Exynos Chromebooks; all options with a tristate Kconfig entry were enabled as modules. After this change it was found that a few drivers weren’t working properly when built as modules, so those were fixed. This work was done by Javier Martinez.

Javier also added multi EC support as newer Chromebooks have more than one Embedded Controller in the system.

Tomeu Vizoso added EMC (External Memory Controller) support to the Tegra124 platform.

On the DRM side, initial support for Atomic Modesetting was added to Exynos devices by Gustavo Padovan. The Atomic Modesetting interface allows all screen updates, such as mode changes, pageflips and plane/cursor updates, to happen in the same IOCTL, so everything can be updated atomically. More on that can be found in Daniel Vetter’s post at LWN.net. Another Atomic Modesetting contribution, from Daniel Stone, was the addition of the CRTC state mode property; it is through this property that userspace configures the modeset that will be applied via an Atomic Modesetting ioctl.
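
To give a feel for the interface, here is a rough userspace sketch using libdrm’s atomic helpers (illustrative only; the file descriptor, object IDs and property IDs are assumed to have been looked up elsewhere, and error checking is trimmed):

#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* Queue two plane property changes and commit them in one ioctl.
 * plane_id, fb_prop_id/fb_id and crtc_prop_id/crtc_id are assumed to
 * have been discovered via drmModeGetPlaneResources() and
 * drmModeObjectGetProperties(). */
static int commit_plane_update(int fd, uint32_t plane_id,
                               uint32_t fb_prop_id, uint32_t fb_id,
                               uint32_t crtc_prop_id, uint32_t crtc_id)
{
    drmModeAtomicReq *req = drmModeAtomicAlloc();
    int ret;

    if (!req)
        return -1;

    /* Every update is just an (object, property, value) triple... */
    drmModeAtomicAddProperty(req, plane_id, fb_prop_id, fb_id);
    drmModeAtomicAddProperty(req, plane_id, crtc_prop_id, crtc_id);

    /* ...and they all take effect together in a single commit. */
    ret = drmModeAtomicCommit(fd, req, DRM_MODE_PAGE_FLIP_EVENT, NULL);

    drmModeAtomicFree(req);
    return ret;
}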

Following is a list of all patches submitted by Collabora for this kernel release:

Daniel Stone (17):

Gustavo Padovan (17):

Javier Martinez Canillas (19):

Tomeu Vizoso (11):

November 13, 2015 09:38 AM

November 12, 2015

Gustavo F. Padovan: Collabora contributions to Linux Kernel 4.3

Collabora developers contributed 48 patches to kernel 4.3 as part of our current projects.

Danilo worked on the kernel doc scripts to add cross-reference links to the HTML documentation and support for documenting arguments in struct bodies. Sjoerd Simons fixed a clock definition in rockchip and an incorrect udelay usage for the stmmac phy reset delay.

Tomeu fixed gpiolib to defer probe if the pin controller isn’t available, and added another fix to chipidea USB to defer probe if usbmisc hasn’t been probed yet. On Tegra, Tomeu worked on support for the gpio-ranges property, and cpuidle_state.enter_freeze() support was added as well.

Gustavo Padovan did a lot of exynos DRM work, the most important changes being improvements to atomic modesetting, including asynchronous atomic commit for exynos: in async mode we just schedule the atomic update and return to userspace right away, in a similar way to how PageFlips work in the old API. In this release the exynos atomic modesetting interface was enabled for userspace usage. Another important set of patches removed the exynos_drm_display and exynos_drm_encoder layers, which greatly improved the code, making it cleaner and easier to use. Apart from that there are also a few cleanups and fixes.

Danilo Cesar Lemes de Paula (2):

Gustavo Padovan (36):

Javier Martinez Canillas (1):

Sjoerd Simons (2):

Tomeu Vizoso (7):

November 12, 2015 12:20 PM

November 11, 2015

Kees Cook: evolution of seccomp

I’m excited to see other people thinking about userspace-to-kernel attack surface reduction ideas. Theo de Raadt recently published slides describing Pledge. This uses the same ideas that seccomp implements, but with less granularity. Seccomp works at the individual syscall level and, in addition to killing processes, allows for signaling, tracing, and errno spoofing. As de Raadt mentions, Pledge could be implemented with seccomp very easily: libseccomp would just categorize syscalls.
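
To make the categorization idea concrete, here’s a minimal libseccomp sketch (not a real pledge() implementation; the “stdio”-ish syscall list below is made up for illustration):

#include <stddef.h>
#include <seccomp.h>

/* Sketch of a pledge()-like category: everything outside the whitelist
 * gets the default action (here: kill the process). */
static int pledge_stdio_sketch(void)
{
    scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);
    const int allowed[] = {
        SCMP_SYS(read), SCMP_SYS(write), SCMP_SYS(close),
        SCMP_SYS(fstat), SCMP_SYS(exit_group),
    };
    size_t i;
    int rc = -1;

    if (!ctx)
        return -1;

    for (i = 0; i < sizeof(allowed) / sizeof(allowed[0]); i++)
        if (seccomp_rule_add(ctx, SCMP_ACT_ALLOW, allowed[i], 0) < 0)
            goto out;

    rc = seccomp_load(ctx);
out:
    seccomp_release(ctx);
    return rc;
}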

I don’t really understand the presentation’s mention of “Optional Security”, though. Pledge, like seccomp, is an opt-in feature. Nothing in the kernel refuses to run “unpledged” programs. I assume his point was that when it gets ubiquitously built into programs (like stack protector), it’s effectively not optional (which is alluded to later as “comprehensive applicability ~= mandatory mitigation”). Regardless, this sensible (though optional) design gets me back to his slide on seccomp, which seems to have a number of misunderstandings:

OpenBSD has some interesting advantages in the syscall filtering department, especially around sockets. Right now, it’s hard for Linux syscall filtering to understand why a given socket is being used. Something like SOCK_DNS seems like it could be quite handy.

Another nice feature of Pledge is the path whitelist feature. As it’s still under development, I hope they expand this to include more things than just paths. Argument inspection is a weak point for seccomp, but under Linux, most of the arguments are ultimately exposed to the LSM layer. Last year I experimented with creating a “seccomp LSM” for path matching where programs could declare whitelists, similar to standard LSMs.

So, yes, Linux “could match this API on seccomp”. It’d just take some extensions to libseccomp to implement pledge(), as I described at the top. With OpenBSD doing a bunch of analysis work on common programs, it’d be excellent to see this usable on Linux too. So far on Linux, only a few programs (e.g. Chrome, vsftpd) have bothered to do this using seccomp, and it could be argued that this is ultimately due to how fine grained it is.

© 2015, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

November 11, 2015 06:01 PM

November 06, 2015

Dave Jones: Trinity 1.6

As alluded to in my last post, a few days ago I released a new version of Trinity.
The bulk of the work in this release happened prior to my burn out back in July. The combination of everything described in that post, and general unhappiness in my last job etc led to me just wanting to walk away from everything for an indeterminate amount of time.

Distance is good. I’ve continued to poke at trinity in small amounts since then. At last week’s kernel summit, a number of people expressed just how useful they find Trinity and how bummed they were to find out I wasn’t working on it any more. With that feedback, I felt motivated to clean the decks and get 1.6 out. There’s a short description of most of the bigger changes below, but there are probably a whole bunch more changes that I forgot to highlight in the shortlog.

With that release wrapped up, and with the fresh perspective of having been ‘away’ from the project for a while, I started work on some new features while travelling last week, beginning with a generic object cache instead of hard coding a "remember this" set of functionality for every single object type a syscall could return. It’s a relatively small amount of code, and it should make it easier to support recycling syscall results for syscalls other than mmap (which is all that’s implemented right now).
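
For what it’s worth, the shape of such a generic object cache might look something like this (a purely hypothetical sketch; none of these names exist in Trinity):

#include <stdlib.h>

/* Hypothetical generic object cache: one ring of remembered results per
 * object type, instead of bespoke "remember this" code for each type. */
enum obj_type { OBJ_MMAP, OBJ_FD, OBJ_TYPE_MAX };

#define CACHE_SLOTS 32

struct obj_cache {
    void *slot[CACHE_SLOTS];
    unsigned int next;
};

static struct obj_cache caches[OBJ_TYPE_MAX];

/* Remember a result so later syscalls can reuse it. */
static void cache_object(enum obj_type type, void *obj)
{
    struct obj_cache *c = &caches[type];

    c->slot[c->next] = obj;
    c->next = (c->next + 1) % CACHE_SLOTS;
}

/* Hand back a random remembered object of that type (may be NULL). */
static void *get_cached_object(enum obj_type type)
{
    return caches[type].slot[rand() % CACHE_SLOTS];
}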

So,.. while I’m working on this stuff again, it’s not the comeback many would like. I don’t know just how much time I’m going to have to devote to working on Trinity. From time to time, I suspect I’ll find some intersection between my work at Facebook and the sort of targeted testing that Trinity is useful for, but it’s not my primary focus, and probably won’t be again. Additionally, I’ve got a bunch of ideas for new projects I’m itching to work on that spawned from discussions last week, so “spare time” hacking effort might be devoted more to them in future.

tl;dr: Don’t send me feature requests. I’ve got more than enough ideas for stuff *I* want to implement. Diffs speak louder than words.

Summary of some of the bigger changes to Trinity since the last (1.5) tarball release include:

The post Trinity 1.6 appeared first on codemonkey.org.uk.

November 06, 2015 04:08 PM

Matthew Garrett: Why improving kernel security is important

The Washington Post published an article today which describes the ongoing tension between the security community and Linux kernel developers. This has been roundly denounced as FUD, with Rob Graham going so far as to claim that nobody ever attacks the kernel.

Unfortunately he's entirely and demonstrably wrong, it's not FUD and the state of security in the kernel is currently far short of where it should be.

An example. Recent versions of Android use SELinux to confine applications. Even if you have full control over an application running on Android, the SELinux rules make it very difficult to do anything especially user-hostile. Hacking Team, the GPL-violating Italian company who sells surveillance software to human rights abusers, found that this impeded their ability to drop their spyware onto targets' devices. So they took advantage of the fact that many Android devices shipped a kernel with a flawed copy_from_user() implementation that allowed them to copy arbitrary userspace data over arbitrary kernel code, thus allowing them to disable SELinux.

If we could trust userspace applications, we wouldn't need SELinux. But we assume that userspace code may be buggy, misconfigured or actively hostile, and we use technologies such as SELinux or AppArmor to restrict its behaviour. There's simply too much userspace code for us to guarantee that it's all correct, so we do our best to prevent it from doing harm anyway.

This is significantly less true in the kernel. The model up until now has largely been "Fix security bugs as we find them", an approach that fails on two levels:

1) Once we find them and fix them, there's still a window between the fixed version being available and it actually being deployed
2) The forces of good may not be the first ones to find them

This reactive approach is fine for a world where it's possible to push out software updates without having to perform extensive testing first, a world where the only people hunting for interesting kernel vulnerabilities are nice people. This isn't that world, and this approach isn't fine.

Just as features like SELinux allow us to reduce the harm that can occur if a new userspace vulnerability is found, we can add features to the kernel that make it more difficult (or impossible) for attackers to turn a kernel bug into an exploitable vulnerability. The number of people using Linux systems is increasing every day, and many of these users depend on the security of these systems in critical ways. It's vital that we do what we can to avoid their trust being misplaced.

Many useful mitigation features already exist in the Grsecurity patchset, but a combination of technical disagreements around certain features, personality conflicts and an apparent lack of enthusiasm on the side of upstream kernel developers has resulted in almost none of it landing in the kernels that most people use. Kees Cook has proposed a new project to start making a more concerted effort to migrate components of Grsecurity to upstream. If you rely on the kernel being a secure component, either because you ship a product based on it or because you use it yourself, you should probably be doing what you can to support this.

Microsoft received entirely justifiable criticism for the terrible state of security on their platform. They responded by introducing cutting-edge security features across the OS, including the kernel. Accusing anyone who says we need to do the same of spreading FUD is risking free software being sidelined in favour of proprietary software providing more real-world security. That doesn't seem like a good outcome.

comment count unavailable comments

November 06, 2015 09:19 AM

November 05, 2015

Pete Zaitcev: Cool hardware in Tokyo

At the Mitaka Summit, we finally got some interesting kit exhibited, after the relatively lean summits in Atlanta and Vancouver. Unfortunately, the lighting in the Marketplace was very weird and pictures came out poorly.

My personal favourite is probably the flash array by SanDisk. It's nothing but JBOF; the host connection is SAS. You'd think any idiot could slap a few flash chips on cards and plug them into a backplane... But just look how elegant it is. The capacity of the 2U box is 512 TB, but the whole thing only consumes 700 W maximum. It's brilliant, really.

Unfortunately, I don't have a good picture, but the second best was Ericsson's passive optical backplane. It promises to make your cables last forever: just swap out optronics when new bit rates come along. Even a terabit! Now it may actually be a misguided product. If they cannot get 3rd party vendors to build modules for it, the whole thing comes crashing to the ground. Ditto if they build, but overprice. But the audacity of making something that's different is to be acknowledged. And frankly I'm not a fan of re-cabling when new servers come about.

Intel wins a consolation prize for perseverance. They quietly presented some kind of next-generation multiblock computer, with pieces connected by serial cables. Finally, the future dreamed of by the creators of Infiniband is here - only 15 years late, and still we don't know if it is viable.

There was also a bunch of fairly mundane boxes. Various also-ran flash vendors were present, of course. Interestingly, SolidFire had a booth, but without anything eye-catching. Resting on the laurels? IBM brought their newest PowerPC, which was mostly remarkable for still existing. That sort of thing.

November 05, 2015 02:46 AM

November 04, 2015

Dave Jones: kernel summit 2015 wrap-up

Exhausting travel aside, kernel summit in Seoul was a good use of time.
Most of the sessions didn’t feel as interactive as in prior years, in part I think because there really wasn’t a lot of objection, even to some of the more controversial things. Kees’ security talk went over pretty well, even if it did depress most of the people in the room. Hopefully something good will come of it. The restartable sequences feature got talked about but didn’t get much (if any) real pushback.

There were a few hallway discussions surrounding various upcoming kernel functionality that didn’t get ‘airtime’ in the sessions. The kernel TLS stuff was probably discussed more in depth at netconf, and assorted VM features were covered more at LSFMM earlier this year. Quite a few people were talking excitedly about eBPF, both from a networking point of view and, soon, tracing. Quite a few people still seem concerned (rightly) about the upcoming unprivileged bpf syscall.

It seems that by fracturing the kernel summit into lots of smaller events, the deep dives into new features and problems happen there, leaving the kernel summit more for executive-summary type talks and, as has been the general push over the last decade, more and more process-related discussions.

On process, Sasha’s discussion on stable was probably the most interesting to me personally. GregKH agreed to make 4.4 the next LTS, starting a new tradition of “the next LTS is the one after the kernel summit”. We’ll see how that works out.

Chris Mason gave a “what went good/bad when facebook moved to 4.0” talk, which, for the most part, was all good. There are a few small things that are still being shaken out, but it’s by no means awful.

I had a lot of hallway conversations that began “so, trinity..” The short answer there is that I’m still working on it, though at a much slower pace than a year ago. It was good to hear feedback from pretty much everyone I talked to that it was something that people value, which was a good motivator. More on that later.

I also had a lot of people asking a lot of questions about my Facebook bootcamp experience. I’ll do a longer write-up of that soon.

The post kernel summit 2015 wrap-up appeared first on codemonkey.org.uk.

November 04, 2015 07:00 PM

November 03, 2015

Grant Likely: Debugging 96Boards I2C

I was originally just going to post this to one of the 96boards mailing lists, but it got sufficiently interesting that I thought I’d make it a blog post instead. I’ve been working on making i2c on the 96Boards sensors adapter work properly and I’ve made some progress. The problem that users have run into is that the Grove RGB LCD module won’t work when connected to one of the baseboard’s I2C busses. I pulled out the oscilloscope today to investigate.

The LCD module is particularly useful for testing because it actually has 2 i2c devices embedded in it; an LCD controller at address 0x3e, and an RGB controller at 0x62. The two devices operate independently with different electrical properties.

On Hikey+sensors (TXS0108 level shifter), the RGB device will work, but only after pulling the ribbon cable apart to reduce crosstalk due to insufficient pullups. However, the LCD causes the entire bus to lock up, and no further transactions will work.

On Hikey+pca9306 the LCD isn’t detected and the RGB works correctly (undetermined if there are crosstalk issues)

The traces below show both sides of the level shifter. Green and blue on the top for the data line. Orange and purple on the bottom with the clock.

First, what I saw on using Hikey+pca9306+RGB:

Screen capture: RGB transaction via PCA9306

And with the LCD:

Screen capture: LCD transaction via PCA9306

In both traces you can see the start condition (data goes low while clock is high), the 7 bits of address (7 rising clock edges), the R/W bit (1 rising clock), and then the acknowledgement bit driven by the device. If the controller doesn’t see the device drive the data line low on the 9th clock, then it decides the device isn’t there and it terminates the transaction. It is easy to recognize the ack bit because the device has a different drive strength and the voltage level is different.

The RGB controller is a happy little device and it jumps at the chance to drive the data line low. It goes down pretty close to 0V. The LCD on the other hand is sulky and doesn’t drive the line quite as low as the controller can: only down to about 1V. 1V is recognized fine as logic low on a 5V device, but on a 1.8V bus it isn’t even below half the supply. The way the pca9306 level shifter works is that there are pull-up resistors on either side of the device that draw each side up to its respective high level, in this case 1.8V and 5V. When either side gets driven low, the level shifter begins to conduct and the other side also gets drawn down to the same voltage, but it can only go as low as the voltage it is driven to. If it only gets driven down to 1V, then it will never get low enough for a 1.8V controller to recognize it as a low state.

It may be that with weaker pull-ups the LCD will be able to drive to a lower voltage level. I’ll need to experiment more, but in the mean time let’s move onto the Sensors board. Back to the traces:

First, here is a transaction to address 0x63 with no device present:

Screen capture: no device present

Looks perfectly normal so far. Next, the RGB device at address 0x62:

Screen capture: RGB

Also behaving the same way as it did with the pca9306. Finally, an LCD transaction:

Screen capture: LCD

Again we see the start condition, the 7 data bits and 1 r/w bit, but the ack bit looks weird. The LCD successfully drives the data bit low enough to be recognized, but then something weird happens. The data line stays low and the clock stops running. I don’t actually know what is happening here, but I’ve got my suspicions. The LCD is continuing to drive the data line low (you can tell by the slightly different voltage level), but keeping data low should not stop the clock. I suspect the txs0108 is getting confused and driving the clock line high. I’ve come across reports from others having trouble with the txs010x series on i2c. It has ‘one-shot’ accelerators to reduce rise time by driving the line high. I don’t know for sure though.

On the plus side, I now know that the Hikey I2C busses are working correctly. Now I need to decide what to do next. Aside from the i2c problem, Rev B of the sensors board is ready for manufacturing. I either need to make the txs part work, or rework the design to use a pair of pca9306s. I think I’ll try weaker pull-ups on the pca9306 breakout board first and see how that goes. Sadly, I blew up the i2c drivers on my Hikey board while experimenting today, so I need to do the same experiments with my Dragonboard 410c.

Dear lazyweb, do you have any other suggestions on things to try?

November 03, 2015 12:35 AM

October 28, 2015

Pete Zaitcev: Darcy on the future of storage

Quick comment on the following:

Good morning, madam. What kind of storage system would you like me to build for you today?

Scary thought. That means that selling storage products is going to be hard for all of us. We'll be selling components, both hardware and software, or we'll be selling integration and support services. Somebody will always pay to have somebody else assemble the parts, maybe add some light customization, and support the result. There's a nice living to be made there... but no empires.

Why is it a problem that no empires are to be built? It's only a problem for an empire-builder like I dunno... Sam Altman or something. Darcy is an old engineer, not a startup founder. A good one, too. His kids aren't going to go to bed hungry.

We've been at this dance before with Linux. People have been asking if Red Hat was going to be like Microsoft, and I told everyone: nope. We're transferring the wealth that the proprietary lock-in vendors were collecting back to the users. That was the whole idea. In the process, we're collecting less - a more reasonable amount, necessary to put stuff together and make it run. Therefore, we're not going to be as wealthy off users' backs. But society as a whole benefits.

So cry me a river. Not scary at all. But RTWT, I think he's drawing a truthful outline overall.

P.S. Another thing, what's magical about storage? Why, I can go build spacecraft when storage goes bust. Or whatever. Of course it's a pity for all the storage-specific techniques and skills that I accumulated, but eh. As long as we leave behind the good code (and docs), it's all good.

October 28, 2015 01:41 AM

October 22, 2015

James Morris: LSM Mailing List Being Archived Again

Several folks noticed that all of the known LSM mailing list archives stopped archiving earlier this year.  We don’t know why and generally have not had any luck contacting the owners of several archives, including marc and gmane.  This is a concern, because the list is generally where Linux kernel security takes place and it’s important to have a public record of it.

The good news is that Paul Moore was finally able to re-register the list with mail-archive.com, and there is once again an active archive here: http://www.mail-archive.com/linux-security-module@vger.kernel.org/

Please update any links you may have!

October 22, 2015 04:58 AM

Andy Grover: iSNS support coming soon for LIO in Fedora

target-isns recently was added to Rawhide, and will be in a future Fedora release. This add-on to LIO allows it to register with an iSNS server, which potential initiators can then query for available targets. (On Fedora, see isns-utils for both the server and the client query tools.) This removes one of the few remaining areas in which other target implementations have been ahead of LIO.

Kudos and thanks to Christophe Vu-Brugier for writing this useful program!

October 22, 2015 12:29 AM

October 21, 2015

Andy Grover: Some targetcli and TCMU questions

Just got an email full of interesting questions. I hope the author will be ok with me answering them here so future searches will see them:

I searched on internet and I don’t find some relevant info about gluster api support via tcmu-runner. Can you tell me please if this support will be added to the stable redhat targetcli in the near future? And I want to know also which targetcli is recommended for setup (targetcli or targetcli-fb) and what is the status for targetcli-3.0.

tcmu-runner is a userspace daemon add-on to LIO that allows requests for a device to be handled by a user process. tcmu-runner has early support for using glfs (via gfapi). Both tcmu-runner and its glfs plugin are beta-quality and will need further work before they are ready for stable Fedora, much less a RHEL release. tcmu-runner just landed in Rawhide, but this is really just to make it easier to test.

RHEL & Fedora use targetcli-fb, which is a fork of targetcli, and what I work on. Since I’m working on both tcmu-runner and targetcli-fb, targetcli-fb will see TCMU support very early.

The -fb packages I maintain switched to a “fbXX” version scheme, so I think you must be referring to the other one :-) I don’t have any info about the RTS/Datera targetcli’s status, other than that nobody likes having two versions; the targetcli maintainer and I have discussed unifying them into a common version, but the un-fun work of merging them has not happened yet.

October 21, 2015 09:58 PM

October 20, 2015

Rusty Russell: ccan/mem’s memeqzero iteration

On Thursday I was writing some code, and I wanted to test if an array was all zero.  First I checked if ccan/mem had anything, in case I missed it, then jumped on IRC to ask the author (and overall CCAN co-maintainer) David Gibson about it.

We bikeshedded around names: memallzero? memiszero? memeqz? memeqzero() won by analogy with the already-extant memeq and memeqstr. Then I asked:

rusty: dwg: now, how much time do I waste optimizing?
dwg: rusty, in the first commit, none

Exactly five minutes later I had it implemented and tested.

The Naive Approach: Times: 1/7/310/37064 Bytes: 50

bool memeqzero(const void *data, size_t length)
{
    const unsigned char *p = data;

    while (length) {
        if (*p)
            return false;
        p++;
        length--;
    }
    return true;
}

As a summary, I’ve given the nanoseconds for searching through 1, 8, 512 and 65536 bytes only.

Another 20 minutes, and I had written that benchmark, and an optimized version.

128-byte Static Buffer: Times: 6/8/48/5872 Bytes: 108

Here’s my first attempt at optimization; using a static array of 128 bytes of zeroes and assuming memcmp is well-optimized for fixed-length comparisons.  Worse for small sizes, much better for big.

 const unsigned char *p = data;
 static unsigned long zeroes[16];

 while (length > sizeof(zeroes)) {
     if (memcmp(zeroes, p, sizeof(zeroes)))
         return false;
     p += sizeof(zeroes);
     length -= sizeof(zeroes);
 }
 return memcmp(zeroes, p, length) == 0;

Using a 64-bit Constant: Times: 12/12/84/6418 Bytes: 169

dwg: but blowing a cacheline (more or less) on zeroes for comparison, which isn’t necessarily a win

Using a single zero uint64_t for comparison is pretty messy:

bool memeqzero(const void *data, size_t length)
{
    const unsigned char *p = data;
    const unsigned long zero = 0;
    size_t pre;
    pre = (size_t)p % sizeof(unsigned long);
    if (pre) {
        size_t n = sizeof(unsigned long) - pre;
        if (n > length)
            n = length;
        if (memcmp(p, &zero, n) != 0)
            return false;
        p += n;
        length -= n;
    }
    while (length > sizeof(zero)) {
        if (*(unsigned long *)p != zero)
            return false;
        p += sizeof(zero);
        length -= sizeof(zero);
    }
    return memcmp(&zero, p, length) == 0;
}

And, worse in every way!

Using a 64-bit Constant With Open-coded Ends: Times: 4/9/68/6444 Bytes: 165

dwg: rusty, what colour is the bikeshed if you have an explicit char * loop for the pre and post?

That’s slightly better, but memcmp still wins over large distances, perhaps due to prefetching or other tricks.

Epiphany #1: We Already Have Zeroes: Times 3/5/92/5801 Bytes: 422

Then I realized that we don’t need a static buffer: we know everything we’ve already tested is zero!  So I open coded the first 16 byte compare, then memcmp()ed against the previous bytes, doubling each time.  Then a final memcmp for the tail.  Clever huh?

But it’s no faster than the static buffer case on the high end, and much bigger.
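
For the curious, the doubling version looks roughly like this (reconstructed from the description above, not the code that was actually benchmarked):

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

bool memeqzero_doubling(const void *data, size_t length)
{
    const unsigned char *p = data;
    size_t i, chunk = 16;

    /* Check the first 16 bytes (or fewer) manually. */
    for (i = 0; i < 16; i++) {
        if (!length)
            return true;
        if (*p)
            return false;
        p++;
        length--;
    }

    /* data[0..chunk) is now known to be zero: compare the next chunk
     * against it, then double the chunk size. */
    while (length > chunk) {
        if (memcmp(data, p, chunk))
            return false;
        p += chunk;
        length -= chunk;
        chunk *= 2;
    }

    /* The tail is never larger than the verified zero prefix. */
    return memcmp(data, p, length) == 0;
}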

dwg: rusty, that is brilliant. but being brilliant isn’t enough to make things work, necessarily :p

Epiphany #2: memcmp can overlap: Times 3/5/37/2823 Bytes: 307

My doubling logic above was because my brain wasn’t completely in phase: unlike memcpy, memcmp arguments can happily overlap!  It’s still worth doing an open-coded loop to start (gcc unrolls it here with -O3), but after 16 it’s worth memcmping with the previous 16 bytes.  This is as fast as naive with as little as 2 bytes, and the fastest solution by far with larger numbers:

 const unsigned char *p = data;
 size_t len;

 /* Check first 16 bytes manually */
 for (len = 0; len < 16; len++) {
     if (!length)
         return true;
     if (*p)
         return false;
     p++;
     length--;
 }

 /* Now we know that's zero, memcmp with self. */
 return memcmp(data, p, length) == 0;

You can find the final code in CCAN (or on Github) including the benchmark code.

Finally, after about 4 hours of random yak shaving, it turns out lightning doesn’t even want to use memeqzero() any more!  Hopefully someone else will benefit.

October 20, 2015 12:09 AM

October 09, 2015

Paul E. Mc Kenney: Deep Blue vs. Watson Revisited

Some years back, I speculated on the importance of IBM's Watson. Much has happened since then: Watson won Jeopardy, has been applied to medical applications, and has been made available to numerous business partners to enable them to produce Watson-based offerings. In short, it is long past time for a follow-up.

However, The Economist beat me to the punch in their October 3rd print edition. I doubt that I can improve on their article, so I will confine myself to taking the fair-use liberty of quoting their last sentence:

If it [Watson] can pull that off, a truly disturbing possibility looms: that the next TV show featuring Watson might be “America's Got Talent”.

October 09, 2015 02:31 AM

October 08, 2015

Matthew Garrett: Going my own way

Reaction to Sarah's post about leaving the kernel community was a mixture of terrible and touching, but it's still one of those things that almost certainly won't end up making any kind of significant difference. Linus has made it pretty clear that he's fine with the way he behaves, and nobody's going to depose him. That's unfortunate, because earlier today I was sitting in a presentation at Linuxcon and remembering how much I love the technical side of kernel development. "Remembering" is a deliberate choice of word - it's been increasingly difficult to remember that, because instead I remember having to deal with interminable arguments over the naming of an interface because Linus has an undying hatred of BSD securelevel, or having my name forever associated with the deepthroating of Microsoft because Linus couldn't be bothered asking questions about the reasoning behind a design before trashing it.

In the end it's a mixture of just being tired of dealing with the crap associated with Linux development and realising that by continuing to put up with it I'm tacitly encouraging its continuation, but I can't be bothered any more. And, thanks to the magic of free software, it turns out that I can avoid putting up with the bullshit in the kernel community and get to work on the things I'm interested in doing. So here's a kernel tree with patches that implement a BSD-style securelevel interface. Over time it'll pick up some of the power management code I'm still working on, and we'll see where it goes from there. But, until there's a significant shift in community norms on LKML, I'll only be there when I'm being paid to be there. And that's improved my mood immeasurably.

(Edited to add a context link for the "deepthroating of Microsoft" reference)

comment count unavailable comments

October 08, 2015 09:22 AM

James Bottomley: Respect and the Linux Kernel Mailing Lists

I recently noticed that Sarah Sharp resigned publicly from the kernel giving a failure to impose a mandatory code of conduct as the reason and citing interaction problems, mainly on the mailing lists.  The net result of this posting, as all these comments demonstrate, is to imply directly that nothing has ever changed.  This implication is incredibly annoying, firstly because it is actually untrue, secondly because it does more to discourage participation than the behaviour that is being complained about and finally because it totally disrespects and ignores the efforts of hundreds of people who, over the last decade or so, have been striving to improve all interactions around Linux … a rather nice irony given that “respect” is listed as one of the issues for the resignation.  I’d just like to remind everyone of the history of these efforts and what the record shows they’ve achieved.

The issue of respect on the Mailing lists goes way back to the beginnings of Linux itself, but after the foundation of the OSDL (precursor to the Linux Foundation) Technical Advisory Board (TAB), one of its first issues from OSDL member companies was the imbalance between Asian and European/American contributions to the kernel.  The problems were partly to do with Management culture and partly because the lack of respect on the various mailing lists was directly counter to the culture of respect in a lot of Asian countries and disproportionately discouraged contributions from that region.  The TAB largely works behind the scenes, but some aspects of the effort filtered into the public domain as can be seen with a session on developer relations at the 2007 kernel summit (and, in fact, at a lot of other kernel summits since then).  Progress was gradual, and influenced by a large number of people, but the climate did improve.  I have to confess that I don’t follow LKML (not because of the flame war issues, simply because it’s too much of a firehose); however, the lists I do participate in (linux-scsi, linux-ide, linux-mm, linux-fsdevel, linux-efi, linux-arch, linux-parisc) haven’t seen any flagrantly disrespectful and personally insulting posts for several years now.  Indeed, when an individual came along who could almost have been flame bait for this with serial efforts to get incorrect and badly thought out patches into the kernel (I won’t give cites here to avoid stigmatising individuals) they met with a large reserve of patience and respectful and helpful advice before finally being banned from the lists for being incorrigible … no insults or flames at all.

Although I’d love to take credit for some of this, I’ve got to say that I think the biggest influencer towards civility is actually the “professionalisation”  of Linux: Employers pay people to work on Linux but the statements of those people become identified with their employers (no matter how many disclaimers they have) … in many ways, Open Source engineers are the new corporate spokespeople.  All employers bear this in mind when they hire and they certainly look over the mailing lists to see how people behave.  The net result is really that the only people who can afford to be rude or abusive are those who don’t think they have much chance of a long term career in Linux.

So, by and large, I’m proud of the achievements we’ve made in civility and the way we have improved over the years.  Are we perfect? by no means (but then perfection in such a large community isn’t a realistic goal).  However, we have passed our stress test: that an individual with bad patches to several mailing lists was met with courtesy and helpful advice, in spite of serially repeating the behaviour.

In conclusion, I’d just like to note that even the thread that gave rise to Sarah’s desire to pursue a code of conduct is now over two years old and try as they might, no-one’s managed to come up with a more recent example and no-one has actually invoked the voluntary code of conflict, which was the compromise for not having a mandatory code of conduct.  If it were me, I’d actually take that as a sign of success …

October 08, 2015 03:47 AM

October 05, 2015

Pete Zaitcev: Pics Up

On a whim, I posted this week's pictures to the Aviabaza forums. Anglophones are welcome to the pictures, at least.

October 05, 2015 07:31 PM

Davidlohr Bueso: acquire/release semantics in the kernel

With the need for better scaling on increasingly larger multi-core systems, we've continued to extend our CPU barriers in the kernel. Two important variants to prevent CPU reordering for lock-free shared memory synchronization are pairs of load/acquire and store/release barriers; also known as LOCK/UNLOCK barriers. These enable threads to cooperate between each other.

Multiple, yet pretty much equivalent, definitions of acquire/release semantics can be found all over the internet, but I like the version from the infamous 'Documentation/memory-barriers.txt' file for three reasons: (i) it is clear and concise; (ii) it explicitly warns that these are the minimum guarantees, and not to assume anything about the ordering of loads and stores before the acquire or after the release, respectively; and (iii) it strongly mentions the need for pairing and thus portability:
 (5) ACQUIRE operations.

     This acts as a one-way permeable barrier.  It guarantees that all memory operations after the ACQUIRE operation will appear to happen after the ACQUIRE operation with respect to the other components of the system. ACQUIRE operations include LOCK operations and smp_load_acquire() operations.

     Memory operations that occur before an ACQUIRE operation may appear to happen after it completes.

     An ACQUIRE operation should almost always be paired with a RELEASE operation.

 (6) RELEASE operations.

     This also acts as a one-way permeable barrier.  It guarantees that all memory operations before the RELEASE operation will appear to happen before the RELEASE operation with respect to the other components of the system. RELEASE operations include UNLOCK operations and smp_store_release() operations.

     Memory operations that occur after a RELEASE operation may appear to happen before it completes.

     The use of ACQUIRE and RELEASE operations generally precludes the need for other sorts of memory barrier (but note the exceptions mentioned in the subsection "MMIO write barrier").  In addition, a RELEASE+ACQUIRE pair is -not- guaranteed to act as a full memory barrier.  However, after an ACQUIRE on a given variable, all memory accesses preceding any prior RELEASE on that same variable are guaranteed to be visible.  In other words, within a given variable's critical section, all accesses of all previous critical sections for that variable are guaranteed to have completed.

     This means that ACQUIRE acts as a minimal "acquire" operation and RELEASE acts as a minimal "release" operation.
[Figure: Thread B's ACQUIRE pairs with Thread A's RELEASE. Copyright (C) IBM.]



In lock-speak, all this means is that nothing inside the critical region protected by the primitive in question leaks out of it. A thread attempting to take a lock will pair its load (ACQUIRE), for instance via an Rmw (cmpxchg) when attempting to take the lock, with the last store (RELEASE) of another thread that is concurrently releasing the lock (for example, setting the counter to 0).
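
A minimal sketch of such a pairing in kernel style (illustrative only, kernel context and the usual headers assumed): the reader is guaranteed to see the payload whenever it observes the flag that was published with a RELEASE.

#include <linux/compiler.h>

static int payload;
static int ready;

/* Thread A: publish the data, then RELEASE. The payload store cannot be
 * reordered after the flag store. */
static void producer(void)
{
    payload = 42;
    smp_store_release(&ready, 1);
}

/* Thread B: ACQUIRE, then consume. Pairs with the release above, so if
 * the flag is seen, the payload is guaranteed to be visible too. */
static int consumer(void)
{
    if (smp_load_acquire(&ready))
        return payload;
    return -1;  /* not published yet */
}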

For v4.2, Will Deacon introduced more relaxed extensions of traditional atomic operations (including Rmw) which allow finer-grained control over what used to be full-barrier semantics on both sides of the instruction. This is also true for just about all atomic functions that return a value to the caller, ie: atomic_*_return(). As such, weakly ordered architectures can make use of these -- currently only arm64 does, but efforts for PPC are being made.
      - *_relaxed: No ordering guarantees. This is similar to what we have already for the non-return atomics (e.g. atomic_add).
  
      - *_acquire: ACQUIRE semantics, similar to smp_load_acquire.
  
      - *_release: RELEASE semantics, similar to smp_store_release.
So we now have goodies such as atomic_cmpxchg_acquire() or atomic_add_return_relaxed(). Most recently, aiming for v4.4, I've ported all our locks to make use of these optimizations, which can save almost half the barriers in the kernel's locking code -- especially nice under low or regular contention scenarios, where the fastpaths are exercised. There are plenty of other examples of real-world code making use of acquire/release semantics, mostly by using smp_load_acquire()/smp_store_release(); other primitives also use these semantics as common building blocks (as esoteric as they can get, ie RCU).
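
As a toy example of where the new variants fit, a test-and-set lock only needs ACQUIRE ordering on the winning cmpxchg and RELEASE ordering on the store that frees it, rather than full barriers on both sides (a sketch, kernel context assumed; this is not actual kernel locking code):

#include <linux/atomic.h>

struct toy_lock {
    atomic_t val;   /* 0 == unlocked, 1 == locked */
};

static inline void toy_lock_acquire(struct toy_lock *l)
{
    /* ACQUIRE only on the successful 0 -> 1 transition: nothing from
     * the critical section can be hoisted above this point.
     * cpu_relax() comes from the arch headers. */
    while (atomic_cmpxchg_acquire(&l->val, 0, 1) != 0)
        cpu_relax();
}

static inline void toy_lock_release(struct toy_lock *l)
{
    /* RELEASE: all accesses inside the critical section are visible
     * before other CPUs can observe the lock as free again. */
    smp_store_release(&l->val.counter, 0);
}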

October 05, 2015 06:54 AM

September 24, 2015

Eric Sandeen: No, XFS won’t steal your money

So, the Inquirer runs a story by Chris Merriman today, titled “GreenDispenser malware threatens to take all your dosh from Linux ATMs” which includes this breathless little gem:

GreenDispenser targets the XFS file system, a popular standard for ATMs, originally designed for IRIX but now widely used in Linux. ATMs that use Windows XP Embedded, which is still supported, are not thought to be at risk.

Of course, I found this interesting, and a bit odd.  Could the XFS filesystem possibly be at fault here?  And is the “large and lots” filesystem really used in ATMs?  Let’s see what Proofpoint, the security firm who discovered it, has to say about the subject:

Specifically, GreenDispenser like its predecessors interacts with the XFS middleware [4], which is widely adopted by various ATM vendors.

That handy link & footnote leads us to Wikipedia, which explains that “XFS middleware” refers to CEN/XFS, which is not in any way related to the XFS filesystem, or Linux, and is in fact Microsoft specific:

CEN/XFS or XFS (eXtensions for Financial Services) provides a client-server architecture for financial applications on the Microsoft Windows platform.

Nice job, Inquirer!  Nice job, Chris Merriman!

(As Jeff points out in the comments, The Inquirer has updated the article as of Sep 25, removing references to Linux and the XFS filesystem.)

September 24, 2015 06:49 PM

Matthew Garrett: Filling in the holes in Linux boot chain measurement, and the TPM measurement log

When I wrote about TPM attestation via 2FA, I mentioned that you needed a bootloader that actually performed measurement. I've now written some patches for Shim and Grub that do so.

The Shim code does a couple of things. The obvious one is to measure the second-stage bootloader into PCR 9. The perhaps less expected one is to measure the contents of the MokList and MokSBState UEFI variables into PCR 14. This means that if you're happy simply running a system with your own set of signing keys and just want to ensure that your secure boot configuration hasn't been compromised, you can simply seal to PCR 7 (which will contain the UEFI Secure Boot state as defined by the UEFI spec) and PCR 14 (which will contain the additional state used by Shim) and ignore all the others.

The grub code is a little more complicated because there are more ways to get it to execute code. Right now I've gone for a fairly extreme implementation. On BIOS systems, the grub stage 1 and 2 will be measured into PCR 9[1]. That's the only BIOS-specific part of things. From then on, any grub modules that are loaded will also be measured into PCR 9. The full kernel image will be measured into PCR 10, and the full initramfs will be measured into PCR 11. The command line passed to the kernel is in PCR 12. Finally, each command executed by grub (including those in the config file) is measured into PCR 13.

That's quite a lot of measurement, and there are probably fairly reasonable circumstances under which you won't want to pay attention to all of those PCRs. But you've probably also noticed that several different things may be measured into the same PCR, and that makes it more difficult to figure out what's going on. Thankfully, the spec designers have a solution to this in the form of the TPM measurement log.

Rather than merely extending a PCR with a new hash, software can extend the measurement log at the same time. This is stored outside the TPM and so isn't directly cryptographically protected. In the simplest form, it contains a hash and some form of description of the event associated with that hash. If you replay those hashes you should end up with the same value that's in the TPM, so for attestation purposes you can perform that verification and then merely check that specific log values you care about are correct. This makes it possible to have a system perform an attestation to a remote server that contains a full list of the grub commands that it ran and for that server to make its attestation decision based on a subset of those.
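
The replay itself is just the PCR extend operation applied repeatedly over the log entries. A simplified sketch, assuming SHA-1 PCR banks, already-parsed log entries and OpenSSL for the hashing (the real TCG event log format has more structure than this):

#include <string.h>
#include <openssl/sha.h>

#define PCR_SIZE SHA_DIGEST_LENGTH  /* 20 bytes for SHA-1 banks */

/* One already-parsed log entry: which PCR was extended, and the digest
 * recorded for the event. */
struct log_event {
    int pcr_index;
    unsigned char digest[PCR_SIZE];
};

/* Replay the log from all-zero PCRs: PCR = SHA1(PCR || event digest).
 * The result can then be compared with the values read from the TPM. */
static void replay_log(const struct log_event *events, size_t count,
                       unsigned char pcrs[][PCR_SIZE], size_t num_pcrs)
{
    unsigned char buf[2 * PCR_SIZE];
    size_t i;

    memset(pcrs, 0, num_pcrs * PCR_SIZE);

    for (i = 0; i < count; i++) {
        const struct log_event *ev = &events[i];

        memcpy(buf, pcrs[ev->pcr_index], PCR_SIZE);
        memcpy(buf + PCR_SIZE, ev->digest, PCR_SIZE);
        SHA1(buf, sizeof(buf), pcrs[ev->pcr_index]);
    }
}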

No promises as yet about PCR allocation being final or these patches ever going anywhere in their current form, but it seems reasonable to get them out there so people can play. Let me know if you end up using them!

[1] The code for this is derived from the old Trusted Grub patchset, by way of Sirrix AG's Trusted Grub 2 tree.

comment count unavailable comments

September 24, 2015 01:21 AM

September 20, 2015

Matthew Garrett: The Internet of Incompatible Things

I have an Amazon Echo. I also have a LIFX Smart Bulb. The Echo can integrate with Philips Hue devices, letting you control your lights by voice. It has no integration with LIFX. Worse, the Echo developer program is fairly limited - while the device's built in code supports communicating with devices on your local network, the third party developer interface only allows you to make calls to remote sites[1]. It seemed like I was going to have to put up with either controlling my bedroom light by phone or actually getting out of bed to hit the switch.

Then I found this article describing the implementation of a bridge between the Echo and Belkin Wemo switches, cunningly called Fauxmo. The Echo already supports controlling Wemo switches, and the code in question simply implements enough of the Wemo API to convince the Echo that there's a bunch of Wemo switches on your network. When the Echo sends a command to them asking them to turn on or off, the code executes an arbitrary callback that integrates with whatever API you want.

This seemed like a good starting point. There's a free implementation of the LIFX bulb API called Lazylights, and with a quick bit of hacking I could use the Echo to turn my bulb on or off. But the Echo's Hue support also allows dimming of lights, and that seemed like a nice feature to have. Tcpdump showed that asking the Echo to look for Hue devices resulted in similar UPnP discovery requests to it looking for Wemo devices, so extending the Fauxmo code seemed plausible. I signed up for the Philips developer program and then discovered that the terms and conditions explicitly forbade using any information on their site to implement any kind of Hue-compatible endpoint. So that was out. Thankfully enough people have written their own Hue code at various points that I could figure out enough of the protocol by searching Github instead, and now I have a branch of Fauxmo that supports searching for LIFX bulbs and presenting them as Hues[2].

Running this on a machine on my local network is enough to keep the Echo happy, and I can now dim my bedroom light in addition to turning it on or off. But it demonstrates a somewhat awkward situation. Right now vendors have no real incentive to offer any kind of compatibility with each other. Instead they're all trying to define their own ecosystems with their own incompatible protocols with the aim of forcing users to continue buying from them. Worse, they attempt to restrict developers from implementing any kind of compatibility layers. The inevitable outcome is going to be either stacks of discarded devices speaking abandoned protocols or a cottage industry of developers writing bridge code and trying to avoid DMCA takedowns.

The dystopian future we're heading towards isn't Gibsonian giant megacorporations engaging in physical warfare, it's one where buying a new toaster means replacing all your lightbulbs or discovering that the code making your home alarm system work is now considered a copyright infringement. Is there a market where I can invest in IP lawyers?

[1] It also requires an additional phrase at the beginning of a request to indicate which third party app you want your query to go to, so it's much more clumsy to make those requests compared to using a built-in app.
[2] I only have one bulb, so as yet I haven't added any support for groups.

comment count unavailable comments

September 20, 2015 09:22 PM

September 18, 2015

Daniel Vetter: XDC 2015: Atomic Modesetting for Drivers

I've done a talk at XDC 2015 about atomic modesetting with a focus for driver writers. Most of the talk is an overview of how an atomic modeset looks and how to implement the different parts in a driver backend. Anyway, for all those who missed it, there's a video and slides.

September 18, 2015 03:27 PM

September 11, 2015

Pete Zaitcev: TLS Security In Firefox 40

What do people at Mozilla think is going to happen when I need to access a website and Firefox says that TLS parameters are insecure and thus I cannot? I'm going to use Chrome, that's what. Or maybe even a hacked Midori, where I can adjust build-time parameters of gcr.

That company went way downhill when they kicked Eich out.

September 11, 2015 06:33 PM

September 07, 2015

Daniel Vetter: Neat drm/i915 stuff for 4.3

Kernel 4.2 is already released and the 4.3 merge window is in full swing, so it's time to look at what's in it for the Intel graphics driver.



Biggest thing for sure is that Skylake is finally out of preliminary support and enabled by default. The reason for the long hold-up was some ABI fumble - the hardware exposes the topmost plane both through the new universal plane registers and the legacy cursor registers, and because we simply carried the legacy plane code around in the driver we ended up exposing both. This wasn't something big to take care of, but somehow it dragged on forever.

The other big thing is that legacy modesets are now done driver-internally with the new atomic modesetting code. Atomic support in i915.ko isn't fully ready for prime time yet, but this is definitely a big step forward. Besides atomic there are also other cross-platform improvements in the modeset code: Ville fixed up the 12bpc support for HDMI, which is now used by default if the screen supports it. Mika Kahola and Ville also implemented dynamic adjustment of the cdclk, which is the main clock source for display engines on Intel graphics. And there's a big difference in the clock speeds needed between e.g. a 4k screen and a 720p TV.

Continuing with power saving features, Rodrigo again spent a lot of time fixing up PSR (panel self refresh), and Paulo did the same by writing patches to improve FBC (framebuffer compression). We have some really solid testcases by now; unfortunately neither feature is ready to be enabled by default yet. Especially PSR is still plagued by screen freezes on some random systems. There have also been some fixes to DRRS (dynamic refresh rate switching) from Ramalingam. DRRS is already enabled by default, where supported. And finally some improvements to make the frontbuffer rendering tracking more accurate, which is used by all three of these display power saving features.

And of course there's also tons of improvements to platform code. Display PLL code for Skylake and Valleyview & Cherryview was tuned by Damien and Ville respectively. There's been tons of work on Broxton and DSI support by Imre, Gaurav and others.

Moving on to the rendering side, the big change is how tracking of rendering tasks is handled. In the past the driver just used raw sequence numbers emitted by the hardware, but for cross-driver synchronization and reordering tasks with an eventual gpu scheduler more abstraction is needed. A big step is converting over to the i915 request structure completely, done by John Harrison. The next step will be to switch the internal implementation for i915 requests to the cross-driver fences, but that's for future kernels. As a follow-up cleanup John also removed the OLR, which stands for outstanding lazy request. It was a neat little trick implemented years ago to simplify error-recovery handling, but it caused tons of pain with subtle bugs. Making requests explicit in the driver finally allowed us to remove this trick.

There's also been a pile of platform related features: MOCS programming for Skylake/Broxton (which is used for caching control). Resource streamer support from Abdiel, which is used to offload some of the buffer object tracking for shaders from the cpu to the gpu. And the command parser on Haswell was extended to support atomic instructions in shaders. And finally for Skylake Mika Kuoppala added code to avoid resetting the gpu - in certain cases the hardware would hard-hang the entire system trying to execute the reset. And a dead gpu is still better than a dead system.

September 07, 2015 09:40 AM

September 04, 2015

Andy Grover: RHEL 7.2 has an updated kernel target

As mentioned in the beta release notes, the kernel in RHEL 7.2 contains a rebased LIO kernel target, to the equivalent of the Linux 4.0.stable series.

This is a big update. LIO has improved greatly since 3.10. It has added support for SCSI features that enable VMWare VAAI support, as well as data integrity (DIF), and significant iSER work, for those of you using Infiniband. (SRP is also supported, as well as iSCSI and FCoE, of course.)

Note that we still do not ship support for the Fibre Channel qla2xxx fabric. It still seems to be something storage vendors and integrators want, more than a feature our customers are telling us they want in RHEL.

(On a side note, Infiniband hardware is pretty affordable these days! For all you datacenter hobbyists who have a rack in the garage, I might suggest a cheap previous-gen IB setup and either SRP or iSER as the way to go and still get really high IOPs.)

Users of RHEL 7’s SCSI target should find RHEL 7.2 to be a very nice upgrade. Please try the beta out and report any issues you find of course, but it’s looking really good so far.

September 04, 2015 09:50 PM

Pavel Machek: Wifi fun and misc..

(And an apology for the SSD entry some time back. Apparently yes, they can fail to retain data after less than a week... at the very end of their lifetime.)

In the last weeks, I learned that transferring real-time data over WiFi is way more fun than I thought. And that it is possible to communicate from inside a (closed) microwave oven using 2.4GHz WiFi. I don't know about you, but it scares me a little.

N900 and not everything is a file

Pocket Computer. We had pocket computers before ... the Sharp Zaurus line was a prominent example. They had keyboards and resistive touchscreens... A resistive touchscreen with a stylus is accurate enough to serve as a mouse replacement. Unfortunately, such machines are slowly going extinct. Sure, we have quad-core Full-HD smartphones these days... but they lack keyboards, making ssh from them impossible, they lack an accurate pointing device, and they are really phones, not small computers. The N900 can almost be used as a pocket computer...

New Mer is "broken beyond repair" for the n900, as it uses qt5. qt4 works well (well... a little slow) on the n900, but qt5 needs stable egl drivers. Ok, so that was another nice-looking trap. I'm starting to think that a text-only user interface is the right thing to do on the n900 at this point.
Baking the n900 for 15 minutes at 250C seems to have fixed the "no sim card" problem... for a week. It now seems a bit flaky, but definitely better than before baking. Thanks to everyone at Czech BrmLab!
To back up the mmc card on the N900, I'd like to rsync root@maemo:/dev/mmcblk1 mmcblk1.img ... but that does not work, as rsync is too clever and refuses to transfer the content of special files. Is there a trick I'm missing?

On the n900 front... it has 256MiB of RAM and an 800x480 screen. What web browser would you recommend for that? I tried links2, but its support is not good enough to render the m.mobilecity.com pages properly... which I'd kind of like.

Linus, please reconsider -rc0

Hmm. There's a big difference between 4.1 (expected to be a pretty stable kernel) and 4.2-rc0 (which is probably going to be as unstable as it gets). Unfortunately, Linus does not change the Makefile before merging, so it is quite tricky to tell if
Linux amd 4.1.0 #25 SMP Wed Jul 1 11:20:22 CEST 2015 x86_64 GNU/Linux
is the expected-to-be-stable 4.1, or the expected-to-be-very-unstable 4.2-rc0...

It's tempting to name your branches simply "v4.1", "v3.11". Don't. When the -rc's are done, Linus will create the "v4.1" tag, and you'll have fun figuring out what went wrong in your git.

Google play bloatware

I got a very cheap LG Optimus Chic... and Android did improve since the G1 days. It's still Google's spying empire, but... at least it is fluid and mostly works.
Not sure what "Google Play services" are good for, but taking 50MB of internal flash is not funny... and when moved to the SD card, the SD card tends to disconnect. "Google Play Store" still works without them. "My Tracks" needs them, but 60MB of flash is not a reasonable price to pay for GPX recording. "Pubtran" got removed, too. MHDdroid has a strange interface, but perhaps it will not need that much storage.
Do you know a way to search Czech public transport without Android and without a desktop browser or Opera Mini? m.idos.cz leads to the "full" version.

And... dear Android, a "force close" dialog is the last thing I want to see after hearing a ringtone. If you could at least add the number to the call log...

Feeling cheated


Wed Jul  1 01:59:58 CEST 2015
Wed Jul  1 01:59:59 CEST 2015
Wed Jul  1 02:00:00 CEST 2015
Wed Jul  1 02:00:01 CEST 2015
Wed Jul  1 02:00:02 CEST 2015
Wed Jul  1 02:00:03 CEST 2015
Different power supply for X60

The Thinkpad X60 is marked as 20V, 3.25A. I wonder if using a 19V, 2.63A power supply is a good idea. The power brick is way smaller, and 65W seems a little high for a small notebook.

September 04, 2015 10:04 AM

September 03, 2015

Gustavo F. Padovan: Linux Kernel Engineer opportunity at Collabora!

Collabora is a software consultancy specialising in bringing companies and the open source software community together, and it is currently looking for a Core Software Engineer to work on the Linux kernel and/or all the plumbing around the kernel. In this role the engineer will be part of a worldwide team that works with our clients to solve their Linux kernel and low-level stack technical problems.

Collabora is well-known for its strong relationship with upstream development, so an important part of this role is making significant contributions to upstream projects.

Visit our jobs page or talk to me and I'll put you in contact with our Hiring Team!

September 03, 2015 08:44 PM

Paul E. Mc Kenney: Stupid RCU Tricks: Hand-over-hand traversal of linked list using SRCU

Suppose that a very long linked list was to be protected with SRCU. Let's also make the presumably unreasonable assumption that this list is so long that we don't want to stay in a single SRCU read-side critical section for the whole traversal.

So why not try hand-over-hand SRCU protection, as shown in the following code fragment?

struct foo {
  struct list_head list;
  ...
};

LIST_HEAD(mylist);
struct srcu_struct mysrcu;

void process(void)
{
  int i1, i2;
  struct foo *p;

  i1 = srcu_read_lock(&mysrcu);
  list_for_each_entry_rcu(p, &mylist, list) {
    do_something_with(p);
    i2 = srcu_read_lock(&mysrcu);
    srcu_read_unlock(&mysrcu, i1);
    i1 = i2;
  }
  srcu_read_unlock(&mysrcu, i1);
}


The trick is that on each pass through the loop, we enter a new SRCU read-side critical section, then exit the old one. That way the entire traversal is protected by SRCU, but each SRCU read-side critical section is quite short, covering traversal of but a single element of the list.

As is customary with SRCU, the list is manipulated using list_add_rcu(), list_del_rcu(), and friends.
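
To make the updater side concrete, here is a minimal sketch of my own (not from the article), assuming a hypothetical mylist_lock spinlock serializes updaters:

void remove_foo(struct foo *p)
{
  spin_lock(&mylist_lock);      /* hypothetical lock serializing updaters */
  list_del_rcu(&p->list);
  spin_unlock(&mylist_lock);

  /* Wait for any pre-existing SRCU readers that might still hold a
   * reference to p.  With the hand-over-hand reader above, this only
   * has to wait out short per-element read-side critical sections,
   * not a whole-list traversal. */
  synchronize_srcu(&mysrcu);
  kfree(p);
}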

What are the advantages and disadvantages of this hand-over-hand SRCU list traversal?

September 03, 2015 05:20 AM

August 31, 2015

Matthew Garrett: Working with the kernel keyring

The Linux kernel keyring is effectively a mechanism to allow shoving blobs of data into the kernel and then setting access controls on them. It's convenient for a couple of reasons: the first is that these blobs are available to the kernel itself (so it can use them for things like NFSv4 authentication or module signing keys), and the second is that once they're locked down there's no way for even root to modify them.

But there's a corner case that can be somewhat confusing here, and it's one that I managed to crash into multiple times when I was implementing some code that works with this. Keys can be "possessed" by a process, and have permissions that are granted to the possessor orthogonally to any permissions granted to the user or group that owns the key. This is important because it allows for the creation of keyrings that are only visible to specific processes - if my userspace keyring manager is using the kernel keyring as a backing store for decrypted material, I don't want any arbitrary process running as me to be able to obtain those keys[1]. As described in keyrings(7), keyrings exist at the session, process and thread levels of granularity.

This is absolutely fine in the normal case, but gets confusing when you start using sudo. sudo by default doesn't create a new login session - when you're working with sudo, you're still working with key possession that's tied to the original user. This makes sense when you consider that you often want applications you run with sudo to have access to the keys that you own, but it becomes a pain when you're trying to work with keys that need to be accessible to a user no matter whether that user owns the login session or not.

I spent a while talking to David Howells about this and he explained the easiest way to handle this. If you do something like the following:
$ sudo keyctl add user testkey testdata @u
a new key will be created and added to UID 0's user keyring (indicated by @u). This is possible because the keyring defaults to 0x3f3f0000 permissions, giving both the possessor and the user read/write access to the keyring. But if you then try to do something like:
$ sudo keyctl setperm 678913344 0x3f3f0000
where 678913344 is the ID of the key we created in the previous command, you'll get permission denied. This is because the default permissions on a key are 0x3f010000, meaning that the possessor has permission to do anything to the key but the user only has permission to view its attributes. The cause of this confusion is that although we have permission to write to UID 0's keyring (because the permissions are 0x3f3f0000), we don't possess it - the only permissions we have for this key are the user ones, and the default state for user permissions on new keys only gives us permission to view the attributes, not change them.

But! There's a way around this. If we instead do:
$ sudo keyctl add user testkey testdata @s
then the key is added to the current session keyring (@s). Because the session keyring belongs to us, we possess any keys within it and so we have permission to modify the permissions further. We can then do:
$ sudo keyctl setperm 678913344 0x3f3f0000
and it works. Hurrah! Except that if we log in as root, we'll be part of another session and won't be able to see that key. Boo. So, after setting the permissions, we should:
$ sudo keyctl link 678913344 @u
which ties it to UID 0's user keyring. Someone who logs in as root will then be able to see the key, as will any processes running as root via sudo. But we probably also want to remove it from the unprivileged user's session keyring, because that's readable/writable by the unprivileged user - they'd be able to revoke the key from underneath us!
$ sudo keyctl unlink 678913344 @s
will achieve this, and now the key is configured appropriately - UID 0 can read, modify and delete the key, other users can't.
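
The same sequence can also be done programmatically with libkeyutils. This is a minimal sketch of my own (not from the post), run under sudo and linked with -lkeyutils; 0x3f3f0000 is possessor-all plus user-all, while the default 0x3f010000 is possessor-all plus user-view only.

#include <keyutils.h>
#include <stdio.h>

int main(void)
{
  /* Add to the session keyring first: we possess it, so we are
   * allowed to change the new key's permissions. */
  key_serial_t key = add_key("user", "testkey", "testdata", 8,
                             KEY_SPEC_SESSION_KEYRING);
  if (key < 0) {
    perror("add_key");
    return 1;
  }

  /* 0x3f3f0000: all permissions for both possessor and user. */
  if (keyctl_setperm(key, 0x3f3f0000) < 0)
    perror("keyctl_setperm");

  /* Make it reachable from UID 0's user keyring ... */
  if (keyctl_link(key, KEY_SPEC_USER_KEYRING) < 0)
    perror("keyctl_link");

  /* ... and drop it from the unprivileged session keyring. */
  if (keyctl_unlink(key, KEY_SPEC_SESSION_KEYRING) < 0)
    perror("keyctl_unlink");

  printf("key id: %d\n", key);
  return 0;
}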

This is part of our ongoing work at CoreOS to make rkt more secure. Moving the signing keys into the kernel is the first step towards rkt no longer having to trust the local writable filesystem[2]. Once keys have been enrolled the keyring can be locked down - rkt will then refuse to run any images unless they're signed with one of these keys, and even root will be unable to alter them.

[1] (obviously it should also be impossible to ptrace() my userspace keyring manager)
[2] Part of our Secure Boot work has been the integration of dm-verity into CoreOS. Once deployed this will mean that the /usr partition is cryptographically verified by the kernel at runtime, making it impossible for anybody to modify it underneath the kernel. / remains writable in order to permit local configuration and to act as a data store, and right now rkt stores its trusted keys there.


August 31, 2015 05:18 PM

August 26, 2015

James Morris: Linux Security Summit 2015 – Wrapup, slides

The slides for all of the presentations at last week’s Linux Security Summit are now available at the schedule page.

Thanks to all of those who participated, and to all the events folk at Linux Foundation, who handle the logistics for us each year, so we can focus on the event itself.

As with the previous year, we followed a two-day format, with most of the refereed presentations on the first day and more of a developer focus on the second day.  We had good attendance, and this year also had participants from a wider field than the more typical kernel security developer group.  We hope to continue expanding the scope of participation next year, as it’s a good opportunity for people from different areas of security, and FOSS, to get together and learn from each other.  This was the first year, for example, that we had a presentation on Incident Response, thanks to Sean Gillespie who presented on GRR, a live remote forensics tool initially developed at Google.

The keynote by kernel.org sysadmin, Konstantin Ryabitsev, was another highlight, one of the best talks I’ve seen at any conference.

Overall, it seems the adoption of Linux kernel security features is increasing rapidly, especially via mobile devices and IoT, where we now have billions of Linux deployments out there, connected to everything else.  It’s interesting to see SELinux increasingly play a role here, on the Android platform, in protecting user privacy, as highlighted in Jeffrey Vander Stoep’s presentation on whitelisting ioctls.  Apparently, some major corporate app vendors, who were not named, have been secretly tracking users via hardware MAC addresses, obtained via ioctl.

We’re also seeing a lot of deployment activity around platform Integrity, including TPMs, secure boot and other integrity management schemes.  It’s gratifying to see the work our community has been doing in the kernel security/ tree being used in so many different ways to help solve large scale security and privacy problems.  Many of us have been working for 10 years or more on our various projects  — it seems to take about that long for a major security feature to mature.

One area, though, where I feel we need significantly more work is kernel self-protection: hardening the kernel so that coding flaws are harder to exploit.  I’m hoping that we can find ways to work with the security research community on incorporating more hardening into the mainline kernel.  I’ve proposed this as a topic for the upcoming Kernel Summit, as we need buy-in from core kernel developers.  I hope we’ll have topics to cover on this at next year’s LSS.

We overlapped with Linux Plumbers, so LWN was not able to provide any coverage of the summit.  Paul Moore, however, has published an excellent write-up on his blog. Thanks, Paul!

The committee would appreciate feedback on the event, so we can make it even better for next year.  We may be contacted via email per the contact info at the bottom of the event page.

August 26, 2015 07:09 PM

August 19, 2015

Matt Domsch: Dell Desktop / Notebook Linux Engineering position available

Come help Dell ensure Linux “just works!” on Dell notebooks, desktops, and devices! The Dell Client Linux Engineering team has an opening for a Senior Software Engineer. This team works closely with the Linux community, device manufacturers, and Dell engineering teams to provide the best Linux experience across the entire client product line.

Visit the Dell Jobs site to apply. If you’re a friend of mine and are interested, drop me a line and I’ll make sure you get in front of the hiring manager quickly!

August 19, 2015 09:31 PM

August 18, 2015

Matthew Garrett: Canonical's deliberately obfuscated IP policy

I bumped into Mark Shuttleworth today at Linuxcon and we had a brief conversation about Canonical's IP policy. The short summary:


The even shorter summary: Canonical won't clarify their IP policy because they believe they can make more money if they don't.

Why do I keep talking about this? Because Canonical are deliberately making it difficult to create derivative works, and that's one of the core tenets of the definition of free software. Their IP policy is fundamentally incompatible with our community norms, and that's something we should care about rather than ignoring.


August 18, 2015 07:02 PM

August 17, 2015

Andi Kleen: Announcing simple-pt — A simple Processor Trace implementation

Modern Intel Core CPUs (5th and 6th generation) have an Intel Processor Trace (PT) feature to trace branch execution with low overhead. This is useful for performance analysis and debugging.

simple-pt is a simple standalone driver and decoder tool to implement PT on Linux.

Starting with Linux 4.1, the kernel already has an integrated PT implementation in perf (see https://lwn.net/Articles/648154/). simple-pt is an alternative implementation. It has many disadvantages compared to the perf PT implementation, such as:
- needs to run as root
- no long term tracing or sampling with interrupts
- no support for interactive debugging (use gdb 7.10 on perf for that)
- no support for histograms
- somewhat experimental
- not as well supported as perf

On the positive side simple-pt is:
- simple
- standalone. No kernel changes needed. Could be ported to older kernels or other operating systems
- easy to modify and experiment with
- more ftrace like decoding tool
- support for kprobes based triggers
- modular “unix style” design with simple tools that do only one thing each
- BSD licensed

Example output:


        % sptcmd  -c tcall taskset -c 0 ./tcall
        cpu   0 offset 1027688,  1003 KB, writing to ptout.0
        ...
        Wrote sideband to ptout.sideband
        % sptdecode --sideband ptout.sideband --pt ptout.0 | less
        TIME      DELTA  INSNs   OPERATION
        frequency 32
        0        [+0]     [+   1] _dl_aux_init+436
                          [+   6] __libc_start_main+455 -> _dl_discover_osversion
        ...
                          [+  13] __libc_start_main+446 -> main
                          [+   9]     main+22 -> f1
                          [+   4]             f1+9 -> f2
                          [+   2]             f1+19 -> f2
                          [+   5]     main+22 -> f1
                          [+   4]             f1+9 -> f2
                          [+   2]             f1+19 -> f2
                          [+   5]     main+22 -> f1
        ...

Available from https://github.com/andikleen/simple-pt

August 17, 2015 04:27 AM

August 16, 2015

Daniel Vetter: Atomic Modesetting Design Overview

After a few years of development the atomic display update IOCTL for drm drivers is finally ready for prime time with the 4.2 pull request from Dave Airlie. It's been a long road, with a lot of drivers already converted over to atomic and even more in progress, and the atomic helper libraries and support code in the drm subsystem sufficiently polished. But what was really missing was a design overview of what the overall atomic infrastructure looks like and why some decisions and details are implemented the way they are.

That's now done and published on LWN: Part 1 talks about the problem space, issues with the Android atomic display framework and the basic atomic IOCTL interface. Part 2 goes into more detail about a few specific things like locking, helper library design and the exact semantics of atomic modesetting updates. Happy Reading!

August 16, 2015 01:52 PM

August 15, 2015

Rusty Russell: Broadband Speeds, New Data

Thanks to edmundedgar on reddit I have some more accurate data to update my previous bandwidth growth estimation post: OFCOM UK, who released their November 2014 report on average broadband speeds.  Whereas Akamai numbers could be lowered by the increase in mobile connections, this directly measures actual broadband speeds.

Extracting the figures gives:

  1. Average download speed in November 2008 was 3.6Mbit
  2. Average download speed in November 2014 was 22.8Mbit
  3. Average upload speed in November 2014 was 2.9Mbit
  4. Average upload speed in November 2008 to April 2009 was 0.43Mbit/s

So in 6 years, downloads went up by 6.333 times, and uploads went up by 6.75 times.  That’s an annual increase of 36% for downloads and 37% for uploads; that’s good, as it implies we can use download speed factor increases as a proxy for upload speed increases (as upload speed is just as important for a peer-to-peer network).
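
For the curious, here is the compound-growth arithmetic behind those percentages as a quick C sketch (my own, not part of the post; compile with -lm):

#include <math.h>
#include <stdio.h>

int main(void)
{
  /* Compound annual growth rate over the 6 years 2008 -> 2014. */
  double dl = pow(22.8 / 3.6, 1.0 / 6.0) - 1.0;   /* downloads: ~0.36 */
  double ul = pow(2.9 / 0.43, 1.0 / 6.0) - 1.0;   /* uploads:   ~0.37 */

  printf("downloads: %.0f%% per annum\n", dl * 100);
  printf("uploads:   %.0f%% per annum\n", ul * 100);
  return 0;
}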

This compares with my previous post’s Akamai’s UK numbers of 3.526Mbit in Q4 2008 and 10.874Mbit in Q4 2014: only a factor of 3.08 (26% per annum).  Given how close Akamai’s numbers were to OFCOM’s in November 2008 (a year after the iPhone UK release, but probably too early for mobile to have significant effect), it’s reasonable to assume that mobile plays a large part of this difference.

If we assume Akamai’s numbers reflected real broadband rates prior to November 2008, we can also use them to extend the OFCOM data back a year: this is important since there was almost no bandwidth growth according to Akamai from Q4 2007 to Q4 2008; ignoring that period gives a rosier picture than my last post, and smells of cherrypicking data.

So, let’s say the UK went from 3.265Mbit in Q4 2007 (Akamai numbers) to 22.8Mbit in Q4 2014 (OFCOM numbers).  That’s a factor of 6.98, or 32% increase per annum for the UK. If we assume that the US Akamai data is under-representing Q4 2014 speeds by the same factor (6.333 / 3.08 = 2.056) as the UK data, that implies the US went from 3.644Mbit in Q4 2007 to 11.061 * 2.056 = 22.74Mbit in Q4 2014, giving a factor of 6.24, or 30% increase per annum for the US.

As stated previously, China is now where the US and UK were 7 years ago, suggesting they’re a reasonable model for future growth for that region.  Thus I revise my bandwidth estimates; instead of 17% per annum this suggests 30% per annum as a reasonable growth rate.

August 15, 2015 04:54 AM

August 14, 2015

Pete Zaitcev: Tablet Uber Alles Or Is It

Given the trouble with modern laptops, I'm seriously wondering if I should make the jump to a gigantic tablet with a keyboard. You run "make" on a VM. Not enough RAM? Order more in the cloud! The idea was planted in my mind by that jerk Atwood, who penned an article claiming the death of the PC. And a month ago I saw someone at a Python meetup using Canopy. It kinda worked, actually. I expect Github Atom to be even better.

Unfortunately, there are problems in 3 broad categories still.

First, the hotspot Internet connectivity sucks. It is plain unreliable. VPN, ssh, and IRC are often blocked; it's necessary to remember the "Connectivity Through Anything" lessons and techniques. When it works, it's often slow. These problems extend to venues such as Intel's Executive Briefing Center. If "executives" eating their awesome snacks cannot obtain decent WiFi, what hope do I have? I do not have cellphone data, but I hear bitching about it.

Second, the usual questions about privacy and security apply. Non-proprietary tablets suck immensely, from what I heard.

Third, tablets top out at 10..11 inches. Sorry, but that is not enough to kill laptops while laptops continue to be made. Certainly, Atwood made an argument that as tablets absorb users, PC makers will stop making them. The day the last one quits, we'll have to use the least shitty tablet regardless of size. But today is not that day.

UPDATE: 3 weeks after this post, Apple unveiled a 12.9" (2732 x 2048) iPad Pro, with a keyboard as a factory option.

August 14, 2015 09:38 PM

Pete Zaitcev: User-facing hardware

New business trip, new hardware pictures.

It's been almost a year, and I'm still looking for a decent laptop, same criteria. I saw a couple of guys using the Lenovo X1 Carbon, which looks good. Most importantly, the left Ctrl now extends to its proper position. Almost a winner, but unfortunately, there are issues. Apparently, the screen on the X1 does not sit flat against the main frame when it's closed, so a bundle of clothing pressing in the middle between the hinges is capable of making a nasty crack in the plastic. Not acceptable for what is a $1,400 laptop even with Amazon's "discount" of $900. Way to go, Lenovo. Almost had me this time.

Meanwhile, a $500 Dell Vostro continues to soldier on. It's showing its age: building Ceph with "make -j${N}" requires more RAM than it has for any reasonable N, and dialog windows have started to outgrow its screen (notably, some of the GNOME preferences). I still need a laptop, but can't find a suitable one. The Lenovo X1 tops out at 8GB, which was another strike against it.

I was a little sad when Google stopped making the Nexus 7. I have the 2013 version and it is quite good. In the same meeting, I bumped into a guy with a projected update to the Nexus 7 that became orphaned when Google pulled the plug. ASUS continued to build them and market them as the "MemoPad 7". However, taking a page from the Microsoft playbook with their "Surface" and "Surface Pro", ASUS sells "MemoPad 7" versions ranging from a worthless piece of junk with 1024x600 to actual Nexus 7 replacements with 1920x1200. Allegedly, the battery life and speed are much improved by using Intel's embedded Atom core. Some of the ARM-optimized apps may not work (an example is some kind of music editing thing for podcasters).

August 14, 2015 09:18 PM

August 13, 2015

Dave Jones: The case of the mysterious disappearing I211

Day one of unemployed life saw me finally getting around to the first of several hardware-related maintenance items that I’ve been putting off until I had the time.

I got a lot of life out of my desktop machine that I had been using since 2007. Earlier this year, I decided it was long overdue an upgrade, and ended up building a ridiculously over-specced machine in the hopes it too would last me a while. After some research, I ended up with a 6-core Haswell-E i7-5820K, and a frankly ridiculously over-featured motherboard.
Once I had delved through the absurd number of BIOS options to convince it that I *really* didn’t want to overclock my CPU or my RAM, or anything else, it was very stable.

It has exceeded all my expectations. In the time it took my old desktop to build one kernel, I can build kernel .deb’s for every machine I own, and still have time spare. It’s an absolute beast.

One of the features that sold me on this board was the two onboard ethernet ports. I had been wanting to do a bunch of networking experiments, and the possibility of using bonding, without having to screw around with add-in cards was appealing.

So I was a little irked one evening after updating its BIOS, to notice that the bond only had one interface active. After some investigation, I noticed that the PCI ID of one of the onboard NICs had changed.

What was once

00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (2) I218-V (rev 05)
08:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)

Was now

00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (2) I218-V (rev 05)
08:00.0 Ethernet controller: Intel Corporation Device 1532 (rev 03)

My I211 had changed its PCI ID, and the e1000 driver wouldn’t bind to this new device.

At first I thought “Cool, some kind of NIC firmware update”, and assumed that e1000 hadn’t been updated yet to support this new feature. Googling for “i211 1532” told a much sadder story however.

If you read the spec update for the i211, you find this interesting table:

I211 Device ID Code                         Vendor ID  Device ID  Revision ID
WGI211AT (not programmed/factory default)   0x8086     0x1532     0x3
WGI211AT (programmed)                       0x8086     0x1539     0x3

Uh, not cool. Somehow the BIOS update procedure had wiped the NVRAM on the NIC.

A long protracted conversation with ASUS support followed, including such gems as “I understand you’re seeing blue screens” and “Have you tried removing the DIMMs, rubbing the contacts with an eraser and replacing them”. Eventually I think they got to the end of their script, and agreed to RMA the board. Somewhat annoying, given there’s probably a tool somewhere that can rewrite the flash, but Intel only seems to make that available to integrators, not end-users, and the ASUS representatives denied all knowledge.

It was gone for about two weeks, and finally returned yesterday. Its PCI ID is 0x1539 again, and it has its old MAC address once more. (I’m now hesitant to ever upgrade the BIOS on this machine again). So what happened? Anyone’s guess, but this isn’t the first time I’ve seen this happen. We had a bunch of these NICs at Akamai too that occasionally had the same thing happen to them.

The whole thing is reminiscent of a painful old bug where ftrace would corrupt the e1000e ROM. Hopefully Linux isn’t to blame this time.

So, long story short: If you see an i211 with a PCI ID of 1532, you’re looking at an RMA.


August 13, 2015 04:09 PM