Kernel Planet

September 21, 2014

Michael Kerrisk (manpages): man-pages-3.73 is released

I've released man-pages-3.73. The release tarball is available on The browsable online pages can be found on The Git repository for man-pages is available on

The most notable changes in man-pages-3.73 are various new and modified pages describing namespaces in general, and user and PID namespaces in detail:

September 21, 2014 11:39 AM

September 16, 2014

Matthew Garrett: ACPI, kernels and contracts with firmware

ACPI is a complicated specification - the latest version is 980 pages long. But that's because it's trying to define something complicated: an entire interface for abstracting away hardware details and making it easier for an unmodified OS to boot diverse platforms.

Inevitably, though, it can't define the full behaviour of an ACPI system. It doesn't explicitly state what should happen if you violate the spec, for instance. Obviously, in a just and fair world, no systems would violate the spec. But in the grim meathook future that we actually inhabit, systems do. We lack the technology to go back in time and retroactively prevent this, and so we're forced to deal with making these systems work.

This ends up being a pain in the neck in the x86 world, but it could be much worse. Way back in 2008 I wrote something about why the Linux kernel reports itself to firmware as "Windows" but refuses to identify itself as Linux. The short version is that "Linux" doesn't actually identify the behaviour of the kernel in a meaningful way. "Linux" doesn't tell you whether the kernel can deal with buffers being passed when the spec says it should be a package. "Linux" doesn't tell you whether the OS knows how to deal with an HPET. "Linux" doesn't tell you whether the OS can reinitialise graphics hardware.

Back then I was writing from the perspective of the firmware changing its behaviour in response to the OS, but it turns out that it's also relevant from the perspective of the OS changing its behaviour in response to the firmware. Windows 8 handles backlights differently to older versions. Firmware that's intended to support Windows 8 may expect this behaviour. If the OS tells the firmware that it's compatible with Windows 8, the OS has to behave compatibly with Windows 8.

In essence, if the firmware asks for Windows 8 support and the OS says yes, the OS is forming a contract with the firmware that it will behave in a specific way. If Windows 8 allows certain spec violations, the OS must permit those violations. If Windows 8 makes certain ACPI calls in a certain order, the OS must make those calls in the same order. Any firmware bug that is triggered by the OS not behaving identically to Windows 8 must be dealt with by modifying the OS to behave like Windows 8.

This sounds horrifying, but it's actually important. The existence of well-defined[1] OS behaviours means that the industry has something to target. Vendors test their hardware against Windows, and because Windows has consistent behaviour within a version[2] the vendors know that their machines won't suddenly stop working after an update. Linux benefits from this because we know that we can make hardware work as long as we're compatible with the Windows behaviour.

That's fine for x86. But remember when I said it could be worse? What if there were a platform that Microsoft weren't targeting? A platform where Linux was the dominant OS? A platform where vendors all test their hardware against Linux and expect it to have a consistent ACPI implementation?

Our even grimmer meathook future welcomes ARM to the ACPI world.

Software development is hard, and firmware development is software development with worse compilers. Firmware is inevitably going to rely on undefined behaviour. It's going to make assumptions about ordering. It's going to mishandle some cases. And it's the operating system's job to handle that. On x86 we know that systems are tested against Windows, and so we simply implement that behaviour. On ARM, we don't have that convenient reference. We are the reference. And that means that systems will end up accidentally depending on Linux-specific behaviour. Which means that if we ever change that behaviour, those systems will break.

So far we've resisted calls for Linux to provide a contract to the firmware in the way that Windows does, simply because there's been no need to - we can just implement the same contract as Windows. How are we going to manage this on ARM? The worst case scenario is that a system is tested against, say, Linux 3.19 and works fine. We make a change in 3.21 that breaks this system, but nobody notices at the time. Another system is tested against 3.21 and works fine. A few months later somebody finally notices that 3.21 broke their system and the change gets reverted, but oh no! Reverting it breaks the other system. What do we do now? The systems aren't telling us which behaviour they expect, so we're left with the prospect of adding machine-specific quirks. This isn't scalable.

Supporting ACPI on ARM means developing a sense of discipline around ACPI development that we simply haven't had so far. If we want to avoid breaking systems we have two options:

1) Commit to never modifying the ACPI behaviour of Linux.
2) Expose an interface that indicates which well-defined ACPI behaviour a specific kernel implements, and bump that whenever an incompatible change is made. Backward compatibility paths will be required if firmware only supports an older interface.

(1) is unlikely to be practical, but (2) isn't a great deal easier. Somebody is going to need to take responsibility for tracking ACPI behaviour and incrementing the exported interface whenever it changes, and we need to know who that's going to be before any of these systems start shipping. The alternative is a sea of ARM devices that only run specific kernel versions, which is exactly the scenario that ACPI was supposed to be fixing.
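To make option (2) a little more concrete, here is a minimal sketch of what such a negotiation could look like, written in Python for brevity. Everything in it is hypothetical: the revision numbers, the `KERNEL_ACPI_INTERFACES` table and the `negotiate_interface` helper are invented for illustration and do not correspond to anything in Linux or in the ACPI specification.

```python
# Hypothetical sketch only: these names and revision numbers are invented
# for illustration and exist neither in Linux nor in the ACPI spec.

# Behaviour revisions this (imaginary) kernel can still emulate.
# Dropping an entry here is what would break older firmware.
KERNEL_ACPI_INTERFACES = (1, 2, 3)


def negotiate_interface(firmware_supported, kernel_supported=KERNEL_ACPI_INTERFACES):
    """Return the newest behaviour revision both sides understand, or None."""
    common = set(firmware_supported) & set(kernel_supported)
    return max(common) if common else None


if __name__ == "__main__":
    # Firmware tested against revisions 1 and 2: the kernel must fall back
    # to its revision 2 compatibility path.
    print(negotiate_interface([1, 2]))
    # Firmware that only knows a revision the kernel has never heard of.
    print(negotiate_interface([4]))
```

The point of the sketch is the bookkeeping rather than the code: somebody has to decide when a change is incompatible enough to add a new revision, and the kernel has to keep the older paths working for as long as firmware that only knows them is in the field.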

[1] Defined by implementation, not defined by specification
[2] Windows may change behaviour between versions, but always adds a new _OSI string when it does so. It can then modify its behaviour depending on whether the firmware knows about later versions of Windows.

September 16, 2014 10:51 PM

Andy Grover: Emacs and using multiple C code styles

I primarily work on Linux, so I put this in my Emacs config:

; Linux mode for C
(setq c-default-style
      '((c-mode . "linux") (other . "gnu")))

However, other projects like QEMU have their own style preferences. So here’s what I added to use a different style for that. First, I found the qemu C style defined here. Then, to only use this on some C code, we attach a hook that only overrides the default C style if the filename contains “qemu”, an imperfect but decent-enough test.

(defconst qemu-c-style
  '((indent-tabs-mode . nil)
    (c-basic-offset . 4)
    (tab-width . 8)
    (c-comment-only-line-offset . 0)
    (c-hanging-braces-alist . ((substatement-open before after)))
    (c-offsets-alist . ((statement-block-intro . +)
                        (substatement-open . 0)
                        (label . 0)
                        (statement-cont . +)
                        (innamespace . 0)
                        (inline-open . 0)))
    (c-hanging-braces-alist .
                            ((block-close . c-snug-do-while)
                             ;; structs have hanging braces on open
                             (class-open . (after))
                             ;; ditto if statements
                             (substatement-open . (after))
                             ;; and no auto newline at the end
                             (class-close))))
  "QEMU C Programming Style")

(c-add-style "qemu" qemu-c-style)

(defun maybe-qemu-style ()
  (when (and buffer-file-name
             (string-match "qemu" buffer-file-name))
    (c-set-style "qemu")))

(add-hook 'c-mode-hook 'maybe-qemu-style)

September 16, 2014 01:16 AM

September 12, 2014

Dave Jones: Trinity threading improvements and misc

Since my blogging tsunami almost a month ago, I’ve been pretty quiet. The reason being that I’ve been heads down working on some new features for trinity which have turned out to be a lot more involved than I initially anticipated.

Trinity does all of its work in child processes continually forked off from a main process. For a long time I’ve had “investigate using pthreads” as a TODO item, but after various conversations at kernel summit, I decided to bump the priority of that up a little, and spend some time looking at it. I initially guessed that it would take maybe a few weeks to have something usable, but after spending some time working on it, every time I make progress on one issue, it becomes apparent that there’s something else that is also going to need changing.

I’m taking a week off next week to clear my head and hopefully return to this work with fresh eyes, and make more progress, because so far it’s been mostly frustrating, and there may be an easier way to solve some of the problems I’ve been hitting. Sidenote: In the 15+ years I’ve been working on Linux, this is the first time I recall actually ever using pthreads in my own code. I can’t say I’ve been missing out.

Unrelated to that work, a month or so ago I came up with a band-aid fix for a problem where trinity would corrupt its own structures. That ‘fix’ turned out to break the post-mortem work I implemented a few months prior, so I’ve spent some time this week undoing that, and thinking about how I’m going to fix that properly. But before coming up with a fix, I needed to reproduce the problem reliably, and naturally now that I’ve added debug code to determine where the corruption is coming from, the bug has gone into hiding.

I need this vacation.

September 12, 2014 08:14 PM

September 07, 2014

Michael Kerrisk (manpages): man-pages-3.72 is released

I've released man-pages-3.72. The release tarball is available on The browsable online pages can be found on The Git repository for man-pages is available on

This is a small release; the more notable changes in man-pages-3.72 are the addition of three new pages by Peter Schiffer that document glibc commands used for memory profiling and malloc tracing:

September 07, 2014 01:36 PM

September 06, 2014

Pavel Machek: Fraud attempt from DAD GmbH

Got snail mail from DAD GmbH, Postfach 11 35 68, 20435. I should update my business info (which I never gave to them) and by submitting updated info, they would charge me 500 euro (small notice so that you are likely to miss it). I hope they go to jail for this.

September 06, 2014 09:38 PM

September 04, 2014

James Morris: New GPG Key

Just an FYI, I lost my GPG key a few months back during an upgrade, and have created a new one.  This was signed by folk at LinuxCon/KS last month.

The new key ID / fingerprint is: D950053C / 8327 23D0 EF9D D46D 9AC9  C03C AD98 4BBF D950 053C

Please use this key and not the old one!

September 04, 2014 09:38 PM

September 03, 2014

Pavel Machek: Boot shell

Yesterday I got an electric shock. Yes, the device was supposed to be turned off by a remote-controlled outlet, but it was still stupid of me to play with it.

Have you ever played the "press any key to stop autoboot" game, followed by copying boot commands from your notes, because you wanted to keep the boot loader in its original (early project phases) or final (late project phases) configuration? Have you reached level 2, playing the autoboot game over the internet?

If so, you may want to take a look at boot shell (bs) from the Not Universal Test System project. In the ideal case, it knows how to turn the target off and on, break into autoboot, boot your target in development mode, and log in as root when userland is ready.

September 03, 2014 09:09 AM

August 29, 2014

Daniel Vetter: Review Training Slides

We currently have a large influx of new people contributing to i915 - for the curious just check the git logs. As part of ramping them up I've done a few trainings about upstream review, and a bunch of people I've talked with at KS in Chicago were interested in that, too. So I've cleaned up the slides a bit and dropped the very few references to Intel internal resources. No speaker notes or video recording, but I think this is useful all in itself. And of course if you have comments or see big gaps - feedback is very much welcome:

Upstream Review Training Slides

August 29, 2014 04:14 PM

August 21, 2014

Michael Kerrisk (manpages): man-pages-3.71 is released

I've released man-pages-3.71. The release tarball is available on The browsable online pages can be found on The Git repository for man-pages is available on

As well as many smaller fixes to various pages, the more notable changes in man-pages-3.71 are the following:

August 21, 2014 01:27 PM

August 19, 2014

Rusty Russell: POLLOUT doesn’t mean write(2) won’t block: Part II

My previous discovery that poll() indicating an fd was writable didn’t mean write() wouldn’t block led to some interesting discussion on Google+.

It became clear that there is much confusion over read and write; eg. Linus thought read() was like write() whereas I thought (prior to my last post) that write() was like read(). Both wrong…

Both Linux and v6 UNIX always returned from read() once data was available (v6 didn’t have sockets, but they had pipes). POSIX even suggests this:

The value returned may be less than nbyte if the number of bytes left in the file is less than nbyte, if the read() request was interrupted by a signal, or if the file is a pipe or FIFO or special file and has fewer than nbyte bytes immediately available for reading.

But write() is different. Presumably so simple UNIX filters didn’t have to check the return and loop (they’d just die with EPIPE anyway), write() tries hard to write all the data before returning. And that leads to a simple rule.  Quoting Linus:

Sure, you can try to play games by knowing socket buffer sizes and look at pending buffers with SIOCOUTQ etc, and say “ok, I can probably do a write of size X without blocking” even on a blocking file descriptor, but it’s hacky, fragile and wrong.

I’m travelling, so I built an Ubuntu-compatible kernel with a printk() into select() and poll() to see who else was making this mistake on my laptop:

cups-browsed: (1262): fd 5 poll() for write without nonblock
cups-browsed: (1262): fd 6 poll() for write without nonblock
Xorg: (1377): fd 1 select() for write without nonblock
Xorg: (1377): fd 3 select() for write without nonblock
Xorg: (1377): fd 11 select() for write without nonblock

This first one is actually OK; fd 5 is an eventfd (which should never block). But the rest seem to be sockets, and thus probably bugs.
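For contrast, here is a minimal sketch of the safe pattern: mark the descriptor non-blocking first, and only then treat POLLOUT as a hint that some bytes can be written. It is in Python rather than C purely for brevity, `write_when_ready` is a name invented for this example, and `select.poll` assumes a Unix-like system.

```python
import select
import socket


def write_when_ready(sock, data, timeout_ms=1000):
    """Write as much of data as the socket accepts; return the byte count."""
    # The crucial step: set O_NONBLOCK before trusting POLLOUT, so a write
    # larger than the free buffer space returns short instead of blocking.
    sock.setblocking(False)
    poller = select.poll()
    poller.register(sock.fileno(), select.POLLOUT)
    sent = 0
    while sent < len(data):
        if not poller.poll(timeout_ms):
            break  # peer is not draining the buffer; give up for now
        try:
            sent += sock.send(data[sent:])
        except BlockingIOError:
            continue  # POLLOUT was only a hint; poll again
    return sent


if __name__ == "__main__":
    a, b = socket.socketpair()
    # Nobody reads from b, so only part of this large buffer can be queued.
    queued = write_when_ready(a, b"x" * (8 * 1024 * 1024), timeout_ms=100)
    print("queued", queued, "bytes without blocking")
    a.close()
    b.close()
```

With a blocking descriptor the same `send()` of a large buffer would simply sit there until the peer read enough data, no matter what poll() had said a moment earlier.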

What’s worse is the Linux select() man page:

       A file descriptor is considered ready if it is possible to
       perform the corresponding I/O operation (e.g., read(2)) without
       ... those in writefds will be watched to see if a write will
       not block...

And poll():

		Writing now will not block.

Man page patches have been submitted…

August 19, 2014 01:57 PM

August 15, 2014

Dave Jones: A breakdown of Linux kernel networking related issues from Coverity scan

For the last of these breakdowns, I’ll focus on fifth place: networking.

Linux supports many different network protocols, so I spent quite a while splitting the net/ tree into per-protocol components. The result looks like this.

Net-802 8
Net-Bluetooth 15
Net-CAIF 9
Net-Core 11
Net-DCCP 5
Net-IRDA 17
Net-NFC 11
Net-SCTP 18
Net-SunRPC 21
Net-Wireless 9
Net-XFRM 6
Net-bridge 14
Net-ipv4 24
Net-ipv6 16
Net-mac80211 12
Net-sched 5
everything else 124

The networking code has gotten noticeably better over the last year. When I initially introduced these components they were all well into double figures. Now, even crap like DECNET has gotten better (both users will be very happy).

“Everything else” above is actually a screw-up on my part. For some reason around 50 or so netfilter issues haven’t been categorized into their appropriate component. The remaining ~70 are quite a mix, but nearly all small numbers of issues in many components. Things like 9p, atm, ax25, batman, can, ceph, l2tp, rds, rxrpc, tipc, vmwsock, and x25. The Lovecraftian protocols you only ever read about.

So networking is in pretty good shape considering just how much stuff it supports. While there’s 24 issues in a common protocol like ipv4, they tend to be mostly benign things rather than OMG 24 WAYS THE NSA IS OWNING YOUR LINUX RIGHT NOW.

That’s the last of these breakdowns I’ll do for now. I’ll do this again maybe in six months to a year, if things are dramatically different, but I expect any changes to be minor and incremental rather than anything too surprising.

After I get back from kernel summit and recover from travelling, I’ll start a series of posts showing code examples of the top checkers.

August 15, 2014 09:37 PM

Dave Jones: Breakdown of Linux kernel wireless drivers in Coverity scan

In fourth place on the list of hottest areas of the kernel as seen by Coverity, is drivers/net/wireless.

rtlwifi 96
Atheros 74
brcm80211 67
mwifiex 33
b43 16
iwlwifi 15
everything else 65

I mentioned in my drivers/staging examination that the realtek wifi drivers stood out as especially problematic. Here we see the same situation. Larry Finger has been working on cleaning up this (and other drivers) for some time, but it apparently still has a long way to go.

It’s worth noting that “Atheros” here is actually a number of drivers (ar5523, ath10k, ath5k, ath6k, ath9k, carl9170, wcn36xx, wil6210). I’ve not had time to break those down into smaller components yet, though a quick look shows that ath9k in particular accounts for a sizable portion of those 74 issues.

I was actually surprised at how low the iwlwifi and b43 counts were. I guess there’s something to be said for ubiquitous hardware.

What of all the ancient wireless drivers ? The junky pcmcia/pccard drivers like orinoco and friends ?
They’re in with those 65 “everything else” bugs, and account for fewer than 5-6 issues each. Considering their age, and lack of any real maintenance these days, they’re in surprisingly good shape.

Just for fun, here’s how the drivers above compare against the wireless drivers currently in staging.

rtl8821 102 (Staging)
rtlwifi 96
Atheros 74
brcm80211 67
rtl8188eu 42 (Staging)
mwifiex 33
rtl8712 22 (Staging)
rtl8192u 21 (Staging)
rtl8192e 17 (Staging)
b43 16
iwlwifi 15
everything else 65

August 15, 2014 09:12 PM

Dave Jones: A breakdown of Linux kernel filesystem issues in Coverity scans

The filesystem code shows up in the number two position of the list of hottest areas of the kernel. Like the previous post on drivers/scsi, this isn’t because “the filesystem code is terrible”, but more that Linux supports so many filesystems that the cumulative effect of issues present in all of them adds up to a figure that dominates the statistics.

The breakdown looks like this.

fs/*.c 77
9P 3
EXTn 36
GFS2 12
HFSPlus 4
NFS 24
OCFS2 35
Reiserfs 12
UDF 14
XFS 33

fs/*.c accounts for the VFS core, AIO, binfmt parsers, eventfd, epoll, timerfd’s, xattr code and a bunch of assorted miscellany. Little wonder it shows up so high, it’s around 62,000 LOC by itself. Of all the entries on the list, this is perhaps the most concerning area given it affects every filesystem.

A little more concerning perhaps is that btrfs is so high on the list. Btrfs is still seeing a lot of churn each release, so many of these issues come and go, but it seems to be holding roughly at the same rate of new incoming issues each release.

EXTn counts for ext2, ext3, and ext4 combined. Not too bad considering that’s around 74,000 LOC combined. (and another 15K LOC for jbd/jbd2)

The CIFS, NFS and OCFS filesystems stand out as potentially something that might be of concern, especially if those issues are over-the-wire trigger-able.

XFS has been improving over the past year. It was around 60-70 when I started doing regular scans, and continues to move downward each release, with few new issues getting added.

The remaining filesystems: not too shabby. Especially considering some of the niche ones don’t get a lot of attention.

August 15, 2014 03:40 PM

Dave Jones: A closer look at drivers/scsi Coverity scans.

drivers/scsi showed up in third place in the list of hottest areas of the kernel. Breaking it down into sub-components, it looks like this.

aic7xxx 15
be2iscsi 15
bfa 26
bnx2fc 6
csiostor 10
isci 11
lpfc 38
megaraid 10
mpt2sas 17
mpt3sas 15
pm8001 9
qla2xxx 42
qla4xxx 17
Everything else 152

All these components have been steadily improving over the last year. The obvious stand-out is “Everything else” that looks like it needs to be broken out into more components.
But drivers/scsi is one area of the kernel where we have a *lot* of legacy drivers, many of them 10-15 years old. (Remarkably, some of these are even still in regular use). Looking over the list of filenames matching the “Everything else” component, pretty much every driver that isn’t broken out into its own component is on the list. 3w-9xxx, NCR5380, aacraid, advansys, aic94xx, arcmsr, atp870, bnx2i, cxgbi, dc395x, dpt_i2o, eata, esas2, fdomain, fnic, gdth, hpsa, imm, ipr, ips, mvsas, mvumi, osst, pmcraid, qla1280, qlogicfas, stex, storvsc_drv, sym53x8xx, tmscsim.
None of these are particularly worse than the others, most averaging less than a half dozen issues each.

Ignoring the problems I currently have adding more components, it’s not particularly helpful to break it down further when the result is going to be components with a half dozen issues. It’s not that there’s a few awful drivers dragging down the average, it’s that there’s so many of them, and they all contribute a little bit of awful.

Something I’d like to component-ize, but can’t easily without crafting and maintaining ugly regexps, is the core scsi functionality and its libraries. The problem is that drivers/scsi/*.c includes both legacy drivers, and also scsi core functionality & library functions. I discussed potentially moving all the old drivers to a “legacy” or “vintage” sub-directory at LSF/MM earlier this year with James, but he didn’t seem overly enthusiastic. So it’s going to continue to be lumped in with “Everything else” for now.

The difficulty with figuring out whether many of these issues are real concerns is that because they are hardware drivers, the scanner has no way of knowing what range of valid responses the HBA will return. So there are a number of issues which are of the form “This can’t actually happen, because if the HBA returned this, then we would have called this other function instead”.
Not a problem unique to SCSI, and something that’s seen across many different parts of the kernel.

And for those ancient 15 year old drivers ? It’s tough to find someone who either remembers how they work on a chip level, or cares enough to go back and revisit them.

August 15, 2014 02:59 PM

Dave Jones: drivers/staging under the Coverity microscope.

In my previous post, I mentioned that drivers/staging took the top spot for number of issues in a component.

Here’s a ‘zoomed in’ look at the sub-components under drivers/staging.

bcm 103
comedi 45
iio 13
line6 7
lustre 133
media 10
rtl8188eu 42
rtl8192e 17
rtl8192u 21
rtl8712 22
rtl8821 102
rts5208 19
unisys 14
vt6655 47
vt6656 4
everything else in drivers/staging/ (40 other uncategorized drivers) 95

Some of the sub-components with < 10 issues are likely to have their categories removed soon. When they were initially added, the open issues counts were higher, but over time they’ve improved to the point where they could just be lumped in with “everything else”

When Lustre was added back in 3.12, it caused a noticeable jump in new issues detected. The largest delta from any one single addition since I’ve been doing regular scans. It’s continuing to make progress, with 20 or so issues being knocked out each release, and few new issues being introduced. Lustre doesn’t suffer from any one issue overly, but has a grab-bag of issues from the many checkers that Coverity has.
Amusingly, Lustre is the only part of the kernel that has Coverity annotations in the code.

Second on the list is the bcm Wimax driver. This has been around in staging for years, and has had a metric shitload of checkpatch type stylistic changes made to it, but relatively few actual functionality fixes. (confession: I was guilty of ~30 of those cleanups myself, but I couldn’t bear to look at the 1906 line bcm_char_ioctl function: Splitting that up did have a nice side-effect though). A lot of the issues in this driver are duplicates due to a problem in a macro being picked up as a new issue for every instance it gets used.

Something that sticks out in this list is the cluster of rtl* drivers. At time of writing there are seven drivers for various Realtek wireless chips, all of varying quality. Much of the code between these drivers is cut-and-pasted from previous drivers. It seems each time Realtek revs new silicon, they do another code-drop with a new driver. Worse yet, many of the fixes that went into the kernel variants don’t make it back to the driver they based their new work on. There have been numerous cases where a bug fixed in one driver has been reintroduced in a new variant months later. There’s a ton of work going on here, and a lot more needed.
Somewhat depressingly, even the not-in-staging rtlwifi driver that lives in drivers/net/wireless has ~100 issues. Many of them the exact same issues as those in the staging drivers.

As bad as it seems, staging is serving its purpose for the most part, and things have gotten a lot quieter each merge window when the staging tree gets pulled. It’s only when it contains something new and huge like Lustre that it really shows up noticeably in the daily stats after each scan. The number of new issues being added is generally lower than the number being fixed. For the 3.17 pull for example, 67 new issues, 132 eliminated. (Note: Those numbers are kernel wide, not *just* staging, but staging made up the majority of the results change on that day).

Something that bothers me slightly is that a number of drivers have ‘escaped’ drivers/staging into the kernel proper, with a high number of open issues. That said, many of those escapees are no worse than drivers that were added 10+ years ago when standards were lower. More on that in a future post.

August 15, 2014 01:50 AM

Dave Jones: Linux kernel Coverity scan ‘hot’ areas.

One of the time-consuming parts of organizing the data generated by Coverity has been sorting it into categories (or components as Coverity refers to them). A component is a wildcard (or exact filename) that matches a specific subsystem, driver, filesystem etc.

As the Linux kernel has thousands of drivers, it isn’t really practical to add a component per-driver, so I started by generalizing into subsystems, and from there, broke down the larger groupings into per-driver components, while still leaving an “everything else” catch-all for drivers within a subsystem that hadn’t been broken out.
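As a rough illustration of how wildcard components with a catch-all behave, here is a small Python sketch. The component names and patterns are made up for the example, and this is not how Coverity implements components internally; it only shows why ordering matters when a specific pattern and a subsystem-wide pattern both match.

```python
from fnmatch import fnmatch

# Hypothetical component table, most specific patterns first. The names and
# patterns are illustrative, not the real ones used in the kernel scans.
COMPONENTS = [
    ("staging-lustre", "drivers/staging/lustre/*"),
    ("staging-other", "drivers/staging/*"),
    ("scsi-qla2xxx", "drivers/scsi/qla2xxx/*"),
    ("scsi-other", "drivers/scsi/*"),
]


def component_for(path):
    """Return the first component whose wildcard matches the file path."""
    for name, pattern in COMPONENTS:
        # fnmatch's "*" also matches "/" so one pattern covers a whole subtree.
        if fnmatch(path, pattern):
            return name
    return "everything else"


if __name__ == "__main__":
    for path in ("drivers/staging/lustre/lnet/lnet/api-ni.c",
                 "drivers/scsi/hpsa.c",
                 "mm/slab.c"):
        print(path, "->", component_for(path))
```

Because the first match wins, a broken-out driver has to be listed ahead of its subsystem's catch-all, otherwise its issues silently land in the “everything else” bucket.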

According to discussions I’ve had with Coverity, we are actually one of the more ‘heavy’ users of components, and we’ve hit a few scalability problems as we’ve added more and more of them, which has been another reason I’ve not broken things down more than the ~150 components we have so far. Also, if a component has fewer than 10 or so issues, it’s really not worth the effort of splitting up. (I may revise that cut-off even higher at some point just to keep things manageable).

Before the big reveal, some caveats:

Right now, the top ten ‘hot areas’ of the kernel (these include accumulated broken-out drivers), sorted by number of issues are:

drivers/staging 694
fs/ 465
drivers/scsi/ 382
drivers/net/wireless 366
net/ 324
drivers/ethernet/ 285
drivers/media/ 262
drivers/usb/ 140
drivers/infiniband/ 109
arch/x86/ 95
sound/ 89

It should come as no surprise really that the staging drivers take the number one spot. If something had beaten it, I think it would have highlighted a somewhat embarrassing problem in our development methods.

In the next posts, I’ll drill down into each of these categories, and figure out exactly why they’re at the top of the list.

For the impatient: once this series is over, I intend to show breakdowns of the various types of issues being detected, but it’s going to take me a while to get to (probably post kernel summit). There’s an absolute ton of data to dig through, and I’m trying to present as much of it in bite-sized chunks as possible, rather than dozens of pages of info.

August 15, 2014 01:34 AM

August 13, 2014

Dave Jones: The first year of Coverity Linux kernel scans.

Next week at kernel summit, I’m going to be speaking about the Coverity scans, and have come up with more material than I have time to present in the short slot, so I’ve decided to turn it into a series of blog posts in the hope of kickstarting some discussion ahead of time.

I started doing regular scans against the Linux kernel in July 2013. In that time, I’ve sent a bunch of patches, reported many bugs, and spent hours going through the database categorizing, diagnosing, and closing out issues where possible.

I’ve been doing at least one build per day during each merge window (except obviously on days when there haven’t been any commits), and at least one per -rc once the merge window closes.

A few people have asked me about the config file that I use for the builds.
It’s pretty much an ‘allmodconfig’, except where choices have to be made, I’ve tried to pick the general case that a distribution would select. For some of these, I will occasionally flip between them (e.g. SLAB/SLOB/SLUB, PREEMPT_NONE/PREEMPT_VOLUNTARY/PREEMPT) just for coverage. In total, currently 6955 CONFIG_ options are enabled, 117 disabled. (None by choice, they are all the deselected parts of multi-choice options).

The builds are done on x86-64 only. At this time, it’s the only architecture Coverity scan supports. I do have CONFIG_COMPILE_TEST set, so non-x86 drivers that can be built do get scanned. The architecture specific code in arch/ and drivers not covered under COMPILE_TEST are the only parts of the kernel we’re not covering.

Builds take about an hour on a 24-core Nehalem. The results are then uploaded to a server which takes another 20 minutes. Then a script kicks something at Coverity to pick up the new tarball and scan it. This can take any number of hours. At best, around 5-6 hours, at worst I’ve seen it take as long as 12 hours. This hopefully answers why I don’t do even more builds, or builds of variant trees. (Although I’m still trying to figure out a way to scan linux-next while having it inherit the results of the issues already marked in Linus’ tree). Thankfully much of the build/upload/scan process is automated, so I can do other things while I wait for it to finish.

Over the year, the overall defect density has been decreasing.

3.11 0.68
3.12 0.62
3.13 0.59
3.14 0.55
3.15 0.55
3.16 0.53

Moving in the right direction, though things have slowed a little the last few releases. At least in part due to my spending more time on Trinity than going through the Coverity backlog. The good news is that the incoming rate of new bugs each window has also slowed.

Newer issues, when they are introduced, are getting jumped on faster than before. Many developers have signed up for accounts and are looking over their subsystems each release, which is great. It means I have to spend less time sending email :)
Eventually I hope that Coverity implements a feature I asked for allowing each component to have a designated email address that new reports get sent to. With that in place, plus active triage on the backlog, a real dent could be made in the ~4700 outstanding issues.

Throughout the past year Coverity has made a number of improvements server-side, some at the behest of the scans, resulting in fewer false positives being found by some checkers. A good example of this was some additional heuristics being added to spot intentional ‘missing break in switch statement’ situations. I’ve also been in constant communication whenever an interesting bug was found upstream that Coverity didn’t detect, so over time, additional checkers should be added to catch more bugs.

How do we compare against other projects?
I picked a few at random.

FreeBSD 0.54 (~15m LOC) 14655 total, 6446 fixed, 8093 outstanding.
Firefox 0.70 (~5.4m LOC) 9008 total, 5066 fixed, 3786 outstanding.
Linux 0.53 (~9m LOC) 13337 total, 7202 fixed, 4761 outstanding.
Python 0.03 ! (~400k LOC) 1030 total, 895 fixed, 3 outstanding.

(LOC based on C preprocessor output)

FreeBSD’s defect density is pretty much the same as Linux’s right now, despite having a lot more code. I think they include all their userspace in their scans also, so it’s picked up gcc, sendmail, binutils, etc.

The Python people have made a big effort to keep their defect density low (afaik, the lowest of all projects in scan). They did however have a lot fewer issues to begin with, and have a much smaller codebase. Firefox by comparison seems to have a lot of the same problems Linux has: a large corpus of pre-existing issues, and a large codebase (probably with few people with ‘global’ knowledge).

In my next post, I’ll go into some detail about where some of the more dense areas of the kernel are for Coverity issues. Much of it should be no surprise (old, unmaintained/neglected code etc), but there are a few interesting cases.

update : added FreeBSD statistics.
update 2 : (hi hackernews!) added blurb about coverity improvements.

The first year of Coverity Linux kernel scans. is a post from:

August 13, 2014 07:07 PM

Paul E. McKenney: A practitioner at a formal-methods conference

I had the privilege of being asked to present on ordering, RCU, and validation at a joint meeting of the REORDER (Third International Workshop on Memory Consistency Models) and EC2 (7th International Workshop on Exploiting Concurrency Efficiently and Correctly) workshops.

Before my talk, Michael Tautschnig gave a presentation (based on this paper) on an interesting prototype tool (called “mole,” with the name chosen because the gestation period of a mole is about 42 days) that helps identify patterns of usage in large code bases. It is early days for this tool, but one could imagine it growing into something quite useful, perhaps answering questions such as “what are the different ways in which the Linux kernel uses reference counting?” He also took care to call out the many disadvantages of testing, which include not being able to test all paths, all possible races, all possible inputs, or all possible much of anything, at least not in finite time.

I started my talk with an overview of split counters, where each CPU (or task or whatever) updates its own counter, and the aggregate counter is read out by summing all the per-CPU counters. There was some concern expressed by members of the audience about the possibility of inconsistent results. For example, if one CPU adds five, another CPU adds seven, and a third CPU adds 11 to an initially zero counter, then two CPUs reading out the counter might see 12 and 18, respectively, which are inconsistent (they differ by six, and no CPU added six). To their credit, the attendees picked right up on a reasonable correctness criterion. The idea is that the aggregate counter's value varies with time, and that any given reader will be guaranteed to return a value between that of the counter when the reader started and that of the counter when the reader ended: Consistency is neither needed nor provided in a great number of situations.
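The split-counter idea can be sketched in a few lines of C11. This is only an illustration under stated assumptions, not the code from the talk: the names (split_counter, NR_SLOTS) are invented here, and a fixed array of padded slots stands in for real per-CPU variables.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_SLOTS 4   /* hypothetical: one slot per CPU or thread */

/* One counter per updater, each aligned to its own cache line. */
struct split_counter {
    struct {
        _Alignas(64) atomic_long value;
    } slot[NR_SLOTS];
};

/* Update path: each updater touches only its own slot. */
static void split_counter_add(struct split_counter *c, int slot, long n)
{
    assert(slot >= 0 && slot < NR_SLOTS);
    atomic_fetch_add_explicit(&c->slot[slot].value, n, memory_order_relaxed);
}

/* Read path: sum the slots one at a time, with no global snapshot. */
static long split_counter_read(struct split_counter *c)
{
    long sum = 0;

    for (int i = 0; i < NR_SLOTS; i++)
        sum += atomic_load_explicit(&c->slot[i].value, memory_order_relaxed);
    return sum;
}
```

Because the reader visits the slots one after another, two concurrent readers can observe different subsets of in-flight updates, which is where the 12-versus-18 example comes from. If updaters only ever add non-negative values, each returned sum still lies between the counter's value when that read began and its value when the read finished.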

I then gave my usual introduction to RCU, and of course it is always quite a bit of fun introducing RCU to people who have never encountered anything like it. There was quite a bit of skepticism initially, as well as a lot of questions and comments.

I then turned to validation, noting the promise of some formal-validation tooling. I ended by saying that although I agreed with the limitations of testing called out by the previous speaker, the fact remains that a number of people have devised tests that have found RCU bugs (thank you, Stephen, Dave, and Fengguang!), but no one has yet devised a hard-core formal-validation tool that has found any bugs in RCU. I also pointed out that this is definitely not because there are no bugs in RCU! (Yes, I have gotten rid of all day-one bugs in RCU, but only by having also gotten rid of all day-one code in RCU.) When asked if I meant bugs in RCU usage or in RCU itself, I replied “Either would be good.” Several people wrote down where to find RCU in the Linux kernel, so it will be interesting to see what they come up with. (Perhaps all too interesting!)

There were several talks on analyzing weakly ordered systems, but keep in mind that for these guys, even x86 is weakly ordered. After all, it allows prior stores to be reordered with later loads.

Another interesting talk was given by Kapil Vaswani on the topic of wait freedom. Recall that in a wait-free algorithm, every process is guaranteed to make some progress in a finite time, even in the presence of arbitrarily long delays for any given process. In contrast, in a lock-free algorithm, only at least one process is guaranteed to make some progress in a finite time, again, even in the presence of arbitrarily long delays for any given process. It is worth noting that neither of these guarantees is sufficient for real-time programs, which require a specified amount of progress (not merely some progress) in a bounded amount of time (not merely a finite amount of time). Wait-freedom and lock-freedom are nevertheless important forward-progress guarantees, and there are numerous other similar guarantees including obstruction freedom, deadlock freedom, starvation freedom, and many more besides.
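The difference between the two guarantees can be seen in a pair of small C11 routines. This is a sketch with invented names (counter_inc, counter_max), not something shown at the workshop, and the wait-free claim for the first routine assumes hardware with a native atomic fetch-and-add instruction; on other hardware the compiler may emit a retry loop.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * With a native fetch-and-add instruction, every caller finishes in a
 * bounded number of steps no matter what other threads do (wait-free).
 */
static long counter_inc(atomic_long *c)
{
    assert(c);
    return atomic_fetch_add(c, 1) + 1;
}

/*
 * Lock-free only: raise *c to at least candidate using a CAS retry loop.
 * A failed strong CAS means another thread changed *c, so the system as
 * a whole made progress, but this caller may have to retry repeatedly.
 * Returns the value *c is known to have reached.
 */
static long counter_max(atomic_long *c, long candidate)
{
    long seen;

    assert(c);
    seen = atomic_load(c);
    while (seen < candidate &&
           !atomic_compare_exchange_strong(c, &seen, candidate))
        ;   /* a failed CAS reloads seen with the current value */
    return seen < candidate ? candidate : seen;
}
```

Neither routine bounds how long a call takes in wall-clock terms, which is the gap between these guarantees and what real-time programs need.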

It turns out that most software in production, even in real-time systems, is not wait-free, which has been a source of consternation for many researchers for quite some time. Kapil went on to describe how Alistarh et al. showed that, roughly speaking, given a non-hostile scheduler and crash-free execution, lock-free algorithms have wait-free behavior.

The interesting thing about this is that you can take it quite a bit farther, and those of you who know me well won't be surprised to learn that I did just that in a question to the speaker. If you have a non-hostile scheduler, crash-free execution, FIFO locks, bounded lock-hold times, no lock nesting, a finite number of processes, and so on, you can obtain the benefits of the more aggressive forward-progress guarantees. The general idea is that if you have at most N processes, and if the maximum lock-hold time is T, then you can wait at most (N-1)T time to acquire a given lock. (Those wishing greater rigor should read Bjoern B. Brandenburg's dissertation — Full disclosure: I was on Bjoern's committee.) In happy contrast to the authors of the paper mentioned in the previous paragraph, the speaker and audience seemed quite intrigued by this line of thought.

In all, it was an extremely interesting and thought-provoking time. With some luck, perhaps we will see some powerful software tools introduced by this group of researchers.

August 13, 2014 03:40 AM

August 10, 2014

Matthew Garrett: Birthplace

For tedious reasons, I will at this stage point out that I was born in Galway, Ireland.

August 10, 2014 11:44 PM

August 08, 2014

Dave Jones: Week of kernel bugs in review

With the 3.17 merge window opening up this week, it’s been kinda busy.
I also made a few enhancements to Trinity, so it found some bugs that have been there for a while.

In addition to this, I started pulling together a talk for kernel summit based on all the stuff that Coverity has been finding. I’ll eventually get around to turning those into blog posts too, as there’s a lot of material.

Productive week.

August 08, 2014 07:36 PM

Dave Jones: compiler sanitizers.

I only recently discovered the sanitizer libraries that both gcc and llvm support despite them being a few years old now. (libasan, liblsan, libtsan and my favorite libubsan for undefined behaviour detection). LLVM also has a -fsanitize=memory.

Building code with -fsanitize={address|leak|undefined} has turned up a number of hard to find issues in various userspace code I’ve written. (Unfortunately doing this on something like Trinity produces a lot of false positives, as it deliberately generates undefined behavior in many cases, like creating an mmap, never writing to it, and then passing it to something that reads it).
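As a small illustration of the kind of bug these flags catch, here is a hypothetical helper (dup_string and its buggy parameter are invented for this sketch) with an off-by-one heap overflow that is selected at run time. Built normally, the buggy path usually appears to work; built with something like `cc -g -fsanitize=address,undefined`, assuming a reasonably recent gcc or clang, the sanitizer runtime should report a heap-buffer-overflow as soon as the buggy path runs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy src into a fresh heap buffer. When buggy is non-zero, the
 * allocation deliberately omits the byte for the terminating NUL, so
 * the memcpy() below writes one byte past the end of the buffer.
 */
static char *dup_string(const char *src, int buggy)
{
    size_t len = strlen(src);
    char *copy = malloc(buggy ? len : len + 1);

    assert(copy != NULL);
    memcpy(copy, src, len + 1);   /* always copies len + 1 bytes */
    return copy;
}
```

Calling dup_string("abc", 0) is well-defined and returns a normal copy; dup_string("abc", 1) is the one-byte overflow.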

There’s also a variant of libasan for the kernel which looks interesting. I know that’s found a bunch of issues in concert with fuzzing via Trinity, and expect it’s something we’ll see more of if/when that functionality gets merged.

Today I was reading about the recent gcc meeting, and these slides by the sanitizer developers caught my attention. What I found of particular interest was the “MSan for Chromium” slide, where they mention they rebuilt ~40 libraries to link with the sanitizer.

I’ve been contemplating doing this for a subset of some userspace packages in Fedora that I care about for a while, but I’ve not had spare cycles to even look into it. I dogfood a lot of bleeding edge code on all my machines, and have been curious for some time to see what the fallout looks like from such a rebuild of various network facing daemons. I suspect with Chromium being more focused on the client side, there hasn’t been a huge amount of research into this for server side code. Looking at ASan’s found bugs wiki page, it does seem to support that hypothesis. I’m curious to see what would fall out from a rebuilt Apache, Bind, Sendmail, nginx, etc.
Hopefully the developers of all the network facing code we ship are just as curious.

There are obvious comparisons to valgrind, which doesn’t require rebuilding, but in my experience so far, the sanitizers have found a bunch of issues that valgrind didn’t (or got lost in the noise). Also, just like with fuzzers, different tools tend to find different bugs even if they have the same intent. I think there’s room for both approaches.

August 08, 2014 06:54 PM

August 07, 2014

Daniel Vetter: Neat stuff for 3.17

So with the 3.16 kernel out of the door it's time to look at what's queued up for the Intel graphics driver in 3.17.

This release features the universal plane support from Matt Roper, already enabled by default. This is prep work for atomic modesetting and pageflipping support: For a while now we've supported additional (overlay) planes in the DRM core and the i915 driver, but there have always been two implicit planes directly attached to the CRTC: the primary plane used by the SetCrtc and PageFlip functions, and the optional cursor support. With the atomic ioctl, though, it's easier to handle these implicit planes as explicit planes, so Matt's patches split them away into separate real plane objects. This is a nice cleanup of the kms api in general since a lot of SoC hardware has unified plane hardware, where cursor, primary plane and any overlays are fully interchangeable. So we already expose this to userspace, if it sets the corresponding feature flag.

Another big feature on the display side is the improved PSR support, which is now enabled by default on Haswell and Broadwell. The tricky bit with PSR (and also with FBC), and the reason we didn't yet enable this by default, is correctly supporting legacy frontbuffer rendering (for example for X). The hardware provides a bit of support to do that, but it doesn't catch all possible frontbuffer rendering and has a lot of other limitations. To finally fix this for real we've added accurate frontbuffer tracking in software. This should finally allow us to enable a lot of display power saving features by default like PSR on Baytrail, FBC (on all platforms) and DRRS (dynamic refresh rate switching).

On actual platform display enabling we have lots of improvements all over: Baytrail MIPI DSI support has greatly stabilized, backlight and power sequencer fixes, mmio based flips to work around issues with stalls and hangs for blitter ring based flips and plenty of other work. The core drm pieces for plane rotation support have also landed; unfortunately the i915 parts didn't make the cut for 3.17.

Another big area, as usual, has been general power management improvements. We now support runtime PM for DPMS Off and not just when the output is completely disabled. This was fairly invasive work since our current modesetting code assumed that a DPMS Off/On cycle will not destroy register state, but that's exactly what runtime PM can do. On the plus side this reorganization greatly cleaned up the code base and prepared the driver for atomic modesetting, which requires a similar separation between state computation and actual hw state updating as this feature does.

Jesse Barnes implemented S0ix support for system suspend/resume. Marketing has some crazy descriptions for this, but essentially this means that we use the same power saving knobs for system suspend as for runtime PM - the entire machine is still running, just at a very low power state. Long-term this should simplify our system suspend code a bit since we can just reuse all the code used to implement runtime PM.

Moving on to the render side of the gpu, there have again been improvements to the rps code. Chris Wilson further tuned the rps boost logic, and Ville and Deepak implemented rps support for Cherrytrail.
Jesse contributed ppgtt support for Baytrail which will be a lot more interesting once we enable full ppgtt again (hopefully in 3.18).

For Broadwell, semaphores support from Ben and Rodrigo was merged, but it looks like we need to disable that again due to stability issues. Oscar Mateo also implemented a large pile of interrupt handling improvements which hopefully address the small races and bugs we've had in the past on some platforms. There's also a lot of refactoring patches to prepare for execlist support from Oscar. Execlists are the new way of submitting work to the gpu, first supported on Broadwell (but not yet mandatory). The key feature compared to legacy ringbuffer submission is that we'll finally be able to preempt gpu tasks.

And as usual there have been tons of bug fixes and improvements all over. Oh and: User mode setting has moved one step further on the path to deprecation and is now fully disabled. If no one complains about this we can finally rip out all that code in one of the next kernel releases.

August 07, 2014 03:36 PM

August 05, 2014

Dave Jones: Linux 3.16 coverity stats

date         rev        outstanding  fixed  defect density
Jun/8/2014   v3.15      4928         6397   0.55
Jun/16/2014  v3.16-rc1  4817         6651   0.53
Jun/23/2014  v3.16-rc2  4815         6653   0.53
Jun/29/2014  v3.16-rc3  4810         6659   0.53
Jul/6/2014   v3.16-rc4  4806         6661   0.53
Jul/14/2014  v3.16-rc5  4801         6663   0.53
Jul/21/2014  v3.16-rc6  4827         7022   0.53
Jul/28/2014  v3.16-rc7  4820         7022   0.53
Aug/4/2014   v3.16      4817         7023   0.53

The 3.16 cycle really started putting a dent in the backlog of older issues. Hundreds of older issues got fixed in -rc1.
There was a small bump at rc5 in new issues being detected, when Coverity upgraded to their 7.5.0 release.
Improvements in that upgrade also meant it closed out more issues than it found new (395 new, 409 eliminated).

Many of the new issues detected look to be real problems. 50 or so of them come from a new checker that looks for patterns like

if (condition)
        foo();
else
        foo();

In a lot of drivers, however, it seems to be intentional, as these cases come with FIXME comments suggesting that the author doesn’t know what the right thing to do is in the ‘else’ case, or some functionality doesn’t work right yet, so it falls back to doing the same thing in both branches.
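A hypothetical illustration of the intentional case follows. Every name in it is invented and it is not taken from any real driver; it only shows the shape of code where the author left identical branches behind a FIXME, which a checker for identical branches would presumably still flag.

```c
#include <assert.h>

enum link_speed { SPEED_SLOW, SPEED_FAST };   /* hypothetical */

static int pick_fifo_depth(enum link_speed speed)
{
    int depth;

    if (speed == SPEED_FAST)
        depth = 16;
    else
        depth = 16;   /* FIXME: unclear what slow links need; reuse the fast value */

    assert(depth > 0);
    return depth;
}
```

Triage for a report like this comes down to reading the comment: the checker is right that the condition has no effect, but only the author can say what the second branch should have done.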

It’s now been a year since I first started doing regular builds in Coverity. In that time, the detected defect density has dropped from 0.68 to 0.53 today. We used to see upticks in new issues every time the merge window opened. Now, we’re seeing as many issues closed as new ones detected, or more. (As an example: day 1 of the 3.17 merge window yesterday featured 3638 new changes, including all the questionable code in drivers/staging/. Coverity picked up 67 new issues, but 132 got eliminated.)

I’m hoping things continue to improve at this rate.

August 05, 2014 04:27 PM

Dave Jones: Linux 3.15 coverity stats

date         rev        outstanding  fixed  defect density
Mar/31/2014  v3.14      4811         6126   0.55
Apr/14/2014  v3.15-rc1  4909         6337   0.55
Apr/15/2014  v3.15-rc2  4881         6369   0.55
Apr/21/2014  v3.15-rc3  4878         6375   0.55
Apr/28/2014  v3.15-rc4  4966         6382   0.56
May/9/2014   v3.15-rc5  4960         6389   0.56
May/22/2014  v3.15-rc6  4956         6390   0.56
May/27/2014  v3.15-rc7  4954         6392   0.56
Jun/2/2014   v3.15-rc8  4932         6393   0.55
Jun/8/2014   v3.15      4928         6397   0.55

A belated dump of the statistics coverity gathers on defect density for the 3.15 kernel.
The most interesting thing for this cycle was the bump around rc4. The number of outstanding bugs increased by almost a hundred. This was due to Coverity implementing a new checker for detecting the Heartbleed bug in OpenSSL.

After dismissing a bunch of false positives/intentional cases, we ended the cycle with a delta vs the previous release of just over a hundred new outstanding issues, but the overall defect density remained at 0.55.

August 05, 2014 04:05 PM

Dave Jones: Vacation over, back to oopses.

First day back after a week off.
Started the day by screwing up a coverity build, because went away. Fixed up my scripts and started over, running the analysis on 3.16 which came out yesterday. I’ll do another report tomorrow on how things have changed over the last release or two (sudden realization that I haven’t done one since 3.14).

Spent some time in the afternoon beginning some new functionality in trinity, to have some shared files that all threads write/seek/truncate etc to. It didn’t take much code at all before I was staring at my first oops, when btrfs blew up. Pretty sad, given I haven’t even really started doing anything particularly interesting with this code yet.

August 05, 2014 03:15 AM

August 02, 2014

Rusty Russell: ccan/io: revisited

There are numerous C async I/O libraries; tevent being the one I’m most familiar with.  Yet, tevent has a very wide API, and programs using it inevitably descend into “callback hell”.  So I wrote ccan/io.

The idea is that each I/O callback returns a “struct io_plan” which says what I/O to do next, and what callback to call.  Examples are “io_read(buf, len, next, next_arg)” to read a fixed number of bytes, and “io_read_partial(buf, lenp, next, next_arg)” to perform a single read.  You could also write your own, such as pettycoin’s “io_read_packet()” which read a length then allocated and read in the rest of the packet.
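The "callback returns the next plan" pattern can be sketched with a toy, synchronous driver. To be clear, this is not ccan/io's API: struct plan, plan_read(), run_plans() and the packet layout are all invented for illustration, and a real library would drive the plans from an event loop rather than with blocking read() calls.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

struct plan;
typedef struct plan (*next_fn)(void *arg);

/* What to do next on the descriptor, and whom to call afterwards. */
struct plan {
    enum { PLAN_READ, PLAN_DONE } op;
    void *buf;
    size_t len;
    next_fn next;
    void *arg;
};

static struct plan plan_read(void *buf, size_t len, next_fn next, void *arg)
{
    return (struct plan){ PLAN_READ, buf, len, next, arg };
}

static struct plan plan_done(void)
{
    return (struct plan){ PLAN_DONE, NULL, 0, NULL, NULL };
}

/* Toy driver: carry out each plan, then ask its callback for the next. */
static int run_plans(int fd, struct plan p)
{
    while (p.op == PLAN_READ) {
        char *dst = p.buf;
        size_t got = 0;

        while (got < p.len) {
            ssize_t r = read(fd, dst + got, p.len - got);
            if (r <= 0)
                return -1;   /* error or unexpected end of file */
            got += (size_t)r;
        }
        assert(p.next != NULL);
        p = p.next(p.arg);
    }
    return 0;
}

/* Example chain: read a one-byte length, then that many body bytes. */
struct packet {
    unsigned char len;
    char body[255];
};

static struct plan got_body(void *arg)
{
    (void)arg;
    return plan_done();
}

static struct plan got_len(void *arg)
{
    struct packet *pkt = arg;

    return plan_read(pkt->body, pkt->len, got_body, pkt);
}

static struct plan read_packet(struct packet *pkt)
{
    return plan_read(&pkt->len, 1, got_len, pkt);
}
```

Each callback only decides what happens next, so the control flow reads top to bottom as a chain of small functions instead of nested callbacks.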

This should enable a convenient debug mode: you turn each io_read() etc. into synchronous operations and now you have a nice callchain showing what happened to a file descriptor.  In practice, however, debug was painful to use and a frequent source of bugs inside ccan/io, so I never used it for debugging.

And I became less happy when I used it in anger for pettycoin, but at some point you’ve got to stop procrastinating and start producing code, so I left it alone.

Now I’ve revisited it.   820 insertions(+), 1042 deletions(-) and the code is significantly less hairy, and the API a little simpler.  In particular, writing the normal “read-then-write” loops is still very nice, while doing full duplex I/O is possible, but more complex.  Let’s see if I’m still happy once I’ve merged it into pettycoin…

August 02, 2014 06:58 AM

Matt Domsch: Ottawa Linux Symposium needs your help

If you have ever attended the Ottawa Linux Symposium (OLS), read a paper on a technology first publicly suggested at OLS, or use Linux today, please consider donating to help the conference and Andrew Hutton, the conference’s principal organizer since 1999.

I first attended OLS in the summer of 2003. I had heard of this mythical conference in Canada each summer, a long way from Austin yet still considered domestic rather than international for the purposes of business travel authorization, so getting approval to attend wasn’t so hard. I met Val on the walk from Les Suites to the conference center on the first morning, James Bottomley during a storage subsystem breakout the first afternoon, Jon Masters while still in his manic coffee phase, and countless others that first year. Willie organized the bicycle-chain keysigning that helped people put faces to names we only knew via LKML posts. I remember meeting Andrew in the ever-present hallway track, and somehow wound up on the program committee for the following year and the next several.

I went on to submit papers in 2004 (DKMS), 2006 (Firmware Tools), 2008 (MirrorManager). Getting a paper accepted meant great exposure for your projects (these three are still in use today). It also meant an invitation to my first exposure to the party-within-the-party – the excellent speaker events that Andrew organized as a thank-you to the speakers. Scotch-tastings with a haggis celebrated by Stephen Tweedie. A cruise on the Ottawa River. An evening in a cold war fallout shelter reserved for Parliament officials with the most excellent Scotch that only Mark Shuttleworth could bring. These were always a special treat which I always looked forward to.

Andrew, and all the good people who helped organize OLS each year, put on quite a show, being intentional about building the community – not by numbers (though for quite a while, attendance grew and grew) – but by providing space to build deep personal connections that are so critical to the open source development model. It’s much harder to be angry about someone rejecting your patches when you’ve met them face to face, and rather than think it’s out of spite, understand the context behind their decisions, and how you can better work within that context. It was at OLS that I first met face-to-face many of the Linux developers who became my colleagues for the last 15 years.

I haven’t been able to attend for the last few years, but always enjoyed the conference, the hallway talks, the speaker parties, and the intentional community-building that OLS represents.

Several economic changes conspired to put OLS into the financial bind it is today. You can read Andrew’s take about it on the Indiegogo site. I think the problems started before the temporary move to Montreal. In OLS’s growth years, the Kernel Summit was co-located, and preceded OLS. After several years with this arrangement, the Kernel Summit members decided that OLS was getting too big, that the week got really really long (2 days of KS plus 4 days of OLS), and that everyone had been to Ottawa enough times that it was time to move the meetings around. Cambridge, UK would be the next KS venue (and a fine venue it was). But in moving KS away, some of the gravitational attraction of so many kernel developers left OLS as well.

The second problem came in moving the Ottawa Linux Symposium to Montreal for a year. This was necessary, as the conference facility in Ottawa was being remodeled (really, rebuilt from the ground up), which prevented it from being held there. This move took even more of the wind out of the sails. I wasn’t able to attend the Montreal symposium, nor since, but as I understand it, attendance has been on the decline ever since. Andrew’s perseverance has kept the conference alive, albeit smaller, at a staggering personal cost.

Whether or not the conference happens in 2015 remains to be seen. Regardless, I’ve made a donation to support the debt relief, in gratitude for the connections that OLS forged for me in the Linux community. If OLS has had an impact in your career, your friendships, please make a donation yourself to help both Andrew, and the conference.

Visit the OLS Indiegogo site to show your respect.

August 02, 2014 03:52 AM

July 30, 2014

Pavel Machek: Friends don't let friends freeze their hard drives

An hour and 15 minutes later, the platters look really frozen... and the heads are leaving watery trails on the hard drive, which clicks. Ok, this is not looking good.

Should not have let it run with water on board -- outside tracks are physically destroyed.

Next candidate: WD Caviar, WD200, 20GB.

This one is actually pretty impressive. It clearly has room for four (or so) platters, and there's only one populated. And this one actually requires its cover for operation, otherwise it produces "interesting sounds" (and no data).

It went into the refrigerator for a few hours but then I let it thaw before continuing operation. The disk still works, with a few bad sectors. I overwrote the disk with zeros, and that recovered the bad sectors.

Put a fingerprint on the surface. Bad idea, that killed the disk.

Ok, so we have two pieces of information from advertising confirmed: freezing can and will kill the disk, and some hard drives need their screws for operation.

July 30, 2014 04:33 PM

Pavel Machek: I have seen the future

...and did not like what I saw. I installed Debian/testing. Now I know why everyone hates systemd: it turned a minor error (missing firmware for the wlan card) into a message storm (of increasing speed) followed by a forkbomb. Only OOM stopped the madness.

Now, I've seen Gnome3 before, and it is unusable -- at least on X60 hardware. So I went directly into Mate, hoping to see a friendly Gnome2-like desktop. Well, it looks familiar but slightly different. After a while I discovered I'm actually in Xfce. So log-out, log-in, and yes, this looks slightly more familiar. Unfortunately, the theme is still different, window buttons are smaller and Terminals can no longer be resized using the lower-right corner. I also tried to restore my settings (cp -a /oldhome/.[a-z]* .) and it did not have the desired effect.

July 30, 2014 04:27 PM

Dave Airlie: you have a long road to walk, but first you have to leave the house

or why publishing code is STEP ZERO.

If you've been developing code internally for a kernel contribution, you've probably got a lot of reasons not to default to working in the open from the start, you probably don't work for Red Hat or other companies with default to open policies, or perhaps you are scared of the scary kernel community, and want to present a polished gem.

If your company is a pain with legal reviews etc, you have probably spent/wasted months of engineering time on internal reviews and stuff, so think all of this matters later, because why wouldn't it, you just spent (wasted) a lot of time on it, so it must matter.

So you have your polished codebase, why wouldn't those kernel maintainers love to merge it?

Then you publish the source code.

Oh, look you just left your house. The merging of your code is many many miles distant and you just started walking that road, just now, not when you started writing it, not when you started legal review, not when you rewrote it internally the 4th time. You just did it this moment.

You might have to rewrite it externally 6 times, you might never get it merged, it might be something your competitors are also working on, and the kernel maintainers would rather you cooperated with people your management would lose their minds over, that is the kernel development process.

step zero: publish the code. leave the house.

(lately I've been seeing this problem more and more, so I decided to write it up, and it really isn't directed at anyone in particular, I think a lot of vendors are guilty of this).

July 30, 2014 01:47 AM

July 29, 2014

Rusty Russell: Pettycoin Alpha01 Tagged

As with all software, it took longer than I expected, but today I tagged the first version of pettycoin.  Now, lots more polish and features, but at least there’s something more than the git repo for others to look at!

July 29, 2014 07:53 AM

July 24, 2014

Pavel Machek: Nowcasting for whole Europe, international travel script

CHMI changed their webpages, so that no longer worked, so I had to adapt nowcast. My first idea was to use, that has nice coverage of whole Europe, but pictures are too big and interpolated... and handling them takes time. So I updated it once more, now it supports new format of chmi pictures. But it also means that if you are within EU and want to play with weather nowcasting, you now can... just be warned it is slightly slow... but very useful, especially in rainy/stormy weather these days.

Now, I don't know about you, but I always forget something when travelling internationally. Like.. power converters, or the fact that target is in different time zone. Is there some tool to warn you about differences between home and target countries? (I'd prefer it offline, for privacy reasons, but...) I started country script, with some data from wikipedia, but it is quite incomplete and would need a lot of help.

July 24, 2014 01:00 PM

Pavel Machek: More fun with spinning rust

So I took an old 4GB (IBM) drive for a test. Oops, it sounds wrong while spinning up. Perhaps I need to use two usb cables to get enough power?

Let's take the 60GB drive... that one works well. Back to the 4GB one. Bad, clicking sounds.

IBM actually used two different kinds of screws, so I can not non-destructively open this one... and they actually made platters out of glass. No one is going to recover data from this one... and I have about 1000 little pieces of glass to collect.

Next candidate: Seagate Barracuda ATA III ST320414A, 20GB.

Nice, about 17MB/sec transfer, disk is now full of photos. Data recovery firms say that screw torque matters. I made all of them very loose, then removed them altogether, then found the second hidden screw and then ran the drive open. It worked ok.

Air filter is not actually secured in any way, and I guess I touched the platters with the cover while opening. Interestingly, these heads do not stick to surface, even when manually moved.

Friends do not let friends freeze their hard drives, but this one went into two plastic bags and into the refrigerator. Have you noticed how the data-recovery firms placed the drive there without humidity protection?

So, any bets if it will be operational after I remove it from the freezer?

July 24, 2014 12:51 PM

July 23, 2014

Michael Kerrisk (manpages): Linux/UNIX System Programming course scheduled for October 2014

I've scheduled a further 5-day Linux/UNIX System Programming course to take place in Munich, Germany, for the week of 6-10 October 2014.

The course is intended for programmers developing system-level, embedded, or network applications for Linux and UNIX systems, or programmers porting such applications from other operating systems (e.g., Windows) to Linux or UNIX. The course is based on my book, The Linux Programming Interface (TLPI), and covers topics such as low-level file I/O; signals and timers; creating processes and executing programs; POSIX threads programming; interprocess communication (pipes, FIFOs, message queues, semaphores, shared memory),  network programming (sockets), and server design.
The course has a lecture+lab format, and devotes substantial time to working on some carefully chosen programming exercises that put the "theory" into practice. Students receive a copy of TLPI, along with a 600-page course book containing the more than 1000 slides that are used in the course. A reading knowledge of C is assumed; no previous system programming experience is needed.

Some useful links for anyone interested in the course:

Questions about the course? Email me via

July 23, 2014 05:49 AM

July 17, 2014

Rusty Russell: API Bug of the Week: getsockname().

A “non-blocking” IPv6 connect() call was in fact, blocking.  Tracking that down made me realize the IPv6 address was mostly random garbage, which was caused by this function:

bool get_fd_addr(int fd, struct protocol_net_address *addr)
{
   union {
      struct sockaddr sa;
      struct sockaddr_in in;
      struct sockaddr_in6 in6;
   } u;
   socklen_t len = sizeof(len);   /* the bug: passes the size of len, not of u */
   if (getsockname(fd, &u.sa, &len) != 0)
      return false;
   /* ... rest of the function not shown ... */
}

The bug: “sizeof(len)” should be “sizeof(u)”.  But when presented with a too-short length, getsockname() truncates, and otherwise “succeeds”; you have to check the resulting len value to see what you should have passed.
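As a minimal sketch of the fix (not the actual pettycoin code): pass the size of the whole union, and treat a returned length larger than the buffer as a failure, since that is how getsockname() reports truncation. The names any_addr and get_local_addr are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Storage big enough for the address families we expect. */
union any_addr {
   struct sockaddr sa;
   struct sockaddr_in in;
   struct sockaddr_in6 in6;
};

/* Returns true only if the whole local address fit into *u. */
bool get_local_addr(int fd, union any_addr *u, socklen_t *len_out)
{
   socklen_t len = sizeof(*u);   /* size of the buffer, not of len */

   assert(u != NULL && len_out != NULL);
   memset(u, 0, sizeof(*u));
   if (getsockname(fd, &u->sa, &len) != 0)
      return false;
   if (len > sizeof(*u))         /* address was truncated to fit */
      return false;
   *len_out = len;
   return true;
}
```

The explicit length check is the part the original API makes easy to forget: a truncated result still returns 0.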

Obviously an error return would be better here, but the writable len arg is pretty useless: I don’t know of any callers who check the length return and do anything useful with it.  Provide getsocklen() for those who do care, and have getsockname() take a size_t as its third arg.

Oh, and the blocking?  That was because I was calling “fcntl(fd, F_SETFD, …)” instead of “F_SETFL”!
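The F_SETFD/F_SETFL mix-up is easy to miss because the mistaken call does not usually fail; it just changes descriptor flags (such as FD_CLOEXEC) instead of file status flags. A small sketch of setting O_NONBLOCK correctly, using hypothetical helper names set_nonblocking and is_nonblocking:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/* O_NONBLOCK is a file status flag: read with F_GETFL, write with F_SETFL.
 * F_SETFD only affects file descriptor flags such as FD_CLOEXEC. */
bool set_nonblocking(int fd)
{
   int flags = fcntl(fd, F_GETFL);

   if (flags == -1)
      return false;
   return fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

/* Check the result instead of trusting the call that was meant to set it. */
bool is_nonblocking(int fd)
{
   int flags = fcntl(fd, F_GETFL);

   assert(flags != -1);   /* caller must pass an open descriptor */
   return (flags & O_NONBLOCK) != 0;
}
```

Reading the flags back after setting them would have caught the original mistake immediately.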

July 17, 2014 03:31 AM

July 16, 2014

Dave Jones: Closure on some old bugs.

July 16, 2014 03:10 AM

July 15, 2014

James Morris: Linux Security Summit 2014 Schedule Published

The schedule for the 2014 Linux Security Summit (LSS2014) is now published.

The event will be held over two days (18th & 19th August), starting with James Bottomley as the keynote speaker.  The keynote will be followed by refereed talks, group discussions, kernel security subsystem updates, and break-out sessions.

The refereed talks are:

Discussion session topics include Trusted Kernel Lock-down Patch Series, led by Kees Cook; and EXT4 Encryption, led by Michael Halcrow & Ted Ts’o.   There’ll be kernel security subsystem updates from the SELinux, AppArmor, Smack, and Integrity maintainers.  The break-out sessions are open format and a good opportunity to collaborate face-to-face on outstanding or emerging issues.

See the schedule for more details.

LSS2014 is open to all registered attendees of LinuxCon.  Note that discounted registration is available until the 18th of July (end of this week).

See you in Chicago!

July 15, 2014 11:00 PM

Dave Jones: catch-up after a brief hiatus.

Yikes, almost a month since I last posted.
In that time, I’ve spent pretty much all my time heads down chasing memory corruption bugs in Trinity, and whacking a bunch of smaller issues as I came across them. Some of the bugs I’ve been chasing have taken a while to reproduce, so I’ve deliberately held off on changing too much at once this last few weeks, choosing instead to dribble changes in a few commits at a time, just to be sure things weren’t actually getting worse. Every time I thought I’d finally killed the last bug, I’d do another run for a few hours, and then see the same corrupted structures. Incredibly frustrating. After a process of elimination (I found a hundred places where the bug wasn’t), I think I’ve finally zeroed in on the problematic code, in the functions that generate random filenames.
I pretty much gutted that code today, which should remove both the bug, and a bunch of unnecessary operations that never found any kernel bugs anyway. I’m glad I spent the time to chase this down, because the next bunch of features I plan to implement leverage this code quite heavily, and would have only caused even more headache later on.

The one plus side of chasing this bug the last month or so has been all the added debugging code I’ve come up with. Some of it has been committed for re-use later, while some of the more intrusive debug features (like storing backtraces for every locking operation) I opted not to commit, but will keep the diff around in case it comes in handy again sometime.

Spent the afternoon clearing out my working tree by committing all the clean-up patches I’ve done while doing this work. (Some of them were a little tangled and needed separating into multiple commits.) Next, removing some lingering code that hasn’t really done anything useful for a while.

I’ve been hesitant to start on the newer features until things calmed down, but that should hopefully be pretty close.

July 15, 2014 03:26 PM

July 14, 2014

Pavel Machek: Fun with spinning rust

Got a hard drive that would not spin up, to attempt recovery. Getting the necessary screwdrivers was not easy, but eventually I managed to open the drive (after hitting it a few times in an attempt to unstick the heads). Now, this tutorial does not sound that bad, and yes, I managed to un-stick the heads. The drive now spins up... and keeps seeking, not getting ready. I tried to run the drive open, and the heads only go to the near half of the platters... I assume something is wrong there? I tried various torques on the screws, as some advertising video suggested.

(Also, heads immediately stick to the platters when I move them manually. I guess that's normal?)

Drive is now in the freezer, and probably beyond repair... but if you have some ideas, or some fun uses for dead hard drive, I guess I can try them. Data on the disk are not important enough to do platter-transplantation.

July 14, 2014 10:00 PM

July 09, 2014

Michael Kerrisk (manpages): man-pages-3.70 is released

I've released man-pages-3.70. The release tarball is available on The browsable online pages can be found on The Git repository for man-pages is available on

This is a relatively small release. As well as many smaller fixes to various pages, the more notable changes in man-pages-3.70 are the following:

July 09, 2014 11:24 AM

July 04, 2014

Matthew Garrett: Self-signing custom Android ROMs

The security model on the Google Nexus devices is pretty straightforward. The OS is (nominally) secure and prevents anything from accessing the raw MTD devices. The bootloader will only allow the user to write to partitions if it's unlocked. The recovery image will only permit you to install images that are signed with a trusted key. In combination, these facts mean that it's impossible for an attacker to modify the OS image without unlocking the bootloader[1], and unlocking the bootloader wipes all your data. You'll probably notice that.

The problem comes when you want to run something other than the stock Google images. Step number one for basically all of these is "Unlock your bootloader", which is fair enough. Step number two is "Install a new recovery image", which is also reasonable in that the key database is stored in the recovery image and so there's no way to update it without doing so. Except, unfortunately, basically every third party Android image is either unsigned or is signed with the (publicly available) Android test keys, so this new recovery image will flash anything. Feel free to relock your bootloader - the recovery image will still happily overwrite your OS.

This is unfortunate. Even if you've encrypted your phone, anyone with physical access can simply reboot into recovery and reflash /system with something that'll stash your encryption key and mail your data to the NSA. Surely there's a better way of doing this?

Thankfully, there is. Kind of. It's annoying and involves a bunch of manual processes and you'll need to re-sign every update yourself. But it is possible to configure Nexus devices in such a way that you retain the same level of security you had when you were using the Google keys without losing the freedom to run whatever you want. Here's how.

Note: This is not straightforward. If you're not an experienced developer, you shouldn't attempt this. I'm documenting this so people can create more user-friendly approaches.

First: Unlock your bootloader. /data will be wiped.
Second: Get a copy of the stock recovery.img for your device. You can get it from the factory images available here
Third: Grab mkbootimg from here and build it. Run unpackbootimg against recovery.img.
Fourth: Generate some keys. Get this script and run it.
Fifth: zcat recovery.img-ramdisk.gz | cpio -id to extract your recovery image ramdisk. Do this in an otherwise empty directory.
Sixth: Get from here and run it against the .x509.pem file generated in step 4. Replace /res/keys from the recovery image ramdisk with the output. Include the "v2" bit at the beginning.
Seventh: Repack the ramdisk image (find . | cpio -o -H newc | gzip > ../recovery.img-ramdisk.gz) and rebuild recovery.img with mkbootimg.
Eighth: Write the new recovery image to your device
Ninth: Get signapk from here and build it. Run it against the ROM you want to sign, using the keys you generated earlier. Make sure you use the -w option to sign the whole zip rather than signing individual files.
Tenth: Relock your bootloader
Eleventh: Boot into recovery mode and sideload your newly signed image.

At this point you'll want to set a reasonable security policy on the image (eg, if it grants root access, ensure that it requires a PIN or something), but otherwise you're set - the recovery image can't be overwritten without unlocking the bootloader and wiping all your data, and the recovery image will only write images that are signed with your key. For obvious reasons, keep the key safe.

This, well. It's obviously an excessively convoluted workflow. A *lot* of it could be avoided by providing a standardised mechanism for key management. One approach would be to add a new fastboot command for modifying the key database, and only permit this to be run when the bootloader is unlocked. The workflow would then be something like

which seems more straightforward. Long term, individual projects could do the signing themselves and distribute their public keys, resulting in the install process becoming as easy as which is actually easier than the current requirement to install an entirely new recovery image.

I'd actually previously criticised Google on the grounds that using custom keys wasn't possible on Android devices. I was wrong. It is, it's just that (as far as I can tell) nobody's actually documented it before. It's important that users not be forced into treating security and freedom as mutually exclusive, and it's great that Google have made that possible.

[1] This model fails if it's possible to gain root on the device. Thankfully this would never hold on what's that over there is that a distraction?

July 04, 2014 10:10 PM

July 03, 2014

Pavel Machek: Web browser limits desktop on low-powered machines

It seems that the web browser is the limiting factor on low-powered machines. Chromium is pretty much unusable with 512MB, usable with 1GB and nice with 2GB. Firefox is actually usable with 512MB -- it does not seem to have such a big per-tab overhead -- but seems to be less responsive.

Anyway, it seems I'll keep using x86 for desktops for now.

July 03, 2014 10:29 AM

June 30, 2014

Paul E. McKenney: Confessions of a Recovering Proprietary Programmer, Part XII

We have all suffered from changing requirements. We are almost done with implementation, maybe even most of the way done testing, and then a change in requirements invalidates all of our hard work. This can of course be extremely irritating.

It turns out that changing requirements are not specific to software, nor are they anything new. Here is an example from my father's area of expertise, which is custom-built large-scale food processing equipment.

My father's company was designing a new-to-them piece of equipment, namely a spinach chopper able to chop 15 tons of spinach per hour. Given that this was a new area, they built a small-scale prototype with which they ran a number of experiments on “small” batches of spinach (that is, less than a ton of spinach). The customer provided feedback on the results of these experiments, which fed into the design of the full-scale production model.

After the resulting spinach chopper had been in production for a short while, there was a routine follow-up inspection. The purpose of the inspection was to check for unexpected wear and other unanticipated problems with the design and fabrication. While the inspector was there, the chopper kicked a large rock into the air. It turned out that spinach from the field can contain rocks up to six inches (15cm) in diameter, and this requirement had not been communicated during development. This situation of course inspired a quick engineering change, installing a cage that captured the rocks.

Of course, it is only fair to point out that this requirement didn't actually change. After all, spinach from the field has always contained rocks up to six inches in diameter. There had been a failure to communicate an existing requirement, not a change in the actual requirement.

However, from the viewpoint of the engineers, the effect is the same. Regardless of whether there was an actual change in the requirements or the exposure of a previously hidden requirement, redesign and rework will very likely be required.

One other point of interest to those who know me... The spinach chopper was re-inspected some five years after it started operation. Its blades had never been sharpened, and despite encountering the occasional rock, still did not need sharpening. So to those of you who occasionally accuse me of over-engineering, all I can say is that I come by it honestly! ;–)

June 30, 2014 10:36 PM

Pavel Machek: Warning: don't use 3.16-rc1

As Andi found, and it should be fixed in newest -rcs, but I just did

root@amd:~# mkfs.ext4 -c /dev/mapper/usbhdd

(yes, obscure 4GB bug, how could it hit me?)

And now I have

root@amd:/# dumpe2fs -b /dev/mapper/usbhdd
dumpe2fs 1.41.12 (17-May-2010)


>>> (2059923-1011347)/1024.
>>> (3108499-1011347)/1024.

Yes, badblocks detected an error every 4GB.
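The arithmetic behind that conclusion, as a small sketch. It assumes the bad-block numbers are counted in 4096-byte filesystem blocks, which the post does not state; the helper name is invented.

```c
#include <assert.h>
#include <stdint.h>

/* Distance between two bad-block numbers, in GiB, for a given block size. */
double bad_block_gap_gib(uint64_t first, uint64_t second, uint64_t block_size)
{
   assert(second >= first);
   return (double)((second - first) * block_size) / (1024.0 * 1024.0 * 1024.0);
}
```

With the block numbers above, 2059923 - 1011347 = 1048576 blocks, which at 4096 bytes each is exactly 4 GiB; the next bad block is another 4 GiB further on.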

I'll update now, and I believe disk errors will mysteriously disappear.

June 30, 2014 08:03 PM

June 22, 2014

Pavel Machek: Feasibility of desktop on ARM cpu

Thinkpad X60 is an old Core Duo@1.8GHz, 2GB RAM notebook. But it is still a pretty usable desktop machine, as long as Gnome2 is used, the number of Chromium tabs does not grow "unreasonable", and development is not attempted there. But it eats a bit too much power.

OLPC 1.75 is ARM v7@0.8GHz, .5GB RAM. According to my tests, it should be equivalent to Core Solo@0.43GHz. Would that make a usable desktop?

Socrates is dual ARM v7@1.5GHz, 1GB RAM. It should be equivalent to Core Duo@0.67GHz. Oh, and I'd have to connect display over USB. Would that be usable?

Ok, let's try. "nosmp mem=512M" makes the thinkpad not boot. "nosmp mem=512M@1G" works a bit better. 26 chromium tabs make the machine unusable: the mouse lags, and the system is so overloaded that X fails to interpret keystrokes correctly. (Or maybe X and the input subsystem suck so much that they fail to interpret input correctly under moderate system load?)

I limited CPU clock to 1GHz; that's as low as thinkpad can go:
/sys/devices/system/cpu/cpu0/cpufreq# echo 1000000 > scaling_max_freq

Machine feels slightly slower, but usable as long as chromium is stopped. Even video playback is usable at SD resolution.

With limited number of tabs (7), situation is better, but single complex tab (facebook) still makes machine swap and unusable. And... slow CPU makes "unresponsive tabs" pop up way too often.

Impressions so far: Socrates CPU might be enough for marginally-usable desktop. 512MB RAM definitely is not. Will retry with 1GB one day.

June 22, 2014 02:59 PM

June 21, 2014

Rusty Russell: Alternate Blog for my Pettycoin Work

I decided to use github for pettycoin, and tested out their blogging integration (summary: it’s not very integrated, but once set up, Jekyll is nice).  I’m keeping a blow-by-blow development blog over there.

June 21, 2014 12:14 AM

June 20, 2014

Dave Jones: Transferring maintainership of x86info.

On Mon Feb 26 2001, I committed the first version of x86info. 13 years later I’m pretty much done with it.
Despite a surge of activity back in February, when I reached a new monthly commit count record, it’s been a project I’ve had little time for over the last few years. Luckily, someone has stepped up to take over. Going forward, Nick Black will be maintaining it here.

I might still manage the occasional commit, but I won’t be feeling guilty about neglecting it any more.

An unfinished splinter project that I started a while back, x86utils, was intended to be a replacement for x86info of sorts, by moving it to multiple small utilities rather than the one monolithic blob that x86info is. I’m undecided if I’ll continue to work on this, especially given my time commitments to Trinity and related projects.

June 20, 2014 06:34 PM

June 19, 2014

Pavel Machek: debootstrap, olpc, and gnome

Versioned Fedora setup that is by default on OLPC-1.75 is a bit too strange for me, so I attempted to get Debian/ARM to work there. First, I used multistrap, but that uses separate repositories. debootstrap might be deprecated, but at least it works... after I figured out that I need to run debootstrap --second-stage in the chroot. I added firmware, set the user password, installed the system using tasksel, and things started working.

So now, I have a working gnome on OLPC... with everything a bit too small. I don't know how OLPC solved high-dpi support, but Debian certainly does not do the right thing by default.

June 19, 2014 02:49 PM

June 17, 2014

Dave Jones: Daily log June 16th 2014

Catch up from the last week or so.
Spent a lot of time turning a bunch of code in trinity inside out. After splitting the logging code into “render” and “write out buffer”, I had a bunch of small bugs to nail down. While chasing those I kept hitting a strange bug where occasionally the list of pids would get corrupted. Two weeks on, and I’m still no closer to figuring out the exact cause, but I’ve got a lot more clues (mostly by now knowing what it _isn’t_ due to a lot of code rewriting).
Some of that rewriting has been on my mind for a while: the shared ‘shm’ structure between all processes was growing to the point that I was having trouble remembering why certain variables were shared there rather than just being globals in main, inherited at fork time. So I moved a bunch of variables from the shm to another structure named ‘childdata’; the shm now holds an array of pointers to these.

Some of the code then looked a little convoluted with references that used to be shm->data being replaced with shm->children[childno].data, which I remedied by having each child instantiate a ‘this_child’ var when it starts up, allowing this_child->data to point to the right thing.
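A rough sketch of that layout, to make the pointer shuffling concrete. This is not trinity's actual code: the field names, MAX_CHILDREN and the helper functions are invented, and only the names shm, childdata and this_child come from the description above. The sketch assumes Linux-style anonymous shared mappings.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define MAX_CHILDREN 4           /* hypothetical limit for this sketch */

struct childdata {
   unsigned long data;           /* placeholder for per-child state */
};

struct shm_s {
   struct childdata *children[MAX_CHILDREN];   /* array of pointers */
};

static struct shm_s *shm;
static struct childdata *this_child;

/* Anonymous shared mapping: visible to every process forked afterwards. */
static void *alloc_shared(size_t size)
{
   void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
   return p == MAP_FAILED ? NULL : p;
}

/* Called once in the parent, before any fork(). */
int init_shm(void)
{
   shm = alloc_shared(sizeof(*shm));
   if (shm == NULL)
      return -1;
   for (int i = 0; i < MAX_CHILDREN; i++) {
      shm->children[i] = alloc_shared(sizeof(struct childdata));
      if (shm->children[i] == NULL)
         return -1;
   }
   return 0;
}

/* Called by each child at startup, so later code can say this_child->data. */
void init_child(int childno)
{
   assert(childno >= 0 && childno < MAX_CHILDREN);
   this_child = shm->children[childno];
}
```

Because this_child is an ordinary per-process global, each forked child gets its own copy pointing at its own slot, while the slot itself stays visible to the parent.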

All of this code churn served in part to refresh my memory on how some of this worked, hoping that I’d stumble across the random scribble that occasionally happens. I also added a bunch of code to do things like dump backtraces, which has been handy for turning up 1-2 unrelated bugs.

On Friday afternoon I discovered that the bug only shows up if MALLOC_PERTURB_ is set. Which is curious, because the struct in question that gets scribbled over is never free()’d. It’s allocated once at startup, and inherited in each child when it fork()’s.

Debugging continues..

June 17, 2014 04:11 AM

June 16, 2014

Rusty Russell: Rusty Goes on Sabbatical, June to December

At I spoke about my pre-alpha implementation of Pettycoin, but progress since then has been slow.  That’s partially due to yak shaving (like rewriting ccan/io library), partially reimplementation of parts I didn’t like, and partially due to the birth of my son, but mainly because I have a day job which involves working on Power 8 KVM issues for IBM.  So Alex convinced me to take 6 months off from the day job, and work 4 days a week on pettycoin.

I’m going to be blogging my progress, so expect several updates a week.  The first few alpha releases will be useless for doing any actual transactions, but by the first beta the major pieces should be in place…

June 16, 2014 08:50 AM

June 15, 2014

Michael Kerrisk (manpages): man-pages-3.69 is released

I've released man-pages-3.69. The release tarball is available on The browsable online pages can be found on The Git repository for man-pages is available on

Aside from minor improvements and fixes to various pages, the notable changes in man-pages-3.69 are the following:

June 15, 2014 07:33 PM

June 11, 2014

Daniel Vetter: Documentation for drm/i915

So over the past few years the drm subsystem gained some very nice documentation. And recently we've started to follow suit with the Intel graphics driver. All the kernel documentation is integrated into one big DocBook and I regularly upload latest HTML builds of the Linux DRM Developer's Guide. This is built from drm-intel-nightly so has slightly fresher documentation (hopefully) than the usual DocBook builds from Linus' main branch which can be found all over the place. If you want to build these yourself simply run

$ make htmldocs

For testing we now also have neat documentation for the infrastructure and helper libraries found in intel-gpu-tools. The README in the i-g-t repository has detailed build instructions - gtkdoc is a bit more of a fuss to integrate.

Below the break some more details about documentation requirements relevant for developers.

So from now on I expect reasonable documentation for new, big kernel features and for new additions to the i-g-t library.

For i-g-t the process is simple: Add the gtk-doc comment blocks to all newly added functions, install and build with gtk-doc enabled. Done. If the new library is tricky (for example the pipe CRC support code) a short overview section that references some functions to get people started is useful, but not really required. And with the exception of the still in-flux kernel modesetting helper library i-g-t is fully documented, so there's lots of examples to copy from.

For the kernel this is a bit more involved, mostly since kerneldoc sucks more. But we also only just started with documenting the drm/i915 driver itself.

  1. First extract all the code for your new feature into a new file. There's unfortunately no other way to sensibly split up and group the reference documentation with kerneldoc. But at least that will also be a good excuse to review the related interfaces before extracting them.
  2. Create reference kerneldoc comments for the functions used as interfaces to the rest of the driver. It's always a bit of a judgement call what to document and what not, since compared to the DRM core where functions must be explicitly exported to drivers there's no clean separation between the core parts and subsystems and more mundane platform enabling code. For big and complicated features it's also good practice to have an overview DOC: section somewhere at the beginning of the file.
  3. Note that kerneldoc doesn't have support for markdown syntax (or anything else like that) and doesn't do automatic cross-referencing like gtk-doc. So if your documentation absolutely needs a table or a list you have to do it twice unfortunately: Once as a plain code comment and once as a DocBook marked-up table or list. Long-term we want to improve the kerneldoc markup support, but for now we have to deal with what we have.
  4. As with all documentation don't document the details of the implementation - otherwise it will get stale fast because comments are often overlooked when updating code.
  5. Integrate the new kerneldoc section into the overall DRM DocBook template. Note that you can't go deeper than a section2 nesting, for otherwise the reference documentation won't be listed and, due to the lack of any autogenerated cross-links, will be inaccessible and useless. Build the html docs to check that your overview summary and reference sections have all been pulled in and that the kerneldoc parser is happy with your comments.
A really nice example for how to do this all is the documentation for the gen7 cmd parser in i915_cmd_parser.c.
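For readers who have not written kerneldoc before, the two comment shapes mentioned in the steps above (an overview DOC: section and per-function reference comments) look roughly like this. The ring structure and function are invented for illustration and are not taken from i915; assert() stands in for kernel-side checks so the sketch compiles in userspace.

```c
#include <assert.h>
#include <stddef.h>

/**
 * DOC: example ring overview
 *
 * A short overview of the feature goes here: what the ring is for and
 * which functions are the entry points. Keep implementation details out.
 */

struct example_ring {
   unsigned int head;   /* next slot to read */
   unsigned int tail;   /* next slot to write */
   unsigned int size;   /* total number of slots */
};

/**
 * example_ring_space - compute the free space in a ring
 * @ring: ring to inspect
 *
 * One slot is always kept empty so that a full ring can be told apart
 * from an empty one.
 *
 * Returns:
 * Number of slots that can be written without overrunning the reader.
 */
unsigned int example_ring_space(const struct example_ring *ring)
{
   assert(ring != NULL && ring->size > 0);
   return (ring->head + ring->size - ring->tail - 1) % ring->size;
}
```

Only the opening `/**`, the `name - summary` line and the `@arg:` lines are kerneldoc structure; the rest is free text.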

June 11, 2014 06:51 PM

June 10, 2014

Daniel Vetter: Neat drm/i915 stuff for 3.16

Linus decided to have a bit of fun with the 3.16 merge window and the 3.15 release, so I'm a bit late with our regular look at the new stuff for the Intel graphics driver.
First things first, Baytrail/Valleyview has finally gained support for MIPI DSI panels! Which means no more ugly hacks to get machines like the ASUS T100 going for users and no more promises we can't keep from developers - it landed for real this time around. Baytrail has also seen a lot of polish work in e.g. the infoframe handling, power domain reset, ...

Continuing on the new hardware platform this release features the first version of our preliminary support for Cherryview. At a very high level this combines a Gen8 render unit derived from Broadwell with a beefed-up Valleyview display block. So a lot of the enabling work boiled down to wiring up existing code, but of course there's also tons of new code to get all the details right. Most of the work has been done by Ville and Chon Ming Lee with lots of help from other people.

Our modeset code has also seen lots of improvements. The user-visible feature is surely support for large cursors. On high-dpi panels 64x64 simply doesn't cut it and the kernel (and latest SNA DDX) now support up to the hardware limit of 256x256. But there's also been a lot of improvements under the hood: More of Ville's infrastructure for atomic pageflips has been merged - slowly all the required pieces like unified plane updates for modeset, two stage watermark updates or atomic sprite updates are falling into place. Still a lot of work left to do though. And the modesetting infrastructure has also seen a bit of work by the almost complete removal of the ->mode_set hooks. We need that for both atomic modeset updates and for proper runtime PM support.

On that topic: Runtime power management is now enabled for a bunch of our recent platforms - all the prep work from Paulo Zanoni and Imre Deak in the past few releases has finally paid off. There's still leftovers to be picked up over the coming releases like proper runtime PM support for DPMS on all platforms, addressing a bunch of crazy corner cases, rolling it out on the newer platforms like Cherryview or Broadwell and cleaning the code up a bit. But overall we're now ready for what the marketing people call "connected standby", which means that power consumption with all devices turned off through runtime pm should be as low as when doing a full system suspend. It crucially relies upon userspace not sucking and waking the cpu and devices up all the time, so personally I'm not sure how well this will work out really.

Another piece for proper atomic pageflip support is the universal primary plane support from Matt Roper. Based upon his DRM core work in 3.15 he now enabled the universal primary plane support in i915 properly. Unfortunately the corresponding patches for cursor support missed 3.16. The universal plane support is hence still disabled by default. For other atomic modeset work a shout-out goes to Rob Clark whose locking conversion to wait/wound mutexes for modeset objects has been merged.

On the GEM side Chris Wilson massively improved our OOM handling. We are now much better at surviving a crash against the memory brickwall. And if we don't and indeed run out of memory we have much better data to diagnose the reason for the OOM. The top-down PDE allocator from Ben Widawsky better segregates our usage of the GTT and is one of the pieces required before we can enable full ppgtt for production use. And the command parser from Brad Volkin is required for some OpenGL and OpenCL features on Haswell. The parser itself is fully merged and ready, but the actual batch buffer copying to a secure location missed the merge window and hence it's not yet enabled in permission granting mode.

The big feature to pop the champagne though is the userptr support from Chris - after years I've finally run out of things to complain about and merged it. This allows userspace to wrap up any memory allocations obtained by malloc() (or anything else backed by normal pages) into a GEM buffer object. Useful for faster uploads and downloads in lots of situations and currently used by the DDX to wrap X shmem segments. But OpenCL also wants to use this.

We've also enabled a few Broadwell features this time around: eDRAM support from Ben, VEBOX2 support from Zhao Yakui and gpu turbo support from Ben and Deepak S.

And finally there's the usual set of improvements and polish all over the place: GPU reset improvements on gen4 from Ville, prep work for DRRS (dynamic refresh rate switching) from Vandana, tons of interrupt and especially vblank handling rework (from Paulo and Ville) and lots of other things.

June 10, 2014 09:30 AM

June 07, 2014

Rusty Russell: Donation to Jupiter Broadcasting

Chris Fisher’s Jupiter Broadcasting pod/vodcasting started 8 years ago with the Linux Action Show: still their flagship show, and how I discovered them 3 years ago.  Shows like this give access to FOSS to those outside the LWN-reading crowd; community building can be a thankless task, and as a small shop Chris has had ups and downs along the way.  After listening to them for a few years, I feel a weird bond with this bunch of people I’ve never met.

I regularly listen to Techsnap for security news, Scibyte for science with my daughter, and Unfilter to get an insight into the NSA and what the US looks like from the inside.  I bugged Chris a while back to accept bitcoin donations, and when they did I subscribed to Unfilter for a year at 2 BTC.  To congratulate them on reaching the 100th Unfilter episode, I repeated that donation.

They’ve started doing new and ambitious things, like Linux HOWTO, so I know they’ll put the funds to good use!

June 07, 2014 11:45 PM

June 06, 2014

Dave Jones: Monthly Fedora kernel bug statistics – May 2014

                             19    20   rawhide
Open:                        88   308       158   (554)
Opened since 2014-05-01       9   112        20   (141)
Closed since 2014-05-01      26    68        17   (111)
Changed since 2014-05-01     49   293        31   (373)

June 06, 2014 04:16 PM

June 05, 2014

Michael Kerrisk (manpages): Linux/UNIX System Programming course scheduled for July

I've scheduled a further 5-day Linux/UNIX System Programming course to take place in Munich, Germany, for the week of 21-25 July 2014.

The course is intended for programmers developing system-level, embedded, or network applications for Linux and UNIX systems, or programmers porting such applications from other operating systems (e.g., Windows) to Linux or UNIX. The course is based on my book, The Linux Programming Interface (TLPI), and covers topics such as low-level file I/O; signals and timers; creating processes and executing programs; POSIX threads programming; interprocess communication (pipes, FIFOs, message queues, semaphores, shared memory),  network programming (sockets), and server design.
The course has a lecture+lab format, and devotes substantial time to working on some carefully chosen programming exercises that put the "theory" into practice. Students receive a copy of TLPI, along with a 600-page course book containing the more than 1000 slides that are used in the course. A reading knowledge of C is assumed; no previous system programming experience is needed.

Some useful links for anyone interested in the course:

Questions about the course? Email me via

June 05, 2014 01:38 PM

June 03, 2014

Dave Jones: Daily log June 2nd 2014

Spent the day making some progress on trinity’s “dump buffers after we detect tainted kernel” code, before hitting a roadblock. It’s possible for a fuzzing child process to allocate a buffer to create (for eg) a mangled filename. Because this allocation is local to the child, attempts to reconstruct what it pointed to from the dumping process will fail.
I spent a while thinking about this before starting on a pretty severe rewrite of how the existing logging code works. Instead of just writing the description of what is in the registers to stdout/logs, it now writes it to a buffer, which can be referenced directly by any of the other processes. A lot of the output code actually looks a lot easier to read as an unintended consequence.
No doubt I’ve introduced a bunch of bugs in the process, which I’ll spend time trying to get on top of tomorrow.
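
As a rough illustration of the idea (not trinity's actual code), here is a minimal C sketch, assuming Linux and an anonymous shared mapping created before fork(). The child builds a filename in its own private memory, then renders the description into a buffer that the parent can read directly. The names shared_record, alloc_shared_record and run_child_and_collect, and the example filename, are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define DESC_LEN 256

/* Hypothetical record; not trinity's real data structure. */
struct shared_record {
    char desc[DESC_LEN];    /* rendered description of the syscall arguments */
};

/* Allocate a record in memory that stays shared across fork(). */
static struct shared_record *alloc_shared_record(void)
{
    void *p = mmap(NULL, sizeof(struct shared_record),
                   PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Fork a child that builds a filename in private memory and renders the
 * description into the shared record. Returns 0 on success, -1 on error.
 */
static int run_child_and_collect(struct shared_record *rec)
{
    assert(rec != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* This buffer is local to the child; the parent cannot follow a
         * pointer to it, so the text is copied into shared memory. */
        char local_name[64];
        snprintf(local_name, sizeof(local_name), "/tmp/mangled-%d", 42);
        snprintf(rec->desc, sizeof(rec->desc),
                 "open(name=\"%s\", flags=0x%x)", local_name, 0u);
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

After run_child_and_collect() returns, the parent can print rec->desc even though the child has exited. Storing only a pointer to local_name would not have worked, which is the roadblock described above.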

June 03, 2014 03:48 AM

May 28, 2014

Dave Jones: Daily log May 27th 2014

Spent much of the day getting on top of the inevitable email backlog.
Later started cleaning up some of the code in trinity that dumps out syscall parameters to tty/log files.
One of the problems that has come up with trinity is that many people run it with logging disabled, because it’s much faster when you’re not constantly spewing to files, or scrolling text. This unfortunately leads to situations where we get an oops and we have no idea what actually happened. We have some information that we could dump in that case, which we aren’t right now.
So I’ve been spending time lately working towards a ‘post-mortem dump’ feature. The two pieces missing are: 1. moving the logging functions away from being special-cased to be called from within a child process, so that they can just chew on a syscall record (this work is now almost done); and 2. maintaining a ringbuffer of these records per child (not a huge amount of work once everything else is in place).
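
For readers unfamiliar with the second piece, a per-child ringbuffer of syscall records could look roughly like the C sketch below. This is an assumption about the general technique, not trinity's implementation: the struct fields, RING_SIZE and the ring_* function names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 4     /* kept tiny for illustration */

/* Hypothetical syscall record; the fields are assumptions. */
struct syscall_record {
    unsigned int nr;        /* syscall number */
    unsigned long args[6];  /* raw argument values */
    long retval;            /* return value, once known */
};

/* One of these would exist per child. */
struct record_ring {
    struct syscall_record slots[RING_SIZE];
    size_t head;    /* index of the next slot to write */
    size_t count;   /* number of valid records, at most RING_SIZE */
};

static void ring_init(struct record_ring *r)
{
    assert(r != NULL);
    r->head = 0;
    r->count = 0;
}

/* Store a record, overwriting the oldest one once the ring is full. */
static void ring_push(struct record_ring *r, const struct syscall_record *rec)
{
    assert(r != NULL && rec != NULL);
    r->slots[r->head] = *rec;
    r->head = (r->head + 1) % RING_SIZE;
    if (r->count < RING_SIZE)
        r->count++;
}

/* Return the i-th oldest record (0 = oldest), or NULL if out of range. */
static const struct syscall_record *ring_get(const struct record_ring *r,
                                             size_t i)
{
    assert(r != NULL);
    if (i >= r->count)
        return NULL;
    size_t oldest = (r->head + RING_SIZE - r->count) % RING_SIZE;
    return &r->slots[(oldest + i) % RING_SIZE];
}
```

A post-mortem dump would then walk ring_get(r, 0) through ring_get(r, r->count - 1) for each child and hand every record to the logging functions, so the last few syscalls are available even when logging was disabled during the run.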

Hopefully I’ll have this implemented soon, and then I can get back on track with some of the more interesting fuzzing ideas I’ve had for improving.

May 28, 2014 03:11 PM

Michael Kerrisk (manpages): man-pages-3.68 is released

I've released man-pages-3.68. The release tarball is available on The browsable online pages can be found on The Git repository for man-pages is available on

Aside from a larger-than-usual set of minor improvements and fixes to various pages, the notable changes in man-pages-3.68 are the following:

May 28, 2014 02:57 PM