Kernel Planet

April 28, 2016

Pete Zaitcev: OpenStack Swift Proxy-FS by SwiftStack

SwiftStack's Joe Arnold and John Dickinson chose the Austin Summit and a low-key #vBrownBag venue to come out of the closet with PROXY-FS (also spelled ProxyFS), a tightly integrated addition to OpenStack Swift that provides POSIX-ish filesystem access to a Swift cluster.

Proxy-FS is basically a peer to a lesser-known feature of Ceph Rados Gateway that permits accessing it over NFS. Both of them are fundamentally different from e.g. Swift-on-File in that the data is kept in Swift or Ceph, instead of in a general filesystem.

The object layout is natural in that it takes advantage of SLO by creating a log-structured, manifested object. This is how in-place updates are handled, including appends. Yes, you can create a manifest with a billion 1-byte objects just by invoking write(2). So, don't do that.
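
For context, this isn't ProxyFS's internal layout (which wasn't shown), but a plain Swift Static Large Object manifest is just a JSON list of segment references uploaded against the object name. A rough sketch using the generic Swift API (container names, token and sizes are placeholders):

curl -X PUT "$STORAGE_URL/cont/bigfile?multipart-manifest=put" \
     -H "X-Auth-Token: $TOKEN" \
     -d '[{"path": "/cont_segments/bigfile/0000", "etag": "...", "size_bytes": 1048576},
          {"path": "/cont_segments/bigfile/0001", "etag": "...", "size_bytes": 1048576}]'

An in-place write or append then presumably comes down to adding or replacing entries in a manifest like this, rather than rewriting the whole object.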

In response to my question, Joe promised to open the source, although we don't know when.

Another question dealt with the performance expectations. The small I/O performance of Proxy-FS is not going to be great in comparison to a traditional NFS filer. One of its key features is relative transparency: there is no cache involved and every application request goes straight to Swift. This helps to adhere to the principle of least surprise, as well as to achieve the scalability for which Swift is famous. There is no need for any upcalls/cross-calls from the Swift Proxy into Proxy-FS that invalidate the cache, because there's no cache. But it has to be understood that Proxy-FS, as well as the NFS mode in RGW, is not intended to compete with NetApp.

Not directly, anyway. But what they could do is to disrupt, in Christensen's sense. His disruption examples were defined as technologies that are markedly inferior to incumbents, as well as dramatically cheaper. Swift and Ceph are both: the filesystem performance sucks balls and the price per terabyte is 1/10th of NetApp (this statement was not evaluated by the Food and Drug Administration). If new applications come about that make use of these properties... You know the script.

April 28, 2016 04:42 PM

Daniel Vetter: X.Org Foundation Election Results

Two questions were up for voting: 4 seats on the Board of Directors, and approval of the amended By-Laws to join SPI.

Congratulations to our reelected and new board members Egbert Eich, Alex Deucher, Keith Packard and Bryce Harrington. Thanks a lot to Lucas Stach for running. And also big thanks to our outgoing board member Matt Dew, who stepped down for personal reasons.

On the bylaw changes and merging with SPI, 61 out of 65 active members voted, with 54 voting yes, 4 no and 3 abstaining. That means we're well past the 2/3rd quorum for bylaw changes, and everything's green now to proceed with the plan to join SPI!

April 28, 2016 06:37 AM

April 27, 2016

James Bottomley: Unprivileged Build Containers

A while ago, a goal I set myself was to be able to maintain my build and test environments for architecture emulation containers without having to do any of the tasks as root and without creating any suid binaries to do this.  One of the big problems here is that distributions get annoyed (and don’t run correctly) if root doesn’t own most of the files … for instance the installers all check to see that the file got installed with the correct ownership and permissions and fail if they don’t.  Debian has an interesting mechanism, called fakeroot, to get around this using a preload library intercepting the chmod and chown system calls, but it’s getting a bit hackish to try to extend this to work permanently for an emulation container.

The correct way to do this is with user namespaces, so the rest of this post will show you how.  Before we get into how to use them, let’s begin with the theory of how user namespaces actually work.

Theory of User Namespaces

A user namespace is the single namespace that can be created by an unprivileged user.  Its job is to map a set of interior (inside the user namespace) uids, gids and projids1 to a set of exterior (outside the user namespace) ids.

The way this works is that the root user namespace simply has a 1:1 identity mapping of all 2^32 identifiers, meaning it fully covers the space.  However, any new user namespace only needs to remap a subset of these.  Any id that is not mapped into the user namespace becomes inaccessible to that namespace.  This doesn’t mean completely inaccessible, it just means any resource owned or accessed by an unmapped id treats an attempted access (even from root in the namespace) as though it were completely unprivileged, so if the resource is readable by any id, it can still be read even in a user namespace where its owning id is unmapped.
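
You can see the identity map of the initial namespace directly; a sketch of what to expect (field widths vary):

jejb@jarvis:~> cat /proc/self/uid_map
         0          0 4294967295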

User namespaces can also be nested but the nested namespace can only map ids that exist in its parent, so you can only reduce but not expand the id space by nesting.  The way the nested mapping works is that it remaps through all the parent namespaces, so what appears on the resource is still the original exterior ids.

User Namespaces also come with an owner (the uid/gid of the process that created the container).  The reason for this is that this owner is allowed to execute setuid/setgid to any id mapped into the namespace, so the owning uid/gid pair is the effective “root” of the container.  Note that this setuid/setgid property works on entry to the namespace even if uid 0 is not mapped inside the namespace, but won’t survive once the first setuid/setgid is issued.

The final piece of the puzzle is that every other namespace also has an owning user namespace, so while I cannot create a mount namespace as unprivileged user jejb, I can as remapped root inside my user namespace

jejb@jarvis:~> unshare --mount
unshare: unshare failed: Operation not permitted
jejb@jarvis:~> nsenter --user=/tmp/userns
root@jarvis:~# unshare --mount
root@jarvis:~#

And once created, I can always enter this mount namespace provided I’m also in my user namespace.

Setting up Unprivileged Namespaces

Any system user can actually create a user namespace.  However a non-root (meaning not uid zero in the parent namespace) user cannot remap any exterior id except their own.  This means that, because a build container needs a range of ids, it’s not possible to set up the initial remapped namespace without the help of root.  However, once that is done, the user can pretty much do every other operation2.

The way remap ranges are set up is via the uid_map, gid_map and projid_map files sitting inside the /proc/<pid> directory.  These files may only be written to once and never updated3

As an example, to set up a build container, I need a remapping for every id that would be created during installation.  Traditionally for Linux, these are ids 0-999.  I want to remap them down to something unprivileged, say 100,000, so my line entry for this is

0 100000 1000

However, I also want an identity mapping for my own id (currently I’m at uid 1000), so I can still use my home directory from within the build container.  This also means I can create the roots for the containers within my home directory.  Finally, the nobody user and the nobody/nogroup groups also need to be mapped, so the final uid map entries look like

0 100000 1000
1000 1000 1
65534 101001 1

For the groups, it’s even more complex because on openSUSE, I’m a member of the users group (gid 100) which sits in the middle of the privileged 0-999 group range, so the gid_map entry I use is

0 100000 100
100 100 1
101 100100 899
65533 101000 2

That is almost up to the kernel-imposed limit of five separate lines.

Finally, here’s how to set this up and create a binding for the user namespace.  As myself (I’m uid 1000 user name jejb) I do

jejb@jarvis:~> unshare --user
nobody@jarvis:~> echo $$
20211
nobody@jarvis:~>

Note that I become nobody inside the container because currently the map files are unwritten so there are no mapped ids at all.  Now as root, I have to write the mapping files and bind the entry file to the namespace somewhere

jarvis:/home/jejb # echo 1|awk '{print "0 100000 1000\n1000 1000 1\n65534 101001 1"}' > /proc/20211/uid_map
jarvis:/home/jejb # echo 1|awk '{print "0 100000 100\n100 100 1\n101 100100 899\n65533 101000 2"}' > /proc/20211/gid_map
jarvis:/home/jejb # touch /tmp/userns
jarvis:/home/jejb # mount --bind /proc/20211/ns/user /tmp/userns

Now I can exit my user namespace because it’s permanently bound and the next time I enter it I become root inside the container (although with uid 100000 outside)

jejb@jarvis:~> nsenter --user=/tmp/userns
root@jarvis:~# id
uid=0(root) gid=0(root) groups=0(root)
root@jarvis:~# su - jejb
jejb@jarvis:~> id
uid=1000(jejb) gid=100(users) groups=100(users)

Giving me a user namespace with sufficient mapped ids to create a build container.
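
As a quick sanity check, the mapping can be read back from inside the namespace; a sketch of the expected output given the map written above:

root@jarvis:~# cat /proc/self/uid_map
         0     100000       1000
      1000       1000          1
     65534     101001          1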

Unprivileged Architecture Emulation Containers

Essentially, I can use the user namespace constructed above to bootstrap and enter the entire build container and its mount namespace, with one proviso: I have to have a pre-created devices directory, because I don’t possess the mknod capability as myself, so my container root also doesn’t possess it.  The way I get around this is to create the initial dev directory as root and then change the ownership to 100000.100000 (my unprivileged ids)

jejb@jarvis:~/containers/debian-amd64/dev> ls -l
total 0
lrwxrwxrwx 1 100000 100000 13 Feb 20 09:45 fd -> /proc/self/fd/
crw-rw-rw- 1 100000 100000 1, 7 Feb 20 09:45 full
crw-rw-rw- 1 100000 100000 1, 3 Feb 20 09:45 null
lrwxrwxrwx 1 100000 100000 8 Feb 20 09:45 ptmx -> pts/ptmx
drwxr-xr-x 2 100000 100000 6 Feb 20 09:45 pts/
crw-rw-rw- 1 100000 100000 1, 8 Feb 20 09:45 random
drwxr-xr-x 2 100000 100000 6 Feb 20 09:45 shm/
lrwxrwxrwx 1 100000 100000 15 Feb 20 09:45 stderr -> /proc/self/fd/2
lrwxrwxrwx 1 100000 100000 15 Feb 20 09:45 stdin -> /proc/self/fd/0
lrwxrwxrwx 1 100000 100000 15 Feb 20 09:45 stdout -> /proc/self/fd/1
crw-rw-rw- 1 100000 100000 5, 0 Feb 20 09:45 tty
crw-rw-rw- 1 100000 100000 1, 9 Feb 20 09:45 urandom
crw-rw-rw- 1 100000 100000 1, 5 Feb 20 09:45 zero
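
For reference, here is a rough sketch of the root-side commands that could produce such a skeleton (device numbers are taken from the listing above; run as root from the container's root directory, and the details may differ from the original setup):

# mkdir -p dev/pts dev/shm
# mknod dev/null c 1 3; mknod dev/zero c 1 5; mknod dev/full c 1 7
# mknod dev/random c 1 8; mknod dev/urandom c 1 9; mknod dev/tty c 5 0
# ln -s /proc/self/fd dev/fd; ln -s pts/ptmx dev/ptmx
# ln -s /proc/self/fd/0 dev/stdin; ln -s /proc/self/fd/1 dev/stdout; ln -s /proc/self/fd/2 dev/stderr
# chown -R 100000:100000 dev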

This seems to be a sufficient dev skeleton to function with.  For completeness’ sake, I placed my bound user namespace into /run/build-container/userns and followed the original architecture container emulation post, with modified susebootstrap and build-container scripts.  The net result is that, as myself, I can now enter, administer and update the build and test architecture emulation container with no suid required

jejb@jarvis:~> nsenter --user=/run/build-container/userns --mount=/run/build-container/ppc64
root@jarvis:/# id
uid=0(root) gid=0(root) groups=0(root)
root@jarvis:/# uname -m
ppc64
root@jarvis:/# su - jejb
jejb@jarvis:~> id
uid=1000(jejb) gid=100(users) groups=100(users)

The only final wrinkle is that root has to set up the user namespace on every boot, but that’s only because there’s no currently defined operating system way of doing this.

April 27, 2016 06:52 PM

April 26, 2016

LPC 2016: Checkpoint-Restore Microconference Accepted into 2016 Linux Plumbers Conference

This year will feature a four-fold deeper dive into checkpoint-restore technology, thanks to participation by people from a number of additional related projects! These are the OpenMPI message-passing library, Berkeley Lab Checkpoint/Restart (BLCR), and Distributed MultiThreaded CheckPointing (DMTCP) (not to be confused with TCP/IP), in addition to the Checkpoint/Restore in Userspace group that has participated in prior years.

Docker integration remains a hot topic, as is post-copy live migration, as well as testing/validation. As you might guess from the inclusion of people from BLCR and OpenMPI, checkpoint-restore for distributed workloads (rather than just single systems) is an area of interest.

Please join us for a timely and important discussion!

April 26, 2016 02:43 PM

April 25, 2016

Paul E. Mc Kenney: Checkpoint-Restore Microconference Accepted into 2016 Linux Plumbers Conference

This year will feature a four-fold deeper dive into checkpoint-restore technology, thanks to participation by people from a number of additional related projects! These are the OpenMPI message-passing library, Berkeley Lab Checkpoint/Restart (BLCR), and Distributed MultiThreaded CheckPointing (DMTCP) (not to be confused with TCP/IP), in addition to the Checkpoint/Restore in Userspace group that has participated in prior years.

Docker integration remains a hot topic, as is post-copy live migration, as well as testing/validation. As you might guess from the inclusion of people from BLCR and OpenMPI, checkpoint-restore for distributed workloads (rather than just single systems) is an area of interest.

Please join us for a timely and important discussion!

April 25, 2016 05:28 PM

April 24, 2016

LPC 2016: Testing & Fuzzing Microconference Accepted into 2016 Linux Plumbers Conference

Testing, fuzzing, and other diagnostics have made the Linux ecosystem much more robust than in the past, but there are still embarrassing bugs. Furthermore, million-year bugs will be happening many times per day across Linux’s huge installed base, so there is clearly a need for even more aggressive validation.

The Testing and Fuzzing Microconference aims to significantly increase the aggression level of Linux-kernel validation, with discussions on tools and test suites including kselftest, syzkaller, trinity, mutation testing, and the 0day Test Robot. The effectiveness of these tools will be attested to by any of their victims, but we must further raise our game as the installed base of Linux continues to increase.

One additional way of raising the level of testing aggression is to document the various ABIs in machine-readable format, thus lowering the barrier to entry for new projects. Who knows? Perhaps Linux testing will be driven by artificial-intelligence techniques!

Join us for an important and spirited discussion!

April 24, 2016 07:43 PM

Daniel Vetter: Should the X.org Foundation join SPI? Vote Now!



In case you missed it: Please vote now on https://members.x.org/login.php!

April 24, 2016 09:14 AM

April 22, 2016

LPC 2016: Device Tree Microconference Accepted into 2016 Linux Plumbers Conference

Device-tree discussions are probably not quite as spirited as in the past; however, device tree is an active area. In particular, significant issues remain.

This microconference will cover the updated device-tree specification, debugging, bindings validation, and core-kernel code directions. In addition, there will be discussion of what does (and does not) go into the device tree, along with the device-creation and driver binding ordering swamp.

Join us for an important and spirited discussion!

April 22, 2016 01:33 PM

Matthew Garrett: Circumventing Ubuntu Snap confinement

Ubuntu 16.04 was released today, with one of the highlights being the new Snap package format. Snaps are intended to make it easier to distribute applications for Ubuntu - they include their dependencies rather than relying on the archive, they can be updated on a schedule that's separate from the distribution itself and they're confined by a strong security policy that makes it impossible for an app to steal your data.

At least, that's what Canonical assert. It's true in a sense - if you're using Snap packages on Mir (ie, Ubuntu mobile) then there's a genuine improvement in security. But if you're using X11 (ie, Ubuntu desktop) it's horribly, awfully misleading. Any Snap package you install is completely capable of copying all your private data to wherever it wants with very little difficulty.

The problem here is the X11 windowing system. X has no real concept of different levels of application trust. Any application can register to receive keystrokes from any other application. Any application can inject fake key events into the input stream. An application that is otherwise confined by strong security policies can simply type into another window. An application that has no access to any of your private data can wait until your session is idle, open an unconfined terminal and then use curl to send your data to a remote site. As long as Ubuntu desktop still uses X11, the Snap format provides you with very little meaningful security. Mir and Wayland both fix this, which is why Wayland is a prerequisite for the sandboxed xdg-app design.
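
You don't even need to write code to see this; a quick sketch using the stock xinput tool (device ids vary per machine):

$ xinput list                 # find your keyboard's device id
$ xinput test <keyboard-id>   # leave this running, then type into any other window

Every key press and release shows up, regardless of which window has focus, with no special privileges required.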

I've produced a quick proof of concept of this. Grab XEvilTeddy from git, install Snapcraft (it's in 16.04), snapcraft snap, sudo snap install xevilteddy*.snap, /snap/bin/xevilteddy.xteddy . An adorable teddy bear! How cute. Now open Firefox and start typing, then check back in your terminal window. Oh no! All my secrets. Open another terminal window and give it focus. Oh no! An injected command that could instead have been a curl session that uploaded your private SSH keys to somewhere that's not going to respect your privacy.
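
Laid out as separate commands, that's roughly the following (the repository URL is elided in the original, and installing snapcraft via apt is my assumption):

$ git clone <xevilteddy-repository>
$ sudo apt install snapcraft
$ snapcraft snap
$ sudo snap install xevilteddy*.snap
$ /snap/bin/xevilteddy.xteddy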

The Snap format provides a lot of underlying technology that is a great step towards being able to protect systems against untrustworthy third-party applications, and once Ubuntu shifts to using Mir by default it'll be much better than the status quo. But right now the protections it provides are easily circumvented, and it's disingenuous to claim that it currently gives desktop users any real security.

April 22, 2016 01:51 AM

April 20, 2016

Paul E. Mc Kenney: Testing & Fuzzing Microconference Accepted into 2016 Linux Plumbers Conference

Testing, fuzzing, and other diagnostics have made the Linux ecosystem much more robust than in the past, but there are still embarrassing bugs. Furthermore, million-year bugs will be happening many times per day across Linux's huge installed base, so there is clearly a need for even more aggressive validation.

The Testing and Fuzzing Microconference aims to significantly increase the aggression level of Linux-kernel validation, with discussions on tools and test suites including kselftest, syzkaller, trinity, mutation testing, and the 0day Test Robot. The effectiveness of these tools will be attested to by any of their victims, but we must further raise our game as the installed base of Linux continues to increase.

One additional way of raising the level of testing aggression is to document the various ABIs in machine-readable format, thus lowering the barrier to entry for new projects. Who knows? Perhaps Linux testing will be driven by artificial-intelligence techniques!

Join us for an important and spirited discussion!

April 20, 2016 04:43 PM

Daniel Vetter: X.org Foundation Election - Vote Now!

It's election season in X.org land, and it matters: Besides new board seats we're also voting on bylaw changes and whether to join SPI or not.

Personally, and as the secretary of the board I'm very much in favour of joining SPI. It will allow us to offload all the boring bits of running a foundation, and those are also all the bits we tend to struggle with. And that would give the board more time to do things that actually matter and help the community. And all that for a really reasonable price - running our own legal entity isn't free, and not really worth it for our small budget mostly consisting of travel sponsoring and the occasional internship.

And bylaw changes need a qualified supermajority of all members, so every vote counts and not voting essentially means voting no. Hence please vote, and please vote even when you don't want to join - this is our second attempt and I'd really like to see a clear verdict from our members, one way or the other.

Thanks.

Voting closes by Apr 26 23:59 UTC, but please don't cut it close; it's a computer that decides when it's over ...

April 20, 2016 07:24 AM

April 18, 2016

Paul E. Mc Kenney: Device Tree Microconference Accepted into 2016 Linux Plumbers Conference

Device-tree discussions are probably not quite as spirited as in the past; however, device tree is an active area. In particular, significant issues remain.

This microconference will cover the updated device-tree specification, debugging, bindings validation, and core-kernel code directions. In addition, there will be discussion of what does (and does not) go into the device tree, along with the device-creation and driver binding ordering swamp.

Join us for an important and spirited discussion!

April 18, 2016 03:21 PM

Matthew Garrett: One more attempt at SATA power management

Around a year ago I wrote some patches in an attempt to improve power management on Haswell and Broadwell systems by configuring Serial ATA power management appropriately. I got a couple of reports of them triggering SATA errors for some users, couldn't reproduce them myself and so didn't have a lot of confidence in them. Time passed.

I've been working on power management stuff again this week, so it seemed like a good opportunity to revisit these. I've made a few changes and pushed a couple of trees - one against master and one against 4.5.

First, these probably only have relevance to users of mobile Intel parts in the U or S range (/proc/cpuinfo will tell you - you're looking for a four-digit number that starts with 4 (Haswell), 5 (Broadwell) or 6 (Skylake) and ends with U or S), and won't do anything unless you have SATA drives (including PCI-based SATA). To test them, first disable anything like TLP that might alter your SATA link power management policy. Then check powertop - you should only be getting to PC3 at best. Build a kernel with these patches and boot it. /sys/class/scsi_host/*/link_power_management_policy should read "firmware". Check powertop and see whether you're getting into deeper PC states. Now run your system for a while and check the kernel log for any SATA errors that you didn't see before.
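
As a rough sketch of those checks in one place (output and exact paths will vary by system):

$ grep 'model name' /proc/cpuinfo | head -1                  # look for a 4xxx/5xxx/6xxx part ending in U or S
$ cat /sys/class/scsi_host/*/link_power_management_policy    # should read "firmware" with these patches applied
$ sudo powertop                                              # Idle stats tab: deepest package C-state reached
$ dmesg | grep -i 'ata.*error'                               # watch for SATA errors that weren't there before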

Let me know if you see SATA errors and are willing to help debug this, and leave a comment if you don't see any improvement in PC states.

April 18, 2016 02:15 AM

April 17, 2016

LPC 2016: PCI Microconference Accepted into 2016 Linux Plumbers Conference

Given that PCI was introduced more than two decades ago and that PCI Express was introduced more than ten years ago, one might think that the Linux plumbing already did everything possible to support PCI.

One would be quite wrong.

One issue with current PCI support is that resource allocation is handled on a per-architecture basis, leading to duplicate code, and, worse yet, duplicate bugs. This microconference will therefore look into possible consolidation of this code.

Another issue is integration of PCI’s message-signaled interrupts (MSI) into the kernel’s core interrupt code. Whilst the device tree bindings and related core code for MSI management have just been integrated, legacy code needs more work, including architecture-specific code and PCI legacy host controller drivers. In addition, attention should be paid to the ACPI bindings, to the related ACPI kernel core code, and to the MSI passthrough usage in virtualized environments.

Advances in Virtualization technologies, System I/O devices and their memory management through IOMMU components are driving features in the PCI Express specifications (Address Translation Services (ATS) and Page Request Interface (PRI)). This microconference will also foster debate on the best way to integrate these features in the kernel in a seamless way for different architectures.

Of course, no discussion of PCI would be complete without considering firmware issues, hardware quirks, power management, and the interplay between device tree and ACPI.

Join us for an important new-to-Plumbers PCI discussion!

April 17, 2016 04:01 PM

April 15, 2016

Pavel Machek: Python is a nice... trap

Python is a nice... language

Easy to program in. No need to recompile, you can hack in as you would in shell. Fun.
Let's see if I can predict the weather. Oh yes, let's hack something up in Python. Hey, it works. But it is slow on a PC, and so slow on a phone that the future is already past by the time you predict it.
Then you try to improve the speed. That's a very, very bad idea. You can't optimize Python, but you can spend hours trying. You make a mental note to never ever use Python for computation again.
But Python should be fine for a simple Gtk project, communicating over D-Bus, right?
For bigger projects, Python has some support. Modules and object orientation really help there. Lack of compilation really hurts, but this is a fun language, right? Allowing you to write nice and clean code.
Refactoring is hard, because variables are not declared.
Python is a nice... trap
But then your project grows larger, and you realize it takes 30 seconds to start up, because many modules need Gtk, and importing Gtk takes time. Heck... compiling _and_ running a C-based hello world is still faster than running a Python-based one.
Ok, C++ was tempting. A Gtk hello world takes 55 seconds to compile. My g++ does not support "auto", so the application starts with Glib::RefPtr<Gtk::Application> app = Gtk::Application::create(argc, argv, "org.gtkmm.example"). I think I'll stick with C.

Oh, and I already used my main SIM in N900/Debian... and I really used N900/Debian. I needed to tether a network connection, but not provide a wifi hotspot. Done.

April 15, 2016 11:17 AM

Matthew Garrett: David MacKay

The first time I was paid to do software development came as something of a surprise to me. I was working as a sysadmin in a computational physics research group when a friend asked me if I'd be willing to talk to her PhD supervisor. I had nothing better to do, so said yes. And that was how I started the evening having dinner with David MacKay, and ended the evening better fed, a little drunker and having agreed in principle to be paid to write free software.

I'd been hired to work on Dasher, an information-efficient text entry system. It had been developed by one of David's students as a practical demonstration of arithmetic encoding after David had realised that presenting a visualisation of an effective compression algorithm allowed you to compose text without having to enter as much information into the system. At first this was merely a neat toy, but it soon became clear that the benefits of Dasher had a great deal of overlap with good accessibility software. It required much less precision of input, it made it easy to correct mistakes (you merely had to reverse direction in order to start zooming back out of the text you had entered) and it worked with a variety of input technologies from mice to eye tracking to breathing. My job was to take this codebase and turn it into a project that would be interesting to external developers.

In the year I worked with David, we turned Dasher from a research project into a well-integrated component of Gnome, improved its support for Windows, accepted code from an external contributor who ported it to OS X (using an OpenGL canvas!) and wrote ports for a range of handheld devices. We added code that allowed Dasher to directly control the UI of other applications, making it possible for people to drive word processors without having to leave Dasher. We taught Dasher to speak. We strove to avoid the mistakes present in so many other pieces of accessibility software, such as configuration that could only be managed by an (expensive!) external consultant. And we visited Dasher users and learned how they used it and what more they needed, then went back home and did what we could to provide that.

Working on Dasher was an incredible opportunity. I was involved in the development of exciting code. I spoke on it at multiple conferences. I became part of the Gnome community. I visited the USA for the first time. I entered people's homes and taught them how to use Dasher and experienced their joy as they realised that they could now communicate up to an order of magnitude more quickly. I wrote software that had a meaningful impact on the lives of other people.

Working with David was certainly not easy. Our weekly design meetings were, charitably, intense. He had an astonishing number of ideas, and my job was to figure out how to implement them while (a) not making the application overly complicated and (b) convincing David that it still did everything he wanted. One memorable meeting involved me gradually arguing him down from wanting five new checkboxes to agreeing that there were only two combinations that actually made sense (and hence a single checkbox) - and then admitting that this was broadly equivalent to an existing UI element, so we could just change the behaviour of that slightly without adding anything. I took the opportunity to delete an additional menu item in the process.

I was already aware of the importance of free software in terms of developers, but working with David made it clear to me how important it was to users as well. A community formed around Dasher, helping us improve it and allowing us to develop support for new use cases that made the difference between someone being able to type at two words per minute and being able to manage twenty. David saw that this collaborative development would be vital to creating something bigger than his original ideas, and it succeeded in ways he couldn't have hoped for.

I spent a year in the group and then went back to biology. David went on to channel his strong feelings about social responsibility into issues such as sustainable energy, writing a freely available book on the topic. He served as chief adviser to the UK Department of Energy and Climate Change for five years. And earlier this year he was awarded a knighthood for his services to scientific outreach.

David died yesterday. It's unlikely that I'll ever come close to what he accomplished, but he provided me with much of the inspiration to try to do so anyway. The world is already a less fascinating place without him.

April 15, 2016 06:26 AM

April 13, 2016

Matthew Garrett: Skylake's power management under Linux is dreadful and you shouldn't buy one until it's fixed

(Edit to add: this issue is restricted to the mobile SKUs. Desktop parts have very different power management behaviour)

Linux 4.5 seems to have got Intel's Skylake platform (ie, 6th-generation Core CPUs) to the point where graphics work pretty reliably, which is great progress (4.4 tended to lose all my windows every so often, especially over suspend/resume). I'm even running Wayland happily. Unfortunately one of the reasons I have a laptop is that I want to be able to do things like use it on battery, and power consumption's an important part of that. Skylake continues the trend from Haswell of moving to an SoC-type model where clock and power domains are shared between components that were previously entirely independent, and so you can't enter deep power saving states unless multiple components all have the correct power management configuration. On Haswell/Broadwell this manifested in the form of Serial ATA link power management being involved in preventing the package from going into deep power saving states - setting that up correctly resulted in a reduction in full-system power consumption of about 40%[1].

I've now got a Skylake platform with a nice shiny NVMe device, so Serial ATA policy isn't relevant (the platform doesn't even expose a SATA controller). The deepest power saving state I can get into is PC3, despite Skylake supporting PC8 - so I'm probably consuming about 40% more power than I should be. And nobody seems to know what needs to be done to fix this. I've found no public documentation on the power management dependencies on Skylake. Turning on everything in Powertop doesn't improve anything. My battery life is pretty poor and the system is pretty warm.
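
For anyone wanting to check their own machine, a sketch using turbostat from the kernel tree (column names differ slightly between versions):

$ sudo turbostat sleep 30

The Pkg%pc3 / Pkg%pc6 / Pkg%pc7 / Pkg%pc8 columns show how much time the package spent in each state; on an idle Skylake laptop you would hope to see residency in PC7 or PC8 rather than everything stopping at PC3.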

The best thing about this is the following statement from page 64 of the 6th Generation Intel® Processor Datasheet for U-Platforms:

Caution: Long term reliability cannot be assured unless all the Low-Power Idle States are enabled.

which is pretty concerning. Without support for states deeper than PC3, Linux is running in a configuration that Intel imply may trigger premature failure. That's obviously not good. Until this situation is improved, you probably shouldn't buy any Skylake systems if you're planning on running Linux.

[1] These patches never went upstream. Someone reported that they resulted in their SSD throwing errors and I couldn't find anybody with deeper levels of SATA experience who was interested in working on the problem. Intel's AHCI drivers for Windows do the right thing, but I couldn't find anybody at Intel who could get any information from their Windows driver team.

April 13, 2016 10:16 PM

Paul E. Mc Kenney: PCI Microconference Accepted into 2016 Linux Plumbers Conference

Given that PCI was introduced more than two decades ago and that PCI Express was introduced more than ten years ago, one might think that the Linux plumbing already did everything possible to support PCI.

One would be quite wrong.

One issue with current PCI support is that resource allocation is handled on a per-architecture basis, leading to duplicate code, and, worse yet, duplicate bugs. This microconference will therefore look into possible consolidation of this code.

Another issue is integration of PCI's message-signaled interrupts (MSI) into the kernel's core interrupt code. Whilst the device tree bindings and related core code for MSI management have just been integrated, legacy code needs more work, including architecture-specific code and PCI legacy host controller drivers. In addition, attention should be paid to the ACPI bindings, to the related ACPI kernel core code, and to the MSI passthrough usage in virtualized environments.

Advances in Virtualization technologies, System I/O devices and their memory management through IOMMU components are driving features in the PCI Express specifications (Address Translation Services (ATS) and Page Request Interface (PRI)). This microconference will also foster debate on the best way to integrate these features in the kernel in a seamless way for different architectures.

Of course, no discussion of PCI would be complete without considering firmware issues, hardware quirks, power management, and the interplay between device tree and ACPI.

Join us for an important new-to-Plumbers PCI discussion!

April 13, 2016 08:25 PM

April 12, 2016

LPC 2016: Call for Refereed-Track Proposals

We are pleased to announce the Call for Refereed-Track Proposals for the 2016 edition of the Linux Plumbers Conference, which will be held in Santa Fe, NM, USA on November 2-4 in conjunction with the Linux Kernel Summit.

Refereed track presentations are 50 minutes in length and should focus on a specific aspect of the “plumbing” in the Linux system. Examples of Linux plumbing include core kernel subsystems, core libraries, windowing systems, management tools, device support, media creation/playback, and so on. The best presentations are not about finished work, but rather problems, proposals, or proof-of-concept solutions that require face-to-face discussions and debate.

Given that Plumbers is not colocated with LinuxCon this year, we are spreading the refereed-track talks over all three days. This provides a change of pace and also provides a conflict-free schedule for the refereed-track talks. (Yes, this does result in more conflicts between the refereed-track talks and the Microconferences, but there never is a truly free lunch.)

Linux Plumbers Conference Program Committee members will be reviewing all submitted sessions. High-quality submissions that cannot be accepted due to the limited number of slots will be forwarded to the Microconference leads for further consideration.

To submit a refereed track talk proposal follow the instructions at this website.

Submissions are due on or before Thursday 1 September, 2016 at 11:59PM Pacific Time. Since this is after the closure of early registration, speakers may register before this date and we’ll refund the registration for any selected presentation’s speaker, but for only one speaker per presentation.

Finally, we are still accepting Microconference submissions here.
 
Dates:

Submissions close: September 1, 2016
Speakers notified: September 22, 2016
Slides due: November 1, 2016

April 12, 2016 02:40 PM

LPC 2016: Live Kernel Patching Microconference Accepted into 2016 Linux Plumbers Conference

Live kernel patching was accepted into the Linux kernel in v4.0 in February 2015, so we can declare the 2014 LPC Live Kernel Patching Microconference to have been a roaring success! However, as was noted at the time, this is just the beginning of the real work. In short, the v4.0 work makes live kernel patching possible, but more work is required to make it more reliable and more routine.

Additional issues include stacktrace reliability, patch-safety criteria for kernel threads, thread consistency models, porting to non-x86 architectures, handling of loadable modules, compiler optimizations, userspace tooling, patching of data, automated regression testing, and patch-creation guidelines.

Join us for an important and spirited discussion!

April 12, 2016 01:01 AM

April 11, 2016

Paul E. Mc Kenney: 2016 Linux Plumbers Conference Call for Refereed-Track Proposals

Dates:

Submissions close: September 1, 2016
Speakers notified: September 22, 2016
Slides due: November 1, 2016

We are pleased to announce the Call for Refereed-Track Proposals for the 2016 edition of the Linux Plumbers Conference, which will be held in Santa Fe, NM, USA on November 2-4 in conjunction with the Linux Kernel Summit.

Refereed track presentations are 50 minutes in length and should focus on a specific aspect of the "plumbing" in the Linux system. Examples of Linux plumbing include core kernel subsystems, core libraries, windowing systems, management tools, device support, media creation/playback, and so on. The best presentations are not about finished work, but rather problems, proposals, or proof-of-concept solutions that require face-to-face discussions and debate.

Given that Plumbers is not colocated with LinuxCon this year, we are spreading the refereed-track talks over all three days. This provides a change of pace and also provides a conflict-free schedule for the refereed-track talks. (Yes, this does result in more conflicts between the refereed-track talks and the Microconferences, but there never is a truly free lunch.)

Linux Plumbers Conference Program Committee members will be reviewing all submitted sessions. High-quality submissions that cannot be accepted due to the limited number of slots will be forwarded to the Microconference leads for further consideration.

To submit a refereed track talk proposal follow the instructions at this website.

Submissions are due on or before Thursday 1 September, 2016 at 11:59PM Pacific Time. Since this is after the closure of early registration, speakers may register before this date and we'll refund the registration for any selected presentation's speaker, but for only one speaker per presentation.

Finally, we are still accepting Microconference submissions here.

April 11, 2016 07:17 PM

Pavel Machek: pypy: surprisingly good

Using Python for weather forecasting was a mistake... but it came with some surprises. It looks like pypy is as fast as gcc -O0 on a simple correlation computation: https://gitlab.com/tui/tui/blob/master/nowcast/pypybench.py (and pypybench.c). gcc -O3 is twice as fast, and plain python2 is 20x slower (or 2000% slower). Python3 is 28x slower. That's better than I expected for the JIT technology.

April 11, 2016 09:51 AM

Pavel Machek: Finally... power management on Nokia N900

After a long, long fight, it seems power management on the Nokia N900 works for me for the first time. The N900 is very picky about its configuration (you select lockdep, you lose video; you select something else, you get 50mA power consumption... not good). That was the last major piece... I hope. I should have a usable phone soon.

April 11, 2016 09:45 AM

Matthew Garrett: Making it easier to deploy TPMTOTP on non-EFI systems

I've been working on TPMTOTP a little this weekend. I merged a pull request that adds command-line argument handling, which includes the ability to choose the set of PCRs you want to seal to without rebuilding the tools, and also lets you print the base32 encoding of the secret rather than the qr code so you can import it into a wider range of devices. More importantly it also adds support for setting the expected PCR values on the command line rather than reading them out of the TPM, so you can now re-seal the secret against new values before rebooting.

I also wrote some new code myself. TPMTOTP is designed to be usable in the initramfs, allowing you to validate system state before typing in your passphrase. Unfortunately the initramfs itself is one of the things that's measured. So, you end up with something of a chicken and egg problem - TPMTOTP needs access to the secret, and the obvious thing to do is to put the secret in the initramfs. But the secret is sealed against the hash of the initramfs, and so you can't generate the secret until after the initramfs has been built. Modify the initramfs to insert the secret and you change the hash, so the secret is no longer released. Boo.

On EFI systems you can handle this by sticking the secret in an EFI variable (there's some special-casing in the code to deal with the additional metadata on the front of things you read out of efivarfs). But that's not terribly useful if you're not on an EFI system. Thankfully, there's a way around this. TPMs have a small quantity of nvram built into them, so we can stick the secret there. If you pass the -n argument to sealdata, that'll happen. The unseal apps will attempt to pull the secret out of nvram before falling back to looking for a file, so things should just magically work.

I think it's pretty feature complete now, other than TPM2 support? That's on my list.

April 11, 2016 05:59 AM

April 08, 2016

Rusty Russell: Bitcoin Generic Address Format Proposal

I’ve been implementing segregated witness support for c-lightning; it’s interesting that there’s no address format for the new form of addresses.  There’s a segregated-witness-inside-p2sh which uses the existing p2sh format, but if you want raw segregated witness (which is simply a “0” followed by a 20-byte or 32-byte hash), the only proposal is BIP142 which has been deferred.

If we’re going to have a new address format, I’d like to make the case for shifting away from bitcoin’s base58 (eg. 1At1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2):

  1. base58 is not trivial to parse.  I used the bignum library to do it, though you can open-code it as bitcoin-core does.
  2. base58 addresses are variable-length.  That makes webforms and software mildly harder, but also eliminates a simple sanity check.
  3. base58 addresses are hard to read over the phone.  Greg Maxwell points out that the upper and lower case mix is particularly annoying.
  4. The 4-byte SHA check does not guarantee to catch the most common forms of error (transposed or single incorrect letters), though it’s pretty good (1 in 4 billion chance of random errors passing).
  5. At around 34 letters, it’s fairly compact (36 for the BIP141 P2WPKH).

This is my proposal for a generic replacement (thanks to CodeShark for generalizing my previous proposal) which covers all possible future address types (as well as being usable for current ones):

  1. Prefix for type, followed by colon.  Currently “btc:” or “testnet:“.
  2. The full scriptPubkey using base 32 encoding as per http://philzimmermann.com/docs/human-oriented-base-32-encoding.txt.
  3. At least 30 bits for crc64-ecma, up to a multiple of 5 to reach a letter boundary.  This covers the prefix (as ascii), plus the scriptPubKey.
  4. The final letter is the Damm algorithm check digit of the entire previous string, using this 32-way quasigroup. This protects against single-letter errors as well as single transpositions.

These addresses look like btc:ybndrfg8ejkmcpqxot1uwisza345h769ybndrrfg (41 digits for a P2WPKH) or btc:yybndrfg8ejkmcpqxot1uwisza345h769ybndrfg8ejkmcpqxot1uwisza34 (60 digits for a P2WSH) (note: neither of these has the correct CRC or check letter, I just made them up).  A classic P2PKH would be 45 digits, like btc:ybndrfg8ejkmcpqxot1uwisza345h769wiszybndrrfg, and a P2SH would be 42 digits.

While manually copying addresses is something which should be avoided, it does happen, and the cost of making them robust against common typographic errors is small.  The CRC is a good idea even for machine-based systems: it will let through less than 1 in a billion mistakes.  Distinguishing which blockchain is a nice catchall for mistakes, too.

We can, of course, bikeshed this forever, but I wanted to anchor the discussion with something I consider fairly sane.

April 08, 2016 01:50 AM

April 07, 2016

Andi Kleen: Overview of Last Branch Records (LBRs) on Intel CPUs

I wrote a two-part article on the theory and practice of Last Branch Records using Linux perf. Both parts were published on LWN.

This includes the background (what a last branch record is and why branch sampling), what you can use it for in perf, like hot path profiling, frame-pointerless callgraphs, or automatic micro benchmarks, and also other uses like continuous compiler feedback.
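
For the impatient, a minimal sketch of the perf commands involved (./myprog is a placeholder; you need a CPU and perf build with LBR support):

perf record -b ./myprog                  # sample taken branches via the LBRs; perf report then shows branch from/to columns
perf record --call-graph lbr ./myprog    # frame-pointerless call graphs using the LBR call-stack mode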

The articles are now freely available:

Part 1: Introduction of Last Branch Records
Part 2: Advanced uses of Last Branch Records

April 07, 2016 02:26 PM

April 06, 2016

Pete Zaitcev: Amateur contributors to OpenStack

John was venting about our complicated contribution process in OpenStack and threw this off-hand remark:

I'm sure the fact that nearly 100% of @openstack contributors are paid to be so is completely unrelated. #eyeroll

While I share his frustration, one thing he may be missing is that OpenStack is generally useless to anyone who does not have thousands of computers dedicated to it. This is a significant barrier to entry for hobbyists, baked straight into the nature of OpenStack.

Exceptions that we see are basically people building little pseudo-clusters out of a dozen VMs. They do it with the aim of advancing their careers.

April 06, 2016 04:44 PM

April 05, 2016

Matthew Garrett: There's more than one way to exploit the commons

There's a piece of software called XScreenSaver. It attempts to fill two somewhat disparate roles: locking the screen so that a password is needed to get back into your session[1], and drawing a wide variety of entertaining graphical hacks while the display is idle.

XScreenSaver does an excellent job of the second of these[2] and is pretty good at the first, which is to say that it only suffers from a disastrous security flaw once every few years and as such is certainly not appreciably worse than any other piece of software.

Debian ships an operating system that prides itself on stability. The Debian definition of stability is a very specific one - rather than referring to how often the software crashes or misbehaves, it refers to how often the software changes behaviour. Debian is very reluctant to upgrade software that is part of a stable release, to the extent that developers will attempt to backport individual security fixes to the version they shipped rather than upgrading to a release that contains all those security fixes but also adds a new feature. The argument here is that the new release may also introduce new bugs, and Debian's users desire stability (in the "things don't change" sense) more than new features. Backporting security fixes keeps them safe without compromising the reason they're running Debian in the first place.

This all makes plenty of sense at a theoretical level, but reality is sometimes less convenient. The first problem is that security bugs are typically also, well, bugs. They may make your software crash or misbehave in annoying but apparently harmless ways. And when you fix that bug you've also fixed a security bug, but the ability to determine whether a bug is a security bug or not is one that involves deep magic and a fanatical devotion to the cause so given the choice between maybe asking for a CVE and dealing with embargoes and all that crap when perhaps you've actually only fixed a bug that makes the letter "E" appear in places it shouldn't and not one that allows the complete destruction of your intergalactic invasion fleet means people will tend to err on the side of "Eh fuckit" and go drinking instead. So new versions of software will often fix security vulnerabilities without there being any indication that they do so[3], and running old versions probably means you have a bunch of security issues that nobody will ever do anything about.

But that's broadly a technical problem and one we can apply various metrics to, and if somebody wanted to spend enough time performing careful analysis of software we could have actual numbers to figure out whether the better security approach is to upgrade or to backport fixes. Conversations become boring once we introduce too many numbers, so let's ignore that problem and go onto the second, which is far more handwavy and social and so significantly more interesting.

The second problem is that upstream developers remain associated with the software shipped by Debian. Even though Debian includes a tool for reporting bugs against packages included in Debian, some users will ignore that and go straight to the upstream developers. Those upstream developers then have to spend at least 15 or so seconds telling the user that the bug they're seeing has been fixed for some time, and then figure out how to explain that no sorry they can't make Debian include a fixed version because that's not how things work. Worst case, the stable release of Debian ends up including a bug that makes software just basically not work at all and everybody who uses it assumes that the upstream author is brutally incompetent, and they end up quitting the software industry and I don't know running a nightclub or something.

From the Debian side of things, the straightforward solution is to make it more obvious that users should file bugs with Debian and not bother the upstream authors. This doesn't solve the problem of damaged reputation, and nor does it entirely solve the problem of users contacting upstream developers. If a bug is filed with Debian and doesn't get fixed in a timely manner, it's hardly surprising that users will end up going upstream. The Debian bugs list for XScreenSaver does not make terribly attractive reading.

So, coming back to the title for this entry. The most obvious failure of the commons is where a basically malicious actor consumes while giving nothing back, but if an actor with good intentions ends up consuming more than they contribute that may still be a problem. An upstream author releases a piece of software under a free license. Debian distributes this to users. Debian's policies result in the upstream author having to do more work. What does the upstream author get out of this exchange? In an ideal world, plenty. The author's software is made available to more people. A larger set of developers is willing to work on making improvements to the software. In a less ideal world, rather less. The author has to deal with bug mail about already fixed bugs. The author's reputation may be harmed by user exposure to said fixed bugs. The author may get less in the way of useful bug fixes or features because people are running old versions rather than fixing new ones. If the balance tips towards the latter, the author's decision to release their software under a free license has made their life more difficult.

Most discussions about Debian's policies entirely ignore the latter scenario, focusing more on the fact that the author chose to release their software under a free license to begin with. If the author is unwilling to handle the consequences of that, goes the argument, why did they do it in the first place? The unfortunate logical conclusion to that argument is that the author realises that they made a huge mistake and never does so again, and woo uh oops.

The irony here is that one of Debian's foundational documents, the Debian Free Software Guidelines, makes allowances for this. Section 4 allows for distribution of software in Debian even if the author insists that modified versions[4] are renamed. This allows for an author to make a choice - allow themselves to be associated with the Debian version of their work and increase (a) their userbase and (b) their support load, or try to distinguish what Debian ship from their identity. But that document was ratified in 1997 and people haven't really spent much time since then thinking about why it says what it does, and so this tradeoff is rarely considered.

Free software doesn't benefit from distributions antagonising their upstreams, even if said upstream is a cranky nightclub owner. Debian's users are Debian's highest priority, but those users are going to suffer if developers decide that not using free licenses improves their quality of life. Kneejerk reactions around specific instances aren't helpful, but now is probably a good time to start thinking about what value Debian brings to its upstream authors and how that can be increased. Failing to do so doesn't serve users, Debian itself or the free software community as a whole.

[1] The X server has no fundamental concept of a screen lock. This is implemented by an application asking that the X server send all keyboard and mouse input to it rather than to any other application, and then that application creating a window that fills the screen. Due to some hilarious design decisions, opening a pop-up menu in an application prevents any other application from being able to grab input and so it is impossible for the screensaver to activate if you open a menu and then walk away from your computer. This is merely the most obvious problem - there are others that are more subtle and more infuriating. The only fix in this case is to nuke the site from orbit.

[2] There are screenshots here. My favourites are the one that emulates the electrical characteristics of an old CRT in order to present a more realistic depiction of the output of an Apple 2, and the one that includes a complete 6502 emulator.

[3] And obviously new versions of software will often also introduce new security vulnerabilities without there being any indication that they do so, because who would ever put that in their changelog. But the less ethically challenged members of the security community are more likely to be looking at new versions of software than ones released three years ago, so you're probably still tending towards winning overall.

[4] There's a perfectly reasonable argument that all packages distributed by Debian are modified in some way

April 05, 2016 11:53 PM

Matthew Garrett: TPMs, event logs, fine-grained measurements and avoiding fragility in remote-attestation

Trusted Platform Modules are fairly unintelligent devices. They can do some crypto, but they don't have any ability to directly monitor the state of the system they're attached to. This is worked around by having each stage of the boot process "measure" state into registers (Platform Configuration Registers, or PCRs) in the TPM by taking the SHA1 of the next boot component and performing an extend operation. Extend works like this:

New PCR value = SHA1(current value||new hash)

ie, the TPM takes the current contents of the PCR (a 20-byte register), concatenates the new SHA1 to the end of that in order to obtain a 40-byte value, takes the SHA1 of this 40-byte value to obtain a 20-byte hash and sets the PCR value to this. This has a couple of interesting properties: the final PCR value depends on every value that was extended into it and on the order in which those extends happened, and (short of breaking SHA1) the only way to arrive at a given PCR value is to perform exactly that sequence of extend operations. So if we know which extend operations were performed, we can calculate the expected PCR values and compare them against what the TPM reports.

But how do we know what those operations were? We control the bootloader and the kernel and we know what extend operations they performed, so that much is easy. But the firmware itself will have performed some number of operations (the firmware itself is measured, as is the firmware configuration, and certain aspects of the boot process that aren't in our control may also be measured) and we may not be able to reconstruct those from scratch.

Thankfully we have more than just the final PCR data. The firmware provides an interface to log each extend operation, and you can read the event log in /sys/kernel/security/tpm0/binary_bios_measurements. You can pull information out of that log and use it to reconstruct the writes the firmware made. Merge those with the writes you performed and you should be able to reconstruct the final TPM state. Hurrah!

The problem is that a lot of what you want to measure into the TPM may vary between machines or change in response to configuration changes or system updates. If you measure every module that grub loads, and if grub changes the order that it loads modules in, you also need to update your calculations of the end result. Thankfully there's a way around this - rather than making policy decisions based on the final TPM value, just use the final TPM value to ensure that the log is valid. If you extract each hash value from the log and simulate an extend operation, you should end up with the same value as is present in the TPM. If so, you know that the log is valid. At that point you can examine individual log entries without having to care about the order that they occurred in, which makes writing your policy significantly easier.
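
As a minimal sketch of that simulation in shell (event_hashes.txt is a hypothetical file holding one hex SHA1 per line, extracted from the event log for a single PCR):

pcr=0000000000000000000000000000000000000000
while read -r event; do
    pcr=$(printf '%s%s' "$pcr" "$event" | xxd -r -p | sha1sum | cut -d' ' -f1)
done < event_hashes.txt
echo "replayed PCR value: $pcr"

If the final value matches what the TPM reports for that PCR, the log can be trusted and its individual entries examined.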

But there's another source of fragility. Imagine that you're measuring every command executed by grub (as is the case in the CoreOS grub). You want to ensure that no inappropriate commands have been run (such as ones that would allow you to modify the loaded kernel after it's been measured), but you also want to permit certain variations - for instance, you might have a primary root filesystem and a fallback root filesystem, and you're ok with either being passed as a kernel argument. One approach would be to write two lines of policy, but there's an even more flexible approach. If the bootloader logs the entire command into the event log, when replaying the log we can verify that the event description hashes to the value that was passed to the TPM. If it does, rather than testing against an explicit hash value, we can examine the string itself. If the event description matches a regular expression provided by the policy then we're good.
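
As a rough sketch of that flow, assuming a simplified in-memory event log of (description, digest) pairs rather than the real binary_bios_measurements format, the validation might look like this in Python:

import hashlib, re

def replay(log):
    # Simulate the TPM: extend every logged digest into a zeroed PCR.
    pcr = bytes(20)
    for description, digest in log:
        pcr = hashlib.sha1(pcr + digest).digest()
    return pcr

def log_is_acceptable(log, pcr_from_tpm, allowed_patterns):
    # The log is only trustworthy if replaying it reproduces the TPM's PCR value.
    if replay(log) != pcr_from_tpm:
        return False
    for description, digest in log:
        # Each event description must hash to the digest that was extended...
        if hashlib.sha1(description).digest() != digest:
            return False
        # ...and must match at least one acceptable pattern from the local policy.
        if not any(re.fullmatch(p, description.decode()) for p in allowed_patterns):
            return False
    return True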

This approach makes it possible to write TPM policies that are resistant to changes in ordering and permit fine-grained definition of acceptable values, and which can cleanly separate out local policy, generated policy values and values that are provided by the firmware. The split between machine-specific policy and OS policy allows for the static machine-specific policy to be merged with OS-provided policy, making remote attestation viable even over automated system upgrades.

We've integrated an implementation of this kind of policy into the TPM support code we'd like to integrate into Kubernetes, and CoreOS will soon be generating known-good hashes at image build time. The combination of these means that people using Distributed Trusted Computing under Tectonic will be able to validate the state of their systems with nothing more than a minimal machine-specific policy description.

The support code for all of this should also start making it into other distributions in the near future (the grub code is already in Fedora 24), so with luck we can define a cross-distribution policy format and make it straightforward to handle this in a consistent way even in heterogeneous operating system environments. Remote attestation is a powerful tool for ensuring that your systems are in a valid state, but the difficulty of policy management has been a significant factor in making it difficult for people to deploy in their data centres. Making it easier for people to shield themselves against low-level boot attacks is a big step forward in improving the security of distributed workloads and makes bare-metal hosting a much more viable proposition.


April 05, 2016 04:08 PM

April 01, 2016

Rusty Russell: BIP9: versionbits In a Nutshell

Hi, I was one of the authors/bikeshedders of BIP9, which Pieter Wuille recently refined (and implemented) into its final form.  The bitcoin core plan is to use BIP9 for activations from now on, so let’s look at how it works!

Some background:

So, let’s look at BIP68 & 112 (Sequence locks and OP_CHECKSEQUENCEVERIFY) which are being activated together:

There are also two alerts in the bitcoin core implementation:

Now, when could the OP_CSV soft forks activate? bitcoin-core will only start setting the bit in the first period after the start date, so somewhere between the 1st and 15th of May[1], then it will take another period to lock in (even if 95% of miners are already upgraded), then another period to activate.  So early June would be the earliest possible date, but we’ll get two weeks’ notice for sure.

The Old Algorithm

For historical purposes, I’ll describe how the old soft-fork code worked.  It used version as a simple counter, eg. 3 or above meant BIP66, 4 or above meant BIP65 support.  Every block, it examined the last 1000 blocks to see if more than 75% had the new version.  If so, then the new softfork rules were enforced on new version blocks: old version blocks would still be accepted, and use the old rules.  If more than 95% had the new version, old version blocks would be rejected outright.
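
A toy sketch of that counting rule (the function and names are invented, this is not bitcoin-core code) looks roughly like this:

def old_softfork_state(last_1000_versions, new_version):
    upgraded = sum(1 for v in last_1000_versions if v >= new_version)
    if upgraded > 950:    # more than 95%: old-version blocks rejected outright
        return "reject-old"
    if upgraded > 750:    # more than 75%: new rules enforced on new-version blocks
        return "enforce-new"
    return "inactive"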

I remember Gregory Maxwell and other core devs stayed up late several nights because BIP66 was almost activated, but not quite.  And as a miner there was no guarantee on how long before you had to upgrade: one smaller miner kept producing invalid blocks for weeks after the BIP66 soft fork.  Now you get two weeks’ notice (probably more if you’re watching the network).

Finally, this change allows for miners to reject a particular soft fork without rejecting them all.  If we’re going to see more contentious or competing proposals in the future, this kind of plumbing allows it.

Hope that answers all your questions!


 

[1] It would be legal for an implementation to start setting it on the very first block past the start date, though it’s easier to only think about version bits once every two weeks as bitcoin-core does.

April 01, 2016 01:28 AM

March 30, 2016

Pete Zaitcev: SwiftStack versus Swift

Erik Pounds posted an article on the official SwiftStack blog presenting a somewhat corporatist view of Swift and Ceph that comes down to this:

They are both productized by commercial companies so all enterprises can utilize them... Ceph via RedHat and Swift via SwiftStack.

This view is extremely reductionist along a couple of avenues.

First, it tries to sweep all of Swift under the SwiftStack umbrella, whereas in reality Swift derives a lot of strength from not being controlled by SwiftStack. But way to assuage the fears of single-entity control by employing the PTL, guys. Fortunately, in the real and more complex world, Red Hat pays me to work on Swift, as well as to offer Swift as a product, and I do not find that my needs are in any way sabotaged by the PTL. Certainly our product focus differs; Red Hat's lifecycle management offering, OSPd, manages OpenStack first and Swift only insofar as it's a part of OpenStack, whereas SwiftStack offer a Swift-specific product. Still, it's not like Swift equals SwiftStack. I think RackSpace continue to operate the largest Swift cluster in the world.

Second, Erik somehow neglects to notice that Ceph provides a Swift compatibility through a component known as Rados Gateway. It is an option, you know, although obviously it can never be a better Swift than Swift itself, or better Amazon S3 than S3 itself.

March 30, 2016 07:36 PM

March 29, 2016

James Morris: Linux Security Summit 2016 – CFP Announced!

The 2016 Linux Security Summit (LSS) will be held in Toronto, Canada, on August 25th and 26th, co-located with LinuxCon North America.  See the full announcement.

The Call for Participation (CFP) is now open, and submissions must be made by June 10th.  As with recent years, the committee is looking for refereed presentations and discussion topics.

This year, there will be a $100 registration fee for LSS, and you do not need to be registered for LinuxCon to attend LSS.

There’s also now an event twitter feed, for updates and announcements.

If you’ve been doing any interesting development, or deployment, of Linux security systems, please consider submitting a proposal!

March 29, 2016 11:08 AM

March 28, 2016

Paul E. Mc Kenney: Live Kernel Patching Microconference Accepted into 2016 Linux Plumbers Conference

Live kernel patching was accepted into the v4.0 Linux kernel in February 2015, so we can declare the 2014 LPC Live Kernel Patching Microconference to have been a roaring success! However, as was noted at the time, this is just the beginning of the real work. In short, the v4.0 work makes live kernel patching possible, but more work is required to make it more reliable and more routine.

Additional issues include stacktrace reliability, patch-safety criteria for kernel threads, thread consistency models, porting to non-x86 architectures, handling of loadable modules, compiler optimizations, userspace tooling, patching of data, automated regression testing, and patch-creation guidelines.

Join us for an important and spirited discussion!

March 28, 2016 04:52 PM

March 25, 2016

LPC 2016: Containers Microconference Accepted into 2016 Linux Plumbers Conference

The level of Containers excitement has increased even further this year, with much interplay between Docker, Kubernetes, Rkt, CoreOS, Mesos, LXC, LXD, OpenVZ, systemd, and much else besides. This excitement has led to some interesting new use cases, including even the use of containers on Android.

Some of these use cases in turn require some interesting new changes to the Linux plumbing, including mounts in unprivileged containers, improvements to cgroups resource management, ever-present security concerns, and interoperability between various sets of tools.

Please join us for an important discussion on an important aspect of cloud computing!

March 25, 2016 05:51 PM

March 22, 2016

Paul E. Mc Kenney: Containers Microconference Accepted into 2016 Linux Plumbers Conference

The level of Containers excitement has increased even further this year, with much interplay between Docker, Kubernetes, Rkt, CoreOS, Mesos, LXC, LXD, OpenVZ, systemd, and much else besides. This excitement has led to some interesting new use cases, including even the use of containers on Android.

Some of these use cases in turn require some interesting new changes to the Linux plumbing, including mounts in unprivileged containers, improvements to cgroups resource management, ever-present security concerns, and interoperability between various sets of tools.

Please join us for an important discussion on an important aspect of cloud computing!

March 22, 2016 05:56 PM

March 16, 2016

Gustavo F. Padovan: Collabora contributions to Linux Kernel 4.5

Linux Kernel 4.5 was released earlier this week, and once again Collabora engineers played a role in its development. In addition to their current projects, seven Collabora engineers contributed a total of 33 patches to the new Kernel.

As part of its continued commitment to further increasing its participation in the Linux kernel, Collabora is looking to expand its team of core software engineers. If you’d like to learn more, follow this link.

Here are some highlights of Collabora’s participation in Kernel 4.5:

Daniel Stone improved i915 runtime WARN() messages and fixed an important issue in the component subsystem when component_add() fails. Danilo Cesar made the DRM Docbook ready for Markdown text.

Gustavo Padovan improved pm_runtime management in the drm/exynos driver and started work on de-staging the Android Sync Framework. On Rockchip, Sjoerd Simons enabled the IR receiver on the RK3288 Radxa Rock 2 Square, enabled Rockchip audio in multi_v7_defconfig and allowed the RK3288 SPDIF clocks to change their parent. On the net side, Sjoerd added a patch to turn the carrier off on phy attach to avoid unknown states, and another to add an ethernet0 alias for the RK3288 to help u-boot find this device node.

During his brief time with us at Collabora, Heiko Stübner added the dts file for the veyron-brain board, a shutdown callback to platform-variant dwc2 devices for special clock handling to avoid getting stuck during reboot/poweroff, and multi_v7_defconfig support for Rockchip’s io-domain driver, crypto module and rk808 clkout module. He also enabled support for the veyron minnie touchscreen, adjusted temperature limits on veyron-speedy and made the edp-24m clock always associated with the internal 24MHz oscillator.

Martyn Welch added a driver for the Zodiac Aerospace RAVE Watchdog Processor, while Tomeu Vizoso added a device_is_bound() helper function and a setter for dev.pm_domain that comes with extra checks. Tomeu also added a patch to allow USB devices to remain runtime-suspended when sleeping, and another to optimize sleep by going direct_complete if a driver has no prepare or PM callbacks. Lastly, Tomeu fixed a frequency issue in the Tegra devfreq_dev_profile.target callback.

Following is a list of all patches submitted by Collabora for this kernel release:

Daniel Stone (3):

Danilo Cesar Lemes de Paula (1):

Gustavo Padovan (9):

Heiko Stübner (8):

Martyn Welch (2):

Sjoerd Simons (5):

Tomeu Vizoso (5):

March 16, 2016 06:23 PM

March 15, 2016

Michael Kerrisk (manpages): man-pages-4.05 is released

I've released man-pages-4.05. The release tarball is available on kernel.org. The browsable online pages can be found on man7.org. The Git repository for man-pages is available on kernel.org.

This release resulted from patches, bug reports, reviews, and comments from more than 70 contributors. The release includes changes to more than 400 man pages. Among the more significant changes in man-pages-4.05 are the following:

March 15, 2016 08:17 AM

March 14, 2016

Eric Sandeen: Return of the low-power server: a 10W, 6.5 terabyte, commodity x86 Linux box

5 years ago, I wrote about the build of the server hosting sandeen.net, which ran using only 18W of power.  After 5 years, it was time for an upgrade, and I’m pleased to say that I put together a much more capable system which now runs at only 10W!  This system handles external email and webserving, as well as handling media serving, energy datalogging, and backup storage for household purposes.  It’s the server which dished up this blog for you today.

(Graph: sandeen.net wattage, as measured by a Kill-a-Watt meter [amzn])

If you run a server 24/7 at home, that always-on power consumption can really add up.  If you have an old beast running at 250W, that’s using about 2MWh of power per year, and will cost you over $200/year in electricity at $0.10/kWh.  A 10W build uses only 4% of that!  Here’s how I put it together:

I chose this motherboard because it has 4 SATA ports; the pairs of drives are in MD RAID1 mirror configurations using XFS for all filesystems, and the large 3T media drives are spun down most of the time; the system does get over 10W when they’re spinning.  And I have to say, the Samsung drives seem really nice; they are warranted for 150 terabytes written, or 10 years, whichever comes first.

I took most of the suggestions from the powertop tool, and I run this script at startup to put most devices into power-saving mode; this saves about 2W from the boot-up state. When boot-up power consumption is 12W, 2W is a significant fraction!  ;)

#!/bin/bash

# Set spindown time, see 
# https://sandeen.net/wordpress/computers/spinning-down-a-wd20ears/
hdparm -S 3 /dev/sdc
hdparm -S 3 /dev/sdd

# Idle these disks right now.
hdparm -y /dev/sdc
hdparm -y /dev/sdd

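# Prefer power saving in the CPU's energy/performance bias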
x86_energy_perf_policy -v powersave

# Stuff from powertop:
echo 'min_power' > '/sys/class/scsi_host/host0/link_power_management_policy';
echo 'min_power' > '/sys/class/scsi_host/host1/link_power_management_policy';
echo 'min_power' > '/sys/class/scsi_host/host2/link_power_management_policy';
echo 'min_power' > '/sys/class/scsi_host/host3/link_power_management_policy';
echo '0' > '/proc/sys/kernel/nmi_watchdog';
# I'm not willing to mess with data integrity...
# echo '1500' > '/proc/sys/vm/dirty_writeback_centisecs';
# Put most pci devices into low power mode
echo 'auto' > '/sys/bus/pci/devices/0000:00:1c.0/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:00:00.0/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:00:13.0/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:00:1a.0/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:00:14.0/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:00:1f.0/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:00:1c.1/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:00:1f.3/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:01:00.0/power/control';
# realtek chip doesn't like being in low power, see
# https://lkml.org/lkml/2016/2/24/88
# echo 'auto' > '/sys/bus/pci/devices/0000:02:00.0/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:00:02.0/power/control';

The old server took the network down to 100Mbps, but that just feels too pokey most days, so I left it at full gigabit speed.

I’ve been very happy with this setup; it runs quiet, cool, fast, and cheap, while using less power than an old-fashioned night-light.  Jetway says the board will have a lifecycle through Q1 2019, so hopefully anyone reading this post well into the future can still pick one up if desired.

If you’re curious, here’s dmesg, lspci, and cpuinfo from the box.  Note that the cpu can even do virtualization!

March 14, 2016 01:00 PM

March 12, 2016

Matthew Garrett: I stayed in a hotel with Android lightswitches and it was just as bad as you'd imagine

I'm in London for Kubecon right now, and the hotel I'm staying at has decided that light switches are unfashionable and replaced them with a series of Android tablets.

(Photo: a tablet displaying the text "UK_bathroom isn't responding. Do you want to close it?")
One was embedded in the wall, but the two next to the bed had convenient looking ethernet cables plugged into the wall. So.

I managed to borrow a couple of USB ethernet adapters, set up a transparent bridge (brctl addbr br0; brctl addif br0 enp0s20f0u1; brctl addif br0 enp0s20f0u2; ifconfig br0 up) and then stuck my laptop between the tablet and the wall. tcpdump -i br0 showed traffic, and wireshark revealed that it was Modbus over TCP. Modbus is a pretty trivial protocol, and notably has no authentication whatsoever. tcpdump showed that traffic was being sent to 172.16.207.14, and pymodbus let me start controlling my lights, turning the TV on and off and even making my curtains open and close. What fun!
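
For anyone wanting to poke at something similar, here's a sketch using pymodbus's synchronous TCP client (import path as of the releases around that time); the coil numbers are invented, since the real mapping only becomes apparent from watching what the tablets send:

from pymodbus.client.sync import ModbusTcpClient

client = ModbusTcpClient('172.16.207.14')   # controller address observed in the capture
client.connect()
client.write_coil(1, True)                  # e.g. switch on whatever output coil 1 drives
state = client.read_coils(1, 8)             # read back a handful of outputs
print(state.bits)
client.close()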

And then I noticed something. My room number is 714. The IP address I was communicating with was 172.16.207.14. They wouldn't, would they?

I mean yes obviously they would.

It's basically as bad as it could be - once I'd figured out the gateway, I could access the control systems on every floor and query other rooms to figure out whether the lights were on or not, which strongly implies that I could control them as well. Jesus Molina talked about doing this kind of thing a couple of years ago, so it's not some kind of one-off - instead, hotels are happily deploying systems with no meaningful security, and the outcome of sending a constant stream of "Set room lights to full" and "Open curtain" commands at 3AM seems fairly predictable.

We're doomed.

(edited: this previously claimed I could only access systems on my own floor, but it turns out that each floor is a separate broadcast domain and I just needed to set a gateway to access the others)

(further edit: I'm deliberately not naming the hotel. They were receptive to my feedback and promised to do something about the issue.)


March 12, 2016 12:57 PM

March 10, 2016

Daniel Vetter: Neat drm/i915 stuff for 4.6

The 4.5 release is close, it's time to look at what's in store for the next kernel's merge window in the Intel graphics driver.

Headline features for sure are that FBC and PSR are enabled by default. And this time around I'm really hopeful that it will stick, since Paulo&Rodrigo have done a stellar job hunting down all the corner cases, writing testcases for them all and in general making sure we have a really solid foundation for display power saving features. There are still some oddball corner cases, which means it's not yet enabled everywhere and on all platforms, but like I said: looking really good, and the culmination of over a year of effort to get the code infrastructure fixed up and solid.

Another project ongoing is atomic display support, with again lots of work from Maarten and Matt and others to move things forward. Specifically Maarten adapted the load detect logic to atomic and removed a pile of legacy structures no longer needed in preparation for the next steps. Matt continued to work on atomic display fifo watermark updates. Another area that has seen a lot of work in the background is runtime PM. Mika, Imre and Ville have massively improved the debugging infrastructure we have to track down rare bugs in our power status tracking code. They also merged lots of fixes in this area. Unfortunately we're not yet at a point where we can enable overall runtime PM for the device by default.

More on the feature side is pixel clock limit checks for all outputs and platforms from Mika Kahola. This is related to the work to also enable dynamic display clock scaling on Gen9, but that part is still being worked on. Ville worked a lot on how offsets and alignment are handled for display planes, all in preparation to support rotated multi-planar formats like NV12. Again a feature where a lot of hard work is required to make the final patch to enable it all look really simple.

On the plain hardware and platform enabling side Jani implemented support for version 3 of the VBT DSI descriptions, which should extend DSI panel support to all current hardware. Which includes the Surface 3.

Finally on the GEM side there have been mostly small fixes and improvements under the hood. Tvrtko decoupled the internal engine representation from the userspace ABI defines. He also restructured the CS irq handler code, and started to fix up some locking issues in execlist. Chris tracked down some coherency issues in the execlist interrupt handler. Dave Gordon finally started to somewhat untangle the execlist initialization logic and some of the confusion in how all the different software structures connect.

One real piece of feature work, from Alex Dai, was enabling ADS for GuC, which is a means to hand additional metadata to the GuC firmware scheduler. But since GuC-based command submission isn't enabled yet, this doesn't have a direct impact.

And of course there have been bugfixes all over the place, as usual.

March 10, 2016 10:27 AM

Pavel Machek: liferea .. allows tracking. Too much tracking for my taste.

I noticed that every time I click a headline in liferea on the http://servis.idnes.cz/rss.aspx?c=zpravodaj feed, the network shows usage, but nothing is displayed...
I turned off the network, and saw three broken images... So it turns out liferea is downloading three invisible images to assist idnes in tracking.
Can I somehow tell liferea not to download anything just because I clicked the headline?

March 10, 2016 09:01 AM

March 07, 2016

Pete Zaitcev: The "fast-post" is merged into Swift

I'm just back from a hackathon at the HPE site in Bristol, England, where we took a final look at the so-called "fast-post" patch and merged it in. It was developed by Alistair Coles and basically makes POST work the way everyone expected it to, at last.

In the original Swift, it was found that when you POST to an object in the presence of failures, it was possible to end up with some nodes having old data but new (posted) attributes. The bad part was that the replication mechanism could not do anything to reconcile the inconsistency, so your GETs return varying data forever depending on what node you hit. It occurred when the new timestamp from the POST attached itself to old data (and in other equivalent scenarios).

This is something of a fundamental issue with using timestamp-based replication in Swift. Greg and Chuck knew about it all along, and their solution was known as "POST to PUT". They made the Swift Proxy fetch the object, update its attributes for the POST, then do essentially a PUT. This way timestamps, data, and attributes are always consistent, as they are in the initial PUT. If this POST-to-PUT thing occurs across a failure, replication uses the timestamps to restore consistency correctly.

The problem with that is that POST-to-PUT is slow, as well as deceptive. Users think they are issuing a lightweight POST, but actually they prompt a massive data move inside the cluster if the object is big.

Alistair's insight was that the root of the problem was not that timestamps are no good as a basic mechanism, but that the "fast" POST broke them by assigning new timestamps to old data (or old attributes, metadata, Content-Type). As long as each independently settable thing has its own timestamp, there is no problem. In Swift, we have 3 of those: object data, object metadata, and Content-Type (don't ask). So, store 3 timestamps with each object and presto!

The actual patch employs an additional trick by not changing the container DB schema. Instead, it encodes 3 timestamps into 1 field where the timestamp used to live. This way a smooth migration is possible in a cluster where old async pendings still float, for example. It looks a little kludgy at first, but I convinced myself that it made sense under the circumstances.
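
A purely illustrative sketch of the idea in Python (this is not Swift's actual on-disk encoding, just the shape of it):

def encode(ts_data, ts_ctype, ts_meta):
    # Pack three timestamps into the single field that used to hold one.
    return "%.5f+%.5f+%.5f" % (ts_data, ts_ctype, ts_meta)

def decode(value):
    parts = [float(p) for p in value.split("+")]
    # Rows written before the migration carry a single timestamp for everything.
    return parts + [parts[0]] * (3 - len(parts))

def newer(a, b):
    # Replication picks the winner for data, Content-Type and metadata independently.
    return tuple(max(x, y) for x, y in zip(decode(a), decode(b)))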

P.S. The fast-post is not a default, even now. Needs Container Sync updated to be compatible. I think Eran was going to look into that.

March 07, 2016 07:51 PM

March 04, 2016

Pavel Machek: blinking cursor of death

Yesterday, my / filesystem on Debian on the Nokia N900 filled up. Ok. So I removed some cached dpkgs. I believe the phone still worked this morning.

Now, at the point where it should start X, the screen goes dark, and then the blinking cursor reappears. Normally, I'd hit ctrl-alt-f1 and debug stuff from there, but the N900 has no alt and no f1. I'll image the filesystem now. I can take a look at logs and fix stuff on the filesystem, but can't do other experiments easily. Debian 7.9. If you have some ideas what log to look at, let me know.

Window manager warning: CurrentTime used to choose focus window; focus window may not be correct.
Window manager warning: Got a request to focus the no_focus_window with a timestamp of 0.  This shoul\
dn't happen!
[1456818626,000,xklavier.c:xkl_engine_start_listen/]    The backend does not require manual layout ma\
nagement - but it is provided by the application
** Message: applet now removed from the notification area
polkit-mate-authentication-agent-1: Fatal IO error 11 (Resource temporarily unavailable) on X server \
:0.
g_dbus_connection_real_closed: Remote peer vanished with error: Underlying GIOStream returned 0 bytes\
on an async read (g-io-error-quark, 0). Exiting.
The application 'mate-panel' lost its connection to the display :0;
most likely the X server was shut down or you killed/destroyed
the application.
(nm-applet:3386): Gdk-WARNING **: nm-applet: Fatal IO error 0 (Success) on X server :0.

This is second time it happened, so it would be good to solve, OTOH just restoring from backup might be easier.

March 04, 2016 10:53 PM

February 28, 2016

Andy Grover: I’m Part of SFConservancy’s GPL Compliance Project for Linux

I believe GPL enforcement in general, and specifically around the Linux kernel, is a good thing. Because of this, I am one of the Linux copyright holders who has signed an agreement for the Software Freedom Conservancy to enforce the GPL on my behalf. I’m also a financial supporter of Conservancy.

That I hold any copyright at all to the work I’ve done is actually somewhat surprising. Usually one of the documents you sign when beginning employment at a company is agreeing that the company has ownership of work you do for them, and even things you create off-hours. Fair enough. But, my current employer does not do this. Its agreement lets its employees retain individual copyright to their work for open source development projects. I haven’t checked, but I suspect the agreement I signed with my less-steeped-in-F/OSS previous employers did not.

It’s something to ask about when considering accepting a new position.

I consider myself an idealist, but not a zealot. For projects I’ve started, I’ve used MIT, AGPLv3, MPLv2, GPLv2, GPLv3, LGPLv2, and Apache 2.0 licenses, based on the individual circumstances.

I believe if you’re going to build upon someone else’s work, it’s only fair to honor the rules they have set — the license — that allow it. Not everyone is doing that now with Linux; some are using the ambiguity over what constitutes a “derived work” as a fig leaf. Enforcing the GPL is the only way to ensure the intent of the license is honored. We’ll also hopefully eventually gain some clarity from the courts on exactly what constitutes a derived work.

Please consider becoming a Conservancy Supporter.

February 28, 2016 02:29 AM

February 26, 2016

Pete Zaitcev: Ceph needs a build revolution

I've been poking at this Ceph thing since June, or 9 months now, and I feel like I'm overdue for a good rant. Today's topic: Ceph's build system is absolutely insufferable. It's unbelievably fragile, and people break it at least every week. Then it takes a week to fix, or even worse: it breaks only for me and not for others, and then it stays broken, because I cannot possibly wade into a swamp this deep to fix it.

Here's today's post-10.0.3 trunk:

[zaitcev@lembas ceph-tip]$ sh autogen.sh 
..................
[zaitcev@lembas ceph-tip]$ ./configure --prefix=$(pwd)/build --with-radosgw
..................
[zaitcev@lembas ceph-tip]$ make -j3
..................
  CXX      common/PluginRegistry.lo
  CXXLD    libcommon_crc.la
ar: `u' modifier ignored since `D' is the default (see `U')
ar: common/.libs/libcommon_crc_la-crc32c_intel_fast_asm.o: No such file or direc
tory
Makefile:13137: recipe for target 'libcommon_crc.la' failed
[zaitcev@lembas ceph-tip]$ find . -name '*crc32c_intel*'
./src/common/.deps/libcommon_crc_la-crc32c_intel_fast_asm.Plo
./src/common/.deps/libcommon_crc_la-crc32c_intel_fast_zero_asm.Plo
./src/common/.deps/libcommon_crc_la-crc32c_intel_baseline.Plo
./src/common/.deps/libcommon_crc_la-crc32c_intel_fast.Plo
./src/common/.libs/libcommon_crc_la-crc32c_intel_baseline.o
./src/common/.libs/libcommon_crc_la-crc32c_intel_fast.o
./src/common/crc32c_intel_baseline.c
./src/common/crc32c_intel_baseline.h
./src/common/crc32c_intel_fast.c
./src/common/crc32c_intel_fast.h
./src/common/crc32c_intel_fast_asm.S
./src/common/crc32c_intel_fast_zero_asm.S
./src/common/libcommon_crc_la-crc32c_intel_fast.lo
./src/common/libcommon_crc_la-crc32c_intel_fast_asm.lo
./src/common/libcommon_crc_la-crc32c_intel_baseline.o
./src/common/libcommon_crc_la-crc32c_intel_fast.o
./src/common/libcommon_crc_la-crc32c_intel_fast_zero_asm.lo
./src/common/libcommon_crc_la-crc32c_intel_baseline.lo
[zaitcev@lembas ceph-tip]$ 

Sometimes these things fix themselves after a fresh clone/autogen.sh/configure/make. But doing that all the time is prohibited by how long Ceph takes to build. Literally it takes many hours (depending on whether you use autotools or CMake, and how parallel your build is). I bought a 4-core laptop with 16 GB and an SSD just for that. $1,200 later, I only have to wait 4 hours. Yay, I can build Ceph 2 times in 1 day.

The situation is completely insane, and it remained so for the months I spent working on this. The worst is that I don't understand how people even deal with this without killing themselves. If you look at the pull requests, obviously a large number of developers manage to build this thing somehow... unless all of them post untested patches all the time.

UPDATE: Waiting a bit and doing a fresh clone made the build complete, but then:

..................
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/q/zaitcev/ceph/ceph-tip/selinux'
[zaitcev@lembas ceph-tip]$ echo $?
0
[zaitcev@lembas ceph-tip]$ ./src/vstart.sh -n -d -r -i 192.168.132.2
ls: cannot access compressor/*/: No such file or directory
** going verbose **
./src/vstart.sh: line 374: ./init-ceph: No such file or directory
[zaitcev@lembas ceph-tip]$

We are about to freeze Jewel with this codebase.

February 26, 2016 11:05 PM

February 25, 2016

Pete Zaitcev: git rebase - proceed with caution

In the age of Github, we're not supposed to do the "git merge" anymore, but a "git rebase" instead. Everyone knows that. However, rebase has its quirks. One I stepped on goes like this:

  1. Have 2 patches: A and B, on top of a tree T. Submit A upstream.
  2. Upstream merges A.
  3. Do "git rebase" in T. You'd think A would disappear and you get to keep B on top of T (or T'=T+A).
  4. But instead, git finds some conflicts and throws you a 3-way merge node, which contains your A, upstream A, and some small, unrelated conflict. It has an empty commit message, obviously.
  5. Resolve the merge, "git commit -a", "git rebase --continue", and you end up with the next commit being empty.
  6. If at this point you think that it's the former A and do "git rebase --skip", then — hold onto your chair — B is thrown into a conflict as well, and its commit message is attached to a follow-on empty commit too, just like A was. If you skip that one, you lose the commit message forever. There's no reset you could do at that point.

Well, if you pushed on a branch, you could go back to Github and salvage the message from there.

Anyhow, rebase takes a certain amount of care. You can't assume that it always works, or assume that you could return to the previous state of the repository. In this sense it's fundamentally different from the merge, where you can always do "git reset", no matter how much you screwed up.

February 25, 2016 08:09 PM

Matthew Garrett: I bought some awful light bulbs so you don't have to

I maintain an application for bridging various non-Hue lighting systems to something that looks enough like a Hue that an Amazon Echo will still control them. One thing I hadn't really worked on was colour support, so I picked up some cheap bulbs and a bridge. The kit is badged as an iSuper iRainbow001, and it's terrible.

Things seemed promising enough at first, although the bulbs were alarmingly heavy (there's a significant chunk of heatsink built into them, which seems to get a lot warmer than I'd expect from something that claims a 7W power consumption). The app was a bit clunky, but eh - I wasn't planning on using it for long. I pressed the button on the bridge, launched the app and could control the bulbs. The first thing I noticed was that they had a separate "white" and "colour" mode. White mode was pretty bright, but colour mode massively less so - presumably the white LEDs are entirely independent of the RGB ones, and much higher intensity. Still, potentially useful as mood lighting.

Anyway. Next step was to start playing with the protocol, which meant finding the device on my network. I checked anything that had picked up a DHCP lease recently and nmapped them. The OS detection reported Linux, which wasn't hugely surprising - there was no GPL notice or source code included with the box, but I'm way past the point of shock at that. It also reported that there was a telnet daemon running. I connected and got a login prompt. And then I typed admin as the username and admin as the password and got a root prompt. So, there's that. The copy of Busybox included even came with tftp, so it was easy to get copies of tcpdump and strace on there to see what was up.

So. Next step. Protocol sniffing. I wanted to know how discovery worked, so reset the device to factory and watched what happens. The app on my phone sent out a discovery packet on UDP port 18602 which looked like this:

INLAN:CLIP:23.21.212.48:CLPT:22345:MAC:02:00:00:00:00:00

The CLIP and CLPT fields refer to the cloud server that allows for management when you're not on the local network. The mac field contains an utterly fake address. If you send out a discovery packet and your mac hasn't been registered with the device, you get a denial back. If your mac has been (and by "mac" here I mean "utterly fake mac that's the same for all devices"), you get back a response including the device serial number and IP address. And if you just leave out the mac field entirely, you get back a response no matter whether your address is registered or not. So, that's a start. Once you've registered one copy of the app with the device, anything can communicate with it by just using the same fake mac in the discovery packets.

Onwards. The command format turns out to be simple. They start ##, are followed by two ascii digits encoding a command, four ascii digits containing a bulb id, two ascii digits containing the number of following bytes and then the command data (in ascii). An example is:

##05010002ff

which decodes as command 5 (set white intensity) on bulb 1 with two bytes of data following, each of which is an f. This translates as "Set bulb 1 to full white intensity". I worked out the rest pretty quickly - command 03 sets the RGB colour of the bulb, 0A asks the bridge to search for new bulbs, 0B tells you which bulbs are available and their capabilities and 0E gives you the MAC addresses of the bulbs(‽). 0C crashes the server process, and 06 spews a bunch of garbage back at you and potentially crashes the bulb in a hilarious way that involves it flickering at about 15Hz. It turns out that 06 is actually the "Rename bulb" command, and if you send it less data than it's expecting something goes hilariously wrong in string parsing and everything is awful.
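
A small Python sketch of that framing is below; it follows the field layout as described, but the radix and digit ordering of the bulb-id field are guesses, so treat it as illustrative rather than a drop-in client:

def encode_cmd(command, bulb, data=""):
    # "##" + 2-digit command + 4-digit bulb id + 2-digit payload length + ascii payload.
    # The post says "ascii digits"; commands like 0A suggest the digits are hex.
    return "##%02x%04x%02x%s" % (command, bulb, len(data), data)

def decode_cmd(packet):
    assert packet.startswith("##")
    command = int(packet[2:4], 16)
    bulb = int(packet[4:8], 16)
    length = int(packet[8:10], 16)
    return command, bulb, packet[10:10 + length]

# e.g. encode_cmd(0x05, 1, "ff") asks for full white intensity on bulb 1, though it may
# not match the example packet byte-for-byte if the bulb-id digits are ordered differently.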

Ok. Easy enough, but not hugely exciting. What about the remote protocol? This turns out to involve sending a login packet and then a wrapped command packet. The login has some length data, a header of "MQIsdp", a long bunch of ascii-encoded hex, a username and a password.

The username is w13733 and the password is gbigbi01. These are hardcoded in the app. The ascii-encoded hex can be replaced with 0s and the login succeeds anyway.

Ok. What about the wrapping on the command? The login never indicated which device we wanted to talk to, so presumably there's some sort of protection going on here oh wait. The command packet is a length, the serial number of the bridge and then a normal command packet. As long as you know the serial number of the device (which it tells you in response to a discovery packet, even if you're unauthenticated), you can use the cloud service to send arbitrary commands to the device (including the one that crashes the service). One of which involves the device then doing some kind of string parsing that doesn't appear to work that well. What could possibly go wrong?

Ok, so that seemed to be the entirety of the network protocol. Anything else to do? Some poking around on the bridge revealed (a) that it had an active wireless device and (b) a running DHCP server. They wouldn't, would they?

Yes. Yes, they would.

The devices all have a hardcoded SSID of "iRainbow", although they don't broadcast it. There's no security - anybody can associate. It'll then hand out an IP address. It's running telnetd on that interface as well. You can bounce through there to the owner's internal network.

So, in summary: it's a device that infringes my copyright, gives you root access in response to trivial credentials, has access control that depends entirely on nobody ever looking at the packets, is sufficiently poorly implemented that you can crash both it and the bulbs, has a cloud access protocol that has no security whatsoever and also acts as an easy mechanism for people to circumvent your network security. This may be the single worst device I've ever bought.

I called the manufacturer and they insisted that the device was designed in 2012, was no longer manufactured or supported, that they had no source code to give me and there would be no security fixes. The vendor wants me to pay for shipping back to them and reserves the right to deduct the cost of the original "free" shipping from the refund. Everything is awful, which is why I just ordered four more random bulbs to play with over the weekend.


February 25, 2016 07:54 AM

February 23, 2016

James Bottomley: Are GPLv2 and CDDL incompatible?

Canonical recently threw this issue into sharp relief by their decision to ship CDDL licensed ZFS as a module of the GPLv2 licensed Linux kernel.  Reading their legal justification for this leaves me somewhat unconvinced because it’s essentially the same “not a derivative work” argument that a number of dubious actors have used to justify binary modules.  So what I’d like to do is look at this issue from a completely different viewpoint.  First by accepting the premise that CDDL and GPLv2 are incompatible (since there’s some debate on this) and secondly by accepting the even more controversial proposition that creating a kernel module is a derivative work.  I don’t want to debate these premises because it’s a worst case assumption I’m using as inputs to make the following analysis possible.

What is compliance?

One of the curious things about CDDL and GPLv2 is that they’re both copyleft (albeit in differing forms) and share essentially the same compliance requirement: the release of complete corresponding source code for your binary containing the licensed work.  In fact, the only significant difference is the requirement for build scripts, which is in GPLv2 but not in CDDL.  Therefore you can say that if you follow the compliance regime for GPLv2 on CDDL code, you’ll be in full compliance with the CDDL.  The licences do, in fact, have compatible compliance requirements.  This fact becomes very relevant when you consider the requirements for bringing a copyright lawsuit in the first place.

Where’s the Harm?

Copyright law is something called a tort in law.  That essentially means a branch of law for resolving disputes between individual parties.  However, in order to stem what could be seen as frivolous lawsuits, bringing a claim under tort law requires not just a showing that someone broke the rules of whatever agreement they were under but also that quantifiable harm resulted[1].  The essential elements of a tort claim are a showing of a violation, a theory of the harm produced and a request for damages based on the harm[2].

All of the bodies which do GPL enforcement recently published a charter in which they make clear that the sole requirement from an enforcement action should be compliance with the terms of the licence.  However, as I showed above, it is perfectly possible to be compatibly in compliance with both the CDDL and the GPLv2.  So the question becomes if the party is already in compliance, even though there’s a technical violation of the terms of the licence produced by the combination, what would our theory of harm be given that we don’t really seem to have anything extra we’d ask of the violating party?

Of course, one can wax lyrical about the “harm to the licence” of allowing incompatible combinations.  However, here we’re on a very sticky wicket because there have been a lot of works published (including by the FSF itself) bemoaning the problems of licence incompatibility.  Indeed, part of the design of the GPLv3 process was to make the licence more amenable to combination with other open source licences.  So suddenly becoming ardent advocates for the benefits of licence incompatibility is unlikely to fly before a judge as a theory of harm.

Conclusion

What the above analysis shows is that even though we presumed combination of GPLv2 and CDDL works to be a technical violation, there’s no way actually to prosecute such a violation because we can’t develop a convincing theory of harm resulting.  Because this makes it impossible to take the case to court, effectively it must be concluded that the combination of GPLv2 and CDDL, provided you’re following a GPLv2 compliance regime for all the code, is allowable.  This conclusion is the same as the one Canonical reached, but the means by which I got there are very different.

Note that this conclusion would apply to mixing any open source licence with GPLv2: provided the compliance regimes are compatible and the stricter one is followed, it’s difficult to develop a theory of harm and thus the combination is allowable.

Final Thought

The above analysis is all from the point of view of the Linux kernel compliance activities.  However, with ZFS, there is another copyright holder: Oracle.  Nothing prevents Oracle suing for copyright violation with a theory of harm that says something like the CDDL was deliberately designed to be incompatible with GPLv2 to prevent ZFS being shipped in Linux, and, as the shipper of products based on ZFS, they’ve suffered commercial harm (which would be quantifiable) by this action.

February 23, 2016 03:08 AM

Pete Zaitcev: Ha

Seen today (re. Linux Mint intrusion):

First, I believe that Linux Mint will come out of this stronger than ever. Second, this will force others to take ISO security more seriously. This also provides end users with a stronger reason to pay closer attention to what they’re doing.

Really, that's what he said.

February 23, 2016 12:51 AM

February 22, 2016

Matthew Garrett: Freedom, the US Government, and why Apple are still bad

The US Government is attempting to force Apple to build a signed image that can be flashed onto an iPhone used by one of the San Bernardino shooters. To their credit, Apple have pushed back against this - there's an explanation of why doing so would be dangerous here. But what's noteworthy is that Apple are arguing that they shouldn't do this, not that they can't do this - Apple (and many other phone manufacturers) have designed their phones such that they can replace the firmware with anything they want.

In order to prevent unauthorised firmware being installed on a device, Apple (and most other vendors) verify that any firmware updates are signed with a trusted key. The FBI don't have access to Apple's firmware signing keys, and as a result they're unable to simply replace the software themselves. That's why they're asking Apple to build a new firmware image, sign it with their private key and provide it to the FBI.

But what do we mean by "unauthorised firmware"? In this case, it's "not authorised by Apple" - Apple can sign whatever they want, and your iPhone will happily accept that update. As owner of the device, there's no way for you to reconfigure it such that it will accept your updates. And, perhaps worse, there's no way to reconfigure it such that it will reject Apple's.

I've previously written about how it's possible to reconfigure a subset of Android devices so that they trust your images and nobody else's. Any attempt to update the phone using the Google-provided image will fail - instead, they must be re-signed using the keys that were installed in the device. No matter what legal mechanisms were used against them, Google would be unable to produce a signed firmware image that could be installed on the device without your consent. The mechanism I proposed is complicated and annoying, but this could be integrated into the standard vendor update process such that you simply type a password to unlock a key for re-signing.

Why's this important? Sure, in this case the government is attempting to obtain the contents of a phone that belonged to an actual terrorist. But not all cases governments bring will be as legitimate, and not all manufacturers are Apple. Governments will request that manufacturers build new firmware that allows them to monitor the behaviour of activists. They'll attempt to obtain signing keys and use them directly to build backdoors that let them obtain messages sent to journalists. They'll be able to reflash phones to plant evidence to discredit opposition politicians.

We can't rely on Apple to fight every case - if it becomes politically or financially expedient for them to do so, they may well change their policy. And we can't rely on the US government only seeking to obtain this kind of backdoor in clear-cut cases - there's a risk that these techniques will be used against innocent people. The only way for Apple (and all other phone manufacturers) to protect users is to allow users to remove Apple's validation keys and substitute their own. If Apple genuinely value user privacy over Apple's control of a device, it shouldn't be a difficult decision to make.


February 22, 2016 10:01 PM

LPC 2016: Power Management and Energy-Awareness Microconference Accepted into 2016 Linux Plumbers Conference

Power management and energy efficiency continues to receive considerable attention, and is becoming a Plumbers tradition (see the 2015 wiki for last year’s instance). This year’s microconference will continue these efforts.

This microconference will look at elimination of timers from cpufreq governors, unifying idle management in SoCs with power resources shared between CPUs and I/O devices, load balancing utilizing workload consolidation and/or platform energy models to improve energy-efficiency and/or performance, improving CPU frequency selection efficiency by utilizing information provided by the scheduler in intel_pstate and cpufreq governors, idle injection (CFS-based vs. play idle with kthreads [PDF]), ACPI compliance tests (from the power management perspective) and more.

We hope to see you there!

February 22, 2016 07:40 PM

Paul E. Mc Kenney: Power Management and Energy-Awareness Microconference Accepted into 2016 Linux Plumbers Conference

Power management and energy efficiency continues to receive considerable attention, and is becoming a Plumbers tradition (see the 2015 wiki for last year's instance). This year's microconference will continue these efforts.

This microconference will look at elimination of timers from cpufreq governors, unifying idle management in SoCs with power resources shared between CPUs and I/O devices, load balancing utilizing workload consolidation and/or platform energy models to improve energy-efficiency and/or performance, improving CPU frequency selection efficiency by utilizing information provided by the scheduler in intel_pstate and cpufreq governors, idle injection (CFS-based vs. play idle with kthreads [PDF]), ACPI compliance tests (from the power management perspective) and more.

We hope to see you there!

February 22, 2016 04:13 PM

February 18, 2016

Matthew Garrett: Canonical, Ubuntu and why I seem so upset about them all the time

I had no access to the internet for most of my childhood. Nobody in my area knew anything about programming. I learned a great deal from a number of CDs that included free software source code and archives of project mailing lists. When I got to university, I learned even more by being able to develop a Debian-based OS for use in our computer facilities. That gave me the experience and knowledge that I needed to become involved in Debian development, which in turn gave me the background required to be able to help Ubuntu become the first free software operating system to work out of the box on modern laptops. From there, I've been able to build my career around developing free software.

Ubuntu can be translated as "I am who I am because of who we all are". I am who I am because people made the choice to release their software under licenses that permitted examination, modification and redistribution. I am who I am because I was able to participate in communities that took advantage of those freedoms to produce new and better software. I am who I am because when my priorities differed from those of existing communities, it was still possible for me to benefit from their work and for them to benefit from mine.

Free software doesn't mean that the software is entirely free of restrictions. While a core aspect is the right to distribute modified versions of code, it has never been fundamental to free software that you be able to do so while still claiming that the code is the original version. Various approaches have been taken to make it possible for users to distinguish modified versions, ranging from simply including license terms that require modified versions be marked as such, to licenses that require that you change the name of the package if you modify it. However, what's probably the most effective approach has been to apply trademark law to the problem. Mozilla's trademark policy is an example of this - if you modify the code in ways that aren't approved by Mozilla, you aren't entitled to use the trademarks.

A requirement that you avoid use of trademarks in an infringing way is reasonable. Mozilla products include support for building with branding disabled, which makes it very straightforward for a user to build a modified version of Firefox that can be redistributed without any trademark issues. Red Hat have a similar policy for Fedora and RHEL[1] - you simply replace the packages that contain the branding and you're done.

Canonical's IP policy around Ubuntu is fundamentally different. While Mozilla make it clear that you simply no longer have a right to use the trademarks under trademark law, Canonical appear to require that you remove all trademarks entirely even if using them wouldn't be a violation of trademark law. While Mozilla restrict the redistribution of modified binaries that include their trademarks, Canonical insist that you rebuild everything even if the package doesn't contain any trademarks. And while Mozilla give you a single build option that creates binaries that conform with their trademark requirements, Canonical will refuse to tell you what you have to do.

When asked about this at SCALE earlier this year, Mark Shuttleworth claimed that Ubuntu's policy was consistent with that of other projects. This is inaccurate. Nobody else requires that you rebuild every package before you can redistribute it in a modified distribution - such a restriction is a violation of freedom 2 of the Free Software Definition, and as a result the binary distributions of Ubuntu are not free software. Nobody else refuses to discuss whether you're required to remove non-infringing trademarks in order to be able to redistribute. Nobody else responds to offers to make it easier for users to produce non-infringing derivatives with a flat refusal.

Mark claims that I'm only raising this issue because I work for a competitor and wish to harm Canonical. Nothing could be further from the truth. I began discussing this before working for my current employers - my previous employers had no meaningful market overlap with Canonical at all. The reason I care is because I care about free software. I care about people being able to derive new and interesting things from existing code. I care about a small team of people being able to take Ubuntu and make something better in the same way that Ubuntu did with Debian. I care about ensuring that users receive the freedom to do this without having to jump through a significant number of hoops in the process. Ubuntu has been a spectacularly successful vehicle for getting free software into the hands of users. Mark's generosity in funding this experiment has undoubtedly made the world a better place. Canonical employs a large number of talented developers writing high quality software, many of whom I'm fortunate enough to be able to call friends. And Canonical are squandering that by restricting the rights of their users and alienating the free software community.

I want others to be who they are because of my work and the work of all the others like me. Anything that makes that more difficult saddens me, and so I do what I can to fix it. I criticise Canonical's policies in the hope that we, as a community, can convince Canonical to agree that this kind of artificial barrier to modification hurts us more than it helps them. In many ways, Canonical remain one of our best hopes for broadening the reach of free software, and this is why it's unfortunate that they do so in a way that makes it more difficult for people to have the same experiences that I did.

[1] While it's easy to turn a trademark infringing version of RHEL into a non-infringing one, Red Hat don't provide publicly available binary packages for RHEL. If you get hold of them somehow you're entitled to redistribute them freely, but Red Hat's subscriber agreement indicates that if you do this as a Red Hat customer you will lose access to further binary updates - a provision that I find utterly repugnant. Its inclusion reduces my respect for Red Hat and my enthusiasm for working with them, and given the official Red Hat support for CentOS it appears to make no sense whatsoever. Red Hat should drop it.


February 18, 2016 11:09 PM

February 15, 2016

James Bottomley: efitools arm 32 bit build fixed

It turned out there was a bug in gnu-efi, in the linker script.  I’ve added a local fix for this to the OBS UEFI project and filed a bug with gnu-efi on sourceforge.  The net result is that the arm 32 bit binaries are now passing most of their tests (the remaining problems may be due to faults in the UEFI environment, still investigating).  I should also have the OVMF image for arm (the ArmVirt package) building for 32 bit arm shortly.

I’ve released version 1.6.1 of efitools to make sure a new version gets rebuilt.

February 15, 2016 04:09 PM

Dave Airlie: virglrenderer project gets mailing-list and hosting

Just FYI:

There is now a mailing list for virglrenderer library hosted at freedesktop. It is to be used for development discussion of the virgl->GL renderer library and for patches to it.

https://lists.freedesktop.org/mailman/listinfo/virglrenderer-devel

The git tree is also now hosted at:
git://anongit.freedesktop.org/git/virglrenderer
https://cgit.freedesktop.org/virglrenderer

My personal repo will only be used for my own development stuff.

I'll also get a freedesktop patchwork instance set up for this asap.

I'm also contemplating bugzilla vs phabricator.

Dave.

February 15, 2016 02:51 AM

February 14, 2016

James Bottomley: efitools for arm released

Thanks to using architecture emulation containers, I can now build and test efitools for arm (both 64 bit: aarch64 and 32 bit armv7l) on my x86_64 laptop.  To test the actual EFI binaries, you need the arm equivalent of OVMF which is ArmVirt.  I’ve got the build service spitting out the images (still called OVMF because of package naming requirements) for those who want to try this at home.  To run the UEFI image, you need qemu-system-<arch> which, on SUSE, is provided by the qemu-arm package.  The Linaro ARM64 site has detailed instructions, but for the impatient, the required command line (for aarch64) is

qemu-system-aarch64 \
        -m 2048 \
        -cpu cortex-a57 \
        -M virt \
        -bios /home/jejb/ovmf-aarch64.bin \
        -serial stdio \
        -drive if=none,file=fat:.,id=hd0 \
        -device virtio-blk-device,drive=hd0

This brings up the UEFI bios image and mounts the current directory as an additional fat volume.  The Linaro site recommends 1024 for the memory size, but if you want to boot a kernel or do anything fancy, you’ll find 2048 is the minimum.  The same thing will work for 32 bit arm (except that you want to remove the -cpu line).
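
For 32 bit arm, a roughly equivalent invocation (a sketch only; the image name ovmf-arm.bin and its path are my assumptions, so substitute whatever your 32 bit ArmVirt build produced) would be

qemu-system-arm \
        -m 2048 \
        -M virt \
        -bios /home/jejb/ovmf-arm.bin \
        -serial stdio \
        -drive if=none,file=fat:.,id=hd0 \
        -device virtio-blk-device,drive=hd0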

To get efitools to build, I had to get the arm versions of sbsigntools up and running, which I have.  The source tree I used for this is here.  There were a few modifications necessary to build arm, but nothing drastic.  Since the Ubuntu version of the git tree appears to be dead, I’ve merged back all the ubuntu patches and will maintain this going forwards.  I’ll bump the release version to 0.8 when I’m sure everything is stable.

Status

Using this, the current status is that the aarch64 binaries are all running fine and have verified nicely.  Unfortunately, there are still some problems generating efi binaries for the 32 bit arm systems.  Most of the simple binaries are running OK, but the more complex ones (like KeyTool) are showing symptoms of data relocation problems which I’m still investigating.  I’ll push a new release of efitools when I can get this sorted out, although it looks like a more generic problem inside gnu-efi itself.  Edk2 builds of arm32 efi binaries seem to be working, so I’ll see if I can deduce what’s missing from gnu-efi.

February 14, 2016 05:47 PM

February 13, 2016

James Bottomley: Constructing Architecture Emulation Containers

Usually container related stuff goes on $EMPLOYER blog, but this time, I had a container need for my hobbies.  The problem: how to build and test efitools for arm and aarch64 while not possessing any physical hardware.  The solution is to build an architecture emulation container using qemu and mount namespaces, such that when it’s entered you find yourself in your home directory but with the rest of Linux running natively (well, emulated natively via qemu) as a new architecture.  Binary emulation in Linux is nothing new: the binfmt_misc kernel module does it, and can execute anything provided you’ve told it what header to expect and how to do the execution.  Most distributions come with a qemu-linux-user package which will usually install the necessary binary emulators via qemu to run non-native binaries.  However, there’s a problem here: the installed binary emulator usually runs as /usr/bin/qemu-${arch}, so if you’re running a full operating system container, you can’t install any package that would overwrite that.  Unfortunately for me, the openSUSE Build Service package osc requires qemu-linux-user and would cause the overwrite of the emulator and the failure of the container.  The solution to this was to bind mount the required emulator into the / directory, where it wouldn’t be overwritten, and to adjust the binfmt_misc paths accordingly.

Aside about binfmt_misc

The documentation for this seems to exist properly only in the kernel Documentation directory, as binfmt_misc.txt.  However, very roughly, the format is

:name:type:offset:magic:mask:interpreter:flags

name is just a handle which will appear in /proc/sys/fs/binfmt_misc; type is M for magic or E for extension (magic means recognise the type by the binary header, the usual UNIX way; extension means recognise the type by the file extension, the Windows way). offset is where in the file to find the magic header to recognise; mask is the binary string to AND the header with, and magic is the value that must match once the masking is done.  interpreter is the name of the interpreter to execute and flags tells binfmt_misc how to execute the interpreter.  For qemu, the flags always need to be OC, meaning open the binary and generate credentials based on it (this can be seen as a security problem, because the interpreter will execute with the same user and permissions as the binary, so you have to trust it).

If you’re on a systemd system, you can put all the above into /etc/binfmt.d/file.conf and systemd will feed it to binfmt_misc on boot.  Here’s an example of the aarch64 emulation file I use.
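
As a rough illustration (this is not the exact file linked above; the interpreter path assumes the emulator has been bind mounted to /qemu-aarch64 inside the container, as described earlier), a binfmt.d entry for aarch64 looks something like

:qemu-aarch64:M::\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\xb7\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/qemu-aarch64:OC

The magic and mask together simply match the ELF header of a 64 bit little-endian ARM binary (ELFCLASS64 with an e_machine of EM_AARCH64), and OC are the flags discussed above.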

Bootstrapping

To bring up a minimal environment that’s fully native, you need to bootstrap it by installing just enough binaries using your native system before you can enter the container.  At a minimum, this is enough shared libraries and binaries to run the shell.  If you’re on a debian system, you probably already know how to use debootstrap to do this, but if you’re on openSUSE, like me, this is a much harder proposition because persuading zypper to install non-native binaries isn’t easy.  The first thing you need to know is that you need to install an architectural override for libzypp in the file pointed to by the ZYPP_CONF environment variable.  Here’s an example of a susebootstrap shell script that will install enough of the architecture to run zypper (so you can install all the packages you actually need); a rough sketch of what it does appears at the end of this section.  Just run it as follows (note: you must have the qemu-<arch> binary installed, because the installer will try to run pre and post scripts which, if they’re binary, will fail unless the emulation is working):

susebootstrap --arch <arch> <location>

And the bootstrap image will be built at <location> (I usually choose somewhere in my home directory, but you can use /var/tmp or anywhere else in your filesystem tree).  Note this script must be run as root because zypper can’t change ownership of files otherwise.  Now you are ready to start the architecture emulation container with <location> as the root.
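
For reference, a rough sketch of what a bootstrap script like this has to do (the repository URL and package list below are placeholders of mine, not what the real script uses) is

#!/bin/sh
# sketch only: bootstrap a minimal root for $ARCH at $LOCATION using zypper
ARCH="$1"
LOCATION="$2"

# point libzypp at a config file containing an architecture override
export ZYPP_CONF=$(mktemp)
printf '[main]\narch = %s\n' "$ARCH" > "$ZYPP_CONF"

# add a repository for the target architecture; REPO_URL stands in for
# the appropriate openSUSE ports repository
zypper --root "$LOCATION" ar "$REPO_URL" ports
zypper --root "$LOCATION" --gpg-auto-import-keys refresh

# install just enough to get a shell and zypper inside the new root; the
# packages' pre/post scripts are foreign binaries, so the binfmt_misc
# emulation must already be working at this point
zypper --root "$LOCATION" install -y zypper bash coreutils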

Building an Architecture Emulation Container

All you really need now is a mount namespace with <location> as the real root and all the necessary Linux filesystems like /sys and /proc mounted.  Additionally, you usually want /home, and I also mount /var/tmp so there’s a standard location for all my obs build directories.  Building a mount namespace is easy: simply unshare --mount and then bind mount everything you need.  Finally you use pivot_root to swap the new and old roots and umount -l the old root (-l is necessary because the mount point is in use outside the mount namespace as your real root, so it just needs detaching; you don’t need to wait until no-one is using it).
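
A bare-bones sketch of those steps (my approximation for illustration only; the bootstrap root path and the /qemu-arm emulator location are assumptions, and the real script described below also does some bookkeeping under /run/build-container) is

sudo unshare --mount sh -c '
        location=/home/jejb/tmp/arm
        mount --make-rprivate /                     # undo systemd shared subtrees (see the aside below)
        mount --bind "$location" "$location"        # pivot_root wants the new root to be a mount point
        touch "$location/qemu-arm"
        mount --bind /usr/bin/qemu-arm "$location/qemu-arm"  # emulator visible at / inside the container
        mount -t proc proc "$location/proc"
        mount -t sysfs sys "$location/sys"
        mount --bind /home "$location/home"
        mount --bind /var/tmp "$location/var/tmp"
        mkdir -p "$location/old-root"
        cd "$location"
        pivot_root . old-root
        cd /
        umount -l /old-root
        exec /bin/bash
'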

All of this is easily scripted and I created this script to perform these actions.  As a final act, the script pins the mount namespace and creates an entry for it in /run/build-container/<arch>, which is what nsenter uses below to enter the container.  This is the command line I used for the example below:

build-container --arch arm --location /home/jejb/tmp/arm

Now entering the build container is easy (you still have to enter the namespace as root, but you can exec su - <user> to become whatever your non-root user is).

jejb@jarvis:~> sudo -s
jarvis:/home/jejb # uname -m
x86_64
jarvis:/home/jejb # nsenter --mount=/run/build-container/arm
jarvis:/ # uname -m
armv7l
jarvis:/ # exec su - jejb
jejb@jarvis:~> uname -m
armv7l
jejb@jarvis:~> pwd
/home/jejb

And there you are, all ready to build binaries and run them on an armv7 system.

Aside about systemd and Shared Subtrees

On a normal linux system, you wouldn’t need to worry about any of this, but if you’re running systemd, you do, because systemd has some properties that are very inimical to mount namespaces, and you need to be aware of them.

In Linux, a bind mount creates a subtree.  Because you can bind mount from practically anywhere to anywhere, you can have many such subtrees that are substantially related.  The default way to create subtrees is "private": this means that even if the subtrees are effectively the same set of files, a mount operation on one isn’t seen by any of the others.  This is great, because it’s precisely what you want for containers.  However, if a subtree is set to shared (with the mount --make-shared command) then all mount and unmount operations are propagated to every shared copy.  The reason this matters for systemd is that systemd, at start of day, sets every mount point in the system to shared.  Unless you re-privatise the bind mounts as you create the architecture emulation container, you’ll notice some very weird effects.  Firstly, because pivot_root won’t pivot to a shared subtree, that call will fail; secondly, you’ll notice that when you umount -l /old-root the unmount propagates to the real root and unmounts everything (like your root /proc, /dev and /sys), effectively rendering your system unusable.  The mount --make-rprivate /old-root recursively descends /old-root and sets all the mounts to private, so the umount -l simply detaches /old-root instead of propagating all the umount events.
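
Rendered as commands, that last point is simply (a sketch; if everything was already re-privatised up front, the first line is a no-op):

mount --make-rprivate /old-root    # stop the umount below propagating back to the real root
umount -l /old-root                # now this only detaches /old-root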

February 13, 2016 11:12 PM

February 10, 2016

Daniel Vetter: ARM kernel cross compiling

I do a lot of cross driver subsystem refactorings, and DRM has lots of drivers that only run on ARM. Which means I routinely break a leg or arm, since at least in the past cross-compiling was somehow always super painful. But I've just learned (thanks to Daniel Stone) that cross-compiling this stuff has become real easy, so here's my handy script for this. This assumes Debian, but the difference is just in installing a different cross-compiler toolchain.

First get the tooling:

$ sudo apt-get install gcc-arm-linux-gnueabihf
 
Then create another git checkout. I prefer the recently merged worktree support, since with that all your branches and remotes transparently work in the new checkout, too.

~/kernel/src/ $ git worktree add ../armhf HEAD
 
With that we're all set up. For building any $branch, wrap the following lines into a script:

$ cd ~/kernel/armhf
$ git checkout --detach $branch
$ make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- multi_v7_defconfig zImage modules


I'm using --detach to avoid complaints from the git worktree code that I've checked out a branch already in the main repo. Note: Never accidentally run plain make in the ARM build directory - mixing up architectures seriously confuses the kernel build system.
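
Wrapped up, the script might look like this (a minimal sketch, assuming the worktree lives at ~/kernel/armhf as above; pass the branch to build as the first argument):

#!/bin/sh
# minimal wrapper for the steps above
set -e
cd ~/kernel/armhf
git checkout --detach "$1"
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- multi_v7_defconfig zImage modules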

February 10, 2016 04:01 PM

Pavel Machek: Fire from the moonlight

http://what-if.xkcd.com/145/

Actually, I think you can make fire from the moonlight -- it just won't be easy. All you need is a square inch (or so) of mirrors on the moon. If there were not enough mirrors on the moon a hundred years ago, I'm pretty sure we have put enough junk there in the last 100 years... probably including some mirrors for distance measurement.

February 10, 2016 02:16 PM

February 09, 2016

Pete Zaitcev: A Short History Of Removable Media Behind The Iron Curtain

CKS blogged something today that caught my eye, in the context of filesystems defined in host byte order:

In the beginning, storage was close to 100% system specific. Not only did you not think of moving a disk from a Vax to a Sun, you probably couldn't; the entire peripheral interconnect system was almost always different, from the disk to host cabling to the kind of backplane that the controller boards plugged into.

Although the above could be the case in the West, in the USSR it was quite common to move removable media between systems, even back in the 1970s. The cartridge of choice was the disk pack for IBM 2311, a 7.25MB, 11" hard disk stack of 4 or 5 platters. It could be easily transported between BESM-6, IBM/370 clones (ES-1022 and up), PDP-11 clones (SM-3, SM-4, SM-1420, etc.), HP 2000 clones (SM-2), and Mitra-15 (ES-1010). It only went out of service around 1986, with the adoption of 29MB disk packs.

Granted, UNIX only worked on PDP-11. It reached ES series too late for the 7.25MB packs, in the ES-1045 generation. However, SM-4 brought an "RP-5" cartridge with a single platter, which for a while was a gold standard for minicomputers. In Russian practice, it used a half-density recording for 2.5MB instead of 5MB in western cartridges. The "RP" ("rk" in UNIX) drive was hooked to a wide variety of mini- and micro-computers during their brief popularity before they were supplanted by PCs. Aside from the original SM and smaller LSI-11 compatibles (DVK), it was connected to Iskra-226 (Wang micro), Videoton's Z80-based TRS-80, Mitra-225, and basically any and every microcomputer, mostly based on the 8080 clone KR580IK80. The most popular format, I suspect, was a trivial filesystem used by DEC's RT-11 OS family.

When PCs came about, it was rather common to move around their hard drives with the MFM interface, although Russian domestic winchesters required extreme care. Unplugging one with unparked heads could easily scratch the surface. Soon, the imports flooded the market and the 20MB Seagate ST-225 became the gold standard. It lasted until the single-cable IDE replaced it. Interestingly enough, the last hold-overs of the LSI-11 line used various trivial IDE controllers with CPU-controlled access. You could attach something like an ST-157 to them.

Chris was building an argument that the lack of portable disk media was the reason why everyone made their Berkeley filesystem in host byte order. He's not necessarily incorrect about that. Even if Russians used cartridge and winchester drives pervasively, nobody cared about them and they were not setting the software standards.

BTW, since we're on topic: in the case of Linux, the byte-order independence was not easily won. The original port to Amiga (by Geert, IIRC) used a big-endian ext or ext2, if not minix even. When the SPARC port came about, DaveM was not sure whether to follow Geert at first (again, IIRC). Someone made an argument that byte-swapping would waste CPU, and it carried some weight. Also, adding macros in all the correct positions was challenging. I remember arguing for order independence, in part because I had easy access to PCs with SCSI HBAs. Not that anyone listened to my opinion, but eventually DaveM and Tytso went with it as well, and the rest is history.

February 09, 2016 07:47 PM